History log of /petsc/src/mat/impls/sell/seq/sell.h (Results 26 – 50 of 72)
Revision Date Author Comments
# cf9512e4 17-Jul-2023 Satish Balay <balay@mcs.anl.gov>

Merge branch 'jolivet/fix-Wextra-semi-stmt' into 'main'

Fix -Wextra-semi-stmt

See merge request petsc/petsc!6708


# a8f51744 14-Jul-2023 Pierre Jolivet <pierre@joliv.et>

Fix -Wextra-semi-stmt


# dd874c20 10-Apr-2023 Satish Balay <balay@mcs.anl.gov>

Merge branch 'hongzh/sell-cuda' into 'main'

SELL-based SpMV

See merge request petsc/petsc!3428


# b921024e 06-Apr-2021 Hong Zhang <hongzhang@anl.gov>

Add MatSeqSELLGetVarSliceSize

It returns the variance of the slice sizes.


# 90d2215b 12-Jan-2021 Hong Zhang <hongzhang@anl.gov>

Add the load-balancing kernel for MatMultAdd_SeqSELL and fine tune the heuristic

Kernel7 is significantly slower than kernel9x for the following two cases:
- nrows is too small. Kernel7 uses 2 threads per row (assuming sliceheight=16), so it does not fully utilize the GPU if nrows < 100K.
- maxslicewidth is too big.

Thanks-to: Peng Wang <penwang@nvidia.com>


# 4e58db63 31-Dec-2020 Hong Zhang <hongzhang@anl.gov>

Make slice height more flexible

- The slice height no longer has to match the device memory alignment; it just needs to be divisible by DEVICE_MEM_ALIGN
- Pad each slice with extra columns to achieve coalesced memory access if needed


# 07e43b41 10-Sep-2020 Hong Zhang <hongzhang@anl.gov>

Further optimization of MatMult_SeqSELLCUDA

- Add more kernels
- Use multiple threads per row for matrices with narrow slices
- Use multiple blocks per slice for matrices with wide slices
- Add three new APIs to return the irregularity ratio, the maximum slice width and the average slice width

Experiments show that column blocking gives much worse performance for wide matrices, and that permutation based on slice width has almost no impact on performance.


# 2d1451d4 09-Jan-2020 Hong Zhang <hongzhang@anl.gov>

Initial commit for porting SELL to GPU

- Add tiled SpMV and basic SpMV for SeqSELL
- Tested in serial
- Offloadmask is used to determine when the matrix should be copied to GPU
- Use different slice height for CUDA version
- By checking the nonzerostate, PETSc can decide whether the whole matrix needs to be copied or just the values
- Make the convert function public so that the very slow MatConvert_Basic can be avoided sometimes. E.g. one can use a two-step convert method: AIJ->SELL,SELL->SELLCUDA instead of the direct convert AIJ->SELLCUDA
- Make the FLOPS count for SELL same as that for AIJCUSPARSE.
- MatDisAssemble is not needed.
- Change slice height from 32 to 16 for GPU
- To overlap communication with MatMult, VecScatterBegin() should be called before MatMult() for the diagonal part.
- SLICE_HEIGHT is defined to be 32 to match the warp size of GPU. For other cases, it is still 8.

Funded-by:
Project: PETSc for GPU
Time: 42 hours
Reported-by:
Thanks-to:


# 37d05b02 06-Feb-2023 Satish Balay <balay@mcs.anl.gov>

Merge remote-tracking branch 'origin/release'


# b877537e 05-Feb-2023 Satish Balay <balay@mcs.anl.gov>

Merge branch 'jolivet/fix-typos' into 'release'

Fix Typos

See merge request petsc/petsc!6024


# da81f932 05-Feb-2023 Pierre Jolivet <pierre@joliv.et>

Fix Typos


# 31d78bcd 02-Feb-2023 Satish Balay <balay@mcs.anl.gov>

Merge branch 'jacobf/2022-12-10/petscerrorcode-nodiscard' into 'main'

Feature: Non-discardable PetscErrorCode

See merge request petsc/petsc!5923


# 3ba16761 10-Dec-2022 Jacob Faibussowitsch <jacob.fai@gmail.com>

Make PetscErrorCode a non-discardable enum


# 2f91b18a 23-Nov-2022 Satish Balay <balay@mcs.anl.gov>

Merge branch 'jacobf/2022-11-08/remove-petsctable' into 'main'

Remove PetscTable

See merge request petsc/petsc!5819


# eec179cf 08-Nov-2022 Jacob Faibussowitsch <jacob.fai@gmail.com>

- Replace PetscTable with PetscHMapI.
- Rename:
  - PetscTableCreate() -> PetscHMapICreateWithSize()
  - PetscTableFind() -> PetscHMapIGetWithDefault()
  - PetscTableAdd() -> PetscHMapISetWithMode()


# 061e922f 22-Sep-2022 Satish Balay <balay@mcs.anl.gov>

Merge branch 'jacobf/2022-09-21/2-bike-2-shed' into 'main'

Feature: Bicycle Storage Facility 2

See merge request petsc/petsc!5661


# d71ae5a4 21-Sep-2022 Jacob Faibussowitsch <jacob.fai@gmail.com>

source code format changes due to .clang-format changes


# 6524c165 21-Sep-2022 Jacob Faibussowitsch <jacob.fai@gmail.com>

Transform all header-guards into ifndefs to make clang-format ignore them for preprocessor indentation


# f0af967e 29-Aug-2022 Satish Balay <balay@mcs.anl.gov>

Merge branch 'jolivet/fix-style-one-liners' into 'main'

Remove braces from one-liners w/o PetscCall()

See merge request petsc/petsc!5561


# ad540459 29-Aug-2022 Pierre Jolivet <pierre@joliv.et>

Remove braces from one-liners w/o PetscCall()


# 58d68138 23-Aug-2022 Satish Balay <balay@mcs.anl.gov>

Merge branch 'barry/2022-08-21/clang-format-source' into 'main'

format repository with clang-format

See merge request petsc/petsc!5541


# 9371c9d4 22-Aug-2022 Satish Balay <balay@mcs.anl.gov>

clang-format: convert PETSc sources to comply with clang-format


# b33f4bec 05-Apr-2022 Satish Balay <balay@mcs.anl.gov>

Merge branch 'jolivet/feature-less-checkfalse' into 'main'

Dividing by four the number of PetscCheckFalse()

See merge request petsc/petsc!5072


# 08401ef6 04-Apr-2022 Pierre Jolivet <pierre@joliv.et>

Remove some PetscCheckFalse()


# f882803c 26-Mar-2022 Satish Balay <balay@mcs.anl.gov>

Merge branch 'jacobf/2022-02-23/variadic-chkerr' into 'main'

Variadic CHKERRQ()

See merge request petsc/petsc!4889

