| #
90d2215b
|
| 12-Jan-2021 |
Hong Zhang <hongzhang@anl.gov> |
Add the load-balancing kernel for MatMultAdd_SeqSELL and fine tune the heuristic
Kernel7 is significantly slower than kernel9x in the following two cases:
- nrows is too small. Kernel7 uses 2 threads per row (assuming sliceheight=16), so it does not fully utilize the GPU if nrows < 100K.
- maxslicewidth is too big.
Thanks-to: Peng Wang <penwang@nvidia.com>
|
| #
4e58db63
|
| 31-Dec-2020 |
Hong Zhang <hongzhang@anl.gov> |
Make slice height more flexible
- The slice height no longer has to match the device memory alignment; it just needs to be divisible by DEVICE_MEM_ALIGN
- Pad each slice with extra columns to achieve coalesced memory access if needed
|
| #
07e43b41
|
| 10-Sep-2020 |
Hong Zhang <hongzhang@anl.gov> |
Further optimization of MatMult_SeqSELLCUDA
- Add more kernels
- Use multiple threads per row for matrices with narrow slices
- Use multiple blocks per slice for matrices with wide slices
- Add three new APIs to return the irregularity ratio, the maximum slice width and the average slice width
Experiments show that column blocking gives much worse performance for wide matrices, and that permutation based on slice width has almost no impact on performance.
|
| #
2d1451d4
|
| 09-Jan-2020 |
Hong Zhang <hongzhang@anl.gov> |
Initial commit for porting SELL to GPU
- Add tiled SpMV and basic SpMV for SeqSELL
- Tested in serial
- Offloadmask is used to determine when the matrix should be copied to GPU
- Use different slice height for CUDA version
- By checking the nonzerostate, PETSc can decide whether the whole matrix needs to be copied or just the values need to be copied
- Make the convert function public so that the very slow MatConvert_Basic can be avoided sometimes. E.g. one can use a two-step convert method: AIJ->SELL, SELL->SELLCUDA instead of the direct convert AIJ->SELLCUDA
- Make the FLOPS count for SELL the same as that for AIJCUSPARSE.
- MatDisAssemble is not needed.
- Change slice height from 32 to 16 for GPU
- To overlap communication with MatMult, VecScatterBegin() should be called before MatMult() for the diagonal part.
- SLICE_HEIGHT is defined to be 32 to match the warp size of GPU. For other cases, it is still 8.
Funded-by: Project: PETSc for GPU Time: 42 hours Reported-by: Thanks-to:
|
| #
80f6d96d
|
| 01-Apr-2023 |
Satish Balay <balay@mcs.anl.gov> |
Merge remote-tracking branch 'origin/release'
|
| #
08eaad2d
|
| 01-Apr-2023 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'jolivet/fix-typos-portability' into 'release'
Fix typos, portability issues, segmentation fault
See merge request petsc/petsc!6267
|
| #
aaa8cc7d
|
| 31-Mar-2023 |
Pierre Jolivet <pierre@joliv.et> |
Fix some documentation and typos
|
| #
e9f36840
|
| 18-Mar-2023 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'barry/2023-03-08/fix-man-pages-detected-by-lint' into 'main'
Fix many manual pages
See merge request petsc/petsc!6162
|
| #
20f4b53c
|
| 09-Mar-2023 |
Barry Smith <bsmith@mcs.anl.gov> |
Fix manual pages based on reports from Jacob's lint tool
Commit-type: documentation
|
| #
6c749b74
|
| 07-Mar-2023 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'barry/2023-03-01/fix-mat-man-pages' into 'main'
Cleanup of mat manual pages
See merge request petsc/petsc!6134
|
| #
2ef1f0ff
|
| 01-Mar-2023 |
Barry Smith <bsmith@mcs.anl.gov> |
Cleanup of mat manual pages
Commit-type: documentation
|
| #
7a3a620f
|
| 24-Feb-2023 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'jolivet/housekeeping' into 'main'
Double spaces, wrong backticks, or unneeded braces
See merge request petsc/petsc!6110
|
| #
aa624791
|
| 24-Feb-2023 |
Pierre Jolivet <pierre@joliv.et> |
Double spaces, wrong backticks, or unneeded braces
|
| #
a682ec2a
|
| 23-Feb-2023 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'barry/2023-02-17/man-page-fixes-huge-jacob-automatic' into 'main'
A variety of manual page fixes for problems found by Jacob's lint or noted...
See merge request petsc/petsc!6088
|
| #
27430b45
|
| 23-Feb-2023 |
Barry Smith <bsmith@mcs.anl.gov> |
A variety of manual page fixes for problems found by Jacob's lint or noted while fixing those problems
Commit-type: docs-only
|
| #
2975ceb4
|
| 13-Feb-2023 |
Satish Balay <balay@mcs.anl.gov> |
Merge remote-tracking branch 'origin/release'
|
| #
0f3c9fe5
|
| 13-Feb-2023 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'barry/2023-02-07/fix-man-pages/release' into 'release'
Fix a few manual pages using Jacob's make lint information
See merge request petsc/petsc!6029
|
| #
67be906f
|
| 07-Feb-2023 |
Barry Smith <bsmith@mcs.anl.gov> |
Fix a few manual pages using Jacob's make lint information
Commit-type: documentation
|
| #
37d05b02
|
| 06-Feb-2023 |
Satish Balay <balay@mcs.anl.gov> |
Merge remote-tracking branch 'origin/release'
|
| #
b877537e
|
| 05-Feb-2023 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'jolivet/fix-typos' into 'release'
Fix Typos
See merge request petsc/petsc!6024
|
| #
da81f932
|
| 05-Feb-2023 |
Pierre Jolivet <pierre@joliv.et> |
Fix Typos
|
| #
31d78bcd
|
| 02-Feb-2023 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'jacobf/2022-12-10/petscerrorcode-nodiscard' into 'main'
Feature: Non-discardable PetscErrorCode
See merge request petsc/petsc!5923
|
| #
3ba16761
|
| 10-Dec-2022 |
Jacob Faibussowitsch <jacob.fai@gmail.com> |
Make PetscErrorCode a non-discardable enum
|
| #
d441b7a2
|
| 12-Nov-2022 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'hongzh/improve-fd-coloring' into 'main'
Add MatEliminateZeros
See merge request petsc/petsc!5816
|
| #
dec0b466
|
| 07-Nov-2022 |
Hong Zhang <hongzhang@anl.gov> |
Add MatEliminateZeros
|