| #
57d48284
|
| 30-Oct-2019 |
Junchao Zhang <jczhang@mcs.anl.gov> |
Map a cuda error code to its name and description
|
| #
81e64d77
|
| 06-Oct-2019 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'maint'
|
| #
6881a170
|
| 06-Oct-2019 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'jczhang/fix-valid-gpu-array' into maint
Rename: v->valid_GPU_array/matrix==> v->offloadmask and PetscOffloadFlag==>PetscOffloadMask
See merge request petsc/petsc!2141
|
| #
c70f7ee4
|
| 02-Oct-2019 |
Junchao Zhang <jczhang@mcs.anl.gov> |
Rename valid_GPU_array/matrix to offloadmask
|
| #
8311d4d7
|
| 29-Sep-2019 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'mark/fix-adhoc-cuda-bug' into 'master'
ad hoc fix for cuda bug in mat-transpose-mult
See merge request petsc/petsc!2117
|
| #
f6ae8131
|
| 29-Sep-2019 |
Mark <cal2princeton@yahoo.com> |
ad hoc fix for cuda bug in mat-transpose-mult
|
| #
040e670d
|
| 25-Sep-2019 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'karlrupp/fix-cuda-streams' into 'master'
GPU: Fixed incorrect use of CUDA streams, SNES ex19 and ex56 now working with CUDA
See merge request petsc/petsc!2091
|
| #
17403302
|
| 24-Sep-2019 |
Karl Rupp <me@karlrupp.net> |
CUDA: Fixed incorrect use of separate streams.
This solves synchronization problems that have arisen due to the incorrect use of multiple CUDA streams for vector and matrix operations (without using proper synchronization mechanisms). In particular, SNES ex19 and ex56 now run reliably (no failure after 20+ reruns). Instead, the default stream (NULL pointer) is now used for all CUDA operations. I don't have performance comparisons at hand for the performance implications in this commit, but expect any changes to be small. Correctness first :-)
|
| #
8da4f93b
|
| 23-Sep-2019 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'stefanozampini/gpu-bddc' into 'master'
Improvements towards BDDC on GPUs
See merge request petsc/petsc!2067
|
| #
99acd6aa
|
| 22-Sep-2019 |
Stefano Zampini <stefano.zampini@gmail.com> |
Fix compilation error for nvcc in optimized code with AVX-512 (march=native on my GPU workstation)
for some reason, the host compiler fails with this error message /home/zampins/Devel/petsc/include/../src/mat/impls/aij/seq/aij.h(535): error: identifier "_mm512_reduce_add_pd" is undefined
This optimized C kernel is not used in the GPU classes, so it is safe to skip its declaration
|
| #
4e4bbfaa
|
| 16-Sep-2019 |
Stefano Zampini <stefano.zampini@gmail.com> |
MATSEQAJIJCUSPARSE: multiple fixes
- Use MatMatSolve_Basic for multiple solves instead of the CPU version (need to write support for multiple solves from cusparse)
- when using Cholesky, column permutation must be inverted
- remove unneeded extra copy in MatSolve_
- the class does not implement PinToCPU, so remove usage of the flag
|
| #
29ad97fd
|
| 07-Aug-2019 |
Karl Rupp <me@karlrupp.net> |
Merge branch 'dalcinl/feature-math' [PR #1904]
* dalcinl/feature-math:
Math & PetscComplex: Various enhancements
- Define PetscXXXScalar to PetscXXXReal for real scalar type
- Add PetscCbrtReal(), PetscHypotReal(), and PetscAtan2Real()
- Add PetscArgComplex() and PetscArgScalar()
- Add PetscAtan{Real|Complex|Scalar}()
- Add PetscA{sin|cos|tan}h{Real|Complex|Scalar}()
- Docs: Petsc{Real|Imaginary}Part() return PetscReal
- Define __fp16 constants to use "F" suffix (ie. single precision)
- Fix PETSC_[SQRT_]MACHINE_EPSILON values for __fp16
PetscComplex: Remove PETSC_USE_CXX_COMPLEX_FLOAT_WORKAROUND
- Move the C++ complex fixes to its own header file
- Define PETSC_SKIP_CXX_COMPLEX_FIX to skip the C++ complex fixes
|
| #
7afe75c1
|
| 06-Aug-2019 |
Karl Rupp <me@karlrupp.net> |
Merge branch 'karlrupp/fix-cuda-MatSeqAIJCUSPARSEGenerateTransposeForMult' [PR #1948]
* karlrupp/fix-cuda-MatSeqAIJCUSPARSEGenerateTransposeForMult: CUDA: Fixed issues in MatSeqAIJCUSPARSEGenerateTransposeForMult and MatMultTransposeAdd_SeqAIJCUSPARSE
|
| #
a3fdcf43
|
| 05-Aug-2019 |
Karl Rupp <me@karlrupp.net> |
CUDA: Fixed issues in MatSeqAIJCUSPARSEGenerateTransposeForMult and MatMultTransposeAdd_SeqAIJCUSPARSE
This is a cherry-pick of commits dde4751, 435e334, 1d884b8, 4e32a5a
Thanks-to: Mark Adams <ma2325@columbia.edu>
|
| #
53800007
|
| 05-Aug-2019 |
Karl Rupp <me@karlrupp.net> |
CUDA: Skipping CXX complex fix.
Should fix warnings obtained with newer math functions.
This fix should be obsolete once the wrapper for GPU functionality is in place.
|
| #
082a2362
|
| 03-Aug-2019 |
Karl Rupp <me@karlrupp.net> |
Merge branch 'karlrupp/fix-cuda-empty-procs' [PR #1938]
* karlrupp/fix-cuda-empty-procs: CUDA: added guards for empty process triangular solves
|
| #
504af54e
|
| 03-Aug-2019 |
Karl Rupp <me@karlrupp.net> |
Merge branch 'karlrupp/fix-cuda-vecset' [PR #1937]
* karlrupp/fix-cuda-vecset: Remove unneeded collective VecSet from a MatMultTranspose_SeqAIJCUSPARSE
|
| #
2cff351a
|
| 03-Aug-2019 |
Karl Rupp <me@karlrupp.net> |
Merge branch 'karlrupp/fix-cusparse-transpose-numrows' [PR #1936]
* karlrupp/fix-cusparse-transpose-numrow: Fixes an incorrect dimension when transposing a CUSPARSE matrix.
|
| #
cf00fe3b
|
| 02-Aug-2019 |
Karl Rupp <me@karlrupp.net> |
CUDA: added guards for empty process triangular solves
Cherry-pick of c9cf7f9
Thanks-to: Mark Adams <ma2325@columbia.edu>
|
| #
2b551a2f
|
| 03-Jul-2019 |
Mark Adams <ma2325@columbia.edu> |
use non-collective VecSet
|
| #
a8bd5306
|
| 09-Jul-2019 |
Mark Adams <ma2325@columbia.edu> |
fixed bug
|
| #
eef58048
|
| 02-Aug-2019 |
Karl Rupp <me@karlrupp.net> |
Merge branch 'hannah/gpu-logging-WaitForGPU' [PR #1927]
* hannah/gpu-logging-WaitForGPU: Adding WaitForGPU() to GPU time
This branch rearranges GPU timers so that calls to WaitForGPU() are counted towards time spent on the GPU.
|
| #
661c2d29
|
| 31-Jul-2019 |
hannah_mairs <hannah.mairs@gmail.com> |
Adding WaitForGPU() to GPU time
|
| #
6b804ed2
|
| 30-Jul-2019 |
Karl Rupp <me@karlrupp.net> |
Merge branch 'stefano_zampini/GPU-matdensecuda' [PR #1911]
* stefano_zampini/GPU-matdensecuda: GPU: Initial implementation for SeqDense class on GPUs.
|
| #
f20a8c50
|
| 26-Jul-2019 |
Karl Rupp <me@karlrupp.net> |
Merge branch 'karlrupp/remove-ancient-cuda-version-checks' [PR #1909]
* karlrupp/remove-ancient-cuda-version-checks: Remove checks for ancient CUDA versions
|