aijcusparse.cu - OpenGrok history log for /petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu

Revision	Date	Author	Comments
# 57d48284	30-Oct-2019	Junchao Zhang <jczhang@mcs.anl.gov>	Map a cuda error code to its name and description
# 81e64d77	06-Oct-2019	Satish Balay <balay@mcs.anl.gov>	Merge branch 'maint'
# 6881a170	06-Oct-2019	Satish Balay <balay@mcs.anl.gov>	Merge branch 'jczhang/fix-valid-gpu-array' into maint Rename: v->valid_GPU_array/matrix==> v->offloadmask and PetscOffloadFlag==>PetscOffloadMask See merge request petsc/petsc!2141
# c70f7ee4	02-Oct-2019	Junchao Zhang <jczhang@mcs.anl.gov>	Rename valid_GPU_array/matrix to offloadmask
# 8311d4d7	29-Sep-2019	Satish Balay <balay@mcs.anl.gov>	Merge branch 'mark/fix-adhoc-cuda-bug' into 'master' ad hoc fix for cuda bug in mat-transpose-mult See merge request petsc/petsc!2117
# f6ae8131	29-Sep-2019	Mark <cal2princeton@yahoo.com>	ad hoc fix for cuda bug in mat-transpose-mult
# 040e670d	25-Sep-2019	Satish Balay <balay@mcs.anl.gov>	Merge branch 'karlrupp/fix-cuda-streams' into 'master' GPU: Fixed incorrect use of CUDA streams, SNES ex19 and ex56 now working with CUDA See merge request petsc/petsc!2091
# 17403302	24-Sep-2019	Karl Rupp <me@karlrupp.net>	CUDA: Fixed incorrect use of separate streams. This solves synchronization problems that have arisen due to the incorrect use of multiple CUDA streams for vector and matrix operations (without using proper synchronization mechanisms). In particular, SNES ex19 and ex56 now run reliably (no failure after 20+ reruns). Instead, the default stream (NULL pointer) is now used for all CUDA operations. I don't have performance comparisons at hand for the performance implications in this commit, but expect any changes to be small. Correctness first :-)
# 8da4f93b	23-Sep-2019	Satish Balay <balay@mcs.anl.gov>	Merge branch 'stefanozampini/gpu-bddc' into 'master' Improvements towards BDDC on GPUs See merge request petsc/petsc!2067
# 99acd6aa	22-Sep-2019	Stefano Zampini <stefano.zampini@gmail.com>	Fix compilation error for nvcc in optimized code with AVX-512 (march=native on my GPU workstation) for some reason, the host compiler fails with this error message /home/zampins/Devel/petsc/include/../src/mat/impls/aij/seq/aij.h(535): error: identifier "_mm512_reduce_add_pd" is undefined This optimized C kernel is not used in the GPU classes, so it is safe to skip its declaration
# 4e4bbfaa	16-Sep-2019	Stefano Zampini <stefano.zampini@gmail.com>	MATSEQAJIJCUSPARSE: multiple fixes - Use MatMatSolve_Basic for multiple solves instead of the CPU version (need to write support for multiple solves from cusparse) - when using Cholesky, column permutation must be inverted - remove unneeded extra copy in MatSolve_ - the class does not implement PinToCPU, so remove usage of the flag
# 29ad97fd	07-Aug-2019	Karl Rupp <me@karlrupp.net>	Merge branch 'dalcinl/feature-math' [PR #1904] * dalcinl/feature-math: Math & PetscComplex: Various enhancements - Define PetscXXXScalar to PetscXXXReal for real scalar type - Add PetscCbrtReal(), PetscHypotReal(), and PetscAtan2Real() - Add PetscArgComplex() and PetscArgScalar() - Add PetscAtan{Real|Complex|Scalar}() - Add PetscA{sin|cos|tan}h{Real|Complex|Scalar}() - Docs: Petsc{Real|Imaginary}Part() return PetscReal - Define __fp16 constants to use "F" suffix (ie. single precision) - Fix PETSC_[SQRT_]MACHINE_EPSILON values for __fp16 PetscComplex: Remove PETSC_USE_CXX_COMPLEX_FLOAT_WORKAROUND - Move the C++ complex fixes to its own header file - Define PETSC_SKIP_CXX_COMPLEX_FIX to skip the C++ complex fixes
# 7afe75c1	06-Aug-2019	Karl Rupp <me@karlrupp.net>	Merge branch 'karlrupp/fix-cuda-MatSeqAIJCUSPARSEGenerateTransposeForMult' [PR #1948] * karlrupp/fix-cuda-MatSeqAIJCUSPARSEGenerateTransposeForMult: CUDA: Fixed issues in MatSeqAIJCUSPARSEGenerateTransposeForMult and MatMultTransposeAdd_SeqAIJCUSPARSE
# a3fdcf43	05-Aug-2019	Karl Rupp <me@karlrupp.net>	CUDA: Fixed issues in MatSeqAIJCUSPARSEGenerateTransposeForMult and MatMultTransposeAdd_SeqAIJCUSPARSE This is a cherry-pick of commits dde4751, 435e334, 1d884b8, 4e32a5a Thanks-to: Mark Adams <ma2325@columbia.edu>
# 53800007	05-Aug-2019	Karl Rupp <me@karlrupp.net>	CUDA: Skipping CXX complex fix. Should fix warnings obtained with newer math functions. This fix should be obsolete once the wrapper for GPU functionality is in place.
# 082a2362	03-Aug-2019	Karl Rupp <me@karlrupp.net>	Merge branch 'karlrupp/fix-cuda-empty-procs' [PR #1938] * karlrupp/fix-cuda-empty-procs: CUDA: added guards for empty process triangular solves
# 504af54e	03-Aug-2019	Karl Rupp <me@karlrupp.net>	Merge branch 'karlrupp/fix-cuda-vecset' [PR #1937] * karlrupp/fix-cuda-vecset: Remove unneeded collective VecSet from a MatMultTranspose_SeqAIJCUSPARSE
# 2cff351a	03-Aug-2019	Karl Rupp <me@karlrupp.net>	Merge branch 'karlrupp/fix-cusparse-transpose-numrows' [PR #1936] * karlrupp/fix-cusparse-transpose-numrow: Fixes an incorrect dimension when transposing a CUSPARSE matrix.
# cf00fe3b	02-Aug-2019	Karl Rupp <me@karlrupp.net>	CUDA: added guards for empty process triangular solves Cherry-pick of c9cf7f9 Thanks-to: Mark Adams <ma2325@columbia.edu>
# 2b551a2f	03-Jul-2019	Mark Adams <ma2325@columbia.edu>	use non-collective VecSet
# a8bd5306	09-Jul-2019	Mark Adams <ma2325@columbia.edu>	fixed bug
# eef58048	02-Aug-2019	Karl Rupp <me@karlrupp.net>	Merge branch 'hannah/gpu-logging-WaitForGPU' [PR #1927] * hannah/gpu-logging-WaitForGPU: Adding WaitForGPU() to GPU time This branch rearranges GPU timers so that calls to WaitForGPU() are counted towards time spend on the GPU.
# 661c2d29	31-Jul-2019	hannah_mairs <hannah.mairs@gmail.com>	Adding WaitForGPU() to GPU time
# 6b804ed2	30-Jul-2019	Karl Rupp <me@karlrupp.net>	Merge branch 'stefano_zampini/GPU-matdensecuda' [PR #1911] * stefano_zampini/GPU-matdensecuda: GPU: Initial implementation for SeqDense class on GPUs.
# f20a8c50	26-Jul-2019	Karl Rupp <me@karlrupp.net>	Merge branch 'karlrupp/remove-ancient-cuda-version-checks' [PR #1909] * karlrupp/remove-ancient-cuda-version-checks: Remove checks for ancient CUDA versions