History log of /petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu (Results 376 – 400 of 685)
Revision Date Author Comments
# 57d48284 30-Oct-2019 Junchao Zhang <jczhang@mcs.anl.gov>

Map a cuda error code to its name and description


# 81e64d77 06-Oct-2019 Satish Balay <balay@mcs.anl.gov>

Merge branch 'maint'


# 6881a170 06-Oct-2019 Satish Balay <balay@mcs.anl.gov>

Merge branch 'jczhang/fix-valid-gpu-array' into maint

Rename: v->valid_GPU_array/matrix==> v->offloadmask and PetscOffloadFlag==>PetscOffloadMask

See merge request petsc/petsc!2141


# c70f7ee4 02-Oct-2019 Junchao Zhang <jczhang@mcs.anl.gov>

Rename valid_GPU_array/matrix to offloadmask


# 8311d4d7 29-Sep-2019 Satish Balay <balay@mcs.anl.gov>

Merge branch 'mark/fix-adhoc-cuda-bug' into 'master'

ad hoc fix for cuda bug in mat-transpose-mult

See merge request petsc/petsc!2117


# f6ae8131 29-Sep-2019 Mark <cal2princeton@yahoo.com>

ad hoc fix for cuda bug in mat-transpose-mult


# 040e670d 25-Sep-2019 Satish Balay <balay@mcs.anl.gov>

Merge branch 'karlrupp/fix-cuda-streams' into 'master'

GPU: Fixed incorrect use of CUDA streams, SNES ex19 and ex56 now working with CUDA

See merge request petsc/petsc!2091


# 17403302 24-Sep-2019 Karl Rupp <me@karlrupp.net>

CUDA: Fixed incorrect use of separate streams.

This solves synchronization problems that have arisen due to the incorrect use of multiple CUDA streams for vector and matrix operations (without using proper synchronization mechanisms).
In particular, SNES ex19 and ex56 now run reliably (no failure after 20+ reruns).
Instead, the default stream (NULL pointer) is now used for all CUDA operations.
I don't have performance comparisons at hand for the performance implications in this commit, but expect any changes to be small.
Correctness first :-)


# 8da4f93b 23-Sep-2019 Satish Balay <balay@mcs.anl.gov>

Merge branch 'stefanozampini/gpu-bddc' into 'master'

Improvements towards BDDC on GPUs

See merge request petsc/petsc!2067


# 99acd6aa 22-Sep-2019 Stefano Zampini <stefano.zampini@gmail.com>

Fix compilation error for nvcc in optimized code with AVX-512 (march=native on my GPU workstation)

for some reason, the host compiler fails with this error message
/home/zampins/Devel/petsc/include/../src/mat/impls/aij/seq/aij.h(535): error: identifier "_mm512_reduce_add_pd" is undefined

This optimized C kernel is not used in the GPU classes, so it is safe to skip its declaration


# 4e4bbfaa 16-Sep-2019 Stefano Zampini <stefano.zampini@gmail.com>

MATSEQAIJCUSPARSE: multiple fixes

- Use MatMatSolve_Basic for multiple solves instead of the CPU version (need to write support for multiple solves from cusparse)
- when using Cholesky, column permutation must be inverted
- remove unneeded extra copy in MatSolve_
- the class does not implement PinToCPU, so remove usage of the flag


# 29ad97fd 07-Aug-2019 Karl Rupp <me@karlrupp.net>

Merge branch 'dalcinl/feature-math' [PR #1904]

* dalcinl/feature-math:
Math & PetscComplex: Various enhancements
- Define PetscXXXScalar to PetscXXXReal for real scalar type
- Add PetscCbrtReal(), PetscHypotReal(), and PetscAtan2Real()
- Add PetscArgComplex() and PetscArgScalar()
- Add PetscAtan{Real|Complex|Scalar}()
- Add PetscA{sin|cos|tan}h{Real|Complex|Scalar}()
- Docs: Petsc{Real|Imaginary}Part() return PetscReal
- Define __fp16 constants to use "F" suffix (ie. single precision)
- Fix PETSC_[SQRT_]MACHINE_EPSILON values for __fp16

PetscComplex: Remove PETSC_USE_CXX_COMPLEX_FLOAT_WORKAROUND

- Move the C++ complex fixes to its own header file
- Define PETSC_SKIP_CXX_COMPLEX_FIX to skip the C++ complex fixes


# 7afe75c1 06-Aug-2019 Karl Rupp <me@karlrupp.net>

Merge branch 'karlrupp/fix-cuda-MatSeqAIJCUSPARSEGenerateTransposeForMult' [PR #1948]

* karlrupp/fix-cuda-MatSeqAIJCUSPARSEGenerateTransposeForMult:
CUDA: Fixed issues in MatSeqAIJCUSPARSEGenerateTransposeForMult and MatMultTransposeAdd_SeqAIJCUSPARSE


# a3fdcf43 05-Aug-2019 Karl Rupp <me@karlrupp.net>

CUDA: Fixed issues in MatSeqAIJCUSPARSEGenerateTransposeForMult and MatMultTransposeAdd_SeqAIJCUSPARSE

This is a cherry-pick of commits dde4751, 435e334, 1d884b8, 4e32a5a
Thanks-to: Mark Adams <ma2325@columbia.edu>


# 53800007 05-Aug-2019 Karl Rupp <me@karlrupp.net>

CUDA: Skipping CXX complex fix.

Should fix warnings obtained with newer math functions.

This fix should be obsolete once the wrapper for GPU functionality is in place.


# 082a2362 03-Aug-2019 Karl Rupp <me@karlrupp.net>

Merge branch 'karlrupp/fix-cuda-empty-procs' [PR #1938]

* karlrupp/fix-cuda-empty-procs:
CUDA: added guards for empty process triangular solves


# 504af54e 03-Aug-2019 Karl Rupp <me@karlrupp.net>

Merge branch 'karlrupp/fix-cuda-vecset' [PR #1937]

* karlrupp/fix-cuda-vecset:
Remove unneeded collective VecSet from a MatMultTranspose_SeqAIJCUSPARSE


# 2cff351a 03-Aug-2019 Karl Rupp <me@karlrupp.net>

Merge branch 'karlrupp/fix-cusparse-transpose-numrows' [PR #1936]

* karlrupp/fix-cusparse-transpose-numrow:
Fixes an incorrect dimension when transposing a CUSPARSE matrix.


# cf00fe3b 02-Aug-2019 Karl Rupp <me@karlrupp.net>

CUDA: added guards for empty process triangular solves

Cherry-pick of c9cf7f9
Thanks-to: Mark Adams <ma2325@columbia.edu>


# 2b551a2f 03-Jul-2019 Mark Adams <ma2325@columbia.edu>

use non-collective VecSet


# a8bd5306 09-Jul-2019 Mark Adams <ma2325@columbia.edu>

fixed bug


# eef58048 02-Aug-2019 Karl Rupp <me@karlrupp.net>

Merge branch 'hannah/gpu-logging-WaitForGPU' [PR #1927]

* hannah/gpu-logging-WaitForGPU:
Adding WaitForGPU() to GPU time
This branch rearranges GPU timers so that calls to WaitForGPU() are counted towards time spent on the GPU.


# 661c2d29 31-Jul-2019 hannah_mairs <hannah.mairs@gmail.com>

Adding WaitForGPU() to GPU time


# 6b804ed2 30-Jul-2019 Karl Rupp <me@karlrupp.net>

Merge branch 'stefano_zampini/GPU-matdensecuda' [PR #1911]

* stefano_zampini/GPU-matdensecuda:
GPU: Initial implementation for SeqDense class on GPUs.


# f20a8c50 26-Jul-2019 Karl Rupp <me@karlrupp.net>

Merge branch 'karlrupp/remove-ancient-cuda-version-checks' [PR #1909]

* karlrupp/remove-ancient-cuda-version-checks:
Remove checks for ancient CUDA versions

