| #
4d55d066
|
| 24-Feb-2020 |
Junchao Zhang <jczhang@mcs.anl.gov> |
Delete out-of-date comments and do better overlap
|
| #
f6516afe
|
| 03-Feb-2020 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'rmills/bindtocpu-not-pintocpu' into 'master'
Changed XXXPinToCPU() to XXXBindToCPU() to prevent confusion.
See merge request petsc/petsc!2477
|
| #
b470e4b4
|
| 03-Feb-2020 |
Richard Tran Mills <rmills@rmills.org> |
Changed XXXPinToCPU() to XXXBindToCPU() to prevent confusion.
The reason for this change is that we already use the terminology "pinned" to refer to memory that is non-pageable, in the context of PetscSF as well as allocating host memory when GPUs are being employed.
|
| #
f3e33b7c
|
| 30-Oct-2019 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'jczhang/feature-cuda-error-string' into 'master'
Map a cuda error code to its name and description
See merge request petsc/petsc!2228
|
| #
57d48284
|
| 30-Oct-2019 |
Junchao Zhang <jczhang@mcs.anl.gov> |
Map a cuda error code to its name and description
|
| #
040e670d
|
| 25-Sep-2019 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'karlrupp/fix-cuda-streams' into 'master'
GPU: Fixed incorrect use of CUDA streams, SNES ex19 and ex56 now working with CUDA
See merge request petsc/petsc!2091
|
| #
17403302
|
| 24-Sep-2019 |
Karl Rupp <me@karlrupp.net> |
CUDA: Fixed incorrect use of separate streams.
This solves synchronization problems caused by the incorrect use of multiple CUDA streams for vector and matrix operations without proper synchronization mechanisms. The default stream (NULL pointer) is now used for all CUDA operations instead. In particular, SNES ex19 and ex56 now run reliably (no failure after 20+ reruns). I don't have performance comparisons at hand for this commit, but expect any changes to be small. Correctness first :-)
|
| #
8da4f93b
|
| 23-Sep-2019 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'stefanozampini/gpu-bddc' into 'master'
Improvements towards BDDC on GPUs
See merge request petsc/petsc!2067
|
| #
99acd6aa
|
| 22-Sep-2019 |
Stefano Zampini <stefano.zampini@gmail.com> |
Fix compilation error for nvcc in optimized code with AVX-512 (march=native on my GPU workstation)
For some reason, the host compiler fails with this error message: /home/zampins/Devel/petsc/include/../src/mat/impls/aij/seq/aij.h(535): error: identifier "_mm512_reduce_add_pd" is undefined
This optimized C kernel is not used in the GPU classes, so it is safe to skip its declaration
|
| #
29ad97fd
|
| 07-Aug-2019 |
Karl Rupp <me@karlrupp.net> |
Merge branch 'dalcinl/feature-math' [PR #1904]
* dalcinl/feature-math:
Math & PetscComplex: Various enhancements
- Define PetscXXXScalar to PetscXXXReal for real scalar type
- Add PetscCbrtReal(), PetscHypotReal(), and PetscAtan2Real()
- Add PetscArgComplex() and PetscArgScalar()
- Add PetscAtan{Real|Complex|Scalar}()
- Add PetscA{sin|cos|tan}h{Real|Complex|Scalar}()
- Docs: Petsc{Real|Imaginary}Part() return PetscReal
- Define __fp16 constants to use "F" suffix (ie. single precision)
- Fix PETSC_[SQRT_]MACHINE_EPSILON values for __fp16
PetscComplex: Remove PETSC_USE_CXX_COMPLEX_FLOAT_WORKAROUND
- Move the C++ complex fixes to its own header file
- Define PETSC_SKIP_CXX_COMPLEX_FIX to skip the C++ complex fixes
|
| #
7afe75c1
|
| 06-Aug-2019 |
Karl Rupp <me@karlrupp.net> |
Merge branch 'karlrupp/fix-cuda-MatSeqAIJCUSPARSEGenerateTransposeForMult' [PR #1948]
* karlrupp/fix-cuda-MatSeqAIJCUSPARSEGenerateTransposeForMult: CUDA: Fixed issues in MatSeqAIJCUSPARSEGenerateTransposeForMult and MatMultTransposeAdd_SeqAIJCUSPARSE
|
| #
a3fdcf43
|
| 05-Aug-2019 |
Karl Rupp <me@karlrupp.net> |
CUDA: Fixed issues in MatSeqAIJCUSPARSEGenerateTransposeForMult and MatMultTransposeAdd_SeqAIJCUSPARSE
This is a cherry-pick of commits dde4751, 435e334, 1d884b8, 4e32a5a
Thanks-to: Mark Adams <ma2325@columbia.edu>
|
| #
53800007
|
| 05-Aug-2019 |
Karl Rupp <me@karlrupp.net> |
CUDA: Skipping CXX complex fix.
Should fix warnings obtained with newer math functions.
This fix should be obsolete once the wrapper for GPU functionality is in place.
|
| #
b6a92dca
|
| 26-Jun-2019 |
BarryFSmith <bsmith@mcs.anl.gov> |
Merged in barry/cuda-multigrid-test (pull request #1763)
Various improvements for GPUs (mostly for performance and CUDA)
|
| #
fdc842d1
|
| 31-May-2019 |
Barry Smith <bsmith@mcs.anl.gov> |
Various improvements for GPUs (mostly for performance and CUDA)
1) Add VecPinToCPU() for CUDA vector and matrices
2) Move initialization of cuBLAS to PetscInitialize() since it takes 1/2 second and distorts timing with -log_view
3) Add logging for DMCreateMatrix (for large meshes this is very large)
4) Add VecGet/RestoreArrayWrite() to prevent unneeded copies from GPU (only implemented so far for CUDA); added a small number of usages in the source so that snes tutorials ex19 does not do unneeded communication from the GPU
5) Automatically convert MAIJ matrices to AIJ for CUDA since they are not yet supported natively in PETSc's CUDA matrix implementation
6) Pinned objects should still use the CUDA/ViennaCL versions of Destroy to clean up the GPU stuff
Commit-type: feature
|
| #
613bfe33
|
| 02-Jun-2019 |
BarryFSmith <bsmith@mcs.anl.gov> |
Merged in barry/update-collective-on (pull request #1744)
Update the use of Collective on in the manual pages to reflect the new style
|
| #
d083f849
|
| 01-Jun-2019 |
Barry Smith <bsmith@mcs.anl.gov> |
Update the use of Collective on in the manual pages to reflect the new style
Commit-type: style-fix, documentation
Thanks-to: Patrick Sanan <patrick.sanan@gmail.com>
|
| #
a041468a
|
| 06-Mar-2019 |
Lawrence Mitchell <lawrence@wence.uk> |
Merge branch 'master' into wence/feature-patch-all-at-once
|
| #
8b2e997c
|
| 24-Feb-2019 |
Karl Rupp <me@karlrupp.net> |
Merge branch 'jczhang/fix-vecscatter-cuda/maint' into maint [PR #1388]
* jczhang/fix-vecscatter-cuda/maint: CUDA vecscatter needs to take care of the ScatterMode argument
|
| #
29302ad0
|
| 24-Feb-2019 |
Karl Rupp <me@karlrupp.net> |
Merge branch 'jczhang/fix-vecscatter-cuda/maint' [PR #1388]
* jczhang/fix-vecscatter-cuda/maint: CUDA vecscatter needs to take care of the ScatterMode argument
|
| #
a5873c6d
|
| 24-Feb-2019 |
Karl Rupp <me@karlrupp.net> |
Merge branch 'jczhang/restore-error-check' [PR #1392]
* jczhang/restore-error-check: Restore an error checking line in MatMultTranspose_MPIAIJCUSPARSE
|
| #
ccf5f80b
|
| 21-Feb-2019 |
Junchao Zhang <jczhang@mcs.anl.gov> |
Restore the error checking code
|
| #
959dcdf5
|
| 19-Feb-2019 |
Junchao Zhang <jczhang@mcs.anl.gov> |
Add a ScatterMode arg in cuda vecscat to select to/from context
The old VecScatterInitializeForGPU() code initializes the pointer (PetscCUDAIndices*)&inctx->spptr based on an input ScatterMode before VecScatterBegin() is called.
If a vecscatter context is first used for a SCATTER_FORWARD and then for a SCATTER_REVERSE, there will be an error, because the second VecScatter uses an out-of-date (PetscCUDAIndices*)&inctx->spptr.
The solution: do not prematurely consider ScatterMode when building (PetscCUDAIndices*)&inctx->spptr. Instead, defer selecting the correct to/from context until VecScatterBegin() is called.
|
| #
a5a49157
|
| 25-Oct-2018 |
Joseph Pusztay <josephpusztay@Josephs-MacBook-Pro.local> |
Merge branch 'master' into jpusztay/feature-swarm-symplectic-example
|
| #
e901d7f7
|
| 25-Oct-2018 |
Joseph Pusztay <josephpusztay@Josephs-MacBook-Pro.local> |
Merge branch 'master' into jpustay/feature-swarm-example
|