| aa406ff9 | 24-Nov-2020 |
Junchao Zhang <jczhang@mcs.anl.gov> |
Remove vecnode |
| 54f467a8 | 23-Nov-2020 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'jose/release/pgi-20.9-warnings' into 'release'
Fix warnings in NVIDIA compilers (formerly PGI)
See merge request petsc/petsc!3397 |
| 4d2eff18 | 19-Nov-2020 |
Satish Balay <balay@mcs.anl.gov> |
doc: fix typo
Reported-by: Massimiliano Leoni <leoni.massimiliano1@gmail.com> |
| 3685966c | 12-Nov-2020 |
Satish Balay <balay@mcs.anl.gov> |
Merge remote-tracking branch 'origin/release' into master |
| ec4bef21 | 05-Nov-2020 |
Jose E. Roman <jroman@dsic.upv.es> |
Fix warnings in NVIDIA compilers (formerly PGI): variable was never used |
| b458e8f1 | 05-Nov-2020 |
Jose E. Roman <jroman@dsic.upv.es> |
Fix warnings in NVIDIA compilers (formerly PGI): unreachable statement |
| 036c5622 | 11-Nov-2020 |
Barry Smith <bsmith@mcs.anl.gov> |
VecMDot_SeqCUDA combine all the memory copies from GPU to CPU into a single copy
The time for VecMDot_SeqCUDA in snes/tutorials/ex19 -da_refine 7 -dm_mat_type aijcusparse -dm_vec_type cuda -pc_type none -log_view dropped by 9% on the UTK xSDK machine
The GPU timings for VecMDot_SeqCUDA are now just for the GPU computation and no longer include the copy to the CPU or CPU computations
Log scalar copies between CPU and GPU for CUDA vector operations
Remove a couple unneeded WaitForGPU()
/spend 2h
|
| 45010448 | 11-Nov-2020 |
Barry Smith <bsmith@mcs.anl.gov> |
Fix timings for VecMAXPY_SeqCUDA and VecMDot_SeqCUDA.
The time for VecMAXPY_SeqCUDA for snes/tutorials/ex19 -da_refine 7 -dm_mat_type aijcusparse -dm_vec_type cuda -pc_type none -log_view improved by 16 percent on the UTK xSDK machine from removing all the individual WaitForGPU() and timer calls.
Fix flops for VecMAXPY_SeqCUDA; previously it double-counted all flops because it called VecAXPY_SeqCUDA(), which also counted flops.
Log the GPU to CPU copies in VecMDot_SeqCUDA
Note the GPU timings for VecMDot_SeqCUDA now include the copy to CPU and CPU computations, because otherwise one would need to wait for each kernel to complete on the CPU to get the timings. Previously the time recorded was only the kernel launch time, leading to hugely wrong flop rates.
/spend 1.3h
|
| 61bf59e3 | 09-Nov-2020 |
Junchao Zhang <jczhang@mcs.anl.gov> |
Fix PetscErrorCode typos in Fortran stubs |
| 252985ae | 03-Nov-2020 |
Junchao Zhang <jczhang@mcs.anl.gov> |
Add zvscat.c to makefile |
| 9babe2dd | 06-Nov-2020 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'jczhang/add-WaitForKokkos' into 'master'
Add WaitForKokkos to AIJKOKKOS
See merge request petsc/petsc!3378 |
| 017c806d | 05-Nov-2020 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'stefanozampini/feature-mataij-create-fromcoo' into 'master'
Fast GPU assembly from COO data
See merge request petsc/petsc!3362 |
| 00b38f4f | 05-Nov-2020 |
Satish Balay <balay@mcs.anl.gov> |
Merge remote-tracking branch 'origin/release' into master |
| a00c7f43 | 05-Nov-2020 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'jczhang/fix-sfcuda-int64-atomics' into 'release'
Fix device atomics with 64-bit indices and prefer long long over int64_t
See merge request petsc/petsc!3388 |
| f9e47d40 | 03-Nov-2020 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'release' into master |
| fe3444ab | 03-Nov-2020 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'stefanozampini/fix-kokkos' into 'release'
Some minor fixes to KOKKOS
See merge request petsc/petsc!3373 |
| 874d28e3 | 03-Nov-2020 |
Junchao Zhang <jczhang@mcs.anl.gov> |
Fix device atomics with 64-bit indices and prefer long long over int64_t
CUDA uses long long int, which is a different type than int64_t in function overloading
Reported-by: Stefano Zampini <stefano.zampini@gmail.com>
|
| 033aa4b1 | 27-Oct-2020 |
Stefano Zampini <stefano.zampini@gmail.com> |
KOKKOS/CUDA: add some more tests
Vec ex4: kokkos test
KSP test ex60: added PCASM CUDA and KOKKOS tests |
| 5407e870 | 22-Oct-2020 |
Stefano Zampini <stefano.zampini@gmail.com> |
VECSEQCUDA: fix VecGet/RestoreLocalVector |
| 80b62af8 | 02-Nov-2020 |
Satish Balay <balay@mcs.anl.gov> |
Merge remote-tracking branch 'origin/release' into master |
| f2012a66 | 02-Nov-2020 |
Barry Smith <bsmith@mcs.anl.gov> |
Fix word usage
Commit-type: docs-only /spend 1m Reported-by: Massimiliano Leoni <leoni.massimiliano1@gmail.com> |
| 72fa4726 | 18-Oct-2020 |
Stefano Zampini <stefano.zampini@gmail.com> |
Minor |
| 0d8a268c | 29-Oct-2020 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'jczhang/clean-up-vecscatter' into 'master'
typedef PetscSF VecScatter
See merge request petsc/petsc!3365 |
| bb2d6e60 | 28-Oct-2020 |
Junchao Zhang <jczhang@mcs.anl.gov> |
Add WaitForKokkos to AIJKOKKOS |
| 00b03749 | 27-Oct-2020 |
Stefano Zampini <stefano.zampini@gmail.com> |
VecAXPY_SeqKokkos: both vectors must be of type KOKKOS |