Rename valid_GPU_array/matrix to offloadmask
Fix compilation error for nvcc in optimized code with AVX-512 (march=native on my GPU workstation)
  For some reason, the host compiler fails with this error message:
  /home/zampins/Devel/petsc/include/../src/mat/impls/aij/seq/aij.h(535): error: identifier "_mm512_reduce_add_pd" is undefined
  This optimized C kernel is not used in the GPU classes, so it is safe to skip its declaration.
MATSEQAIJVIENNACL: implement MatSeqAIJGetArray
MATSEQAIJVIENNACL: minor changes
MatSEQAIJ{CUSPARSE|VIENNACL}: do not copy to the GPU if not at the final stage of assembly
PetscLogGpuTimeStart -> Begin
Adding Gpu flop rate and GPU time
Merged in hannah/gpu-communication-logging (pull request #1814)
  Hannah/gpu communication logging
  Approved-by: BarryFSmith <bsmith@mcs.anl.gov>
  Approved-by: Richard Mills <rtm@eecs.utk.edu>
Adding vector logging, started matrix logging
Merged in barry/cuda-multigrid-test (pull request #1763)
  Various improvements for GPUs (mostly for performance and CUDA)
Non-numeric optimizations focused on AIJ, MatFDColoring, and DMCreateMatrix_DA_*AIJ, looking to improve performance in GPU environments
  1) PetscCalloc*() now uses system calloc()
  2) Merged some PetscMalloc*()
  3) Eliminated unneeded PetscCalloc*()
  4) Removed some memory allocations and copies in MatFDColoringSetUp(), added local variables for better compiler optimization
  5) Added MatSetValues_SeqAIJ_SortedFull(), added MatSetOption(MAT_SORTED_FULL)
  6) Optimized DMCreateMatrix_DA_*AIJ for nonperiodic case to automatically have sorted columns (faster MatSetValues() times)
  7) Eliminated call to PetscMemzero() in PetscFree()
  Commit-type: style-fix, feature
Various improvements for GPUs (mostly for performance and CUDA)
  1) Add VecPinToCPU() for CUDA vector and matrices
  2) Move initialization of cuBLAS to PetscInitialize() since it takes 1/2 second and distorts timing with -log_view
  3) Add logging for DMCreateMatrix (for large meshes this is very large)
  4) Add VecGet/RestoreArrayWrite() to prevent unneeded copies from GPU (only implemented so far for CUDA); added a small number of usages in the source so that snes tutorials ex19 does not do unneeded communication from the GPU
  5) Automatically convert MAIJ matrices to AIJ for CUDA since they are not yet supported natively in PETSc's CUDA matrix implementation
  6) Pinned objects should still use the CUDA/ViennaCL versions of Destroy to clean up the GPU stuff
  Commit-type: feature
Update the use of Collective on in the manual pages to reflect the new style
  Commit-type: style-fix, documentation
  Thanks-to: Patrick Sanan <patrick.sanan@gmail.com>
Based on discussion with Oana I am adding a MatPinToCPU() and VecPinToGPU() capability.
  For matrices this will prevent copies to the GPU when they will never be used there. For vectors this will prevent vectors from bouncing back and forth between the CPU and GPU when most of the work is in the CPU. An example of the place that needs to avoid bouncing is in MatFDColoringApply_XXXX()
  Commit-type: feature, documentation, example
  Thanks-to: Oana Marin <oanam@mcs.anl.gov>
MATSEQAIJVIENNACL: fix bug in MatMult when the compressed row storage is used
MATSEQVIENNACL: fix bug in MatMult routines when the sizes or the nonzeros are zero
Mat: add slot for defaultvectype
  Remove duplicated code for MatCreateVecs_XXX that can take advantage of the new defaultvectype variable
GPUs: Allow ViennaCL and VECCUDA to be used concurrently.
  This also gets rid of the weird veccuda.py.
CUSP: Removed VECCUSP and AIJCUSP as well as preconditioners.
  SA-AMG from CUSP will be re-added in a follow-up commit, made to work with AIJCUSPARSE and VECCUDA.
Fix name MatSolverPackage since it is better to be consistent and use the terminology Type.
  Commit-type: style-fix, documentation
ViennaCL: Added MatDuplicate()
  Fix similar to MatDuplicate() for CUSPARSE implemented in 9ff858a8fa4d1b883cb00760b421121f0c50abc9
Remove the use and definition of __FUNCT__ throughout the code
  Since all modern C/C++ compilers provide this functionality we no longer need to provide it manually in PETSc
  Time: 1.5 hours
  Thanks-to: Andreas Mang <andreas@ices.utexas.edu>
CUSP,ViennaCL: Fixed ILU and Cholesky preconditioner fallback.
  The new registration mechanism for matrix factorizations was not applied for CUSP and ViennaCL, thus the default ILU preconditioners did not work. This commit adds the missing registrations for these packages.
MATSEQAIJVIENNACL: remove redundant setting of ops with MatAssemblyEnd.
MATSEQAIJVIENNACL: add no-copy in-place and out-of-place conversion from SeqAIJ
  This extracts the portion of the creation routine which converts a SeqAIJ Mat to a SeqAIJViennaCL Mat and provides it as a conversion routine.