cusparsematimpl.h - OpenGrok history log for /petsc/src/mat/impls/aij/seq/seqcusparse/cusparsematimpl.h

Revision	Date	Author	Comments
# 171b80e0	06-Apr-2016	Matthew G. Knepley <knepley@gmail.com>	Merge branch 'master' into sanderarens/fix-plex-neumann-bc * master: (28 commits) few more clang static analyzer fixes many issues detected by clang static analyzer Update hypre to its release version Updates for xSDKTrilinos builds Added --download-xsdktrilinos DMLabel: Removed unused variable Plex: When creating cohesive submeshes, fix memory leak Plex ex11: Updated output for new distribution algorithm Plex: When explicitly distributing the depth label, we must recreate empty strata Plex: When partitioning, the cell numbering should include hybrid points Plex: Allow cell and vertex numberings to include hybrid points Plex ex1: Fixed output for empty strata DMLabel: Added DMLabelHasStratum() Minor changes and fix to bugs introduced in 82f73ecaa Modified makefile to fix aijcusparse tests Add missing code for the case PETSC_HAVE_VECCUDA Fix bugs introduced in 82f73ecaa Use VECCUDA with MATAIJCUSPARSE CUDA: Fixed visibility and disabled Fortran bindings for VecScatters on GPU. Replace kernels with thrust ...
# ec42abe4	06-Apr-2016	Alejandro Lamas Daviña <alejandro.lamas@dsic.upv.es>	Enable complex scalars with VECCUDA
# cc442fca	05-Apr-2016	Karl Rupp <rupp@iue.tuwien.ac.at>	Merge branch 'pr421/alex/feature-veccuda' * pr421/alex/feature-veccuda The rationale of this pull request is to have GPU-enabled vectors purely based on CUDA, and with the possibility of placing a user-provided array on the GPU side. Conflicts: src/vec/vec/impls/seq/seqcusp/cuspvecimpl.h src/vec/vec/impls/seq/seqviennacl/viennaclvecimpl.h
# c41cb2e2	16-Mar-2016	Alejandro Lamas Daviña <alejandro.lamas@dsic.upv.es>	Use VECCUDA with MATAIJCUSPARSE
# e1b06f76	20-May-2015	Matthew G. Knepley <knepley@gmail.com>	Merge branch 'master' into knepley/solkx * master: (6933 commits) Bib: Added reference Includegraph: Updated for compatibility with new folder layout. Add SNESLineSearchReason to fortran includes Also truncate names to fit 32 char fortran limit fix compile warnings /usr/home/balay/petsc.clone-2/src/vec/vec/interface/vector.c:1944: warning: division by zero in '1.0e+0 / 0.' /usr/home/balay/petsc.clone-2/src/ksp/ksp/interface/itfunc.c:508: warning: 'vec_rhs' may be used uninitialized in this function updated output for new trust region initial size initial size of the trust region is set as a percent of the norm of the initial guess, it should not be related to the initial norm of the function (which is kind of nuts). The default bounds for SNESVISetVariableBounds() in the manual page were reversed from correct values Bib: Added refs Bib: Update MPICH webpage Plex: Forgot to close file removed nonexistent ex46f from makefile test rule Revert "fixed bad merge into master" fixed bad merge into master fix memory leak in mkl_pardiso fix some formatting in mkl_pardiso code mv runex111 from TESTEXAMPLES_C to TESTEXAMPLES_DATAFILESPATH Mat+Doc: More Fortran docs Plex ex3: Run tests with the correct number of field components Bib: Added TetGen cite parmetis: make sure 'ldd libparmetis.so' points to the correct libmetis.so that its linked with. small fix as reported by the nightly test ... Conflicts: config/builder.py
# 898446f9	11-Mar-2015	Shri Abhyankar <abhyshr@mcs.anl.gov>	Merge branch 'master' into shri/ts-is-for-differential-variables Conflicts: include/petsc-private/tsimpl.h src/ts/impls/eimex/eimex.c src/ts/interface/ts.c
# 9c925a2c	18-Mar-2014	Shri <abhyshr@mcs.anl.gov>	Merge branch 'master' into shri/ts-events Conflicts: src/ts/interface/ts.c
# edbbd480	10-Dec-2013	Barry Smith <bsmith@mcs.anl.gov>	Merge branch 'master' into barry/xcode
# 256ff83f	11-Sep-2013	Barry Smith <bsmith@mcs.anl.gov>	Merge branch 'master' into barry/wirth-fusion-materials Conflicts: src/ts/examples/tutorials/advection-diffusion-reaction/ex10.c
# cc85fe4d	04-Sep-2013	Barry Smith <bsmith@mcs.anl.gov>	Merge branch 'barry/dmvecmattypes' into barry/saws Needed to work with version of PETSc that did not have constant calls to VecSetFromOptions() etc Conflicts: src/ksp/ksp/interface/ams/kspams.c src/snes/impls/composite/snescomposite.c src/snes/impls/gs/snesgs.c src/snes/impls/nasm/nasm.c src/snes/impls/ngmres/snesngmres.c
# 459e96c1	28-Aug-2013	Matthew G. Knepley <knepley@gmail.com>	Merge branch 'master' into knepley/feature-plex-refine-3d * master: (273 commits) Mat ex170: Comments VTK: Small fix to error message (.vts to .vtu) VTK: Small fix to error message Fixed bib entries Bib: Updates AO: fix erroneous processing of -ao_view and factor into AOViewFromOptions doc: fix named argument in {Vec,Mat,DM}ViewFromOptions Sys: add PetscDataTypeFromString() and test code Mat: Should say that it has a nullspace in MatView() parms: update tarball with fix for namespace conflict with metis fix citation 'Golub_Varga_1961' parmetis: update tarball to parmetis-4.0.2-p5 which fixes an install issue with cygwin Sys Logging: revert parent traversal fixed hdf5.py so that if self.libraries.compression is None the code still runs correctly DMDA: fix bad cast of DM_DA to PetscObject MatClique: follow DistMultiVec API changes MatClique: remove unused variables config cmakeboot: add C++ flags any time compiler is available config OpenMP: check for C++ flag any time the compiler is available replaced all left-over uses of a single PetscMalloc() to allocated multiple arrays: replaced with PetscMallocN() The only ones left are when the second array is set into the first array and one ugly usage in the MUMPS interface that cannot be easily fixed ... Conflicts: include/petscdmplex.h
# c0c93d0e	28-Aug-2013	Matthew G. Knepley <knepley@gmail.com>	Merge branch 'master' into knepley/feature-dmda-section * master: (287 commits) Mat ex170: Comments VTK: Small fix to error message (.vts to .vtu) VTK: Small fix to error message Fixed bib entries Bib: Updates AO: fix erroneous processing of -ao_view and factor into AOViewFromOptions doc: fix named argument in {Vec,Mat,DM}ViewFromOptions Sys: add PetscDataTypeFromString() and test code Mat: Should say that it has a nullspace in MatView() parms: update tarball with fix for namespace conflict with metis fix citation 'Golub_Varga_1961' parmetis: update tarball to parmetis-4.0.2-p5 which fixes an install issue with cygwin Sys Logging: revert parent traversal fixed hdf5.py so that if self.libraries.compression is None the code still runs correctly DMDA: fix bad cast of DM_DA to PetscObject MatClique: follow DistMultiVec API changes MatClique: remove unused variables config cmakeboot: add C++ flags any time compiler is available config OpenMP: check for C++ flag any time the compiler is available replaced all left-over uses of a single PetscMalloc() to allocated multiple arrays: replaced with PetscMallocN() The only ones left are when the second array is set into the first array and one ugly usage in the MUMPS interface that cannot be easily fixed ...
# aed5ffcb	05-Aug-2013	Karl Rupp <rupp@iue.tuwien.ac.at>	Merge branch 'paulmullowney/txpetscgpu-package-removal2'
# 36d62e41	25-Jul-2013	Paul Mullowney <paulm@txcorp.com>	Removing unused enumerated parameters from MatCUSP/CUSPARSEFormatOperation. I remove MAT_CUSP_SOLVE from MatCUSPFormatOperation and MAT_CUSPARSE_SOLVE from MatCUSPARSEFormatOperation as potential parameters for choosing the storage of the matrix for triangular solve. In particular, the triangular solve was always using csr format so this parameter had no effect. Moreover, I never found a case where an ellpack/hybrid format worked better. Thus I remove this as a potential parameter. However, I think it is important to keep MatCUSP/CUSPARSEFormatOperation in the code. In the future I can see a MAT_CUSPARSE_FACTOR option which uses CUSPARSE to do ilu0/ic0 factorization on the GPU.
# b0418fcf	25-Jul-2013	Stefano Zampini <stefano.zampini@gmail.com>	Merge remote-tracking branch 'origin/master' into stefano_zampini/pcbddc-improvelocalsolvers
# 8533652c	25-Jul-2013	Stefano Zampini <stefano.zampini@gmail.com>	Merge remote-tracking branch 'origin/master' into stefano_zampini/pcbddc-mirrorsfix
# 6daa6ed0	25-Jul-2013	Stefano Zampini <stefano.zampini@gmail.com>	Merge remote-tracking branch 'origin/master' into stefano_zampini/pcbddc-constraintssetupimproved
# 72cfe0ad	23-Jul-2013	Karl Rupp <rupp@iue.tuwien.ac.at>	Merge branch 'paulmullowney/txpetscgpu-package-removal'
# 2692e278	08-Jul-2013	Paul Mullowney <paulm@txcorp.com>	Adding PREPROCESSOR directives to protect ELL and HYB storage formats. I've added preprocessor directives around all code using the cusparse hybrid (or ellpack) format to only build when CUDA 4.2 or beyond is being used. I've also changed the documentation in a few places to reflect this. In a few places, protections were required for CUDA 5.0 (hyb2csr conversion and in the stream creation in veccusp.cu). Also adding code to init.c that 1) checks cuda error codes and 2) sets the device flags so that memory can be registered as page-locked via cudaSetDeviceFlags(cudaDeviceMapHost). This should be valid for all 1.3 devices and later. Moreover, these changes allow multiple MPI threads to work on 1 GPU using cuda streams in a thread safe manner.
# b06137fd	27-Jun-2013	Paul Mullowney <paulm@txcorp.com>	Removing TXPETSCGPU from veccusp and mpiaijcusparse. In this next step of removing TXPETSCGPU, the host-device and device-host messaging code has been significantly simplified. In particular, all methods VecCUSPCopyToGPU/FromGPU now use a cudaMemcpyAsync with a stream (and a stream synchronize()). This never hurts you. Moreover, it can help you in the case of the multi-GPU SpMV as this data transfer will overlap with the MatMult kernel. The more significant change comes in VecCUSPCopyToGPUSome and VecCUSPCopyFromGPUSome. In this code, the data transfer now moves the smallest contiguous set of vector data containing ALL the indices in a single asynchronous data transfer. Then, the stream containing the data transfer is synchronized (not the entire device). While this can be wasteful in terms of messaging too much data, it has shown the best scalability performance across a wide range of matrices. Lastly the simplicity of the code is a significant advantage over the old way of doing the data transfer. Some old code in these methods is "if 0"-ed out for reference and will be cleaned up later. One final optimization in the vector code involves registering the host buffer as page locked, which is done in VecCUSPAllocateCheck. Then, the buffer must be unregistered at VecDestroy_SeqCUSP. This shows a nice speedup in the data transfer for a parallel MatMult. Also in this commit, I am removing the TXPETSCGPU dependence from the mpiaijcusparse class; it now depends only on CUDA. In order for the same stream to be used in the MatMult and MatMultAdd (necessary for an optimal Multi-GPU SpMV), the stream is built in the mpiaijcusparse and then passed in the seqaijcusparse data structure via a new method (MatCUSPARSESetStream). A similar method is added for the CUSPARSE library handle (context) as I think the stream needs to be attached to a particular context to work properly. When running in parallel on multiple GPUs, the references to the handle in the seqaijcusparse are cleared from the mpiaijcusparse classes with the method MatCUSPARSEClearHandle. Then, the mpiaijcusparse class deletes the handle. One other non-trivial change was made to the seqaijcusparse. The alpha and beta parameters to the SpMV are now device data which is owned by the Mat_SEQAIJCUSPARSEMultStruct structure. This enables slightly better multi-GPU performance as this data does not need to be copied to the GPU at each kernel launch. Multi-GPU SpMV now works without TXPETSCGPU and the performance is recovered as tested on up to 4 GPUs. Code is valgrind clean and cuda-memcheck clean. Results of tests have been modified to have 1 less digit of precision. This yields consistent results across different GPUs. Lastly, the parallel test is set to run on a different matrix (shallow_water1) so that the iteration actually converges.
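The b06137fd message describes moving "the smallest contiguous set of vector data containing ALL the indices" in one transfer. A minimal sketch of that range computation in plain C follows; it is not the PETSc code. `IndexSpan`, `smallest_covering_span`, and `span_length` are hypothetical names introduced here for illustration, and the actual copy (a `cudaMemcpyAsync` on a stream, per the commit message) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not PETSc API): half-open range [lo, hi)
   that covers every requested vector index. */
typedef struct {
  size_t lo;
  size_t hi;
} IndexSpan;

/* Scan the requested indices once and return the smallest contiguous
   range containing all of them. The indices need not be sorted. */
static IndexSpan smallest_covering_span(const size_t *indices, size_t n)
{
  assert(indices != NULL && n > 0);
  IndexSpan span = { indices[0], indices[0] + 1 };
  for (size_t i = 1; i < n; ++i) {
    if (indices[i] < span.lo) span.lo = indices[i];
    if (indices[i] + 1 > span.hi) span.hi = indices[i] + 1;
  }
  return span;
}

/* Number of entries a single transfer of the span would move,
   including entries that were not actually requested. */
static size_t span_length(IndexSpan s)
{
  return s.hi - s.lo;
}
```

For the indices 7, 3, 12, 5 the span is [3, 13), so ten entries would be moved to serve four requests. That is the "wasteful in terms of messaging too much data" trade-off the message mentions, accepted in exchange for a single transfer.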
# aa372e3f	20-Jun-2013	Paul Mullowney <paulm@txcorp.com>	Removal of TXPETSCGPU package from the SEQAIJCUSPARSE class. In this commit, I've removed the dependence of the SEQAIJCUSPARSE class on the TXPETSCGPU package. However, other classes such as SEQAIJCUSP, VECCUSP, MPIAIJCUSPARSE, and MPIAIJCUSP still depend on that package. These dependencies will be removed in subsequent commits once the design and structure is agreed upon. The reason for this dependency removal is that SEQAIJCUSPARSE only depends on the Nvidia CUSPARSE library which comes standard with CUDA. Thus, the SEQAIJCUSPARSE class should be built whenever PETSc is built with CUDA support. This will be far more maintainable in the long term. Lastly, most of the CUSP dependencies have been removed from this class. The only remaining CUSP dependencies are in the vector data structures used in MatMult* and MatSolve* methods. These will be removed in a subsequent branch as it is not clear what the architecture should be yet. In order to accommodate all the different functionality for various Krylov solves, two new data structures were defined in cusparsematimpl.h. The first is a Mat_SeqAIJCUSPARSEMultStruct struct. This contains an opaque pointer for a matrix, a MatDescription data structure, and indices vector which will be useful in MatMultAdd functions. The second new data structure is a Mat_SeqAIJCUSPARSETriFactorStruct struct. This contains a CSR Matrix pointer, a MatDescription data structure, a solve analysis data structure and an operation type. Next, Mat_SeqAIJCUSPARSETriFactors was redefined to hold pointers to up to 4 different Mat_SeqAIJCUSPARSETriFactorStruct structs: one for lower and one for upper solves for both ILU and ICC. Two more for lower and upper solves in algorithms that require a transpose, such as BiCG. The latter two are necessary, as far as I can tell, because one doesn't know until runtime if data structures for the transpose are needed. Thus, those are created on demand. Indexing vectors for reorderings are also stored in Mat_SeqAIJCUSPARSETriFactors. Lastly, Mat_SeqAIJCUSPARSE is the data structure that holds the data needed in multiply. There are 2 pointers to Mat_SeqAIJCUSPARSEMultStruct structs for MatMult and MatMultTranspose. Several auxiliary data structures like work vectors and a few other necessary data for MatMult are also stored in here. One important variable, the cudaStream_t, is stored here but it is not owned. Streams are necessary for the parallel SpMV (a subsequent commit will add code setting stream variables from the MPIAIJCUSPARSE class) and the matrices used in the MatMult and MatMultAdd will then use the same stream identifier to attain optimal performance. The MPIAIJCUSPARSE class will own the stream variable which is then used in the SEQAIJCUSPARSE methods. In matregis.c as well as the petscmat.h and finclude/petscmat.h, I've changed the dependency of SEQAIJCUSPARSE to be on CUDA and not TXPETSCGPU. The test series TESTEXAMPLES_TXPETSCGPU has been changed to TESTEXAMPLES_CUDA since SEQAIJCUSPARSE only depends on CUDA as discussed above. ksp/ksp/examples/tests/ex43-aijcusparse.c has been renamed to ksp/ksp/examples/tests/ex43.c, the targets in the makefile have been changed appropriately and the results files are renamed. Two new test targets were added in ksp/ksp/examples/tests/makefile that test aijcusparse using bicg (and thus the MatSolveTranspose and MatMultTranspose methods) as well as bicg with reordering. The previous results for runex43_2.out (formerly runex43-aijcusparse_2.out) were wrong and so I'm committing new results that agree with CPU based computation. The code is valgrind clean and cuda-memcheck clean.
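The aa372e3f message describes a container holding up to four triangular-factor structs, with the two transpose variants created on demand. A rough sketch of that shape in plain C follows. It is not the contents of cusparsematimpl.h: every name beginning with `Sketch` or `sketch_` is hypothetical, the CUSPARSE objects are stood in for by `void *` placeholders, and the field names are guesses based only on the commit message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one triangular factor: a CSR matrix pointer,
   a matrix description, a solve-analysis object, and an operation type,
   as listed in the commit message. */
typedef struct {
  void *csr_matrix;
  void *descr;
  void *solve_info;
  int   solve_op;
} SketchTriFactor;

/* Hypothetical container: lower/upper factors plus transpose variants
   that stay NULL until an algorithm such as BiCG needs them. */
typedef struct {
  SketchTriFactor *lower;
  SketchTriFactor *upper;
  SketchTriFactor *lower_transpose;
  SketchTriFactor *upper_transpose;
  int             *row_perm; /* reordering index vectors */
  int             *col_perm;
} SketchTriFactors;

/* Create the lower-transpose factor on first use and reuse it afterwards.
   Returns NULL only if allocation fails. */
static SketchTriFactor *sketch_get_lower_transpose(SketchTriFactors *f)
{
  assert(f != NULL);
  if (f->lower_transpose == NULL) {
    f->lower_transpose = calloc(1, sizeof *f->lower_transpose);
  }
  return f->lower_transpose;
}
```

The lazy accessor mirrors the stated reason for the design: whether transpose data is needed is only known at runtime, so nothing is allocated for it until a transpose solve is first requested.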
# e33c197d	11-Jun-2013	Richard Mills <rtm@eecs.utk.edu>	Merged petsc/petsc into rmills/petsc master.
# 62a20339	13-Apr-2013	Jed Brown <jed@59A2.org>	Merge branch 'jed/aijcusparse-icc' * jed/aijcusparse-icc: MatSeqAIJCUSPARSE: make private functions static txpetscgpu: upgrade to version 0.1.0 aijcusparse : fixed MatGetFactor and other small issues in this class ksp tests: new tests for ksp algorithms using aijcusparse methods vscatcusp : fixed compiler warnings MatSeqAIJCUSPARSE: white space, style issues, static and extern functions aijcusparse: ICC/Cholesky preconditioners Conflicts: src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu (PetscObjectComposeFunction usage)
# bc3f50f2	01-Apr-2013	Paul Mullowney <paulm@txcorp.com>	aijcusparse : fixed MatGetFactor and other small issues in this class. Fixed MatGetFactor_seqaij_cusparse to have a more standard set of function calls (similar to aij/seq/umfpack or superlu) for setting up the factorization. In particular, I replaced the scoping call to MatGetFactor_seqaij_petsc with the sequence MatCreate, MatSetSizes, MatSetType, and MatXXXSetPreallocation. With these changes, all tests that use aijcusparse class pass in optimized and debug builds. Moreover, all memory leaks have been removed. Additional small fixes to this class include the removal of unnecessary PETSC_CUDA_EXTERN_C_BEGIN/END and poor use of PETSC_COMM_WORLD in this file. Lastly, a few missing error checks around several PETSc API method calls for symmetry/hermitian tests were added.
# 087f3262	07-Apr-2013	Paul Mullowney <paulm@txcorp.com>	aijcusparse: ICC/Cholesky preconditioners. This is a patch for ICC/Cholesky preconditioners on GPUs for the aijcusparse(aijcusp) class. One uses ICC by loading an upper triangular matrix into the cusparse class, and then setting the symmetry option, i.e. ierr = MatCreateSeqAIJCUSPARSE(comm,n,m,PETSC_NULL,num_entries_per_row,&A);CHKERRQ(ierr); ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE);CHKERRQ(ierr); This contains both the organization and implementation of the preconditioner. The solves are done under the hood in the txpetscgpu package. Currently, the factorization is done in the aij/seq/aijfact.c routines. aijcusparse.cu makes scoping calls to these methods. Then, the matrix is rebuilt into a form that is palatable for the GPU. This is done in the method MatSeqAIJCUSPARSEBuildICCTriMatrices. [Jed] Formatting and 'static' cleanup