| #
042217e8
|
| 10-Jun-2021 |
Barry Smith <bsmith@mcs.anl.gov> |
MatSetValuesDevice: Cleanup and simplify code, including example
User reported crash of example code. Kernel was passed an ierr that lived in CPU memory
MatSetValuesDevice: do not include private headers from public headers
Feature: MatSetValuesDevice determines automatically from the context (where it is included from) whether it is being used from C, CUDA, or Kokkos; PETSC_DEVICE_FUNC_DEC no longer needs to be set before including petscaijdevice.h
Feature: MatSetValuesDevice() now ignores all values outside the global column range.
PetscSplitCSRDataStructure is now a pointer, not a struct, like most PETSc objects; please leave it that way.
Fix all uses of CTABLE that were related to the original MatSetValuesDevice()
Have atomicAdd use Kokkos atomic-add with CPU build when building with Kokkos.
CUDA should now work with --download-openmpi. This is done by updating updateCompilers() to rerun portions of packages/cuda.py after the compilers are reset to use MPI wrappers. This is needed because resetting the compilers removes all the compiler flags, and packages/cuda.py sets certain values in these flags that were previously lost.
Add MPICXX_INCLUDES, MPICXX_LIBS to fix compile targets for Kokkos examples
'make check' now runs properly for Kokkos test of src/snes/ex3k, fixed bug in the makefile wrt MPI_IS_MPIUNI check
Testing makefile rules: add ex*cu binaries to clean rule
Reported-by: Sam Fagbemi <samkorede24@gmail.com> Thanks-to: Stefano Zampini <stefano.zampini@gmail.com> Thanks-to: Mark Adams <mfadams@lbl.gov>
/spend 16h
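The column-range behaviour mentioned in this commit message can be illustrated with a minimal C sketch. It is not the PETSc implementation: `add_row_values` and its dense-row layout are hypothetical names chosen for illustration, and the real MatSetValuesDevice() operates on a split CSR structure on the device. The sketch only shows the idea of silently skipping entries whose column index falls outside the global range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not PETSc API: add vals[k] into row[cols[k]] for
   each k, ignoring any column outside [0, ncols_global).
   Returns the number of entries that were actually applied. */
size_t add_row_values(double *row, int ncols_global,
                      const int *cols, const double *vals, size_t n)
{
  assert(ncols_global >= 0);
  size_t applied = 0;
  for (size_t k = 0; k < n; k++) {
    /* Out-of-range columns are skipped rather than reported as errors. */
    if (cols[k] < 0 || cols[k] >= ncols_global) continue;
    row[cols[k]] += vals[k];
    applied++;
  }
  return applied;
}
```

Skipping rather than erroring means a caller can pass a fixed-size stencil near a boundary without first trimming it, at the cost of not detecting genuinely wrong indices.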
|
| #
7e8381f9
|
| 18-Oct-2020 |
Stefano Zampini <stefano.zampini@gmail.com> |
MATCUSPARSE: Implement fast assembly from COO data
|
| #
12c380df
|
| 28-Sep-2020 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'adams/feature-mat-cuda' into 'master'
Adams/feature mat cuda
See merge request petsc/petsc!3137
|
| #
3fa6b06a
|
| 02-Sep-2020 |
Mark Adams <mfadams@lbl.gov> |
add cuda matrix for meta data method, assembly
|
| #
3c4168dc
|
| 13-Aug-2020 |
Satish Balay <balay@mcs.anl.gov> |
Merge remote-tracking branch 'origin/maint'
|
| #
9010c9d9
|
| 13-Aug-2020 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'balay/mv-private-cudavecimpl/maint' into 'maint'
cudavecimpl.h move to include/petsc/private as it's used by private/sfimpl.h
See merge request petsc/petsc!3044
|
| #
303a667b
|
| 12-Aug-2020 |
Satish Balay <balay@mcs.anl.gov> |
cudavecimpl.h move to include/petsc/private as it's used by private/sfimpl.h
Reported-by: Nidish <nb25@rice.edu>
|
| #
e366c154
|
| 15-Nov-2017 |
Jed Brown <jed@jedbrown.org> |
Merge commit 'd47bf9aaf1e5266cc3f0ff499f934c85788965a9' into jed/fix-matcreatempibaij/maint
Obtain changed (v3.5) handling of MatMPIBAIJSetPreallocationCSR to match documentation.
|
| #
35d70571
|
| 04-May-2016 |
Stefano Zampini <stefano.zampini@gmail.com> |
Merge branch 'master' into stefano_zampini/feature-pcbddc-saddlepoint
Conflicts: src/ksp/pc/impls/bddc/bddc.c src/ksp/pc/impls/bddc/bddcgraph.c src/ksp/pc/impls/bddc/bddcprivate.c
|
| #
52774845
|
| 06-Apr-2016 |
Toby Isaac <tisaac@uchicago.edu> |
Merge branch 'tisaac/sf-fix-multi-sf-leaves' into tisaac/dmp4est-feature-injection
* tisaac/sf-fix-multi-sf-leaves: (174 commits) PetscSF: fix PetscSFGetMultiSF() for sparse leaves PetscSF: make regression test that fails few more clang static analyzer fixes many issues detected by clang static analyzer Update hypre to its release version Updates for xSDKTrilinos builds Added --download-xsdktrilinos Communicator passed to VecCreateSeq should be PETSC_COMM_SELF PCMG man page: fix typo in options name Bib: Added refs DMLabel: Removed unused variable Metis tries to use backtrace which requires -lexecinfo on some systems such as freebsd Remove -lexecinfo from freebsd configure files since it is only needed by metis add alternative output that occurs on some systems due to use of random number generator in partitioner Plex: When creating cohesive submeshes, fix memory leak Plex ex11: Updated output for new distribution algorithm Plex: When explicitly distributing the depth label, we must recreate empty strata Plex: When partitioning, the cell numbering should include hybrid points Plex: Allow cell and vertex numberings to include hybrid points need to show diffs of ex22 if it fails Support two different output for test example, since due to roundoff this example can produce slightly different convergence history ...
|
| #
171b80e0
|
| 06-Apr-2016 |
Matthew G. Knepley <knepley@gmail.com> |
Merge branch 'master' into sanderarens/fix-plex-neumann-bc
* master: (28 commits) few more clang static analyzer fixes many issues detected by clang static analyzer Update hypre to its release version Updates for xSDKTrilinos builds Added --download-xsdktrilinos DMLabel: Removed unused variable Plex: When creating cohesive submeshes, fix memory leak Plex ex11: Updated output for new distribution algorithm Plex: When explicitly distributing the depth label, we must recreate empty strata Plex: When partitioning, the cell numbering should include hybrid points Plex: Allow cell and vertex numberings to include hybrid points Plex ex1: Fixed output for empty strata DMLabel: Added DMLabelHasStratum() Minor changes and fix to bugs introduced in 82f73ecaa Modified makefile to fix aijcusparse tests Add missing code for the case PETSC_HAVE_VECCUDA Fix bugs introduced in 82f73ecaa Use VECCUDA with MATAIJCUSPARSE CUDA: Fixed visibility and disabled Fortran bindings for VecScatters on GPU. Replace kernels with thrust ...
|
| #
cc442fca
|
| 05-Apr-2016 |
Karl Rupp <rupp@iue.tuwien.ac.at> |
Merge branch 'pr421/alex/feature-veccuda'
* pr421/alex/feature-veccuda The rationale of this pull request is to have GPU-enabled vectors purely based on CUDA, and with the possibility of placing a user-provided array on the GPU side.
Conflicts: src/vec/vec/impls/seq/seqcusp/cuspvecimpl.h src/vec/vec/impls/seq/seqviennacl/viennaclvecimpl.h
|
| #
c41cb2e2
|
| 16-Mar-2016 |
Alejandro Lamas Daviña <alejandro.lamas@dsic.upv.es> |
Use VECCUDA with MATAIJCUSPARSE
|
| #
e1b06f76
|
| 20-May-2015 |
Matthew G. Knepley <knepley@gmail.com> |
Merge branch 'master' into knepley/solkx
* master: (6933 commits) Bib: Added reference Includegraph: Updated for compatibility with new folder layout. Add SNESLineSearchReason to fortran includes Also truncate names to fit 32 char fortran limit fix compile warnings /usr/home/balay/petsc.clone-2/src/vec/vec/interface/vector.c:1944: warning: division by zero in '1.0e+0 / 0.' /usr/home/balay/petsc.clone-2/src/ksp/ksp/interface/itfunc.c:508: warning: 'vec_rhs' may be used uninitialized in this function updated output for new trust region initial size initial size of the trust region is set as a percent of the norm of the initial guess, it should not be related to the initial norm of the function (which is kind of nuts). The default bounds for SNESVISetVariableBounds() in the manual page were reversed from correct values Bib: Added refs Bib: Update MPICH webpage Plex: Forgot to close file removed nonexistent ex46f from makefile test rule Revert "fixed bad merge into master" fixed bad merge into master fix memory leak in mkl_pardiso fix some formatting in mkl_pardiso code mv runex111 from TESTEXAMPLES_C to TESTEXAMPLES_DATAFILESPATH Mat+Doc: More Fortran docs Plex ex3: Run tests with the correct number of field components Bib: Added TetGen cite parmetis: make sure 'ldd libparmetis.so' points to the correct libmetis.so that its linked with. small fix as reported by the nightly test ...
Conflicts: config/builder.py
|
| #
898446f9
|
| 11-Mar-2015 |
Shri Abhyankar <abhyshr@mcs.anl.gov> |
Merge branch 'master' into shri/ts-is-for-differential-variables
Conflicts: include/petsc-private/tsimpl.h src/ts/impls/eimex/eimex.c src/ts/interface/ts.c
|
| #
9c925a2c
|
| 18-Mar-2014 |
Shri <abhyshr@mcs.anl.gov> |
Merge branch 'master' into shri/ts-events
Conflicts: src/ts/interface/ts.c
|
| #
edbbd480
|
| 10-Dec-2013 |
Barry Smith <bsmith@mcs.anl.gov> |
Merge branch 'master' into barry/xcode
|
| #
256ff83f
|
| 11-Sep-2013 |
Barry Smith <bsmith@mcs.anl.gov> |
Merge branch 'master' into barry/wirth-fusion-materials
Conflicts: src/ts/examples/tutorials/advection-diffusion-reaction/ex10.c
|
| #
cc85fe4d
|
| 04-Sep-2013 |
Barry Smith <bsmith@mcs.anl.gov> |
Merge branch 'barry/dmvecmattypes' into barry/saws
Needed to work with version of PETSc that did not have constant calls to VecSetFromOptions() etc
Conflicts: src/ksp/ksp/interface/ams/kspams.c src/snes/impls/composite/snescomposite.c src/snes/impls/gs/snesgs.c src/snes/impls/nasm/nasm.c src/snes/impls/ngmres/snesngmres.c
|
| #
c0c93d0e
|
| 28-Aug-2013 |
Matthew G. Knepley <knepley@gmail.com> |
Merge branch 'master' into knepley/feature-dmda-section
* master: (287 commits) Mat ex170: Comments VTK: Small fix to error message (.vts to .vtu) VTK: Small fix to error message Fixed bib entries Bib: Updates AO: fix erroneous processing of -ao_view and factor into AOViewFromOptions doc: fix named argument in {Vec,Mat,DM}ViewFromOptions Sys: add PetscDataTypeFromString() and test code Mat: Should say that it has a nullspace in MatView() parms: update tarball with fix for namespace conflict with metis fix citation 'Golub_Varga_1961' parmetis: update tarball to parmetis-4.0.2-p5 which fixes an install issue with cygwin Sys Logging: revert parent traversal fixed hdf5.py so that if self.libraries.compression is None the code still runs correctly DMDA: fix bad cast of DM_DA to PetscObject MatClique: follow DistMultiVec API changes MatClique: remove unused variables config cmakeboot: add C++ flags any time compiler is available config OpenMP: check for C++ flag any time the compiler is available replaced all left-over uses of a single PetscMalloc() to allocated multiple arrays: replaced with PetscMallocN() The only ones left are when the second array is set into the first array and one ugly usage in the MUMPS interface that cannot be easily fixed ...
|
| #
b0418fcf
|
| 25-Jul-2013 |
Stefano Zampini <stefano.zampini@gmail.com> |
Merge remote-tracking branch 'origin/master' into stefano_zampini/pcbddc-improvelocalsolvers
|
| #
8533652c
|
| 25-Jul-2013 |
Stefano Zampini <stefano.zampini@gmail.com> |
Merge remote-tracking branch 'origin/master' into stefano_zampini/pcbddc-mirrorsfix
|
| #
6daa6ed0
|
| 25-Jul-2013 |
Stefano Zampini <stefano.zampini@gmail.com> |
Merge remote-tracking branch 'origin/master' into stefano_zampini/pcbddc-constraintssetupimproved
|
| #
72cfe0ad
|
| 23-Jul-2013 |
Karl Rupp <rupp@iue.tuwien.ac.at> |
Merge branch 'paulmullowney/txpetscgpu-package-removal'
|
| #
b06137fd
|
| 27-Jun-2013 |
Paul Mullowney <paulm@txcorp.com> |
Removing TXPETSCGPU from veccusp and mpiaijcusparse
In this next step of removing TXPETSCGPU, the host-device and device-host messaging code has been significantly simplified. In particular, all methods VecCUSPCopyToGPU/FromGPU now use a cudaMemcpyAsync with a stream (and a stream synchronize()). This never hurts you. Moreover, it can help you in the case of the multi-GPU SpMV as this data transfer will overlap with the MatMult kernel. The more significant change comes in VecCUSPCopyToGPUSome and VecCUSPCopyFromGPUSome. In this code, the data transfer now moves the smallest contiguous set of vector data containing ALL the indices in a single asynchronous data transfer. Then, the stream containing the data transfer is synchronized (not the entire device). While this can be wasteful in terms of messaging too much data, it has shown the best scalability performance across a wide range of matrices. Lastly, the simplicity of the code is a significant advantage over the old way of doing the data transfer. Some old code in these methods is "if 0"-ed out for reference and will be cleaned up later. One final optimization in the vector code involves registering the host buffer as page locked, which is done in VecCUSPAllocateCheck. Then, the buffer must be unregistered at VecDestroy_SeqCUSP. This shows a nice speedup in the data transfer for a parallel MatMult.
Also in this commit, I am removing the TXPETSCGPU dependence from the mpiaijcusparse class--it now depends only on CUDA. In order for the same stream to be used in the MatMult and MatMultAdd (necessary for an optimal Multi-GPU SpMV), the stream is built in the mpiaijcusparse and then passed in the seqaijcusparse data structure via a new method (MatCUSPARSESetStream). A similar method is added for the CUSPARSE library handle (context) as I think the stream needs to be attached to a particular context to work properly. When running in parallel, multiple GPUs, the references to the handle in the seqaijcusparse are cleared from the mpiaijcusparse classes with the method MatCUSPARSEClearHandle. Then, the mpiaijcusparse class deletes the handle.
One other non-trivial change was made to the seqaijcusparse. The alpha and beta parameters to the SpMV are now device data which is owned by the Mat_SEQAIJCUSPARSEMultStruct structure. This enables slightly better multi-GPU performance as this data does not need to be copied to the GPU at each kernel launch.
Multi-GPU SpMV now works without TXPETSCGPU and the performance is recovered as tested on up to 4 GPUs. Code is valgrind clean and cuda-memcheck clean.
Results of tests have been modified to have 1 less digit of precision. This yields consistent results across different GPUs. Lastly, the parallel test is set to run on a different matrix (shallow_water1) so that the iteration actually converges.
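The contiguous-range transfer described above for VecCUSPCopyToGPUSome and VecCUSPCopyFromGPUSome can be sketched in plain C. This is an illustration under stated assumptions, not the code from this commit: `index_span` and `copy_span` are hypothetical names, and `memcpy` stands in for the asynchronous device copy and stream synchronization that the commit message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not PETSc API: find the smallest contiguous range
   [*lo, *hi] that contains every entry of idx[0..n-1]. */
void index_span(const int *idx, size_t n, int *lo, int *hi)
{
  assert(n > 0);
  *lo = *hi = idx[0];
  for (size_t i = 1; i < n; i++) {
    if (idx[i] < *lo) *lo = idx[i];
    if (idx[i] > *hi) *hi = idx[i];
  }
}

/* Copy that whole range from src to dst in one transfer and return the
   number of elements moved. memcpy is a stand-in here; the commit message
   describes a single asynchronous copy on a stream instead. */
size_t copy_span(double *dst, const double *src, const int *idx, size_t n)
{
  int lo, hi;
  index_span(idx, n, &lo, &hi);
  size_t count = (size_t)(hi - lo) + 1;
  memcpy(dst + lo, src + lo, count * sizeof(double));
  return count;
}
```

For indices {5, 2, 6} the sketch moves elements 2 through 6, five values for three requested entries. That is the trade-off the message names: some unneeded data is transferred in exchange for a single simple copy.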
|