History log of /petsc/src/mat/impls/aij/mpi/mpicusparse/mpicusparsematimpl.h (Results 26 – 50 of 60)
Revision Date Author Comments
# 042217e8 10-Jun-2021 Barry Smith <bsmith@mcs.anl.gov>

MatSetValuesDevice: Cleanup and simplify code, including example

User reported crash of example code. Kernel was passed an ierr that lived in CPU memory

MatSetValuesDevice: do not include private headers from public headers

Feature: MatSetValuesDevice determines automatically from the context (where it is included from) if it is being used from C, CUDA, or Kokkos; PETSC_DEVICE_FUNC_DEC no longer needs to be set before including petscaijdevice.h

Feature: MatSetValuesDevice() now ignores all values outside the global column range.

PetscSplitCSRDataStructure is now a pointer, not a struct, like most PETSc objects; please leave it that way.

Fix all uses of CTABLE that were related to the original MatSetValuesDevice()

Have atomicAdd use Kokkos atomic-add with CPU build when building with Kokkos.

CUDA should now work with --download-openmpi. This is done by updating updateCompilers() to rerun portions of packages/cuda.py after the compilers are reset to use MPI wrappers. This is needed because resetting the compilers removes all the compiler flags, and the values that packages/cuda.py sets in these flags were previously lost.

Add MPICXX_INCLUDES, MPICXX_LIBS to fix compile targets for Kokkos examples

'make check' now runs properly for Kokkos test of src/snes/ex3k, fixed bug in the makefile wrt MPI_IS_MPIUNI check

Testing makefile rules: add ex*cu binaries to clean rule

Reported-by: Sam Fagbemi <samkorede24@gmail.com>
Thanks-to: Stefano Zampini <stefano.zampini@gmail.com>
Thanks-to: Mark Adams <mfadams@lbl.gov>

/spend 16h


# 7e8381f9 18-Oct-2020 Stefano Zampini <stefano.zampini@gmail.com>

MATCUSPARSE: Implement fast assembly from COO data


# 12c380df 28-Sep-2020 Satish Balay <balay@mcs.anl.gov>

Merge branch 'adams/feature-mat-cuda' into 'master'

Adams/feature mat cuda

See merge request petsc/petsc!3137


# 3fa6b06a 02-Sep-2020 Mark Adams <mfadams@lbl.gov>

add cuda matrix for meta data method, assembly


# 3c4168dc 13-Aug-2020 Satish Balay <balay@mcs.anl.gov>

Merge remote-tracking branch 'origin/maint'


# 9010c9d9 13-Aug-2020 Satish Balay <balay@mcs.anl.gov>

Merge branch 'balay/mv-private-cudavecimpl/maint' into 'maint'

cudavecimpl.h move to include/petsc/private as it's used by private/sfimpl.h

See merge request petsc/petsc!3044


# 303a667b 12-Aug-2020 Satish Balay <balay@mcs.anl.gov>

cudavecimpl.h move to include/petsc/private as it's used by private/sfimpl.h

Reported-by: Nidish <nb25@rice.edu>


# e366c154 15-Nov-2017 Jed Brown <jed@jedbrown.org>

Merge commit 'd47bf9aaf1e5266cc3f0ff499f934c85788965a9' into jed/fix-matcreatempibaij/maint

Obtain changed (v3.5) handling of MatMPIBAIJSetPreallocationCSR to match
documentation.


# 35d70571 04-May-2016 Stefano Zampini <stefano.zampini@gmail.com>

Merge branch 'master' into stefano_zampini/feature-pcbddc-saddlepoint

Conflicts:
src/ksp/pc/impls/bddc/bddc.c
src/ksp/pc/impls/bddc/bddcgraph.c
src/ksp/pc/impls/bddc/bddcprivate.c


# 52774845 06-Apr-2016 Toby Isaac <tisaac@uchicago.edu>

Merge branch 'tisaac/sf-fix-multi-sf-leaves' into tisaac/dmp4est-feature-injection

* tisaac/sf-fix-multi-sf-leaves: (174 commits)
PetscSF: fix PetscSFGetMultiSF() for sparse leaves
PetscSF: make regression test that fails
few more clang static analyzer fixes
many issues detected by clang static analyzer
Update hypre to its release version
Updates for xSDKTrilinos builds
Added --download-xsdktrilinos
Communicator passed to VecCreateSeq should be PETSC_COMM_SELF
PCMG man page: fix typo in options name
Bib: Added refs
DMLabel: Removed unused variable
Metis tries to use backtrace which requires -lexecinfo on some systems such as freebsd Remove -lexecinfo from freebsd configure files since it is only needed by metis
add alternative output that occurs on some systems due to use of random number generator in partitioner
Plex: When creating cohesive submeshes, fix memory leak
Plex ex11: Updated output for new distribution algorithm
Plex: When explicitly distributing the depth label, we must recreate empty strata
Plex: When partitioning, the cell numbering should include hybrid points
Plex: Allow cell and vertex numberings to include hybrid points
need to show diffs of ex22 if it fails
Support two different output for test example, since due to roundoff this example can produce slightly different convergence history
...


# 171b80e0 06-Apr-2016 Matthew G. Knepley <knepley@gmail.com>

Merge branch 'master' into sanderarens/fix-plex-neumann-bc

* master: (28 commits)
few more clang static analyzer fixes
many issues detected by clang static analyzer
Update hypre to its release version
Updates for xSDKTrilinos builds
Added --download-xsdktrilinos
DMLabel: Removed unused variable
Plex: When creating cohesive submeshes, fix memory leak
Plex ex11: Updated output for new distribution algorithm
Plex: When explicitly distributing the depth label, we must recreate empty strata
Plex: When partitioning, the cell numbering should include hybrid points
Plex: Allow cell and vertex numberings to include hybrid points
Plex ex1: Fixed output for empty strata
DMLabel: Added DMLabelHasStratum()
Minor changes and fix to bugs introduced in 82f73ecaa
Modified makefile to fix aijcusparse tests
Add missing code for the case PETSC_HAVE_VECCUDA
Fix bugs introduced in 82f73ecaa
Use VECCUDA with MATAIJCUSPARSE
CUDA: Fixed visibility and disabled Fortran bindings for VecScatters on GPU.
Replace kernels with thrust
...


# cc442fca 05-Apr-2016 Karl Rupp <rupp@iue.tuwien.ac.at>

Merge branch 'pr421/alex/feature-veccuda'

* pr421/alex/feature-veccuda
The rationale of this pull request is to have GPU-enabled vectors purely based on CUDA,
and with the possibility of placing a user-provided array on the GPU side.

Conflicts:
src/vec/vec/impls/seq/seqcusp/cuspvecimpl.h
src/vec/vec/impls/seq/seqviennacl/viennaclvecimpl.h


# c41cb2e2 16-Mar-2016 Alejandro Lamas Daviña <alejandro.lamas@dsic.upv.es>

Use VECCUDA with MATAIJCUSPARSE


# e1b06f76 20-May-2015 Matthew G. Knepley <knepley@gmail.com>

Merge branch 'master' into knepley/solkx

* master: (6933 commits)
Bib: Added reference
Includegraph: Updated for compatibility with new folder layout.
Add SNESLineSearchReason to fortran includes Also truncate names to fit 32 char fortran limit
fix compile warnings /usr/home/balay/petsc.clone-2/src/vec/vec/interface/vector.c:1944: warning: division by zero in '1.0e+0 / 0.' /usr/home/balay/petsc.clone-2/src/ksp/ksp/interface/itfunc.c:508: warning: 'vec_rhs' may be used uninitialized in this function
updated output for new trust region initial size
initial size of the trust region is set as a percent of the norm of the initial guess, it should not be related to the initial norm of the function (which is kind of nuts).
The default bounds for SNESVISetVariableBounds() in the manual page were reversed from correct values
Bib: Added refs
Bib: Update MPICH webpage
Plex: Forgot to close file
removed nonexistent ex46f from makefile test rule
Revert "fixed bad merge into master"
fixed bad merge into master
fix memory leak in mkl_pardiso fix some formatting in mkl_pardiso code
mv runex111 from TESTEXAMPLES_C to TESTEXAMPLES_DATAFILESPATH
Mat+Doc: More Fortran docs
Plex ex3: Run tests with the correct number of field components
Bib: Added TetGen cite
parmetis: make sure 'ldd libparmetis.so' points to the correct libmetis.so that its linked with.
small fix as reported by the nightly test
...

Conflicts:
config/builder.py


# 898446f9 11-Mar-2015 Shri Abhyankar <abhyshr@mcs.anl.gov>

Merge branch 'master' into shri/ts-is-for-differential-variables

Conflicts:
include/petsc-private/tsimpl.h
src/ts/impls/eimex/eimex.c
src/ts/interface/ts.c


# 9c925a2c 18-Mar-2014 Shri <abhyshr@mcs.anl.gov>

Merge branch 'master' into shri/ts-events

Conflicts:
src/ts/interface/ts.c


# edbbd480 10-Dec-2013 Barry Smith <bsmith@mcs.anl.gov>

Merge branch 'master' into barry/xcode


# 256ff83f 11-Sep-2013 Barry Smith <bsmith@mcs.anl.gov>

Merge branch 'master' into barry/wirth-fusion-materials

Conflicts:
src/ts/examples/tutorials/advection-diffusion-reaction/ex10.c


# cc85fe4d 04-Sep-2013 Barry Smith <bsmith@mcs.anl.gov>

Merge branch 'barry/dmvecmattypes' into barry/saws

Needed to work with version of PETSc that did not have constant calls to VecSetFromOptions() etc

Conflicts:
src/ksp/ksp/interface/ams/kspams.c
src/snes/impls/composite/snescomposite.c
src/snes/impls/gs/snesgs.c
src/snes/impls/nasm/nasm.c
src/snes/impls/ngmres/snesngmres.c


# c0c93d0e 28-Aug-2013 Matthew G. Knepley <knepley@gmail.com>

Merge branch 'master' into knepley/feature-dmda-section

* master: (287 commits)
Mat ex170: Comments
VTK: Small fix to error message (.vts to .vtu)
VTK: Small fix to error message
Fixed bib entries
Bib: Updates
AO: fix erroneous processing of -ao_view and factor into AOViewFromOptions
doc: fix named argument in {Vec,Mat,DM}ViewFromOptions
Sys: add PetscDataTypeFromString() and test code
Mat: Should say that it has a nullspace in MatView()
parms: update tarball with fix for namespace conflict with metis
fix citation 'Golub_Varga_1961'
parmetis: update tarball to parmetis-4.0.2-p5 which fixes an install issue with cygwin
Sys Logging: revert parent traversal
fixed hdf5.py so that if self.libraries.compression is None the code still runs correctly
DMDA: fix bad cast of DM_DA to PetscObject
MatClique: follow DistMultiVec API changes
MatClique: remove unused variables
config cmakeboot: add C++ flags any time compiler is available
config OpenMP: check for C++ flag any time the compiler is available
replaced all left-over uses of a single PetscMalloc() to allocated multiple arrays: replaced with PetscMallocN() The only ones left are when the second array is set into the first array and one ugly usage in the MUMPS interface that cannot be easily fixed
...


# b0418fcf 25-Jul-2013 Stefano Zampini <stefano.zampini@gmail.com>

Merge remote-tracking branch 'origin/master' into stefano_zampini/pcbddc-improvelocalsolvers


# 8533652c 25-Jul-2013 Stefano Zampini <stefano.zampini@gmail.com>

Merge remote-tracking branch 'origin/master' into stefano_zampini/pcbddc-mirrorsfix


# 6daa6ed0 25-Jul-2013 Stefano Zampini <stefano.zampini@gmail.com>

Merge remote-tracking branch 'origin/master' into stefano_zampini/pcbddc-constraintssetupimproved


# 72cfe0ad 23-Jul-2013 Karl Rupp <rupp@iue.tuwien.ac.at>

Merge branch 'paulmullowney/txpetscgpu-package-removal'


# b06137fd 27-Jun-2013 Paul Mullowney <paulm@txcorp.com>

Removing TXPETSCGPU from veccusp and mpiaijcusparse

In this next step of removing TXPETSCGPU, the host-device and
device-host messaging code has been significantly simplified. In
particular, all methods VecCUSPCopyToGPU/FromGPU now use
a cudaMemcpyAsync with a stream (and a stream synchronize()).
This never hurts you. Moreover, it can help you in the case
of the multi-GPU SpMV as this data transfer will overlap
with the MatMult kernel. The more significant change comes in
VecCUSPCopyToGPUSome and VecCUSPCopyFromGPUSome. In this code,
the data transfer now moves the smallest contiguous set of
vector data containing ALL the indices in a single asynchronous data
transfer. Then, the stream containing the data transfer is
synchronized (not the entire device). While this can be wasteful
in terms of messaging too much data, it has shown the best
scalability performance across a wide range of matrices. Lastly
the simplicity of the code is a significant advantage over
the old way of doing the data transfer. Some old code
in these methods is "if 0"-ed out for reference and will be
cleaned up later. One final optimization in the vector code
involves registering the host buffer as page locked--which
is done in VecCUSPAllocateCheck. Then, the buffer must be
unregistered at VecDestroy_SeqCUSP. This shows a nice
speedup in the data transfer for a parallel MatMult.

Also in this commit, I am removing the TXPETSCGPU dependence from
the mpiaijcusparse class--it now depends only on CUDA. In order
for the same stream to be used in the MatMult and MatMultAdd
(necessary for an optimal Multi-GPU SpMV), the stream is built
in the mpiaijcusparse and then passed in the seqaijcusparse data
structure via a new method (MatCUSPARSESetStream). A similar method
is added for the CUSPARSE library handle (context) as I think the
stream needs to be attached to a particular context to work properly.
When running in parallel on multiple GPUs, the references to the handle
in the seqaijcusparse are cleared from the mpiaijcusparse classes with
the method MatCUSPARSEClearHandle. Then, the mpiaijcusparse class
deletes the handle.

One other non-trivial change was made to the seqaijcusparse. The alpha
and beta parameters to the SpMV are now device data which is owned by
the Mat_SEQAIJCUSPARSEMultStruct structure. This enables slightly better
multi-GPU performance as this data does not need to be copied to the
GPU at each kernel launch.

Multi-GPU SpMV now works without TXPETSCGPU and the performance is recovered
as tested on up to 4 GPUs. Code is valgrind clean and cuda-memcheck clean.

Results of tests have been modified to have 1 less digit of precision. This
yields consistent results across different GPUs. Lastly, the parallel test
is set to run on a different matrix (shallow_water1) so that the iteration
actually converges.

