History log of /petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu (Results 601 – 625 of 685)
Revision Date Author Comments
# 324c91e4 17-Dec-2013 Peter Brune <brune@mcs.anl.gov>

Merge branch 'madams/gamg-destroy' into prbrune/pcgamg-classicalinterpolationstrategies


# 578f55a3 17-Dec-2013 Peter Brune <brune@mcs.anl.gov>

Merge branch 'master' into madams/gamg-destroy

Conflicts:
src/ksp/pc/impls/gamg/gamg.c


# 8c722d37 10-Dec-2013 Barry Smith <bsmith@mcs.anl.gov>

Merge branch 'master' into barry/reduce-dmsetup-da-memoryusage

Conflicts:
src/dm/examples/tests/ex15.c
src/dm/examples/tutorials/ex3.c
src/dm/impls/da/da2.c
src/dm/impls/da/da3.c


# a906b49b 10-Dec-2013 BarryFSmith <bsmith@mcs.anl.gov>

Merged master into barry/update-xxxviewfromoptions


# edbbd480 10-Dec-2013 Barry Smith <bsmith@mcs.anl.gov>

Merge branch 'master' into barry/xcode


# e0133700 05-Dec-2013 Barry Smith <bsmith@mcs.anl.gov>

Merge branch 'barry/saws-push-header-body' into barry/saws-options


# fb3f26da 04-Dec-2013 Jed Brown <jedbrown@mcs.anl.gov>

Merge branch 'jed/malloc-array'

Type arguments were dropped from PetscMalloc[2-7], PetscNew, and PetscNewLog;
PetscMalloc1 was added for allocating typed arrays, and PetscCalloc[1-7] for
allocating cleared (zeroed) memory.

* jed/malloc-array:
Sys: Add pointer casts from (void **) in calls to PetscMalloc1()
Sys: drop explicit type arguments from PetscNew() and PetscNewLog()
Sys: add PetscCalloc[1-7]
Sys: add PetscMalloc1 macro, array allocation without redundant types
PetscMalloc[2-7]: remove type arguments, infer from pointer type
SNESComputeJacobianDefaultColor: fix uninitialized variable



# 785e854f 03-Dec-2013 Jed Brown <jedbrown@mcs.anl.gov>

Sys: add PetscMalloc1 macro, array allocation without redundant types

The type is inferred from the pointer return type. This patch is
automated via the following script:

git grep -l 'PetscMalloc(.*sizeof' src | xargs perl -pi -e 's@PetscMalloc\(([^,;]*[^,; ]) *\* *sizeof\([^,;()]+\),@PetscMalloc1($1,@'

This commit contains an additional bug-fix in csrperm.c, fixing pointer
arity. The code was introduced in 2006, but the allocation could not
have been correct at any time. This probably means that
MatDuplicate_SeqAIJPERM has never been tested.

a54129beb540034ba105796c682d589e7e1111f2
Richard Tran Mills <rmills@ornl.gov>

Added MATSEQCSRPERM support for MatDuplicate() and conversion to/from
MATSEQAIJ. Note that these changes are not quite debugged.



# 256ff83f 11-Sep-2013 Barry Smith <bsmith@mcs.anl.gov>

Merge branch 'master' into barry/wirth-fusion-materials

Conflicts:
src/ts/examples/tutorials/advection-diffusion-reaction/ex10.c


# cc85fe4d 04-Sep-2013 Barry Smith <bsmith@mcs.anl.gov>

Merge branch 'barry/dmvecmattypes' into barry/saws

Needed to work with version of PETSc that did not have constant calls to VecSetFromOptions() etc

Conflicts:
src/ksp/ksp/interface/ams/kspams.c
src/snes/impls/composite/snescomposite.c
src/snes/impls/gs/snesgs.c
src/snes/impls/nasm/nasm.c
src/snes/impls/ngmres/snesngmres.c



# 459e96c1 28-Aug-2013 Matthew G. Knepley <knepley@gmail.com>

Merge branch 'master' into knepley/feature-plex-refine-3d

* master: (273 commits)
Mat ex170: Comments
VTK: Small fix to error message (.vts to .vtu)
VTK: Small fix to error message
Fixed bib entries
Bib: Updates
AO: fix erroneous processing of -ao_view and factor into AOViewFromOptions
doc: fix named argument in {Vec,Mat,DM}ViewFromOptions
Sys: add PetscDataTypeFromString() and test code
Mat: Should say that it has a nullspace in MatView()
parms: update tarball with fix for namespace conflict with metis
fix citation 'Golub_Varga_1961'
parmetis: update tarball to parmetis-4.0.2-p5 which fixes an install issue with cygwin
Sys Logging: revert parent traversal
fixed hdf5.py so that if self.libraries.compression is None the code still runs correctly
DMDA: fix bad cast of DM_DA to PetscObject
MatClique: follow DistMultiVec API changes
MatClique: remove unused variables
config cmakeboot: add C++ flags any time compiler is available
config OpenMP: check for C++ flag any time the compiler is available
replaced all left-over uses of a single PetscMalloc() to allocated multiple arrays: replaced with PetscMallocN() The only ones left are when the second array is set into the first array and one ugly usage in the MUMPS interface that cannot be easily fixed
...

Conflicts:
include/petscdmplex.h



# c0c93d0e 28-Aug-2013 Matthew G. Knepley <knepley@gmail.com>

Merge branch 'master' into knepley/feature-dmda-section

* master: (287 commits)
Mat ex170: Comments
VTK: Small fix to error message (.vts to .vtu)
VTK: Small fix to error message
Fixed bib entries
Bib: Updates
AO: fix erroneous processing of -ao_view and factor into AOViewFromOptions
doc: fix named argument in {Vec,Mat,DM}ViewFromOptions
Sys: add PetscDataTypeFromString() and test code
Mat: Should say that it has a nullspace in MatView()
parms: update tarball with fix for namespace conflict with metis
fix citation 'Golub_Varga_1961'
parmetis: update tarball to parmetis-4.0.2-p5 which fixes an install issue with cygwin
Sys Logging: revert parent traversal
fixed hdf5.py so that if self.libraries.compression is None the code still runs correctly
DMDA: fix bad cast of DM_DA to PetscObject
MatClique: follow DistMultiVec API changes
MatClique: remove unused variables
config cmakeboot: add C++ flags any time compiler is available
config OpenMP: check for C++ flag any time the compiler is available
replaced all left-over uses of a single PetscMalloc() to allocated multiple arrays: replaced with PetscMallocN() The only ones left are when the second array is set into the first array and one ugly usage in the MUMPS interface that cannot be easily fixed
...



# aed5ffcb 05-Aug-2013 Karl Rupp <rupp@iue.tuwien.ac.at>

Merge branch 'paulmullowney/txpetscgpu-package-removal2'


# a65300a6 31-Jul-2013 Paul Mullowney <paulm@txcorp.com>

Fix to aijcusparse for ell/hyb matrices in Multi-GPU MatMult

The parameters of the matrix (i.e. the number of rows) were not being
set correctly in a parallel (Multi-GPU) MatMult for Ell/Hyb matrix
storage formats. In particular, the offdiagonal component matrix used in
MatMultAdd had the number of rows set incorrectly. The existing test,
src/ksp/ksp/examples/tests/ex43 (runex43_5) only set the diagonal
component to be in non-csr format so this wasn't caught earlier. D'oh.
I have checked all combinations of storage formats on
ksp/ksp/examples/tutorials/ex2.c. Results are entirely consistent in
serial and parallel and are tested on CUDA 5.0(2070, Fermi) and
CUDA 4.2 (1060, Tesla) architectures.



# 36d62e41 25-Jul-2013 Paul Mullowney <paulm@txcorp.com>

Removing unused enumerated parameters from MatCUSP/CUSPARSEFormatOperation

I remove MAT_CUSP_SOLVE from MatCUSPFormatOperation
and MAT_CUSPARSE_SOLVE from MatCUSPARSEFormatOperation as
potential parameters for choosing the storage of the matrix
for triangular solve. In particular, the triangular
solve was always using csr format so this parameter had
no effect. Moreover, I never found a case where
an ellpack/hybrid format worked better. Thus I remove this
as a potential parameter. However, I think it is important
to keep MatCUSP/CUSPARSEFormatOperation in the code. In the
future I can see a MAT_CUSPARSE_FACTOR option which uses
CUSPARSE to do ilu0/ic0 factorization on the GPU.



# b0418fcf 25-Jul-2013 Stefano Zampini <stefano.zampini@gmail.com>

Merge remote-tracking branch 'origin/master' into stefano_zampini/pcbddc-improvelocalsolvers


# 8533652c 25-Jul-2013 Stefano Zampini <stefano.zampini@gmail.com>

Merge remote-tracking branch 'origin/master' into stefano_zampini/pcbddc-mirrorsfix


# 6daa6ed0 25-Jul-2013 Stefano Zampini <stefano.zampini@gmail.com>

Merge remote-tracking branch 'origin/master' into stefano_zampini/pcbddc-constraintssetupimproved


# 72cfe0ad 23-Jul-2013 Karl Rupp <rupp@iue.tuwien.ac.at>

Merge branch 'paulmullowney/txpetscgpu-package-removal'


# 2692e278 08-Jul-2013 Paul Mullowney <paulm@txcorp.com>

Adding PREPROCESSOR directives to protect ELL and HYB storage formats.

I've added preprocessor directives around all code using the cusparse
hybrid (or ellpack) format to only build when CUDA 4.2 or beyond is
being used. I've also changed the documentation in a few places to
reflect this. In a few places, protections were required for CUDA
5.0 (hyb2csr conversion and in the stream creation in veccusp.cu).

Also adding code to init.c that 1) checks cuda error codes and
2) sets the device flags so that memory can be registered as
page-locked via cudaSetDeviceFlags(cudaDeviceMapHost). This should be
valid for all 1.3 devices and later. Moreover, these changes allow
multiple MPI threads to work on 1 GPU using cuda streams in a thread
safe manner.



# b06137fd 27-Jun-2013 Paul Mullowney <paulm@txcorp.com>

Removing TXPETSCGPU from veccusp and mpiaijcusparse

In this next step of removing TXPETSCGPU, the host-device and
device-host messaging code has been significantly simplified. In
particular, all methods VecCUSPCopyToGPU/FromGPU now use
a cudaMemcpyAsync with a stream (and a stream synchronize()).
This never hurts you. Moreover, it can help you in the case
of the multi-GPU SpMV as this data transfer will overlap
with the MatMult kernel. The more significant change comes in
VecCUSPCopyToGPUSome and VecCUSPCopyFromGPUSome. In this code,
the data transfer now moves the smallest contiguous set of
vector data containing ALL the indices in a single asynchronous data
transfer. Then, the stream containing the data transfer is
synchronized (not the entire device). While this can be wasteful
in terms of messaging too much data, it has shown the best
scalability performance across a wide range of matrices. Lastly
the simplicity of the code is a significant advantage over
the old way of doing the data transfer. Some old code
in these methods is "if 0"-ed out for reference and will be
cleaned up later. One final optimization in the vector code
involves registering the host buffer as page locked--which
is done in VecCUSPAllocateCheck. Then, the buffer must be
unregistered at VecDestroy_SeqCUSP. This shows a nice
speedup in the data transfer for a parallel MatMult.

Also in this commit, I am removing the TXPETSCGPU dependence from
the mpiaijcusparse class--it now depends only on CUDA. In order
for the same stream to be used in the MatMult and MatMultAdd
(necessary for an optimal Multi-GPU SpMV), the stream is built
in the mpiaijcusparse and then passed in the seqaijcusparse data
structure via a new method (MatCUSPARSESetStream). A similar method
is added for the CUSPARSE library handle (context) as I think the
stream needs to be attached to a particular context to work properly.
When running in parallel on multiple GPUs, the references to the handle
in the seqaijcusparse are cleared from the mpiaijcusparse classes with
the method MatCUSPARSEClearHandle. Then, the mpiaijcusparse class
deletes the handle.

One other non-trivial change was made to the seqaijcusparse. The alpha
and beta parameters to the SpMV are now device data which is owned by
the Mat_SEQAIJCUSPARSEMultStruct structure. This enables slightly better
multi-GPU performance as this data does not need to be copied to the
GPU at each kernel launch.

Multi-GPU SpMV now works without TXPETSCGPU and the performance is recovered
as tested on up to 4 GPUs. Code is valgrind clean and cuda-memcheck clean.

Results of tests have been modified to have 1 less digit of precision. This
yields consistent results across different GPUs. Lastly, the parallel test
is set to run on a different matrix (shallow_water1) so that the iteration
actually converges.



# aa372e3f 20-Jun-2013 Paul Mullowney <paulm@txcorp.com>

Removal of TXPETSCGPU package from the SEQAIJCUSPARSE class

In this commit, I've removed the dependence of the SEQAIJCUSPARSE
class on the TXPETSCGPU package. However, other classes such as
SEQAIJCUSP, VECCUSP, and MPIAIJCUSPARSE, and MPIAIJCUSP still depend
on that package. These dependencies will be removed in subsequent
commits once the design and structure is agreed upon.

The reason for this dependency removal is that SEQAIJCUSPARSE only
depends on the Nvidia CUSPARSE library which comes standard with CUDA.
Thus, the SEQAIJCUSPARSE class should be built whenever PETSc is
built with CUDA support. This will be far more maintainable in the
long term. Lastly, most of the CUSP dependencies have been removed
from this class. The only remaining CUSP dependencies are in the
vector data structures used in MatMult* and MatSolve* methods.
These will be removed in a subsequent branch as it is not clear
what the architecture should be yet.

In order to accommodate all the different functionality for various
Krylov solves, two new data structures were defined in cusparsematimpl.h.
The first is a Mat_SeqAIJCUSPARSEMultStruct struct. This contains an opaque pointer
for a matrix, a MatDescription data structure, and indices vector which will
be useful in MatMultAdd functions. The second new data structure is a
Mat_SeqAIJCUSPARSETriFactorStruct struct. This contains a CSR matrix pointer,
a MatDescription data structure, a solve analysis data structure and an
operation type.

Next, Mat_SeqAIJCUSPARSETriFactors was redefined to hold pointers
to up to 4 different Mat_SeqAIJCUSPARSETriFactorStruct structs: one for lower
and one for upper solves for both ILU and ICC. Two more for lower and upper
solves in algorithms that require a transpose, such as BiCG. The
latter two are necessary, as far as I can tell, because one doesn't
know until runtime if data structures for the transpose are needed
Thus, those are created on demand. Indexing vectors for reorderings
are also stored in Mat_SeqAIJCUSPARSETriFactors.

Lastly, Mat_SeqAIJCUSPARSE is the data structure that holds the
data needed in multiply. There are 2 pointers to Mat_SeqAIJCUSPARSEMultStruct
structs for MatMult and MatMultTranspose. Several auxiliary data
structures like work vectors and a few other necessary data for MatMult
are also stored in here. One important variable, the cudaStream_t,
is stored here but it is not owned. Streams are necessary for the parallel
SpMV (a subsequent commit will add code setting stream variables from
the MPIAIJCUSPARSE class) and the matrices used in the MatMult and
MatMultAdd will then use the same stream identifier to attain optimal
performance. The MPIAIJCUSPARSE class will own the stream variable which
is then used in the SEQAIJCUSPARSE methods.

In matregis.c as well as the petscmat.h and finclude/petscmat.h,
I've changed the dependency of SEQAIJCUSPARSE to be on CUDA and
not TXPETSCGPU.

The test series TESTEXAMPLES_TXPETSCGPU has been changed to
TESTEXAMPLES_CUDA since SEQAIJCUSPARSE only depends on CUDA as
discussed above. ksp/ksp/examples/tests/ex43-aijcusparse.c has
been renamed to ksp/ksp/examples/tests/ex43.c, the targets in
the makefile have been changed appropriately and the results
files are renamed. Two new test targets were added in
ksp/ksp/examples/tests/makefile that test aijcusparse using
bicg (and thus the MatSolveTranspose and MatMultTranspose methods) as well as
bicg with reordering. The previous results for runex43_2.out (formerly
runex43-aijcusparse_2.out) were wrong and so I'm committing
new results that agree with CPU based computation. The code is
valgrind clean and cuda-memcheck clean.



# e33c197d 11-Jun-2013 Richard Mills <rtm@eecs.utk.edu>

Merged petsc/petsc into rmills/petsc master.


# 4cd4e8e8 08-Jun-2013 Karl Rupp <rupp@iue.tuwien.ac.at>

Merge branch 'karlrupp/fix-aijcusparse-factortype'


# 404133a2 06-Jun-2013 Paul Mullowney <paulm@txcorp.com>

Corrected assignment of factortype for aijcusparse

I caught a small error in MatGetFactor_seqaij_cusparse. The factor type
for the new, factored matrix needs to be set before calling MatSetType.
With the factor type defined before MatSetType, the correct type is then
allocated for the spptr for the triangular factors in ILU and ICC preconditioners.
Before this fix, this was leading to some undefined behaviour that I saw in the
performance logging. Now, examples are valgrind clean and the performance
logging is consistent. The three tests in ksp subdirs pass.


