
Searched hist:ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 (Results 1 – 7 of 7) sorted by relevance

/libCEED/backends/cuda-shared/
ceed-cuda-shared.h ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 Tue Sep 17 18:38:28 UTC 2019 Yohann <dudouit1@llnl.gov> Improved performance of cuda-gen backend (#341)

Thanks-to: Tim Warburton
Some of these optimizations come from the knowledge and experience Tim Warburton and his team gathered in libParanumal, and were then ported to libCEED.

* Add collocated gradient in 3D.

* Treat the QFunction slice by slice in 3D to avoid using too many registers.

* Minor fix

* Minor fix.

* Minor fix

* Compute the collocated gradient slice by slice.

* Add __syncthreads() after initialization of the matrices.

* Remove code print.

* Add a critical #pragma unroll

* Fix typo on "collocated".

* Remove dead code.

* Use ColloGrad3d functions.

* Fix cuda-gen backend when collocated gradient is not available.

* make style

* make style

* Add some comments.

* Replace int by CeedInt.
ceed-cuda-shared-basis.c ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 Tue Sep 17 18:38:28 UTC 2019 Yohann <dudouit1@llnl.gov> Improved performance of cuda-gen backend (#341)

/libCEED/backends/cuda-gen/
ceed-cuda-gen.h ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 Tue Sep 17 18:38:28 UTC 2019 Yohann <dudouit1@llnl.gov> Improved performance of cuda-gen backend (#341)

ceed-cuda-gen-operator.c ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 Tue Sep 17 18:38:28 UTC 2019 Yohann <dudouit1@llnl.gov> Improved performance of cuda-gen backend (#341)

ceed-cuda-gen-operator-build.cpp ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 Tue Sep 17 18:38:28 UTC 2019 Yohann <dudouit1@llnl.gov> Improved performance of cuda-gen backend (#341)

/libCEED/examples/mfem/
bp1.hpp ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 Tue Sep 17 18:38:28 UTC 2019 Yohann <dudouit1@llnl.gov> Improved performance of cuda-gen backend (#341)

bp3.hpp ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 Tue Sep 17 18:38:28 UTC 2019 Yohann <dudouit1@llnl.gov> Improved performance of cuda-gen backend (#341)
