Searched hist:ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 (Results 1 – 7 of 7) sorted by relevance
| /libCEED/backends/cuda-shared/ |
| ceed-cuda-shared.h | ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 Tue Sep 17 18:38:28 UTC 2019 Yohann <dudouit1@llnl.gov> Improved performance of cuda-gen backend (#341)
Thanks-to: Tim Warburton
Some of these optimizations are the result of knowledge and experience gathered by Tim Warburton and his team in libParanumal and then ported to libCEED.
* Add collocated gradient in 3D.
* Process the qFunction slice by slice in 3D to avoid using too many registers.
* Minor fixes.
* Compute the collocated gradient slice by slice.
* Add __syncthreads() after initialization of the matrices.
* Remove code print.
* Add a critical #pragma unroll.
* Fix typo in "collocated".
* Remove dead code.
* Use ColloGrad3d functions.
* Fix cuda-gen backend when collocated gradient is not available.
* make style
* Add some comments.
* Replace int by CeedInt.
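The "collocated gradient, slice by slice" items can be illustrated with a minimal CPU sketch in plain C. It is not the libCEED or libParanumal kernel: the names `Q`, `idx`, and `collo_grad_slice` are hypothetical, the point count is fixed at 3 for brevity, and the actual backend does this work in generated CUDA code. The idea shown is only that, when values are already stored at the quadrature points, one 1D derivative matrix `G` yields all three gradient components, and that computing one z-slice at a time keeps the live output to Q*Q values per component instead of Q*Q*Q.

```c
#include <assert.h>

#define Q 3 /* points per dimension; fixed and small for this sketch (assumption) */

/* Index into a Q x Q x Q array stored with x fastest and z slowest. */
int idx(int i, int j, int k) { return i + Q * (j + Q * k); }

/* Compute the gradient of u on the z-slice k only.
   G  : Q x Q 1D derivative matrix, row-major (G[a*Q + b] = d(basis b)/dx at point a)
   u  : Q*Q*Q values at the collocated points
   gx, gy, gz : outputs of size Q*Q, indexed as i + Q*j within the slice */
void collo_grad_slice(const double *G, const double *u, int k,
                      double *gx, double *gy, double *gz) {
  assert(k >= 0 && k < Q);
  for (int j = 0; j < Q; j++) {
    for (int i = 0; i < Q; i++) {
      double dx = 0.0, dy = 0.0, dz = 0.0;
      for (int m = 0; m < Q; m++) {
        dx += G[i * Q + m] * u[idx(m, j, k)]; /* contract along x */
        dy += G[j * Q + m] * u[idx(i, m, k)]; /* contract along y */
        dz += G[k * Q + m] * u[idx(i, j, m)]; /* contract along z */
      }
      gx[i + Q * j] = dx;
      gy[i + Q * j] = dy;
      gz[i + Q * j] = dz;
    }
  }
}
```

A caller would loop over `k`, call `collo_grad_slice`, and consume the slice (for example, evaluate the pointwise function on it) before moving to the next slice, so the full 3D gradient is never held at once.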
|
| ceed-cuda-shared-basis.c | ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 Tue Sep 17 18:38:28 UTC 2019 Yohann <dudouit1@llnl.gov> Improved performance of cuda-gen backend (#341)
|
| /libCEED/backends/cuda-gen/ |
| ceed-cuda-gen.h | ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 Tue Sep 17 18:38:28 UTC 2019 Yohann <dudouit1@llnl.gov> Improved performance of cuda-gen backend (#341)
|
| ceed-cuda-gen-operator.c | ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 Tue Sep 17 18:38:28 UTC 2019 Yohann <dudouit1@llnl.gov> Improved performance of cuda-gen backend (#341)
|
| ceed-cuda-gen-operator-build.cpp | ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 Tue Sep 17 18:38:28 UTC 2019 Yohann <dudouit1@llnl.gov> Improved performance of cuda-gen backend (#341)
|
| /libCEED/examples/mfem/ |
| bp1.hpp | ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 Tue Sep 17 18:38:28 UTC 2019 Yohann <dudouit1@llnl.gov> Improved performance of cuda-gen backend (#341)
|
| bp3.hpp | ac421f396b3221f4b3f065b5ec198b4b0e1ed7a7 Tue Sep 17 18:38:28 UTC 2019 Yohann <dudouit1@llnl.gov> Improved performance of cuda-gen backend (#341)
|