
Searched hist:"55ae60f998a3874cb1e4ab7010140b9e0d103903" (Results 1 – 2 of 2) sorted by relevance

/libCEED/
README.md 55ae60f998a3874cb1e4ab7010140b9e0d103903 Thu Mar 14 18:14:24 UTC 2019 Yohann <yohann.dudouit@gmail.com> Simple Cuda backend using one thread per element (#195)

Thanks-to: Jeremy Thompson

* Take into account the compute capability of the GPU

* Add the cuda/reg backend and rename cuda to cuda/ref.

- cuda/reg uses a simple approach where each element is
processed by one thread. This approach is expected to be
efficient for 1D and 2D problems, but very inefficient
as soon as the kernels start to spill registers, which is
expected around Q1D=4 for 3D problems.

* Compilation takes into account the deviceId

* Make style

* Remove dead code in cuda qFunctions.

* Cuda-reg specialized Restriction.

* Split the Prolongation operator into Identity/not Identity.

* Remove "#pragma unroll" until further perf investigation.

* README update

* Add a description of cuda/reg.

* Add CompositeOperator msg to CUDA backends
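
The spilling remark in the cuda/reg description can be made concrete with a rough count. The sketch below is not taken from libCEED: the function names and the `fields` parameter are hypothetical, and it assumes that a thread handling a whole element keeps one double-precision value per quadrature point per field in registers. A double occupies two 32-bit registers, so the per-thread demand grows as Q1D^dim.

```c
#include <stddef.h>

/* Quadrature points in one tensor-product element: Q1D^dim. */
size_t points_per_element(size_t q1d, unsigned dim) {
  size_t n = 1;
  for (unsigned d = 0; d < dim; d++) n *= q1d;
  return n;
}

/* Hypothetical estimate of the 32-bit registers one thread would need to
   hold `fields` double-precision values per quadrature point.
   Assumption: everything stays in registers and nothing is reused. */
size_t registers_per_thread(size_t q1d, unsigned dim, size_t fields) {
  return 2 * fields * points_per_element(q1d, dim);
}
```

Under these assumptions a single field at Q1D=4 costs 32 registers per thread in 2D but 128 in 3D, which is in line with the expectation above that 3D kernels begin to spill around that size. Actual register usage depends on the compiler and the kernel.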
Makefile 55ae60f998a3874cb1e4ab7010140b9e0d103903 Thu Mar 14 18:14:24 UTC 2019 Yohann <yohann.dudouit@gmail.com> Simple Cuda backend using one thread per element (#195)
