# Changes/Release Notes

On this page we provide a summary of the main API changes, new features and examples
for each release of libCEED.

(main)=

## Current `main` branch

### Maintainability

- Refactored preconditioner support internally to facilitate future development and improve GPU completeness/test coverage.

(v0-9)=

## v0.9 (Jul 6, 2021)

### Interface changes

- Minor modification in error handling macro to silence pedantic warnings when compiling with Clang, but no functional impact.

### New features

- Add {c:func}`CeedVectorAXPY` and {c:func}`CeedVectorPointwiseMult` as a convenience for stand-alone testing and internal use.
- Add `CEED_QFUNCTION_HELPER` macro to properly annotate QFunction helper functions for code generation backends.
- Add `CeedPragmaOptimizeOff` macro for code that is sensitive to floating point errors from fast math optimizations.
- Rust support: split `libceed-sys` crate out of `libceed` and [publish both on crates.io](https://crates.io/crates/libceed).

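As a rough illustration of the two vector operations named in the list above, the following plain-Python sketch shows the arithmetic they are understood to perform: an in-place update `y <- alpha * x + y` and an entrywise product `w <- x * y`. It does not call libCEED; the helper names `axpy` and `pointwise_mult` and their argument order are assumptions made for this example only.

```python
def axpy(y, alpha, x):
    """Update y in place so that y[i] becomes alpha * x[i] + y[i]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i, xi in enumerate(x):
        y[i] += alpha * xi
    return y


def pointwise_mult(w, x, y):
    """Store the entrywise product x[i] * y[i] in w[i]."""
    if not (len(w) == len(x) == len(y)):
        raise ValueError("w, x and y must have the same length")
    for i in range(len(w)):
        w[i] = x[i] * y[i]
    return w


# Hypothetical data, chosen only to make the arithmetic easy to follow.
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
axpy(y, 2.0, x)          # y now holds 2 * x + (old y)
w = [0.0, 0.0, 0.0]
pointwise_mult(w, x, y)  # w now holds x * y entry by entry
```
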
### Performance improvements

### Examples

- Solid mechanics mini-app updated to explore the performance impacts of various formulations in the initial and current configurations.
- Fluid mechanics example adds GPU support and improves modularity.

### Deprecated backends

- The `/cpu/self/tmpl` and `/cpu/self/tmpl/sub` backends have been removed. These backends were initially added to test the backend inheritance mechanism, but this mechanism is now widely used and tested in multiple backends.

(v0-8)=

## v0.8 (Mar 31, 2021)

### Interface changes

- Error handling improved to include enumerated error codes for C interface return values.
- Installed headers that will follow semantic versioning were moved to {code}`include/ceed` directory. These headers have been renamed from {code}`ceed-*.h` to {code}`ceed/*.h`. Placeholder headers with the old naming schema are currently provided, but these headers will be removed in the libCEED v0.9 release.

### New features

- Julia and Rust interfaces added, providing a nearly 1-1 correspondence with the C interface, plus some convenience features.
- Static libraries can be built with `make STATIC=1` and the pkg-config file is installed accordingly.
- Add {c:func}`CeedOperatorLinearAssembleSymbolic` and {c:func}`CeedOperatorLinearAssemble` to support full assembly of libCEED operators.

### Performance improvements

- New HIP MAGMA backends for hipMAGMA library users: `/gpu/hip/magma` and `/gpu/hip/magma/det`.
- New HIP backends for improved tensor basis performance: `/gpu/hip/shared` and `/gpu/hip/gen`.

### Examples

- {ref}`example-petsc-elasticity` example updated with traction boundary conditions and improved Dirichlet boundary conditions.
- {ref}`example-petsc-elasticity` example updated with Neo-Hookean hyperelasticity in current configuration as well as improved Neo-Hookean hyperelasticity exploring storage vs computation tradeoffs.
- {ref}`example-petsc-navier-stokes` example updated with isentropic traveling vortex test case, an analytical solution to the Euler equations that is useful for testing boundary conditions, discretization stability, and order of accuracy.
- {ref}`example-petsc-navier-stokes` example updated with support for performing convergence study and plotting order of convergence by polynomial degree.

(v0-7)=

## v0.7 (Sep 29, 2020)

### Interface changes

- Replace limited {code}`CeedInterlaceMode` with more flexible component stride {code}`compstride` in {code}`CeedElemRestriction` constructors.
  As a result, the {code}`indices` parameter has been replaced with {code}`offsets` and the {code}`nnodes` parameter has been replaced with {code}`lsize`.
  These changes improve support for mixed finite element methods.
- Replace various uses of {code}`Ceed*Get*Status` with {code}`Ceed*Is*` in the backend API to match common nomenclature.
- Replace {code}`CeedOperatorAssembleLinearDiagonal` with {c:func}`CeedOperatorLinearAssembleDiagonal` for clarity.
- Linear Operators can be assembled as point-block diagonal matrices with {c:func}`CeedOperatorLinearAssemblePointBlockDiagonal`, provided in row-major form in a {code}`ncomp` by {code}`ncomp` block per node.
- Diagonal assembly interface changed to accept a {ref}`CeedVector` instead of a pointer to a {ref}`CeedVector` to reduce memory movement when interfacing with calling code.
- Added {c:func}`CeedOperatorLinearAssembleAddDiagonal` and {c:func}`CeedOperatorLinearAssembleAddPointBlockDiagonal` for improved future integration with codes such as MFEM that compose the action of {ref}`CeedOperator`s external to libCEED.
- Added {c:func}`CeedVectorTakeArray` to sync and remove libCEED read/write access to an allocated array and pass ownership of the array to the caller.
  This function is recommended over {c:func}`CeedVectorSyncArray` when the {code}`CeedVector` has an array owned by the caller that was set by {c:func}`CeedVectorSetArray`.
- Added {code}`CeedQFunctionContext` object to manage user QFunction context data and reduce copies between device and host memory.
- Added {c:func}`CeedOperatorMultigridLevelCreate`, {c:func}`CeedOperatorMultigridLevelCreateTensorH1`, and {c:func}`CeedOperatorMultigridLevelCreateH1` to facilitate creation of multigrid prolongation, restriction, and coarse grid operators using a common quadrature space.

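To make the `compstride` and `offsets` change in the list above more concrete, here is a minimal plain-Python sketch of how a component stride can describe both interlaced and non-interlaced L-vector layouts. It assumes the indexing rule `offsets[node] + comp * compstride`; the function `gather_element` and the sample data are hypothetical and are not part of the libCEED API.

```python
def gather_element(lvec, offsets, elem, elemsize, ncomp, compstride):
    """Collect one element's values from an L-vector, ordered [component][node].

    Assumed rule: component c of the node stored at offsets[k] lives at
    lvec[offsets[k] + c * compstride].
    """
    values = []
    for c in range(ncomp):
        for n in range(elemsize):
            values.append(lvec[offsets[elem * elemsize + n] + c * compstride])
    return values


# Three nodes, two components (u, v), two 2-node elements: nodes (0, 1) and (1, 2).
# Interlaced storage [u0, v0, u1, v1, u2, v2]: offsets are node * ncomp, stride 1.
interlaced = [0, 100, 1, 101, 2, 102]
offsets_interlaced = [0, 2, 2, 4]

# Non-interlaced storage [u0, u1, u2, v0, v1, v2]: offsets are node, stride 3.
blocked_by_comp = [0, 1, 2, 100, 101, 102]
offsets_by_comp = [0, 1, 1, 2]

# Both layouts yield the same element data for element 1.
from_interlaced = gather_element(interlaced, offsets_interlaced, 1, 2, 2, 1)
from_by_comp = gather_element(blocked_by_comp, offsets_by_comp, 1, 2, 2, 3)
```

In this sketch only `offsets` and `compstride` differ between the two layouts, which is the flexibility the two-valued interlace mode could not express.
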
### New features

- New HIP backend: `/gpu/hip/ref`.
- CeedQFunction support for user `CUfunction`s in some backends.

### Performance improvements

- OCCA backend rebuilt to facilitate future performance enhancements.
- PETSc BPs suite improved to reduce noise due to multiple calls to {code}`mpiexec`.

### Examples

- {ref}`example-petsc-elasticity` example updated with strain energy computation and more flexible boundary conditions.

### Deprecated backends

- The `/gpu/cuda/reg` backend has been removed, with its core features moved into `/gpu/cuda/ref` and `/gpu/cuda/shared`.

(v0-6)=

## v0.6 (Mar 29, 2020)

libCEED v0.6 contains numerous new features and examples, as well as expanded
documentation on [this new website](https://libceed.readthedocs.io).

### New features

- New Python interface using [CFFI](https://cffi.readthedocs.io/) provides a nearly
  1-1 correspondence with the C interface, plus some convenience features.  For instance,
  data stored in the {cpp:type}`CeedVector` structure are available without copy as
  {py:class}`numpy.ndarray`.  Short tutorials are provided in
  [Binder](https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/tutorials/).
- Linear QFunctions can be assembled as block-diagonal matrices (per quadrature point,
  {c:func}`CeedOperatorAssembleLinearQFunction`) or to evaluate the diagonal
  ({c:func}`CeedOperatorAssembleLinearDiagonal`).  These operations are useful for
  preconditioning ingredients and are used in libCEED's multigrid examples.
- The inverse of separable operators can be obtained using
  {c:func}`CeedOperatorCreateFDMElementInverse` and applied with
  {c:func}`CeedOperatorApply`.  This is a useful preconditioning ingredient,
  especially for Laplacians and related operators.
- New functions: {c:func}`CeedVectorNorm`, {c:func}`CeedOperatorApplyAdd`,
  {c:func}`CeedQFunctionView`, {c:func}`CeedOperatorView`.
- Make public accessors for various attributes to facilitate writing composable code.
- New backend: `/cpu/self/memcheck/serial`.
- QFunctions using variable-length array (VLA) pointer constructs can be used with CUDA
  backends.  (Single source is coming soon for OCCA backends.)
- Fix some missing edge cases in CUDA backend.

### Performance improvements

- MAGMA backend performance optimization and non-tensor bases.
- No-copy optimization in {c:func}`CeedOperatorApply`.

### Interface changes

- Replace {code}`CeedElemRestrictionCreateIdentity` and
  {code}`CeedElemRestrictionCreateBlocked` with more flexible
  {c:func}`CeedElemRestrictionCreateStrided` and
  {c:func}`CeedElemRestrictionCreateBlockedStrided`.
- Add arguments to {c:func}`CeedQFunctionCreateIdentity`.
- Replace ambiguous uses of {cpp:enum}`CeedTransposeMode` for L-vector identification
  with {cpp:enum}`CeedInterlaceMode`.  This is now an attribute of the
  {cpp:type}`CeedElemRestriction` (see {c:func}`CeedElemRestrictionCreate`) and no
  longer passed as `lmode` arguments to {c:func}`CeedOperatorSetField` and
  {c:func}`CeedElemRestrictionApply`.

### Examples

libCEED-0.6 contains greatly expanded examples with {ref}`new documentation <Examples>`.
Notable additions include:

- Standalone {ref}`ex2-surface` ({file}`examples/ceed/ex2-surface`): compute the area of
  a domain in 1, 2, and 3 dimensions by applying a Laplacian.

- PETSc {ref}`example-petsc-area` ({file}`examples/petsc/area.c`): computes surface area
  of domains (like the cube and sphere) by direct integration on a surface mesh;
  demonstrates geometric dimension different from topological dimension.

- PETSc {ref}`example-petsc-bps`:

  - {file}`examples/petsc/bpsraw.c` (formerly `bps.c`): transparent CUDA support.
  - {file}`examples/petsc/bps.c` (formerly `bpsdmplex.c`): performance improvements
    and transparent CUDA support.
  - {ref}`example-petsc-bps-sphere` ({file}`examples/petsc/bpssphere.c`):
    generalizations of all CEED BPs to the surface of the sphere; demonstrates geometric
    dimension different from topological dimension.

- {ref}`example-petsc-multigrid` ({file}`examples/petsc/multigrid.c`): new p-multigrid
  solver with algebraic multigrid coarse solve.

- {ref}`example-petsc-navier-stokes` ({file}`examples/fluids/navierstokes.c`; formerly
  `examples/navier-stokes`): unstructured grid support (using PETSc's `DMPlex`),
  implicit time integration, SU/SUPG stabilization, free-slip boundary conditions, and
  quasi-2D computational domain support.

- {ref}`example-petsc-elasticity` ({file}`examples/solids/elasticity.c`): new solver for
  linear elasticity, small-strain hyperelasticity, and globalized finite-strain
  hyperelasticity using p-multigrid with algebraic multigrid coarse solve.

(v0-5)=

## v0.5 (Sep 18, 2019)

For this release, several improvements were made. Two new CUDA backends were added to
the family of backends, of which the new `cuda-gen` backend achieves state-of-the-art
performance using single-source {ref}`CeedQFunction`. From this release, users
can define Q-Functions in a single source code independently of the targeted backend
with the aid of a new macro `CEED_QFUNCTION` to support JIT (Just-In-Time) and CPU
compilation of the user-provided {ref}`CeedQFunction` code. To allow a unified
declaration, the {ref}`CeedQFunction` API has undergone a slight change:
the `QFunctionField` parameter `ncomp` has been changed to `size`. This change
requires setting the previous value of `ncomp` to `ncomp*dim` when adding a
`QFunctionField` with eval mode `CEED_EVAL_GRAD`.

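The `ncomp` to `size` migration described above amounts to a small piece of arithmetic, sketched here in plain Python. The helper `qfunction_field_size` is hypothetical and is not a libCEED function. It assumes, as the text implies, that only the gradient eval mode needs the extra factor of `dim` and that other modes keep the old `ncomp` value.

```python
def qfunction_field_size(ncomp, dim, eval_mode):
    """Return the new `size` argument for a field that used to pass `ncomp`.

    Assumption for this sketch: only "CEED_EVAL_GRAD" multiplies by dim,
    because a gradient carries one value per component and per direction.
    """
    if eval_mode == "CEED_EVAL_GRAD":
        return ncomp * dim
    return ncomp


# A scalar field in 3D: its gradient has 1 * 3 entries per quadrature point.
scalar_grad_size = qfunction_field_size(1, 3, "CEED_EVAL_GRAD")
# A 3-component field in 3D: its gradient has 3 * 3 entries per quadrature point.
vector_grad_size = qfunction_field_size(3, 3, "CEED_EVAL_GRAD")
# Interpolated values keep the component count unchanged.
vector_interp_size = qfunction_field_size(3, 3, "CEED_EVAL_INTERP")
```
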
Additionally, new CPU backends
were included in this release, such as the `/cpu/self/opt/*` backends (which are
written in pure C and use partial **E-vectors** to improve performance) and the
`/cpu/self/ref/memcheck` backend (which relies upon the
[Valgrind](http://valgrind.org/) Memcheck tool to help verify that user
{ref}`CeedQFunction` have no undefined values).
This release also included various performance improvements, bug fixes, new examples,
and improved tests. Among these improvements, vectorized instructions for
{ref}`CeedQFunction` code compiled for CPU were enhanced by using `CeedPragmaSIMD`
instead of `CeedPragmaOMP`, a {ref}`CeedQFunction` gallery and
identity Q-Functions were introduced, and the PETSc benchmark problems were expanded
to handle unstructured meshes. For this expansion, the prior version of
the PETSc BPs, which only included data associated with structured geometries, was
renamed `bpsraw`, and the new version of the BPs, which can handle data associated
with any unstructured geometry, was called `bps`. Additionally, other benchmark
problems, namely BP2 and BP4 (the vector-valued versions of BP1 and BP3, respectively),
and BP5 and BP6 (the collocated versions of BP3 and BP4, respectively, for which the
quadrature points are the same as the Gauss-Lobatto nodes) were added to the PETSc
examples. Furthermore, another standalone libCEED example, called `ex2`, which
computes the surface area of a given mesh, was added to this release.

Backends available in this release:

```{eval-rst}
+----------------------------+-----------------------------------------------------+
| CEED resource (``-ceed``)  | Backend                                             |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/ref/serial``   | Serial reference implementation                     |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/ref/blocked``  | Blocked reference implementation                    |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/ref/memcheck`` | Memcheck backend, undefined value checks            |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/opt/serial``   | Serial optimized C implementation                   |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/opt/blocked``  | Blocked optimized C implementation                  |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/avx/serial``   | Serial AVX implementation                           |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/avx/blocked``  | Blocked AVX implementation                          |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/xsmm/serial``  | Serial LIBXSMM implementation                       |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/xsmm/blocked`` | Blocked LIBXSMM implementation                      |
+----------------------------+-----------------------------------------------------+
| ``/cpu/occa``              | Serial OCCA kernels                                 |
+----------------------------+-----------------------------------------------------+
| ``/gpu/occa``              | CUDA OCCA kernels                                   |
+----------------------------+-----------------------------------------------------+
| ``/omp/occa``              | OpenMP OCCA kernels                                 |
+----------------------------+-----------------------------------------------------+
| ``/ocl/occa``              | OpenCL OCCA kernels                                 |
+----------------------------+-----------------------------------------------------+
| ``/gpu/cuda/ref``          | Reference pure CUDA kernels                         |
+----------------------------+-----------------------------------------------------+
| ``/gpu/cuda/reg``          | Pure CUDA kernels using one thread per element      |
+----------------------------+-----------------------------------------------------+
| ``/gpu/cuda/shared``       | Optimized pure CUDA kernels using shared memory     |
+----------------------------+-----------------------------------------------------+
| ``/gpu/cuda/gen``          | Optimized pure CUDA kernels using code generation   |
+----------------------------+-----------------------------------------------------+
| ``/gpu/magma``             | CUDA MAGMA kernels                                  |
+----------------------------+-----------------------------------------------------+
```

Examples available in this release:

```{eval-rst}
+-------------------------+--------------------------------------------+
| User code               | Example                                    |
+-------------------------+--------------------------------------------+
|                         | - ex1 (volume)                             |
| ``ceed``                | - ex2 (surface)                            |
+-------------------------+--------------------------------------------+
|                         | - BP1 (scalar mass operator)               |
| ``mfem``                | - BP3 (scalar Laplace operator)            |
+-------------------------+--------------------------------------------+
|                         | - BP1 (scalar mass operator)               |
|                         | - BP2 (vector mass operator)               |
|                         | - BP3 (scalar Laplace operator)            |
| ``petsc``               | - BP4 (vector Laplace operator)            |
|                         | - BP5 (collocated scalar Laplace operator) |
|                         | - BP6 (collocated vector Laplace operator) |
|                         | - Navier-Stokes                            |
+-------------------------+--------------------------------------------+
|                         | - BP1 (scalar mass operator)               |
| ``nek5000``             | - BP3 (scalar Laplace operator)            |
+-------------------------+--------------------------------------------+
```

(v0-4)=

## v0.4 (Apr 1, 2019)

libCEED v0.4 was again made publicly available in the second full CEED software
distribution, release CEED 2.0. This release contained notable features, such as
four new CPU backends, two new GPU backends, CPU backend optimizations, initial
support for operator composition, performance benchmarking, and a Navier-Stokes demo.
The new CPU backends in this release came in two families. The `/cpu/self/*/serial`
backends process one element at a time and are intended for meshes with a smaller number
of high order elements. The `/cpu/self/*/blocked` backends process blocked batches of
eight interlaced elements and are intended for meshes with higher numbers of elements.
The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU
performance. The `/cpu/self/xsmm/*` backends rely upon the
[LIBXSMM](http://github.com/hfp/libxsmm) package to provide vectorized CPU
performance. The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.
The `/gpu/cuda/ref` backend is a reference CUDA backend, providing reasonable
performance for most problem configurations. The `/gpu/cuda/reg` backend uses a simple
parallelization approach, where each thread processes one finite element. Using
just-in-time compilation, provided by nvrtc (NVIDIA Runtime Compiler), and runtime
parameters, this backend unrolls loops and maps memory addresses to registers. The
`/gpu/cuda/reg` backend achieves good peak performance for 1D, 2D, and low order 3D
problems, but performance deteriorates very quickly when threads run out of registers.

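The "blocked batches of eight interlaced elements" idea can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the backend's actual code: the function `interlace_blocks` is hypothetical, the layout `[node][element within block]` is assumed, and padding a short final batch by repeating its last element is likewise an assumption.

```python
BLOCK_SIZE = 8  # batch width named in the text above


def interlace_blocks(elements, block_size=BLOCK_SIZE):
    """Group per-element data into blocks whose element index varies fastest.

    `elements` is a list of equal-length lists (one list of nodal values per
    element). Each returned block is a flat list laid out as
    [node][element within block], so the values of one node for all elements
    in the block sit next to each other in memory.
    """
    blocks = []
    for start in range(0, len(elements), block_size):
        batch = elements[start:start + block_size]
        # Assumed padding rule: repeat the last element to fill a short batch.
        batch = batch + [batch[-1]] * (block_size - len(batch))
        num_nodes = len(batch[0])
        blocks.append(
            [batch[e][n] for n in range(num_nodes) for e in range(block_size)]
        )
    return blocks


# Ten hypothetical 2-node elements; element e holds the values [10*e, 10*e + 1].
elements = [[10 * e, 10 * e + 1] for e in range(10)]
blocks = interlace_blocks(elements)
```
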
A new explicit time-stepping Navier-Stokes solver was added to the family of libCEED
examples in the `examples/petsc` directory (see {ref}`example-petsc-navier-stokes`).
This example solves the time-dependent Navier-Stokes equations of compressible gas
dynamics in a static Eulerian three-dimensional frame, using structured high-order
finite/spectral element spatial discretizations and explicit high-order time-stepping
(available in PETSc). Moreover, the Navier-Stokes example was developed using PETSc,
so that the pointwise physics (defined at quadrature points) is separated from the
parallelization and meshing concerns.

Backends available in this release:

```{eval-rst}
+----------------------------+-----------------------------------------------------+
| CEED resource (``-ceed``)  | Backend                                             |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/ref/serial``   | Serial reference implementation                     |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/ref/blocked``  | Blocked reference implementation                    |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/tmpl``         | Backend template, defaults to ``/cpu/self/blocked`` |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/avx/serial``   | Serial AVX implementation                           |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/avx/blocked``  | Blocked AVX implementation                          |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/xsmm/serial``  | Serial LIBXSMM implementation                       |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/xsmm/blocked`` | Blocked LIBXSMM implementation                      |
+----------------------------+-----------------------------------------------------+
| ``/cpu/occa``              | Serial OCCA kernels                                 |
+----------------------------+-----------------------------------------------------+
| ``/gpu/occa``              | CUDA OCCA kernels                                   |
+----------------------------+-----------------------------------------------------+
| ``/omp/occa``              | OpenMP OCCA kernels                                 |
+----------------------------+-----------------------------------------------------+
| ``/ocl/occa``              | OpenCL OCCA kernels                                 |
+----------------------------+-----------------------------------------------------+
| ``/gpu/cuda/ref``          | Reference pure CUDA kernels                         |
+----------------------------+-----------------------------------------------------+
| ``/gpu/cuda/reg``          | Pure CUDA kernels using one thread per element      |
+----------------------------+-----------------------------------------------------+
| ``/gpu/magma``             | CUDA MAGMA kernels                                  |
+----------------------------+-----------------------------------------------------+
```

Examples available in this release:

```{eval-rst}
+-------------------------+---------------------------------+
| User code               | Example                         |
+-------------------------+---------------------------------+
| ``ceed``                | ex1 (volume)                    |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``mfem``                | - BP3 (scalar Laplace operator) |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``petsc``               | - BP3 (scalar Laplace operator) |
|                         | - Navier-Stokes                 |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``nek5000``             | - BP3 (scalar Laplace operator) |
+-------------------------+---------------------------------+
```

379*bcb2dfaeSJed Brown(v0-3)=
380*bcb2dfaeSJed Brown
381*bcb2dfaeSJed Brown## v0.3 (Sep 30, 2018)
382*bcb2dfaeSJed Brown
Notable features in this release include an active/passive field interface, support for
non-tensor bases, backend optimization, and an improved Fortran interface. This release
also focused on improved continuous integration and added many new tests, with code
coverage reports of about 90%.

This release also made a significant change to the public interface: a
{ref}`CeedQFunction` can take any number of named input and output arguments, while
{ref}`CeedOperator` connects them to the actual data, which may be supplied explicitly
to `CeedOperatorApply()` (active) or separately via `CeedOperatorSetField()` (passive).
This interface change enables reusable libraries of CeedQFunctions and composition of
block solvers constructed using {ref}`CeedOperator`.

A concept of blocked restriction was added in this release and is used in an optimized
CPU backend. Although this is typically not visible to the user, it enables effective
use of arbitrary-length SIMD while maintaining cache locality. This CPU backend also
implements an algebraic factorization of tensor product gradients to perform fewer
operations than the standard application of interpolation and differentiation from nodes
to quadrature points. This algebraic formulation automatically supports non-polynomial
and non-interpolatory bases, and is thus more general than the more common derivation in
terms of Lagrange polynomials on the quadrature points.
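
To make the blocked restriction idea concrete, the following is a minimal sketch in
plain C and is not the libCEED implementation. The function name `blocked_restrict`,
the `offsets` layout, and the choice to pad a partial last block by repeating the last
element are assumptions made for illustration. Values are gathered so that the index
within a block is the fastest-varying one, which is what lets a SIMD lane process one
element of the block.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical sketch: gather nodal values from a global vector (lvec) into a
// blocked element vector (evec).
//
//   offsets[e * elem_size + n]  index into lvec of node n of element e
//   evec layout                 evec[(b * elem_size + n) * blk_size + j]
//                               for element e = b * blk_size + j
//
// The element index j within a block is innermost, so consecutive memory holds
// the same node of blk_size different elements.
// Assumption: a partial last block is padded by repeating the last element.
// evec must hold num_blk * elem_size * blk_size entries.
void blocked_restrict(size_t num_elem, size_t elem_size, size_t blk_size,
                      const size_t *offsets, const double *lvec, double *evec) {
  assert(num_elem > 0 && elem_size > 0 && blk_size > 0);
  assert(offsets && lvec && evec);
  const size_t num_blk = (num_elem + blk_size - 1) / blk_size;
  for (size_t b = 0; b < num_blk; b++) {
    for (size_t n = 0; n < elem_size; n++) {
      for (size_t j = 0; j < blk_size; j++) {
        size_t e = b * blk_size + j;
        if (e >= num_elem) e = num_elem - 1;  // padding for the last block
        evec[(b * elem_size + n) * blk_size + j] = lvec[offsets[e * elem_size + n]];
      }
    }
  }
}
```

For example, three linear 1D elements on four nodes with a block size of two produce
two blocks, and the second block holds the third element twice.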

Backends available in this release:

```{eval-rst}
+---------------------------+-----------------------------------------------------+
| CEED resource (``-ceed``) | Backend                                             |
+---------------------------+-----------------------------------------------------+
| ``/cpu/self/blocked``     | Blocked reference implementation                    |
+---------------------------+-----------------------------------------------------+
| ``/cpu/self/ref``         | Serial reference implementation                     |
+---------------------------+-----------------------------------------------------+
| ``/cpu/self/tmpl``        | Backend template, defaults to ``/cpu/self/blocked`` |
+---------------------------+-----------------------------------------------------+
| ``/cpu/occa``             | Serial OCCA kernels                                 |
+---------------------------+-----------------------------------------------------+
| ``/gpu/occa``             | CUDA OCCA kernels                                   |
+---------------------------+-----------------------------------------------------+
| ``/omp/occa``             | OpenMP OCCA kernels                                 |
+---------------------------+-----------------------------------------------------+
| ``/ocl/occa``             | OpenCL OCCA kernels                                 |
+---------------------------+-----------------------------------------------------+
| ``/gpu/magma``            | CUDA MAGMA kernels                                  |
+---------------------------+-----------------------------------------------------+
```

Examples available in this release:

```{eval-rst}
+-------------------------+---------------------------------+
| User code               | Example                         |
+-------------------------+---------------------------------+
| ``ceed``                | ex1 (volume)                    |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``mfem``                | - BP3 (scalar Laplace operator) |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``petsc``               | - BP3 (scalar Laplace operator) |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``nek5000``             | - BP3 (scalar Laplace operator) |
+-------------------------+---------------------------------+
```

(v0-21)=

## v0.21 (Sep 30, 2018)

A MAGMA backend (which relies upon the
[MAGMA](https://bitbucket.org/icl/magma) package) was integrated into libCEED for this
release. This initial integration set up the framework for using MAGMA and provided the
libCEED functionality through MAGMA kernels as one of libCEED’s computational backends.
Like any other backend, the MAGMA backend provides extended basic data structures for
{ref}`CeedVector`, {ref}`CeedElemRestriction`, and {ref}`CeedOperator`, and implements
the fundamental CEED building blocks to work with the new data structures.
In general, the MAGMA-specific data structures keep the libCEED pointers to CPU data
but also add corresponding device (e.g., GPU) pointers to the data. Coherency is handled
internally, and thus seamlessly to the user, through the functions/methods that are
provided to support them.
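
The host/device pairing described above can be sketched as follows. This is an
illustrative sketch only and not the MAGMA backend's code: the `MirrorVec` type, the
`mirror_*` functions, and the validity-flag scheme are all hypothetical, and the
"device" buffer is ordinary heap memory so that the example runs without a GPU. A real
backend would replace the `memcpy` calls with device transfers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Which copies currently hold up-to-date data (bitmask).
enum { VALID_HOST = 1, VALID_DEVICE = 2 };

// Hypothetical mirrored vector: a CPU pointer plus a stand-in device pointer.
typedef struct {
  size_t  n;
  double *host;    // CPU data
  double *device;  // stand-in for GPU data (plain heap memory in this sketch)
  int     valid;   // VALID_HOST, VALID_DEVICE, or both
} MirrorVec;

// Allocate both copies, zero-initialized; returns 0 on success.
int mirror_create(MirrorVec *v, size_t n) {
  assert(v && n > 0);
  v->n      = n;
  v->host   = calloc(n, sizeof(double));
  v->device = calloc(n, sizeof(double));
  v->valid  = VALID_HOST | VALID_DEVICE;
  if (!v->host || !v->device) {
    free(v->host);
    free(v->device);
    v->host = v->device = NULL;
    v->valid = 0;
    return 1;
  }
  return 0;
}

void mirror_destroy(MirrorVec *v) {
  free(v->host);
  free(v->device);
  v->host = v->device = NULL;
  v->valid = 0;
}

// Copy device -> host only if the host copy is stale.
static void sync_to_host(MirrorVec *v) {
  if (!(v->valid & VALID_HOST)) {
    memcpy(v->host, v->device, v->n * sizeof(double));
    v->valid |= VALID_HOST;
  }
}

// Copy host -> device only if the device copy is stale.
static void sync_to_device(MirrorVec *v) {
  if (!(v->valid & VALID_DEVICE)) {
    memcpy(v->device, v->host, v->n * sizeof(double));
    v->valid |= VALID_DEVICE;
  }
}

// Read access keeps both copies valid; write access invalidates the other copy.
const double *mirror_host_read(MirrorVec *v) { sync_to_host(v); return v->host; }
double *mirror_host_write(MirrorVec *v) { sync_to_host(v); v->valid = VALID_HOST; return v->host; }
const double *mirror_device_read(MirrorVec *v) { sync_to_device(v); return v->device; }
double *mirror_device_write(MirrorVec *v) { sync_to_device(v); v->valid = VALID_DEVICE; return v->device; }
```

Because every access goes through one of the four accessors, the caller never copies
data explicitly; a transfer happens only when the requested copy is stale.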

Backends available in this release:

```{eval-rst}
+---------------------------+---------------------------------+
| CEED resource (``-ceed``) | Backend                         |
+---------------------------+---------------------------------+
| ``/cpu/self``             | Serial reference implementation |
+---------------------------+---------------------------------+
| ``/cpu/occa``             | Serial OCCA kernels             |
+---------------------------+---------------------------------+
| ``/gpu/occa``             | CUDA OCCA kernels               |
+---------------------------+---------------------------------+
| ``/omp/occa``             | OpenMP OCCA kernels             |
+---------------------------+---------------------------------+
| ``/ocl/occa``             | OpenCL OCCA kernels             |
+---------------------------+---------------------------------+
| ``/gpu/magma``            | CUDA MAGMA kernels              |
+---------------------------+---------------------------------+
```

Examples available in this release:

```{eval-rst}
+-------------------------+---------------------------------+
| User code               | Example                         |
+-------------------------+---------------------------------+
| ``ceed``                | ex1 (volume)                    |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``mfem``                | - BP3 (scalar Laplace operator) |
+-------------------------+---------------------------------+
| ``petsc``               | BP1 (scalar mass operator)      |
+-------------------------+---------------------------------+
| ``nek5000``             | BP1 (scalar mass operator)      |
+-------------------------+---------------------------------+
```

(v0-2)=

## v0.2 (Mar 30, 2018)

libCEED was made publicly available in the first full CEED software distribution, release
CEED 1.0. The distribution was made available using the Spack package manager to provide
a common, easy-to-use build environment, where the user can build the CEED distribution
with all dependencies. This release included a new Fortran interface for the library.
This release also contained major improvements in the OCCA backend (including a new
`/ocl/occa` backend) and new examples. The standalone libCEED example was modified to
compute the volume of a given mesh (in 1D, 2D, or 3D) and placed in an
`examples/ceed` subfolder. A new `mfem` example to perform BP3 (with the application
of the Laplace operator) was also added to this release.
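
As a rough illustration of the volume computation, the sketch below applies a mass
operator to a vector of ones and sums the result, for a 1D mesh of linear elements. It
is written in plain C without libCEED and is an assumption about the approach rather
than the code of `ex1`; the function name `mesh_volume_1d` and the choice of 2-point
Gauss-Legendre quadrature are illustrative.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical sketch: length ("volume") of a 1D mesh with linear elements,
// computed as sum(M * 1), where M is the mass operator applied element by
// element without assembling a matrix.
// x holds num_nodes node coordinates in ascending order.
double mesh_volume_1d(size_t num_nodes, const double *x) {
  assert(num_nodes >= 2 && x);
  // 2-point Gauss-Legendre rule on the reference element [-1, 1]:
  // points at +/- 1/sqrt(3), both weights equal to 1.
  const double q_ref[2]    = {-0.5773502691896258, 0.5773502691896258};
  const double q_weight[2] = {1.0, 1.0};
  double volume = 0.0;
  for (size_t e = 0; e + 1 < num_nodes; e++) {
    const double jac  = 0.5 * (x[e + 1] - x[e]);  // d(x)/d(reference coordinate)
    const double u[2] = {1.0, 1.0};               // input: vector of ones
    double       v[2] = {0.0, 0.0};               // output: M_e * u
    for (int i = 0; i < 2; i++) {
      // Linear basis functions evaluated at quadrature point i.
      const double b[2] = {0.5 * (1.0 - q_ref[i]), 0.5 * (1.0 + q_ref[i])};
      const double u_q  = b[0] * u[0] + b[1] * u[1];  // interpolate to the point
      const double v_q  = q_weight[i] * jac * u_q;    // scale by weight * |J|
      v[0] += b[0] * v_q;                             // apply transpose of basis
      v[1] += b[1] * v_q;
    }
    volume += v[0] + v[1];  // summing the output entries gives the element length
  }
  return volume;
}
```

For nodes at 0, 0.5, 2, and 3 the result is the mesh length, 3, up to rounding.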

Backends available in this release:

```{eval-rst}
+---------------------------+---------------------------------+
| CEED resource (``-ceed``) | Backend                         |
+---------------------------+---------------------------------+
| ``/cpu/self``             | Serial reference implementation |
+---------------------------+---------------------------------+
| ``/cpu/occa``             | Serial OCCA kernels             |
+---------------------------+---------------------------------+
| ``/gpu/occa``             | CUDA OCCA kernels               |
+---------------------------+---------------------------------+
| ``/omp/occa``             | OpenMP OCCA kernels             |
+---------------------------+---------------------------------+
| ``/ocl/occa``             | OpenCL OCCA kernels             |
+---------------------------+---------------------------------+
```

Examples available in this release:

```{eval-rst}
+-------------------------+---------------------------------+
| User code               | Example                         |
+-------------------------+---------------------------------+
| ``ceed``                | ex1 (volume)                    |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``mfem``                | - BP3 (scalar Laplace operator) |
+-------------------------+---------------------------------+
| ``petsc``               | BP1 (scalar mass operator)      |
+-------------------------+---------------------------------+
| ``nek5000``             | BP1 (scalar mass operator)      |
+-------------------------+---------------------------------+
```

(v0-1)=

## v0.1 (Jan 3, 2018)

Initial low-level API of the CEED project. The low-level API provides a set of finite
element kernels and components for writing new low-level kernels. Examples include:
vector and sparse linear algebra, element matrix assembly over a batch of elements,
partial assembly and action for efficient high-order operators like mass, diffusion,
advection, etc. The main goal of the low-level API is to establish the basis for the
high-level API. Also, identifying such low-level kernels and providing a reference
implementation for them serves as the basis for specialized backend implementations.
This release contained several backends: `/cpu/self`, and backends which rely upon the
[OCCA](http://github.com/libocca/occa) package, such as `/cpu/occa`,
`/gpu/occa`, and `/omp/occa`.
It also included several examples in the `examples` folder:
a standalone code that shows the usage of libCEED (with no external
dependencies) to apply the Laplace operator, `ex1`; an `mfem` example to perform BP1
(with the application of the mass operator); and a `petsc` example to perform BP1
(with the application of the mass operator).

Backends available in this release:

```{eval-rst}
+---------------------------+---------------------------------+
| CEED resource (``-ceed``) | Backend                         |
+---------------------------+---------------------------------+
| ``/cpu/self``             | Serial reference implementation |
+---------------------------+---------------------------------+
| ``/cpu/occa``             | Serial OCCA kernels             |
+---------------------------+---------------------------------+
| ``/gpu/occa``             | CUDA OCCA kernels               |
+---------------------------+---------------------------------+
| ``/omp/occa``             | OpenMP OCCA kernels             |
+---------------------------+---------------------------------+
```

Examples available in this release:

```{eval-rst}
+-------------------------+-----------------------------------+
| User code               | Example                           |
+-------------------------+-----------------------------------+
| ``ceed``                | ex1 (scalar Laplace operator)     |
+-------------------------+-----------------------------------+
| ``mfem``                | BP1 (scalar mass operator)        |
+-------------------------+-----------------------------------+
| ``petsc``               | BP1 (scalar mass operator)        |
+-------------------------+-----------------------------------+
```