# Changes/Release Notes

On this page we provide a summary of the main API changes, new features and examples
for each release of libCEED.

(main)=

## Current `main` branch

### Maintainability

- Refactored preconditioner support internally to facilitate future development and improve GPU completeness/test coverage.

(v0-9)=

## v0.9 (Jul 6, 2021)

### Interface changes

- Minor modification in error handling macro to silence pedantic warnings when compiling with Clang, but no functional impact.

### New features

- Add {c:func}`CeedVectorAXPY` and {c:func}`CeedVectorPointwiseMult` as a convenience for stand-alone testing and internal use.
- Add `CEED_QFUNCTION_HELPER` macro to properly annotate QFunction helper functions for code generation backends.
- Add `CeedPragmaOptimizeOff` macro for code that is sensitive to floating point errors from fast math optimizations.
- Rust support: split `libceed-sys` crate out of `libceed` and [publish both on crates.io](https://crates.io/crates/libceed).

### Performance improvements

### Examples

- Solid mechanics mini-app updated to explore the performance impacts of various formulations in the initial and current configurations.
- Fluid mechanics example adds GPU support and improves modularity.
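
As a rough illustration of the two vector operations listed under the v0.9 new features, the sketch below spells out the arithmetic on raw arrays in plain C. It does not call libCEED: the helper names `axpy` and `pointwise_mult` are hypothetical, and the assumed semantics are `y = alpha * x + y` for `CeedVectorAXPY` and an entrywise product `w = x * y` for `CeedVectorPointwiseMult`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a libCEED function): y <- alpha * x + y. */
static void axpy(size_t n, double alpha, const double *x, double *y) {
  assert(x && y);
  for (size_t i = 0; i < n; i++) y[i] += alpha * x[i];
}

/* Hypothetical helper (not a libCEED function): w <- x * y, entry by entry. */
static void pointwise_mult(size_t n, const double *x, const double *y, double *w) {
  assert(x && y && w);
  for (size_t i = 0; i < n; i++) w[i] = x[i] * y[i];
}
```

With libCEED itself, the same arithmetic would be applied to `CeedVector` objects rather than raw pointers, so the backend decides where the data lives.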

### Deprecated backends

- The `/cpu/self/tmpl` and `/cpu/self/tmpl/sub` backends have been removed. These backends were initially added to test the backend inheritance mechanism, but this mechanism is now widely used and tested in multiple backends.

(v0-8)=

## v0.8 (Mar 31, 2021)

### Interface changes

- Error handling improved to include enumerated error codes for C interface return values.
- Installed headers that will follow semantic versioning were moved to the {code}`include/ceed` directory. These headers have been renamed from {code}`ceed-*.h` to {code}`ceed/*.h`. Placeholder headers with the old naming schema are currently provided, but these headers will be removed in the libCEED v0.9 release.

### New features

- Julia and Rust interfaces added, providing a nearly 1-1 correspondence with the C interface, plus some convenience features.
- Static libraries can be built with `make STATIC=1` and the pkg-config file is installed accordingly.
- Add {c:func}`CeedOperatorLinearAssembleSymbolic` and {c:func}`CeedOperatorLinearAssemble` to support full assembly of libCEED operators.

### Performance improvements

- New HIP MAGMA backends for hipMAGMA library users: `/gpu/hip/magma` and `/gpu/hip/magma/det`.
- New HIP backends for improved tensor basis performance: `/gpu/hip/shared` and `/gpu/hip/gen`.

### Examples

- {ref}`example-petsc-elasticity` example updated with traction boundary conditions and improved Dirichlet boundary conditions.
- {ref}`example-petsc-elasticity` example updated with Neo-Hookean hyperelasticity in current configuration as well as improved Neo-Hookean hyperelasticity exploring storage vs computation tradeoffs.
- {ref}`example-petsc-navier-stokes` example updated with isentropic traveling vortex test case, an analytical solution to the Euler equations that is useful for testing boundary conditions, discretization stability, and order of accuracy.
- {ref}`example-petsc-navier-stokes` example updated with support for performing a convergence study and plotting order of convergence by polynomial degree.

(v0-7)=

## v0.7 (Sep 29, 2020)

### Interface changes

- Replace limited {code}`CeedInterlaceMode` with more flexible component stride {code}`compstride` in {code}`CeedElemRestriction` constructors.
  As a result, the {code}`indices` parameter has been replaced with {code}`offsets` and the {code}`nnodes` parameter has been replaced with {code}`lsize`.
  These changes improve support for mixed finite element methods.
- Replace various uses of {code}`Ceed*Get*Status` with {code}`Ceed*Is*` in the backend API to match common nomenclature.
- Replace {code}`CeedOperatorAssembleLinearDiagonal` with {c:func}`CeedOperatorLinearAssembleDiagonal` for clarity.
- Linear Operators can be assembled as point-block diagonal matrices with {c:func}`CeedOperatorLinearAssemblePointBlockDiagonal`, provided in row-major form in a {code}`ncomp` by {code}`ncomp` block per node.
- Diagonal assembly interface changed to accept a {ref}`CeedVector` instead of a pointer to a {ref}`CeedVector` to reduce memory movement when interfacing with calling code.
- Added {c:func}`CeedOperatorLinearAssembleAddDiagonal` and {c:func}`CeedOperatorLinearAssembleAddPointBlockDiagonal` for improved future integration with codes such as MFEM that compose the action of {ref}`CeedOperator`s external to libCEED.
- Added {c:func}`CeedVectorTakeArray` to sync and remove libCEED read/write access to an allocated array and pass ownership of the array to the caller.
  This function is recommended over {c:func}`CeedVectorSyncArray` when the {code}`CeedVector` has an array owned by the caller that was set by {c:func}`CeedVectorSetArray`.
- Added {code}`CeedQFunctionContext` object to manage user QFunction context data and reduce copies between device and host memory.
- Added {c:func}`CeedOperatorMultigridLevelCreate`, {c:func}`CeedOperatorMultigridLevelCreateTensorH1`, and {c:func}`CeedOperatorMultigridLevelCreateH1` to facilitate creation of multigrid prolongation, restriction, and coarse grid operators using a common quadrature space.

### New features

- New HIP backend: `/gpu/hip/ref`.
- CeedQFunction support for user `CUfunction`s in some backends.

### Performance improvements

- OCCA backend rebuilt to facilitate future performance enhancements.
- PETSc BPs suite improved to reduce noise due to multiple calls to {code}`mpiexec`.

### Examples

- {ref}`example-petsc-elasticity` example updated with strain energy computation and more flexible boundary conditions.

### Deprecated backends

- The `/gpu/cuda/reg` backend has been removed, with its core features moved into `/gpu/cuda/ref` and `/gpu/cuda/shared`.

(v0-6)=

## v0.6 (Mar 29, 2020)

libCEED v0.6 contains numerous new features and examples, as well as expanded
documentation on [this new website](https://libceed.readthedocs.io).

### New features

- New Python interface using [CFFI](https://cffi.readthedocs.io/) provides a nearly
  1-1 correspondence with the C interface, plus some convenience features. For instance,
  data stored in the {cpp:type}`CeedVector` structure are available without copy as
  {py:class}`numpy.ndarray`. Short tutorials are provided in
  [Binder](https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/tutorials/).
- Linear QFunctions can be assembled as block-diagonal matrices (per quadrature point,
  {c:func}`CeedOperatorAssembleLinearQFunction`) or to evaluate the diagonal
  ({c:func}`CeedOperatorAssembleLinearDiagonal`). These operations are useful for
  preconditioning ingredients and are used in libCEED's multigrid examples.
- The inverse of separable operators can be obtained using
  {c:func}`CeedOperatorCreateFDMElementInverse` and applied with
  {c:func}`CeedOperatorApply`. This is a useful preconditioning ingredient,
  especially for Laplacians and related operators.
- New functions: {c:func}`CeedVectorNorm`, {c:func}`CeedOperatorApplyAdd`,
  {c:func}`CeedQFunctionView`, {c:func}`CeedOperatorView`.
- Make public accessors for various attributes to facilitate writing composable code.
- New backend: `/cpu/self/memcheck/serial`.
- QFunctions using variable-length array (VLA) pointer constructs can be used with CUDA
  backends. (Single source is coming soon for OCCA backends.)
- Fix some missing edge cases in CUDA backend.

### Performance improvements

- MAGMA backend performance optimization and non-tensor bases.
- No-copy optimization in {c:func}`CeedOperatorApply`.

### Interface changes

- Replace {code}`CeedElemRestrictionCreateIdentity` and
  {code}`CeedElemRestrictionCreateBlocked` with more flexible
  {c:func}`CeedElemRestrictionCreateStrided` and
  {c:func}`CeedElemRestrictionCreateBlockedStrided`.
- Add arguments to {c:func}`CeedQFunctionCreateIdentity`.
- Replace ambiguous uses of {cpp:enum}`CeedTransposeMode` for L-vector identification
  with {cpp:enum}`CeedInterlaceMode`.
  This is now an attribute of the
  {cpp:type}`CeedElemRestriction` (see {c:func}`CeedElemRestrictionCreate`) and no
  longer passed as `lmode` arguments to {c:func}`CeedOperatorSetField` and
  {c:func}`CeedElemRestrictionApply`.

### Examples

libCEED-0.6 contains greatly expanded examples with {ref}`new documentation <Examples>`.
Notable additions include:

- Standalone {ref}`ex2-surface` ({file}`examples/ceed/ex2-surface`): compute the area of
  a domain in 1, 2, and 3 dimensions by applying a Laplacian.

- PETSc {ref}`example-petsc-area` ({file}`examples/petsc/area.c`): computes surface area
  of domains (like the cube and sphere) by direct integration on a surface mesh;
  demonstrates geometric dimension different from topological dimension.

- PETSc {ref}`example-petsc-bps`:

  - {file}`examples/petsc/bpsraw.c` (formerly `bps.c`): transparent CUDA support.
  - {file}`examples/petsc/bps.c` (formerly `bpsdmplex.c`): performance improvements
    and transparent CUDA support.
  - {ref}`example-petsc-bps-sphere` ({file}`examples/petsc/bpssphere.c`):
    generalizations of all CEED BPs to the surface of the sphere; demonstrates geometric
    dimension different from topological dimension.

- {ref}`example-petsc-multigrid` ({file}`examples/petsc/multigrid.c`): new p-multigrid
  solver with algebraic multigrid coarse solve.

- {ref}`example-petsc-navier-stokes` ({file}`examples/fluids/navierstokes.c`; formerly
  `examples/navier-stokes`): unstructured grid support (using PETSc's `DMPlex`),
  implicit time integration, SU/SUPG stabilization, free-slip boundary conditions, and
  quasi-2D computational domain support.

- {ref}`example-petsc-elasticity` ({file}`examples/solids/elasticity.c`): new solver for
  linear elasticity, small-strain hyperelasticity, and globalized finite-strain
  hyperelasticity using p-multigrid with algebraic multigrid coarse solve.

(v0-5)=

## v0.5 (Sep 18, 2019)

For this release, several improvements were made. Two new CUDA backends were added to
the family of backends, of which the new `cuda-gen` backend achieves state-of-the-art
performance using single-source {ref}`CeedQFunction`. From this release, users
can define Q-Functions in a single source code independently of the targeted backend
with the aid of a new macro `CEED_QFUNCTION` to support JIT (Just-In-Time) and CPU
compilation of the user provided {ref}`CeedQFunction` code. To allow a unified
declaration, the {ref}`CeedQFunction` API has undergone a slight change:
the `QFunctionField` parameter `ncomp` has been changed to `size`. This change
requires setting the previous value of `ncomp` to `ncomp*dim` when adding a
`QFunctionField` with eval mode `CEED_EVAL_GRAD`.
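
The `ncomp` to `size` change can be sketched as a small arithmetic helper. This is an illustration only: the enum and the function `qfunction_field_size` are hypothetical stand-ins rather than libCEED symbols, and the sketch assumes that only the gradient eval mode changes the value while the other modes shown keep `size` equal to the former `ncomp`.

```c
#include <assert.h>

/* Local stand-ins for the eval modes named in the text; a real program would
   use the CeedEvalMode values from the libCEED headers instead. */
typedef enum { SKETCH_EVAL_NONE, SKETCH_EVAL_INTERP, SKETCH_EVAL_GRAD } SketchEvalMode;

/* Hypothetical helper: the `size` to pass for a field that has `ncomp`
   components in `dim` spatial dimensions. A gradient field carries one
   derivative per dimension for each component, so its size is ncomp * dim.
   Assumption: the other modes listed here keep size == ncomp. */
static int qfunction_field_size(int ncomp, int dim, SketchEvalMode mode) {
  assert(ncomp > 0 && dim > 0);
  return mode == SKETCH_EVAL_GRAD ? ncomp * dim : ncomp;
}
```

For example, a three-component field in three dimensions that was previously declared with `ncomp = 3` would be declared with `size = 9` when its eval mode is `CEED_EVAL_GRAD`, and would stay at `size = 3` for interpolation.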

Additionally, new CPU backends
were included in this release, such as the `/cpu/self/opt/*` backends (which are
written in pure C and use partial **E-vectors** to improve performance) and the
`/cpu/self/ref/memcheck` backend (which relies upon the
[Valgrind](http://valgrind.org/) Memcheck tool to help verify that user
{ref}`CeedQFunction` code has no undefined values).
This release also included various performance improvements, bug fixes, new examples,
and improved tests. Among these improvements, vectorized instructions for
{ref}`CeedQFunction` code compiled for CPU were enhanced by using `CeedPragmaSIMD`
instead of `CeedPragmaOMP`, a {ref}`CeedQFunction` gallery and
identity Q-Functions were introduced, and the PETSc benchmark problems were expanded
to include unstructured mesh handling. For this expansion, the prior version of
the PETSc BPs, which only included data associated with structured geometries, was
renamed `bpsraw`, and the new version of the BPs, which can handle data associated
with any unstructured geometry, was called `bps`. Additionally, other benchmark
problems, namely BP2 and BP4 (the vector-valued versions of BP1 and BP3, respectively),
and BP5 and BP6 (the collocated versions of BP3 and BP4, respectively, for which the
quadrature points are the same as the Gauss-Lobatto nodes) were added to the PETSc
examples. Furthermore, another standalone libCEED example, called `ex2`, which
computes the surface area of a given mesh, was added to this release.

Backends available in this release:

```{eval-rst}
+----------------------------+-----------------------------------------------------+
| CEED resource (``-ceed``)  | Backend                                             |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/ref/serial``   | Serial reference implementation                     |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/ref/blocked``  | Blocked reference implementation                    |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/ref/memcheck`` | Memcheck backend, undefined value checks            |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/opt/serial``   | Serial optimized C implementation                   |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/opt/blocked``  | Blocked optimized C implementation                  |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/avx/serial``   | Serial AVX implementation                           |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/avx/blocked``  | Blocked AVX implementation                          |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/xsmm/serial``  | Serial LIBXSMM implementation                       |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/xsmm/blocked`` | Blocked LIBXSMM implementation                      |
+----------------------------+-----------------------------------------------------+
| ``/cpu/occa``              | Serial OCCA kernels                                 |
+----------------------------+-----------------------------------------------------+
| ``/gpu/occa``              | CUDA OCCA kernels                                   |
+----------------------------+-----------------------------------------------------+
| ``/omp/occa``              | OpenMP OCCA kernels                                 |
+----------------------------+-----------------------------------------------------+
| ``/ocl/occa``              | OpenCL OCCA kernels                                 |
+----------------------------+-----------------------------------------------------+
| ``/gpu/cuda/ref``          | Reference pure CUDA kernels                         |
+----------------------------+-----------------------------------------------------+
| ``/gpu/cuda/reg``          | Pure CUDA kernels using one thread per element      |
+----------------------------+-----------------------------------------------------+
| ``/gpu/cuda/shared``       | Optimized pure CUDA kernels using shared memory     |
+----------------------------+-----------------------------------------------------+
| ``/gpu/cuda/gen``          | Optimized pure CUDA kernels using code generation   |
+----------------------------+-----------------------------------------------------+
| ``/gpu/magma``             | CUDA MAGMA kernels                                  |
+----------------------------+-----------------------------------------------------+
```

Examples available in this release:

```{eval-rst}
+-------------------------+--------------------------------------------+
| User code               | Example                                    |
+-------------------------+--------------------------------------------+
|                         | - ex1 (volume)                             |
| ``ceed``                | - ex2 (surface)                            |
+-------------------------+--------------------------------------------+
|                         | - BP1 (scalar mass operator)               |
| ``mfem``                | - BP3 (scalar Laplace operator)            |
+-------------------------+--------------------------------------------+
|                         | - BP1 (scalar mass operator)               |
|                         | - BP2 (vector mass operator)               |
|                         | - BP3 (scalar Laplace operator)            |
| ``petsc``               | - BP4 (vector Laplace operator)            |
|                         | - BP5 (collocated scalar Laplace operator) |
|                         | - BP6 (collocated vector Laplace operator) |
|                         | - Navier-Stokes                            |
+-------------------------+--------------------------------------------+
|                         | - BP1 (scalar mass operator)               |
| ``nek5000``             | - BP3 (scalar Laplace operator)            |
+-------------------------+--------------------------------------------+
```

(v0-4)=

## v0.4 (Apr 1, 2019)

libCEED v0.4 was again made publicly available in the second full CEED software
distribution, release CEED 2.0. This release contained notable features, such as
four new CPU backends, two new GPU backends, CPU backend optimizations, initial
support for operator composition, performance benchmarking, and a Navier-Stokes demo.
The new CPU backends in this release came in two families. The `/cpu/self/*/serial`
backends process one element at a time and are intended for meshes with a smaller number
of high order elements. The `/cpu/self/*/blocked` backends process blocked batches of
eight interlaced elements and are intended for meshes with higher numbers of elements.
The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU
performance. The `/cpu/self/xsmm/*` backends rely upon the
[LIBXSMM](http://github.com/hfp/libxsmm) package to provide vectorized CPU
performance. The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.
The `/gpu/cuda/ref` backend is a reference CUDA backend, providing reasonable
performance for most problem configurations. The `/gpu/cuda/reg` backend uses a simple
parallelization approach, where each thread treats a finite element. Using just-in-time
compilation, provided by nvrtc (NVIDIA Runtime Compiler), and runtime parameters, this
backend unrolls loops and maps memory addresses to registers. The `/gpu/cuda/reg` backend
achieves good peak performance for 1D, 2D, and low order 3D problems, but performance
deteriorates very quickly when threads run out of registers.

A new explicit time-stepping Navier-Stokes solver was added to the family of libCEED
examples in the `examples/petsc` directory (see {ref}`example-petsc-navier-stokes`).
This example solves the time-dependent Navier-Stokes equations of compressible gas
dynamics in a static Eulerian three-dimensional frame, using structured high-order
finite/spectral element spatial discretizations and explicit high-order time-stepping
(available in PETSc). Moreover, the Navier-Stokes example was developed using PETSc,
so that the pointwise physics (defined at quadrature points) is separated from the
parallelization and meshing concerns.

Backends available in this release:

```{eval-rst}
+----------------------------+-----------------------------------------------------+
| CEED resource (``-ceed``)  | Backend                                             |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/ref/serial``   | Serial reference implementation                     |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/ref/blocked``  | Blocked reference implementation                    |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/tmpl``         | Backend template, defaults to ``/cpu/self/blocked`` |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/avx/serial``   | Serial AVX implementation                           |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/avx/blocked``  | Blocked AVX implementation                          |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/xsmm/serial``  | Serial LIBXSMM implementation                       |
+----------------------------+-----------------------------------------------------+
| ``/cpu/self/xsmm/blocked`` | Blocked LIBXSMM implementation                      |
+----------------------------+-----------------------------------------------------+
| ``/cpu/occa``              | Serial OCCA kernels                                 |
+----------------------------+-----------------------------------------------------+
| ``/gpu/occa``              | CUDA OCCA kernels                                   |
+----------------------------+-----------------------------------------------------+
| ``/omp/occa``              | OpenMP OCCA kernels                                 |
+----------------------------+-----------------------------------------------------+
| ``/ocl/occa``              | OpenCL OCCA kernels                                 |
+----------------------------+-----------------------------------------------------+
| ``/gpu/cuda/ref``          | Reference pure CUDA kernels                         |
+----------------------------+-----------------------------------------------------+
| ``/gpu/cuda/reg``          | Pure CUDA kernels using one thread per element      |
+----------------------------+-----------------------------------------------------+
| ``/gpu/magma``             | CUDA MAGMA kernels                                  |
+----------------------------+-----------------------------------------------------+
```

Examples available in this release:

```{eval-rst}
+-------------------------+---------------------------------+
| User code               | Example                         |
+-------------------------+---------------------------------+
| ``ceed``                | ex1 (volume)                    |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``mfem``                | - BP3 (scalar Laplace operator) |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``petsc``               | - BP3 (scalar Laplace operator) |
|                         | - Navier-Stokes                 |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``nek5000``             | - BP3 (scalar Laplace operator) |
+-------------------------+---------------------------------+
```

(v0-3)=

## v0.3 (Sep 30, 2018)

Notable features in this release include active/passive field interface, support for
non-tensor bases, backend optimization, and improved Fortran interface. This release
also focused on providing improved continuous integration, and many new tests with code
coverage reports of about 90%. This release also provided a significant change to the
public interface: a {ref}`CeedQFunction` can take any number of named input and output
arguments while {ref}`CeedOperator` connects them to the actual data, which may be
supplied explicitly to `CeedOperatorApply()` (active) or separately via
`CeedOperatorSetField()` (passive). This interface change enables reusable libraries
of CeedQFunctions and composition of block solvers constructed using
{ref}`CeedOperator`.
A concept of blocked restriction was added to this release and
used in an optimized CPU backend. Although this is typically not visible to the user,
it enables effective use of arbitrary-length SIMD while maintaining cache locality.
This CPU backend also implements an algebraic factorization of tensor product gradients
to perform fewer operations than standard application of interpolation and
differentiation from nodes to quadrature points. This algebraic formulation
automatically supports non-polynomial and non-interpolatory bases, thus is more general
than the more common derivation in terms of Lagrange polynomials on the quadrature points.

Backends available in this release:

```{eval-rst}
+---------------------------+-----------------------------------------------------+
| CEED resource (``-ceed``) | Backend                                             |
+---------------------------+-----------------------------------------------------+
| ``/cpu/self/blocked``     | Blocked reference implementation                    |
+---------------------------+-----------------------------------------------------+
| ``/cpu/self/ref``         | Serial reference implementation                     |
+---------------------------+-----------------------------------------------------+
| ``/cpu/self/tmpl``        | Backend template, defaults to ``/cpu/self/blocked`` |
+---------------------------+-----------------------------------------------------+
| ``/cpu/occa``             | Serial OCCA kernels                                 |
+---------------------------+-----------------------------------------------------+
| ``/gpu/occa``             | CUDA OCCA kernels                                   |
+---------------------------+-----------------------------------------------------+
| ``/omp/occa``             | OpenMP OCCA kernels                                 |
+---------------------------+-----------------------------------------------------+
| ``/ocl/occa``             | OpenCL OCCA kernels                                 |
+---------------------------+-----------------------------------------------------+
| ``/gpu/magma``            | CUDA MAGMA kernels                                  |
+---------------------------+-----------------------------------------------------+
```

Examples available in this release:

```{eval-rst}
+-------------------------+---------------------------------+
| User code               | Example                         |
+-------------------------+---------------------------------+
| ``ceed``                | ex1 (volume)                    |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``mfem``                | - BP3 (scalar Laplace operator) |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``petsc``               | - BP3 (scalar Laplace operator) |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``nek5000``             | - BP3 (scalar Laplace operator) |
+-------------------------+---------------------------------+
```

(v0-21)=

## v0.21 (Sep 30, 2018)

A MAGMA
backend (which relies upon the 449*bcb2dfaeSJed Brown[MAGMA](https://bitbucket.org/icl/magma) package) was integrated in libCEED for this 450*bcb2dfaeSJed Brownrelease. This initial integration set up the framework of using MAGMA and provided the 451*bcb2dfaeSJed BrownlibCEED functionality through MAGMA kernels as one of libCEED’s computational backends. 452*bcb2dfaeSJed BrownAs any other backend, the MAGMA backend provides extended basic data structures for 453*bcb2dfaeSJed Brown{ref}`CeedVector`, {ref}`CeedElemRestriction`, and {ref}`CeedOperator`, and implements 454*bcb2dfaeSJed Brownthe fundamental CEED building blocks to work with the new data structures. 455*bcb2dfaeSJed BrownIn general, the MAGMA-specific data structures keep the libCEED pointers to CPU data 456*bcb2dfaeSJed Brownbut also add corresponding device (e.g., GPU) pointers to the data. Coherency is handled 457*bcb2dfaeSJed Browninternally, and thus seamlessly to the user, through the functions/methods that are 458*bcb2dfaeSJed Brownprovided to support them. 
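
One common way to keep paired host and device pointers coherent is to track which copy is valid and transfer data lazily on access. The sketch below illustrates that general technique only; it is not the MAGMA backend's implementation. All names (`ToyVector`, `ToyVectorGetHostArray`, `ToyVectorGetDeviceArray`) are hypothetical, and the "device" array is ordinary heap memory standing in for a GPU allocation so that the example runs anywhere.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Which copy of the data is currently valid. */
typedef enum { TOY_VALID_HOST, TOY_VALID_DEVICE, TOY_VALID_BOTH } ToyValidity;

typedef struct {
  size_t      n;
  double     *h_array;    /* host (CPU) data */
  double     *d_array;    /* stand-in for device data; plain heap memory in this sketch */
  ToyValidity valid;
  int         num_copies; /* counts simulated host<->device transfers */
} ToyVector;

static void ToyVectorDestroy(ToyVector *vec) {
  free(vec->h_array);
  free(vec->d_array);
}

/* Allocate zeroed host and "device" buffers; returns nonzero on allocation failure. */
static int ToyVectorCreate(size_t n, ToyVector *vec) {
  assert(n > 0);
  vec->n          = n;
  vec->h_array    = calloc(n, sizeof(double));
  vec->d_array    = calloc(n, sizeof(double));
  vec->valid      = TOY_VALID_BOTH;
  vec->num_copies = 0;
  if (!vec->h_array || !vec->d_array) {
    ToyVectorDestroy(vec);
    return 1;
  }
  return 0;
}

/* Read/write access on the host: refresh the host copy if stale, then mark the device copy stale. */
static double *ToyVectorGetHostArray(ToyVector *vec) {
  if (vec->valid == TOY_VALID_DEVICE) {
    memcpy(vec->h_array, vec->d_array, vec->n * sizeof(double));
    vec->num_copies++;
  }
  vec->valid = TOY_VALID_HOST;
  return vec->h_array;
}

/* Read/write access on the device: refresh the device copy if stale, then mark the host copy stale. */
static double *ToyVectorGetDeviceArray(ToyVector *vec) {
  if (vec->valid == TOY_VALID_HOST) {
    memcpy(vec->d_array, vec->h_array, vec->n * sizeof(double));
    vec->num_copies++;
  }
  vec->valid = TOY_VALID_DEVICE;
  return vec->d_array;
}

/* Usage: returns the number of transfers and writes the final host-side sum to *result. */
static int ToyCoherencyDemo(double *result) {
  ToyVector vec;
  if (ToyVectorCreate(2, &vec)) return -1;

  double *h = ToyVectorGetHostArray(&vec);   /* no transfer: both copies valid */
  h[0] = 1.0;
  h[1] = 2.0;
  double *d = ToyVectorGetDeviceArray(&vec); /* transfer 1: host -> device */
  d[0] += 10.0;                              /* stand-in for a device kernel */
  h = ToyVectorGetHostArray(&vec);           /* transfer 2: device -> host */
  *result = h[0] + h[1];                     /* 11 + 2 = 13 */

  int copies = vec.num_copies;
  ToyVectorDestroy(&vec);
  return copies;
}
```

The caller never copies data explicitly: each accessor decides whether a transfer is needed, so repeated access from the same side costs nothing extra.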

Backends available in this release:

```{eval-rst}
+---------------------------+---------------------------------+
| CEED resource (``-ceed``) | Backend                         |
+---------------------------+---------------------------------+
| ``/cpu/self``             | Serial reference implementation |
+---------------------------+---------------------------------+
| ``/cpu/occa``             | Serial OCCA kernels             |
+---------------------------+---------------------------------+
| ``/gpu/occa``             | CUDA OCCA kernels               |
+---------------------------+---------------------------------+
| ``/omp/occa``             | OpenMP OCCA kernels             |
+---------------------------+---------------------------------+
| ``/ocl/occa``             | OpenCL OCCA kernels             |
+---------------------------+---------------------------------+
| ``/gpu/magma``            | CUDA MAGMA kernels              |
+---------------------------+---------------------------------+
```

Examples available in this release:

```{eval-rst}
+-------------------------+---------------------------------+
| User code               | Example                         |
+-------------------------+---------------------------------+
| ``ceed``                | ex1 (volume)                    |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``mfem``                | - BP3 (scalar Laplace operator) |
+-------------------------+---------------------------------+
| ``petsc``               | BP1 (scalar mass operator)      |
+-------------------------+---------------------------------+
| ``nek5000``             | BP1 (scalar mass operator)      |
+-------------------------+---------------------------------+
```

(v0-2)=

## v0.2 (Mar 30, 2018)

libCEED was made publicly available as part of the first full CEED software distribution,
release CEED 1.0. The distribution was made available using the Spack package manager to
provide a common, easy-to-use build environment, where the user can build the CEED
distribution with all dependencies. This release included a new Fortran interface for the
library. This release also contained major improvements in the OCCA backend (including a
new `/ocl/occa` backend) and new examples. The standalone libCEED example was modified to
compute the volume of a given mesh (in 1D, 2D, or 3D) and placed in an
`examples/ceed` subfolder. A new `mfem` example to perform BP3 (with the application
of the Laplace operator) was also added to this release.

Backends available in this release:

```{eval-rst}
+---------------------------+---------------------------------+
| CEED resource (``-ceed``) | Backend                         |
+---------------------------+---------------------------------+
| ``/cpu/self``             | Serial reference implementation |
+---------------------------+---------------------------------+
| ``/cpu/occa``             | Serial OCCA kernels             |
+---------------------------+---------------------------------+
| ``/gpu/occa``             | CUDA OCCA kernels               |
+---------------------------+---------------------------------+
| ``/omp/occa``             | OpenMP OCCA kernels             |
+---------------------------+---------------------------------+
| ``/ocl/occa``             | OpenCL OCCA kernels             |
+---------------------------+---------------------------------+
```

Examples available in this release:

```{eval-rst}
+-------------------------+---------------------------------+
| User code               | Example                         |
+-------------------------+---------------------------------+
| ``ceed``                | ex1 (volume)                    |
+-------------------------+---------------------------------+
|                         | - BP1 (scalar mass operator)    |
| ``mfem``                | - BP3 (scalar Laplace operator) |
+-------------------------+---------------------------------+
| ``petsc``               | BP1 (scalar mass operator)      |
+-------------------------+---------------------------------+
| ``nek5000``             | BP1 (scalar mass operator)      |
+-------------------------+---------------------------------+
```

(v0-1)=

## v0.1 (Jan 3, 2018)

Initial low-level API of the CEED project. The low-level API provides a set of finite
element kernels and components for writing new low-level kernels. Examples include:
vector and sparse linear algebra, element matrix assembly over a batch of elements,
partial assembly and action for efficient high-order operators like mass, diffusion,
advection, etc. The main goal of the low-level API is to establish the basis for the
high-level API. Also, identifying such low-level kernels and providing a reference
implementation for them serves as the basis for specialized backend implementations.
This release contained several backends: `/cpu/self`, and backends which rely upon the
[OCCA](http://github.com/libocca/occa) package, such as `/cpu/occa`,
`/gpu/occa`, and `/omp/occa`.
It also included several examples, in the `examples` folder:
a standalone code that shows the usage of libCEED (with no external
dependencies) to apply the Laplace operator, `ex1`; an `mfem` example to perform BP1
(with the application of the mass operator); and a `petsc` example to perform BP1
(with the application of the mass operator).

Backends available in this release:

```{eval-rst}
+---------------------------+---------------------------------+
| CEED resource (``-ceed``) | Backend                         |
+---------------------------+---------------------------------+
| ``/cpu/self``             | Serial reference implementation |
+---------------------------+---------------------------------+
| ``/cpu/occa``             | Serial OCCA kernels             |
+---------------------------+---------------------------------+
| ``/gpu/occa``             | CUDA OCCA kernels               |
+---------------------------+---------------------------------+
| ``/omp/occa``             | OpenMP OCCA kernels             |
+---------------------------+---------------------------------+
```

Examples available in this release:

```{eval-rst}
+-------------------------+-----------------------------------+
| User code               | Example                           |
+-------------------------+-----------------------------------+
| ``ceed``                | ex1 (scalar Laplace operator)     |
+-------------------------+-----------------------------------+
| ``mfem``                | BP1 (scalar mass operator)        |
+-------------------------+-----------------------------------+
| ``petsc``               | BP1 (scalar mass operator)        |
+-------------------------+-----------------------------------+
```