# Changes/Release Notes

On this page we provide a summary of the main API changes, new features and examples
for each release of libCEED.

(main)=

## Current `main` branch

### Interface changes

- Update {c:func}`CeedQFunctionGetFields` and {c:func}`CeedOperatorGetFields` to include number of fields.
- QFunction and Operator field objects, `CeedQFunctionField` and `CeedOperatorField`, and associated getters, {c:func}`CeedQFunctionGetFields`; {c:func}`CeedQFunctionFieldGetName`; {c:func}`CeedQFunctionFieldGetSize`; {c:func}`CeedQFunctionFieldGetEvalMode`; {c:func}`CeedOperatorGetFields`; {c:func}`CeedOperatorFieldGetElemRestriction`; {c:func}`CeedOperatorFieldGetBasis`; and {c:func}`CeedOperatorFieldGetVector`, promoted to the public API.

### Maintainability

- Refactored preconditioner support internally to facilitate future development and improve GPU completeness/test coverage.

(v0-9)=

## v0.9 (Jul 6, 2021)

### Interface changes

- Minor modification in error handling macro to silence pedantic warnings when compiling with Clang, but no functional impact.

### New features

- Add {c:func}`CeedVectorAXPY` and {c:func}`CeedVectorPointwiseMult` as a convenience for stand-alone testing and internal use.
- Add `CEED_QFUNCTION_HELPER` macro to properly annotate QFunction helper functions for code generation backends.
- Add `CeedPragmaOptimizeOff` macro for code that is sensitive to floating point errors from fast math optimizations.
- Rust support: split `libceed-sys` crate out of `libceed` and [publish both on crates.io](https://crates.io/crates/libceed).

### Performance improvements

### Examples

- Solid mechanics mini-app updated to explore the performance impacts of various formulations in the initial and current configurations.
- Fluid mechanics example adds GPU support and improves modularity.

### Deprecated backends

- The `/cpu/self/tmpl` and `/cpu/self/tmpl/sub` backends have been removed. These backends were initially added to test the backend inheritance mechanism, but this mechanism is now widely used and tested in multiple backends.

(v0-8)=

## v0.8 (Mar 31, 2021)

### Interface changes

- Error handling improved to include enumerated error codes for C interface return values.
- Installed headers that will follow semantic versioning were moved to the {code}`include/ceed` directory. These headers have been renamed from {code}`ceed-*.h` to {code}`ceed/*.h`. Placeholder headers with the old naming schema are currently provided, but these headers will be removed in the libCEED v0.9 release.

### New features

- Julia and Rust interfaces added, providing a nearly 1-1 correspondence with the C interface, plus some convenience features.
- Static libraries can be built with `make STATIC=1` and the pkg-config file is installed accordingly.
- Add {c:func}`CeedOperatorLinearAssembleSymbolic` and {c:func}`CeedOperatorLinearAssemble` to support full assembly of libCEED operators.

### Performance improvements

- New HIP MAGMA backends for hipMAGMA library users: `/gpu/hip/magma` and `/gpu/hip/magma/det`.
- New HIP backends for improved tensor basis performance: `/gpu/hip/shared` and `/gpu/hip/gen`.

### Examples

- {ref}`example-petsc-elasticity` example updated with traction boundary conditions and improved Dirichlet boundary conditions.
- {ref}`example-petsc-elasticity` example updated with Neo-Hookean hyperelasticity in current configuration as well as improved Neo-Hookean hyperelasticity exploring storage vs computation tradeoffs.
- {ref}`example-petsc-navier-stokes` example updated with isentropic traveling vortex test case, an analytical solution to the Euler equations that is useful for testing boundary conditions, discretization stability, and order of accuracy.
- {ref}`example-petsc-navier-stokes` example updated with support for performing convergence study and plotting order of convergence by polynomial degree.

(v0-7)=

## v0.7 (Sep 29, 2020)

### Interface changes

- Replace limited {code}`CeedInterlaceMode` with more flexible component stride {code}`compstride` in {code}`CeedElemRestriction` constructors.
  As a result, the {code}`indices` parameter has been replaced with {code}`offsets` and the {code}`nnodes` parameter has been replaced with {code}`lsize`.
  These changes improve support for mixed finite element methods.
- Replace various uses of {code}`Ceed*Get*Status` with {code}`Ceed*Is*` in the backend API to match common nomenclature.
- Replace {code}`CeedOperatorAssembleLinearDiagonal` with {c:func}`CeedOperatorLinearAssembleDiagonal` for clarity.
- Linear Operators can be assembled as point-block diagonal matrices with {c:func}`CeedOperatorLinearAssemblePointBlockDiagonal`, provided in row-major form in a {code}`ncomp` by {code}`ncomp` block per node.
- Diagonal assembly interface changed to accept a {ref}`CeedVector` instead of a pointer to a {ref}`CeedVector` to reduce memory movement when interfacing with calling code.
- Added {c:func}`CeedOperatorLinearAssembleAddDiagonal` and {c:func}`CeedOperatorLinearAssembleAddPointBlockDiagonal` for improved future integration with codes such as MFEM that compose the action of {ref}`CeedOperator`s external to libCEED.
- Added {c:func}`CeedVectorTakeArray` to sync and remove libCEED read/write access to an allocated array and pass ownership of the array to the caller.
  This function is recommended over {c:func}`CeedVectorSyncArray` when the {code}`CeedVector` has an array owned by the caller that was set by {c:func}`CeedVectorSetArray`.
- Added {code}`CeedQFunctionContext` object to manage user QFunction context data and reduce copies between device and host memory.
- Added {c:func}`CeedOperatorMultigridLevelCreate`, {c:func}`CeedOperatorMultigridLevelCreateTensorH1`, and {c:func}`CeedOperatorMultigridLevelCreateH1` to facilitate creation of multigrid prolongation, restriction, and coarse grid operators using a common quadrature space.

### New features

- New HIP backend: `/gpu/hip/ref`.
- CeedQFunction support for user `CUfunction`s in some backends.

### Performance improvements

- OCCA backend rebuilt to facilitate future performance enhancements.
- PETSc BPs suite improved to reduce noise due to multiple calls to {code}`mpiexec`.

### Examples

- {ref}`example-petsc-elasticity` example updated with strain energy computation and more flexible boundary conditions.

### Deprecated backends

- The `/gpu/cuda/reg` backend has been removed, with its core features moved into `/gpu/cuda/ref` and `/gpu/cuda/shared`.

(v0-6)=

## v0.6 (Mar 29, 2020)

libCEED v0.6 contains numerous new features and examples, as well as expanded
documentation in [this new website](https://libceed.readthedocs.io).

### New features

- New Python interface using [CFFI](https://cffi.readthedocs.io/) provides a nearly
  1-1 correspondence with the C interface, plus some convenience features. For instance,
  data stored in the {cpp:type}`CeedVector` structure are available without copy as
  {py:class}`numpy.ndarray`. Short tutorials are provided in
  [Binder](https://mybinder.org/v2/gh/CEED/libCEED/main?urlpath=lab/tree/examples/tutorials/).
- Linear QFunctions can be assembled as block-diagonal matrices (per quadrature point,
  {c:func}`CeedOperatorAssembleLinearQFunction`) or to evaluate the diagonal
  ({c:func}`CeedOperatorAssembleLinearDiagonal`). These operations are useful for
  preconditioning ingredients and are used in libCEED's multigrid examples.
- The inverse of separable operators can be obtained using
  {c:func}`CeedOperatorCreateFDMElementInverse` and applied with
  {c:func}`CeedOperatorApply`. This is a useful preconditioning ingredient,
  especially for Laplacians and related operators.
- New functions: {c:func}`CeedVectorNorm`, {c:func}`CeedOperatorApplyAdd`,
  {c:func}`CeedQFunctionView`, {c:func}`CeedOperatorView`.
- Make public accessors for various attributes to facilitate writing composable code.
- New backend: `/cpu/self/memcheck/serial`.
- QFunctions using variable-length array (VLA) pointer constructs can be used with CUDA
  backends. (Single source is coming soon for OCCA backends.)
- Fix some missing edge cases in CUDA backend.

### Performance Improvements

- MAGMA backend performance optimization and non-tensor bases.
- No-copy optimization in {c:func}`CeedOperatorApply`.

### Interface changes

- Replace {code}`CeedElemRestrictionCreateIdentity` and
  {code}`CeedElemRestrictionCreateBlocked` with more flexible
  {c:func}`CeedElemRestrictionCreateStrided` and
  {c:func}`CeedElemRestrictionCreateBlockedStrided`.
- Add arguments to {c:func}`CeedQFunctionCreateIdentity`.
- Replace ambiguous uses of {cpp:enum}`CeedTransposeMode` for L-vector identification
  with {cpp:enum}`CeedInterlaceMode`. This is now an attribute of the
  {cpp:type}`CeedElemRestriction` (see {c:func}`CeedElemRestrictionCreate`) and no
  longer passed as `lmode` arguments to {c:func}`CeedOperatorSetField` and
  {c:func}`CeedElemRestrictionApply`.

### Examples

libCEED-0.6 contains greatly expanded examples with {ref}`new documentation <Examples>`.
Notable additions include:

- Standalone {ref}`ex2-surface` ({file}`examples/ceed/ex2-surface`): compute the area of
  a domain in 1, 2, and 3 dimensions by applying a Laplacian.

- PETSc {ref}`example-petsc-area` ({file}`examples/petsc/area.c`): computes surface area
  of domains (like the cube and sphere) by direct integration on a surface mesh;
  demonstrates geometric dimension different from topological dimension.

- PETSc {ref}`example-petsc-bps`:

  - {file}`examples/petsc/bpsraw.c` (formerly `bps.c`): transparent CUDA support.
  - {file}`examples/petsc/bps.c` (formerly `bpsdmplex.c`): performance improvements
    and transparent CUDA support.
  - {ref}`example-petsc-bps-sphere` ({file}`examples/petsc/bpssphere.c`):
    generalizations of all CEED BPs to the surface of the sphere; demonstrates geometric
    dimension different from topological dimension.

- {ref}`example-petsc-multigrid` ({file}`examples/petsc/multigrid.c`): new p-multigrid
  solver with algebraic multigrid coarse solve.

- {ref}`example-petsc-navier-stokes` ({file}`examples/fluids/navierstokes.c`; formerly
  `examples/navier-stokes`): unstructured grid support (using PETSc's `DMPlex`),
  implicit time integration, SU/SUPG stabilization, free-slip boundary conditions, and
  quasi-2D computational domain support.

- {ref}`example-petsc-elasticity` ({file}`examples/solids/elasticity.c`): new solver for
  linear elasticity, small-strain hyperelasticity, and globalized finite-strain
  hyperelasticity using p-multigrid with algebraic multigrid coarse solve.

(v0-5)=

## v0.5 (Sep 18, 2019)

For this release, several improvements were made. Two new CUDA backends were added to
the family of backends, of which the new `cuda-gen` backend achieves state-of-the-art
performance using single-source {ref}`CeedQFunction`. From this release, users
can define Q-Functions in a single source code independently of the targeted backend
with the aid of a new macro `CEED_QFUNCTION` to support JIT (Just-In-Time) and CPU
compilation of the user-provided {ref}`CeedQFunction` code.
To allow a unified declaration, the {ref}`CeedQFunction` API has undergone a slight change:
the `QFunctionField` parameter `ncomp` has been changed to `size`. This change
requires setting the previous value of `ncomp` to `ncomp*dim` when adding a
`QFunctionField` with eval mode `CEED_EVAL_GRAD`.

Additionally, new CPU backends
were included in this release, such as the `/cpu/self/opt/*` backends (which are
written in pure C and use partial **E-vectors** to improve performance) and the
`/cpu/self/ref/memcheck` backend (which relies upon the
[Valgrind](http://valgrind.org/) Memcheck tool to help verify that user
{ref}`CeedQFunction` have no undefined values).
This release also included various performance improvements, bug fixes, new examples,
and improved tests. Among these improvements, vectorized instructions for
{ref}`CeedQFunction` code compiled for CPU were enhanced by using `CeedPragmaSIMD`
instead of `CeedPragmaOMP`, a {ref}`CeedQFunction` gallery and
identity Q-Functions were introduced, and the PETSc benchmark problems were expanded
to include unstructured mesh handling. For this expansion, the prior version of
the PETSc BPs, which only included data associated with structured geometries, was
renamed `bpsraw`, and the new version of the BPs, which can handle data associated
with any unstructured geometry, was called `bps`. Additionally, other benchmark
problems, namely BP2 and BP4 (the vector-valued versions of BP1 and BP3, respectively),
and BP5 and BP6 (the collocated versions of BP3 and BP4, respectively, for which the
quadrature points are the same as the Gauss-Lobatto nodes) were added to the PETSc
examples. Furthermore, another standalone libCEED example, called `ex2`, which
computes the surface area of a given mesh, was added to this release.
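
The `ncomp` to `size` change described for this release can be illustrated with a small standalone sketch. It does not call libCEED: the names `SketchEvalMode` and `sketch_field_size` are hypothetical, and the assumption that eval modes other than gradient keep `size` equal to the old `ncomp` is ours, inferred from the note above rather than stated in it.

```c
#include <assert.h>

/* Hypothetical stand-in for illustration only; libCEED defines its own
   CeedEvalMode enum with values such as CEED_EVAL_GRAD. */
typedef enum {
  SKETCH_EVAL_NONE,
  SKETCH_EVAL_INTERP,
  SKETCH_EVAL_GRAD
} SketchEvalMode;

/* Returns the `size` to pass for a field that was previously described by
   `ncomp` components on a mesh of topological dimension `dim`.
   Gradient fields carry one derivative per dimension for each component,
   so their size is ncomp*dim; other modes are assumed to keep ncomp. */
static int sketch_field_size(int ncomp, int dim, SketchEvalMode mode) {
  assert(ncomp > 0 && dim > 0);
  return mode == SKETCH_EVAL_GRAD ? ncomp * dim : ncomp;
}
```

Under these assumptions, a scalar field in 3D evaluated with the gradient mode would be registered with size 3, while the same field evaluated with interpolation keeps size 1.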

Backends available in this release:

| CEED resource (`-ceed`)  | Backend                                             |
|--------------------------|-----------------------------------------------------|
| `/cpu/self/ref/serial`   | Serial reference implementation                     |
| `/cpu/self/ref/blocked`  | Blocked reference implementation                    |
| `/cpu/self/ref/memcheck` | Memcheck backend, undefined value checks            |
| `/cpu/self/opt/serial`   | Serial optimized C implementation                   |
| `/cpu/self/opt/blocked`  | Blocked optimized C implementation                  |
| `/cpu/self/avx/serial`   | Serial AVX implementation                           |
| `/cpu/self/avx/blocked`  | Blocked AVX implementation                          |
| `/cpu/self/xsmm/serial`  | Serial LIBXSMM implementation                       |
| `/cpu/self/xsmm/blocked` | Blocked LIBXSMM implementation                      |
| `/cpu/occa`              | Serial OCCA kernels                                 |
| `/gpu/occa`              | CUDA OCCA kernels                                   |
| `/omp/occa`              | OpenMP OCCA kernels                                 |
| `/ocl/occa`              | OpenCL OCCA kernels                                 |
| `/gpu/cuda/ref`          | Reference pure CUDA kernels                         |
| `/gpu/cuda/reg`          | Pure CUDA kernels using one thread per element      |
| `/gpu/cuda/shared`       | Optimized pure CUDA kernels using shared memory     |
| `/gpu/cuda/gen`          | Optimized pure CUDA kernels using code generation   |
| `/gpu/magma`             | CUDA MAGMA kernels                                  |

Examples available in this release:

:::{list-table}
:header-rows: 1
:widths: auto
* - User code
  - Example
* - `ceed`
  - * ex1 (volume)
    * ex2 (surface)
* - `mfem`
  - * BP1 (scalar mass operator)
    * BP3 (scalar Laplace operator)
* - `petsc`
  - * BP1 (scalar mass operator)
    * BP2 (vector mass operator)
    * BP3 (scalar Laplace operator)
    * BP4 (vector Laplace operator)
    * BP5 (collocated scalar Laplace operator)
    * BP6 (collocated vector Laplace operator)
    * Navier-Stokes
* - `nek5000`
  - * BP1 (scalar mass operator)
    * BP3 (scalar Laplace operator)
:::

(v0-4)=

## v0.4 (Apr 1, 2019)

libCEED v0.4 was again made publicly available in the second full CEED
software distribution, release CEED 2.0. This release contained notable features, such as
four new CPU backends, two new GPU backends, CPU backend optimizations, initial
support for operator composition, performance benchmarking, and a Navier-Stokes demo.
The new CPU backends in this release came in two families. The `/cpu/self/*/serial`
backends process one element at a time and are intended for meshes with a smaller number
of high order elements. The `/cpu/self/*/blocked` backends process blocked batches of
eight interlaced elements and are intended for meshes with higher numbers of elements.
The `/cpu/self/avx/*` backends rely upon AVX instructions to provide vectorized CPU
performance. The `/cpu/self/xsmm/*` backends rely upon the
[LIBXSMM](http://github.com/hfp/libxsmm) package to provide vectorized CPU
performance. The `/gpu/cuda/*` backends provide GPU performance strictly using CUDA.
The `/gpu/cuda/ref` backend is a reference CUDA backend, providing reasonable
performance for most problem configurations. The `/gpu/cuda/reg` backend uses a simple
parallelization approach, where each thread processes one finite element. Using just-in-time
compilation, provided by nvrtc (NVIDIA Runtime Compiler), and runtime parameters, this
backend unrolls loops and maps memory addresses to registers. The `/gpu/cuda/reg` backend
achieves good peak performance for 1D, 2D, and low order 3D problems, but performance
deteriorates very quickly when threads run out of registers.

A new explicit time-stepping Navier-Stokes solver was added to the family of libCEED
examples in the `examples/petsc` directory (see {ref}`example-petsc-navier-stokes`).
This example solves the time-dependent Navier-Stokes equations of compressible gas
dynamics in a static Eulerian three-dimensional frame, using structured high-order
finite/spectral element spatial discretizations and explicit high-order time-stepping
(available in PETSc).
Moreover, the Navier-Stokes example was developed using PETSc,
so that the pointwise physics (defined at quadrature points) is separated from the
parallelization and meshing concerns.

Backends available in this release:

| CEED resource (`-ceed`)  | Backend                                             |
|--------------------------|-----------------------------------------------------|
| `/cpu/self/ref/serial`   | Serial reference implementation                     |
| `/cpu/self/ref/blocked`  | Blocked reference implementation                    |
| `/cpu/self/tmpl`         | Backend template, defaults to `/cpu/self/blocked`   |
| `/cpu/self/avx/serial`   | Serial AVX implementation                           |
| `/cpu/self/avx/blocked`  | Blocked AVX implementation                          |
| `/cpu/self/xsmm/serial`  | Serial LIBXSMM implementation                       |
| `/cpu/self/xsmm/blocked` | Blocked LIBXSMM implementation                      |
| `/cpu/occa`              | Serial OCCA kernels                                 |
| `/gpu/occa`              | CUDA OCCA kernels                                   |
| `/omp/occa`              | OpenMP OCCA kernels                                 |
| `/ocl/occa`              | OpenCL OCCA kernels                                 |
| `/gpu/cuda/ref`          | Reference pure CUDA kernels                         |
| `/gpu/cuda/reg`          | Pure CUDA kernels using one thread per element      |
| `/gpu/magma`             | CUDA MAGMA kernels                                  |

Examples available in this release:

:::{list-table}
:header-rows: 1
:widths: auto
* - User code
  - Example
* - `ceed`
  - * ex1 (volume)
* - `mfem`
  - * BP1 (scalar mass operator)
    * BP3 (scalar Laplace operator)
* - `petsc`
  - * BP1 (scalar mass operator)
    * BP3 (scalar Laplace operator)
    * Navier-Stokes
* - `nek5000`
  - * BP1 (scalar mass operator)
    * BP3 (scalar Laplace operator)
:::

(v0-3)=

## v0.3 (Sep 30, 2018)

Notable features in this release include active/passive field interface, support for
non-tensor bases, backend optimization, and improved Fortran interface. This release
also focused on providing improved continuous integration, and many new tests with code
coverage reports of about 90%.
This release also provided a significant change to the
public interface: a {ref}`CeedQFunction` can take any number of named input and output
arguments while {ref}`CeedOperator` connects them to the actual data, which may be
supplied explicitly to `CeedOperatorApply()` (active) or separately via
`CeedOperatorSetField()` (passive). This interface change enables reusable libraries
of CeedQFunctions and composition of block solvers constructed using
{ref}`CeedOperator`. A concept of blocked restriction was added to this release and
used in an optimized CPU backend. Although this is typically not visible to the user,
it enables effective use of arbitrary-length SIMD while maintaining cache locality.
This CPU backend also implements an algebraic factorization of tensor product gradients
to perform fewer operations than standard application of interpolation and
differentiation from nodes to quadrature points. This algebraic formulation
automatically supports non-polynomial and non-interpolatory bases, thus is more general
than the more common derivation in terms of Lagrange polynomials on the quadrature points.
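
To make the idea of a blocked restriction more concrete, the following standalone sketch reorders element-major data into interlaced blocks, so that the same node of consecutive elements sits contiguously in memory and can be processed by one SIMD instruction. This is an illustration only, not libCEED's implementation: the function name `sketch_block_interlace`, the index formula, and the choice to pad an incomplete final block by repeating the last element are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libCEED.
   Input layout  (element-major): in[e * elem_size + n]
   Output layout (blocked):       out[(b * elem_size + n) * blk_size + j]
   where element e = b * blk_size + j.
   `out` must hold num_blk * elem_size * blk_size values, with
   num_blk = ceil(num_elem / blk_size). Slots past the last real element
   are filled with a copy of the last element (an assumption of this sketch). */
static void sketch_block_interlace(const double *in, double *out,
                                   size_t num_elem, size_t elem_size,
                                   size_t blk_size) {
  assert(in && out && num_elem > 0 && elem_size > 0 && blk_size > 0);
  size_t num_blk = (num_elem + blk_size - 1) / blk_size;
  for (size_t b = 0; b < num_blk; b++) {
    for (size_t n = 0; n < elem_size; n++) {
      for (size_t j = 0; j < blk_size; j++) {
        size_t e = b * blk_size + j;
        if (e >= num_elem) e = num_elem - 1; /* pad the final block */
        out[(b * elem_size + n) * blk_size + j] = in[e * elem_size + n];
      }
    }
  }
}
```

With three two-node elements and a block size of two, the values of node 0 for elements 0 and 1 end up adjacent, followed by node 1 for the same pair; the second block holds element 2 twice because of the padding.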

Backends available in this release:

| CEED resource (`-ceed`) | Backend                                             |
|-------------------------|-----------------------------------------------------|
| `/cpu/self/blocked`     | Blocked reference implementation                    |
| `/cpu/self/ref`         | Serial reference implementation                     |
| `/cpu/self/tmpl`        | Backend template, defaults to `/cpu/self/blocked`   |
| `/cpu/occa`             | Serial OCCA kernels                                 |
| `/gpu/occa`             | CUDA OCCA kernels                                   |
| `/omp/occa`             | OpenMP OCCA kernels                                 |
| `/ocl/occa`             | OpenCL OCCA kernels                                 |
| `/gpu/magma`            | CUDA MAGMA kernels                                  |

Examples available in this release:

:::{list-table}
:header-rows: 1
:widths: auto
* - User code
  - Example
* - `ceed`
  - * ex1 (volume)
* - `mfem`
  - * BP1 (scalar mass operator)
    * BP3 (scalar Laplace operator)
* - `petsc`
  - * BP1 (scalar mass operator)
    * BP3 (scalar Laplace operator)
* - `nek5000`
  - * BP1 (scalar mass operator)
    * BP3 (scalar Laplace operator)
:::

(v0-21)=

## v0.21 (Sep 30, 2018)

A MAGMA backend (which relies upon the
[MAGMA](https://bitbucket.org/icl/magma) package) was integrated in libCEED for this
release. This initial integration set up the framework of using MAGMA and provided the
libCEED functionality through MAGMA kernels as one of libCEED’s computational backends.
Like any other backend, the MAGMA backend provides extended basic data structures for
{ref}`CeedVector`, {ref}`CeedElemRestriction`, and {ref}`CeedOperator`, and implements
the fundamental CEED building blocks to work with the new data structures.
In general, the MAGMA-specific data structures keep the libCEED pointers to CPU data
but also add corresponding device (e.g., GPU) pointers to the data. Coherency is handled
internally, and thus seamlessly to the user, through the functions/methods that are
provided to support them.

Backends available in this release:

| CEED resource (`-ceed`) | Backend                         |
|-------------------------|---------------------------------|
| `/cpu/self`             | Serial reference implementation |
| `/cpu/occa`             | Serial OCCA kernels             |
| `/gpu/occa`             | CUDA OCCA kernels               |
| `/omp/occa`             | OpenMP OCCA kernels             |
| `/ocl/occa`             | OpenCL OCCA kernels             |
| `/gpu/magma`            | CUDA MAGMA kernels              |

Examples available in this release:

:::{list-table}
:header-rows: 1
:widths: auto
* - User code
  - Example
* - `ceed`
  - * ex1 (volume)
* - `mfem`
  - * BP1 (scalar mass operator)
    * BP3 (scalar Laplace operator)
* - `petsc`
  - * BP1 (scalar mass operator)
* - `nek5000`
  - * BP1 (scalar mass operator)
:::

(v0-2)=

## v0.2 (Mar 30, 2018)

libCEED was made publicly available in the first full CEED software distribution, release
CEED 1.0. The distribution was made available using the Spack package manager to provide
a common, easy-to-use build environment, where the user can build the CEED distribution
with all dependencies. This release included a new Fortran interface for the library.
This release also contained major improvements in the OCCA backend (including a new
`/ocl/occa` backend) and new examples. The standalone libCEED example was modified to
compute the volume of a given mesh (in 1D, 2D, or 3D) and placed in an
`examples/ceed` subfolder. A new `mfem` example to perform BP3 (with the application
of the Laplace operator) was also added to this release.

Backends available in this release:

| CEED resource (`-ceed`) | Backend                         |
|-------------------------|---------------------------------|
| `/cpu/self`             | Serial reference implementation |
| `/cpu/occa`             | Serial OCCA kernels             |
| `/gpu/occa`             | CUDA OCCA kernels               |
| `/omp/occa`             | OpenMP OCCA kernels             |
| `/ocl/occa`             | OpenCL OCCA kernels             |

Examples available in this release:

:::{list-table}
:header-rows: 1
:widths: auto
* - User code
  - Example
* - `ceed`
  - * ex1 (volume)
* - `mfem`
  - * BP1 (scalar mass operator)
    * BP3 (scalar Laplace operator)
* - `petsc`
  - * BP1 (scalar mass operator)
* - `nek5000`
  - * BP1 (scalar mass operator)
:::

(v0-1)=

## v0.1 (Jan 3, 2018)

Initial low-level API of the CEED project. The low-level API provides a set of finite
element kernels and components for writing new low-level kernels. Examples include:
vector and sparse linear algebra, element matrix assembly over a batch of elements,
partial assembly and action for efficient high-order operators like mass, diffusion,
advection, etc. The main goal of the low-level API is to establish the basis for the
high-level API. Also, identifying such low-level kernels and providing a reference
implementation for them serves as the basis for specialized backend implementations.
This release contained several backends: `/cpu/self`, and backends which rely upon the
[OCCA](http://github.com/libocca/occa) package, such as `/cpu/occa`,
`/gpu/occa`, and `/omp/occa`.
It also included several examples, in the `examples` folder:
a standalone code that shows the usage of libCEED (with no external
dependencies) to apply the Laplace operator, `ex1`; an `mfem` example to perform BP1
(with the application of the mass operator); and a `petsc` example to perform BP1
(with the application of the mass operator).

Backends available in this release:

| CEED resource (`-ceed`) | Backend                         |
|-------------------------|---------------------------------|
| `/cpu/self`             | Serial reference implementation |
| `/cpu/occa`             | Serial OCCA kernels             |
| `/gpu/occa`             | CUDA OCCA kernels               |
| `/omp/occa`             | OpenMP OCCA kernels             |

Examples available in this release:

| User code             | Example                           |
|-----------------------|-----------------------------------|
| `ceed`                | ex1 (scalar Laplace operator)     |
| `mfem`                | BP1 (scalar mass operator)        |
| `petsc`               | BP1 (scalar mass operator)        |