sphinx/source/libCEEDapi.md

3 This page provides a brief description of the theoretical foundations and the practical implementat…
5 (theoretical-framework)=
9 …eak form of a Partial Differential Equation (PDE) is evaluated on a subdomain $\Omega_e$ (element)…
10 In particular, when high-order finite elements or spectral elements are used, the resulting sparse …
11 libCEED provides an interface for matrix-free operator description that enables efficient evaluatio…
15 We first define the $L^2$ inner product between real-valued functions
23 We want to find $u$ in a suitable space $V_D$, such that
30 …$ represents all terms in {eq}`residual` which multiply the (possibly vector-valued) test function…
31 For an $n$-component problem in $d$ dimensions, $\bm f_0 \in \mathbb{R}^n$ and $\bm f_1 \in \mathbb{…
34 …m f_1$ represents contraction over both fields and spatial dimensions while a single dot represent…
40 If equation {eq}`residual` contains only a term of the type $\bm f_0$, the {ref}`CeedQFunction` wil…
41 If equation {eq}`residual` also contains a term of the type $\bm f_1$, then the {ref}`CeedQFunction…
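
As a worked illustration, assume {eq}`residual` takes the form $\int_\Omega v \cdot \bm f_0 + \nabla v : \bm f_1 = 0$ for all test functions $v$. This is a restatement based on the description of $\bm f_0$ and $\bm f_1$ above, with source and boundary terms omitted for brevity. Under that assumption, two familiar scalar operators would split as follows:

$$
\begin{aligned}
\text{mass:} \quad & \int_\Omega v \, u, & f_0 &= u, & \bm f_1 &= \bm 0, \\
\text{Poisson:} \quad & \int_\Omega \nabla v \cdot \nabla u, & f_0 &= 0, & \bm f_1 &= \nabla u.
\end{aligned}
$$

In the first case only values of $u$ at quadrature points are needed, while in the second case only gradients are needed.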
46 … formulations of partial differential equations that involve integration over a computational mesh.
47 …litting them as a sum over the mesh elements, mapping each element to a simple *reference* element…
51 …inuous ($H^1$) elements, where we use the notions **T-vector**, **L-vector**, **E-vector** and **Q…
55 - Subdomain restriction $\bm{P}$
56 - Element restriction $\bm{\mathcal{E}}$
57 - Basis (Dofs-to-Qpts) evaluator $\bm{B}$
58 - Operator at quadrature points $\bm{D}$
62 (fig-operator-decomp)=
64 :::{figure} ../../img/libCEED-decomposition.svg
68 …ictions $\bm{P}$ and $\bm{\mathcal{E}}$ will involve not just extracting sub-vectors, but evaluati…
75 - True degrees of freedom/unknowns, **T-vector**:
77   > - each unknown $i$ has exactly one copy, on exactly one processor, $rank(i)$
78   > - this is a non-overlapping vector decomposition
79   > - usually includes any essential (fixed) DoFs.
81   > ```{image} ../../img/T-vector.svg
84 - Local (w.r.t. processors) degrees of freedom/unknowns, **L-vector**:
86   > - each unknown $i$ has exactly one copy on each processor that owns an element containing $i$
87 …> - this is an overlapping vector decomposition with overlaps only across different processors---t…
88 …> - the shared DoFs/unknowns are the overlapping DoFs, i.e. the ones that have more than one copy,…
90   > ```{image} ../../img/L-vector.svg
93 - Per element decomposition, **E-vector**:
95   > - each unknown $i$ has as many copies as the number of elements that contain $i$
96   > - usually, the copies of the unknowns are grouped by the element they belong to.
98   > ```{image} ../../img/E-vector.svg
101 - In the case of AMR with hanging nodes (giving rise to hanging DoFs):
103   > - the **L-vector** is enhanced with the hanging/dependent DoFs
104 …> - the additional hanging/dependent DoFs are duplicated when they are shared by multiple processo…
105 …> - this way, an **E-vector** can be derived from an **L-vector** without any communications and w…
106 …- in other words, an entry in an **E-vector** is obtained by copying an entry from the correspondi…
108   > ```{image} ../../img/L-vector-AMR.svg
111 - In the case of variable order spaces:
113 …> - the dependent DoFs (usually on the higher-order side of a face/edge) can be treated just like …
115 - Quadrature point vector, **Q-vector**:
117 …> - this is similar to **E-vector** where instead of DoFs, the vector represents values at quadrat…
119 - In many cases it is useful to distinguish two types of vectors:
121   > - **X-vector**, or **primal X-vector**, and **X'-vector**, or **dual X-vector**
122   > - here X can be any of the T, L, E, or Q categories
123   > - for example, the mass matrix operator maps a **T-vector** to a **T'-vector**
124   > - the solution vector is a **T-vector**, and the RHS vector is a **T'-vector**
125 …> - using the parallel prolongation operator, one can map the solution **T-vector** to a solution …
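
The relationship between an **L-vector** and an **E-vector** can be sketched in plain C for the conforming case (no hanging nodes). This is a minimal illustration and not libCEED code: the function names and the flat `indices` layout are assumptions made for this sketch. The forward action copies entries (so shared unknowns are duplicated per element), and the transpose action sums the copies back.

```c
#include <assert.h>
#include <stddef.h>

/* Gather: E-vector <- L-vector.
   indices[e * elem_size + i] is the L-vector index of local node i of
   element e (a hypothetical layout chosen for this sketch). */
void restrict_apply(size_t num_elem, size_t elem_size, const size_t *indices,
                    const double *l_vec, double *e_vec) {
  assert(indices && l_vec && e_vec);
  for (size_t k = 0; k < num_elem * elem_size; k++) {
    e_vec[k] = l_vec[indices[k]];
  }
}

/* Scatter-add: L-vector += E-vector (the transpose action).
   The caller is expected to zero l_vec first if a pure transpose is wanted. */
void restrict_apply_transpose(size_t num_elem, size_t elem_size,
                              const size_t *indices, const double *e_vec,
                              double *l_vec) {
  assert(indices && l_vec && e_vec);
  for (size_t k = 0; k < num_elem * elem_size; k++) {
    l_vec[indices[k]] += e_vec[k];
  }
}
```

For two linear 1D elements sharing the middle node, `indices` would be `{0, 1, 1, 2}`: the shared unknown appears twice in the **E-vector** and its two contributions are summed by the transpose.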
129 - Full true-DoF parallel assembly, **TA**, or **A**:
131   > - ParCSR or similar format
132 …> - the T in TA indicates that the data format represents an operator from a **T-vector** to a **T…
134 - Full local assembly, **LA**:
136   > - CSR matrix on each rank
137 …> - the parallel prolongation operator, $\bm{P}$, (and its transpose) should use optimized matrix-…
138   > - note that $\bm{P}$ is the operator mapping T-vectors to L-vectors.
140 - Element matrix assembly, **EA**:
142   > - each element matrix is stored as a dense matrix
143   > - optimized element and parallel prolongation operators
144 …> - note that the element prolongation operator is the mapping from an **L-vector** to an **E-vect…
146 - Quadrature-point/partial assembly, **QA** or **PA**:
148   > - precompute and store $w\det(J)$ at all quadrature points in all mesh elements
149   > - the stored data can be viewed as a **Q-vector**.
151 - Unassembled option, **UA** or **U**:
153   > - no assembly step
154 …- the action directly uses the mesh node coordinates and assumes a specific form of the coefficient…
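
To make the storage difference between **EA** and **QA** concrete, the following sketch counts stored values per element. It assumes tensor-product elements with `p + 1` nodes and `q` quadrature points per dimension and a scalar operator such as the mass matrix, for which one value $w\det(J)$ is stored per quadrature point. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Integer power base^exp for small non-negative exponents. */
static uint64_t ipow(uint64_t base, unsigned exp) {
  uint64_t result = 1;
  while (exp--) result *= base;
  return result;
}

/* EA: one dense N x N matrix per element, with N = (p+1)^d nodes. */
uint64_t ea_entries_per_element(unsigned p, unsigned d) {
  assert(d >= 1);
  uint64_t n = ipow((uint64_t)p + 1, d);
  return n * n;
}

/* QA/PA for a scalar mass operator: one stored value per quadrature
   point, with q points per dimension. */
uint64_t qa_entries_per_element(unsigned q, unsigned d) {
  assert(d >= 1);
  return ipow(q, d);
}
```

Under these assumptions, a degree 4 hexahedral element needs 15625 stored values with **EA** but only 216 with **QA** when 6 quadrature points per dimension are used.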
158 …A}$ is just a series of variational restrictions with $\bm{B}$, $\bm{\mathcal{E}}$ and $\bm{P}$, s…
159 For example, one can compute and store a global matrix on **T-vector** level.
160 …ute and store only the subdomain (**L-vector**) or element (**E-vector**) matrices and perform the…
161 …-order discretizations, they are not a good fit for high-order methods due to the amount of FLOPs …
163 …r portions of it) and evaluate the actions of $\bm{P}$, $\bm{\mathcal{E}}$ and $\bm{B}$ on-the-fly.
164 …-product structure of the degrees of freedom and quadrature points on *quad* and *hex* elements to…
166 …l amount of memory transfers (with respect to the polynomial order) and near-optimal FLOPs for ope…
167 …and an operator *apply* (evaluation) phase that computes the action of $\bm{A}$ on an input vector.
168 When desired, the setup phase may be done as a side-effect of evaluating a different operator, such…
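
One common way to exploit tensor-product structure is sum factorization, where a multi-dimensional basis action is applied one dimension at a time. The sketch below interpolates nodal values on a *quad* element to quadrature points both ways, so the two can be compared. It is an illustration under stated assumptions (row-major storage, the same 1D matrix `B` in both directions, hypothetical function names), not a description of any particular libCEED backend.

```c
#include <assert.h>
#include <stdlib.h>

/* Reference version: apply (B (x) B) directly, O(Q^2 P^2) work.
   B is Q x P, u is P x P, v is Q x Q, all row-major. */
void interp2d_naive(int P, int Q, const double *B, const double *u, double *v) {
  assert(P > 0 && Q > 0 && B && u && v);
  for (int q1 = 0; q1 < Q; q1++) {
    for (int q2 = 0; q2 < Q; q2++) {
      double sum = 0.0;
      for (int i1 = 0; i1 < P; i1++) {
        for (int i2 = 0; i2 < P; i2++) {
          sum += B[q1 * P + i1] * B[q2 * P + i2] * u[i1 * P + i2];
        }
      }
      v[q1 * Q + q2] = sum;
    }
  }
}

/* Sum-factorized version: contract one dimension at a time,
   O(Q P^2 + Q^2 P) work. Returns nonzero if the scratch allocation fails. */
int interp2d_sumfact(int P, int Q, const double *B, const double *u, double *v) {
  assert(P > 0 && Q > 0 && B && u && v);
  double *tmp = malloc((size_t)Q * (size_t)P * sizeof *tmp);
  if (!tmp) return 1;
  /* First pass: contract the first index, tmp is Q x P. */
  for (int q1 = 0; q1 < Q; q1++) {
    for (int i2 = 0; i2 < P; i2++) {
      double sum = 0.0;
      for (int i1 = 0; i1 < P; i1++) sum += B[q1 * P + i1] * u[i1 * P + i2];
      tmp[q1 * P + i2] = sum;
    }
  }
  /* Second pass: contract the second index, v is Q x Q. */
  for (int q1 = 0; q1 < Q; q1++) {
    for (int q2 = 0; q2 < Q; q2++) {
      double sum = 0.0;
      for (int i2 = 0; i2 < P; i2++) sum += B[q2 * P + i2] * tmp[q1 * P + i2];
      v[q1 * Q + q2] = sum;
    }
  }
  free(tmp);
  return 0;
}
```

Neither version forms the full $Q^2 \times P^2$ matrix, but only the second avoids the $Q^2 P^2$ operation count, and the gap widens in 3D.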
173 …their ranges, so $\bm{P}$, $\bm{\mathcal{E}}$ and $\bm{B}$ allow us to "zoom-in" to subdomain, ele…
175 Thus, a natural mapping of $\bm{A}$ on a parallel computer is to split the **T-vector** over MPI ra…
177 …oice of the finite element space/basis ($\bm{B}$) and the geometry and point-wise physics $\bm{D}$.
178 …-- parallel (multi-device) linear algebra for $\bm{P}$, sparse (on-device) linear algebra for $\bm…
180 Currently in libCEED, it is assumed that the host application manages the global **T-vectors** and …
181 Our API is thus focused on the **L-vector** level, where the logical devices, which in the library …
184 For example, on a node with 2 CPU sockets and 4 GPUs, one may decide to use 6 MPI ranks (each using…
187 The interface is non-blocking for all operations involving more than O(1) data, allowing operations…
191 …implementations and coordinates their action to the original operator on **L-vector** level (i.e. …
194 (fig-operator-schematic)=
201 The frontend description is general enough to support a wide variety of finite element algorithms, …
202 The separation of the front- and backends enables applications to easily switch/try different backe…
203 It also enables backend developers to impact many applications from a single implementation.
205 Our long-term vision is to include a variety of backend implementations in libCEED, ranging from re…
206 A simple reference backend implementation is provided in the file
207 [ceed-ref.c](https://github.com/CEED/libCEED/blob/main/backends/ref/ceed-ref.c).
212 - **L-**, **E-** and **Q-vector** are represented as variables of type {ref}`CeedVector`.
213   (A backend may choose to operate incrementally without forming explicit **E-** or **Q-vectors**.)
214 - $\bm{\mathcal{E}}$ is represented as a variable of type {ref}`CeedElemRestriction`.
215 - $\bm{B}$ is represented as a variable of type {ref}`CeedBasis`.
216 - the action of $\bm{D}$ is represented as a variable of type {ref}`CeedQFunction`.
217 - the overall operator $\bm{\mathcal{E}}^T \bm{B}^T \bm{D} \bm{B} \bm{\mathcal{E}}$ is represented …
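
The composition $\bm{\mathcal{E}}^T \bm{B}^T \bm{D} \bm{B} \bm{\mathcal{E}}$ can be traced step by step in plain C, without any libCEED objects. The sketch below is an illustration under simplifying assumptions: a uniform 1D mesh of linear elements, a 2-point Gauss rule, and a hypothetical function name. Each loop is labeled with the operator it stands for.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_NODES 2 /* nodes per element (linear basis) */
#define NUM_QPTS 2  /* quadrature points per element */

/* v_l += E^T B^T D B E u_l for the 1D mass operator on a uniform mesh of
   num_elem linear elements of width h. Both vectors have num_elem + 1
   entries; v_l should be zeroed by the caller. */
void mass_apply_1d(size_t num_elem, double h, const double *u_l, double *v_l) {
  assert(u_l && v_l && h > 0.0);
  /* 2-point Gauss rule on [-1, 1]: points at +-1/sqrt(3), unit weights. */
  const double xi[NUM_QPTS] = {-0.5773502691896257, 0.5773502691896257};
  const double w[NUM_QPTS] = {1.0, 1.0};
  double B[NUM_QPTS][NUM_NODES];
  for (int q = 0; q < NUM_QPTS; q++) {
    B[q][0] = 0.5 * (1.0 - xi[q]);
    B[q][1] = 0.5 * (1.0 + xi[q]);
  }
  for (size_t e = 0; e < num_elem; e++) {
    double u_e[NUM_NODES], u_q[NUM_QPTS], v_e[NUM_NODES] = {0.0, 0.0};
    /* E: extract the element's entries from the L-vector. */
    for (int i = 0; i < NUM_NODES; i++) u_e[i] = u_l[e + i];
    /* B: interpolate nodal values to quadrature points. */
    for (int q = 0; q < NUM_QPTS; q++) {
      u_q[q] = 0.0;
      for (int i = 0; i < NUM_NODES; i++) u_q[q] += B[q][i] * u_e[i];
    }
    /* D: scale by w * det(J), with det(J) = h / 2 on a uniform mesh. */
    for (int q = 0; q < NUM_QPTS; q++) u_q[q] *= w[q] * 0.5 * h;
    /* B^T: integrate against the test functions. */
    for (int i = 0; i < NUM_NODES; i++) {
      for (int q = 0; q < NUM_QPTS; q++) v_e[i] += B[q][i] * u_q[q];
    }
    /* E^T: sum element contributions back into the L-vector. */
    for (int i = 0; i < NUM_NODES; i++) v_l[e + i] += v_e[i];
  }
}
```

Applying this to a vector of ones on the unit interval yields entries that sum to the length of the domain, which is a convenient sanity check for a mass operator.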
219 …ation of the action of a simple 1D mass matrix (cf. [tests/t500-operator.c](https://github.com/CEE…
221 ```{literalinclude} ../../../tests/t500-operator.c
226 … weights with the Jacobian information for the mesh transformation, becomes a passive input to the…
229 (fig-operator-schematic-mass)=
232 …$\bm{D}$, and input/output vectors corresponding to the libCEED operators in the t500-operator test
237 ```{literalinclude} ../../../tests/t500-operator.c
238 :end-at: CeedInit
240 :start-at: CeedInit
243 creates a logical device `ceed` on the specified *resource*, which could also be a coprocessor such…
245 The resource is used to locate a suitable backend which will have discretion over the implementatio…
247 The `setup` routine above computes and stores $\bm{D}$, in this case a scalar value in each quadrat…
250 ```{literalinclude} ../../../tests/t500-operator.c
251 :end-before: //! [QFunction Create]
253 :start-after: //! [QFunction Create]
256 A {ref}`CeedQFunction` performs independent operations at each quadrature point and the interface i…
261 In addition to the function pointers (`setup` and `mass`), {ref}`CeedQFunction` constructors take a…
262 This is used by backends that support Just-In-Time (JIT) compilation (i.e., CUDA and HIP) to compil…
264 …tible arguments for {code}`math` library functions is required, and variable-length array (VLA) sy…
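
The shape of such pointwise functions can be sketched without libCEED headers. The two functions below are modeled on the `setup` and `mass` routines discussed here, but they are stand-ins: plain `int` and `double` replace libCEED's own integer and scalar types, the `_sketch` names are hypothetical, and the field ordering inside `in` and `out` is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Setup: combine quadrature weights with the 1D Jacobian at each point.
   Assumed field order: in[0] = weights, in[1] = dx/dX, out[0] = rho. */
int setup_sketch(void *ctx, const int Q, const double *const *in,
                 double *const *out) {
  (void)ctx; /* no user context is needed in this sketch */
  assert(in && out);
  const double *weight = in[0], *dxdX = in[1];
  double *rho = out[0];
  for (int i = 0; i < Q; i++) rho[i] = weight[i] * dxdX[i];
  return 0;
}

/* Mass: multiply the stored quadrature data by the interpolated field.
   Assumed field order: in[0] = rho, in[1] = u, out[0] = v. */
int mass_sketch(void *ctx, const int Q, const double *const *in,
                double *const *out) {
  (void)ctx;
  assert(in && out);
  const double *rho = in[0], *u = in[1];
  double *v = out[0];
  for (int i = 0; i < Q; i++) v[i] = rho[i] * u[i];
  return 0;
}
```

Each loop iteration touches only index `i` of every field, which is what makes the operation independent at each quadrature point.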
268 The size of the field is provided by a combination of the number of components and the effect of any ba…
270 The evaluation mode (see {ref}`CeedBasis-Typedefs and Enumerations`) `CEED_EVAL_INTERP` for both in…
276 where $v$ are test functions (see the {ref}`theoretical-framework`).
285 For fields with derivatives, such as with the basis evaluation mode (see {ref}`CeedBasis-Typedefs a…
286 A 3-dimensional gradient on four components would therefore mean the field has a size of 12.
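
The size rule can be written as a small helper. This is a sketch only: the enum and function names are hypothetical stand-ins for two of the evaluation modes, not libCEED identifiers, and other modes are not covered.

```c
#include <assert.h>

/* Hypothetical stand-ins for the interpolation and gradient modes. */
typedef enum { SKETCH_EVAL_INTERP, SKETCH_EVAL_GRAD } SketchEvalMode;

/* Number of values per quadrature point for a field with num_comp
   components in dim spatial dimensions. */
int field_size(int num_comp, int dim, SketchEvalMode mode) {
  assert(num_comp > 0 && dim > 0);
  return mode == SKETCH_EVAL_GRAD ? num_comp * dim : num_comp;
}
```

With four components in three dimensions, the gradient case gives `field_size(4, 3, SKETCH_EVAL_GRAD) == 12`, matching the example above.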
290 Both basis operators use the same integration rule, which is Gauss-Legendre with 8 points (the `Q` …
292 ```{literalinclude} ../../../tests/t500-operator.c
293 :end-before: //! [Basis Create]
295 :start-after: //! [Basis Create]
304 ```{literalinclude} ../../../tests/t500-operator.c
305 :end-before: //! [ElemRestr Create]
307 :start-after: //! [ElemRestr Create]
310 ```{literalinclude} ../../../tests/t500-operator.c
311 :end-before: //! [ElemRestrU Create]
313 :start-after: //! [ElemRestrU Create]
316 If the user has arrays available on a device, they can be provided using `CEED_MEM_DEVICE`.
317 This technique is used to provide no-copy interfaces in all contexts that involve problem-sized dat…
319 …and for applications such as Nek5000 that only explicitly store **E-vectors** (inter-element conti…
321 …-conforming elements: applying the node constraints via $\bm P$ so that the **L-vector** can be pr…
322 The former can be done with the existing interface while the latter will require a generalization t…
324 These operations, $\bm{\mathcal{E}}$, $\bm{B}$, and $\bm{D}$, are combined with a {ref}`CeedOperato…
326 separately with a matching field name, basis ($\bm{B}$), element restriction ($\bm{\mathcal{E}}$), …
328 Otherwise the input/output will be read from/written to the specified **L-vector**.
330 With partial assembly, we first perform a setup stage where $\bm{D}$ is evaluated and stored.
334 ```{literalinclude} ../../../tests/t500-operator.c
335 :end-before: //! [Setup Create]
337 :start-after: //! [Setup Create]
340 ```{literalinclude} ../../../tests/t500-operator.c
341 :end-before: //! [Setup Set]
343 :start-after: //! [Setup Set]
346 ```{literalinclude} ../../../tests/t500-operator.c
347 :end-before: //! [Setup Apply]
349 :start-after: //! [Setup Apply]
352 …by operator `op_mass` and its {c:func}`CeedOperatorApply()` to the input **L-vector** `U` with out…
354 ```{literalinclude} ../../../tests/t500-operator.c
355 :end-before: //! [Operator Create]
357 :start-after: //! [Operator Create]
360 ```{literalinclude} ../../../tests/t500-operator.c
361 :end-before: //! [Operator Set]
363 :start-after: //! [Operator Set]
366 ```{literalinclude} ../../../tests/t500-operator.c
367 :end-before: //! [Operator Apply]
369 :start-after: //! [Operator Apply]
372 A number of function calls in the interface, such as {c:func}`CeedOperatorApply()`, are intended to…
374 For a true asynchronous call, one needs to provide the address of a user-defined variable.
375 Such a variable can be used later to explicitly wait for the completion of the operation.
379 libCEED provides a gallery of built-in {ref}`CeedQFunction`s in the {file}`gallery/` directory.
381 …a {ref}`CeedQFunction` via the gallery of available QFunctions, consider the selection of the {ref…
383 ```{literalinclude} ../../../tests/t410-qfunction.c
390 …nds that are packaged with the library and packaged separately (possibly as a binary containing pr…
393 ```{literalinclude} ../../../backends/ref/ceed-ref.c
394 :end-before: //! [Register]
396 :start-after: //! [Register]
399 typically in a library initializer or "constructor" that runs automatically.
406 …u're interested in packaging backends externally, and will work with you on a practical stability …