xref: /libCEED/doc/sphinx/source/libCEEDapi.md (revision af62e76e6931bb5c038cf477f5bd4dee4bc40f7c)
1bcb2dfaeSJed Brown# Interface Concepts
2bcb2dfaeSJed Brown
3bcb2dfaeSJed BrownThis page provides a brief description of the theoretical foundations and the
4bcb2dfaeSJed Brownpractical implementation of the libCEED library.
5bcb2dfaeSJed Brown
6bcb2dfaeSJed Brown(theoretical-framework)=
7bcb2dfaeSJed Brown
8bcb2dfaeSJed Brown## Theoretical Framework
9bcb2dfaeSJed Brown
10bcb2dfaeSJed BrownIn finite element formulations, the weak form of a Partial Differential Equation
11bcb2dfaeSJed Brown(PDE) is evaluated on a subdomain $\Omega_e$ (element) and the local results
12bcb2dfaeSJed Brownare composed into a larger system of equations that models the entire problem on
13bcb2dfaeSJed Brownthe global domain $\Omega$. In particular, when high-order finite elements or
14bcb2dfaeSJed Brownspectral elements are used, the resulting sparse matrix representation of the global
15bcb2dfaeSJed Brownoperator is computationally expensive, with respect to both the memory transfer and
16bcb2dfaeSJed Brownfloating point operations needed for its evaluation. libCEED provides an interface
17bcb2dfaeSJed Brownfor matrix-free operator description that enables efficient evaluation on a variety
18bcb2dfaeSJed Brownof computational device types (selectable at run time). We present here the notation
19bcb2dfaeSJed Brownand the mathematical formulation adopted in libCEED.
20bcb2dfaeSJed Brown
21bcb2dfaeSJed BrownWe start by considering the discrete residual $F(u)=0$ formulation
22bcb2dfaeSJed Brownin weak form. We first define the $L^2$ inner product between real-valued functions
23bcb2dfaeSJed Brown
24bcb2dfaeSJed Brown$$
25bcb2dfaeSJed Brown\langle v, u \rangle = \int_\Omega v u d \bm{x},
26bcb2dfaeSJed Brown$$
27bcb2dfaeSJed Brown
28bcb2dfaeSJed Brownwhere $\bm{x} \in \mathbb{R}^d \supset \Omega$.
29bcb2dfaeSJed Brown
30bcb2dfaeSJed BrownWe want to find $u$ in a suitable space $V_D$,
31bcb2dfaeSJed Brownsuch that
32bcb2dfaeSJed Brown
33bcb2dfaeSJed Brown$$
34bcb2dfaeSJed Brown\langle  \bm v,  \bm f(u) \rangle = \int_\Omega  \bm v \cdot  \bm f_0 (u, \nabla u) + \nabla \bm v :  \bm f_1 (u, \nabla u) = 0
35bcb2dfaeSJed Brown$$ (residual)
36bcb2dfaeSJed Brown
37bcb2dfaeSJed Brownfor all $\bm v$ in the corresponding homogeneous space $V_0$, where $\bm f_0$
38bcb2dfaeSJed Brownand $\bm f_1$ contain all possible sources in the problem. We notice here that
398791656fSJed Brown$\bm f_0$ represents all terms in {eq}`residual` which multiply the (possibly vector-valued) test
40bcb2dfaeSJed Brownfunction $\bm v$ and $\bm f_1$ all terms which multiply its gradient $\nabla \bm v$.
41bcb2dfaeSJed BrownFor an n-component problems in $d$ dimensions, $\bm f_0 \in \mathbb{R}^n$ and
42bcb2dfaeSJed Brown$\bm f_1 \in \mathbb{R}^{nd}$.
43bcb2dfaeSJed Brown
44bcb2dfaeSJed Brown:::{note}
45bcb2dfaeSJed BrownThe notation $\nabla \bm v \!:\! \bm f_1$ represents contraction over both
46bcb2dfaeSJed Brownfields and spatial dimensions while a single dot represents contraction in just one,
47bcb2dfaeSJed Brownwhich should be clear from context, e.g., $\bm v \cdot \bm f_0$ contracts only over
48bcb2dfaeSJed Brownfields.
49bcb2dfaeSJed Brown:::
50bcb2dfaeSJed Brown
51bcb2dfaeSJed Brown:::{note}
52bcb2dfaeSJed BrownIn the code, the function that represents the weak form at quadrature
53bcb2dfaeSJed Brownpoints is called the {ref}`CeedQFunction`. In the {ref}`Examples` provided with the
54bcb2dfaeSJed Brownlibrary (in the {file}`examples/` directory), we store the term $\bm f_0$ directly
55bcb2dfaeSJed Browninto `v`, and the term $\bm f_1$ directly into `dv` (which stands for
568791656fSJed Brown$\nabla \bm v$). If equation {eq}`residual` only presents a term of the
57bcb2dfaeSJed Browntype $\bm f_0$, the {ref}`CeedQFunction` will only have one output argument,
588791656fSJed Brownnamely `v`. If equation {eq}`residual` also presents a term of the type
59bcb2dfaeSJed Brown$\bm f_1$, then the {ref}`CeedQFunction` will have two output arguments, namely,
60bcb2dfaeSJed Brown`v` and `dv`.
61bcb2dfaeSJed Brown:::
62bcb2dfaeSJed Brown
63bcb2dfaeSJed Brown## Finite Element Operator Decomposition
64bcb2dfaeSJed Brown
65bcb2dfaeSJed BrownFinite element operators are typically defined through weak formulations of
66bcb2dfaeSJed Brownpartial differential equations that involve integration over a computational
67bcb2dfaeSJed Brownmesh. The required integrals are computed by splitting them as a sum over the
68bcb2dfaeSJed Brownmesh elements, mapping each element to a simple *reference* element (e.g. the
69bcb2dfaeSJed Brownunit square) and applying a quadrature rule in reference space.
70bcb2dfaeSJed Brown
71bcb2dfaeSJed BrownThis sequence of operations highlights an inherent hierarchical structure
72bcb2dfaeSJed Brownpresent in all finite element operators where the evaluation starts on *global
73bcb2dfaeSJed Brown(trial) degrees of freedom (dofs) or nodes on the whole mesh*, restricts to
74bcb2dfaeSJed Brown*dofs on subdomains* (groups of elements), then moves to independent
75bcb2dfaeSJed Brown*dofs on each element*, transitions to independent *quadrature points* in
76bcb2dfaeSJed Brownreference space, performs the integration, and then goes back in reverse order
77bcb2dfaeSJed Brownto global (test) degrees of freedom on the whole mesh.
78bcb2dfaeSJed Brown
79bcb2dfaeSJed BrownThis is illustrated below for the simple case of symmetric linear operator on
80bcb2dfaeSJed Brownthird order ($Q_3$) scalar continuous ($H^1$) elements, where we use
81bcb2dfaeSJed Brownthe notions **T-vector**, **L-vector**, **E-vector** and **Q-vector** to represent
82bcb2dfaeSJed Brownthe sets corresponding to the (true) degrees of freedom on the global mesh, the split
83bcb2dfaeSJed Brownlocal degrees of freedom on the subdomains, the split degrees of freedom on the
84bcb2dfaeSJed Brownmesh elements, and the values at quadrature points, respectively.
85bcb2dfaeSJed Brown
86bcb2dfaeSJed BrownWe refer to the operators that connect the different types of vectors as:
87bcb2dfaeSJed Brown
88bcb2dfaeSJed Brown- Subdomain restriction $\bm{P}$
890fe925dfSnbeams- Element restriction $\bm{\mathcal{E}}$
90bcb2dfaeSJed Brown- Basis (Dofs-to-Qpts) evaluator $\bm{B}$
91bcb2dfaeSJed Brown- Operator at quadrature points $\bm{D}$
92bcb2dfaeSJed Brown
93bcb2dfaeSJed BrownMore generally, when the test and trial space differ, they get their own
940fe925dfSnbeamsversions of $\bm{P}$, $\bm{\mathcal{E}}$ and $\bm{B}$.
95bcb2dfaeSJed Brown
96bcb2dfaeSJed Brown(fig-operator-decomp)=
97bcb2dfaeSJed Brown
980fe925dfSnbeams:::{figure} ../../img/libCEED.svg
99bcb2dfaeSJed BrownOperator Decomposition
100bcb2dfaeSJed Brown:::
101bcb2dfaeSJed Brown
102bcb2dfaeSJed BrownNote that in the case of adaptive mesh refinement (AMR), the restrictions
1030fe925dfSnbeams$\bm{P}$ and $\bm{\mathcal{E}}$ will involve not just extracting sub-vectors,
104bcb2dfaeSJed Brownbut evaluating values at constrained degrees of freedom through the AMR interpolation.
105bcb2dfaeSJed BrownThere can also be several levels of subdomains ($\bm P_1$, $\bm P_2$,
106bcb2dfaeSJed Brownetc.), and it may be convenient to split $\bm{D}$ as the product of several
107bcb2dfaeSJed Brownoperators ($\bm D_1$, $\bm D_2$, etc.).
108bcb2dfaeSJed Brown
109bcb2dfaeSJed Brown### Terminology and Notation
110bcb2dfaeSJed Brown
111bcb2dfaeSJed BrownVector representation/storage categories:
112bcb2dfaeSJed Brown
113bcb2dfaeSJed Brown- True degrees of freedom/unknowns, **T-vector**:
114bcb2dfaeSJed Brown
115bcb2dfaeSJed Brown  > - each unknown $i$ has exactly one copy, on exactly one processor, $rank(i)$
116bcb2dfaeSJed Brown  > - this is a non-overlapping vector decomposition
117bcb2dfaeSJed Brown  > - usually includes any essential (fixed) dofs.
118bcb2dfaeSJed Brown  >
119bcb2dfaeSJed Brown  > ```{image} ../../img/T-vector.svg
120bcb2dfaeSJed Brown  > ```
121bcb2dfaeSJed Brown
122bcb2dfaeSJed Brown- Local (w.r.t. processors) degrees of freedom/unknowns, **L-vector**:
123bcb2dfaeSJed Brown
124bcb2dfaeSJed Brown  > - each unknown $i$ has exactly one copy on each processor that owns an
125bcb2dfaeSJed Brown  >   element containing $i$
126bcb2dfaeSJed Brown  > - this is an overlapping vector decomposition with overlaps only across
127bcb2dfaeSJed Brown  >   different processors---there is no duplication of unknowns on a single
128bcb2dfaeSJed Brown  >   processor
129bcb2dfaeSJed Brown  > - the shared dofs/unknowns are the overlapping dofs, i.e. the ones that have
130bcb2dfaeSJed Brown  >   more than one copy, on different processors.
131bcb2dfaeSJed Brown  >
132bcb2dfaeSJed Brown  > ```{image} ../../img/L-vector.svg
133bcb2dfaeSJed Brown  > ```
134bcb2dfaeSJed Brown
135bcb2dfaeSJed Brown- Per element decomposition, **E-vector**:
136bcb2dfaeSJed Brown
137bcb2dfaeSJed Brown  > - each unknown $i$ has as many copies as the number of elements that contain
138bcb2dfaeSJed Brown  >   $i$
139bcb2dfaeSJed Brown  > - usually, the copies of the unknowns are grouped by the element they belong
140bcb2dfaeSJed Brown  >   to.
141bcb2dfaeSJed Brown  >
142bcb2dfaeSJed Brown  > ```{image} ../../img/E-vector.svg
143bcb2dfaeSJed Brown  > ```
144bcb2dfaeSJed Brown
145bcb2dfaeSJed Brown- In the case of AMR with hanging nodes (giving rise to hanging dofs):
146bcb2dfaeSJed Brown
147bcb2dfaeSJed Brown  > - the **L-vector** is enhanced with the hanging/dependent dofs
148bcb2dfaeSJed Brown  > - the additional hanging/dependent dofs are duplicated when they are shared
149bcb2dfaeSJed Brown  >   by multiple processors
150bcb2dfaeSJed Brown  > - this way, an **E-vector** can be derived from an **L-vector** without any
151bcb2dfaeSJed Brown  >   communications and without additional computations to derive the dependent
152bcb2dfaeSJed Brown  >   dofs
153bcb2dfaeSJed Brown  > - in other words, an entry in an **E-vector** is obtained by copying an entry
154bcb2dfaeSJed Brown  >   from the corresponding **L-vector**, optionally switching the sign of the
155bcb2dfaeSJed Brown  >   entry (for $H(\mathrm{div})$---and $H(\mathrm{curl})$-conforming spaces).
156bcb2dfaeSJed Brown  >
157bcb2dfaeSJed Brown  > ```{image} ../../img/L-vector-AMR.svg
158bcb2dfaeSJed Brown  > ```
159bcb2dfaeSJed Brown
160bcb2dfaeSJed Brown- In the case of variable order spaces:
161bcb2dfaeSJed Brown
162bcb2dfaeSJed Brown  > - the dependent dofs (usually on the higher-order side of a face/edge) can
163bcb2dfaeSJed Brown  >   be treated just like the hanging/dependent dofs case.
164bcb2dfaeSJed Brown
165bcb2dfaeSJed Brown- Quadrature point vector, **Q-vector**:
166bcb2dfaeSJed Brown
167bcb2dfaeSJed Brown  > - this is similar to **E-vector** where instead of dofs, the vector represents
168bcb2dfaeSJed Brown  >   values at quadrature points, grouped by element.
169bcb2dfaeSJed Brown
170bcb2dfaeSJed Brown- In many cases it is useful to distinguish two types of vectors:
171bcb2dfaeSJed Brown
172bcb2dfaeSJed Brown  > - **X-vector**, or **primal X-vector**, and **X'-vector**, or **dual X-vector**
173bcb2dfaeSJed Brown  > - here X can be any of the T, L, E, or Q categories
174bcb2dfaeSJed Brown  > - for example, the mass matrix operator maps a **T-vector** to a **T'-vector**
175bcb2dfaeSJed Brown  > - the solutions vector is a **T-vector**, and the RHS vector is a **T'-vector**
176bcb2dfaeSJed Brown  > - using the parallel prolongation operator, one can map the solution
177bcb2dfaeSJed Brown  >   **T-vector** to a solution **L-vector**, etc.
178bcb2dfaeSJed Brown
179bcb2dfaeSJed BrownOperator representation/storage/action categories:
180bcb2dfaeSJed Brown
181bcb2dfaeSJed Brown- Full true-dof parallel assembly, **TA**, or **A**:
182bcb2dfaeSJed Brown
183bcb2dfaeSJed Brown  > - ParCSR or similar format
184bcb2dfaeSJed Brown  > - the T in TA indicates that the data format represents an operator from a
185bcb2dfaeSJed Brown  >   **T-vector** to a **T'-vector**.
186bcb2dfaeSJed Brown
187bcb2dfaeSJed Brown- Full local assembly, **LA**:
188bcb2dfaeSJed Brown
189bcb2dfaeSJed Brown  > - CSR matrix on each rank
190bcb2dfaeSJed Brown  > - the parallel prolongation operator, $\bm{P}$, (and its transpose) should use
191bcb2dfaeSJed Brown  >   optimized matrix-free action
192bcb2dfaeSJed Brown  > - note that $\bm{P}$ is the operator mapping T-vectors to L-vectors.
193bcb2dfaeSJed Brown
194bcb2dfaeSJed Brown- Element matrix assembly, **EA**:
195bcb2dfaeSJed Brown
196bcb2dfaeSJed Brown  > - each element matrix is stored as a dense matrix
197bcb2dfaeSJed Brown  > - optimized element and parallel prolongation operators
198bcb2dfaeSJed Brown  > - note that the element prolongation operator is the mapping from an
199bcb2dfaeSJed Brown  >   **L-vector** to an **E-vector**.
200bcb2dfaeSJed Brown
201bcb2dfaeSJed Brown- Quadrature-point/partial assembly, **QA** or **PA**:
202bcb2dfaeSJed Brown
203bcb2dfaeSJed Brown  > - precompute and store $w\det(J)$ at all quadrature points in all mesh elements
204bcb2dfaeSJed Brown  > - the stored data can be viewed as a **Q-vector**.
205bcb2dfaeSJed Brown
206bcb2dfaeSJed Brown- Unassembled option,  **UA** or **U**:
207bcb2dfaeSJed Brown
208bcb2dfaeSJed Brown  > - no assembly step
209bcb2dfaeSJed Brown  > - the action uses directly the mesh node coordinates, and assumes specific
210bcb2dfaeSJed Brown  >   form of the coefficient, e.g. constant, piecewise-constant, or given as a
211bcb2dfaeSJed Brown  >   **Q-vector** (Q-coefficient).
212bcb2dfaeSJed Brown
213bcb2dfaeSJed Brown### Partial Assembly
214bcb2dfaeSJed Brown
215bcb2dfaeSJed BrownSince the global operator $\bm{A}$ is just a series of variational restrictions
2160fe925dfSnbeamswith $\bm{B}$, $\bm{\mathcal{E}}$ and $\bm{P}$, starting from its
217bcb2dfaeSJed Brownpoint-wise kernel $\bm{D}$, a "matvec" with $\bm{A}$ can be
218bcb2dfaeSJed Brownperformed by evaluating and storing some of the innermost variational restriction
219bcb2dfaeSJed Brownmatrices, and applying the rest of the operators "on-the-fly". For example, one can
220bcb2dfaeSJed Browncompute and store a global matrix on **T-vector** level. Alternatively, one can compute
221bcb2dfaeSJed Brownand store only the subdomain (**L-vector**) or element (**E-vector**) matrices and
222bcb2dfaeSJed Brownperform the action of $\bm{A}$ using matvecs with $\bm{P}$ or
2230fe925dfSnbeams$\bm{P}$ and $\bm{\mathcal{E}}$. While these options are natural for
224bcb2dfaeSJed Brownlow-order discretizations, they are not a good fit for high-order methods due to
225bcb2dfaeSJed Brownthe amount of FLOPs needed for their evaluation, as well as the memory transfer
226bcb2dfaeSJed Brownneeded for a matvec.
227bcb2dfaeSJed Brown
228bcb2dfaeSJed BrownOur focus in libCEED, instead, is on **partial assembly**, where we compute and
229bcb2dfaeSJed Brownstore only $\bm{D}$ (or portions of it) and evaluate the actions of
2300fe925dfSnbeams$\bm{P}$, $\bm{\mathcal{E}}$ and $\bm{B}$ on-the-fly.
231bcb2dfaeSJed BrownCritically for performance, we take advantage of the tensor-product structure of the
232bcb2dfaeSJed Browndegrees of freedom and quadrature points on *quad* and *hex* elements to perform the
233bcb2dfaeSJed Brownaction of $\bm{B}$ without storing it as a matrix.
234bcb2dfaeSJed Brown
235bcb2dfaeSJed BrownImplemented properly, the partial assembly algorithm requires optimal amount of
236bcb2dfaeSJed Brownmemory transfers (with respect to the polynomial order) and near-optimal FLOPs
237bcb2dfaeSJed Brownfor operator evaluation. It consists of an operator *setup* phase, that
238bcb2dfaeSJed Brownevaluates and stores $\bm{D}$ and an operator *apply* (evaluation) phase that
239bcb2dfaeSJed Browncomputes the action of $\bm{A}$ on an input vector. When desired, the setup
240bcb2dfaeSJed Brownphase may be done as a side-effect of evaluating a different operator, such as a
241bcb2dfaeSJed Brownnonlinear residual. The relative costs of the setup and apply phases are
242bcb2dfaeSJed Browndifferent depending on the physics being expressed and the representation of
243bcb2dfaeSJed Brown$\bm{D}$.
244bcb2dfaeSJed Brown
245bcb2dfaeSJed Brown### Parallel Decomposition
246bcb2dfaeSJed Brown
247bcb2dfaeSJed BrownAfter the application of each of the first three transition operators,
2480fe925dfSnbeams$\bm{P}$, $\bm{\mathcal{E}}$ and $\bm{B}$, the operator evaluation
2490fe925dfSnbeamsis decoupled  on their ranges, so $\bm{P}$, $\bm{\mathcal{E}}$ and
250bcb2dfaeSJed Brown$\bm{B}$ allow us to "zoom-in" to subdomain, element and quadrature point
251bcb2dfaeSJed Brownlevel, ignoring the coupling at higher levels.
252bcb2dfaeSJed Brown
253bcb2dfaeSJed BrownThus, a natural mapping of $\bm{A}$ on a parallel computer is to split the
254bcb2dfaeSJed Brown**T-vector** over MPI ranks (a non-overlapping decomposition, as is typically
255bcb2dfaeSJed Brownused for sparse matrices), and then split the rest of the vector types over
256bcb2dfaeSJed Browncomputational devices (CPUs, GPUs, etc.) as indicated by the shaded regions in
257bcb2dfaeSJed Brownthe diagram above.
258bcb2dfaeSJed Brown
259bcb2dfaeSJed BrownOne of the advantages of the decomposition perspective in these settings is that
2600fe925dfSnbeamsthe operators $\bm{P}$, $\bm{\mathcal{E}}$, $\bm{B}$ and
261bcb2dfaeSJed Brown$\bm{D}$ clearly separate the MPI parallelism
262bcb2dfaeSJed Brownin the operator ($\bm{P}$) from the unstructured mesh topology
2630fe925dfSnbeams($\bm{\mathcal{E}}$), the choice of the finite element space/basis ($\bm{B}$)
264bcb2dfaeSJed Brownand the geometry and point-wise physics $\bm{D}$. These components also
265bcb2dfaeSJed Brownnaturally fall in different classes of numerical algorithms -- parallel (multi-device)
266bcb2dfaeSJed Brownlinear algebra for $\bm{P}$, sparse (on-device) linear algebra for
2670fe925dfSnbeams$\bm{\mathcal{E}}$, dense/structured linear algebra (tensor contractions) for
268bcb2dfaeSJed Brown$\bm{B}$ and parallel point-wise evaluations for $\bm{D}$.
269bcb2dfaeSJed Brown
270bcb2dfaeSJed BrownCurrently in libCEED, it is assumed that the host application manages the global
271bcb2dfaeSJed Brown**T-vectors** and the required communications among devices (which are generally
272bcb2dfaeSJed Brownon different compute nodes) with **P**. Our API is thus focused on the
273bcb2dfaeSJed Brown**L-vector** level, where the logical devices, which in the library are
274bcb2dfaeSJed Brownrepresented by the {ref}`Ceed` object, are independent. Each MPI rank can use one or
275bcb2dfaeSJed Brownmore {ref}`Ceed`s, and each {ref}`Ceed`, in turn, can represent one or more physical
276bcb2dfaeSJed Browndevices, as long as libCEED backends support such configurations. The idea is
277bcb2dfaeSJed Brownthat every MPI rank can use any logical device it is assigned at runtime. For
278bcb2dfaeSJed Brownexample, on a node with 2 CPU sockets and 4 GPUs, one may decide to use 6 MPI
279bcb2dfaeSJed Brownranks (each using a single {ref}`Ceed` object): 2 ranks using 1 CPU socket each, and
280bcb2dfaeSJed Brown4 using 1 GPU each. Another choice could be to run 1 MPI rank on the whole node
281bcb2dfaeSJed Brownand use 5 {ref}`Ceed` objects: 1 managing all CPU cores on the 2 sockets and 4
282bcb2dfaeSJed Brownmanaging 1 GPU each. The communications among the devices, e.g. required for
283bcb2dfaeSJed Brownapplying the action of $\bm{P}$, are currently out of scope of libCEED. The
284bcb2dfaeSJed Browninterface is non-blocking for all operations involving more than O(1) data,
285bcb2dfaeSJed Brownallowing operations performed on a coprocessor or worker threads to overlap with
286bcb2dfaeSJed Brownoperations on the host.
287bcb2dfaeSJed Brown
288bcb2dfaeSJed Brown## API Description
289bcb2dfaeSJed Brown
290bcb2dfaeSJed BrownThe libCEED API takes an algebraic approach, where the user essentially
2910fe925dfSnbeamsdescribes in the *frontend* the operators $\bm{\bm{\mathcal{E}}}$, $\bm{B}$, and $\bm{D}$ and the library
292bcb2dfaeSJed Brownprovides *backend* implementations and coordinates their action to the original
293bcb2dfaeSJed Brownoperator on **L-vector** level (i.e. independently on each device / MPI task).
29402076a18SnbeamsThis is visualized in the schematic below; "active" and "passive" inputs/outputs
29502076a18Snbeamswill be discussed in more detail later.
29602076a18Snbeams
29702076a18Snbeams(fig-operator-schematic)=
29802076a18Snbeams
29902076a18Snbeams:::{figure} ../../img/libceed_schematic.svg
30002076a18SnbeamsFlow of data through vector types inside libCEED Operators, through backend implementations
30102076a18Snbeamsof $\bm{\bm{\mathcal{E}}}$, $\bm{B}$, and $\bm{D}$
30202076a18Snbeams:::
303bcb2dfaeSJed Brown
304bcb2dfaeSJed BrownOne of the advantages of this purely algebraic description is that it already
305bcb2dfaeSJed Brownincludes all the finite element information, so the backends can operate on
306bcb2dfaeSJed Brownlinear algebra level without explicit finite element code. The frontend
307bcb2dfaeSJed Browndescription is general enough to support a wide variety of finite element
308bcb2dfaeSJed Brownalgorithms, as well as some other types algorithms such as spectral finite
309bcb2dfaeSJed Browndifferences. The separation of the front- and backends enables applications to
310bcb2dfaeSJed Browneasily switch/try different backends. It also enables backend developers to
311bcb2dfaeSJed Brownimpact many applications from a single implementation.
312bcb2dfaeSJed Brown
313bcb2dfaeSJed BrownOur long-term vision is to include a variety of backend implementations in
314bcb2dfaeSJed BrownlibCEED, ranging from reference kernels to highly optimized kernels targeting
315bcb2dfaeSJed Brownspecific devices (e.g. GPUs) or specific polynomial orders. A simple reference
316bcb2dfaeSJed Brownbackend implementation is provided in the file
317bcb2dfaeSJed Brown[ceed-ref.c](https://github.com/CEED/libCEED/blob/main/backends/ref/ceed-ref.c).
318bcb2dfaeSJed Brown
31952006392Snbeams
320bcb2dfaeSJed BrownOn the frontend, the mapping between the decomposition concepts and the code
321bcb2dfaeSJed Brownimplementation is as follows:
322bcb2dfaeSJed Brown
323bcb2dfaeSJed Brown- **L-**, **E-** and **Q-vector** are represented as variables of type {ref}`CeedVector`.
324bcb2dfaeSJed Brown  (A backend may choose to operate incrementally without forming explicit **E-** or
325bcb2dfaeSJed Brown  **Q-vectors**.)
3260fe925dfSnbeams- $\bm{\mathcal{E}}$ is represented as variable of type {ref}`CeedElemRestriction`.
327bcb2dfaeSJed Brown- $\bm{B}$ is represented as variable of type {ref}`CeedBasis`.
328bcb2dfaeSJed Brown- the action of $\bm{D}$ is represented as variable of type {ref}`CeedQFunction`.
3290fe925dfSnbeams- the overall operator $\bm{\mathcal{E}}^T \bm{B}^T \bm{D} \bm{B} \bm{\mathcal{E}}$
330bcb2dfaeSJed Brown  is represented as variable of type
331bcb2dfaeSJed Brown  {ref}`CeedOperator` and its action is accessible through {c:func}`CeedOperatorApply()`.
332bcb2dfaeSJed Brown
333bcb2dfaeSJed BrownTo clarify these concepts and illustrate how they are combined in the API,
334bcb2dfaeSJed Brownconsider the implementation of the action of a simple 1D mass matrix
335bcb2dfaeSJed Brown(cf. [tests/t500-operator.c](https://github.com/CEED/libCEED/blob/main/tests/t500-operator.c)).
336bcb2dfaeSJed Brown
337bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
338bcb2dfaeSJed Brown:language: c
339bcb2dfaeSJed Brown:linenos: true
340bcb2dfaeSJed Brown```
34102076a18SnbeamsIn the following figure, we specialize the schematic used above for general operators so that
342*af62e76eSnbeamsit corresponds to the specific setup and mass operators as implemented in the sample code. We show
343*af62e76eSnbeamsthat the active output of the setup operator, combining the quadrature weights with the Jacobian
344*af62e76eSnbeamsinformation for the mesh transformation, becomes a passive input to the mass operator.  Notations
345*af62e76eSnbeamsdenote the libCEED function used to set the properties of the input and output fields.
34602076a18Snbeams
34702076a18Snbeams(fig-operator-schematic-mass)=
34802076a18Snbeams
349*af62e76eSnbeams:::{figure} ../../img/libceed_schematic_op_setup_mass.svg
35002076a18SnbeamsSpecific combination of $\bm{\bm{\mathcal{E}}}$, $\bm{B}$, $\bm{D}$, and input/output vectors
351*af62e76eSnbeamscorresponding to the libCEED operators in the t500-operator test
35202076a18Snbeams:::
353bcb2dfaeSJed Brown
354bcb2dfaeSJed BrownThe constructor
355bcb2dfaeSJed Brown
356bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
357bcb2dfaeSJed Brown:end-at: CeedInit
358bcb2dfaeSJed Brown:language: c
359bcb2dfaeSJed Brown:start-at: CeedInit
360bcb2dfaeSJed Brown```
361bcb2dfaeSJed Brown
362bcb2dfaeSJed Browncreates a logical device `ceed` on the specified *resource*, which could also be
363bcb2dfaeSJed Browna coprocessor such as `"/nvidia/0"`. There can be any number of such devices,
364bcb2dfaeSJed Brownincluding multiple logical devices driving the same resource (though performance
365bcb2dfaeSJed Brownmay suffer in case of oversubscription). The resource is used to locate a
366bcb2dfaeSJed Brownsuitable backend which will have discretion over the implementations of all
367bcb2dfaeSJed Brownobjects created with this logical device.
368bcb2dfaeSJed Brown
369bcb2dfaeSJed BrownThe `setup` routine above computes and stores $\bm{D}$, in this case a
370bcb2dfaeSJed Brownscalar value in each quadrature point, while `mass` uses these saved values to perform
371bcb2dfaeSJed Brownthe action of $\bm{D}$. These functions are turned into the {ref}`CeedQFunction`
372bcb2dfaeSJed Brownvariables `qf_setup` and `qf_mass` in the {c:func}`CeedQFunctionCreateInterior()` calls:
373bcb2dfaeSJed Brown
374bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
375bcb2dfaeSJed Brown:end-before: //! [QFunction Create]
376bcb2dfaeSJed Brown:language: c
377bcb2dfaeSJed Brown:start-after: //! [QFunction Create]
378bcb2dfaeSJed Brown```
379bcb2dfaeSJed Brown
380bcb2dfaeSJed BrownA {ref}`CeedQFunction` performs independent operations at each quadrature point and
381bcb2dfaeSJed Brownthe interface is intended to facilitate vectorization.  The second argument is
382bcb2dfaeSJed Brownan expected vector length. If greater than 1, the caller must ensure that the
383bcb2dfaeSJed Brownnumber of quadrature points `Q` is divisible by the vector length. This is
384bcb2dfaeSJed Brownoften satisfied automatically due to the element size or by batching elements
385bcb2dfaeSJed Browntogether to facilitate vectorization in other stages, and can always be ensured
386bcb2dfaeSJed Brownby padding.
387bcb2dfaeSJed Brown
388bcb2dfaeSJed BrownIn addition to the function pointers (`setup` and `mass`), {ref}`CeedQFunction`
389bcb2dfaeSJed Brownconstructors take a string representation specifying where the source for the
390bcb2dfaeSJed Brownimplementation is found. This is used by backends that support Just-In-Time
391bcb2dfaeSJed Brown(JIT) compilation (i.e., CUDA and OCCA) to compile for coprocessors.
392bcb2dfaeSJed BrownFor full support across all backends, these {ref}`CeedQFunction` source files must only contain constructs mutually supported by C99, C++11, and CUDA.
393bcb2dfaeSJed BrownFor example, explicit type casting of void pointers and explicit use of compatible arguments for {code}`math` library functions is required, and variable-length array (VLA) syntax for array reshaping is only available via libCEED's {code}`CEED_Q_VLA` macro.
394bcb2dfaeSJed Brown
395bcb2dfaeSJed BrownDifferent input and output fields are added individually, specifying the field
396bcb2dfaeSJed Brownname, size of the field, and evaluation mode.
397bcb2dfaeSJed Brown
398bcb2dfaeSJed BrownThe size of the field is provided by a combination of the number of components
399bcb2dfaeSJed Brownthe effect of any basis evaluations.
400bcb2dfaeSJed Brown
401bcb2dfaeSJed BrownThe evaluation mode (see {ref}`CeedBasis-Typedefs and Enumerations`) `CEED_EVAL_INTERP`
402bcb2dfaeSJed Brownfor both input and output fields indicates that the mass operator only contains terms of
403bcb2dfaeSJed Brownthe form
404bcb2dfaeSJed Brown
405bcb2dfaeSJed Brown$$
406bcb2dfaeSJed Brown\int_\Omega v \cdot f_0 (u, \nabla u)
407bcb2dfaeSJed Brown$$
408bcb2dfaeSJed Brown
409bcb2dfaeSJed Brownwhere $v$ are test functions (see the {ref}`theoretical-framework`).
410bcb2dfaeSJed BrownMore general operators, such as those of the form
411bcb2dfaeSJed Brown
412bcb2dfaeSJed Brown$$
413bcb2dfaeSJed Brown\int_\Omega v \cdot f_0 (u, \nabla u) + \nabla v : f_1 (u, \nabla u)
414bcb2dfaeSJed Brown$$
415bcb2dfaeSJed Brown
416bcb2dfaeSJed Browncan be expressed.
417bcb2dfaeSJed Brown
418bcb2dfaeSJed BrownFor fields with derivatives, such as with the basis evaluation mode
419bcb2dfaeSJed Brown(see {ref}`CeedBasis-Typedefs and Enumerations`) `CEED_EVAL_GRAD`, the size of the
420bcb2dfaeSJed Brownfield needs to reflect both the number of components and the geometric dimension.
421bcb2dfaeSJed BrownA 3-dimensional gradient on four components would therefore mean the field has a size of
422bcb2dfaeSJed Brown12\.
423bcb2dfaeSJed Brown
424bcb2dfaeSJed BrownThe $\bm{B}$ operators for the mesh nodes, `basis_x`, and the unknown field,
425bcb2dfaeSJed Brown`basis_u`, are defined in the calls to the function {c:func}`CeedBasisCreateTensorH1Lagrange()`.
426bcb2dfaeSJed BrownIn this example, both the mesh and the unknown field use $H^1$ Lagrange finite
427bcb2dfaeSJed Brownelements of order 1 and 4 respectively (the `P` argument represents the number of 1D
428bcb2dfaeSJed Browndegrees of freedom on each element). Both basis operators use the same integration rule,
429bcb2dfaeSJed Brownwhich is Gauss-Legendre with 8 points (the `Q` argument).
430bcb2dfaeSJed Brown
431bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
432bcb2dfaeSJed Brown:end-before: //! [Basis Create]
433bcb2dfaeSJed Brown:language: c
434bcb2dfaeSJed Brown:start-after: //! [Basis Create]
435bcb2dfaeSJed Brown```
436bcb2dfaeSJed Brown
437bcb2dfaeSJed BrownOther elements with this structure can be specified in terms of the `Q×P`
438bcb2dfaeSJed Brownmatrices that evaluate values and gradients at quadrature points in one
439bcb2dfaeSJed Browndimension using {c:func}`CeedBasisCreateTensorH1()`. Elements that do not have tensor
440bcb2dfaeSJed Brownproduct structure, such as symmetric elements on simplices, will be created
441bcb2dfaeSJed Brownusing different constructors.
442bcb2dfaeSJed Brown
4430fe925dfSnbeamsThe $\bm{\mathcal{E}}$ operators for the mesh nodes, `elem_restr_x`, and the unknown field,
444bcb2dfaeSJed Brown`elem_restr_u`, are specified in the {c:func}`CeedElemRestrictionCreate()`. Both of these
445bcb2dfaeSJed Brownspecify directly the dof indices for each element in the `ind_x` and `ind_u`
446bcb2dfaeSJed Brownarrays:
447bcb2dfaeSJed Brown
448bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
449bcb2dfaeSJed Brown:end-before: //! [ElemRestr Create]
450bcb2dfaeSJed Brown:language: c
451bcb2dfaeSJed Brown:start-after: //! [ElemRestr Create]
452bcb2dfaeSJed Brown```
453bcb2dfaeSJed Brown
454bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
455bcb2dfaeSJed Brown:end-before: //! [ElemRestrU Create]
456bcb2dfaeSJed Brown:language: c
457bcb2dfaeSJed Brown:start-after: //! [ElemRestrU Create]
458bcb2dfaeSJed Brown```
459bcb2dfaeSJed Brown
460bcb2dfaeSJed BrownIf the user has arrays available on a device, they can be provided using
461bcb2dfaeSJed Brown`CEED_MEM_DEVICE`. This technique is used to provide no-copy interfaces in all
462bcb2dfaeSJed Browncontexts that involve problem-sized data.
463bcb2dfaeSJed Brown
464bcb2dfaeSJed BrownFor discontinuous Galerkin and for applications such as Nek5000 that only
465bcb2dfaeSJed Brownexplicitly store **E-vectors** (inter-element continuity has been subsumed by
4660fe925dfSnbeamsthe parallel restriction $\bm{P}$), the element restriction $\bm{\mathcal{E}}$
467bcb2dfaeSJed Brownis the identity and {c:func}`CeedElemRestrictionCreateStrided()` is used instead.
4680fe925dfSnbeamsWe plan to support other structured representations of $\bm{\mathcal{E}}$ which will
469bcb2dfaeSJed Brownbe added according to demand.
4700fe925dfSnbeamsThere are two common approaches for supporting non-conforming elements: applying the node constraints via $\bm P$ so that the **L-vector** can be processed uniformly and applying the constraints via $\bm{\mathcal{E}}$ so that the **E-vector** is uniform.
471bcb2dfaeSJed BrownThe former can be done with the existing interface while the latter will require a generalization to element restriction that would define field values at constrained nodes as linear combinations of the values at primary nodes.
472bcb2dfaeSJed Brown
473bcb2dfaeSJed BrownThese operations, $\bm{P}$, $\bm{B}$, and $\bm{D}$,
474bcb2dfaeSJed Brownare combined with a {ref}`CeedOperator`. As with {ref}`CeedQFunction`s, operator fields are added
475bcb2dfaeSJed Brownseparately with a matching field name, basis ($\bm{B}$), element restriction
4760fe925dfSnbeams($\bm{\mathcal{E}}$), and **L-vector**. The flag
477bcb2dfaeSJed Brown`CEED_VECTOR_ACTIVE` indicates that the vector corresponding to that field will
478bcb2dfaeSJed Brownbe provided to the operator when {c:func}`CeedOperatorApply()` is called. Otherwise the
479bcb2dfaeSJed Browninput/output will be read from/written to the specified **L-vector**.
480bcb2dfaeSJed Brown
481bcb2dfaeSJed BrownWith partial assembly, we first perform a setup stage where $\bm{D}$ is evaluated
482bcb2dfaeSJed Brownand stored. This is accomplished by the operator `op_setup` and its application
483bcb2dfaeSJed Brownto `X`, the nodes of the mesh (these are needed to compute Jacobians at
484bcb2dfaeSJed Brownquadrature points). Note that the corresponding {c:func}`CeedOperatorApply()` has no basis
485bcb2dfaeSJed Brownevaluation on the output, as the quadrature data is not needed at the dofs:
486bcb2dfaeSJed Brown
487bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
488bcb2dfaeSJed Brown:end-before: //! [Setup Create]
489bcb2dfaeSJed Brown:language: c
490bcb2dfaeSJed Brown:start-after: //! [Setup Create]
491bcb2dfaeSJed Brown```
492bcb2dfaeSJed Brown
493bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
494bcb2dfaeSJed Brown:end-before: //! [Setup Set]
495bcb2dfaeSJed Brown:language: c
496bcb2dfaeSJed Brown:start-after: //! [Setup Set]
497bcb2dfaeSJed Brown```
498bcb2dfaeSJed Brown
499bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
500bcb2dfaeSJed Brown:end-before: //! [Setup Apply]
501bcb2dfaeSJed Brown:language: c
502bcb2dfaeSJed Brown:start-after: //! [Setup Apply]
503bcb2dfaeSJed Brown```
504bcb2dfaeSJed Brown
505bcb2dfaeSJed BrownThe action of the operator is then represented by operator `op_mass` and its
506bcb2dfaeSJed Brown{c:func}`CeedOperatorApply()` to the input **L-vector** `U` with output in `V`:
507bcb2dfaeSJed Brown
508bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
509bcb2dfaeSJed Brown:end-before: //! [Operator Create]
510bcb2dfaeSJed Brown:language: c
511bcb2dfaeSJed Brown:start-after: //! [Operator Create]
512bcb2dfaeSJed Brown```
513bcb2dfaeSJed Brown
514bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
515bcb2dfaeSJed Brown:end-before: //! [Operator Set]
516bcb2dfaeSJed Brown:language: c
517bcb2dfaeSJed Brown:start-after: //! [Operator Set]
518bcb2dfaeSJed Brown```
519bcb2dfaeSJed Brown
520bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
521bcb2dfaeSJed Brown:end-before: //! [Operator Apply]
522bcb2dfaeSJed Brown:language: c
523bcb2dfaeSJed Brown:start-after: //! [Operator Apply]
524bcb2dfaeSJed Brown```
525bcb2dfaeSJed Brown
526bcb2dfaeSJed BrownA number of function calls in the interface, such as {c:func}`CeedOperatorApply()`, are
527bcb2dfaeSJed Brownintended to support asynchronous execution via their last argument,
528bcb2dfaeSJed Brown`CeedRequest*`. The specific (pointer) value used in the above example,
529bcb2dfaeSJed Brown`CEED_REQUEST_IMMEDIATE`, is used to express the request (from the user) for the
530bcb2dfaeSJed Brownoperation to complete before returning from the function call, i.e. to make sure
531bcb2dfaeSJed Brownthat the result of the operation is available in the output parameters
532bcb2dfaeSJed Brownimmediately after the call. For a true asynchronous call, one needs to provide
533bcb2dfaeSJed Brownthe address of a user defined variable. Such a variable can be used later to
534bcb2dfaeSJed Brownexplicitly wait for the completion of the operation.
535bcb2dfaeSJed Brown
536bcb2dfaeSJed Brown## Gallery of QFunctions
537bcb2dfaeSJed Brown
538bcb2dfaeSJed BrownLibCEED provides a gallery of built-in {ref}`CeedQFunction`s in the {file}`gallery/` directory.
539bcb2dfaeSJed BrownThe available QFunctions are the ones associated with the mass, the Laplacian, and
540bcb2dfaeSJed Brownthe identity operators. To illustrate how the user can declare a {ref}`CeedQFunction`
541bcb2dfaeSJed Brownvia the gallery of available QFunctions, consider the selection of the
542bcb2dfaeSJed Brown{ref}`CeedQFunction` associated with a simple 1D mass matrix
543bcb2dfaeSJed Brown(cf. [tests/t410-qfunction.c](https://github.com/CEED/libCEED/blob/main/tests/t410-qfunction.c)).
544bcb2dfaeSJed Brown
545bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t410-qfunction.c
546bcb2dfaeSJed Brown:language: c
547bcb2dfaeSJed Brown:linenos: true
548bcb2dfaeSJed Brown```
549bcb2dfaeSJed Brown
550bcb2dfaeSJed Brown## Interface Principles and Evolution
551bcb2dfaeSJed Brown
552bcb2dfaeSJed BrownLibCEED is intended to be extensible via backends that are packaged with the
553bcb2dfaeSJed Brownlibrary and packaged separately (possibly as a binary containing proprietary
554bcb2dfaeSJed Browncode). Backends are registered by calling
555bcb2dfaeSJed Brown
556bcb2dfaeSJed Brown```{literalinclude} ../../../backends/ref/ceed-ref.c
557bcb2dfaeSJed Brown:end-before: //! [Register]
558bcb2dfaeSJed Brown:language: c
559bcb2dfaeSJed Brown:start-after: //! [Register]
560bcb2dfaeSJed Brown```
561bcb2dfaeSJed Brown
562bcb2dfaeSJed Browntypically in a library initializer or "constructor" that runs automatically.
563bcb2dfaeSJed Brown`CeedInit` uses this prefix to find an appropriate backend for the resource.
564bcb2dfaeSJed Brown
565bcb2dfaeSJed BrownSource (API) and binary (ABI) stability are important to libCEED. Prior to
566bcb2dfaeSJed Brownreaching version 1.0, libCEED does not implement strict [semantic versioning](https://semver.org) across the entire interface. However, user code,
567bcb2dfaeSJed Brownincluding libraries of {ref}`CeedQFunction`s, should be source and binary
568bcb2dfaeSJed Browncompatible moving from 0.x.y to any later release 0.x.z. We have less experience
569bcb2dfaeSJed Brownwith external packaging of backends and do not presently guarantee source or
570bcb2dfaeSJed Brownbinary stability, but we intend to define stability guarantees for libCEED 1.0.
571bcb2dfaeSJed BrownWe'd love to talk with you if you're interested in packaging backends
572bcb2dfaeSJed Brownexternally, and will work with you on a practical stability policy.
573