xref: /libCEED/doc/sphinx/source/libCEEDapi.md (revision 48203ff5b2e76d34bb2203dfd66749ca64137ac2)
1bcb2dfaeSJed Brown# Interface Concepts
2bcb2dfaeSJed Brown
317be3a41SJeremy L ThompsonThis page provides a brief description of the theoretical foundations and the practical implementation of the libCEED library.
4bcb2dfaeSJed Brown
5bcb2dfaeSJed Brown(theoretical-framework)=
6bcb2dfaeSJed Brown
7bcb2dfaeSJed Brown## Theoretical Framework
8bcb2dfaeSJed Brown
917be3a41SJeremy L ThompsonIn finite element formulations, the weak form of a Partial Differential Equation (PDE) is evaluated on a subdomain $\Omega_e$ (element) and the local results are composed into a larger system of equations that models the entire problem on the global domain $\Omega$.
1017be3a41SJeremy L ThompsonIn particular, when high-order finite elements or spectral elements are used, the resulting sparse matrix representation of the global operator is computationally expensive, with respect to both the memory transfer and floating point operations needed for its evaluation.
1117be3a41SJeremy L ThompsonlibCEED provides an interface for matrix-free operator description that enables efficient evaluation on a variety of computational device types (selectable at run time).
1217be3a41SJeremy L ThompsonWe present here the notation and the mathematical formulation adopted in libCEED.
13bcb2dfaeSJed Brown
1417be3a41SJeremy L ThompsonWe start by considering the discrete residual $F(u)=0$ formulation in weak form.
1517be3a41SJeremy L ThompsonWe first define the $L^2$ inner product between real-valued functions
16bcb2dfaeSJed Brown
17bcb2dfaeSJed Brown$$
18bcb2dfaeSJed Brown\langle v, u \rangle = \int_\Omega v u d \bm{x},
19bcb2dfaeSJed Brown$$
20bcb2dfaeSJed Brown
21bcb2dfaeSJed Brownwhere $\bm{x} \in \mathbb{R}^d \supset \Omega$.
22bcb2dfaeSJed Brown
2317be3a41SJeremy L ThompsonWe want to find $u$ in a suitable space $V_D$, such that
24bcb2dfaeSJed Brown
25bcb2dfaeSJed Brown$$
26bcb2dfaeSJed Brown\langle  \bm v,  \bm f(u) \rangle = \int_\Omega  \bm v \cdot  \bm f_0 (u, \nabla u) + \nabla \bm v :  \bm f_1 (u, \nabla u) = 0
27bcb2dfaeSJed Brown$$ (residual)
28bcb2dfaeSJed Brown
2917be3a41SJeremy L Thompsonfor all $\bm v$ in the corresponding homogeneous space $V_0$, where $\bm f_0$ and $\bm f_1$ contain all possible sources in the problem.
3017be3a41SJeremy L ThompsonWe notice here that $\bm f_0$ represents all terms in {eq}`residual` which multiply the (possibly vector-valued) test function $\bm v$ and $\bm f_1$ all terms which multiply its gradient $\nabla \bm v$.
3117be3a41SJeremy L ThompsonFor an n-component problems in $d$ dimensions, $\bm f_0 \in \mathbb{R}^n$ and $\bm f_1 \in \mathbb{R}^{nd}$.
32bcb2dfaeSJed Brown
33bcb2dfaeSJed Brown:::{note}
3417be3a41SJeremy L ThompsonThe notation $\nabla \bm v \!:\! \bm f_1$ represents contraction over both fields and spatial dimensions while a single dot represents contraction in just one, which should be clear from context, e.g., $\bm v \cdot \bm f_0$ contracts only over fields.
35bcb2dfaeSJed Brown:::
36bcb2dfaeSJed Brown
37bcb2dfaeSJed Brown:::{note}
3817be3a41SJeremy L ThompsonIn the code, the function that represents the weak form at quadrature points is called the {ref}`CeedQFunction`.
3917be3a41SJeremy L ThompsonIn the {ref}`Examples` provided with the library (in the {file}`examples/` directory), we store the term $\bm f_0$ directly into `v`, and the term $\bm f_1$ directly into `dv` (which stands for $\nabla \bm v$).
4017be3a41SJeremy L ThompsonIf equation {eq}`residual` only presents a term of the type $\bm f_0$, the {ref}`CeedQFunction` will only have one output argument, namely `v`.
4117be3a41SJeremy L ThompsonIf equation {eq}`residual` also presents a term of the type $\bm f_1$, then the {ref}`CeedQFunction` will have two output arguments, namely, `v` and `dv`.
42bcb2dfaeSJed Brown:::
43bcb2dfaeSJed Brown
44bcb2dfaeSJed Brown## Finite Element Operator Decomposition
45bcb2dfaeSJed Brown
4617be3a41SJeremy L ThompsonFinite element operators are typically defined through weak formulations of partial differential equations that involve integration over a computational mesh.
4717be3a41SJeremy L ThompsonThe required integrals are computed by splitting them as a sum over the mesh elements, mapping each element to a simple *reference* element (e.g. the unit square) and applying a quadrature rule in reference space.
48bcb2dfaeSJed Brown
49ac5aa7bcSJeremy L ThompsonThis sequence of operations highlights an inherent hierarchical structure present in all finite element operators where the evaluation starts on *global (trial) degrees of freedom (DoFs) or nodes on the whole mesh*, restricts to *DoFs on subdomains* (groups of elements), then moves to independent *DoFs on each element*, transitions to independent *quadrature points* in reference space, performs the integration, and then goes back in reverse order to global (test) degrees of freedom on the whole mesh.
50bcb2dfaeSJed Brown
5117be3a41SJeremy L ThompsonThis is illustrated below for the simple case of symmetric linear operator on third order ($Q_3$) scalar continuous ($H^1$) elements, where we use the notions **T-vector**, **L-vector**, **E-vector** and **Q-vector** to represent the sets corresponding to the (true) degrees of freedom on the global mesh, the split local degrees of freedom on the subdomains, the split degrees of freedom on the mesh elements, and the values at quadrature points, respectively.
52bcb2dfaeSJed Brown
53bcb2dfaeSJed BrownWe refer to the operators that connect the different types of vectors as:
54bcb2dfaeSJed Brown
55bcb2dfaeSJed Brown- Subdomain restriction $\bm{P}$
560fe925dfSnbeams- Element restriction $\bm{\mathcal{E}}$
57bcb2dfaeSJed Brown- Basis (Dofs-to-Qpts) evaluator $\bm{B}$
58bcb2dfaeSJed Brown- Operator at quadrature points $\bm{D}$
59bcb2dfaeSJed Brown
6017be3a41SJeremy L ThompsonMore generally, when the test and trial space differ, they get their own versions of $\bm{P}$, $\bm{\mathcal{E}}$ and $\bm{B}$.
61bcb2dfaeSJed Brown
62bcb2dfaeSJed Brown(fig-operator-decomp)=
63bcb2dfaeSJed Brown
64833ca6b5SJed Brown:::{figure} ../../img/libCEED-decomposition.svg
65bcb2dfaeSJed BrownOperator Decomposition
66bcb2dfaeSJed Brown:::
67bcb2dfaeSJed Brown
6817be3a41SJeremy L ThompsonNote that in the case of adaptive mesh refinement (AMR), the restrictions $\bm{P}$ and $\bm{\mathcal{E}}$ will involve not just extracting sub-vectors, but evaluating values at constrained degrees of freedom through the AMR interpolation.
6917be3a41SJeremy L ThompsonThere can also be several levels of subdomains ($\bm P_1$, $\bm P_2$, etc.), and it may be convenient to split $\bm{D}$ as the product of several operators ($\bm D_1$, $\bm D_2$, etc.).
70bcb2dfaeSJed Brown
71bcb2dfaeSJed Brown### Terminology and Notation
72bcb2dfaeSJed Brown
73bcb2dfaeSJed BrownVector representation/storage categories:
74bcb2dfaeSJed Brown
75bcb2dfaeSJed Brown- True degrees of freedom/unknowns, **T-vector**:
76bcb2dfaeSJed Brown
77bcb2dfaeSJed Brown  > - each unknown $i$ has exactly one copy, on exactly one processor, $rank(i)$
78bcb2dfaeSJed Brown  > - this is a non-overlapping vector decomposition
79ac5aa7bcSJeremy L Thompson  > - usually includes any essential (fixed) DoFs.
80bcb2dfaeSJed Brown  >
81bcb2dfaeSJed Brown  > ```{image} ../../img/T-vector.svg
82bcb2dfaeSJed Brown  > ```
83bcb2dfaeSJed Brown
84bcb2dfaeSJed Brown- Local (w.r.t. processors) degrees of freedom/unknowns, **L-vector**:
85bcb2dfaeSJed Brown
8617be3a41SJeremy L Thompson  > - each unknown $i$ has exactly one copy on each processor that owns an element containing $i$
8717be3a41SJeremy L Thompson  > - this is an overlapping vector decomposition with overlaps only across different processors---there is no duplication of unknowns on a single processor
88ac5aa7bcSJeremy L Thompson  > - the shared DoFs/unknowns are the overlapping DoFs, i.e. the ones that have more than one copy, on different processors.
89bcb2dfaeSJed Brown  >
90bcb2dfaeSJed Brown  > ```{image} ../../img/L-vector.svg
91bcb2dfaeSJed Brown  > ```
92bcb2dfaeSJed Brown
93bcb2dfaeSJed Brown- Per element decomposition, **E-vector**:
94bcb2dfaeSJed Brown
9517be3a41SJeremy L Thompson  > - each unknown $i$ has as many copies as the number of elements that contain $i$
9617be3a41SJeremy L Thompson  > - usually, the copies of the unknowns are grouped by the element they belong to.
97bcb2dfaeSJed Brown  >
98bcb2dfaeSJed Brown  > ```{image} ../../img/E-vector.svg
99bcb2dfaeSJed Brown  > ```
100bcb2dfaeSJed Brown
101ac5aa7bcSJeremy L Thompson- In the case of AMR with hanging nodes (giving rise to hanging DoFs):
102bcb2dfaeSJed Brown
103ac5aa7bcSJeremy L Thompson  > - the **L-vector** is enhanced with the hanging/dependent DoFs
104ac5aa7bcSJeremy L Thompson  > - the additional hanging/dependent DoFs are duplicated when they are shared by multiple processors
105ac5aa7bcSJeremy L Thompson  > - this way, an **E-vector** can be derived from an **L-vector** without any communications and without additional computations to derive the dependent DoFs
10617be3a41SJeremy L Thompson  > - in other words, an entry in an **E-vector** is obtained by copying an entry from the corresponding **L-vector**, optionally switching the sign of the entry (for $H(\mathrm{div})$---and $H(\mathrm{curl})$-conforming spaces).
107bcb2dfaeSJed Brown  >
108bcb2dfaeSJed Brown  > ```{image} ../../img/L-vector-AMR.svg
109bcb2dfaeSJed Brown  > ```
110bcb2dfaeSJed Brown
111bcb2dfaeSJed Brown- In the case of variable order spaces:
112bcb2dfaeSJed Brown
113ac5aa7bcSJeremy L Thompson  > - the dependent DoFs (usually on the higher-order side of a face/edge) can be treated just like the hanging/dependent DoFs case.
114bcb2dfaeSJed Brown
115bcb2dfaeSJed Brown- Quadrature point vector, **Q-vector**:
116bcb2dfaeSJed Brown
117ac5aa7bcSJeremy L Thompson  > - this is similar to **E-vector** where instead of DoFs, the vector represents values at quadrature points, grouped by element.
118bcb2dfaeSJed Brown
119bcb2dfaeSJed Brown- In many cases it is useful to distinguish two types of vectors:
120bcb2dfaeSJed Brown
121bcb2dfaeSJed Brown  > - **X-vector**, or **primal X-vector**, and **X'-vector**, or **dual X-vector**
122bcb2dfaeSJed Brown  > - here X can be any of the T, L, E, or Q categories
123bcb2dfaeSJed Brown  > - for example, the mass matrix operator maps a **T-vector** to a **T'-vector**
124bcb2dfaeSJed Brown  > - the solutions vector is a **T-vector**, and the RHS vector is a **T'-vector**
12517be3a41SJeremy L Thompson  > - using the parallel prolongation operator, one can map the solution **T-vector** to a solution **L-vector**, etc.
126bcb2dfaeSJed Brown
127bcb2dfaeSJed BrownOperator representation/storage/action categories:
128bcb2dfaeSJed Brown
129ac5aa7bcSJeremy L Thompson- Full true-DoF parallel assembly, **TA**, or **A**:
130bcb2dfaeSJed Brown
131bcb2dfaeSJed Brown  > - ParCSR or similar format
13217be3a41SJeremy L Thompson  > - the T in TA indicates that the data format represents an operator from a **T-vector** to a **T'-vector**.
133bcb2dfaeSJed Brown
134bcb2dfaeSJed Brown- Full local assembly, **LA**:
135bcb2dfaeSJed Brown
136bcb2dfaeSJed Brown  > - CSR matrix on each rank
13717be3a41SJeremy L Thompson  > - the parallel prolongation operator, $\bm{P}$, (and its transpose) should use optimized matrix-free action
138bcb2dfaeSJed Brown  > - note that $\bm{P}$ is the operator mapping T-vectors to L-vectors.
139bcb2dfaeSJed Brown
140bcb2dfaeSJed Brown- Element matrix assembly, **EA**:
141bcb2dfaeSJed Brown
142bcb2dfaeSJed Brown  > - each element matrix is stored as a dense matrix
143bcb2dfaeSJed Brown  > - optimized element and parallel prolongation operators
14417be3a41SJeremy L Thompson  > - note that the element prolongation operator is the mapping from an **L-vector** to an **E-vector**.
145bcb2dfaeSJed Brown
146bcb2dfaeSJed Brown- Quadrature-point/partial assembly, **QA** or **PA**:
147bcb2dfaeSJed Brown
148bcb2dfaeSJed Brown  > - precompute and store $w\det(J)$ at all quadrature points in all mesh elements
149bcb2dfaeSJed Brown  > - the stored data can be viewed as a **Q-vector**.
150bcb2dfaeSJed Brown
151bcb2dfaeSJed Brown- Unassembled option,  **UA** or **U**:
152bcb2dfaeSJed Brown
153bcb2dfaeSJed Brown  > - no assembly step
15417be3a41SJeremy L Thompson  > - the action uses directly the mesh node coordinates, and assumes specific form of the coefficient, e.g. constant, piecewise-constant, or given as a **Q-vector** (Q-coefficient).
155bcb2dfaeSJed Brown
156bcb2dfaeSJed Brown### Partial Assembly
157bcb2dfaeSJed Brown
15817be3a41SJeremy L ThompsonSince the global operator $\bm{A}$ is just a series of variational restrictions with $\bm{B}$, $\bm{\mathcal{E}}$ and $\bm{P}$, starting from its point-wise kernel $\bm{D}$, a "matvec" with $\bm{A}$ can be performed by evaluating and storing some of the innermost variational restriction matrices, and applying the rest of the operators "on-the-fly".
15917be3a41SJeremy L ThompsonFor example, one can compute and store a global matrix on **T-vector** level.
16017be3a41SJeremy L ThompsonAlternatively, one can compute and store only the subdomain (**L-vector**) or element (**E-vector**) matrices and perform the action of $\bm{A}$ using matvecs with $\bm{P}$ or $\bm{P}$ and $\bm{\mathcal{E}}$.
16117be3a41SJeremy L ThompsonWhile these options are natural for low-order discretizations, they are not a good fit for high-order methods due to the amount of FLOPs needed for their evaluation, as well as the memory transfer needed for a matvec.
162bcb2dfaeSJed Brown
16317be3a41SJeremy L ThompsonOur focus in libCEED, instead, is on **partial assembly**, where we compute and store only $\bm{D}$ (or portions of it) and evaluate the actions of $\bm{P}$, $\bm{\mathcal{E}}$ and $\bm{B}$ on-the-fly.
16417be3a41SJeremy L ThompsonCritically for performance, we take advantage of the tensor-product structure of the degrees of freedom and quadrature points on *quad* and *hex* elements to perform the action of $\bm{B}$ without storing it as a matrix.
165bcb2dfaeSJed Brown
16617be3a41SJeremy L ThompsonImplemented properly, the partial assembly algorithm requires optimal amount of memory transfers (with respect to the polynomial order) and near-optimal FLOPs for operator evaluation.
16717be3a41SJeremy L ThompsonIt consists of an operator *setup* phase, that evaluates and stores $\bm{D}$ and an operator *apply* (evaluation) phase that computes the action of $\bm{A}$ on an input vector.
16817be3a41SJeremy L ThompsonWhen desired, the setup phase may be done as a side-effect of evaluating a different operator, such as a nonlinear residual.
16917be3a41SJeremy L ThompsonThe relative costs of the setup and apply phases are different depending on the physics being expressed and the representation of $\bm{D}$.
170bcb2dfaeSJed Brown
171bcb2dfaeSJed Brown### Parallel Decomposition
172bcb2dfaeSJed Brown
17317be3a41SJeremy L ThompsonAfter the application of each of the first three transition operators, $\bm{P}$, $\bm{\mathcal{E}}$ and $\bm{B}$, the operator evaluation is decoupled  on their ranges, so $\bm{P}$, $\bm{\mathcal{E}}$ and $\bm{B}$ allow us to "zoom-in" to subdomain, element and quadrature point level, ignoring the coupling at higher levels.
174bcb2dfaeSJed Brown
17517be3a41SJeremy L ThompsonThus, a natural mapping of $\bm{A}$ on a parallel computer is to split the **T-vector** over MPI ranks (a non-overlapping decomposition, as is typically used for sparse matrices), and then split the rest of the vector types over computational devices (CPUs, GPUs, etc.) as indicated by the shaded regions in the diagram above.
176bcb2dfaeSJed Brown
17717be3a41SJeremy L ThompsonOne of the advantages of the decomposition perspective in these settings is that the operators $\bm{P}$, $\bm{\mathcal{E}}$, $\bm{B}$ and $\bm{D}$ clearly separate the MPI parallelism in the operator ($\bm{P}$) from the unstructured mesh topology ($\bm{\mathcal{E}}$), the choice of the finite element space/basis ($\bm{B}$) and the geometry and point-wise physics $\bm{D}$.
17817be3a41SJeremy L ThompsonThese components also naturally fall in different classes of numerical algorithms -- parallel (multi-device) linear algebra for $\bm{P}$, sparse (on-device) linear algebra for $\bm{\mathcal{E}}$, dense/structured linear algebra (tensor contractions) for $\bm{B}$ and parallel point-wise evaluations for $\bm{D}$.
179bcb2dfaeSJed Brown
18017be3a41SJeremy L ThompsonCurrently in libCEED, it is assumed that the host application manages the global **T-vectors** and the required communications among devices (which are generally on different compute nodes) with **P**.
18117be3a41SJeremy L ThompsonOur API is thus focused on the **L-vector** level, where the logical devices, which in the library are represented by the {ref}`Ceed` object, are independent.
18217be3a41SJeremy L ThompsonEach MPI rank can use one or more {ref}`Ceed`s, and each {ref}`Ceed`, in turn, can represent one or more physical devices, as long as libCEED backends support such configurations.
18317be3a41SJeremy L ThompsonThe idea is that every MPI rank can use any logical device it is assigned at runtime.
18417be3a41SJeremy L ThompsonFor example, on a node with 2 CPU sockets and 4 GPUs, one may decide to use 6 MPI ranks (each using a single {ref}`Ceed` object): 2 ranks using 1 CPU socket each, and 4 using 1 GPU each.
18517be3a41SJeremy L ThompsonAnother choice could be to run 1 MPI rank on the whole node and use 5 {ref}`Ceed` objects: 1 managing all CPU cores on the 2 sockets and 4 managing 1 GPU each.
18617be3a41SJeremy L ThompsonThe communications among the devices, e.g. required for applying the action of $\bm{P}$, are currently out of scope of libCEED.
18717be3a41SJeremy L ThompsonThe interface is non-blocking for all operations involving more than O(1) data, allowing operations performed on a coprocessor or worker threads to overlap with operations on the host.
188bcb2dfaeSJed Brown
189bcb2dfaeSJed Brown## API Description
190bcb2dfaeSJed Brown
19117be3a41SJeremy L ThompsonThe libCEED API takes an algebraic approach, where the user essentially describes in the *frontend* the operators $\bm{\bm{\mathcal{E}}}$, $\bm{B}$, and $\bm{D}$ and the library provides *backend* implementations and coordinates their action to the original operator on **L-vector** level (i.e. independently on each device / MPI task).
19217be3a41SJeremy L ThompsonThis is visualized in the schematic below; "active" and "passive" inputs/outputs will be discussed in more detail later.
19302076a18Snbeams
19402076a18Snbeams(fig-operator-schematic)=
19502076a18Snbeams
19602076a18Snbeams:::{figure} ../../img/libceed_schematic.svg
19717be3a41SJeremy L ThompsonFlow of data through vector types inside libCEED Operators, through backend implementations of $\bm{\bm{\mathcal{E}}}$, $\bm{B}$, and $\bm{D}$
19802076a18Snbeams:::
199bcb2dfaeSJed Brown
20017be3a41SJeremy L ThompsonOne of the advantages of this purely algebraic description is that it already includes all the finite element information, so the backends can operate on linear algebra level without explicit finite element code.
20117be3a41SJeremy L ThompsonThe frontend description is general enough to support a wide variety of finite element algorithms, as well as some other types algorithms such as spectral finite differences.
20217be3a41SJeremy L ThompsonThe separation of the front- and backends enables applications to easily switch/try different backends.
20317be3a41SJeremy L ThompsonIt also enables backend developers to impact many applications from a single implementation.
204bcb2dfaeSJed Brown
20517be3a41SJeremy L ThompsonOur long-term vision is to include a variety of backend implementations in libCEED, ranging from reference kernels to highly optimized kernels targeting specific devices (e.g. GPUs) or specific polynomial orders.
20617be3a41SJeremy L ThompsonA simple reference backend implementation is provided in the file
207bcb2dfaeSJed Brown[ceed-ref.c](https://github.com/CEED/libCEED/blob/main/backends/ref/ceed-ref.c).
208bcb2dfaeSJed Brown
20952006392Snbeams
21017be3a41SJeremy L ThompsonOn the frontend, the mapping between the decomposition concepts and the code implementation is as follows:
211bcb2dfaeSJed Brown
212bcb2dfaeSJed Brown- **L-**, **E-** and **Q-vector** are represented as variables of type {ref}`CeedVector`.
21317be3a41SJeremy L Thompson  (A backend may choose to operate incrementally without forming explicit **E-** or **Q-vectors**.)
2140fe925dfSnbeams- $\bm{\mathcal{E}}$ is represented as variable of type {ref}`CeedElemRestriction`.
215bcb2dfaeSJed Brown- $\bm{B}$ is represented as variable of type {ref}`CeedBasis`.
216bcb2dfaeSJed Brown- the action of $\bm{D}$ is represented as variable of type {ref}`CeedQFunction`.
21717be3a41SJeremy L Thompson- the overall operator $\bm{\mathcal{E}}^T \bm{B}^T \bm{D} \bm{B} \bm{\mathcal{E}}$ is represented as variable of type {ref}`CeedOperator` and its action is accessible through {c:func}`CeedOperatorApply()`.
218bcb2dfaeSJed Brown
21917be3a41SJeremy L ThompsonTo clarify these concepts and illustrate how they are combined in the API, consider the implementation of the action of a simple 1D mass matrix (cf. [tests/t500-operator.c](https://github.com/CEED/libCEED/blob/main/tests/t500-operator.c)).
220bcb2dfaeSJed Brown
221bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
222bcb2dfaeSJed Brown:language: c
223bcb2dfaeSJed Brown:linenos: true
224bcb2dfaeSJed Brown```
22517be3a41SJeremy L ThompsonIn the following figure, we specialize the schematic used above for general operators so that it corresponds to the specific setup and mass operators as implemented in the sample code.
22617be3a41SJeremy L ThompsonWe show that the active output of the setup operator, combining the quadrature weights with the Jacobian information for the mesh transformation, becomes a passive input to the mass operator.
22717be3a41SJeremy L ThompsonNotations denote the libCEED function used to set the properties of the input and output fields.
22802076a18Snbeams
22902076a18Snbeams(fig-operator-schematic-mass)=
23002076a18Snbeams
231af62e76eSnbeams:::{figure} ../../img/libceed_schematic_op_setup_mass.svg
23217be3a41SJeremy L ThompsonSpecific combination of $\bm{\bm{\mathcal{E}}}$, $\bm{B}$, $\bm{D}$, and input/output vectors corresponding to the libCEED operators in the t500-operator test
23302076a18Snbeams:::
234bcb2dfaeSJed Brown
235bcb2dfaeSJed BrownThe constructor
236bcb2dfaeSJed Brown
237bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
238bcb2dfaeSJed Brown:end-at: CeedInit
239bcb2dfaeSJed Brown:language: c
240bcb2dfaeSJed Brown:start-at: CeedInit
241bcb2dfaeSJed Brown```
242bcb2dfaeSJed Brown
24317be3a41SJeremy L Thompsoncreates a logical device `ceed` on the specified *resource*, which could also be a coprocessor such as `"/nvidia/0"`.
24417be3a41SJeremy L ThompsonThere can be any number of such devices, including multiple logical devices driving the same resource (though performance may suffer in case of oversubscription).
24517be3a41SJeremy L ThompsonThe resource is used to locate a suitable backend which will have discretion over the implementations of all objects created with this logical device.
246bcb2dfaeSJed Brown
24717be3a41SJeremy L ThompsonThe `setup` routine above computes and stores $\bm{D}$, in this case a scalar value in each quadrature point, while `mass` uses these saved values to perform the action of $\bm{D}$.
24817be3a41SJeremy L ThompsonThese functions are turned into the {ref}`CeedQFunction` variables `qf_setup` and `qf_mass` in the {c:func}`CeedQFunctionCreateInterior()` calls:
249bcb2dfaeSJed Brown
250bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
251bcb2dfaeSJed Brown:end-before: //! [QFunction Create]
252bcb2dfaeSJed Brown:language: c
253bcb2dfaeSJed Brown:start-after: //! [QFunction Create]
254bcb2dfaeSJed Brown```
255bcb2dfaeSJed Brown
25617be3a41SJeremy L ThompsonA {ref}`CeedQFunction` performs independent operations at each quadrature point and the interface is intended to facilitate vectorization.
25717be3a41SJeremy L ThompsonThe second argument is an expected vector length.
25817be3a41SJeremy L ThompsonIf greater than 1, the caller must ensure that the number of quadrature points `Q` is divisible by the vector length.
25917be3a41SJeremy L ThompsonThis is often satisfied automatically due to the element size or by batching elements together to facilitate vectorization in other stages, and can always be ensured by padding.
260bcb2dfaeSJed Brown
26117be3a41SJeremy L ThompsonIn addition to the function pointers (`setup` and `mass`), {ref}`CeedQFunction` constructors take a string representation specifying where the source for the implementation is found.
262*346c77e6SJeremy L ThompsonThis is used by backends that support Just-In-Time (JIT) compilation (i.e., CUDA and HIP) to compile for coprocessors.
263bcb2dfaeSJed BrownFor full support across all backends, these {ref}`CeedQFunction` source files must only contain constructs mutually supported by C99, C++11, and CUDA.
264bcb2dfaeSJed BrownFor example, explicit type casting of void pointers and explicit use of compatible arguments for {code}`math` library functions is required, and variable-length array (VLA) syntax for array reshaping is only available via libCEED's {code}`CEED_Q_VLA` macro.
265bcb2dfaeSJed Brown
26617be3a41SJeremy L ThompsonDifferent input and output fields are added individually, specifying the field name, size of the field, and evaluation mode.
267bcb2dfaeSJed Brown
26817be3a41SJeremy L ThompsonThe size of the field is provided by a combination of the number of components the effect of any basis evaluations.
269bcb2dfaeSJed Brown
27017be3a41SJeremy L ThompsonThe evaluation mode (see {ref}`CeedBasis-Typedefs and Enumerations`) `CEED_EVAL_INTERP` for both input and output fields indicates that the mass operator only contains terms of the form
271bcb2dfaeSJed Brown
272bcb2dfaeSJed Brown$$
273bcb2dfaeSJed Brown\int_\Omega v \cdot f_0 (u, \nabla u)
274bcb2dfaeSJed Brown$$
275bcb2dfaeSJed Brown
276bcb2dfaeSJed Brownwhere $v$ are test functions (see the {ref}`theoretical-framework`).
277bcb2dfaeSJed BrownMore general operators, such as those of the form
278bcb2dfaeSJed Brown
279bcb2dfaeSJed Brown$$
280bcb2dfaeSJed Brown\int_\Omega v \cdot f_0 (u, \nabla u) + \nabla v : f_1 (u, \nabla u)
281bcb2dfaeSJed Brown$$
282bcb2dfaeSJed Brown
283bcb2dfaeSJed Browncan be expressed.
284bcb2dfaeSJed Brown
28517be3a41SJeremy L ThompsonFor fields with derivatives, such as with the basis evaluation mode (see {ref}`CeedBasis-Typedefs and Enumerations`) `CEED_EVAL_GRAD`, the size of the field needs to reflect both the number of components and the geometric dimension.
28617be3a41SJeremy L ThompsonA 3-dimensional gradient on four components would therefore mean the field has a size of 12\.
287bcb2dfaeSJed Brown
28817be3a41SJeremy L ThompsonThe $\bm{B}$ operators for the mesh nodes, `basis_x`, and the unknown field, `basis_u`, are defined in the calls to the function {c:func}`CeedBasisCreateTensorH1Lagrange()`.
28917be3a41SJeremy L ThompsonIn this example, both the mesh and the unknown field use $H^1$ Lagrange finite elements of order 1 and 4 respectively (the `P` argument represents the number of 1D degrees of freedom on each element).
29017be3a41SJeremy L ThompsonBoth basis operators use the same integration rule, which is Gauss-Legendre with 8 points (the `Q` argument).
291bcb2dfaeSJed Brown
292bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
293bcb2dfaeSJed Brown:end-before: //! [Basis Create]
294bcb2dfaeSJed Brown:language: c
295bcb2dfaeSJed Brown:start-after: //! [Basis Create]
296bcb2dfaeSJed Brown```
297bcb2dfaeSJed Brown
29817be3a41SJeremy L ThompsonOther elements with this structure can be specified in terms of the `Q×P` matrices that evaluate values and gradients at quadrature points in one dimension using {c:func}`CeedBasisCreateTensorH1()`.
29917be3a41SJeremy L ThompsonElements that do not have tensor product structure, such as symmetric elements on simplices, will be created using different constructors.
300bcb2dfaeSJed Brown
30117be3a41SJeremy L ThompsonThe $\bm{\mathcal{E}}$ operators for the mesh nodes, `elem_restr_x`, and the unknown field, `elem_restr_u`, are specified in the {c:func}`CeedElemRestrictionCreate()`.
302ac5aa7bcSJeremy L ThompsonBoth of these specify directly the DoF indices for each element in the `ind_x` and `ind_u` arrays:
303bcb2dfaeSJed Brown
304bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
305bcb2dfaeSJed Brown:end-before: //! [ElemRestr Create]
306bcb2dfaeSJed Brown:language: c
307bcb2dfaeSJed Brown:start-after: //! [ElemRestr Create]
308bcb2dfaeSJed Brown```
309bcb2dfaeSJed Brown
310bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
311bcb2dfaeSJed Brown:end-before: //! [ElemRestrU Create]
312bcb2dfaeSJed Brown:language: c
313bcb2dfaeSJed Brown:start-after: //! [ElemRestrU Create]
314bcb2dfaeSJed Brown```
315bcb2dfaeSJed Brown
31617be3a41SJeremy L ThompsonIf the user has arrays available on a device, they can be provided using `CEED_MEM_DEVICE`.
31717be3a41SJeremy L ThompsonThis technique is used to provide no-copy interfaces in all contexts that involve problem-sized data.
318bcb2dfaeSJed Brown
31917be3a41SJeremy L ThompsonFor discontinuous Galerkin and for applications such as Nek5000 that only explicitly store **E-vectors** (inter-element continuity has been subsumed by the parallel restriction $\bm{P}$), the element restriction $\bm{\mathcal{E}}$ is the identity and {c:func}`CeedElemRestrictionCreateStrided()` is used instead.
32017be3a41SJeremy L ThompsonWe plan to support other structured representations of $\bm{\mathcal{E}}$ which will be added according to demand.
3210fe925dfSnbeamsThere are two common approaches for supporting non-conforming elements: applying the node constraints via $\bm P$ so that the **L-vector** can be processed uniformly and applying the constraints via $\bm{\mathcal{E}}$ so that the **E-vector** is uniform.
322bcb2dfaeSJed BrownThe former can be done with the existing interface while the latter will require a generalization to element restriction that would define field values at constrained nodes as linear combinations of the values at primary nodes.
323bcb2dfaeSJed Brown
32417be3a41SJeremy L ThompsonThese operations, $\bm{\mathcal{E}}$, $\bm{B}$, and $\bm{D}$, are combined with a {ref}`CeedOperator`.
32517be3a41SJeremy L ThompsonAs with {ref}`CeedQFunction`s, operator fields are added
32617be3a41SJeremy L Thompsonseparately with a matching field name, basis ($\bm{B}$), element restriction ($\bm{\mathcal{E}}$), and **L-vector**.
32717be3a41SJeremy L ThompsonThe flag `CEED_VECTOR_ACTIVE` indicates that the vector corresponding to that field will be provided to the operator when {c:func}`CeedOperatorApply()` is called.
32817be3a41SJeremy L ThompsonOtherwise the input/output will be read from/written to the specified **L-vector**.
329bcb2dfaeSJed Brown
33017be3a41SJeremy L ThompsonWith partial assembly, we first perform a setup stage where $\bm{D}$ is evaluated and stored.
33117be3a41SJeremy L ThompsonThis is accomplished by the operator `op_setup` and its application to `X`, the nodes of the mesh (these are needed to compute Jacobians at quadrature points).
332ac5aa7bcSJeremy L ThompsonNote that the corresponding {c:func}`CeedOperatorApply()` has no basis evaluation on the output, as the quadrature data is not needed at the DoFs:
333bcb2dfaeSJed Brown
334bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
335bcb2dfaeSJed Brown:end-before: //! [Setup Create]
336bcb2dfaeSJed Brown:language: c
337bcb2dfaeSJed Brown:start-after: //! [Setup Create]
338bcb2dfaeSJed Brown```
339bcb2dfaeSJed Brown
340bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
341bcb2dfaeSJed Brown:end-before: //! [Setup Set]
342bcb2dfaeSJed Brown:language: c
343bcb2dfaeSJed Brown:start-after: //! [Setup Set]
344bcb2dfaeSJed Brown```
345bcb2dfaeSJed Brown
346bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
347bcb2dfaeSJed Brown:end-before: //! [Setup Apply]
348bcb2dfaeSJed Brown:language: c
349bcb2dfaeSJed Brown:start-after: //! [Setup Apply]
350bcb2dfaeSJed Brown```
351bcb2dfaeSJed Brown
35217be3a41SJeremy L ThompsonThe action of the operator is then represented by operator `op_mass` and its {c:func}`CeedOperatorApply()` to the input **L-vector** `U` with output in `V`:
353bcb2dfaeSJed Brown
354bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
355bcb2dfaeSJed Brown:end-before: //! [Operator Create]
356bcb2dfaeSJed Brown:language: c
357bcb2dfaeSJed Brown:start-after: //! [Operator Create]
358bcb2dfaeSJed Brown```
359bcb2dfaeSJed Brown
360bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
361bcb2dfaeSJed Brown:end-before: //! [Operator Set]
362bcb2dfaeSJed Brown:language: c
363bcb2dfaeSJed Brown:start-after: //! [Operator Set]
364bcb2dfaeSJed Brown```
365bcb2dfaeSJed Brown
366bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t500-operator.c
367bcb2dfaeSJed Brown:end-before: //! [Operator Apply]
368bcb2dfaeSJed Brown:language: c
369bcb2dfaeSJed Brown:start-after: //! [Operator Apply]
370bcb2dfaeSJed Brown```
371bcb2dfaeSJed Brown
37217be3a41SJeremy L ThompsonA number of function calls in the interface, such as {c:func}`CeedOperatorApply()`, are intended to support asynchronous execution via their last argument, `CeedRequest*`.
37317be3a41SJeremy L ThompsonThe specific (pointer) value used in the above example, `CEED_REQUEST_IMMEDIATE`, is used to express the request (from the user) for the operation to complete before returning from the function call, i.e. to make sure that the result of the operation is available in the output parameters immediately after the call.
37417be3a41SJeremy L ThompsonFor a true asynchronous call, one needs to provide the address of a user defined variable.
37517be3a41SJeremy L ThompsonSuch a variable can be used later to explicitly wait for the completion of the operation.
376bcb2dfaeSJed Brown
377bcb2dfaeSJed Brown## Gallery of QFunctions
378bcb2dfaeSJed Brown
379bcb2dfaeSJed BrownLibCEED provides a gallery of built-in {ref}`CeedQFunction`s in the {file}`gallery/` directory.
38017be3a41SJeremy L ThompsonThe available QFunctions are the ones associated with the mass, the Laplacian, and the identity operators.
38117be3a41SJeremy L ThompsonTo illustrate how the user can declare a {ref}`CeedQFunction` via the gallery of available QFunctions, consider the selection of the {ref}`CeedQFunction` associated with a simple 1D mass matrix (cf. [tests/t410-qfunction.c](https://github.com/CEED/libCEED/blob/main/tests/t410-qfunction.c)).
382bcb2dfaeSJed Brown
383bcb2dfaeSJed Brown```{literalinclude} ../../../tests/t410-qfunction.c
384bcb2dfaeSJed Brown:language: c
385bcb2dfaeSJed Brown:linenos: true
386bcb2dfaeSJed Brown```
387bcb2dfaeSJed Brown
388bcb2dfaeSJed Brown## Interface Principles and Evolution
389bcb2dfaeSJed Brown
39017be3a41SJeremy L ThompsonLibCEED is intended to be extensible via backends that are packaged with the library and packaged separately (possibly as a binary containing proprietary code).
39117be3a41SJeremy L ThompsonBackends are registered by calling
392bcb2dfaeSJed Brown
393bcb2dfaeSJed Brown```{literalinclude} ../../../backends/ref/ceed-ref.c
394bcb2dfaeSJed Brown:end-before: //! [Register]
395bcb2dfaeSJed Brown:language: c
396bcb2dfaeSJed Brown:start-after: //! [Register]
397bcb2dfaeSJed Brown```
398bcb2dfaeSJed Brown
399bcb2dfaeSJed Browntypically in a library initializer or "constructor" that runs automatically.
400bcb2dfaeSJed Brown`CeedInit` uses this prefix to find an appropriate backend for the resource.
401bcb2dfaeSJed Brown
40217be3a41SJeremy L ThompsonSource (API) and binary (ABI) stability are important to libCEED.
40317be3a41SJeremy L ThompsonPrior to reaching version 1.0, libCEED does not implement strict [semantic versioning](https://semver.org) across the entire interface.
40417be3a41SJeremy L ThompsonHowever, user code, including libraries of {ref}`CeedQFunction`s, should be source and binary compatible moving from 0.x.y to any later release 0.x.z.
40517be3a41SJeremy L ThompsonWe have less experience with external packaging of backends and do not presently guarantee source or binary stability, but we intend to define stability guarantees for libCEED 1.0.
40617be3a41SJeremy L ThompsonWe'd love to talk with you if you're interested in packaging backends externally, and will work with you on a practical stability policy.
407