xref: /libCEED/doc/sphinx/source/libCEEDapi.md (revision 5aed82e4fa97acf4ba24a7f10a35f5303a6798e0)
# Interface Concepts

This page provides a brief description of the theoretical foundations and the practical implementation of the libCEED library.

(theoretical-framework)=

## Theoretical Framework

In finite element formulations, the weak form of a Partial Differential Equation (PDE) is evaluated on a subdomain $\Omega_e$ (element) and the local results are composed into a larger system of equations that models the entire problem on the global domain $\Omega$.
In particular, when high-order finite elements or spectral elements are used, the resulting sparse matrix representation of the global operator is computationally expensive, with respect to both the memory transfer and floating point operations needed for its evaluation.
libCEED provides an interface for matrix-free operator description that enables efficient evaluation on a variety of computational device types (selectable at run time).
We present here the notation and the mathematical formulation adopted in libCEED.

We start by considering the discrete residual $F(u)=0$ formulation in weak form.
We first define the $L^2$ inner product between real-valued functions

$$
\langle v, u \rangle = \int_\Omega v u \, d\bm{x},
$$

where $\bm{x} \in \mathbb{R}^d \supset \Omega$.

We want to find $u$ in a suitable space $V_D$, such that

$$
\langle \bm v, \bm f(u) \rangle = \int_\Omega \bm v \cdot \bm f_0 (u, \nabla u) + \nabla \bm v : \bm f_1 (u, \nabla u) = 0
$$ (residual)

for all $\bm v$ in the corresponding homogeneous space $V_0$, where $\bm f_0$ and $\bm f_1$ contain all possible sources in the problem.
We note here that $\bm f_0$ represents all terms in {eq}`residual` which multiply the (possibly vector-valued) test function $\bm v$, and $\bm f_1$ all terms which multiply its gradient $\nabla \bm v$.
For an $n$-component problem in $d$ dimensions, $\bm f_0 \in \mathbb{R}^n$ and $\bm f_1 \in \mathbb{R}^{nd}$.

:::{note}
The notation $\nabla \bm v \!:\! \bm f_1$ represents contraction over both fields and spatial dimensions, while a single dot represents contraction in just one, which should be clear from context; e.g., $\bm v \cdot \bm f_0$ contracts only over fields.
:::

:::{note}
In the code, the function that represents the weak form at quadrature points is called the {ref}`CeedQFunction`.
In the {ref}`Examples` provided with the library (in the {file}`examples/` directory), we store the term $\bm f_0$ directly into `v`, and the term $\bm f_1$ directly into `dv` (which stands for $\nabla \bm v$).
If equation {eq}`residual` only contains a term of the type $\bm f_0$, the {ref}`CeedQFunction` will only have one output argument, namely `v`.
If equation {eq}`residual` also contains a term of the type $\bm f_1$, then the {ref}`CeedQFunction` will have two output arguments, namely `v` and `dv`.
:::
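For instance, for a diffusion-reaction problem with $\bm f_0(u, \nabla u) = u$ and $\bm f_1(u, \nabla u) = \nabla u$, the pointwise terms can be sketched in plain C. This is a library-independent illustration; the function and names below are not part of the libCEED API.

```c
#include <stddef.h>

/* Illustrative sketch (not libCEED API): evaluate the pointwise terms f_0 and
   f_1 of the weak form at each quadrature point, here for a diffusion-reaction
   problem with f_0(u, grad u) = u and f_1(u, grad u) = grad u, for a scalar
   field (n = 1) in d = 3 dimensions. */
enum { DIM = 3 };

static void eval_f0_f1(size_t nqpts, const double *u, const double *du,
                       double *f0, double *f1) {
  for (size_t q = 0; q < nqpts; q++) {
    f0[q] = u[q];                      /* term multiplying the test function v */
    for (int d = 0; d < DIM; d++)      /* term multiplying grad v              */
      f1[q * DIM + d] = du[q * DIM + d];
  }
}
```

Every quadrature point is treated independently, which is exactly the property the {ref}`CeedQFunction` interface exploits.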

## Finite Element Operator Decomposition

Finite element operators are typically defined through weak formulations of partial differential equations that involve integration over a computational mesh.
The required integrals are computed by splitting them as a sum over the mesh elements, mapping each element to a simple *reference* element (e.g. the unit square) and applying a quadrature rule in reference space.

This sequence of operations highlights an inherent hierarchical structure present in all finite element operators where the evaluation starts on *global (trial) degrees of freedom (DoFs) or nodes on the whole mesh*, restricts to *DoFs on subdomains* (groups of elements), then moves to independent *DoFs on each element*, transitions to independent *quadrature points* in reference space, performs the integration, and then goes back in reverse order to global (test) degrees of freedom on the whole mesh.

This is illustrated below for the simple case of a symmetric linear operator on third order ($Q_3$) scalar continuous ($H^1$) elements, where we use the notions **T-vector**, **L-vector**, **E-vector** and **Q-vector** to represent the sets corresponding to the (true) degrees of freedom on the global mesh, the split local degrees of freedom on the subdomains, the split degrees of freedom on the mesh elements, and the values at quadrature points, respectively.

We refer to the operators that connect the different types of vectors as:

- Subdomain restriction $\bm{P}$
- Element restriction $\bm{\mathcal{E}}$
- Basis (DoFs-to-Qpts) evaluator $\bm{B}$
- Operator at quadrature points $\bm{D}$

More generally, when the test and trial spaces differ, they get their own versions of $\bm{P}$, $\bm{\mathcal{E}}$ and $\bm{B}$.

(fig-operator-decomp)=

:::{figure} ../../img/libCEED-decomposition.svg
Operator Decomposition
:::

Note that in the case of adaptive mesh refinement (AMR), the restrictions $\bm{P}$ and $\bm{\mathcal{E}}$ will involve not just extracting sub-vectors, but evaluating values at constrained degrees of freedom through the AMR interpolation.
There can also be several levels of subdomains ($\bm P_1$, $\bm P_2$, etc.), and it may be convenient to split $\bm{D}$ as the product of several operators ($\bm D_1$, $\bm D_2$, etc.).
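To make the decomposition concrete, the following self-contained sketch (all names are illustrative and independent of the libCEED API) applies a 1D mass operator as the composition $\bm{\mathcal{E}}^T \bm{B}^T \bm{D} \bm{B} \bm{\mathcal{E}}$ on a mesh of two linear elements, starting from an **L-vector**:

```c
#include <math.h>

/* Illustrative 1D mass operator applied as E^T B^T D B E on an L-vector, for
   2 linear elements on [0,1] with nodes {0, 0.5, 1} and 2-point Gauss
   quadrature per element.  All names here are for exposition only. */
enum { NELEM = 2, P = 2, Q = 2, NNODES = 3 };

static const int    elem_dofs[NELEM][P] = {{0, 1}, {1, 2}}; /* E: L- to E-vector */
static const double qweight = 1.0, detJ = 0.25;             /* D: w_q * det(J)   */

static void apply_mass(const double *u_L, double *v_L) {
  /* B: values of the two linear shape functions at the Gauss points +-1/sqrt(3) */
  const double g = 1.0 / sqrt(3.0);
  const double B[Q][P] = {{(1 + g) / 2, (1 - g) / 2}, {(1 - g) / 2, (1 + g) / 2}};
  for (int i = 0; i < NNODES; i++) v_L[i] = 0.0;
  for (int e = 0; e < NELEM; e++) {
    double u_E[P], u_Q[Q], v_E[P] = {0, 0};
    for (int i = 0; i < P; i++) u_E[i] = u_L[elem_dofs[e][i]];  /* apply E   */
    for (int q = 0; q < Q; q++) {                               /* apply B   */
      u_Q[q] = 0.0;
      for (int i = 0; i < P; i++) u_Q[q] += B[q][i] * u_E[i];
      u_Q[q] *= qweight * detJ;                                 /* apply D   */
    }
    for (int i = 0; i < P; i++)                                 /* apply B^T */
      for (int q = 0; q < Q; q++) v_E[i] += B[q][i] * u_Q[q];
    for (int i = 0; i < P; i++) v_L[elem_dofs[e][i]] += v_E[i]; /* apply E^T */
  }
}
```

Applying this to the constant **L-vector** $(1,1,1)$ gives $(0.25, 0.5, 0.25)$, the row sums of the assembled mass matrix, whose total equals the measure of the domain.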

### Terminology and Notation

Vector representation/storage categories:

- True degrees of freedom/unknowns, **T-vector**:

  > - each unknown $i$ has exactly one copy, on exactly one processor, $rank(i)$
  > - this is a non-overlapping vector decomposition
  > - usually includes any essential (fixed) DoFs.
  >
  > ```{image} ../../img/T-vector.svg
  > ```

- Local (w.r.t. processors) degrees of freedom/unknowns, **L-vector**:

  > - each unknown $i$ has exactly one copy on each processor that owns an element containing $i$
  > - this is an overlapping vector decomposition with overlaps only across different processors---there is no duplication of unknowns on a single processor
  > - the shared DoFs/unknowns are the overlapping DoFs, i.e. the ones that have more than one copy, on different processors.
  >
  > ```{image} ../../img/L-vector.svg
  > ```

- Per element decomposition, **E-vector**:

  > - each unknown $i$ has as many copies as the number of elements that contain $i$
  > - usually, the copies of the unknowns are grouped by the element they belong to.
  >
  > ```{image} ../../img/E-vector.svg
  > ```

- In the case of AMR with hanging nodes (giving rise to hanging DoFs):

  > - the **L-vector** is enhanced with the hanging/dependent DoFs
  > - the additional hanging/dependent DoFs are duplicated when they are shared by multiple processors
  > - this way, an **E-vector** can be derived from an **L-vector** without any communications and without additional computations to derive the dependent DoFs
  > - in other words, an entry in an **E-vector** is obtained by copying an entry from the corresponding **L-vector**, optionally switching the sign of the entry (for $H(\mathrm{div})$- and $H(\mathrm{curl})$-conforming spaces).
  >
  > ```{image} ../../img/L-vector-AMR.svg
  > ```

- In the case of variable order spaces:

  > - the dependent DoFs (usually on the higher-order side of a face/edge) can be treated just like the hanging/dependent DoFs case.

- Quadrature point vector, **Q-vector**:

  > - this is similar to an **E-vector** where instead of DoFs, the vector represents values at quadrature points, grouped by element.

- In many cases it is useful to distinguish two types of vectors:

  > - **X-vector**, or **primal X-vector**, and **X'-vector**, or **dual X-vector**
  > - here X can be any of the T, L, E, or Q categories
  > - for example, the mass matrix operator maps a **T-vector** to a **T'-vector**
  > - the solution vector is a **T-vector**, and the RHS vector is a **T'-vector**
  > - using the parallel prolongation operator, one can map the solution **T-vector** to a solution **L-vector**, etc.

Operator representation/storage/action categories:

- Full true-DoF parallel assembly, **TA**, or **A**:

  > - ParCSR or similar format
  > - the T in TA indicates that the data format represents an operator from a **T-vector** to a **T'-vector**.

- Full local assembly, **LA**:

  > - CSR matrix on each rank
  > - the parallel prolongation operator, $\bm{P}$, (and its transpose) should use optimized matrix-free action
  > - note that $\bm{P}$ is the operator mapping **T-vectors** to **L-vectors**.

- Element matrix assembly, **EA**:

  > - each element matrix is stored as a dense matrix
  > - optimized element and parallel prolongation operators
  > - note that the element prolongation operator is the mapping from an **L-vector** to an **E-vector**.

- Quadrature-point/partial assembly, **QA** or **PA**:

  > - precompute and store $w\det(J)$ at all quadrature points in all mesh elements
  > - the stored data can be viewed as a **Q-vector**.

- Unassembled option, **UA** or **U**:

  > - no assembly step
  > - the action uses directly the mesh node coordinates, and assumes a specific form of the coefficient, e.g. constant, piecewise-constant, or given as a **Q-vector** (Q-coefficient).

### Partial Assembly

Since the global operator $\bm{A}$ is just a series of variational restrictions with $\bm{B}$, $\bm{\mathcal{E}}$ and $\bm{P}$, starting from its point-wise kernel $\bm{D}$, a "matvec" with $\bm{A}$ can be performed by evaluating and storing some of the innermost variational restriction matrices, and applying the rest of the operators "on-the-fly".
For example, one can compute and store a global matrix at the **T-vector** level.
Alternatively, one can compute and store only the subdomain (**L-vector**) or element (**E-vector**) matrices and perform the action of $\bm{A}$ using matvecs with $\bm{P}$ or $\bm{P}$ and $\bm{\mathcal{E}}$.
While these options are natural for low-order discretizations, they are not a good fit for high-order methods due to the number of FLOPs needed for their evaluation, as well as the memory transfer needed for a matvec.

Our focus in libCEED, instead, is on **partial assembly**, where we compute and store only $\bm{D}$ (or portions of it) and evaluate the actions of $\bm{P}$, $\bm{\mathcal{E}}$ and $\bm{B}$ on-the-fly.
Critically for performance, we take advantage of the tensor-product structure of the degrees of freedom and quadrature points on *quad* and *hex* elements to perform the action of $\bm{B}$ without storing it as a matrix.
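For example, in 2D the interpolation $\bm B = \bm B^{1d} \otimes \bm B^{1d}$ can be applied as two small 1D contractions instead of one $Q^2 \times P^2$ matrix-vector product, reducing the work per element from $O(p^{2d})$ to $O(p^{d+1})$. The sketch below (plain C, independent of the libCEED API; sizes are illustrative) shows the idea:

```c
/* Illustrative sketch (not libCEED API): apply the 2D interpolation
   B = B1d (x) B1d as two 1D tensor contractions rather than one dense
   (Q1*Q1) x (P1*P1) matrix, exploiting the tensor-product structure. */
enum { P1 = 3, Q1 = 4 };

/* tmp[q][j] = sum_i B1d[q][i] u[i][j];  v[q][r] = sum_j tmp[q][j] B1d[r][j] */
static void interp_2d_tensor(const double B1d[Q1][P1], const double u[P1][P1],
                             double v[Q1][Q1]) {
  double tmp[Q1][P1];
  for (int q = 0; q < Q1; q++)
    for (int j = 0; j < P1; j++) {
      tmp[q][j] = 0.0;
      for (int i = 0; i < P1; i++) tmp[q][j] += B1d[q][i] * u[i][j];
    }
  for (int q = 0; q < Q1; q++)
    for (int r = 0; r < Q1; r++) {
      v[q][r] = 0.0;
      for (int j = 0; j < P1; j++) v[q][r] += tmp[q][j] * B1d[r][j];
    }
}
```

The same pair of contractions, applied with the 1D differentiation matrix in one direction, yields gradients; in 3D a third contraction is added.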

Implemented properly, the partial assembly algorithm requires an optimal amount of memory transfer (with respect to the polynomial order) and near-optimal FLOPs for operator evaluation.
It consists of an operator *setup* phase that evaluates and stores $\bm{D}$, and an operator *apply* (evaluation) phase that computes the action of $\bm{A}$ on an input vector.
When desired, the setup phase may be done as a side-effect of evaluating a different operator, such as a nonlinear residual.
The relative costs of the setup and apply phases differ depending on the physics being expressed and the representation of $\bm{D}$.

### Parallel Decomposition

After the application of each of the first three transition operators, $\bm{P}$, $\bm{\mathcal{E}}$ and $\bm{B}$, the operator evaluation is decoupled on their ranges, so $\bm{P}$, $\bm{\mathcal{E}}$ and $\bm{B}$ allow us to "zoom-in" to subdomain, element and quadrature point level, ignoring the coupling at higher levels.

Thus, a natural mapping of $\bm{A}$ on a parallel computer is to split the **T-vector** over MPI ranks (a non-overlapping decomposition, as is typically used for sparse matrices), and then split the rest of the vector types over computational devices (CPUs, GPUs, etc.) as indicated by the shaded regions in the diagram above.

One of the advantages of the decomposition perspective in these settings is that the operators $\bm{P}$, $\bm{\mathcal{E}}$, $\bm{B}$ and $\bm{D}$ clearly separate the MPI parallelism in the operator ($\bm{P}$) from the unstructured mesh topology ($\bm{\mathcal{E}}$), the choice of the finite element space/basis ($\bm{B}$) and the geometry and point-wise physics ($\bm{D}$).
These components also naturally fall in different classes of numerical algorithms: parallel (multi-device) linear algebra for $\bm{P}$, sparse (on-device) linear algebra for $\bm{\mathcal{E}}$, dense/structured linear algebra (tensor contractions) for $\bm{B}$ and parallel point-wise evaluations for $\bm{D}$.

Currently in libCEED, it is assumed that the host application manages the global **T-vectors** and the required communications among devices (which are generally on different compute nodes) with $\bm{P}$.
Our API is thus focused on the **L-vector** level, where the logical devices, which in the library are represented by the {ref}`Ceed` object, are independent.
Each MPI rank can use one or more {ref}`Ceed`s, and each {ref}`Ceed`, in turn, can represent one or more physical devices, as long as libCEED backends support such configurations.
The idea is that every MPI rank can use any logical device it is assigned at runtime.
For example, on a node with 2 CPU sockets and 4 GPUs, one may decide to use 6 MPI ranks (each using a single {ref}`Ceed` object): 2 ranks using 1 CPU socket each, and 4 using 1 GPU each.
Another choice could be to run 1 MPI rank on the whole node and use 5 {ref}`Ceed` objects: 1 managing all CPU cores on the 2 sockets and 4 managing 1 GPU each.
The communications among the devices, e.g. required for applying the action of $\bm{P}$, are currently out of scope of libCEED.
The interface is non-blocking for all operations involving more than O(1) data, allowing operations performed on a coprocessor or worker threads to overlap with operations on the host.
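For instance, the per-rank resource choice in the 6-rank scenario above might be computed as follows. This is a standalone sketch: `pick_resource` is not a libCEED function, and the resource specs shown only follow the `/cpu/self` and `/nvidia/0` style used with `CeedInit` below; the specs actually supported depend on the libCEED build.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of libCEED): choose a per-rank resource
   string for a node with 2 CPU sockets and 4 GPUs running 6 MPI ranks.
   Ranks 0-1 use a CPU backend; ranks 2-5 each select one GPU. */
static void pick_resource(int rank_on_node, char *buf, size_t len) {
  if (rank_on_node < 2)
    snprintf(buf, len, "/cpu/self");                /* one CPU socket each */
  else
    snprintf(buf, len, "/nvidia/%d", rank_on_node - 2); /* one GPU each    */
}
```

Each rank would then pass its resource string to `CeedInit` to obtain its own independent logical device.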

## API Description

The libCEED API takes an algebraic approach, where the user essentially describes in the *frontend* the operators $\bm{\mathcal{E}}$, $\bm{B}$, and $\bm{D}$ and the library provides *backend* implementations and coordinates their action to the original operator on the **L-vector** level (i.e. independently on each device / MPI task).
This is visualized in the schematic below; "active" and "passive" inputs/outputs will be discussed in more detail later.

(fig-operator-schematic)=

:::{figure} ../../img/libceed_schematic.svg
Flow of data through vector types inside libCEED Operators, through backend implementations of $\bm{\mathcal{E}}$, $\bm{B}$, and $\bm{D}$
:::

One of the advantages of this purely algebraic description is that it already includes all the finite element information, so the backends can operate on the linear algebra level without explicit finite element code.
The frontend description is general enough to support a wide variety of finite element algorithms, as well as some other types of algorithms such as spectral finite differences.
The separation of the front- and backends enables applications to easily switch/try different backends.
It also enables backend developers to impact many applications from a single implementation.

Our long-term vision is to include a variety of backend implementations in libCEED, ranging from reference kernels to highly optimized kernels targeting specific devices (e.g. GPUs) or specific polynomial orders.
A simple reference backend implementation is provided in the file
[ceed-ref.c](https://github.com/CEED/libCEED/blob/main/backends/ref/ceed-ref.c).

On the frontend, the mapping between the decomposition concepts and the code implementation is as follows:

- **L-**, **E-** and **Q-vectors** are represented as variables of type {ref}`CeedVector`.
  (A backend may choose to operate incrementally without forming explicit **E-** or **Q-vectors**.)
- $\bm{\mathcal{E}}$ is represented as a variable of type {ref}`CeedElemRestriction`.
- $\bm{B}$ is represented as a variable of type {ref}`CeedBasis`.
- the action of $\bm{D}$ is represented as a variable of type {ref}`CeedQFunction`.
- the overall operator $\bm{\mathcal{E}}^T \bm{B}^T \bm{D} \bm{B} \bm{\mathcal{E}}$ is represented as a variable of type {ref}`CeedOperator` and its action is accessible through {c:func}`CeedOperatorApply()`.

To clarify these concepts and illustrate how they are combined in the API, consider the implementation of the action of a simple 1D mass matrix (cf. [tests/t500-operator.c](https://github.com/CEED/libCEED/blob/main/tests/t500-operator.c)).

```{literalinclude} ../../../tests/t500-operator.c
:language: c
:linenos: true
```

In the following figure, we specialize the schematic used above for general operators so that it corresponds to the specific setup and mass operators as implemented in the sample code.
We show that the active output of the setup operator, combining the quadrature weights with the Jacobian information for the mesh transformation, becomes a passive input to the mass operator.
The annotations in the figure denote the libCEED functions used to set the properties of the input and output fields.

(fig-operator-schematic-mass)=

:::{figure} ../../img/libceed_schematic_op_setup_mass.svg
Specific combination of $\bm{\mathcal{E}}$, $\bm{B}$, $\bm{D}$, and input/output vectors corresponding to the libCEED operators in the t500-operator test
:::

The constructor

```{literalinclude} ../../../tests/t500-operator.c
:end-at: CeedInit
:language: c
:start-at: CeedInit
```

creates a logical device `ceed` on the specified *resource*, which could also be a coprocessor such as `"/nvidia/0"`.
There can be any number of such devices, including multiple logical devices driving the same resource (though performance may suffer in case of oversubscription).
The resource is used to locate a suitable backend which will have discretion over the implementations of all objects created with this logical device.

The `setup` routine above computes and stores $\bm{D}$, in this case a scalar value at each quadrature point, while `mass` uses these saved values to perform the action of $\bm{D}$.
These functions are turned into the {ref}`CeedQFunction` variables `qf_setup` and `qf_mass` in the {c:func}`CeedQFunctionCreateInterior()` calls:

```{literalinclude} ../../../tests/t500-operator.c
:end-before: //! [QFunction Create]
:language: c
:start-after: //! [QFunction Create]
```

A {ref}`CeedQFunction` performs independent operations at each quadrature point, and the interface is intended to facilitate vectorization.
The second argument is an expected vector length.
If greater than 1, the caller must ensure that the number of quadrature points `Q` is divisible by the vector length.
This is often satisfied automatically due to the element size or by batching elements together to facilitate vectorization in other stages, and can always be ensured by padding.

In addition to the function pointers (`setup` and `mass`), {ref}`CeedQFunction` constructors take a string representation specifying where the source for the implementation is found.
This is used by backends that support Just-In-Time (JIT) compilation (i.e., CUDA and OCCA) to compile for coprocessors.
For full support across all backends, these {ref}`CeedQFunction` source files must only contain constructs mutually supported by C99, C++11, and CUDA.
For example, explicit type casting of void pointers and explicit use of compatible arguments for {code}`math` library functions is required, and variable-length array (VLA) syntax for array reshaping is only available via libCEED's {code}`CEED_Q_VLA` macro.
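To sketch what such a source might contain, a QFunction applying the mass operator's $\bm{D}$ could look like the following. This is a standalone illustration: libCEED declares user QFunctions with its `CeedInt` and `CeedScalar` types, which are replaced here by plain C stand-ins so the example is self-contained, and the explicit pointer assignments stand in for the `CEED_Q_VLA` reshaping used in the real sources.

```c
/* Standalone sketch of a mass-apply QFunction body with the shape
   int f(void *ctx, CeedInt Q, const CeedScalar *const *in,
         CeedScalar *const *out);
   CeedInt/CeedScalar are stand-in typedefs here, not the libCEED headers. */
typedef int CeedInt;
typedef double CeedScalar;

static int mass(void *ctx, CeedInt Q, const CeedScalar *const *in,
                CeedScalar *const *out) {
  (void)ctx;
  const CeedScalar *qdata = in[0]; /* stored w * det(J), i.e. the D operator */
  const CeedScalar *u     = in[1]; /* input interpolated to quad points      */
  CeedScalar       *v     = out[0];
  /* Each quadrature point is independent, which permits vectorization. */
  for (CeedInt q = 0; q < Q; q++) v[q] = qdata[q] * u[q];
  return 0;
}
```

Note the explicit casts-free, pointer-based access: each quadrature point only reads `qdata[q]` and `u[q]`, so batches of points can be processed in any order.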

Different input and output fields are added individually, specifying the field name, size of the field, and evaluation mode.

The size of the field is provided by a combination of the number of components and the effect of any basis evaluations.

The evaluation mode (see {ref}`CeedBasis-Typedefs and Enumerations`) `CEED_EVAL_INTERP` for both input and output fields indicates that the mass operator only contains terms of the form

$$
\int_\Omega v \cdot f_0 (u, \nabla u)
$$

where $v$ are test functions (see the {ref}`theoretical-framework`).
More general operators, such as those of the form

$$
\int_\Omega v \cdot f_0 (u, \nabla u) + \nabla v : f_1 (u, \nabla u)
$$

can be expressed.

For fields with derivatives, such as with the basis evaluation mode (see {ref}`CeedBasis-Typedefs and Enumerations`) `CEED_EVAL_GRAD`, the size of the field needs to reflect both the number of components and the geometric dimension.
A 3-dimensional gradient on four components would therefore mean the field has a size of 12.
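A hypothetical helper (not part of the libCEED API) makes this size rule explicit: fields evaluated with `CEED_EVAL_INTERP` have size equal to the number of components, while `CEED_EVAL_GRAD` fields have size `ncomp * dim`:

```c
/* Hypothetical helper (not libCEED API): field size for the two evaluation
   modes discussed above. */
enum eval_mode { EVAL_INTERP, EVAL_GRAD };

static int field_size(int ncomp, int dim, enum eval_mode mode) {
  return mode == EVAL_GRAD ? ncomp * dim : ncomp; /* gradient adds a dim factor */
}
```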

The $\bm{B}$ operators for the mesh nodes, `basis_x`, and the unknown field, `basis_u`, are defined in the calls to the function {c:func}`CeedBasisCreateTensorH1Lagrange()`.
In this example, the mesh and the unknown field use $H^1$ Lagrange finite elements of order 1 and 4, respectively (the `P` argument represents the number of 1D degrees of freedom on each element).
Both basis operators use the same integration rule, which is Gauss-Legendre with 8 points (the `Q` argument).

```{literalinclude} ../../../tests/t500-operator.c
:end-before: //! [Basis Create]
:language: c
:start-after: //! [Basis Create]
```

Other elements with this structure can be specified in terms of the `Q×P` matrices that evaluate values and gradients at quadrature points in one dimension using {c:func}`CeedBasisCreateTensorH1()`.
Elements that do not have tensor product structure, such as symmetric elements on simplices, will be created using different constructors.

The $\bm{\mathcal{E}}$ operators for the mesh nodes, `elem_restr_x`, and the unknown field, `elem_restr_u`, are specified in the {c:func}`CeedElemRestrictionCreate()` calls.
Both of these specify directly the DoF indices for each element in the `ind_x` and `ind_u` arrays:

```{literalinclude} ../../../tests/t500-operator.c
:end-before: //! [ElemRestr Create]
:language: c
:start-after: //! [ElemRestr Create]
```

```{literalinclude} ../../../tests/t500-operator.c
:end-before: //! [ElemRestrU Create]
:language: c
:start-after: //! [ElemRestrU Create]
```

If the user has arrays available on a device, they can be provided using `CEED_MEM_DEVICE`.
This technique is used to provide no-copy interfaces in all contexts that involve problem-sized data.

For discontinuous Galerkin and for applications such as Nek5000 that only explicitly store **E-vectors** (inter-element continuity has been subsumed by the parallel restriction $\bm{P}$), the element restriction $\bm{\mathcal{E}}$ is the identity and {c:func}`CeedElemRestrictionCreateStrided()` is used instead.
We plan to support other structured representations of $\bm{\mathcal{E}}$, which will be added according to demand.
There are two common approaches for supporting non-conforming elements: applying the node constraints via $\bm P$ so that the **L-vector** can be processed uniformly, and applying the constraints via $\bm{\mathcal{E}}$ so that the **E-vector** is uniform.
The former can be done with the existing interface, while the latter will require a generalization of the element restriction that would define field values at constrained nodes as linear combinations of the values at primary nodes.

These operations, $\bm{\mathcal{E}}$, $\bm{B}$, and $\bm{D}$, are combined with a {ref}`CeedOperator`.
As with {ref}`CeedQFunction`s, operator fields are added separately with a matching field name, basis ($\bm{B}$), element restriction ($\bm{\mathcal{E}}$), and **L-vector**.
The flag `CEED_VECTOR_ACTIVE` indicates that the vector corresponding to that field will be provided to the operator when {c:func}`CeedOperatorApply()` is called.
Otherwise the input/output will be read from/written to the specified **L-vector**.

With partial assembly, we first perform a setup stage where $\bm{D}$ is evaluated and stored.
This is accomplished by the operator `op_setup` and its application to `X`, the nodes of the mesh (these are needed to compute Jacobians at quadrature points).
Note that the corresponding {c:func}`CeedOperatorApply()` has no basis evaluation on the output, as the quadrature data is not needed at the DoFs:

```{literalinclude} ../../../tests/t500-operator.c
:end-before: //! [Setup Create]
:language: c
:start-after: //! [Setup Create]
```

```{literalinclude} ../../../tests/t500-operator.c
:end-before: //! [Setup Set]
:language: c
:start-after: //! [Setup Set]
```

```{literalinclude} ../../../tests/t500-operator.c
:end-before: //! [Setup Apply]
:language: c
:start-after: //! [Setup Apply]
```

The action of the operator is then represented by the operator `op_mass` and its {c:func}`CeedOperatorApply()` to the input **L-vector** `U`, with output in `V`:

```{literalinclude} ../../../tests/t500-operator.c
:end-before: //! [Operator Create]
:language: c
:start-after: //! [Operator Create]
```

```{literalinclude} ../../../tests/t500-operator.c
:end-before: //! [Operator Set]
:language: c
:start-after: //! [Operator Set]
```

```{literalinclude} ../../../tests/t500-operator.c
:end-before: //! [Operator Apply]
:language: c
:start-after: //! [Operator Apply]
```

A number of function calls in the interface, such as {c:func}`CeedOperatorApply()`, are intended to support asynchronous execution via their last argument, `CeedRequest*`.
The specific (pointer) value used in the above example, `CEED_REQUEST_IMMEDIATE`, is used to express the request (from the user) for the operation to complete before returning from the function call, i.e. to make sure that the result of the operation is available in the output parameters immediately after the call.
For a true asynchronous call, one needs to provide the address of a user-defined variable.
Such a variable can be used later to explicitly wait for the completion of the operation.

## Gallery of QFunctions

LibCEED provides a gallery of built-in {ref}`CeedQFunction`s in the {file}`gallery/` directory.
The available QFunctions are the ones associated with the mass, the Laplacian, and the identity operators.
To illustrate how the user can declare a {ref}`CeedQFunction` via the gallery of available QFunctions, consider the selection of the {ref}`CeedQFunction` associated with a simple 1D mass matrix (cf. [tests/t410-qfunction.c](https://github.com/CEED/libCEED/blob/main/tests/t410-qfunction.c)).

```{literalinclude} ../../../tests/t410-qfunction.c
:language: c
:linenos: true
```

## Interface Principles and Evolution

LibCEED is intended to be extensible via backends that are packaged with the library, as well as backends packaged separately (possibly as a binary containing proprietary code).
Backends are registered by calling

```{literalinclude} ../../../backends/ref/ceed-ref.c
:end-before: //! [Register]
:language: c
:start-after: //! [Register]
```

typically in a library initializer or "constructor" that runs automatically.
`CeedInit` uses this prefix to find an appropriate backend for the resource.

Source (API) and binary (ABI) stability are important to libCEED.
Prior to reaching version 1.0, libCEED does not implement strict [semantic versioning](https://semver.org) across the entire interface.
However, user code, including libraries of {ref}`CeedQFunction`s, should be source and binary compatible moving from 0.x.y to any later release 0.x.z.
We have less experience with external packaging of backends and do not presently guarantee source or binary stability, but we intend to define stability guarantees for libCEED 1.0.
We'd love to talk with you if you're interested in packaging backends externally, and will work with you on a practical stability policy.
407