xref: /libCEED/doc/sphinx/source/libCEEDdev.md (revision d416dc2b8eb8ab8cb4fa3546f1e63962299dc06a)
# Developer Notes

## Library Design

LibCEED has a single user-facing API for creating and using the libCEED objects ({ref}`CeedVector`, {ref}`CeedBasis`, etc.).
Different Ceed backends are selected by instantiating a different {ref}`Ceed` object, which in turn creates the other libCEED objects, in a [bridge pattern](https://en.wikipedia.org/wiki/Bridge_pattern).
At runtime, the user can select different backend implementations to target different hardware, such as CPUs or GPUs.

When designing new features, developers should place the function definitions for the user-facing API in the header `/include/ceed/ceed.h`.
The basic implementation of these functions should typically be placed in `/interface/*.c` files.
The interface should pass any computationally expensive or hardware-specific operations to a backend implementation.
A new method for the associated libCEED object can be added in `/include/ceed-impl.h`, with a corresponding `CEED_FTABLE_ENTRY` in `/interface/ceed.c` to allow backends to set their own implementations of this method.
Then, in the creation of the backend-specific implementation of the object, typically found in `/backends/[impl]/ceed-[impl]-[object].c`, the developer implements the specific method and calls {c:func}`CeedSetBackendFunction` to register this implementation for the backend.
Any supplemental functions intended to be used in the interface or by the backends may be added to the backend API in the header `/include/ceed/backend.h`.
The basic implementation of these functions should also be placed in `/interface/*.c` files.

LibCEED generally follows a "CPU first" implementation strategy when adding new functionality to the user-facing API.
If there are no performance-specific considerations, it is generally recommended to include a basic CPU default implementation in `/interface/*.c`.
Any new functions must be well documented and tested.
Once the user-facing API and the default implementation are in place and verified correct via tests, the developer can focus on hardware-specific implementations (AVX, CUDA, HIP, etc.) as necessary.

## Backend Inheritance

A Ceed backend is not required to implement all libCEED objects or {ref}`CeedOperator` methods.
There are three mechanisms by which a Ceed backend can inherit implementations from another Ceed backend.

1. Delegation - Developers may use {c:func}`CeedSetDelegate` to set a general delegate {ref}`Ceed` object.
   This delegate {ref}`Ceed` will provide the implementation of any libCEED objects that the parent backend does not implement.
   For example, the `/cpu/self/xsmm/serial` backend implements the `CeedTensorContract` object itself but delegates all other functionality to the `/cpu/self/opt/serial` backend.

2. Object delegation - Developers may use {c:func}`CeedSetObjectDelegate` to set a delegate {ref}`Ceed` object for a specific libCEED object.
   This delegate {ref}`Ceed` will only provide the implementation of that specific libCEED object for the parent backend.
   Object delegation has higher precedence than general delegation.

3. Operator fallback - Developers may use {c:func}`CeedSetOperatorFallbackCeed` to set a {ref}`Ceed` object that provides any unimplemented {ref}`CeedOperator` methods that support preconditioning, such as {c:func}`CeedOperatorLinearAssemble`.
   The parent backend must implement the basic {ref}`CeedOperator` functionality.
   Like the delegates above, this fallback {ref}`Ceed` object should be created and set in the backend `CeedInit` function.
   In order to use operator fallback, the parent backend and fallback backend must use compatible E-vector and Q-vector layouts.
   For example, `/gpu/cuda/gen` falls back to `/gpu/cuda/ref` for missing {ref}`CeedOperator` preconditioning support methods.
   If an unimplemented method is called, then the parent `/gpu/cuda/gen` {ref}`Ceed` object uses its fallback `/gpu/cuda/ref` {ref}`Ceed` object to create a clone of the {ref}`CeedOperator`.
   This clone {ref}`CeedOperator` is then used for the unimplemented preconditioning support methods.

## Backend Families

There are four general 'families' of backend implementations.
As internal data layouts are specific to each backend family, it is generally not possible to delegate between backend families.

### CPU Backends

The CPU backend with the simplest implementation is `/cpu/self/ref/serial`.
This backend contains the basic implementations of most objects that other backends rely upon.
Most of the other CPU backends only update the {ref}`CeedOperator` and `CeedTensorContract` objects.

The `/cpu/self/ref/blocked` and `/cpu/self/opt/*` backends delegate to the `/cpu/self/ref/serial` backend.
The `/cpu/self/ref/blocked` backend updates the {ref}`CeedOperator` to use an E-vector and Q-vector ordering in which data for 8 elements are interlaced to provide better vectorization.
The `/cpu/self/opt/*` backends update the {ref}`CeedOperator` to apply the action of the operator in 1 or 8 element batches, depending upon whether the blocking strategy is used.
This significantly reduces the memory required by these backends.

The `/cpu/self/avx/*` and `/cpu/self/xsmm/*` backends delegate to the corresponding `/cpu/self/opt/*` backends.
These backends update the `CeedTensorContract` objects using AVX intrinsics and libXSMM functions, respectively.

The `/cpu/self/memcheck/*` backends delegate to the `/cpu/self/ref/*` backends.
These backends replace many of the implementations with methods that include more verification checks and a memory management model that more closely matches the memory management of GPU backends.
These backends rely upon the [Valgrind](https://valgrind.org/) Memcheck tool and Valgrind headers.

### GPU Backends

The CUDA, HIP, and SYCL backend families all follow similar designs.
The CUDA and HIP backends are very similar, with minor differences.
While the SYCL backend was based upon the CUDA and HIP backends, it has more internal differences to accommodate OpenCL and Intel hardware.

The `/gpu/*/ref` backends provide basic functionality.
In these backends, the operator is applied in multiple separate kernel launches, following the libCEED operator decomposition: first, {ref}`CeedElemRestriction` kernels map from the L-vectors to E-vectors; then, {ref}`CeedBasis` kernels map from the E-vectors to Q-vectors; then, the {ref}`CeedQFunction` kernel provides the action of the user quadrature point function; finally, the transpose {ref}`CeedBasis` and {ref}`CeedElemRestriction` kernels are applied to go back to the E-vectors and then the L-vectors.
These kernels apply to all points across all elements in order to maximize the amount of work each kernel launch performs.
Some of these kernels are compiled at runtime via NVRTC, HIPRTC, or OpenCL RTC.

The `/gpu/*/shared` backends delegate to the corresponding `/gpu/*/ref` backends.
These backends use shared memory to improve performance of the {ref}`CeedBasis` kernels.
All other libCEED objects are delegated to `/gpu/*/ref`.
These kernels are compiled at runtime via NVRTC, HIPRTC, or OpenCL RTC.

The `/gpu/*/gen` backends delegate to the corresponding `/gpu/*/shared` backends.
These backends write a single comprehensive kernel to apply the action of the {ref}`CeedOperator`, significantly improving performance by eliminating intermediate data structures and reducing the total number of kernel launches required.
This kernel is compiled at runtime via NVRTC, HIPRTC, or OpenCL RTC.

The `/gpu/*/magma` backends delegate to the corresponding `/gpu/cuda/ref` and `/gpu/hip/ref` backends.
These backends provide better performance for {ref}`CeedBasis` kernels but do not have the improvements from the `/gpu/*/gen` backends for the {ref}`CeedOperator`.

## Internal Layouts

Ceed backends are free to use any E-vector and Q-vector data layout (including never fully forming these vectors) so long as the backend passes the `t5**` series tests and all examples.
There are several common layouts for L-vectors, E-vectors, and Q-vectors, detailed below:

- **L-vector** layouts

  - L-vectors described by a standard {ref}`CeedElemRestriction` have a layout described by the `offsets` array and `comp_stride` parameter.
    Data for node `i`, component `j`, element `k` can be found in the L-vector at index `offsets[i + k*elem_size] + j*comp_stride`.
  - L-vectors described by a strided {ref}`CeedElemRestriction` have a layout described by the `strides` array.
    Data for node `i`, component `j`, element `k` can be found in the L-vector at index `i*strides[0] + j*strides[1] + k*strides[2]`.

- **E-vector** layouts

  - If possible, backends should use {c:func}`CeedElemRestrictionSetELayout` to set the E-vector layout so that the `t2**` tests can be used.
    If the backend uses a strided E-vector layout, then the data for node `i`, component `j`, element `k` in the E-vector is given by `i*layout[0] + j*layout[1] + k*layout[2]`.
  - Backends may choose to use a non-strided E-vector layout; however, the `t2**` tests will not function correctly in this case and these tests will need to be marked as allowable failures for this backend in the test suite.

- **Q-vector** layouts

  - When the size of a {ref}`CeedQFunction` field is greater than `1`, data for quadrature point `i`, component `j` can be found in the Q-vector at index `i + Q*j`, where `Q` is the total number of quadrature points in the Q-vector.
    Backends are free to provide the quadrature points in any order.
  - When the {ref}`CeedQFunction` field has `emode` `CEED_EVAL_GRAD`, data for quadrature point `i`, component `j`, derivative `k` can be found in the Q-vector at index `i + Q*j + Q*num_comp*k`.
  - Backend developers must take special care to ensure that the data in the Q-vectors for a field with `emode` `CEED_EVAL_NONE` is properly ordered when the backend uses different layouts for E-vectors and Q-vectors.

## CeedVector Array Access

Backend implementations are expected to separately track 'owned' and 'borrowed' memory locations.
Backends are responsible for freeing 'owned' memory; 'borrowed' memory is set by the user, and backends only have read/write access to 'borrowed' memory.
For any given precision and memory type, a backend should only have 'owned' or 'borrowed' memory, not both.

Backends are responsible for tracking which memory locations contain valid data.
If the user calls {c:func}`CeedVectorTakeArray` on the only memory location that contains valid data, then the {ref}`CeedVector` is left in an *invalid state*.
To repair an *invalid state*, the user must set valid data by calling {c:func}`CeedVectorSetValue`, {c:func}`CeedVectorSetArray`, or {c:func}`CeedVectorGetArrayWrite`.

Some checks for consistency and data validity with {ref}`CeedVector` array access are performed at the interface level.
All backends may assume that array access will conform to these guidelines:

- Borrowed memory

  - {ref}`CeedVector` access to borrowed memory is set with {c:func}`CeedVectorSetArray` with `copy_mode = CEED_USE_POINTER` and revoked with {c:func}`CeedVectorTakeArray`.
    The user must first call {c:func}`CeedVectorSetArray` with `copy_mode = CEED_USE_POINTER` for the appropriate precision and memory type before calling {c:func}`CeedVectorTakeArray`.
  - {c:func}`CeedVectorTakeArray` cannot be called on a vector in an *invalid state*.

- Owned memory

  - Owned memory can be allocated by calling {c:func}`CeedVectorSetValue` or by calling {c:func}`CeedVectorSetArray` with `copy_mode = CEED_COPY_VALUES`.
  - Owned memory can be set by calling {c:func}`CeedVectorSetArray` with `copy_mode = CEED_OWN_POINTER`.
  - Owned memory can also be allocated by calling {c:func}`CeedVectorGetArrayWrite`.
    The user is responsible for manually setting the contents of the array in this case.

- Data validity

  - Internal synchronization and user calls to {c:func}`CeedVectorSync` cannot be made on a vector in an *invalid state*.
  - Calls to {c:func}`CeedVectorGetArray` and {c:func}`CeedVectorGetArrayRead` cannot be made on a vector in an *invalid state*.
  - Calls to {c:func}`CeedVectorSetArray` and {c:func}`CeedVectorSetValue` can be made on a vector in an *invalid state*.
  - Calls to {c:func}`CeedVectorGetArrayWrite` can be made on a vector in an *invalid state*.
    Data synchronization is not required for the memory location returned by {c:func}`CeedVectorGetArrayWrite`.
    The caller should assume that all data at the memory location returned by {c:func}`CeedVectorGetArrayWrite` is *invalid*.

## Shape

Backends often manipulate tensors of dimension greater than 2.
It is awkward to pass fully-specified multi-dimensional arrays using C99, and certain operations will flatten/reshape the tensors for computational convenience.
We frequently use comments to document shapes using a lexicographic ordering.
For example, the comment

```c
// u has shape [dim, num_comp, Q, num_elem]
```

means that it can be traversed as

```c
for (d = 0; d < dim; d++) {
  for (c = 0; c < num_comp; c++) {
    for (q = 0; q < Q; q++) {
      for (e = 0; e < num_elem; e++) {
        u[((d*num_comp + c)*Q + q)*num_elem + e] = ...
      }
    }
  }
}
```

This ordering is sometimes referred to as row-major or C-style.
Note that flattenings such as

```c
// u has shape [dim, num_comp, Q*num_elem]
```

and

```c
// u has shape [dim*num_comp, Q, num_elem]
```

are purely implicit -- one just indexes the same array using the appropriate convention.

## `restrict` Semantics

QFunction arguments can be assumed to have `restrict` semantics.
That is, each input and output array must reside in distinct memory without overlap.

## Style Guide

Please check your code for style issues by running

`make format`

In addition to those automatically enforced style rules, libCEED tends to follow the following code style conventions:

- Variable names: `snake_case`
- Struct members: `snake_case`
- Function and method names: `PascalCase` or language-specific style
- Type names: `PascalCase` or language-specific style
- Constant names: `CAPS_SNAKE_CASE` or language-specific style

Also, documentation files should have one sentence per line to help make git diffs clearer and less disruptive.

## Clang-tidy

Please check your code for common issues by running

`make tidy`

which uses the `clang-tidy` utility included in recent releases of Clang.
This tool is much slower than actual compilation (`make -j8` parallelism helps).
To run on a single file, use

`make interface/ceed.c.tidy`

for example.
All issues reported by `make tidy` should be fixed.

## Include-What-You-Use

Header inclusion for source files should follow the principle of 'include what you use' rather than relying upon transitive `#include`s to define all symbols.

Every symbol that is used in the source file `foo.c` should be defined in `foo.c`, `foo.h`, or in a header file `#include`d in one of these two locations.
Please check your code by running the tool [`include-what-you-use`](https://include-what-you-use.org/) to see recommendations for changes to your source.
Most issues reported by `include-what-you-use` should be fixed; however, this rule is flexible to account for differences in header file organization in external libraries.
If you have `include-what-you-use` installed in a sibling directory to libCEED or set the environment variable `IWYU_CC`, then you can use the makefile target `make iwyu`.

Header files should be listed in alphabetical order, with installed headers preceding local headers and `ceed` headers being listed first.
The `ceed-f64.h` and `ceed-f32.h` headers should only be included in `ceed.h`.

```c
#include <ceed.h>
#include <ceed/backend.h>
#include <stdbool.h>
#include <string.h>
#include "ceed-avx.h"
```
240