Lines Matching full:on
9 …luated on a subdomain $\Omega_e$ (element) and the local results are composed into a larger system…
11 …ace for matrix-free operator description that enables efficient evaluation on a variety of computa…
49 …on *global (trial) degrees of freedom (DoFs) or nodes on the whole mesh*, restricts to *DoFs on su…
51 …on third order ($Q_3$) scalar continuous ($H^1$) elements, where we use the notions **T-vector**, …
77 > - each unknown $i$ has exactly one copy, on exactly one processor, $rank(i)$
86 > - each unknown $i$ has exactly one copy on each processor that owns an element containing $i$
87 …erlaps only across different processors---there is no duplication of unknowns on a single processor
88 …owns are the overlapping DoFs, i.e. the ones that have more than one copy, on different processors.
113 …> - the dependent DoFs (usually on the higher-order side of a face/edge) can be treated just like …
136 > - CSR matrix on each rank
158 …he innermost variational restriction matrices, and applying the rest of the operators "on-the-fly".
159 For example, one can compute and store a global matrix at the **T-vector** level.
163 …on **partial assembly**, where we compute and store only $\bm{D}$ (or portions of it) and evaluate…
164 …e tensor-product structure of the degrees of freedom and quadrature points on *quad* and *hex* ele…
167 …and an operator *apply* (evaluation) phase that computes the action of $\bm{A}$ on an input vector.
169 The relative costs of the setup and apply phases are different depending on the physics being expre…
173 …}$, $\bm{\mathcal{E}}$ and $\bm{B}$, the operator evaluation is decoupled on their ranges, so $\b…
175 Thus, a natural mapping of $\bm{A}$ onto a parallel computer is to split the **T-vector** over MPI ra…
178 …algorithms -- parallel (multi-device) linear algebra for $\bm{P}$, sparse (on-device) linear algeb…
180 …ctors** and the required communications among devices (which are generally on different compute no…
181 Our API is thus focused on the **L-vector** level, where the logical devices, which in the library …
184 For example, on a node with 2 CPU sockets and 4 GPUs, one may decide to use 6 MPI ranks (each using…
185 …her choice could be to run 1 MPI rank on the whole node and use 5 {ref}`Ceed` objects: 1 managing …
187 …an O(1) data, allowing operations performed on a coprocessor or worker threads to overlap with ope…
191 …and coordinates their action to the original operator at the **L-vector** level (i.e. independently on…
200 …y includes all the finite element information, so the backends can operate on linear algebra level…
210 On the frontend, the mapping between the decomposition concepts and the code implementation is as f…
243 creates a logical device `ceed` on the specified *resource*, which could also be a coprocessor such…
286 A 3-dimensional gradient on four components would therefore mean the field has a size of 12.
289 …d 4 respectively (the `P` argument represents the number of 1D degrees of freedom on each element).
299 Elements that do not have tensor product structure, such as symmetric elements on simplices, will b…
316 If the user has arrays available on a device, they can be provided using `CEED_MEM_DEVICE`.
332 Note that the corresponding {c:func}`CeedOperatorApply()` has no basis evaluation on the output, as…
406 …you're interested in packaging backends externally, and will work with you on a practical stabilit…