| e03682af | 13-Sep-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
rust - add is_some() and is_none() for vector/elemrestriction/basis/qfunction opts |
| c68be7a2 | 13-Sep-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
rust - remove unwrap() in documentation in favor of ? |
| 8ef3655d | 13-Sep-2021 |
David Medina <dmed256@gmail.com> |
[#808][OCCA] Avoid verifying objects in destructors (#809)
[#808][OCCA] Avoid verifying objects in destructors
* Update backends/occa/ceed-occa-qfunction.cpp
Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org>
|
| c33c11c1 | 10-Sep-2021 |
Will Pazner <will.e.p@gmail.com> |
[Julia] Bump minimum_libceed_version. Change version comparison logic.
Add ceedversion_ge, which performs the same comparison as the macro CEED_VERSION_GE. Non-release builds compare as infinity. |
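The comparison rule described in this commit can be sketched in C as below. This is only an illustration of the stated behavior (lexicographic comparison of major/minor/patch, with non-release builds treated as newer than any release); the `Version` struct and the `version_ge` function are hypothetical names, not the Julia binding's or libCEED's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical version record; `release` is false for development builds. */
typedef struct {
  int  major, minor, patch;
  bool release;
} Version;

/* Returns true if `v` is at least major.minor.patch.
   A non-release build compares as "infinity", so it always passes. */
bool version_ge(Version v, int major, int minor, int patch) {
  assert(major >= 0 && minor >= 0 && patch >= 0);
  if (!v.release) return true;
  if (v.major != major) return v.major > major;
  if (v.minor != minor) return v.minor > minor;
  return v.patch >= patch;
}
```

Treating development builds as infinitely new means a minimum-version check never rejects a library built from the current main branch.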
| 443fcf8a | 10-Sep-2021 |
Will Pazner <will.e.p@gmail.com> |
[Julia] update bindings |
| e9b533fb | 09-Sep-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
doc - add advanced function classification |
| f479eb23 | 09-Sep-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
doc - add single precision release note |
| f04ea552 | 09-Sep-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
qf/op - make immutability conditions explicit |
| 28567f8f | 09-Sep-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
interface - add CeedOperatorGetFieldName |
| 43bbe138 | 09-Sep-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
interface - promote field getters to public API |
| 7e7773b5 | 09-Sep-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
interface - refactor *GetFields to include number of fields |
| c53bf7d0 | 13-Sep-2021 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #810 from CEED/cuda-gen-launch
Adjust cuda-gen launch to limit z dimension of thread block |
| 13516544 | 13-Sep-2021 |
nbeams <246972+nbeams@users.noreply.github.com> |
Check z dimension thread block limits before launching |
| 26513686 | 08-Sep-2021 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #804 from CEED/jed/cuda-register-block-size
make CUDA block sizes fit according to number of used registers |
| 39532ceb | 07-Sep-2021 |
Jed Brown <jed@jedbrown.org> |
backends/cuda-gen: use occupancy to calculate launch sizes
Choose sizes that actually fit, are big enough to amortize thread block overhead, and permit high occupancy.
https://developer.nvidia.com/blog/cuda-pro-tip-occupancy-api-simplifies-launch-configuration/
|
| 44abf3e8 | 07-Sep-2021 |
Jed Brown <jed@jedbrown.org> |
backends/cuda: record cudaDeviceProp struct instead of just max block size |
| 4853cbf0 | 05-Sep-2021 |
Jed Brown <jed@jedbrown.org> |
backends/cuda: choose block size based on number of registers used by kernel
Complicated QFunctions, such as those in solid mechanics, use too many registers to launch blocks of 1024 threads (hardware max on Volta/Ampere). We ask the kernel how large a block it can use and select that block size. As a refinement, we could consider making the block sizes smaller if there are fewer blocks than SMs (strong scaling limit).
|
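The register constraint behind this commit can be illustrated with a small arithmetic sketch in C. It is not the backend's method (the commit queries the kernel itself for its maximum block size); it only estimates that limit from a per-block register budget. The function name and parameters are hypothetical, and the figures used in the usage note (65536 registers per block, warp size 32, 1024 threads maximum) are the documented limits for Volta/Ampere. Real hardware allocates registers at a coarser granularity, so treat the result as an approximation.

```c
#include <assert.h>

/* Hypothetical helper: largest block size (a multiple of the warp size)
   whose threads fit in the per-block register budget, capped at the
   hardware maximum number of threads per block. */
int max_block_size_for_registers(int regs_per_thread, int regs_per_block,
                                 int warp_size, int hw_max_threads) {
  assert(regs_per_thread > 0 && warp_size > 0);
  int fit = regs_per_block / regs_per_thread; /* threads that fit */
  fit -= fit % warp_size;                     /* round down to whole warps */
  return fit < hw_max_threads ? fit : hw_max_threads;
}
```

With these limits, a kernel using 64 registers per thread can still launch 1024 threads per block, while one using 128 registers per thread is limited to 512.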
| a784c500 | 02-Sep-2021 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #787 from CEED/yohann/cuda-gen/restrict
Use more `__restrict__` in `cuda-gen` backend. |
| 80a9ef05 | 02-Sep-2021 |
Natalie Beams <246972+nbeams@users.noreply.github.com> |
Allow CeedScalar to be single precision (#788)
One can modify `ceed.h` to include `ceed-f32.h` and then use single precision. This is tested for C in CI and has been tested by developers with Rust, Julia, and Python. This interface is evolving and should be considered experimental at this time (thus lack of automated build support).
* Introduce CeedScalarType enum
* WIP changes to allow different definitions of CeedScalar
* Introduce new header files for float and double
* Only use avx tensor contract and MAGMA non-tensor basis if CeedScalar is double
* WIP changes to allow CeedScalar to be float
* WIP start trying to adjust test tolerances for float or double
* fix typos in comments
* install ceed-f32/64 headers
* Fix missing casts for hipMAGMA element restrictions
* make CeedQFunctionContextGetContextSize available for Python bindings
* Changes to Python bindings to allow CeedScalar to be float
* WIP adjust Python tests for float or double
* make style
* remove QFunctionContextGetContextSize from backend header
* Use quotes instead of <> in include statement
* Remove unnecessary includes
* Update tolerances for tests
* [Julia] allow CeedScalar to be Float32
* [Julia] Use Preferences instead of custom build configuration
# Conflicts:
# julia/LibCEED.jl/src/C.jl
* [Makefile] Change definition of CC_VENDOR so it works with cross-compilation
* [Julia] Use Preferences in CI
# Conflicts:
# .github/workflows/julia-test-with-style.yml
* [Julia] Update docs about preferences
* [Julia] Add test/Project.toml workaround for Preferences
* Add CeedGetScalarType to get the type of CeedScalar at runtime
* [Julia] Move functions from Ceed.jl to LibCEED.jl
* [Julia] Add support for getting library path and scalar type at runtime
* [Julia] Minor change to checking if CUDA is loaded
* [Julia] Check correct CeedScalar types in basis functions
* [Julia] Fix tests comparing with output file
* [Julia] Change devtests to use CeedScalar instead of Float64
* Update test 402 so context will be same size in double or float
* Update tolerances for ceed examples
* [Julia] CUDA fixes
* remove unused variable in t208
* SchurDecomposition: do not compute tau on final iteration
* Update tolerances for some basis tests (for single precision)
* Make style
* Python style fixes for basis test
* Add single precision output for t300 and t320 and adjust checks; skip t541 in single
* Add LCOV exclusions after moving to new line
* fix spacing
* Python: make CEED_EPSILON available as libceed.EPSILON
* Python: optional parameter to specify different output file for test comparison
* Python: update tests' use of EPSILON and change test_300 output file for single precision
* Python: add convenience function for getting dtype corresponding to CeedScalar
* rust - add single precision support
* [Julia] Fall back on Float64 if CeedGetScalarType is not available
* [Julia] style
* Adjust tolerance for t301
* xsmm - add single precision support
* avx - add single precision support
* Add initial single precision support for MAGMA non-tensor basis
* Skip t300 and t320 in single precision; revert Python t300 changes
* Revert output changes for t300 and t320 in junit
* [Julia] Changes to autogenerated bindings for mixed precision
* [Julia] style
* [Julia] Check scalar type when changing libceed library path
The check is also performed when the package is loaded. This prevents having to
restart the Julia session twice
* [Julia] Require JLLWrappers version 1.3
This is needed to use Preferences to change the library path
* Add documentation page for precision development
Co-authored-by: Will Pazner <will.e.p@gmail.com>
* Cleanup from merge: remove old README
* Return CEED_ALIGN to backend.h
* Make Fortran compiler (FC) optional; empty skips Fortran tests
Use in Python and Rust builds, which may not have a Fortran compiler
installed and thus would produce confusing output.
* Add single precision CI test for Noether
Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org>
Co-authored-by: Will Pazner <will.e.p@gmail.com>
Co-authored-by: Jed Brown <jed@jedbrown.org>
|
| ae718e2f | 02-Sep-2021 |
Jed Brown <jed@jedbrown.org> |
doc: add note on restrict qualified semantics for QFunctions |
| 3c17d89b | 29-Aug-2021 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #803 from CEED/jed/test-junit
testing updates: junit classname, bpsraw tolerances, CUDA on lv |
| 96e2ca22 | 29-Aug-2021 |
Jed Brown <jed@jedbrown.org> |
gitlab-ci: update lv configuration
* CUDA-11.2+ bug circumvented so use latest 11.4
* Run minimal CPU tests in parallel so parallelism applies to all compilation
* Run GPU tests sequentially to avoid cudaGetDevice returning CUDA_ERROR_NOT_INITIALIZED
* This is weird because nvidia-smi -q reports very low resource utilization
* MPS and retrying within the same process failed, though other processes can get a device
|
| bc251d84 | 28-Aug-2021 |
Jed Brown <jed@jedbrown.org> |
cuda/ref: modify weight kernels to avoid CUDA-11.2+ bug on RTX 2080 (issue #802) |
| b868981d | 18-Aug-2021 |
Jed Brown <jed@jedbrown.org> |
examples/fluids: make interface/tests support PETSc main |
| b9ce5a03 | 17-Aug-2021 |
Jed Brown <jed@jedbrown.org> |
gitlab-ci: simpler handling of success for codecov upload |