| c47bfe2b | 16-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
backends/cuda-shared: limit 1D thread counts
We need to avoid this error:
CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: max_threads_per_block 512 on block size (24,1,32), shared_size 0, num_regs 106
A proper solution is to use cuOccupancyMaxPotentialBlockSize to place a number of elements per block that stays within resource limits. This would involve a bit more refactoring to do cleanly.
|
| 63d3996f | 16-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
backends/cuda: more informative error reporting |
| f190906a | 16-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
xsmm: support for 1.17 headers |
| 3a8b50de | 15-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
CI: update lv for cuda-11.6 |
| 2361c888 | 13-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #895 from CEED/jeremylt/stray-char
qf - remove stray character |
| 8d000c77 | 12-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #897 from CEED/jed/vec-zero-sized
Vector: error-free path for get/take array when size=0 |
| e076c219 | 12-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
tests: add zero sized array tests |
| 50c643e1 | 12-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
Vector: error-free path for get/take array when size=0
Among other things, this is important so that parallel callers can easily keep collective control flow even when some subdomains (materials or boundary surfaces) are size 0 on some ranks.
|
| edfb5f23 | 10-Feb-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
qf - remove stray character |
| 8c11b842 | 09-Feb-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #892 from CEED/jeremy/small-leak
op - fix small leak in composite ctx label |
| f2adece3 | 08-Feb-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
test - add test for label that doesn't exist |
| 60801d19 | 08-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #894 from CEED/will/julia-style
Update JuliaFormatter version |
| cdf95791 | 08-Feb-2022 |
Will Pazner <will.e.p@gmail.com> |
[julia] Update JuliaFormatter version |
| a48e5f43 | 07-Feb-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
op - fix small leak in composite ctx label |
| c6e1a279 | 07-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #889 from CEED/rezgar/oriented-restr
Element Restriction Oriented |
| 86e1ed65 | 05-Feb-2022 |
nbeams <246972+nbeams@users.noreply.github.com> |
Add launch bounds to HIP QFunction kernels |
| f71aa81b | 01-Feb-2022 |
nbeams <246972+nbeams@users.noreply.github.com> |
add launch bounds to magma kernels; add macro definition for y-dim of magma basis kernel threadblocks
Co-authored-by: Ahmad Abdelfattah <ahmad@icl.utk.edu> |
| b3c5430c | 01-Feb-2022 |
nbeams <246972+nbeams@users.noreply.github.com> |
Add flag to use atomic adds on supported AMD GPU hardware |
| 000294e3 | 04-Feb-2022 |
rezgarshakeri <rezgar.shakeri@colorado.edu> |
updated ceed-ref-restriction.c |
| b435c5a6 | 04-Feb-2022 |
rezgarshakeri <rezgar.shakeri@colorado.edu> |
Added CeedElemRestrictionIsOriented function |
| c7745053 | 04-Feb-2022 |
Rezgar Shakeri <42816410+rezgarshakeri@users.noreply.github.com> |
Update interface/ceed-elemrestriction.c
Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org> |
| d4b88fd2 | 03-Feb-2022 |
rezgarshakeri <rezgar.shakeri@colorado.edu> |
tests: deleted CeedVectorSetValue in restriction tests |
| 61e7462c | 03-Feb-2022 |
Rezgar Shakeri <42816410+rezgarshakeri@users.noreply.github.com> |
Update interface/ceed-elemrestriction.c
Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org> |
| 4dd06d33 | 02-Feb-2022 |
rezgarshakeri <rezgar.shakeri@colorado.edu> |
update ceed-elemrestriction.c: fixed formatting |
| cf6be907 | 01-Feb-2022 |
rezgarshakeri <rezgar.shakeri@colorado.edu> |
tests: added t220-elemrestriction.c |