sac-group issueshttps://gitlab.sac-home.org/groups/sac-group/-/issues2024-03-22T14:51:48Zhttps://gitlab.sac-home.org/sac-group/sac2c/-/issues/2396Distributed memory backend2024-03-22T14:51:48ZThomas KoopmanDistributed memory backendReimplement the distributed memory backend based on Shray to ensure memory-scalability, and potentially greater performance.
## Multithreaded + distributed status:
The ground works is there: all the parallelism is created by restrictin...Reimplement the distributed memory backend based on Shray to ensure memory-scalability, and potentially greater performance.
## Multithreaded + distributed status:
The ground works is there: all the parallelism is created by restricting the bounds and in principle orthogonal as to how these restricted with-loops are computed. However, the build system needs to be adjusted and probably some work needs to go into proper initialization.
## Side-effects
A function is assumed to have side-effects if it is an external function consuming a unique object, or a user-defined function type casting a unique object in its body. In that case, the function is computed only on the source node, and return arguments are broadcast. For now, we also consider functions that take or return hidden objects as having side-effects, though this can be relaxed in the future.
## Performance and correctness
This branch should behave correctly for all inputs as we fall back to replicated execution for constructs we cannot parallelize. Performance is outside of the scope of this merge request.
## Dependencies
This branch will only build if `gasnet` is installed, and environment variable `GASNet_ROOT` has been set to the installation. Shray needs the MPI or UDP backend of GASNet, so one of these dependencies also needs to be installed.
## TODO
- [x] implement reshape
- [x] implement multi-operator with-loops
- [x] fix hard-coded nametags in stdlib
- [x] nbody-naive
- [ ] FlashAttention alg 1
- [x] FlashAttention modified algorithm
- [ ] VolCalib
- [x] MG: multigrid method
- [ ] MG: initalisation (quickselect, RNG, some other funsies)
- [ ] Adjust CI to deal with GASNetThomas KoopmanThomas Koopman2024-04-21https://gitlab.sac-home.org/sac-group/sac2c/-/issues/2395Loop lifting fails2024-03-13T15:11:10ZThomas KoopmanLoop lifting failsCommit c0ba0c984b00f4109a3edb95597a517613c9f5c1
Consider the following program. `matmulT` only has to be computed `N / Br` times.
```
use Array: all;
#define d 128
#define N 1024
#define Br 128
#define Bc 256
noinline float[n:shp] i...Commit c0ba0c984b00f4109a3edb95597a517613c9f5c1
Consider the following program. `matmulT` only has to be computed `N / Br` times.
```
use Array: all;
#define d 128
#define N 1024
#define Br 128
#define Bc 256
noinline float[n:shp] id(float[n:shp] x) { return x; }
noinline
float[m, n] matmulT(float[m, k] A, float[n, k] B)
{
return {iv -> tof(0) | iv < [m, n]};
}
inline
float FlashAttention(float[N, d] Q, float[N, d] K, float[N, d] V)
{
Qb = reshape([N / Br, Br, d], Q);
Kb = reshape([N / Bc, Bc, d], K);
O = tof(0);
for (j = 0; j < N / Bc; j++) {
Pj = {[i, a] -> matmulT(Qb[i], Kb[j])[a]
| [i, a] < [N / Br, Br]};
O += sum(Pj);
}
return O;
}
int main()
{
Q = id({iv -> tof(1) | iv < [N, d]});
K = id({iv -> tof(1) | iv < [N, d]});
V = id({iv -> tof(1) | iv < [N, d]});
O = FlashAttention(Q, K, V);
return _toi_S_(O);
}
```
However, the optimised code gives
```
/* Partn */
([ 0, 0 ] <= _flat_82=[i, a] (IDXS:_wlidx_920_Pj) < [ 8, 128 ] genwidth [ 8, 128 ])
{
_ivesli_930 = _idxs2offset_( [ 8, 128, 128 ], i, _iveras_1039, _iveras_1040);
_flat_84 = with /** FOLDABLE (all gen's const) **/
/** REFERENCED: 1 (total num refs) **/
{
/* Partn */
([ 0, 0 ] <= _pinl_540_iv=[_pinl_543__eat_146, _pinl_542__eat_145] (IDXS:_wlidx_921__flat_84) < [ 128, 128 ] genwidth [ 128, 128 ])
{
_ivesli_932 = _idxs2offset_( [ 8, 128, 128 ], _iveras_1041, _pinl_543__eat_146, _pinl_542__eat_145);
_ivesli_933 = _add_SxS_( _ivesli_930, _ivesli_932);
_pinl_538__flat_396 = _idx_sel_( _ivesli_933, Qb);
} : _pinl_538__flat_396 ;
} :
genarray( [ 128, 128 ], _pinl_450__flat_393, IDX(_wlidx_921__flat_84));
_flat_83 = _MAIN::matmulT( _flat_84, _flat_56) ;
```
computing it `N / Br * Br` times. The `[i, a]` loop should have been split up in an `[i]` and `[a]` loop, the `matmulT` lifted out of the `[a]` loop, and then inside of the `[a]` loop a suballoc can be done.https://gitlab.sac-home.org/sac-group/sac2c/-/issues/2393Out of memory compile error2024-03-25T14:34:38ZThomas KoopmanOut of memory compile error```
use StdIO: all;
int main()
{
m = 1000;
x = {iv -> 1 | iv < [m]};
StdIO::print(x);
return 0;
}
```
fails with `Out of memory: 751605325050240 bytes requested`. Compiled with commit `9b80bcfd0ad24d93a1cab9c365ac1fea723fe6d5...```
use StdIO: all;
int main()
{
m = 1000;
x = {iv -> 1 | iv < [m]};
StdIO::print(x);
return 0;
}
```
fails with `Out of memory: 751605325050240 bytes requested`. Compiled with commit `9b80bcfd0ad24d93a1cab9c365ac1fea723fe6d5`, no flagshttps://gitlab.sac-home.org/sac-group/sac2c/-/issues/2392Bug in class implementation2024-03-09T10:32:01ZArtem ShinkarovBug in class implementationHere is a definition of the class which is a counter, where the update effect is merged into the `value` accessor:
```c
class cnt;
classtype int;
export all;
cnt new (int x) {
return to_cnt (x);
}
int value (cnt &x) {
v = from_cnt ...Here is a definition of the class which is a counter, where the update effect is merged into the `value` accessor:
```c
class cnt;
classtype int;
export all;
cnt new (int x) {
return to_cnt (x);
}
int value (cnt &x) {
v = from_cnt (x);
x = to_cnt (_add_SxS_ (v, 1));
return v;
}
```
Here is the file that uses this class:
```c
use cnt:all;
int main () {
a = new (1);
return value (a);
}
```
When I run this, I get the value 2 back, but the correct result is 1.
The version of sac2c I used is: `sac2c 1.3.3-MijasCosta-1161-gb543c`Sven-Bodo ScholzSven-Bodo Scholzhttps://gitlab.sac-home.org/sac-group/sac2c/-/issues/2390Towards a more lazy semantics2024-03-07T17:31:30ZSven-Bodo ScholzTowards a more lazy semanticsSeveral optimisations do perform redices that in a strict evaluation would never be executed.
A simple example is: `K (2, 3/0)`. Under a strict evaluation scheme it should not terminate / yield bottom; if we optimise this into `2`, this ...Several optimisations do perform redices that in a strict evaluation would never be executed.
A simple example is: `K (2, 3/0)`. Under a strict evaluation scheme it should not terminate / yield bottom; if we optimise this into `2`, this changes the semantics even though, under a normal order regime, this is fine.
Quite some discussion on this matter was done in the context of the merge request for URR:
https://gitlab.sac-home.org/sac-group/sac2c/-/merge_requests/285#note_13840
The question remains: how do we want to deal with this?https://gitlab.sac-home.org/sac-group/sac2c/-/issues/2389UTCatenate crash in IVEXP with -doawlf, due to unexpected guard primitive2024-03-06T14:40:37ZRobert BerneckyUTCatenate crash in IVEXP with -doawlf, due to unexpected guard primitiveN_type found where 31:N_id was expected. This is a recent failure,
likely due to a change in pattern-matching semantics.
[sacbug.sac](/uploads/368505b6c3871d061ad2e47ef51282c1/sacbug.sac)N_type found where 31:N_id was expected. This is a recent failure,
likely due to a change in pattern-matching semantics.
[sacbug.sac](/uploads/368505b6c3871d061ad2e47ef51282c1/sacbug.sac)Jordy AalderingJordy Aalderinghttps://gitlab.sac-home.org/sac-group/sac2c/-/issues/23881 / (1 + inf) gives nan instead of 02024-02-21T16:00:28ZThomas Koopman1 / (1 + inf) gives nan instead of 0# SaC version
sac2c 1.3.3-MijasCosta-1161-gb543c
build-type: DEBUG
built-by: "thomas" at 2024-02-21T12:07:11
# Test program
```
/* We need expf so Stdlib is necessary */
use Array: all;
use Math: all;
inline
float[*] logistics(float[...# SaC version
sac2c 1.3.3-MijasCosta-1161-gb543c
build-type: DEBUG
built-by: "thomas" at 2024-02-21T12:07:11
# Test program
```
/* We need expf so Stdlib is necessary */
use Array: all;
use Math: all;
inline
float[*] logistics(float[*] x)
{
return {iv -> 1f / (1f + exp(-x[iv]))};
}
noinline
float[d:shp] id(float[d:shp] x)
{
return x;
}
int main()
{
c11 = id(with {}: genarray([6, 24, 24], -1875001.750000f));
/* Note that this is well defined under IEEE-754:
exp(-x[iv]) = +inf
1 + +inf = +inf
1 / +inf = 0+
*/
c1 = logistics(c11);
StdIO::print(c1);
return 0;
}
```
# Output
When compiling without arguments, prints all NaN (with gcc 13.2.0 and clang 14.0.6). When passing `-Xc -O2` it does give the correct result, which is all `0`s.
Stepping through with `gdb` shows that the `+ 1` and division are optimised away from O3 onwards. It is possible that the generated C code is incorrect, but `-fsanitize=undefined` finds nothing. I may have to boil this down to a minimal example and file a bug with `gcc` and `clang` if it is not on our end.https://gitlab.sac-home.org/sac-group/sac2c/-/issues/2387Missing default expression in WL2024-02-19T23:35:10ZRobert BerneckyMissing default expression in WL[UTReduce.sac](/uploads/6df9a4e53e68f7a0e8a8fa26321ec509/UTReduce.sac)
```
sac2c_d UTReduce.sac -doawlf -v1
Warning:
AWLF is enabled: -ecc enabled.
Warning:
AWLF is enabled: -extrema enabled.
Warning:
AWLF is enabled: -maxoptc...[UTReduce.sac](/uploads/6df9a4e53e68f7a0e8a8fa26321ec509/UTReduce.sac)
```
sac2c_d UTReduce.sac -doawlf -v1
Warning:
AWLF is enabled: -ecc enabled.
Warning:
AWLF is enabled: -extrema enabled.
Warning:
AWLF is enabled: -maxoptcyc=20
//home/sac/sac/BASE/Stdlib/build/src-mt_pth/structures/ArrayBasics.sac:223: abort:
Genarray with-loop with missing default
expression found. Unfortunately, a default expression is necessary here to compute the shape of the result
Compilation failed while Transforming with-loop representation, 3 warning(s).
apex@medusa:/tmp/crud$
apex@medusa:/tmp/crud$ sac2c_d -V
sac2c 1.3.3-MijasCosta-1157-g02676
build-type: DEBUG
built-by: "sac" at 2024-02-14T10:15:11
```Sven-Bodo ScholzSven-Bodo Scholzhttps://gitlab.sac-home.org/sac-group/sac2c/-/issues/2386crash in LUR if -doawlf2024-02-14T16:33:30ZRobert Berneckycrash in LUR if -doawlf```
Applying loop unrolling ...
TRAVERSE ERROR: node of type 50:N_type found where 31:N_id was expected.
apex@medusa:/tmp/crud$ sac2c -V
sac2c 1.3.3-MijasCosta-1157-g02676
build-type: DEBUG
built-by: "sac" at 2024-02-14T10:15...```
Applying loop unrolling ...
TRAVERSE ERROR: node of type 50:N_type found where 31:N_id was expected.
apex@medusa:/tmp/crud$ sac2c -V
sac2c 1.3.3-MijasCosta-1157-g02676
build-type: DEBUG
built-by: "sac" at 2024-02-14T10:15:11
apex@medusa:/tmp/crud$ sac2c_d UTIndexSet.sac -doawlf -v4
```
[UTIndexSet.sac](/uploads/51520ffb42170b630db5394a7dc8b588/UTIndexSet.sac)https://gitlab.sac-home.org/sac-group/sac2c/-/issues/2385UTReduce.sac dies in WLF, if compiled with -maxwlur 1 -maxspec 02024-02-14T16:34:05ZRobert BerneckyUTReduce.sac dies in WLF, if compiled with -maxwlur 1 -maxspec 0Title says it all
[UTReduce.sac](/uploads/d5d6523dbc3638bf08e04783d395196e/UTReduce.sac)
sac2c_d UTReduce.sac -maxwlur 1 -maxspec 0 -v4Title says it all
[UTReduce.sac](/uploads/d5d6523dbc3638bf08e04783d395196e/UTReduce.sac)
sac2c_d UTReduce.sac -maxwlur 1 -maxspec 0 -v4Sven-Bodo ScholzSven-Bodo Scholzhttps://gitlab.sac-home.org/sac-group/sac2c/-/issues/2384conflicting types, likely caused by maxwlur 1 and/or maxspec 02024-02-14T16:17:16ZRobert Berneckyconflicting types, likely caused by maxwlur 1 and/or maxspec 0This unit test compiles okay with no command-line options, btw.
```
****** UTIndexSet::testisXIB( hidden, hidden, int, double, int, int, int): ...
Applying common subexpression elimination ...
Determine candida...This unit test compiles okay with no command-line options, btw.
```
****** UTIndexSet::testisXIB( hidden, hidden, int, double, int, int, int): ...
Applying common subexpression elimination ...
Determine candidate functions for inlining ...
Inferring loop invariant variables ...
Applying type upgrade ...
./UTIndexSet.sac:227: abort:
loop variable "x" is being used inconsistently in function _dup_3625_sameIIB__Cond_0; conflicting types
are int[.] and #10092: in [ --, int[4]] le <> ge <>
Compilation failed while Running SAC optimizations.
sac2c -V
sac2c 1.3.3-MijasCosta-1157-g02676
build-type: DEBUG
built-by: "sac" at 2024-02-14T10:15:11
apex@medusa:/tmp/crud$ sac2c_d UTIndexSet.sac -maxspec 0 -maxwlur 1 -v4
[UTIndexSet.sac](/uploads/fcc2dd5aadc758b05b607c3bfe02dc42/UTIndexSet.sac)
```Sven-Bodo ScholzSven-Bodo Scholzhttps://gitlab.sac-home.org/sac-group/sac2c/-/issues/2383Complexity problem in FUNDEF_WRAPPERTYPE2024-02-15T10:17:47ZThomas KoopmanComplexity problem in FUNDEF_WRAPPERTYPEThe following program is OOM-killed with 40GB of available space in `elim_alpha_types.c`
```
use Array: all;
use StdIO: all;
use Math: all;
#define CATEGORIES 10
inline
float[d:shp] averageOuter(float[n,d:shp] array)
{
return {iv ->...The following program is OOM-killed with 40GB of available space in `elim_alpha_types.c`
```
use Array: all;
use StdIO: all;
use Math: all;
#define CATEGORIES 10
inline
float[d:shp] averageOuter(float[n,d:shp] array)
{
return {iv -> sum({[i] -> array[[i]++iv] | [i] < [n]}) | iv < shp} / tof(n);
}
inline
int MaxPos( float[.,.,.,.,.] output)
{
n = shape( output)[0];
max = output[[0,0,0,0,0]];
res = 0;
for( i=0; i<n; i++)
if( output[[i,0,0,0,0]] > max) {
max = output[[i,0,0,0,0]];
res = i;
}
return res;
}
inline
float MeanSquaredError( float[*] result, float[*] labels)
{
return sum ( 0.5f * ( labels - result) * ( labels - result) );
}
#define sumOuter(xo, xi, e, shpi, shpo) \
{xi -> with { \
(0 * shpo <= xo < shpo): (e)[xi]; \
}: fold(+, 0f) \
| xi < shpi}
/* In `Fortran' notation x(i:i+n+1) where i, n are vectors and the range
* is inclusive, exclusive. */
inline
float[d:n1] slide(int[d] i, float[d:m] x, int[d] n) | all(n1 == n + 1)
| all(n + 1 + i <= m)
{
return {iv -> x[iv + i] | iv < n + 1};
}
inline
float[d:mn] backslide(int[d] i, float[d:n1] y, int[d] mn)
| all(i < 1 + mn - n1)
{
return {iv -> 0f | iv < i;
iv -> y[iv - i] | i <= iv < n1 + i;
iv -> 0f | n1 + i <= iv < mn};
}
inline
float[*] logistics(float[*] x)
{
return {iv -> 1f / (1f + exp(-x[iv]))};
}
inline
float[n:oshp,n:ishp] block(float[d:shp] x, int[n] ishp)
| all(shp % ishp == 0)
| all(oshp * ishp == shp)
| (2 * n == d)
{
return {iv -> tile(ishp, iv * ishp, x) | iv < shp / ishp};
}
inline
float[*] unblock(float[*] a, int[.] bshp)
{
shp = drop(-shape(bshp), shape(a)) * bshp;
return {iv -> a[(iv / bshp) ++ (iv % bshp)] | iv < shp};
}
inline
float[n:ishp] selb(float[d:shp] x, int[n] iv, int[n] ishp)
| (2 * n == d)
| all(iv < shp / ishp)
{
return block(x, ishp)[iv];
}
#define IMAPS(iv, e, shp) {iv -> (e)[[0]] | iv < shp}
float sels(float[d:shp] x, int[d] iv)
{
return x[iv];
}
noinline
float[6, 5, 5], float[6], float[12, 6, 5, 5], float[12],
float[CATEGORIES, 12, 1, 4, 4], float[CATEGORIES], float
TrainZhang(float[28, 28] inp, float[6, 5, 5] k1, float[6] b1,
float[12, 6, 5, 5] k2, float[12] b2,
float[CATEGORIES, 12, 1, 4, 4] fc, float[CATEGORIES] b,
float[CATEGORIES, 1, 1, 1, 1] target)
{
c11 = { x20 -> (sumOuter(x21, x22, (slide(x21, inp, [23, 23])) * (IMAPS(x23, (sels((k1)[x20], x21)), ([24, 24]))), ([5, 5]), ([24, 24]))) + (IMAPS(x24, (sels(b1, x20)), ([24, 24]))) | x20 < [6] };
c1 = logistics(c11);
s1 = { x16 -> IMAPS(x17, ((sumOuter(x18, x19, sels(selb((c1)[x16], x17, [2, 2]), x18), ([2, 2]), ([1]))) / 4), ([12, 12])) | x16 < [6] };
c21 = { x11 -> (sumOuter(x12, x13, (slide(x12, s1, [0, 7, 7])) * (IMAPS(x14, (sels((k2)[x11], x12)), ([1, 8, 8]))), ([6, 5, 5]), ([1, 8, 8]))) + (IMAPS(x15, (sels(b2, x11)), ([1, 8, 8]))) | x11 < [12] };
c2 = logistics(c21);
s2 = { x6 -> { x7 -> IMAPS(x8, ((sumOuter(x9, x10, sels(selb(((c2)[x6])[x7], x8, [2, 2]), x9), ([2, 2]), ([1]))) / 4), ([4, 4])) | x7 < [1] } | x6 < [12] };
r1 = { x1 -> (sumOuter(x2, x3, (slide(x2, s2, [0, 0, 0, 0])) * (IMAPS(x4, (sels((fc)[x1], x2)), ([1, 1, 1, 1]))), ([12, 1, 4, 4]), ([1, 1, 1, 1]))) + (IMAPS(x5, (sels(b, x1)), ([1, 1, 1, 1]))) | x1 < [10] };
r = logistics(r1);
ddr = 1f;
ddr1 = (ddr) * ((r1) * ((1f) + (-(r1))));
dds2 = sumOuter(x1, x2, sumOuter(x3, x4, backslide(x3, IMAPS(x5, ((sels((ddr1)[x1], x5)) * (sels((fc)[x1], x3))), ([1, 1, 1, 1])), [12, 1, 4, 4]), ([12, 1, 4, 4]), ([12, 1, 4, 4])), ([10]), ([12, 1, 4, 4]));
ddc2 = { x1 -> { x2 -> unblock({ x3 -> IMAPS(x4, ((sels(((dds2)[x1])[x2], x3)) / 4), ([2, 2])) | x3 < [4, 4] }, [2, 2]) | x2 < [1] } | x1 < [12] };
ddc21 = (ddc2) * ((c21) * ((1f) + (-(c21))));
dds1 = sumOuter(x1, x2, sumOuter(x3, x4, backslide(x3, IMAPS(x5, ((sels((ddc21)[x1], x5)) * (sels((k2)[x1], x3))), ([1, 8, 8])), [6, 12, 12]), ([6, 5, 5]), ([6, 12, 12])), ([12]), ([6, 12, 12]));
ddc1 = { x1 -> unblock({ x2 -> IMAPS(x3, ((sels((dds1)[x1], x2)) / 4), ([2, 2])) | x2 < [12, 12] }, [2, 2]) | x1 < [6] };
ddc11 = (ddc1) * ((c11) * ((1f) + (-(c11))));
ddb = IMAPS(x1, (sumOuter(x2, x3, sels((ddr1)[x1], x2), ([1, 1, 1, 1]), ([1]))), ([10]));
ddfc = { x1 -> IMAPS(x2, (sumOuter(x3, x4, (sels((ddr1)[x1], x3)) * (sels(slide(x2, s2, [0, 0, 0, 0]), x3)), ([1, 1, 1, 1]), ([1]))), ([12, 1, 4, 4])) | x1 < [10] };
ddb2 = IMAPS(x1, (sumOuter(x2, x3, sels((ddc21)[x1], x2), ([1, 8, 8]), ([1]))), ([12]));
ddk2 = { x1 -> IMAPS(x2, (sumOuter(x3, x4, (sels((ddc21)[x1], x3)) * (sels(slide(x2, s1, [0, 7, 7]), x3)), ([1, 8, 8]), ([1]))), ([6, 5, 5])) | x1 < [12] };
ddb1 = IMAPS(x1, (sumOuter(x2, x3, sels((ddc11)[x1], x2), ([24, 24]), ([1]))), ([6]));
ddk1 = { x1 -> IMAPS(x2, (sumOuter(x3, x4, (sels((ddc11)[x1], x3)) * (sels(slide(x2, inp, [23, 23]), x3)), ([24, 24]), ([1]))), ([5, 5])) | x1 < [6] };
ddinp = sumOuter(x1, x2, sumOuter(x3, x4, backslide(x3, IMAPS(x5, ((sels((ddc11)[x1], x5)) * (sels((k1)[x1], x3))), ([24, 24])), [28, 28]), ([5, 5]), ([28, 28])), ([6]), ([28, 28]));
error = MeanSquaredError(r, target);
return (ddk1, ddb1, ddk2, ddb2, ddfc, ddb, error);
}
int main()
{
k1 = genarray([6, 5, 5], 1f / 25f);
b1 = genarray([6], 1f / 6f);
k2 = genarray([12, 6, 5, 5], 1f / 150f);
b2 = genarray([12], 1f / 12f);
fc = genarray([CATEGORIES, 12, 1, 4, 4], 1f / 192f);
b = genarray([CATEGORIES], 1f / tof(CATEGORIES));
target = genarray([CATEGORIES,1,1,1,1], 0f);
inp = reshape([28, 28], tof(iota(28 * 28)));
d_k1, d_b1, d_k2, d_b2, d_fc, d_b, err = TrainZhang(inp, k1, b1, k2, b2, fc, b, target);
print(d_k1);
return 0;
}
```https://gitlab.sac-home.org/sac-group/sac2c/-/issues/2382Constant folding does not optimise sel(jv, {iv -> expr(iv) | iv < ub})2024-02-15T10:19:35ZThomas KoopmanConstant folding does not optimise sel(jv, {iv -> expr(iv) | iv < ub})Consider the following program.
```
inline
int[d:n1] slide(int[d] i, int[d:m] x, int[d] n1) | all(n1 + i <= m)
{
return {iv -> _sel_VxA_(_add_VxV_(iv, i), x)
| iv < n1};
}
int +(int a, int b)
{
return _add_SxS_(a, b)...Consider the following program.
```
inline
int[d:n1] slide(int[d] i, int[d:m] x, int[d] n1) | all(n1 + i <= m)
{
return {iv -> _sel_VxA_(_add_VxV_(iv, i), x)
| iv < n1};
}
int +(int a, int b)
{
return _add_SxS_(a, b);
}
noinline
int[*] id(int[*] x)
{
return x;
}
int main()
{
inp = id(with {}: genarray([28, 28], 2));
#if 1
bla = {x22 -> with {
([0, 0] <= x21 < [5, 5]): _sel_VxA_(x22, slide(x21, inp, [23, 23]));
}: fold(+, 0)
| x22 < [24, 24]};
#else
bla = {x22 -> with {
([0, 0] <= x21 < [5, 5]): _sel_VxA_(_add_VxV_(x22, x21), inp);
}: fold(+, 0)
| x22 < [24, 24]};
#endif
res = _sel_VxA_([0, 0], bla);
return res;
}
```
Compiling the following example with `sac2c_d -bopt -printfun main` gives
```
/****************************************************************************
* _MAIN::main(...) [ body ]
****************************************************************************/
int _MAIN::main()
/*
* main :: ---
*/
{
int _ivesli_2836 { , NN } ;
int _ivesli_2835 { , NN } ;
int _ivesli_2834 { , NN } ;
int _ivesli_2832 { , NN } ;
int _wlidx_2816__flat_91 { , NN } ;
int _wlidx_2815_bla { , NN } ;
int _wlidx_2814__flat_33 { , NN } ;
int _pinl_405__eat_107 { , NN } ;
int _pinl_404__eat_106 { , NN } ;
int _pinl_403__mose_8__SSA0_1 { , NN } ;
int[2] _pinl_402_iv { , NN } ;
int _ea_327__flat_90 { , NN } ;
int _ea_326__mose_9__SSA0_1 { , NN } ;
int _eat_105 { , NN } ;
int _eat_104 { , NN } ;
int _eat_103 { , NN } ;
int _eat_102 { , NN } ;
int _eat_99 { , NN } ;
int _eat_98 { , NN } ;
int _mose_9__SSA0_1 { , NN } ;
int[2] x21__SSA0_1 { , NN } ;
int res { , NN } ;
int[24,24] bla { , NN } ;
int[2] x22 { , NN } ;
int[28,28] inp { , NN } ;
int[2] _hzgwl_12 { , NN } ;
int[23,23] _flat_91 { , NN } ;
int _flat_90 { , NN } ;
int{0} _flat_39 { , NN } ;
int{2} _flat_37 { , NN } ;
int[28,28] _flat_33 { , NN } ;
_flat_39 = 0;
_flat_37 = 2;
_flat_33 = with /** FOLDABLE (all gen's const) **/
/** REFERENCED: 1 (total num refs) **/
{
/* Partn */
([ 0, 0 ] <= _hzgwl_12=[_eat_99, _eat_98] (IDXS:_wlidx_2814__flat_33) < [ 28, 28 ] genwidth [ 28, 28 ])
{
} : _flat_37 ;
} :
genarray( [ 28, 28 ], _flat_37, IDX(_wlidx_2814__flat_33));
inp = _MAIN::id( _flat_33) ;
bla = with /** FOLDABLE (all gen's const) **/
/** REFERENCED: 1 (total num refs) **/
{
/* Partn */
([ 0, 0 ] <= x22=[_eat_103, _eat_102] (IDXS:_wlidx_2815_bla) < [ 24, 24 ] genwidth [ 24, 24 ])
{
_ivesli_2836 = _idxs2offset_( [ 23, 23 ], _eat_103, _eat_102);
_mose_9__SSA0_1 = with /** FOLDABLE (all gen's const) **/
/** REFERENCED: 1 (total num refs) **/
{
/* Partn */
([ 0, 0 ] <= x21__SSA0_1=[_eat_105, _eat_104] < [ 5, 5 ] genwidth [ 5, 5 ])
{
_ea_326__mose_9__SSA0_1 = _accu_( x21__SSA0_1, _flat_39);
_ivesli_2832 = _idxs2offset_( [ 28, 28 ], _eat_105, _eat_104);
_flat_91 = with /** FOLDABLE (all gen's const) **/
/** REFERENCED: 1 (total num refs) **/
{
/* Partn */
([ 0, 0 ] <= _pinl_402_iv=[_pinl_405__eat_107, _pinl_404__eat_106] (IDXS:_wlidx_2816__flat_91) < [ 23, 23 ] genwidth [ 23, 23 ])
{
_ivesli_2834 = _idxs2offset_( [ 28, 28 ], _pinl_405__eat_107, _pinl_404__eat_106);
_ivesli_2835 = _add_SxS_( _ivesli_2832, _ivesli_2834);
_pinl_403__mose_8__SSA0_1 = _idx_sel_( _ivesli_2835, inp);
} : _pinl_403__mose_8__SSA0_1 ;
} :
genarray( [ 23, 23 ], _flat_39, IDX(_wlidx_2816__flat_91));
_flat_90 = _idx_sel_( _ivesli_2836, _flat_91);
_ea_327__flat_90 = _add_SxS_( _ea_326__mose_9__SSA0_1, _flat_90);
} : _ea_327__flat_90 ;
} :
fold( _MAIN::+(), _flat_39);
} : _mose_9__SSA0_1 ;
} :
genarray( [ 24, 24 ], _flat_39, IDX(_wlidx_2815_bla));
res = _idx_sel_( _flat_39, bla);
return( res);
}
/*-----------------------------------------------*/
```
so `slide` is computed every iteration of the fold-loop, whereas I would have expected a simple selection `inp[x21 + x22]`, as in the commented out version.https://gitlab.sac-home.org/sac-group/sac2c/-/issues/2368compiler does not compile on Debian 12: `Assertion "sacargudt != UT_NOT_DEFIN...2024-01-24T14:20:27ZBernard van Gastelcompiler does not compile on Debian 12: `Assertion "sacargudt != UT_NOT_DEFINED" failed`On branch `develop` (commit b19ded1503af34fa9ce1e2c00fe99d5ed679bb38, 22 Dec):
```
[ 0%] Built target cb
[ 1%] Built target hzip
[ 1%] Built target csd
[ 1%] Built target cse
[ 1%] Built target icmt
[ 1%] Checking git repo version
...On branch `develop` (commit b19ded1503af34fa9ce1e2c00fe99d5ed679bb38, 22 Dec):
```
[ 0%] Built target cb
[ 1%] Built target hzip
[ 1%] Built target csd
[ 1%] Built target cse
[ 1%] Built target icmt
[ 1%] Checking git repo version
[ 1%] Built target check_repo_version
[ 1%] Built target sac2c
[ 2%] Built target sac4c
[ 2%] Built target sac2tex
[ 68%] Built target sac2cShared
[ 81%] Built target sac_h
[ 82%] Built target runtime_libraries-configure
[ 82%] Performing build step for 'runtime_libraries'
[ 8%] Built target libsac-seq
[ 10%] Built target libsacphm-seq
[ 13%] Built target libsacphmdiag-seq
[ 13%] Built target libsacphmc-seq
[ 13%] Built target libsacphmcdiag-seq
[ 17%] Built target libsacdistmem-seq
[ 17%] Built target libsacrtspec-seq
[ 18%] Generating libsacprelude for target `seq'
Internal compiler error
Assertion "sacargudt != UT_NOT_DEFINED" failed at ../sac/sac2c/src/libsac2c/generics/generate_generic_type_conversions.c:350 -- Cannot find sacarg udt!
Please file a bug at: https://gitlab.sac-home.org/sac-group/sac2c/-/issues
```
This is on a Debian 12 (bookworm) sysstem with CUDA installed.https://gitlab.sac-home.org/sac-group/sac2c/-/issues/2366Virtual stack-trace of function applications2024-01-26T16:12:15ZJordy AalderingVirtual stack-trace of function applicationsWe want to keep track of a 'virtual' stack trace so that we can give the current trace if we encounter an error during run-time.
We return the arguments or return values to avoid code-reordering from messing with the location of these s...We want to keep track of a 'virtual' stack trace so that we can give the current trace if we encounter an error during run-time.
We return the arguments or return values to avoid code-reordering from messing with the location of these stack pushes and pops.
```
x1', .., xn' = _virtual_stack_push_ (x1, .., xn, "foo signature");
res1, .., resm = foo (x1', .., xn');
res1', .., resm' = _virtual_stack_pop_ (res1, .., resm);
```
Optionally, we might also be able to use this notation to construct a stack trace for compile time errors.
But that is something we should figure out in the future, first we must investigate whether this will even work as intended.
---
We might consider multiple tracing levels:
- No stack trace; used for performance-critical applications
- Trace user-defined function applications (even inlined ones); used for printing the stack trace of e.g. type checking, ecc, or type pattern errors
- Trace all function applications (even generated ones); useful for people debugging the compiler
For the two new function we might be able to reuse `NUMVARIABLERETS` and `CONTEXTSTRING` from the guard primitive.Jordy AalderingJordy Aalderinghttps://gitlab.sac-home.org/sac-group/sac2c/-/issues/2361Type pattern dimensionality check done too late2023-12-08T12:14:50ZJordy AalderingType pattern dimensionality check done too lateDimensionality checks generated for type patterns are inserted too late into the check functions, currently it is:
```
assigns
dim checks
checks
```
But to avoid executing erroneous code in the assigns it should be:
```
dim checks
ass...Dimensionality checks generated for type patterns are inserted too late into the check functions, currently it is:
```
assigns
dim checks
checks
```
But to avoid executing erroneous code in the assigns it should be:
```
dim checks
assigns
checks
```
Example:
```
use Array: all;
int[o] concat( int[m] x, int[n] y) | o == m+n
{
return {[i] -> x[i];
[i] -> y[i-m] | [m] <= [i] < [m+n] };
}
int[d:shp] mysel( int[n] idx, int[n:outer, d:shp] arr) | all (0 <= idx) && all (idx < outer)
{
return {iv -> _sel_VxA_ (concat (idx, iv), arr) | iv < shp};
}
int main () {
res = mysel ([1,2,2], reshape ([3,4], iota(12)));
return res;
}
```Jordy AalderingJordy Aalderinghttps://gitlab.sac-home.org/sac-group/sac2c/-/issues/2356Multithreaded AKD fold2024-01-11T15:45:51ZThomas KoopmanMultithreaded AKD fold# Compiler version
```
sac2c 1.3.3-MijasCosta-1111-g2cfd20
build-type: DEBUG
```
# Problem
The following program concetenates 100 `[1]`s into a single array and prints the shape.
```
int[.] rijgen(int[m] x, int[n] y)
{
res_shape =...# Compiler version
```
sac2c 1.3.3-MijasCosta-1111-g2cfd20
build-type: DEBUG
```
# Problem
The following program concetenates 100 `[1]`s into a single array and prints the shape.
```
int[.] rijgen(int[m] x, int[n] y)
{
res_shape = [_add_SxS_(m, n)];
conny = with {
([0] <= iv < [m]): _sel_VxA_(iv, x);
([m] <= iv < res_shape): _sel_VxA_(_sub_VxV_(iv, [m]),
y);
}: genarray(res_shape, 0);
return conny;
}
int main()
{
n = 100;
jean = with {
([0] <= iv < [n]): [1];
}: fold(rijgen, []);
return _sel_VxA_([0], _shape_A_(jean));
}
```
This gives error
```
*** SAC runtime error
*** No appropriate instance of function "_MAIN::rijgen :: int[*] int[*] -> int[.] " found!
```
for backend `mt_pth`.
# Investigation
The problem seems to be that
```MT_MTFUN_DEF_BEGIN( SACwf__MAIN_CL_MT__rijgen__i_S__i_S, , 3, out, int, (SAC_arg_1, (AKD, (NHD, (NUQ, (INT, (GLO, (FPM, (NOT, (NDI, (INT, )))))))))), in, int, (SACl_x, (AUD, (NHD, (NUQ, (INT, (GLO, (FPM, (NOT, (NDI, (INT, )))))))))), in, int, (SACl_y, (AUD, (NHD, (NUQ, (INT, (GLO, (FPM, (NOT, (NDI, (INT, )))))))))))```
forgets that arguments x and y are `AKD`, and turns it into an `AUD`.https://gitlab.sac-home.org/sac-group/sac2c/-/issues/2355Add DBUG_ASSERT variations to reduce duplicate code2023-11-21T22:08:07ZMichiel VerloopAdd DBUG_ASSERT variations to reduce duplicate codeMany functions start with a bunch of assertions, and rightly so. Making preconditions explicit in dbug_asserts is extremely helpful in stopping the propagation of bugs and conveying the intent to the reader. It is, however, quite repetit...Many functions start with a bunch of assertions, and rightly so. Making preconditions explicit in dbug_asserts is extremely helpful in stopping the propagation of bugs and conveying the intent to the reader. It is, however, quite repetitive. Consider the following, which I've taken from `WLUTupdateBoundNthDim`:
```
DBUG_ASSERT (bound != NULL, "bound may not be null!");
DBUG_ASSERT (*bound != NULL, "bound may not point to null!");
DBUG_ASSERT (NODE_TYPE (*bound) == N_array || NODE_TYPE (*bound) == N_id,
"bound must be an n_id or n_array but is \"%s\"!",
NODE_TEXT (*bound));
DBUG_ASSERT (new_scalar_avis != NULL, "new_scalar_avis may not be null!");
DBUG_ASSERT (vardecs != NULL, "Vardecs may not be null!");
DBUG_ASSERT (preassigns != NULL, "Preassigns may not be null!");
```
Frankly, I'm a bit tired of writing the same helpful debug messages over and over again.
Could we instead create some variations of the assertions to weed out most cases?
```
DBUG_ASSERT_NOT_NULL (bound, *bound, new_scalar_avis, vardecs, preassigns);
DBUG_ASSERT_NODE_IS_OF_TYPES (*bound, N_array, N_id);
```
This could still expand to give helpful messages as before, just now in an automatic manner.
Are the proposed variants feasible? What variants of DBUG_ASSERT should we make?https://gitlab.sac-home.org/sac-group/sac2c/-/issues/2351Specialisation problem2023-11-08T14:27:48ZThomas KoopmanSpecialisation problemUncommenting `tridag` makes the code slower because a version of modarray is no longer specialised, meaning the compiler cannot statically dispatch it.
TODO: reduce the example
```
use Array: all;
use Math: all;
use StdIO: all;
use Ben...Uncommenting `tridag` makes the code slower because a version of modarray is no longer specialised, meaning the compiler cannot statically dispatch it.
TODO: reduce the example
```
use Array: all;
use Math: all;
use StdIO: all;
use Benchmarking: all;
#define REAL double
#define TO_REAL tod
#define VEC 8
/**
* Initializes VarX as outer product of VarX1 and VarX2.
*/
REAL[NUM_Y], REAL[NUM_X] updateParams(REAL[NUM_X] X, REAL[NUM_Y] Y, REAL t)
{
VarX1 = {[j] -> exp(2.0 * Y[j] - NU * NU * t)};
VarX2 = {[i] -> pow(X[i], 2.0 * BETA)};
return (VarX1, VarX2);
}
/**
* Initializes indX, indY, X, Y, Dxx, Dyy
*/
inline
int, int, REAL[NUM_X], REAL[NUM_Y], REAL[3], REAL[3] initGrid()
{
stdX = 20.0 * ALPHA * S0 * sqrt(T);
dx = stdX / TO_REAL(NUM_X);
indX = toi(S0 / dx);
X = {[i] -> TO_REAL(i) * dx - TO_REAL(indX) * dx + S0 | [i] < [NUM_X]};
Dxx = [1.0 / (dx * dx), -2.0 / (dx * dx), 1.0 / (dx * dx)];
stdY = 10.0 * NU * sqrt(T);
dy = stdY / TO_REAL(NUM_Y);
indY = NUM_Y / 2;
Dyy = [1.0 / (dy * dy), -2.0 / (dy * dy), 1.0 / (dy * dy)];
Y = {[i] -> TO_REAL(i) * dy - TO_REAL(indY) * dy + log(ALPHA)
| [i] < [NUM_Y]};
return (indX, indY, X, Y, Dxx, Dyy);
}
//inline
//REAL[.] tridag(REAL[.] a, REAL[.] b, REAL[.] c, REAL[.] y)
//{
// n = shape(y)[0];
//
// /* This is the modified Thomas method from Numerical Methods.
// * Note that the non-zeroes in a row are a, b, c in this application,
// * and b, a, c in Numerical Methods.
// * We store gamma in b. */
// b[0] = 1.0 / b[0];
// y[0] = b[0] * y[0];
//
// b[1] = 1.0 / b[1];
// y[1] = b[1] * (y[1] - a[1] * y[0]);
// for (i = 2; i < n; i++) {
// b[i] = 1.0 / (b[i] - a[i] * b[i - 1] * c[i - 1]);
// y[i] = b[i] * (y[i] - a[i] * y[i - 1]);
// }
//
// for (i = n - 2; i >= 1; i--) {
// y[i] = y[i] - b[i] * c[i] * y[i + 1];
// }
//
// return y;
//}
/* a, b, c are constants signifying a Toeplitz system. This solves
the Toeplitz system on each column of y. */
inline
REAL[., .] toeplitz(REAL a, REAL b, REAL c, REAL[, .] y)
{
n = shape(y)[0];
gamma = genarray(take([1], shape(y)), 0d);
/* This is the modified Thomas method from Numerical Mathematics.
* Note that the non-zeroes in a row are a, b, c in this application,
* and b, a, c in Numerical Methods. */
gamma[0] = 1.0 / b;
y[0] = gamma[0] * y[0];
gamma[1] = 1.0 / b;
y[1] = gamma[1] * (y[1] - a * y[0]);
for (i = 2; i < n; i++) {
gamma[i] = 1.0 / (b - a * gamma[i - 1] * c);
y[i] = gamma[i] * (y[i] - a * y[i - 1]);
}
for (i = n - 2; i >= 1; i--) {
y[i] = y[i] - gamma[i] * c * y[i + 1];
}
return y;
}
noinline
REAL[NUM_X, VEC] tridag_v(REAL[NUM_X, VEC] a, REAL[NUM_X, VEC] b,
REAL[NUM_X, VEC] c, REAL[NUM_X, VEC] y)
{
n = shape(y)[0];
/* This is the modified Thomas method from Numerical Methods.
* Note that the non-zeroes in a row are a, b, c in this application,
* and b, a, c in Numerical Methods.
* We store gamma in b. */
b[0] = 1.0 / b[0];
y[0] = b[0] * y[0];
b[1] = 1.0 / b[1];
y[1] = b[1] * (y[1] - a[1] * y[0]);
for (i = 2; i < n; i++) {
b[i] = 1.0 / (b[i] - a[i] * b[i - 1] * c[i - 1]);
y[i] = b[i] * (y[i] - a[i] * y[i - 1]);
}
for (i = n - 2; i >= 1; i--) {
y[i] = y[i] - b[i] * c[i] * y[i + 1];
}
return y;
}
noinline
REAL[VEC, NUM_X] implicit_x(REAL[VEC] VarX1, REAL[NUM_X] VarX2, REAL[3] Dxx,
REAL[VEC, NUM_X] uu)
{
uu_t = {[i, j] -> uu[j, i]};
a = {[j] -> VarX1 * VarX2[j] | [1] <= [j] < [NUM_X - 1];
[j] -> genarray([VEC], 0d) | [NUM_X - 1] <= [j] < [NUM_X]};
res = tridag_v(a, 1d - 2d * a, a, uu_t);
return {[i, j] -> res[j, i]};
}
inline
REAL[NUM_Y, NUM_X] rollback(REAL dtInv, REAL[3] Dxx,
REAL[NUM_X] X, REAL[NUM_Y] Y,
REAL[NUM_Y] VarX1, REAL[NUM_X] VarX2,
REAL[3] Dyy, REAL VarY,
REAL[NUM_Y, NUM_X] ResultE)
{
/* explicit y */
V = {[j, i] -> 0.25 * VarY * (ResultE[j - 1, i] * Dyy[0] +
ResultE[j , i] * Dyy[1] +
ResultE[j + 1, i] * Dyy[2])
| [1, 0] <= [j, i] < [NUM_Y - 1, NUM_X];
[j, i] -> 0d
| [NUM_Y - 1, 0] <= [j, i] < [NUM_Y, NUM_X]
};
/* explicit x */
U = {[j, i] -> dtInv * ResultE[j, i]
| [0, 0] <= [j, i] < [NUM_Y, 1];
[j, i] -> dtInv * ResultE[j, i] + 0.25 * VarX1[j] * VarX2[i] * (
ResultE[j, i - 1] * Dxx[0] +
ResultE[j, i ] * Dxx[1] +
ResultE[j, i + 1] * Dxx[2])
| [0, 1] <= [j, i] < [NUM_Y, NUM_X - 1];
[j, i] -> dtInv * ResultE[j, i]
| [0, NUM_X - 1] <= [j, i] < [NUM_Y, NUM_X]
};
U = {[j, i] -> U[j, i] + 2.0 * V[j, i]};
/* implicit x */
VarX1 *= -0.25 * Dxx[0] / dtInv; // Scale solution by dtInv
VarX1p = reshape([NUM_Y / VEC, VEC], VarX1);
Up = reshape([NUM_Y / VEC, VEC, NUM_X], U);
Up = {[j] -> implicit_x(VarX1p[j], VarX2, Dxx, Up[j])};
U = reshape([NUM_Y, NUM_X], Up);
/* implicit y */
V = U - V;
a = -0.25 * VarY * Dyy[0];
ResultE = toeplitz(a, dtInv - 2d * a, a, V);
return ResultE;
}
inline
REAL value(REAL strike, int indX, int indY,
REAL[NUM_X] X, REAL[NUM_Y] Y,
REAL[3] Dxx, REAL[3] Dyy)
{
VarX1, VarX2 = updateParams(X, Y, TO_REAL(NUM_T - 2) * T / TO_REAL(NUM_T - 1));
ResultE = {[j, i] -> max(X[i] - strike, 0.0) | [j, i] < [NUM_Y, NUM_X]};
for (t = NUM_T - 2; t >= 0; t--) {
ResultE = rollback(TO_REAL(NUM_T - 1) / T, Dxx, X, Y, VarX1, VarX2,
Dyy, NU * NU, ResultE);
VarX2 *= exp(NU * NU * T / TO_REAL(NUM_T - 1));
}
return ResultE[indY, indX];
}
int main()
{
fprintf(stdout, "\n// SaC Volatility Calibration Benchmark:\n");
itime = getInterval("time", 1);
start(itime);
indX, indY, X, Y, Dxx, Dyy = initGrid();
result = {[i] -> value(TO_REAL(i) / 1000.0, indX, indY, X, Y, Dxx, Dyy)
| [i] < [OUTER]};
end(itime);
print(result);
time, unit = returnResultUnit(itime);
printf("This took %f%s.\n", time, unit);
return 0;
}
```https://gitlab.sac-home.org/sac-group/sac2c/-/issues/2348Freeing a bogus address in IVERAS2023-10-20T11:45:40ZThomas KoopmanFreeing a bogus address in IVERAS# Setup
Compiler (only modifications are sanitizer flags and no compilation for cuda backend)
```
sac2c 1.3.3-MijasCosta-1085-gd975-dirty
build-type: DEBUG
built-by: "thomas" at 2023-10-20T12:28:17
```
The problem occurs in compiling ...# Setup
Compiler (only modifications are sanitizer flags and no compilation for cuda backend)
```
sac2c 1.3.3-MijasCosta-1085-gd975-dirty
build-type: DEBUG
built-by: "thomas" at 2023-10-20T12:28:17
```
The problem occurs in compiling `src/structures/String.sac` from the Stdlib for the sequential backend.
# Error
In function `FREEattribExtLink`, we read `(NODE_TYPE (attr) == N_fundef)`. Here `attr` is not aligned to an 8-byte boundary, causing ubsan to complain.
# Some investigation into the problem
This address is in the memory segment corresponding to global variables, not the segment corresponding to glibc's malloc. It is the address between strings ```TRAVERSE ERROR: node of type %d:%s found where %d:%s was expected!\n\n``` and "/home/thomas/repos/sac2c/src/libsac2c/tree/pattern_match.c"
This attribute comes from `SET_MEMBER( arg_node)= FREEattribExtLink(SET_MEMBER( arg_node), arg_node);` in `FREEset`. The `arg_node` first makes its appearance in `FindIVOffset` as the result of `PMmultiExprs (2, shapeexpr, WITHOFFSET_SHAPEEXPR (oinfo)))`. Here `WITHOFFSET_SHAPEEXPR (oinfo)` is NULL. What `PMmultiExprs` does is create a `stack = NULL`, then pushes `shapeexpr` on that stack, and then `WITHOFFSET_SHAPEEXPR (oinfo) = NULL`. That does not seem correct.