|
|
This is a proposal to annotate **function calls** as safe.
|
|
|
In a safe function call the pre- and post-condition checks of the called function MUST be skipped.
|
|
|
|
|
|
The programmer can annotate function calls as safe by putting a `#pragma safe` behind a function call expression.
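For example, a minimal sketch with a hypothetical function `foo`; only the annotated call skips its checks:

```c
y = foo(x) #pragma safe;   // checks of foo are skipped for this call
z = foo(x);                // checks of foo still run here
```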
|
|
|
As a convenience the programmer can also annotate entire statement blocks of a function definition with `#pragma safe` to indicate that all function calls in the body are safe.
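For example, a minimal sketch (the functions `foo1` and `foo2` are hypothetical):

```c
int bar(int a) | isprime(a) {
    r1 = foo1(a);   // treated as safe: checks skipped
    r2 = foo2(a);   // treated as safe: checks skipped
    return r1 + r2;
} #pragma safe
```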
|
|
|
|
|
|
As another convenience the programmer can also annotate an entire file as safe by putting `#pragma safe` on top of the file. This will flag every function definition body as safe which will in turn flag every function call as safe.
|
|
|
## Small Example
|
|
|
|
|
|
First we show the current situation.
|
|
|
Take the following code, which is NOT annotated with `#pragma safe`.
|
|
|
|
|
|
```c
|
|
|
/* checks primality of inputs */
|
|
|
extern bool[dim:shp] isprime(int[dim:shp] a);
|
|
|
|
|
|
int foo(int x) | x > 0 { return x; }
|
|
|
|
|
|
int[2] bar (int a) | isprime(a) {
|
|
|
r = foo(a);
|
|
|
return {r,a};
|
|
|
}
|
|
|
|
|
|
int main() {
|
|
|
result = bar(11);
|
|
|
return 0;
|
|
|
}
|
|
|
```
|
|
|
|
|
|
This would normally (without any `#pragma safe`) compile into:
|
|
|
```c
|
|
|
int _foo_impl (int x) {return x;}
|
|
|
|
|
|
int[2] _bar_impl (int a) {
|
|
|
r = foo(a);
|
|
|
return {r,a};
|
|
|
}
|
|
|
|
|
|
inline
|
|
|
int[2] bar (int a) {
|
|
|
p = isprime(a);
|
|
|
error::abort(p, "Error: arg a does not satisfy check isprime(a)");
|
|
|
return _bar_impl(a);
|
|
|
}
|
|
|
|
|
|
inline
|
|
|
int foo (int x) {
|
|
|
p = x > 0;
|
|
|
error::abort(p, "Error: arg x does not satisfy check x > 0");
|
|
|
return _foo_impl(x);
|
|
|
}
|
|
|
|
|
|
int main() {
|
|
|
result = bar(11);
|
|
|
return 0;
|
|
|
}
|
|
|
```
|
|
|
|
|
|
What happens if you add `#pragma safe`?
|
|
|
We know for instance that `11` is in fact prime.
|
|
|
|
|
|
```c
|
|
|
/* checks primality of inputs */
|
|
|
extern bool[dim:shp] isprime(int[dim:shp] a);
|
|
|
|
|
|
int foo(int x) | x > 0 { return x; }
|
|
|
|
|
|
int[2] bar(int a) | isprime(a) { // All function calls in bar are safe
|
|
|
r = foo(a);
|
|
|
return {r,a};
|
|
|
} #pragma safe
|
|
|
|
|
|
int main() {
|
|
|
// This call to bar is safe,
|
|
|
// isprime(11) is not checked: we know 11 is prime!
|
|
|
result = bar(11) #pragma safe;
|
|
|
return 0;
|
|
|
}
|
|
|
```
|
|
|
|
|
|
This would conceptually compile to something like (at the end of phase 2):
|
|
|
|
|
|
```c
|
|
|
int[2] _bar_impl (int a) {
|
|
|
r = _foo_impl(a);
|
|
|
return {r,a};
|
|
|
}
|
|
|
|
|
|
int _foo_impl(int a) {return a;}
|
|
|
|
|
|
inline
|
|
|
int foo(int a) {
|
|
|
p = a > 0;
|
|
|
error::abort(p, "Error: arg a does not satisfy check a > 0");
|
|
|
return _foo_impl(a);
|
|
|
}
|
|
|
|
|
|
|
|
|
inline
|
|
|
int[2] bar (int a) {
|
|
|
p = isprime(a);
|
|
|
error::abort(p, "Error: arg a does not satisfy check isprime(a)");
|
|
|
return _bar_impl(a);
|
|
|
}
|
|
|
|
|
|
int main() {
|
|
|
result = _bar_impl(11);
|
|
|
|
|
|
return 0;
|
|
|
}
|
|
|
```
|
|
|
|
|
|
### What happens here?
|
|
|
|
|
|
The compiler renames the original function `<fname>` authored by the programmer (in the example it is called `bar`) to `_<fname>_impl`.
|
|
|
Then the compiler creates a new function called `<fname>`, which is a call to `_<fname>_impl` decorated with the checks generated from the [type patterns](type_patterns.md).
|
|
|
This all happens in the `rtpf` step.
|
|
|
In other words the compiler replaces the `<fname>` function with a decorated version that does the checks.
|
|
|
|
|
|
In a traversal after the type pattern pipeline, the compiler replaces the calls to `<fname>` that have been annotated as safe with `#pragma safe` by calls to `_<fname>_impl`, in order to skip the checks. To avoid errors in the future this renaming is done via a function exported from the type pattern code called `GTPgetImplFundef`.
|
|
|
|
|
|
This way we have a version of each function (here `bar`) both with and without the checks.
|
|
|
The compiler can skip the checks of a function call by replacing the call with a call to the `_<fname>_impl` version of the function.
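In sketch form, using the `bar` call from the small example above:

```c
result = bar(11) #pragma safe;   // as written by the programmer
result = _bar_impl(11);          // after the renaming traversal: checks skipped
```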
|
|
|
|
|
|
In the example above the `bar` and `foo` functions are actually dead code!
|
|
|
### How do we deal with overloading?
|
|
|
|
|
|
The idea is that we rename all the overloaded functions written by the programmer to the same `_<fname>_impl` name, which preserves the overloading.
|
|
|
|
|
|
In other words all functions in the same overload set should all be renamed to the same `_<fname>_impl` name. This way you keep the original overloading intact.
|
|
|
|
|
|
For each version of any function (here `bar`) we generate a `_bar_impl` function.
|
|
|
The original `bar` function is renamed to `_bar_impl`.
|
|
|
|
|
|
This way the [overloading](sac2c.wiki/concepts/overloading.md) should later be handled properly.
|
|
|
|
|
|
In this current example there is only one version of `bar`.
|
|
|
Hopefully the larger example below will show overloading better.
|
|
|
|
|
|
Importantly, we always generate the `_impl` functions.
|
|
|
They are also always exported so that other modules can make direct calls to function implementations.
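A hedged sketch of what this enables across modules (the module name `Primes`, the function `check`, and the exact export/import lines are made up for illustration):

```c
module Primes;
export all;

int check(int a) | isprime(a) { return a; }
```

```c
use Primes: all;

int main() {
    // A call annotated as safe dispatches to the exported _check_impl
    // directly, so the isprime check from the type pattern is skipped.
    r = check(13) #pragma safe;
    return 0;
}
```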
|
|
|
|
|
|
## Larger Example
|
|
|
|
|
|
In this example `foo` is overloaded, and it is called both in a safe and in a non-safe way.

Let's see what happens.
|
|
|
|
|
|
```c
|
|
|
/* checks primality of inputs */
|
|
|
extern bool[dim:shp] isprime (int[dim:shp] a);
|
|
|
|
|
|
// Imagine the foo versions do something interesting with a
|
|
|
int foo (int a) | isprime(a) { return a; }
|
|
|
int[dim:shp] foo(int[dim:shp] a) | all(isprime(a)) { return a; }
|
|
|
|
|
|
int[2] bar (int a) | isprime(a) {
|
|
|
b = foo(a) #pragma safe;
|
|
|
return foo([a,b]);
|
|
|
}
|
|
|
|
|
|
int main () {
|
|
|
result = bar(11) #pragma safe;
|
|
|
print(result);
|
|
|
return 0;
|
|
|
}
|
|
|
```
|
|
|
|
|
|
In the code above the programmer (and maybe the compiler at some point 🤞) knows that the checks can be skipped in some cases.

Because 11 is, in fact, a prime number.
|
|
|
|
|
|
Additionally, if we have already checked `isprime` on the input of `bar` we can consider at least the first call to `foo` inside `bar` as safe because we know `a` should be prime.
|
|
|
In these cases you can mark a function call as safe.
|
|
|
|
|
|
The larger example code would compile (after the `pre` step) to:
|
|
|
|
|
|
```c
|
|
|
extern bool[dim:shp] isprime(int[dim:shp] a);
|
|
|
|
|
|
|
|
|
int _foo_impl(int a) { // No longer _foo_impl1: both overloads share the same name
|
|
|
return a;
|
|
|
}
|
|
|
int[dim:shp] _foo_impl(int[dim:shp] a) { // No longer _foo_impl2: both overloads share the same name
|
|
|
return a;
|
|
|
}
|
|
|
|
|
|
int foo(int a) {
|
|
|
p = isprime(a);
|
|
|
error::abort(p, "Error: a does not satisfy isprime(a)");
|
|
|
return _foo_impl(a);
|
|
|
}
|
|
|
int[dim:shp] foo(int[dim:shp] a) {
|
|
|
p = all(isprime(a));
|
|
|
error::abort(p, "Error: a does not satisfy all(isprime(a))");
|
|
|
return _foo_impl(a);
|
|
|
}
|
|
|
|
|
|
int[2] _bar_impl(int a) {
|
|
|
b = _foo_impl(a);
|
|
|
return foo([a,b]);
|
|
|
}
|
|
|
|
|
|
int[2] bar(int a) {
|
|
|
p = isprime(a);
|
|
|
error::abort(p, "Error: a does not satisfy isprime(a)");
|
|
|
return _bar_impl(a);
|
|
|
}
|
|
|
|
|
|
int main() {
|
|
|
result = _bar_impl(11);
|
|
|
print(result);
|
|
|
return 0;
|
|
|
}
|
|
|
```
|
|
|
|
|
|
For each function we generate a checked version that replaces the original function and calls the original version.

The original versions are kept as the `_<fname>_impl` functions.
|
|
|
|
|
|
The `_<fname>_impl` versions are overloaded.
|
|
|
Later during type checking and overload resolution this should all work out.
|
|
|
|
|
|
# Safe blocks definition shortcut
|
|
|
|
|
|
If we do not want the compiler to check **any** function call in a function definition, the programmer can mark all the function calls in that definition as safe.

The programmer can do this by annotating the body of the function with `#pragma safe`.
|
|
|
|
|
|
See the example below:
|
|
|
|
|
|
```c
|
|
|
extern bool isprime(int a); // Expensive check
|
|
|
|
|
|
// Imagine all these do something useful with a
|
|
|
int foo1(int a) | isprime(a) {return a;}
|
|
|
int foo2(int a) | isprime(a) {return a;}
|
|
|
int foo3(int a) | isprime(a) {return a;}
|
|
|
|
|
|
int bar(int a) | isprime(a) {
|
|
|
r0 = foo1(a);
|
|
|
r1 = foo1(a);
|
|
|
r2 = foo2(a);
|
|
|
r3 = foo3(a);
|
|
|
return r0 + r1 + r2 + r3;
|
|
|
} #pragma safe
|
|
|
```
|
|
|
|
|
|
In this example we already check `isprime` as a precondition of `bar` so we, as the programmer (and hopefully the compiler some day), know we do not have to check it again for `foo1`, `foo2` and `foo3`.
|
|
|
|
|
|
We can indicate this to the compiler by making the entire `bar` function body safe.
|
|
|
|
|
|
Marking the `bar` function as safe (like above) is syntactic sugar for:
|
|
|
|
|
|
```c
|
|
|
int bar(int a) | isprime(a) {
|
|
|
r0 = foo1(a) #pragma safe;
|
|
|
r1 = foo1(a) #pragma safe;
|
|
|
r2 = foo2(a) #pragma safe;
|
|
|
r3 = foo3(a) #pragma safe;
|
|
|
return r0 + r1 + r2 + r3;
|
|
|
}
|
|
|
```
|
|
|
|
|
|
|
|
|
You can in fact mark any code block as safe by appending a `#pragma safe`:
|
|
|
|
|
|
```c
|
|
|
int bar(int a) | isprime(a) {
|
|
|
r = foo1(a) #pragma safe;
|
|
|
i = 10;
|
|
|
while (i-- > 0) {
|
|
|
r1 = foo1(a);
|
|
|
r2 = foo2(a);
|
|
|
r3 = foo3(a);
|
|
|
r += r1 + r2 + r3;
|
|
|
} #pragma safe
|
|
|
|
|
|
return foo(r);
|
|
|
}
|
|
|
|
|
|
/**
|
|
|
* Would Compile Into:
|
|
|
**/
|
|
|
int bar(int a) | isprime(a) {
|
|
|
r = foo1(a) #pragma safe;
|
|
|
i = 10;
|
|
|
while (i-- > 0) {
|
|
|
r1 = foo1(a) #pragma safe;
|
|
|
r2 = foo2(a) #pragma safe;
|
|
|
r3 = foo3(a) #pragma safe;
|
|
|
r = +(r, +(r1, +(r2, r3) #pragma safe) #pragma safe) #pragma safe;
|
|
|
}
|
|
|
|
|
|
return foo(r); // Not marked safe
|
|
|
}
|
|
|
|
|
|
```
|
|
|
|
|
|
> **Note** that the last foo call at the return statement is not marked as safe because the function definition block itself was not marked as safe.
|
|
|
|
|
|
The `#pragma safe` propagates safety automatically to sub-blocks.
|
|
|
In other words if a block is annotated as safe any nested block is also safe.
|
|
|
|
|
|
Take as example the following code.
|
|
|
|
|
|
```c
|
|
|
int bar(int a) | isprime(a) {
|
|
|
while (true) {
|
|
|
foo(a);
|
|
|
if (true) { foo(a); }
|
|
|
} #pragma safe
|
|
|
return foo(a);
|
|
|
}
|
|
|
|
|
|
/**
|
|
|
* Would Compile Into:
|
|
|
**/
|
|
|
int bar(int a) | isprime(a) {
|
|
|
while (true) {
|
|
|
foo(a) #pragma safe;
|
|
|
if (true) { foo(a) #pragma safe; }
|
|
|
}
|
|
|
return foo(a);
|
|
|
}
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
# Safe modules shortcut
|
|
|
|
|
|
The programmer can also mark an entire file as safe.
|
|
|
This will apply `#pragma safe` to every function body block in the file.
|
|
|
In essence this is a shortcut for the previous block shortcut. A shortcut shortcut.
|
|
|
|
|
|
> The pragma goes even before the `module` keyword.
|
|
|
|
|
|
## Example
|
|
|
|
|
|
```c
|
|
|
#pragma safe;
|
|
|
module Bar;
|
|
|
|
|
|
int[n] id(int [n] x) {
|
|
|
return x;
|
|
|
}
|
|
|
|
|
|
int[n] bar(int [n] x) {
|
|
|
return id(x);
|
|
|
}
|
|
|
```
|
|
|
|
|
|
> The semicolon behind `#pragma safe` is optional
|
|
|
|
|
|
There is an extra traversal that converts the above code to:
|
|
|
|
|
|
```c
|
|
|
#pragma safe;
|
|
|
module Bar;
|
|
|
|
|
|
int[n] id(int [n] x) {
|
|
|
return x;
|
|
|
} #pragma safe
|
|
|
|
|
|
int[n] bar(int [n] x) {
|
|
|
return id(x);
|
|
|
} #pragma safe
|
|
|
```
|
|
|
|
|
|
Indeed, by putting `#pragma safe` at the top of the file every function block becomes safe.
|
|
|
This then later makes every function call in the file safe.
|
|
|
|
|
|
> [!info]
|
|
|
> Just for clarity: if one imports this module, the pre- and post-conditions of the imported functions themselves of course still run.
> For instance, if an importer calls `id` from another module, the post-condition which checks that the input shape equals the output shape still runs.
> **You can only remove checks from function applications, not from the function definitions**.
|
|
|
|
|
|
# The syntax motivation
|
|
|
|
|
|
Initially there was discussion about introducing new syntax to SaC in order to flag a body or a function application (ap) as safe.
|
|
|
We started with a `#pragma safe` which turned into a `safe` keyword that quickly moved to the `@` operator.
|
|
|
|
|
|
In the end we decided on `#pragma safe` at the end of blocks or function calls or at the start of an entire module.
|
|
|
This was chosen because the other syntax suggestions had downsides that the `#pragma` does not have. Writing `#pragma` is familiar to C programmers, and it doesn't add any real new syntax.
|
|
|
|
|
|
Pragmas are a good idea because they allow adding features that the programmer can opt into when they want. Later the compiler might be able to turn on the feature as well.
|
|
|
|
|
|
## Alternative syntax suggestions
|
|
|
|
|
|
This section goes over the other syntax suggestions that were considered and the motivation for not going with them.
|
|
|
|
|
|
### @ in front of function call
|
|
|
|
|
|
- Adding operators without names is not very readable. They are fun to write, but who knows what they do. We might add `@.` later, why not! It is a slippery slope, and we don't want to go down it.
|
|
|
- You can actually currently define functions called `@` which is very confusing. See this example:
|
|
|
|
|
|
```c
|
|
|
int @( int a, int b)
|
|
|
{
|
|
|
return _add_SxS_( a, b);
|
|
|
}
|
|
|
|
|
|
int @( int a, int b, int c)
|
|
|
{
|
|
|
return _add_SxS_( a, b);
|
|
|
}
|
|
|
|
|
|
int @( int a)
|
|
|
{
|
|
|
return _add_SxS_(1, a);
|
|
|
}
|
|
|
|
|
|
int main ()
|
|
|
{
|
|
|
res = @ @(1,2, 3 @ with {
|
|
|
([0] <= [i] < [200]): i;
|
|
|
} : fold ( @, 0));
|
|
|
|
|
|
return res;
|
|
|
}
|
|
|
```
|
|
|
- It looks weird to programmers coming from C.
|
|
|
|
|
|
Why was the `@` symbol considered a good idea?
|
|
|
|
|
|
- It has no use yet in SaC
|
|
|
- It is nice and short which is good if we want programmers to actually use it
|
|
|
- Many symbols on the QWERTY keyboard already have semantic meaning to programmers, but this is not the case for `@`. For instance, `%` is often already modulo.
|
|
|
- Other real programming contexts where `@` can be found are:
|
|
|
- Decorators in Python and TypeScript. This is fitting because decorating functions is actually what we are doing. **Using `@` you're actually calling the undecorated version of a function**.
|
|
|
- The matrix multiplication operator in Python. Not an issue.
|
|
|
- In Java it is used to provide meta information to the compiler or runtime. Also good.
|
|
|
- `@` is pronounced "at", which kind of sounds like "ap", which is reminiscent of application.
|
|
|
|
|
|
### Putting @ in front of the function definition name
|
|
|
|
|
|
When discussing the syntax we discovered that it would be **bad** to have syntax like this:
|
|
|
|
|
|
```
|
|
|
int @bar(int a) | isprime(a) {}
|
|
|
```
|
|
|
|
|
|
Because then, when you annotate a `bar` call as safe, it looks as if you are calling a function named `@bar`, and **this is bad.**
|
|
|
|
|
|
By placing the `@` in front of the body you make it explicit that the `@` corresponds to the body of the function and has nothing to do with the name.
|
|
|
|
|
|
### Putting @ behind the { in a function definition
|
|
|
|
|
|
You could also have
|
|
|
|
|
|
```
|
|
|
int bar(int a) | isprime(a) {@
|
|
|
...
|
|
|
}
|
|
|
```
|
|
|
|
|
|
This is less good. With the `@` in front of the `{` you can still see the `@` as a unary operator that takes the entire function body block as input instead of a function call.

You lose this intuition if you put it behind the `{`.
|
|
|
|
|
|
Also, having the `@` behind the `{` might be a bit harder to parse, as it becomes something like a first statement you have to parse instead of optional syntax at the start of a function definition.
|
|
|
|
|
|
One potential argument for putting it behind the `{` is to avoid the `@` looking like it is part of the type pattern. However, it doesn't read as part of the type pattern, because there is no leading `@` and the `@` symbol has no real pre-existing meaning in most programmers' minds.
|
|
|
|
|
|
```
|
|
|
int bar(int a) | a > 2, isprime(a), a != 40 @ {
|
|
|
...
|
|
|
}
|
|
|
```
|
|
|
|
|
|
But this looks fine. Having the `@` behind the `{` looks worse.
|
|
|
|
|
|
### Safe keyword
|
|
|
|
|
|
Another suggestion was to introduce a `safe` keyword.
|
|
|
|
|
|
```
|
|
|
int bar(int a) | a > 2, isprime(a), a != 40 safe {}
|
|
|
```
|
|
|
|
|
|
or
|
|
|
|
|
|
```
|
|
|
int bar(int a) | a > 2, isprime(a), a != 40 {
|
|
|
a = safe foo(a);
|
|
|
b = safe foo(a);
|
|
|
c = foo(a-1);
|
|
|
return a + 2;
|
|
|
}
|
|
|
```
|
|
|
|
|
|
While this is more explicit and clear, it was considered too verbose if we want programmers to actually use this feature.
|
|
|
|
|
|
### Putting the @ in front of the type pattern
|
|
|
|
|
|
```
|
|
|
int bar(int a) @ | a > 2, isprime(a), a != 40 {
|
|
|
...
|
|
|
}
|
|
|
```
|
|
|
|
|
|
Another option. This is just valid syntax and is implemented.
|
|
|
Here it kind of looks like the `@` is acting on the type patterns instead of the body.
|
|
|
|
|
|
To the programmer this would read like a stronger type pattern: if these type patterns hold, we can skip all the checks done by the function applications inside the function.
|
|
|
I think I like this less than putting it before the body.
|
|
|
|
|
|
### Double |
|
|
|
|
|
|
With this we also mark safety before the type pattern, but just with another `|`.
|
|
|
|
|
|
```
|
|
|
int bar(int a) || a > 2, isprime(a), a != 40 {
|
|
|
...
|
|
|
}
|
|
|
```
|
|
|
|
|
|
I like this a lot, but it sadly won't work well with annotating function calls.
|
|
|
It won't work well because | is already an operator in the standard library.
|
|
|
|
|
|
```
|
|
|
int bar(int a) || a > 2, isprime(a), a != 40 {
|
|
|
return 2 | foo();
|
|
|
}
|
|
|
```
|
|
|
|
|
|
This would be ambiguous, unless we say that `|` in front of an ap always makes the call safe, and that something like the following means actually using the `|` operator on the return value of `foo`:
|
|
|
|
|
|
```
|
|
|
int bar(int a) || a > 2, isprime(a), a != 40 {
|
|
|
return 2 | (foo());
|
|
|
}
|
|
|
```
|
|
|
|
|
|
Another possible option would be suffix:
|
|
|
```
|
|
|
int bar(int a) || a > 2, isprime(a), a != 40 {
|
|
|
return 2 + foo()|;
|
|
|
}
|
|
|
```
|
|
|
|
|
|
This is kind of nice because it looks like you're setting the checks to null. Woah, you could maybe even have inline checks!
|
|
|
|
|
|
```
|
|
|
int bar(int a) || a > 2, isprime(a), a != 40 {
|
|
|
return 2 + foo(a) | a > 5;
|
|
|
}
|
|
|
```
|
|
|
|
|
|
# On inferring `#pragma safe`
|
|
|
|
|
|
How cool would it be if the compiler could infer `#pragma safe` on function calls?
|
|
|
|
|
|
If you have this example:
|
|
|
|
|
|
```c
|
|
|
int bar(int a) | a > 2, isprime(a) {return foo(a);}
|
|
|
int foo(int b) | isprime(b) {return b;}
|
|
|
```
|
|
|
|
|
|
Here the pre-checks performed on `a` contain all the checks that `foo` performs on `b`.
|
|
|
As long as `a` is not reassigned, we know that the predicates `a > 2` and `isprime(a)` hold for `a` at the call to `foo`. Thus, the call to `foo` could be inferred as safe.
|
|
|
|
|
|
This requires binding the predicate expressions to the variable assignments.
|
|
|
If the compiler supports this it should be able to infer `#pragma safe`.
|
|
|
With the example above, the compiler could look up the checks that `foo` does on its input and then look up the set of predicates that hold for the argument given. For instance, imagine a function called `preds` that takes a function name and the name of an input and returns the set of predicates which hold for that input variable. I think it's a good idea to replace the bound input variable itself with a placeholder symbol so that you can easily compare sets of predicates. I chose $I$ (iota) for this, to refer to identity.
|
|
|
|
|
|
In this case that would be $preds(bar, a) = \{I>2, isprime(I)\}$, and for `b` of `foo` you get $preds(foo, b) = \{isprime(I)\}$.
|
|
|
|
|
|
This made me wonder how we would deal with predicates that take more than one function input as argument. The best I could come up with is to put lambda functions in the set, which still need another argument. Meaning that with
|
|
|
|
|
|
```c
|
|
|
void lala(int x, int y) | x > y, x > 4 {}
|
|
|
```
|
|
|
The predicates of `x` would be $preds(lala, x) = \{\alpha \rightarrow I > \alpha,\ I > 4\}$
|
|
|
|
|
|
If the difference between the set of predicates that must hold to call the function (`foo` here) and the set of predicates already checked for the argument passed to that function is $\emptyset$, then the function call can be inferred as safe.
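In general terms (one way to write it, with $g$ the calling function, $v$ the argument it passes, and $p$ the corresponding formal parameter of the callee $f$):

$$preds(f, p) - preds(g, v) = \emptyset \;\Rightarrow\; \text{the call } f(v) \text{ can be inferred as safe}$$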
|
|
|
In this case:
|
|
|
|
|
|
$preds(foo, b) - preds(bar, a) = \{isprime(I)\} - \{I > 2, isprime(I)\}= \emptyset$
|
|
|
|
|
|
Because there are no other arguments of foo the call to foo could be inferred as safe here!
|
|
|
|
|
|
What makes this more challenging is overloading: if you want to infer safety of the `foo` call, but different versions of `foo` exist with different predicates, how do you know which ones to check? Additionally, more versions of `foo` might come in later, after the module system has completed, so this is also challenging. One simple solution might be to check whether `a` satisfies all pre- and post-conditions of all known versions of `foo`. This way the overloading does not matter, because whichever `foo` it is, they are all safe. But it is questionable how often safety could then be inferred. I think that for this to work it would need to happen after the dispatch.
|
|
|
|
|
|
You could take this one step further and infer, using symbolic reasoning, that if an input has been checked for `a > 10` this also means `a > 9`, and so on. This requires having some implication relations ($\Rightarrow$) between predicates when you take the difference of the sets of predicates.
|
|
|
|
|
|
In that case you could infer:
|
|
|
|
|
|
```c
|
|
|
int bar(int a) | a > 10 {return foo(a);}
|
|
|
int foo(int b) | b > 5 {return b;}
|
|
|
```
|
|
|
|
|
|
Which would lead to $preds(foo, b) - preds(bar, a) = \{I > 5\} - \{I>10\}= \emptyset$ and `#pragma safe` could be added to the `foo` application.
|
|
|
|
|
|
The reason that this difference evaluates to $\emptyset$ is that $I>10 \Rightarrow I>5$, and thus the $-$ operation removes $I>5$ from $preds(foo, b)$ because it is essentially present in the set on the right-hand side.
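One way to make this precise (my formulation, not something already in the implementation) is to read $-$ as set difference modulo implication:

$$P - Q \;=\; \{\, p \in P \mid \neg\exists\, q \in Q : q \Rightarrow p \,\}$$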
|
|
|
|
|
|
What would happen if we would instead call `foo` from `bar`?
|
|
|
|
|
|
```c
|
|
|
int bar(int a) | a > 10 {return a;}
|
|
|
int foo(int b) | b > 5 {return bar(b);}
|
|
|
```
|
|
|
|
|
|
Here to infer the call to `bar` in `foo` as safe you would get:
|
|
|
|
|
|
$$preds(bar, a) - preds(foo, b) = \{I>10\} - \{I>5\}= \{I>10\}$$
|
|
|
|
|
|
This would not go to $\emptyset$ because $I>5 \Rightarrow I>10$ does NOT hold.
|
|
|
This is an interesting case because we could even try to give warnings saying that `foo` might error if $5 < b \le 10$ occurs.
|
|
|
|
|
|
I think we can maybe define these $\Rightarrow$ relations based on the primitive functions which sit at the bottom of all other functions.
|
|
|
|
|
|
# Implementation ideas
|
|
|
|
|
|
## Parsing
|
|
|
|
|
|
After parsing the safe annotation on a function call (`#pragma safe` behind the call, or the `@` symbol in the earlier syntax proposals) we can add a flag to the corresponding `Ap` node.

This sets the safe flag on the `Ap` node to true.

If we encounter a safe annotation on a function body, the safe flag can be set to true on the `Body` node.

Then during parsing we can mark any `Ap` node we find in that body as safe.
|
|
|
|
|
|
## Traversal
|
|
|
|
|
|
In the `rtpf` phase every `impl` function generated for the same overload set must get the same `_<fname>_impl` name.
|
|
|
|
|
|
Then, in a new traversal right after `rtpf`, every function call that has the `safe` flag must be renamed to call the `_<fname>_impl` version.
|
|
|
|
|
|
# Progress
|
|
|
|
|
|
- [x] Add safe flag to siap node
|
|
|
- [x] Add safe flag to body node
|
|
|
- [x] Add `#pragma safe` parsing for funcalls nodes
|
|
|
- [x] Add `#pragma safe` parsing for block nodes
|
|
|
- [x] Add `#pragma safe` to module
|
|
|
- [x] Add phase to desugar the safe blocks
|
|
|
- [x] Add phase to desugar the safe module annotation
|
|
|
- [x] Fix naming using new exported function
|
|
|
- [x] Document the naming function from type patterns phase.
|
|
|
- [x] Add phase to convert safe functions to correct name
|
|
|
- [x] Deal with safe infix functions. You can't just name those `_impl`. How do type patterns deal with infix functions?