|
|
If you are unfamiliar with the structure of the sac2c project it is a good idea to read about the [project structure](./project-structure.md) first.
|
|
|
|
|
|
The XML file refers to `src/libsac2c/xml/ast.xml`file.
|
|
|
This XML file holds the description of important things in the compiler like the format of the [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree) (AST) and the description of traversals on the AST alom other things.
|
|
|
|
|
|
The build system uses `ast.xml` along with [XSL](https://en.wikipedia.org/wiki/XSL) to generate a bunch of code.
|
|
|
This generated code includes for instance all the structs for the AST and also access macros to access fields of the AST nodes.
|
|
|
|
|
|
If you want to change the AST you have to change this file.
|
|
|
|
|
|
Specifically `ast.xml` has the following structure:
|
|
|
|
|
|
```
|
|
|
ast.xml
|
|
|
└── definition
|
|
|
├── attributetypes
|
|
|
├── traversals
|
|
|
├── nodesets
|
|
|
└── syntaxtree
|
|
|
```
|
|
|
|
|
|
> [!Info] **XML Terminology**
|
|
|
> Something with these <> brackets around it is a tag.
|
|
|
> So `<syntaxtree>` is a syntax tree tag.
|
|
|
> This is an opening tag. Each tag must also be closed.
|
|
|
> Closing tags look like this: `</syntaxtree>` with `</` instead of `<`.
|
|
|
> The idea is that you can nest tags to create nested trees of data like this `<syntaxtree><node></node></syntaxtree>`
|
|
|
> This is a syntax tree tag with a node tag inside it. \
|
|
|
> If a tag has nothing inside it you can also close it right away using this shorthand: `<syntaxtree> <node/> </syntaxtree>`\
|
|
|
> \
|
|
|
> The opening tag can also have attributes like this: `<syntaxtree version="1.0"></syntaxtree>` this is a syntax tree tag with a version atribute.
|
|
|
|
|
|
# Syntax Tree
|
|
|
|
|
|
The `synaxtree` field of the XML file holds the description of all the possible nodes of the AST and their corresponding properties.
|
|
|
Each AST node has a description, sons, attributes and flags.
|
|
|
|
|
|
## Description
|
|
|
|
|
|
The description is simply an explanation about the AST node.
|
|
|
There is never too much description.
|
|
|
|
|
|
## Sons
|
|
|
|
|
|
The sons are all the children AST nodes of the current node.
|
|
|
For instance a `Module` node (the root of the AST) has many sons like for instance `Funs` which holds the function definitions in the module.
|
|
|
The `Fundef` node has a body son which has a block son which has an assigns son and so on.
|
|
|
|
|
|
The `name` attribute of a `<son>` tag does not decide the type of the son.
|
|
|
The type is decided using done using the `<node>` or `<set>` tag inside the `son.targets.target` tags like so:
|
|
|
|
|
|
```xml
|
|
|
<son><targets><target>
|
|
|
<node>
|
|
|
</target></targets>
|
|
|
</son>
|
|
|
```
|
|
|
|
|
|
The `<target>` tag has `mandatory` attribute where you can specify if the target is mandatory.
|
|
|
|
|
|
A `<node>` tag refers to a son that can only be a single type of node.
|
|
|
A `<set>` tag refers to a son that can be one of a set of nodes.
|
|
|
A good example of a node set is `Expr:nodeset`.
|
|
|
This node set holds all the possible expressions.
|
|
|
So if a son target is `<set name="Expr">` you know that it can hold all the expression nodes.
|
|
|
`<set>` is a sum type implemented as a tagged union.
|
|
|
All the node sets are defined under the `nodeset` tag.
|
|
|
There should also be a [phases](#Phases) definition.
|
|
|
|
|
|
Here is an example son definition of expressions on a `let` node:
|
|
|
|
|
|
```xml
|
|
|
<son name="Expr">
|
|
|
<targets>
|
|
|
<target mandatory="yes">
|
|
|
<set name="Expr" />
|
|
|
<phases> <all /> </phases>
|
|
|
</target>
|
|
|
</targets>
|
|
|
</son>
|
|
|
```
|
|
|
|
|
|
### Next
|
|
|
|
|
|
Many nodes have son called `Next`.
|
|
|
This is a pointer to the next son of the type of the current node.
|
|
|
For instance each `fundef` has a `next` son which holds the next fundef in the module or it is `NULL` if there are no more fundefs.
|
|
|
Using this Next son on nodes a linked list structure is created in parts of the tree.
|
|
|
|
|
|

|
|
|
### Accessor macros
|
|
|
|
|
|
You can get the type of an AST node using the `NODE_TYPE()` macro from `src/libsac2c/tree`.
|
|
|
You will often see code like this in traversals:
|
|
|
|
|
|
```c
|
|
|
bool is_fold = (NODE_TYPE(withop) == N_fold)
|
|
|
```
|
|
|
|
|
|
or something like this directly in an if statement condition.
|
|
|
|
|
|
If you want to access an attribute you use the `<NODENAME>_<SONNAME>` macro.
|
|
|
|
|
|
> [!info]
|
|
|
> The header files for these accessor macros are only generated after you compile the project once.
|
|
|
> For this reason it is a good idea to compile the project once if you want your language server if you have it.
|
|
|
|
|
|
## Attributes
|
|
|
|
|
|
Besides sons which are basically other nodes, each node also has `<attribute>` tags.
|
|
|
An attribute specifies a typed key value pair on the node.
|
|
|
Attributes store extra data on the node or pointers to other nodes in the tree which are not direct sons.
|
|
|
|
|
|
Attributes should have a description, a type and a target.
|
|
|
The attribute types (valid values for the name attribute of the `<type>` tag inside the `<attribute>` tag) are defined under the `<atributetypes>` tag.
|
|
|
|
|
|
If the type is a pointer to another node it is important that the target matches correctly to the node that is expected by the type.
|
|
|
If the target is not another node then the target is usually defined as `<any>` inside the `<target>` tag.
|
|
|
In that case the type is based on the definition in `<atributetypes>`.
|
|
|
|
|
|
The target tag should also have a [phages tag](#Phases)
|
|
|
|
|
|
Here is an example attribute definition:
|
|
|
|
|
|
```xml
|
|
|
<attribute name="MatchingSpawnSync" inconstructor="no">
|
|
|
<description>
|
|
|
If the node contains a spawned ap or a sync statement, this
|
|
|
will point to the matching spawn or sync statement.
|
|
|
</description>
|
|
|
<type name="Link">
|
|
|
<targets>
|
|
|
<target mandatory="no">
|
|
|
<node name="Let" />
|
|
|
<phases>
|
|
|
<!-- Read on for more info about phases -->
|
|
|
<phase from="fp_syn" to="final" />
|
|
|
</phases>
|
|
|
</target>
|
|
|
</targets>
|
|
|
</type>
|
|
|
</attribute>
|
|
|
```
|
|
|
|
|
|
If you want to access an attribute you use the `<NODENAME>_<ATTRIBUTENAME>` macro.
|
|
|
|
|
|
## Flags
|
|
|
|
|
|
Flags are boolean key values on nodes.
|
|
|
The type of a flag value is always true or false.
|
|
|
Flags are meant to be simple.
|
|
|
Flags should have a default value, a description and a [phase definition](#Phases).
|
|
|
|
|
|
Here is an example of a flag:
|
|
|
|
|
|
```xml
|
|
|
<flag name="DistMemHasSideEffects" default="FALSE">
|
|
|
<description>This function application has side effects</description>
|
|
|
<phases> <range from="dm_dmisefa" to="final" /> </phases>
|
|
|
</flag>
|
|
|
```
|
|
|
|
|
|
The accessor macro for this is defined using the `<nodename>_<flagname>`.
|
|
|
|
|
|
## Phases
|
|
|
|
|
|
For each son, attribute and phase of a node must also specify in which phases it is expected to be available.
|
|
|
The phase names are specified under the traversals tag.
|
|
|
This the `<phases>` tag should be placed inside the `<target>` tag alongside the `<set>` or `<node>` for sons or the `<all>` tag for attributes.
|
|
|
|
|
|
```xml
|
|
|
<phases>
|
|
|
<!-- specify here which phase it is available -->
|
|
|
</phases>
|
|
|
```
|
|
|
|
|
|
You can specify the phase in 4 different ways:
|
|
|
|
|
|
- `<all>` The son or attribute is available in every phase.
|
|
|
- `<unknown>` It is unknown when this attribute or son is available.
|
|
|
- `<range from="foo" to="bar">` The son or attribute is available from phase foo to phase bar. You may have multiple `<range>` tags inside a single `phases` tag.
|
|
|
- `<phase name="foo">` The attribute or son is available in the foo phase. You may have multiple `<phase>` tags inside a single a `phases` tag.
|
|
|
|
|
|
The phases might not always be up-to-date.
|
|
|
Make sure to fix outdated data when encountered.
|
|
|
|
|
|
## Constructor functions
|
|
|
|
|
|
The code build system generates a constructor function from all these fields.
|
|
|
A constructor function allocates the corresponding struct node into memory properly for you.
|
|
|
Each son or attribute can be an argument to the generated constructor function or not.
|
|
|
You can mark a son or attribute as being required in the constructor using the `inconstructor` attribute.
|
|
|
The values for this tag are `yes` or `no`.
|
|
|
If something is in the constructor you know that the value is never `NULL` because it is part of the constructor.
|
|
|
Anything that is not marked as `inconstructor` can be `NULL`
|
|
|
|
|
|
# Node sets
|
|
|
|
|
|
In node sets you can define AST nodes that are sets/collections of other AST nodes.
|
|
|
For instance the `<nodeset name="Stmt">` node set defines all the individual AST nodes that are statements.
|
|
|
Using node sets you can refer to a group of nodes using one name.
|
|
|
This is useful to express that a certain node in the AST can be any of the nodes of the node set.
|
|
|
For instance this `Stmt` node set is useful to define as the son of the body node.
|
|
|
Node sets are simple to define.
|
|
|
Just add something like the following xml to the `<nodesets>` tag.
|
|
|
|
|
|
```xml
|
|
|
<nodeset name="Stmt">
|
|
|
<target>
|
|
|
<node name="Let" />
|
|
|
<node name="Cond" />
|
|
|
<node name="Return" />
|
|
|
<node name="Do" />
|
|
|
<node name="Annotate" />
|
|
|
<node name="While" />
|
|
|
<node name="Icm" />
|
|
|
<node name="Cudast" />
|
|
|
</target>
|
|
|
</nodeset>
|
|
|
```
|
|
|
|
|
|
Just copy the other ones that are already there if you want to add a new one.
|
|
|
Do not forget to add any new nodes you make to proper node sets.
|
|
|
|
|
|
# Attribute Types
|
|
|
|
|
|
In the `<atributetypes>` tag all the valid types for attributes are specified.
|
|
|
The mean purpose of specifying this is code generation.
|
|
|
|
|
|
The idea is that each attribute types follows a certain interface.
|
|
|
If this interface is followed the all the tree utilities mainly specified in `src/libsac2c/tree` will work correctly.
|
|
|
|
|
|
You specify attribute types using `<type>`tag inside the `<atributetypes>` tag.
|
|
|
Each `<type>` tag has the following attributes:
|
|
|
|
|
|
- name: The name of the type
|
|
|
- ctype: The type of the attribute in c code
|
|
|
- init: How to initialize the value, can be any expression
|
|
|
- copy: How to copy the attribute. Can be:
|
|
|
- `literal` if the `ctype` is a c literal.
|
|
|
- `function` if there is a special function to copy the attribute somewhere. It is unclear however how to say what the name of this function is or where it is defined.
|
|
|
- `hash` Also unclear but I assume something with the hash of the memory somehow
|
|
|
|
|
|
Optional attributes include:
|
|
|
|
|
|
- vtype: Another c type but also unclear what this does.
|
|
|
- persist: Should be either yes or no, but I don't know what this does.
|
|
|
|
|
|
# Traversals
|
|
|
|
|
|
Traversals are edits the compiler performs on the AST.
|
|
|
The compiler makes these edits by walking nodes in the AST using the visitor pattern.
|
|
|
Almost all the phases in the compiler besides the parsing are implemented as traversals.
|
|
|
|
|
|
Defining a traversal in the XML file is simple.
|
|
|
Here is an example:
|
|
|
|
|
|
```xml
|
|
|
<traversal id="FFC" name="Filter fundef constraints" default="sons" include="filter_fundef_conditions.h">
|
|
|
<travuser>
|
|
|
<node name="Fundef" />
|
|
|
<node name="Ret" />
|
|
|
<node name="Arg" />
|
|
|
<node name="SPAp" />
|
|
|
<node name="SPId" />
|
|
|
</travuser>
|
|
|
</traversal>
|
|
|
```
|
|
|
|
|
|
A traversal is defined using the `<traversal>` tag.
|
|
|
Inside this tag you put a `<travuser>` tag and inside that you put a `<node>` tag for each node type that you want to visit in your traversal.
|
|
|
|
|
|
The `<traversal>` tag has the following attributes:
|
|
|
- `id` the short name to refer to the traversal. This is used in code generation. **The id should be all uppercase**
|
|
|
- `name` the name of traversal meant for humans. The id should be some sort of acronym from this name.
|
|
|
- `include` The header file which defines the functions for the traversal.
|
|
|
- `default` The default way of traversing nodes that are not specified inside `<travuser>`. I think that this is always just "sons" meaning that we always recursively traverse all the sons of the current node by default.
|
|
|
|
|
|
See [adding a traversal](./adding-a-traversal.md) for a full tutorial on adding a traversal to the compiler and what you should put inside the header file specified in the `include` attribute of the `<traversal>` tag. |