Skip to content
GitLab
  • Menu
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
  • sac2c sac2c
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 403
    • Issues 403
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 12
    • Merge requests 12
  • Deployments
    • Deployments
    • Releases
  • Wiki
    • Wiki
  • External wiki
    • External wiki
  • Activity
  • Graph
  • Create a new issue
  • Commits
  • Issue Boards
Collapse sidebar
  • sac-group
  • sac2csac2c
  • Wiki
  • Tutorials
  • ast.xml

ast.xml · Changes

Page history
added tutorials for adding a traversal tutorial and a tutorial about project... authored Feb 18, 2025 by Quinten Cabo's avatar Quinten Cabo
added tutorials for adding a traversal tutorial and a tutorial about project structure and ast.xml. Also added tutorials from other people I found in the source code and a concepts directory with phm and ref counting also added a faq file
Hide whitespace changes
Inline Side-by-side
tutorials/ast.xml.md 0 → 100644
View page @ 66bc47a8
If you are unfamiliar with the structure of the sac2c project it is a good idea to read about the [project structure](./project-structure.md) first.
The XML file refers to `src/libsac2c/xml/ast.xml`file.
This XML file holds the description of important things in the compiler like the format of the [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree) (AST) and the description of traversals on the AST alom other things.
The build system uses `ast.xml` along with [XSL](https://en.wikipedia.org/wiki/XSL) to generate a bunch of code.
This generated code includes for instance all the structs for the AST and also access macros to access fields of the AST nodes.
If you want to change the AST you have to change this file.
Specifically `ast.xml` has the following structure:
```
ast.xml
└── definition
├── attributetypes
├── traversals
├── nodesets
└── syntaxtree
```
> [!Info] **XML Terminology**
> Something with these <> brackets around it is a tag.
> So `<syntaxtree>` is a syntax tree tag.
> This is an opening tag. Each tag must also be closed.
> Closing tags look like this: `</syntaxtree>` with `</` instead of `<`.
> The idea is that you can nest tags to create nested trees of data like this `<syntaxtree><node></node></syntaxtree>`
> This is a syntax tree tag with a node tag inside it. \
> If a tag has nothing inside it you can also close it right away using this shorthand: `<syntaxtree> <node/> </syntaxtree>`\
> \
> The opening tag can also have attributes like this: `<syntaxtree version="1.0"></syntaxtree>` this is a syntax tree tag with a version atribute.
# Syntax Tree
The `synaxtree` field of the XML file holds the description of all the possible nodes of the AST and their corresponding properties.
Each AST node has a description, sons, attributes and flags.
## Description
The description is simply an explanation about the AST node.
There is never too much description.
## Sons
The sons are all the children AST nodes of the current node.
For instance a `Module` node (the root of the AST) has many sons like for instance `Funs` which holds the function definitions in the module.
The `Fundef` node has a body son which has a block son which has an assigns son and so on.
The `name` attribute of a `<son>` tag does not decide the type of the son.
The type is decided using done using the `<node>` or `<set>` tag inside the `son.targets.target` tags like so:
```xml
<son><targets><target>
<node>
</target></targets>
</son>
```
The `<target>` tag has `mandatory` attribute where you can specify if the target is mandatory.
A `<node>` tag refers to a son that can only be a single type of node.
A `<set>` tag refers to a son that can be one of a set of nodes.
A good example of a node set is `Expr:nodeset`.
This node set holds all the possible expressions.
So if a son target is `<set name="Expr">` you know that it can hold all the expression nodes.
`<set>` is a sum type implemented as a tagged union.
All the node sets are defined under the `nodeset` tag.
There should also be a [phases](#Phases) definition.
Here is an example son definition of expressions on a `let` node:
```xml
<son name="Expr">
<targets>
<target mandatory="yes">
<set name="Expr" />
<phases> <all /> </phases>
</target>
</targets>
</son>
```
### Next
Many nodes have son called `Next`.
This is a pointer to the next son of the type of the current node.
For instance each `fundef` has a `next` son which holds the next fundef in the module or it is `NULL` if there are no more fundefs.
Using this Next son on nodes a linked list structure is created in parts of the tree.
![Next example](next.svg)
### Accessor macros
You can get the type of an AST node using the `NODE_TYPE()` macro from `src/libsac2c/tree`.
You will often see code like this in traversals:
```c
bool is_fold = (NODE_TYPE(withop) == N_fold)
```
or something like this directly in an if statement condition.
If you want to access an attribute you use the `<NODENAME>_<SONNAME>` macro.
> [!info]
> The header files for these accessor macros are only generated after you compile the project once.
> For this reason it is a good idea to compile the project once if you want your language server if you have it.
## Attributes
Besides sons which are basically other nodes, each node also has `<attribute>` tags.
An attribute specifies a typed key value pair on the node.
Attributes store extra data on the node or pointers to other nodes in the tree which are not direct sons.
Attributes should have a description, a type and a target.
The attribute types (valid values for the name attribute of the `<type>` tag inside the `<attribute>` tag) are defined under the `<atributetypes>` tag.
If the type is a pointer to another node it is important that the target matches correctly to the node that is expected by the type.
If the target is not another node then the target is usually defined as `<any>` inside the `<target>` tag.
In that case the type is based on the definition in `<atributetypes>`.
The target tag should also have a [phages tag](#Phases)
Here is an example attribute definition:
```xml
<attribute name="MatchingSpawnSync" inconstructor="no">
<description>
If the node contains a spawned ap or a sync statement, this
will point to the matching spawn or sync statement.
</description>
<type name="Link">
<targets>
<target mandatory="no">
<node name="Let" />
<phases>
<!-- Read on for more info about phases -->
<phase from="fp_syn" to="final" />
</phases>
</target>
</targets>
</type>
</attribute>
```
If you want to access an attribute you use the `<NODENAME>_<ATTRIBUTENAME>` macro.
## Flags
Flags are boolean key values on nodes.
The type of a flag value is always true or false.
Flags are meant to be simple.
Flags should have a default value, a description and a [phase definition](#Phases).
Here is an example of a flag:
```xml
<flag name="DistMemHasSideEffects" default="FALSE">
<description>This function application has side effects</description>
<phases> <range from="dm_dmisefa" to="final" /> </phases>
</flag>
```
The accessor macro for this is defined using the `<nodename>_<flagname>`.
## Phases
For each son, attribute and phase of a node must also specify in which phases it is expected to be available.
The phase names are specified under the traversals tag.
This the `<phases>` tag should be placed inside the `<target>` tag alongside the `<set>` or `<node>` for sons or the `<all>` tag for attributes.
```xml
<phases>
<!-- specify here which phase it is available -->
</phases>
```
You can specify the phase in 4 different ways:
- `<all>` The son or attribute is available in every phase.
- `<unknown>` It is unknown when this attribute or son is available.
- `<range from="foo" to="bar">` The son or attribute is available from phase foo to phase bar. You may have multiple `<range>` tags inside a single `phases` tag.
- `<phase name="foo">` The attribute or son is available in the foo phase. You may have multiple `<phase>` tags inside a single a `phases` tag.
The phases might not always be up-to-date.
Make sure to fix outdated data when encountered.
## Constructor functions
The code build system generates a constructor function from all these fields.
A constructor function allocates the corresponding struct node into memory properly for you.
Each son or attribute can be an argument to the generated constructor function or not.
You can mark a son or attribute as being required in the constructor using the `inconstructor` attribute.
The values for this tag are `yes` or `no`.
If something is in the constructor you know that the value is never `NULL` because it is part of the constructor.
Anything that is not marked as `inconstructor` can be `NULL`
# Node sets
In node sets you can define AST nodes that are sets/collections of other AST nodes.
For instance the `<nodeset name="Stmt">` node set defines all the individual AST nodes that are statements.
Using node sets you can refer to a group of nodes using one name.
This is useful to express that a certain node in the AST can be any of the nodes of the node set.
For instance this `Stmt` node set is useful to define as the son of the body node.
Node sets are simple to define.
Just add something like the following xml to the `<nodesets>` tag.
```xml
<nodeset name="Stmt">
<target>
<node name="Let" />
<node name="Cond" />
<node name="Return" />
<node name="Do" />
<node name="Annotate" />
<node name="While" />
<node name="Icm" />
<node name="Cudast" />
</target>
</nodeset>
```
Just copy the other ones that are already there if you want to add a new one.
Do not forget to add any new nodes you make to proper node sets.
# Attribute Types
In the `<atributetypes>` tag all the valid types for attributes are specified.
The mean purpose of specifying this is code generation.
The idea is that each attribute types follows a certain interface.
If this interface is followed the all the tree utilities mainly specified in `src/libsac2c/tree` will work correctly.
You specify attribute types using `<type>`tag inside the `<atributetypes>` tag.
Each `<type>` tag has the following attributes:
- name: The name of the type
- ctype: The type of the attribute in c code
- init: How to initialize the value, can be any expression
- copy: How to copy the attribute. Can be:
- `literal` if the `ctype` is a c literal.
- `function` if there is a special function to copy the attribute somewhere. It is unclear however how to say what the name of this function is or where it is defined.
- `hash` Also unclear but I assume something with the hash of the memory somehow
Optional attributes include:
- vtype: Another c type but also unclear what this does.
- persist: Should be either yes or no, but I don't know what this does.
# Traversals
Traversals are edits the compiler performs on the AST.
The compiler makes these edits by walking nodes in the AST using the visitor pattern.
Almost all the phases in the compiler besides the parsing are implemented as traversals.
Defining a traversal in the XML file is simple.
Here is an example:
```xml
<traversal id="FFC" name="Filter fundef constraints" default="sons" include="filter_fundef_conditions.h">
<travuser>
<node name="Fundef" />
<node name="Ret" />
<node name="Arg" />
<node name="SPAp" />
<node name="SPId" />
</travuser>
</traversal>
```
A traversal is defined using the `<traversal>` tag.
Inside this tag you put a `<travuser>` tag and inside that you put a `<node>` tag for each node type that you want to visit in your traversal.
The `<traversal>` tag has the following attributes:
- `id` the short name to refer to the traversal. This is used in code generation. **The id should be all uppercase**
- `name` the name of traversal meant for humans. The id should be some sort of acronym from this name.
- `include` The header file which defines the functions for the traversal.
- `default` The default way of traversing nodes that are not specified inside `<travuser>`. I think that this is always just "sons" meaning that we always recursively traverse all the sons of the current node by default.
See [adding a traversal](./adding-a-traversal.md) for a full tutorial on adding a traversal to the compiler and what you should put inside the header file specified in the `include` attribute of the `<traversal>` tag.
Clone repository
  • Styleguide
  • concepts
    • Deprecated Modules
    • Globals
    • Intermediate Code Macros
    • Named Tuples
    • Overloading
    • Preprocessor
    • Primitive functions
    • Runtime Representations
    • input stdin
    • phm xt
    • ref counting methods
    • type famlies
    • type patterns
  • error messages
    • Anthropomorphic error essages
View All Pages