# Edge Language

> A domain specific language for the Ethereum Virtual Machine

## Introduction

Edge is a domain-specific language for the Ethereum Virtual Machine (EVM): high-level, strongly statically typed, and designed to make smart contract development more expressive without giving up control over execution.

It is the brainchild of [jtriley](https://github.com/jtriley-eth), to whom the current specifications are attributable.

The Edge documentation is organized into the following sections:

* [Specifications](/specs/overview): An in-depth blueprint to the Edge language, including syntax showcase examples.
* [Compiler](/compiler/overview): The inner workings of the Rust compiler implementation.
* [Tooling](/tools/overview): An overview of Edge tooling and other developer utilities.
* [Contributing](/contributing/contributing): Repository contribution guidelines.
* [Contact](/contact/contact): Methods of contacting the maintainers.

## Tooling overview

Edge's tooling is centered around the compiler CLI, the installer, and the language server.

### `edgec`

`edgec` is the main command-line entry point for the compiler. It can compile contracts directly to EVM bytecode or stop after earlier phases for inspection.

```bash
edgec examples/counter.edge
edgec lex examples/counter.edge
edgec parse examples/counter.edge
edgec check examples/counter.edge
edgec lsp
```

#### Subcommands

| Subcommand | Description |
| --- | --- |
| `lex ` | Lex file and print tokens (debug output) |
| `parse ` | Parse file and print AST (debug output) |
| `check ` | Compile for errors without producing output |
| `lsp` | Start the LSP server over stdin/stdout |

#### Compiler flags

| Flag / Option | Short | Default | Description |
| --- | --- | --- | --- |
| `` | — | — | Source file to compile (outputs hex bytecode to stdout) |
| `--output` | `-o` | — | Write raw bytecode bytes to file (requires FILE) |
| `--emit ` | — | `bytecode` | `tokens` / `ast` / `ir` / `pretty-ir` / `asm` / `bytecode` |
| `-O ` | — | `0` | Optimization level (0–3) |
| `--optimize-for` | — | `gas` | Optimization target: `gas` or `size` |
| `--std-path` | — | — | Filesystem stdlib path (also: `EDGE_STD_PATH` env var) |
| `--verbose` | `-v` | — | Verbosity; repeat for more: `-v`=WARN, `-vv`=INFO, `-vvv`=DEBUG, `-vvvv`=TRACE |
| `--version` | — | — | Print version and exit |
| `--help` | `-h` | — | Print help |

#### Verbosity levels

| `-v` count | Log level | Notes |
| --- | --- | --- |
| 0 | (off) | No tracing output |
| 1 | `WARN` | |
| 2 | `INFO` | |
| 3 | `DEBUG` | |
| 4+ | `TRACE` | egglog also set to TRACE (otherwise WARN) |

#### Emit output behavior

| Emit | Stdout | File (`-o`) |
| --- | --- | --- |
| `tokens` | Debug print of each Token | — |
| `ast` | Debug print of Program | — |
| `ir` | S-expression format | — |
| `pretty-ir` | Pretty-printed IR | — |
| `asm` | Labeled block assembly | — |
| `bytecode` | `0x` string | Raw bytes |

### `edgeup`

`edgeup` is the Edge toolchain manager.
Install it first, then use it to install and manage `edgec` versions.

```bash
# 1. Install edgeup
curl -fsSL https://raw.githubusercontent.com/refcell/edge-rs/main/etc/install.sh | sh

# 2. Install the Edge compiler
edgeup install
```

**Supported platforms:** Linux x86\_64, macOS x86\_64, macOS arm64. Windows is not supported.

`edgeup` detects your shell (bash, zsh, or fish) and appends `~/.edgeup/bin` to your `PATH` in the appropriate RC file (`~/.bashrc`, `~/.zshrc`, or `~/.config/fish/config.fish`). Restart your shell or run the printed `source` command after installation.

#### Directory layout

```
~/.edgeup/
  bin/
    edgec        ← symlink → versions/{tag}/edgec
  versions/
    v0.1.6/
      edgec      ← actual binary (chmod 755)
    v0.1.7/
      edgec
```

#### `edgeup` subcommands

| Subcommand | Description |
| --- | --- |
| `install [VERSION]` | Download and install Edge toolchain (default: latest) |
| `update` | Alias for `install` — installs latest version |
| `list` | List all installed versions |
| `use ` | Switch active version (updates symlink) |
| `uninstall [VERSION]` | Remove a version, or all if omitted |
| `self-update` | Update `edgeup` itself to the latest release |
| `version` | Print the `edgeup` version |

### LSP

Edge ships an LSP server for editor integration:

```bash
edgec lsp
```

The server communicates over stdin/stdout and provides parse and type-check diagnostics with precise source spans.

:::warning
Hover, completions, and go-to-definition are not yet implemented. The LSP currently only reports parse errors and type-check errors.
:::

### Repository utilities

The repository ships a [`Justfile`](https://github.com/refcell/edge-rs/blob/main/Justfile) with common contributor workflows:

| Command | Description |
| --- | --- |
| `just build` | Build all crates (`cargo build --workspace`) |
| `just test` | Run all tests (`cargo test --workspace`) |
| `just lint` | Run all lints (format, clippy, deny, docs) |
| `just e2e` | Run end-to-end tests |
| `just bench` | Run benchmarks |
| `just docs` | Serve the Vocs documentation site locally |
| `just docs-build` | Build the documentation site |
| `just check-examples` | Parse all example contracts |
| `just check-stdlib` | Parse all stdlib contracts |

### Reference material

For runnable contracts and language samples, see the [`examples/`](https://github.com/refcell/edge-rs/tree/main/examples) and [`std/`](https://github.com/refcell/edge-rs/tree/main/std) directories.

## Built-in

Built-in functionality refers to features available during compilation that are otherwise inaccessible through the language's regular syntax.

The parser accepts any `@identifier` form without validation; unknown builtin names are caught during IR lowering (semantic analysis), not parsing.

### EVM environment builtins

These builtins read EVM execution context values.
Each compiles to a single `EnvRead` IR node and a corresponding EVM opcode:

| Builtin | EVM opcode | Returns |
| --- | --- | --- |
| `@caller` | `CALLER` | Address of the direct caller |
| `@callvalue` | `CALLVALUE` | Wei sent with the call |
| `@value` | `CALLVALUE` | Alias for `@callvalue` |
| `@calldatasize` | `CALLDATASIZE` | Size of calldata in bytes |
| `@origin` | `ORIGIN` | Transaction originator address |
| `@gasprice` | `GASPRICE` | Gas price of the transaction |
| `@coinbase` | `COINBASE` | Current block's beneficiary address |
| `@timestamp` | `TIMESTAMP` | Current block's timestamp |
| `@number` | `NUMBER` | Current block number |
| `@gaslimit` | `GASLIMIT` | Current block's gas limit |
| `@chainid` | `CHAINID` | Chain ID (EIP-155) |
| `@selfbalance` | `SELFBALANCE` | Balance of the executing contract |
| `@basefee` | `BASEFEE` | Current block's base fee (EIP-1559) |
| `@gas` | `GAS` | Remaining gas |
| `@address` | `ADDRESS` | Address of the executing contract |
| `@codesize` | `CODESIZE` | Size of the executing contract's code |
| `@returndatasize` | `RETURNDATASIZE` | Size of the last call's return data |

All EVM environment builtins are zero-argument. Parentheses are optional: both `@caller` and `@caller()` are valid. Arguments passed to them are currently ignored.

```edge
fn checkCaller() {
    if @caller == 0x0000000000000000000000000000000000000000 {
        revert();
    }
}
```

### Comptime builtins

These builtins execute at compile time and are used for type introspection, compile-time assertions, and code generation.

#### Types

```edge
type PrimitiveType;
type StructType;
type UnionType;
type FunctionType;

type TypeInfo =
    | Primitive(PrimitiveType)
    | Struct(StructType)
    | Union(UnionType)
    | Function(FunctionType);
```

:::note
`TypeInfo` does not include an `Enum` variant. In Edge, enums are a subset of union types (unions where no variant carries data).
They are represented as `Union(UnionType)` in the type system — there is no distinct enum concept at the AST or IR level.
:::

```edge
type HardFork =
    | Frontier
    | Homestead
    | Dao
    | Tangerine
    | SpuriousDragon
    | Byzantium
    | Constantinople
    | Petersburg
    | Istanbul
    | MuirGlacier
    | Berlin
    | London
    | ArrowGlacier
    | GrayGlacier
    | Paris
    | Shanghai
    | Cancun;
```

#### Functions

##### `@typeInfo`

```edge
@typeInfo(typeSignature) -> TypeInfo;
```

Takes a single type signature as an argument and returns a `TypeInfo` union describing the kind of the type.

##### `@bitsize`

```edge
@bitsize(typeSignature) -> u256;
```

Takes a single type signature as an argument and returns the bitsize of the underlying type.

##### `@fields`

```edge
@fields(structType) -> [T, N];
```

Takes a single `StructType` as an argument and returns an array of type signatures of length N, where N is the number of fields in the struct.

##### `@compilerError`

```edge
@compilerError(errorMessage);
```

Emits a compile-time error with the provided message. Useful in `comptime` branches to enforce invariants.

##### `@hardFork`

```edge
@hardFork() -> HardFork;
```

Returns the target hard fork from the compiler configuration as a `HardFork` union value.

##### `@bytecode`

```edge
@bytecode(T -> U) -> Bytes;
```

Takes an arbitrary function and returns its compiled bytecode as a `Bytes` value. `Bytes` is an opaque compiler-internal type representing a sequence of raw bytes; it is not a user-definable Edge type.

## Inline assembly

Edge supports inline EVM assembly for low-level control when the high-level language abstractions are insufficient.

### Opcodes

The following EVM opcodes are accepted in inline assembly blocks. Opcode names are case-insensitive.

**Arithmetic and logic:** `stop`, `add`, `mul`, `sub`, `div`, `sdiv`, `mod`, `smod`, `addmod`, `mulmod`, `exp`, `signextend`, `lt`, `gt`, `slt`, `sgt`, `eq`, `iszero`, `and`, `or`, `xor`, `not`, `byte`, `shl`, `shr`, `sar`

**Cryptographic:** `keccak256` (alias: `sha3`)

**Environment:** `address`, `balance`, `origin`, `caller`, `callvalue`, `calldataload`, `calldatasize`, `calldatacopy`, `codesize`, `codecopy`, `gasprice`, `extcodesize`, `extcodecopy`, `returndatasize`, `returndatacopy`, `extcodehash`

**Block:** `blockhash`, `coinbase`, `timestamp`, `number`, `prevrandao` (alias: `difficulty`), `gaslimit`, `chainid`, `selfbalance`, `basefee`, `blobhash`, `blobbasefee`

**Stack, memory, and storage:** `pop`, `mload`, `mstore`, `mstore8`, `sload`, `sstore`, `tload`, `tstore`, `mcopy`

**Flow control:** `jump`, `jumpi`, `pc`, `msize`, `gas`, `jumpdest`

**Push:** `push0`, `push1` through `push32`

**Duplication:** `dup1` through `dup16`

**Exchange:** `swap1` through `swap16`

**Logging:** `log0`, `log1`, `log2`, `log3`, `log4`

**System:** `create`, `call`, `callcode`, `return`, `delegatecall`, `create2`, `staticcall`, `revert`, `invalid`, `selfdestruct`

In addition to mnemonics, numeric literals and identifiers are accepted (see grammar below).

#### Grammar

```
::= | | ;
```

Where `` is any of the opcodes listed above.

### Inline assembly block

```
::= | "_" ;
::= "asm" "(" [ ("," )* [","]] ")" ["->" "(" [ ("," )* [","]] ")"] "{" ()* "}" ;
```

The `` consists of the `asm` keyword, followed by a parenthesized, comma-separated list of input expressions, an optional `-> (...)` clause listing output names, and a code block containing opcodes. The entire `-> (...)` clause may be omitted when no outputs are needed.

### Semantics

Arguments are ordered such that the state of the stack at the start of the block, top to bottom, is the list of arguments, left to right.
Identifiers in the output list are ordered such that the state of the stack at the end of the assembly block, top to bottom, is the list of outputs, left to right.

```edge
asm (1, 2, 3) -> (a) {
    // stack: [1, 2, 3]
    add // [3, 3]
    mul // [9]
}
```

#### Numeric literals

Inside the assembly block, numeric literals are implicitly converted into `PUSH{N}` instructions. Literals are encoded in the smallest `N` by value, except that leading zeros in hex literals are preserved. For example, `0x0000` becomes `PUSH2 0x0000` to allow for bytecode padding.

#### Identifiers

Identifiers in the assembly body can be:

* **Variables** — resolved to their stack position (scheduled by the compiler). Only compile-time constants and stack-allocated variables are supported; memory-backed variables must be passed as input arguments.
* **Constants** — replaced with their `PUSH{N}` encoding, same as numeric literals.
* **Opcode names** — treated as the corresponding EVM instruction (case-insensitive).

#### Outputs

* **Named outputs** (e.g., `a`) are bound as local variables accessible in subsequent code.
* **Discarded outputs** (`_`) are popped from the stack.
* **Multiple outputs** (N > 1) are stored to sequential memory slots internally and bound as `LetBind` variables via `MLOAD`.

#### IR representation

Inline assembly compiles to an `InlineAsm(inputs, hex_bytecode, num_outputs)` IR node. This node is opaque to the egglog optimizer — it passes through equality saturation unchanged.

:::note
If the input arguments contain local variables, the stack scheduling required to construct the pre-assembly stack state may be unprofitable for small assembly blocks. Consider passing values as immediate literals when possible.
:::

## Specifications

### All Edge, no drag.

This document defines Edge, a domain-specific language for the Ethereum Virtual Machine (EVM).

Edge is a high-level, strongly statically typed, multi-paradigm language.
It provides:

* A thin layer of abstraction over the EVM's instruction set architecture (ISA).
* An extensible polymorphic type system with subtyping.
* First-class support for modules and code reuse.
* Compile-time code execution to fine-tune the compiler's input.

Edge's syntax is similar to Rust and Zig where intuitive; however, the language is not designed to be a general-purpose language with EVM features as an afterthought. Rather, it extends the EVM instruction set with a reasonable type system and syntax sugar over universally understood programming constructs.

#### Notation

This specification uses a grammar similar to Extended Backus-Naur Form (EBNF) with the following rules:

* Non-terminal tokens are wrapped in angle brackets ``.
* Terminal tokens are wrapped in double quotes `"const"`.
* Optional items are wrapped in brackets `["mut"]`.
* Sequences of zero or more items are wrapped in parentheses and suffixed with a star `("," )*`.
* Sequences of one or more items are wrapped in parentheses and suffixed with a plus `()+`.

In contrast to EBNF, all items are non-atomic: arbitrary whitespace characters (`\n`, `\t`, `\r`) may surround all tokens unless wrapped with curly braces `{ "0x" ()* }`.

Common abbreviations:

* `ident` — identifier
* `expr` — expression
* `stmt` — statement

#### Disambiguation

##### Return vs return

The word "return" refers to two different behaviors: returned values from expressions and the halting return opcode.

When "return" is used, this refers to the values returned from expressions — the values left on the stack, if any.

When "halting return" is used, this refers to the EVM opcode `RETURN` that halts execution and returns a value from a slice of memory to the caller of the current execution context.

## Comments

```text
::= "//" (!"\n" )* "\n" ;
::= "/*" (!"*/" | )* "*/" ;
::= "///" (!"\n" )* "\n" ;
::= "//!" (!"\n" )* "\n" ;
```

The `` is a single-line comment, ignored by the parser.

The `` is a multi-line comment, ignored by the parser. Block comments may be nested; the lexer tracks depth to find the matching close (`/* /* inner */ outer still open */` is valid).

The `` is a developer documentation comment, treated as documentation for the immediately following item.

The `` is a developer documentation comment, treated as documentation for the module in which it is defined.

Developer documentation comments are treated as GitHub-flavored markdown.

:::note
Unlike regular comments, `DocComment` tokens (`///` and `//!`) are **retained** by the parser and associated with the item or module they document. Tooling that consumes the parse tree (e.g. doc generators) will find doc comments there; plain `//` and `/* */` comments are dropped before the parser ever runs.
:::

## Expressions

```text
::= | | | | | | | | | | | | | | | | | | | | | "(" ")" ;
```

An `` is any construct that produces a value.

### Binary operations

```text
::= ;
```

Binary operations use an infixed operator between two sub-expressions. See [operators](./operators) for the full operator table and precedence.

### Unary operations

```text
::= ;
```

Prefix unary operators: `-` (negation), `~` (bitwise NOT), `!` (logical NOT).

### Ternary

```text
::= "?" ":" ;
```

The ternary operator is right-associative. Both branches are full expressions.

### Literals

The `` non-terminal is defined in [Literals](/specs/syntax/compile/literals). In expression context, the following additional details apply:

* Integer literals support `_` as a visual separator (e.g. `1_000_000`). Type suffixes (e.g. `42u8`, `256u16`) are recognized by the lexer but currently silently discarded — the type is inferred from context or defaults to `u256`.
* String literals use either double or single quotes. Supported escape sequences: `\n`, `\t`, `\r`, `\\`, `\"`, `\'`.
* Hex and binary literals produce byte-array values (`Lit::Hex` and `Lit::Bin` respectively).
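
The integer-literal rule above can be illustrated with a small sketch. This is not the `edge-rs` lexer: the pattern, the function name `parse_int_literal`, and the choice to accept only decimal digits are assumptions made for illustration. It shows `_` separators being stripped and a trailing type suffix being recognized and then dropped.

```python
import re

# Hypothetical pattern (an assumption, not taken from the Edge lexer):
# decimal digits with optional `_` separators, then an optional type
# suffix such as `u8` or `i256`.
_INT_LITERAL = re.compile(r"^([0-9][0-9_]*)((?:u|i)[0-9]+)?$")


def parse_int_literal(text: str) -> int:
    """Return the value of a decimal integer literal, ignoring any type suffix."""
    match = _INT_LITERAL.match(text)
    if match is None:
        raise ValueError(f"not an integer literal: {text!r}")
    digits, _suffix = match.groups()  # the suffix is recognized, then discarded
    return int(digits.replace("_", ""))


print(parse_int_literal("1_000_000"))
print(parse_int_literal("42u8"))
```

Because the suffix is discarded, `42u8` and `42` yield the same value here; per the text above, the type would then come from context or default to `u256`.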

### Function calls

```text
::= ["::" "<" ("," )* ">"] "(" [ ("," )*] ")" ;
```

Functions are called with parenthesized argument lists. Turbofish syntax (`::`) provides explicit type arguments.

### Field and index access

```text
::= "." ;
::= "." + ;
::= "[" [":" ] "]" ;
```

Dot access resolves struct fields by name or tuple fields by numeric index. Array indexing supports both single-element access (`arr[i]`) and slicing (`arr[start:end]`).

### Instantiation

```text
::= [] "{" ":" ("," ":" )* "}" ;
::= [] "(" [ ("," )*] ")" ;
::= [] "[" [ ("," )*] "]" ;
::= "::" "(" [ ("," )*] ")" ;
```

Struct, tuple, and array instantiations may be prefixed with a `` annotation. Union variants are instantiated with path syntax (`Type::Variant(args)`).

### Pattern matching expression

```text
::= "matches" "::" ["(" ("," )* ")"] ;
```

The `matches` keyword tests whether an expression matches a union variant, optionally binding the variant's payload to identifiers. Commonly used in `if` conditions.

### Arrow functions

```text
::= ( | "(" [ ("," )*] ")") "=>" ;
```

Arrow functions (closures) take identifier parameters and a brace-delimited body.

### Compile-time expressions

```text
::= "comptime" "(" ")" ;
```

Wraps an expression for compile-time evaluation.

### Path expressions

```text
::= ("::" )+ ;
```

Double-colon-separated identifier paths, used for module paths and union variant access.

### Builtin calls

```text
::= "@" ["(" [ ("," )*] ")"] ;
```

The `@` sigil invokes compiler builtins. The parser accepts any identifier after `@`; validation of builtin names happens in later compiler stages.

### Assignment expression

```text
::= "=" ;
```

Assignment at the expression level (precedence 0, right-associative). Produces `Expr::Assign`.

### Inline assembly

```text
::= "asm" "(" [ ("," )*] ")" ["->" "(" [ ("," )*] ")"] "{" * "}" ;
::= | | ;
```

Inline assembly provides direct access to EVM opcodes. Inputs are pushed onto the stack (leftmost = top of stack).
Outputs are optionally bound to identifiers; use `_` to discard a stack value.

## Identifiers

```text
::= ( | "_") ( | | "_")* ;
```

Dependencies:

* ``
* ``

The `` is a C-style identifier, beginning with an alphabetic character or underscore, followed by zero or more alphanumeric or underscore characters.

### Reserved names

Identifiers share their lexical space with keywords, primitive type names, and boolean literals. The lexer resolves ambiguity in the following priority order:

1. **EVM primitive type** — `u8`–`u256`, `i8`–`i256`, `b1`–`b32`, `addr`, `bool`, `bit`
2. **Keyword** — e.g. `let`, `fn`, `contract`, `mod`, `use`, `mut`, `pub`, `Self`, …
3. **Boolean literal** — `true`, `false`
4. **Identifier** — everything else

Any string that matches a higher-priority rule will **never** produce an `Ident` token. In particular, `Self` (capital S) is a reserved keyword and cannot be used as a plain identifier.

### Special identifiers

The parser accepts `self` and `super` as identifiers in certain contexts (e.g. module paths, method receivers). These are keywords but are returned as identifier nodes with the names `"self"` and `"super"` respectively.

## Data locations

```text
::= "&s" ;
::= "&t" ;
::= "&m" ;
::= "&cd" ;
::= "&rd" ;
::= "&ic" ;
::= "&ec" ;
::= | | | | | | | ;
```

The `` is a pointer annotation indicating which EVM data region a value resides in. Edge defines seven distinct location annotations. This is a divergence from general-purpose programming languages to more accurately represent the EVM execution environment.

* `&s` — persistent storage
* `&t` — transient storage (EIP-1153)
* `&m` — memory
* `&cd` — calldata
* `&rd` — returndata
* `&ic` — internal (local) code
* `&ec` — external code

:::note
The `&` character is heavily overloaded in the lexer. It checks for data-location sigils first (`&s`, `&t`, `&m`, `&cd`, `&rd`, `&ic`, `&ec`), then `&=`, then `&&`, and finally falls back to bitwise AND.
:::

### Semantics

Data locations can be grouped into two broad categories: buffers and maps.

#### Maps

Persistent and transient storage are part of the map category — 256-bit keys map to 256-bit values. Both may be written or read one word at a time.

#### Buffers

Memory, calldata, returndata, internal code, and external code are all linear data buffers. All of them can be copied into memory, but only memory and calldata can be read directly to the stack, and only memory can be written or copied to.

| Name | Read to stack | Copy to memory | Write |
| --- | --- | --- | --- |
| memory | yes | yes | yes |
| calldata | yes | yes | no |
| returndata | no | yes | no |
| internal code | no | yes | no |
| external code | no | yes | no |

#### Transitions

Transitioning from a map to a memory buffer is performed by loading each element from the map to the stack and storing each stack item in memory, which is O(N).

Transitioning from a memory buffer to a map is performed by loading each element from memory to the stack and storing each stack item in the map, which is O(N).

Transitioning from any other buffer to a map is performed by copying the buffer's data into memory and then transitioning the data from memory into the map, which is O(N+1).

#### Pointer bit sizes

Pointers to different data locations have different sizes based on the properties of that data location. In-depth semantics of each data location are specified in the type system documents.

| Location | Bit size | Reason |
| --- | --- | --- |
| persistent storage | 256 | Storage is a 256-bit key–value hashmap |
| transient storage | 256 | Transient storage is a 256-bit key–value hashmap |
| memory | 32 | Theoretical maximum memory size does not approach 2³² |
| calldata | 32 | Theoretical maximum calldata size does not approach 2³² |
| returndata | 32 | Maximum returndata size equals maximum memory size |
| internal code | 16 | Code size is less than 0xFFFF |
| external code | 176 | Contains 160-bit address and 16-bit code pointer |

## Modules

### Declaration

```text
::= ["pub"] "mod" (";" | "{" [] * "}") ;
```

Dependencies:

* ``
* ``
* ``

The `` is composed of an optional `pub` prefix, the `mod` keyword followed by an identifier, then either a semicolon (external/bodyless form) or a body delimited by curly braces.

The bodyless form (`mod name;`) declares an external module whose content lives in a file with a matching name.

### Import

```text
::= "*" | ( "::" ( | "{" ("," )* [","] "}" | ) )* ;
::= ["pub"] "use" ["::" ] ";" ;
```

Dependencies:

* ``

The `` is a recursive production, containing either a wildcard (`*`), another module import item, or a comma-separated list of module import items delimited by curly braces.

The `` is an optional `pub` annotation followed by `use`, the root module name, then optional path segments.

:::warning
Neither `pub mod` nor `pub use` is currently implemented. The parser's `parse_pub()` function only dispatches to `fn` and `contract` declarations, so the `pub` modifier before `mod` or `use` is silently ignored. Use plain `mod` and `use` for all module declarations and imports.
:::

### Semantics

Namespace semantics in modules are defined in the namespace document.

Visibility semantics in modules are defined in the visibility document.

Modules can contain developer documentation, declarations, and assignments.
If the module contains developer documentation, it must be the first item in the module. This is for readability.

Files are implicitly modules with a name equivalent to the file name.

Type, function, ABI, and contract declarations must be assigned in the same module. However, traits are declared without assignment and submodules may be declared without a block only if there is a file with a matching name.

The `super` identifier represents the direct parent module of the module in which it is invoked.

## Operators

Operators are syntax sugar over built-in functions. Operator overloading is disallowed.

### Binary operators

```text
::= | "+" | "-" | "*" | "/" | "%" | "**" ;
::= | "&" | "|" | "^" | "<<" | ">>" ;
::= | "==" | "!=" | "<" | "<=" | ">" | ">=" ;
::= | "&&" | "||" ;
::= | "+=" | "-=" | "*=" | "/=" | "%=" | "**=" | "&=" | "|=" | "^=" | "<<=" | ">>=" ;
::= | | | | | ;
```

### Unary operators

```text
::= "-" ;
::= "~" ;
::= "!" ;
::= | | | ;
```

### Precedence

The expression parser uses precedence climbing (Pratt parsing). Lower numbers bind less tightly:

| Precedence | Operators | Associativity |
| --- | --- | --- |
| 0 | `=` | Right |
| 1 | `\|\|` | Left |
| 2 | `&&` | Left |
| 3 | `==` `!=` | Left |
| 4 | `<` `>` `<=` `>=` | Left |
| 5 | `\|` (bitwise OR) | Left |
| 6 | `^` (bitwise XOR) | Left |
| 7 | `&` (bitwise AND) | Left |
| 8 | `<<` `>>` | Left |
| 9 | `+` `-` | Left |
| 10 | `*` `/` `%` | Left |
| 11 | `**` | Right |

The ternary operator (`? :`) is parsed after the Pratt binary expression, with right-to-left associativity.

Compound assignment operators (`+=`, `-=`, etc.) are parsed as binary operations and produce `Expr::Binary` nodes with the corresponding `BinOp` variant.
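
To make the precedence table concrete, here is a minimal precedence-climbing sketch. It is not the `edge-rs` parser: the names `BINARY_OPS`, `tokenize`, `parse`, and `parse_atom` are hypothetical, the output is a plain nested tuple rather than `Expr::Binary`, and unary operators, the ternary operator, and compound assignment are deliberately left out. Only the precedence levels and associativities are taken from the table above.

```python
import re

# (precedence, associativity) copied from the precedence table.
BINARY_OPS = {
    "=": (0, "right"),
    "||": (1, "left"),
    "&&": (2, "left"),
    "==": (3, "left"), "!=": (3, "left"),
    "<": (4, "left"), ">": (4, "left"), "<=": (4, "left"), ">=": (4, "left"),
    "|": (5, "left"),
    "^": (6, "left"),
    "&": (7, "left"),
    "<<": (8, "left"), ">>": (8, "left"),
    "+": (9, "left"), "-": (9, "left"),
    "*": (10, "left"), "/": (10, "left"), "%": (10, "left"),
    "**": (11, "right"),
}

# Multi-character operators come first so `**` is not read as two `*` tokens.
TOKEN = re.compile(
    r"\s*(\d+|[A-Za-z_]\w*|\*\*|\|\||&&|==|!=|<=|>=|<<|>>|[-+*/%<>|^&=()])"
)


def tokenize(source):
    """Split a source string into integer, identifier, and operator tokens."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_atom(tokens):
    """Parse an integer, an identifier, or a parenthesized expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        inner = parse(tokens, 0)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return inner
    if token in BINARY_OPS or token == ")":
        raise SyntaxError(f"unexpected token {token!r}")
    return int(token) if token.isdigit() else token


def parse(tokens, min_prec=0):
    """Consume tokens from the front and return a nested (op, lhs, rhs) tuple."""
    left = parse_atom(tokens)
    while tokens and tokens[0] in BINARY_OPS:
        prec, assoc = BINARY_OPS[tokens[0]]
        if prec < min_prec:
            break
        op = tokens.pop(0)
        # A left-associative operator needs a strictly tighter right operand;
        # a right-associative one may recurse at the same level.
        right = parse(tokens, prec + 1 if assoc == "left" else prec)
        left = (op, left, right)
    return left


print(parse(tokenize("1 + 2 * 3")))
print(parse(tokenize("2 ** 3 ** 2")))
print(parse(tokenize("a & b == c")))
```

One consequence of the table worth noticing: `&` (level 7) binds tighter than `==` (level 3), so `a & b == c` groups as `(a & b) == c`, unlike in C.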

### Semantics

| Operator | Types | Behavior | Panic case |
| --- | --- | --- | --- |
| `+` | integers | checked addition | overflow |
| `-` (binary) | integers | checked subtraction | underflow |
| `-` (unary) | integers | checked negation | overflow |
| `*` | integers | checked multiplication | overflow |
| `/` | integers | checked division | divide by zero |
| `%` | integers | checked modulus | divide by zero |
| `**` | integers | exponentiation | — |
| `&` | integers | bitwise AND | — |
| `\|` | integers | bitwise OR | — |
| `~` | integers | bitwise NOT | — |
| `^` | integers | bitwise XOR | — |
| `>>` | integers | bitwise shift right | — |
| `<<` | integers | bitwise shift left | — |
| `==` | any | equality | — |
| `!=` | any | inequality | — |
| `&&` | booleans | logical AND | — |
| `\|\|` | booleans | logical OR | — |
| `!` | booleans | logical NOT | — |
| `>` | integers | greater than | — |
| `>=` | integers | greater than or equal | — |
| `<` | integers | less than | — |
| `<=` | integers | less than or equal | — |

## Syntax

Conceptually, all EVM contracts are single-entry point executables and at compile time, Edge programs are no different.

Other languages have used primarily the contract-is-an-object paradigm, mapping fields to storage layouts and methods to "external functions" that may read and write the storage. Inheritance enables interface constraints, code reuse, and a reasonable model for message passing that relates to the EVM external call model.

However, this is limited in scope. Conceptually, the contract object paradigm groups stateful data and functionality, limiting the deployability to the product type. Extending the deployability to arbitrary data types allows for contracts to be functions, type unions, product types, and more. While most of these are not particularly useful, this simplifies the type system as well as opens the design space to new contract paradigms.

The core syntax of Edge is derived from commonly used patterns in modern programming. Functions, branches, and loops are largely intuitive for engineers with experience in C, Rust, JavaScript, etc. Parametric polymorphism uses syntax similar to Rust and TypeScript. Compiler built-in functions and "comptime" constructs follow the syntax of Zig.

### Top-level items

An Edge source file is a sequence of top-level declarations. The following item kinds are supported at the top level:

| Keyword | Form | Purpose |
| --- | --- | --- |
| `contract` | `contract Name { … }` | Contract definition |
| `fn` | `fn name(…) [-> T] { … }` | Free function |
| `const` | `const NAME[: T] = expr;` | Compile-time constant |
| `let` | `let [mut] name[: T] [= expr];` | Variable declaration |
| `type` | `type Name[] = …;` | Type alias or union type |
| `trait` | `trait Name[] { … }` | Trait definition |
| `impl` | `impl Type[:Trait] { … }` | Implementation block |
| `abi` | `abi Name { … }` | ABI interface declaration |
| `event` | `event Name(…);` | Event declaration |
| `mod` | `mod name;` / `mod name { … }` | Module declaration |
| `use` | `use root::path;` | Module import |

Functions and declarations may be prefixed with `pub` (public visibility). See the sub-pages for the full grammar of each item kind.

### Keywords

Edge reserves the following 33 keywords:

**Declaration:** `contract`, `type`, `const`, `fn`, `packed`, `trait`, `impl`, `mod`, `use`, `abi`, `event`

**Modifiers:** `pub`, `mut`, `ext`, `indexed`, `anon`, `comptime`

**Control flow:** `return`, `if`, `else`, `match`, `matches`, `for`, `while`, `loop`, `do`, `break`, `continue`

**Variables / scope:** `let`, `Self`, `super`

**Side effects / assembly:** `emit`, `asm`

## Statements

```text
::= | | | | | | | | | | | | | | | | | | | | | | | | | | | | ";" ;
```

A `` is a language construct that does not itself produce a value (unlike an expression).
The top-level parse loop collects statements until EOF.

### Control flow statements

```text
::= "return" [] ";" ;
::= "break" ";" ;
::= "continue" ";" ;
```

### Code blocks

```text
::= "{" ( | ";")* [] "}" ;
```

A code block is a brace-delimited sequence of statements. The final item may be a bare expression without a trailing semicolon (tail expression), which becomes the block's value — similar to Rust.

:::note
At the AST level, tail expressions are wrapped as `BlockItem::Stmt(Stmt::Expr(…))`. There is no distinct AST node for tail expressions; the semantic difference is inferred from position.
:::

### If / else

```text
::= "if" "(" ")" ("else" "if" "(" ")" )* ["else" ] ;
::= "if" "matches" "::" ["(" ("," )* ")"] ;
```

The standard `if`/`else if`/`else` chain uses parenthesized conditions and brace-delimited bodies.

The `if … matches` form combines a conditional with union pattern destructuring.

:::note
The `Stmt::IfMatch` variant exists in the AST, but the current parser produces `Stmt::IfElse` with an `Expr::PatternMatch` as the condition instead. The dedicated variant is reserved for future use.
:::

### Match

```text
::= "match" "{" ("," )* [","] "}" ;
::= "=>" ( | | "return" []) ;
::= | | "_" ;
::= "::" ["(" ("," )* ")"] ;
```

Match arms accept a code block, a bare expression, or a `return` statement as the body. At the AST level, all arm bodies are normalized to `CodeBlock`.

### Loops

```text
::= "loop" ;
::= "for" "(" [ | ] ";" [] ";" [ | ] ")" ;
::= "while" "(" ")" ;
::= "do" "while" "(" ")" ";" ;
::= "{" ( | ";" | "break" ";" | "continue" ";")* "}" ;
```

The `` uses a separate AST type (`LoopBlock` / `LoopItem`) from regular code blocks. `break` and `continue` have dedicated `LoopItem` variants in addition to the `Stmt::Break` / `Stmt::Continue` variants used outside loops.

:::warning
`break` and `continue` are parsed but not yet implemented in the compiler backend. They will silently compile as if the statement were absent.
::: ### Contracts ```text ::= "contract" "{" * "}" ; ::= | "let" ":" ";" | "const" [":" ] "=" ";" | ["pub"] ["ext"] ["mut"] "fn" "(" [] ")" ["->" ] ; ::= "impl" [":" ] "{" * "}" ; ``` Contract bodies contain storage field declarations (`let`), constants, and function definitions. The `impl` block provides the implementation for a contract, optionally satisfying an ABI interface. ### Functions ```text ::= ["pub"] ["ext"] ["mut"] "fn" ["<" ("," )* ">"] "(" [] ")" ["->" ] ; ::= ":" ("," ":" )* ; ::= | "(" ("," )* ")" ; ::= [":" ("&" )*] ; ``` Functions support generic type parameters with trait bounds (``). The `self` keyword may appear as the first parameter without a type annotation (implicit `Self` type). Return types can be a single type or a tuple. Visibility and modifier flags: * `pub` — public visibility * `ext` — external ABI entry point * `mut` — may mutate contract state ### Type aliases ```text ::= "type" ["<" ("," )* ">"] "=" ";" ; ::= | ; ::= ["|"] ("|" )+ ; ::= ["(" ")"] ; ``` Type aliases bind a name to a type signature or a union type. Union types define sum types with named variants that optionally carry a payload. ### Traits and implementations ```text ::= "trait" ["<" ("," )* ">"] [":" ("+" )*] "{" * "}" ; ::= | "fn" "(" [] ")" ["->" ] (";" | ) | "const" ":" ["=" ] ";" | "type" ["=" ] ";" ; ::= "impl" ["<" ("," )* ">"] [":" ["<" ("," )* ">"]] "{" * "}" ; ::= | ["pub"] "fn" "(" [] ")" ["->" ] | ["pub"] "const" ":" "=" ";" | ["pub"] "type" "=" ";" ; ``` Traits declare abstract interfaces with optional default implementations. Supertraits use `+` syntax: `trait Ordered: Comparable + Displayable { … }`. Implementation blocks provide concrete implementations for types, optionally satisfying a trait: `impl Type : Trait { … }`. ### ABI declarations ```text ::= "abi" [":" ("+" )*] "{" * "}" ; ::= ["mut"] "fn" "(" [] ")" ["->" ] ";" ; ``` ABI declarations define external interfaces. They are similar to traits but specific to the EVM calling convention. 
Superabis are supported with the same `+` syntax as supertraits. ### Events and emit ```text ::= ["anon"] "event" "(" [ ("," )*] ")" ";" ; ::= ["indexed"] ":" ; ::= "emit" "(" [ ("," )*] ")" ";" ; ``` Events declare log schemas. Fields may be marked `indexed` for topic-based filtering. The `anon` modifier creates an anonymous event (no topic0 selector). The `emit` statement fires an event with the given arguments. :::warning Anonymous events (`anon event`) are parsed but the `is_anon` flag is always set to `false` by the current parser. This feature is reserved for future use. ::: ### Compile-time constructs ```text ::= "comptime" ; ::= "comptime" "fn" "(" [] ")" ["->" ] ; ``` `comptime` can prefix a statement for compile-time conditional compilation, or prefix a function declaration to define a compile-time function. :::warning Compile-time constructs are parsed but have limited backend support. Only constant expression evaluation (integer arithmetic and bitwise operations) is currently implemented. ::: ## Variables ### Declaration ```text ::= "let" ["mut"] [":" ] ["=" ] ";" ; ``` Dependencies: * `` * `` * `` The `` marks the declaration of a variable. The optional `mut` keyword marks the variable as mutable. The variable may optionally be given a type annotation and/or be assigned at the point of declaration. :::warning The `mut` keyword is parsed but not yet tracked in the AST or enforced by the compiler. All variables are currently mutable regardless of the `mut` annotation. ::: ### Constants ```text ::= "const" [":" ] "=" ";" ; ``` Dependencies: * `` * `` * `` The `` declares a compile-time constant. Unlike `let`, constants require an initializer and are immutable. By convention, constant names are written in `UPPER_SNAKE_CASE`. ### Assignment ```text ::= "=" ";" ; ``` Dependencies: * `` The `` assigns a value to a target expression. 
The left-hand side is a full expression, supporting simple identifiers as well as field access (`a.b = x`), array indexing (`arr[i] = x`), and other assignable forms. In addition to simple assignment, Edge supports the following compound assignment operators that combine an arithmetic or bitwise operation with assignment: | Operator | Meaning | | -------- | ----------------------- | | `+=` | add and assign | | `-=` | subtract and assign | | `*=` | multiply and assign | | `/=` | divide and assign | | `%=` | modulo and assign | | `**=` | exponentiate and assign | | `&=` | bitwise AND and assign | | `\|=` | bitwise OR and assign | | `^=` | bitwise XOR and assign | | `>>=` | right-shift and assign | | `<<=` | left-shift and assign | :::note Assignment also exists as an expression (`Expr::Assign`) at precedence level 0 in the Pratt parser. The statement form `Stmt::VarAssign` and the expression form `Expr::Assign` both accept a full expression on the left-hand side. ::: ## ABI The application binary interface is both a construct to generate a JSON ABI by the compiler and a subtyping construct for contract objects. ### Declaration ```text ::= ["mut"] "fn" "(" [( ":" ) ("," ":" )* [","]] ")" ["->" ("(" ("," )* [","] ")" | )] ";" ; ::= "abi" [":" ("+" )*] "{" * "}" ; ``` Dependencies: * `` * `` The `` maps to `AbiDecl` in the AST: * `name: Ident` * `superabis: Vec` — parent ABIs for subtyping * `functions: Vec` Each `` maps to `AbiFnDecl`: * `name: Ident` * `params: Vec<(Ident, TypeSig)>` * `returns: Vec` * `is_mut: bool` :::note Unlike regular `FnDecl`, `AbiFnDecl` does **not** have `is_pub` or `is_ext` fields. ABI functions are implicitly external interface declarations — `pub` and `ext` keywords are not valid inside an ABI block. ::: The optional `+`-separated list of identifiers after `:` represents parent ABIs, enabling ABI subtyping. The `+` separator matches the supertrait syntax on trait declarations. 
:::warning Superabi parsing is not yet implemented in the parser. The `superabis` field in the AST is always an empty `Vec`. The BNF above reflects the planned syntax. ::: ### Examples ```edge abi IERC20 { fn totalSupply() -> u256; fn balanceOf(owner: addr) -> u256; mut fn transfer(to: addr, amount: u256) -> bool; mut fn approve(spender: addr, amount: u256) -> bool; } abi IERC20Metadata : IERC20 { fn name() -> u256; fn symbol() -> u256; fn decimals() -> u8; } ``` ### Semantics The optional `mut` keyword indicates whether the function will mutate the state of the smart contract or the EVM. This allows contracts to determine whether to use the `call` or `staticcall` instruction when interfacing with a contract conforming to the given ABI. :::warning ABI subtyping semantics are still being finalized. It has not yet been decided whether traits fully subsume this use case. ::: ## Array types The array type is a fixed-length list of elements of a single type. ### Signature ```text ::= ["packed"] "[" ";" "]" ; ``` Dependencies: * `` * `` The `` consists of an optional `packed` keyword, a type signature and a size expression separated by a semicolon, delimited by brackets. It maps to `TypeSig::Array` or `TypeSig::PackedArray` depending on the `packed` prefix. :::warning Packed array IR lowering is not yet implemented. The `packed` keyword is accepted by the parser but currently has no effect on code generation. ::: ### Instantiation ```text ::= [] "[" [ ("," )* [","]] "]" ; ``` Dependencies: * `` * `` The `` is an optional data location annotation followed by a comma-separated list of expressions delimited by brackets. It produces `Expr::ArrayInstantiation(location, elements, span)`. ### Element access ```text ::= "[" [":" ] "]" ; ``` Dependencies: * `` Array element access is a postfix operation on any expression. A single index returns one element. When a second expression follows separated by `:`, a slice is returned. 
Both forms produce `Expr::ArrayIndex(expr, index, end, span)` where `end` is `Some(...)` for slices.

### Examples

```edge
type TwoElementIntegerArray = [u8; 2];
type TwoElementPackedIntegerArray = packed [u8; 2];

const arr: TwoElementIntegerArray = [1, 2];
const elem: u8 = arr[0];
```

### Semantics

#### Instantiation

Instantiation of a fixed-length array stores one element per 32-byte word in either data location.

#### Access

Array element access depends on whether the second expression is included. A single expression returns that element. With a colon-separated second expression, a pointer of the same data location is returned. The resulting array type has the same element type but a size equal to `end - start`.

:::warning
Bounds checking is not yet implemented. Out-of-bounds array accesses are currently undefined behavior at runtime.
:::

## Type assignment

### Signature

```text
 ::= | | | | | | | | | | "<" ("," )* ">" ;
 ::= ;
```

Dependencies:

* ``
* ``
* ``
* ``
* ``
* ``
* ``
* ``
* ``

The `` enumerates every form a type can take. It maps directly to the `TypeSig` enum in the AST. A bare `` produces `TypeSig::Named(ident, [])`, while an identifier with angle-bracketed type arguments produces `TypeSig::Named(ident, args)`.

The `` wraps any type with a data location annotation, producing `TypeSig::Pointer(location, inner)`.

### Declaration

```text
 ::= ["pub"] "type" [] ;
```

Dependencies:

* ``
* ``

The `` maps to `TypeDecl` in the AST. It contains a name, optional type parameters, and an `is_pub` flag.

### Assignment

```text
 ::= "=" ( | ) ";" ;
```

The `` binds a type declaration to a type signature. When the right-hand side is a ``, the parser uses `parse_type_sig_or_union()` to accept pipe-separated union variants.

### Semantics

Type assignment creates an identifier associated with a data structure or existing type. If the assignment targets an existing type, the alias shares the same fields, members, and associated items.
```edge type MyCustomType = packed (u8, u8, u8); type MyCustomAlias = MyCustomType; fn increment(rgb: MyCustomType) -> MyCustomType { return (rgb.0 + 1, rgb.1 + 1, rgb.2 + 1); } increment(MyCustomType(1, 2, 3)); increment(MyCustomAlias(1, 2, 3)); ``` To create a wrapper around an existing type without exposing its external interface, the type may be wrapped in parentheses, creating a single-element tuple with no overhead: ```edge type MyCustomType = packed (u8, u8, u8); type MyNewCustomType = (MyCustomType); ``` ## Contract objects Contract objects serve as an object-like interface to contract constructs. ### Declaration ```text ::= "let" ":" ";" ; ::= "const" [":" ] "=" ";" ; ::= ["pub"] ["ext"] ["mut"] ( | ";") ; ::= "contract" "{" * * * "}" ; ``` Dependencies: * `` * `` * `` * `` * `` The `` maps to `ContractDecl` in the AST: * `name: Ident` * `fields: Vec<(Ident, TypeSig)>` — storage fields * `consts: Vec<(ConstDecl, Expr)>` — contract constants * `functions: Vec` — inline functions Each `` maps to `ContractFnDecl`: * `name: Ident` * `params: Vec<(Ident, TypeSig)>` * `returns: Vec` * `is_pub: bool`, `is_ext: bool`, `is_mut: bool` * `body: Option` — `None` for declaration-only functions The `pub`, `ext`, and `mut` keywords are **independent** — each sets a separate boolean flag in the AST. ### Implementation ```text ::= "impl" [":" ] "{" (["pub"] ["ext"] ["mut"] )* "}" ; ``` Dependencies: * `` * `` The `` maps to `ContractImpl`: * `contract_name: Ident` * `abi_impl: Option` — ABI being satisfied * `functions: Vec` — all with `body: Some(...)` If the impl block includes `: AbiName`, it satisfies that ABI's interface. ### Examples ```edge contract ERC20 { let balances: &s map; let totalSupply: &s u256; const DECIMALS: u8 = 18; pub ext fn decimals() -> u8 { return DECIMALS; } } impl ERC20 : IERC20 { pub ext fn totalSupply() -> u256 { return self.totalSupply; } pub ext mut fn transfer(to: addr, amount: u256) -> bool { // ... 
} } ``` ### Semantics The contract object desugars to a single main function and storage layout with a dispatcher. Contract field declarations create the storage layout starting at slot zero, incrementing by one for each field. Fields are never packed; storage packing may be achieved by declaring contract fields as packed structs or tuples. Fields annotated with the `&s` (persistent storage) or `&t` (transient storage, EIP-1153) location receive sequential storage slots. Fields with other location annotations do not participate in storage slot assignment. :::warning[Confusing AST naming] The `&s` location in the AST maps to `Location::Stack` (see `crates/ast/src/ty.rs`). Despite the enum variant name, this represents **persistent contract storage** (EVM `SSTORE`/`SLOAD`), not the EVM execution stack. The naming is a historical artifact and may be renamed in a future refactor. ::: Contract implementation blocks contain definitions of external functions. If the impl block includes `: AbiName`, it satisfies that ABI's interface. The `ext` keyword indicates the function is exposed via the contract's dispatcher. The `mut` keyword indicates the function may mutate EVM state; non-`mut` functions may be called with `staticcall`. #### Constructor The contract compiler generates a separate constructor (init code) that runs once at deployment time. The constructor body initializes storage fields and any contract-level constants before the runtime bytecode is deployed on-chain. :::warning It has not yet been decided whether plain types with storage annotations fully subsume the contract object abstraction. The contract system may be revised. ::: ## Event types The event type is a custom type to be logged via EVM log opcodes. ### Inline event signature ```text ::= ["indexed"] ":" ; ::= ["anon"] "event" "{" [ ("," )* [","]] "}" ; ``` Dependencies: * `` * `` The `` is an inline type that maps to `TypeSig::Event(is_anon, Vec)`. Each field has `indexed: bool` and `ty: TypeSig`. 
The optional `indexed` keyword precedes the field name. :::note Edge uses two representations for events: the **inline event signature** (a `TypeSig` variant usable in type assignments) and the **standalone event declaration** (an `EventDecl` item). Both share the same `EventField` structure. ::: ### Standalone event declaration ```text ::= "event" "(" [ ("," )* [","]] ")" ";" ; ``` Dependencies: * `` * `` The standalone form produces `Stmt::EventDecl(EventDecl)`. The `EventDecl` struct has: * `name: Ident` * `is_anon: bool` * `fields: Vec` :::note The parser always sets `is_anon: false` for standalone event declarations. Anonymous events may be supported in a future revision. ::: ### Emit ```text ::= "emit" "(" [ ("," )* [","]] ")" ";" ; ``` Dependencies: * `` * `` The `emit` statement produces `Stmt::Emit(name, args, span)`. Arguments correspond to the event's fields in declaration order. ### Semantics The EVM allows up to four topics per log entry. If `anon` is used, the event may contain four `indexed` values. Otherwise, the first topic is reserved for the event selector — the keccak256 hash of the event name followed by a parenthesized, comma-separated list of the field type names (matching Solidity's ABI specification). In that case, at most three `indexed` fields are allowed. ## External Calls External calls enable cross-contract interaction through ABI-typed addresses. The `Impl` type wraps an address with compile-time ABI information, enabling type-safe method dispatch that compiles to EVM CALL, STATICCALL, or DELEGATECALL instructions. ### Syntax Options Two syntax options are under consideration. Both share identical semantics; the difference is purely syntactic. #### Option A: `Impl` (keyword or builtin parameterized type) ```text ::= "Impl" "<" ("+" )* ">" ; ``` `Impl` takes one or more plus-separated interface identifiers as type parameters. 
Whether `Impl` is a new keyword or a compiler-recognized builtin type name (like `map` or `Result`) is an open question. As a keyword, it cannot be shadowed or redefined by user code; as a builtin type, it is lighter-weight but could theoretically be shadowed.

```edge
let token: &s Impl;
let multi: Impl;

fn withdraw(token: Impl, amount: u256) { }
```

#### Option B: `impl ABI` (keyword in type position)

```text
 ::= "impl" ("+" )* ;
```

Option B reuses the existing `impl` keyword in type position, following Rust's `impl Trait` syntax exactly. The parser disambiguates impl blocks (`impl Foo {`) from impl types (`impl Foo` in type position) by context.

```edge
let token: &s impl IERC20;
let multi: impl IERC20 + IERC20Metadata;

fn withdraw(token: impl IERC20, amount: u256) { }
```

The remainder of this specification uses Option A syntax. All examples apply equally to Option B with the obvious substitution.

### Impl Type

When the inner identifier resolves to an `abi` declaration, the `Impl` type represents an address that conforms to the given external interface. At the EVM level, values of this type are 20-byte addresses. The `Impl` wrapper exists purely at compile time to enable typed method dispatch.

When the inner identifier resolves to a `trait` declaration, the `Impl` type is syntactic sugar for a generic type parameter with a trait bound. `Impl` in argument position desugars to an anonymous generic `<__T: T>`.

Both ABI and trait forms use the same syntax, and the compiler disambiguates based on whether the inner identifier was declared with `abi` or `trait`. The two have different semantics, return types, and positional validity rules (see below).

Mixing ABI and trait identifiers in a single `Impl` type is a compile error because they represent fundamentally different dispatch mechanisms. ABI methods compile to cross-contract EVM call instructions with ABI encoding, gas forwarding, and fallible results.
Trait methods compile to monomorphized inline code with direct returns. A single `Impl` expression must resolve to one dispatch mechanism or the other — there is no meaningful way to combine them. Multiple ABI identifiers or multiple trait identifiers may be composed with `+`. ### Casting Since `Impl` is an address at runtime, it supports free casting between `Impl` types and `addr` via `as`: ```edge let token: Impl = 0xdead as Impl; // Extract raw address let raw: addr = token as addr; // Reinterpret as a different ABI let as_metadata: Impl = token as Impl; // Widen to a composed type let full: Impl = token as Impl; ``` All casts are compile-time only with no runtime cost. The underlying value is always a 20-byte address. ### Construction An `Impl` value is constructed via `as` casting (see Casting above). The `as` keyword is a new addition to the language's keyword set. ### Method Calls ```text ::= []* "." "(" [ ("," )* [","]] ")" ; ::= | "." "value" "(" ")" | "." "gas" "(" ")" | "." "delegate" "(" ")" ; ``` Dependencies: * `` * `` Method calls on `Impl` values dispatch to external contracts. The compiler generates the 4-byte function selector from the ABI function signature, ABI-encodes the arguments into memory, and emits the appropriate EVM call instruction. The call modifiers `value`, `gas`, and `delegate` are not keywords. They are recognized as method names on the internal call builder. ```edge abi IERC20 { fn balanceOf(account: addr) -> (u256); mut fn transfer(to: addr, amount: u256) -> (bool); } contract Vault { let token: &s Impl; pub fn withdraw(to: addr, amount: u256) { match token.transfer(to, amount) { Result::Ok(success) => { } Result::Err(err) => { revert; } } } pub fn checkBalance() -> (u256) { match token.balanceOf(@address()) { Result::Ok(bal) => { return bal; } Result::Err(err) => { revert; } } } } ``` ### Call Modifiers Call modifiers configure the EVM call instruction parameters. 
They are chained between the expression and the terminal method call. Modifiers are order-independent and are evaluated at compile time. #### value ```edge token.value(1000).transfer(to, amount) ``` Sets the `msg.value` for the call. Only valid with CALL instruction (i.e., `mut` functions). Using `value` with `delegate` is a compile error, as DELEGATECALL does not accept a value parameter. #### gas ```edge token.gas(50000).transfer(to, amount) ``` Sets the gas limit forwarded to the call. If omitted, all available gas is forwarded via the GAS opcode. #### delegate ```edge token.delegate().transfer(to, amount) ``` Switches the call instruction from CALL to DELEGATECALL. The target contract's code executes in the caller's storage context. Using `delegate` with `value` is a compile error. #### Combined ```edge token.value(1000).gas(100000).transfer(to, amount) ``` Multiple modifiers may be chained in any order before the terminal method call. ### Return Type All external method calls return `Result` where `T` is the return type declared in the ABI function signature. This is mandatory; external calls can fail due to reverts, out-of-gas, or invalid target code, and these failures must be handled explicitly. ```edge // Must handle the result match token.transfer(to, amount) { Result::Ok(success) => { return success; } Result::Err(err) => { revert; } } ``` `CallErr` contains the failure information from the call. The success flag from the EVM CALL instruction serves as the discriminant for the `Result` union. ### Instruction Selection The EVM call instruction is selected based on the ABI function declaration and call modifiers: * Functions declared without `mut` use STATICCALL by default. * Functions declared with `mut` use CALL by default. * The `delegate` modifier overrides either to DELEGATECALL. ### Positional Validity The `Impl` type is valid in different positions depending on whether the inner identifier is an ABI or trait declaration. 
When wrapping an ABI declaration, `Impl` is valid in all type positions: function arguments, return types, local variables, storage fields, and struct fields. At the EVM level, the value is always an address. ```edge contract Vault { let token: &s Impl; // storage field pub fn getToken() -> (Impl) { // return type return token; } } ``` When wrapping a trait declaration, `Impl` is only valid in function argument position, where it desugars to a generic type parameter. ```edge // These two declarations are equivalent: fn process(x: Impl) { } fn process(x: T) { } // Compile error: cannot store erased trait type let x: &s Impl; ``` ### Composition Multiple interfaces may be composed using `+` to indicate that a value conforms to all listed interfaces. ```edge abi IERC20 { fn balanceOf(account: addr) -> (u256); mut fn transfer(to: addr, amount: u256) -> (bool); } abi IERC20Metadata { fn name() -> (b32); fn symbol() -> (b32); } fn inspect(token: Impl) { // Can call methods from both ABIs match token.balanceOf(@address()) { Result::Ok(bal) => { } Result::Err(err) => { revert; } } match token.name() { Result::Ok(n) => { } Result::Err(err) => { revert; } } } ``` Composing ABI identifiers with trait identifiers is a compile error. The `+` operator may only combine identifiers of the same kind. ### Semantics The `Impl` type bridges the gap between ABI declarations and external contract interaction. An ABI declaration defines an interface; `Impl` creates a typed handle to an address that is asserted to conform to that interface. The type carries no runtime overhead. ABI encoding, selector computation, and instruction selection are performed entirely at compile time. The only runtime operations are memory writes for argument encoding, the call instruction itself, and return data decoding. 
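
The rules from the Instruction Selection and Call Modifiers sections can be condensed into a small decision function. The following is a minimal Python sketch of those rules, not the compiler's implementation: the name `select_call_instruction` and its parameters are hypothetical, chosen only for illustration.

```python
from typing import Optional


def select_call_instruction(is_mut: bool, delegate: bool = False,
                            value: Optional[int] = None) -> str:
    """Pick the EVM call opcode for an external method call (illustrative model)."""
    if delegate and value is not None:
        # DELEGATECALL does not accept a value parameter.
        raise ValueError("`value` cannot be combined with `delegate`")
    if delegate:
        # The `delegate` modifier overrides both defaults.
        return "DELEGATECALL"
    if value is not None and not is_mut:
        # `value` is only valid with CALL, i.e. functions declared `mut`.
        raise ValueError("`value` requires a function declared `mut`")
    # Default: `mut` functions use CALL, all others use STATICCALL.
    return "CALL" if is_mut else "STATICCALL"


print(select_call_instruction(is_mut=False))                # e.g. balanceOf
print(select_call_instruction(is_mut=True, value=1000))     # e.g. transfer with value
print(select_call_instruction(is_mut=True, delegate=True))  # delegate override
```

The `gas` modifier is left out of the sketch because it changes only the forwarded gas limit, not the opcode that is selected.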
### New Keywords This feature requires one new keyword: * `as` — used for type casting in ` "as" ` For Option A, whether `Impl` is a new keyword or a builtin type name is an open question (see Syntax Options above). For Option B, `impl` is already a keyword. The call modifier names (`value`, `gas`, `delegate`) are not keywords; they are recognized contextually as method names on the call builder. ## Function types The function type is a type composed of input and output types. ### Signature ```text ::= "->" ; ``` Dependencies: * `` The `` maps to `TypeSig::Function(input, output)`. Since `` includes tuple signatures, a function with multiple inputs or outputs implicitly operates on a tuple. ### Declaration ```text ::= [":" ] ; ::= ["pub"] ["ext"] ["mut"] "fn" ["<" ("," )* ">"] "(" [ ("," )* [","]] ")" ["->" "(" ("," )* [","] ")"] ; ::= [":" ("+" )*] ; ``` Dependencies: * `` * `` The `` maps to `FnDecl` in the AST with these fields: * `name: Ident` * `type_params: Vec` * `params: Vec<(Ident, TypeSig)>` * `returns: Vec` * `is_pub: bool`, `is_ext: bool`, `is_mut: bool` The `pub`, `ext`, and `mut` keywords are **independent** — each sets a separate boolean flag. They may appear in any combination: ```edge pub fn read() -> u256 { ... } pub ext fn deposit() { ... } pub mut fn transfer() { ... } pub ext mut fn swap() { ... } ``` A parameter may omit its type annotation, in which case the type defaults to `Self`. This is intended for use with `self` in trait and impl methods: ```edge // These are equivalent: fn add(self, rhs: Self) -> (Self); fn add(self: Self, rhs: Self) -> (Self); ``` ### Assignment ```text ::= ; ``` Dependencies: * `` The `` is a function declaration followed by a code block body. It produces `Stmt::FnAssign(FnDecl, CodeBlock)`. ### Arrow functions ```text ::= ( | "(" [ ("," )* [","]] ")") "=>" ; ``` Dependencies: * `` * `` Arrow functions produce `Expr::ArrowFunction(params, body, span)`. The body must be a brace-delimited code block. 
Supported forms: ```edge x => { x + 1 } (x, y) => { x + y } () => { 42 } ``` ### Call ```text ::= ["::" "<" ("," )* ">"] "(" [ ("," )* [","]] ")" ; ``` Dependencies: * `` * `` The `` produces `Expr::FunctionCall(callee, args, type_args, span)`. The callee is any expression — supporting method calls (`obj.method()`), higher-order calls (`get_fn()()`), and turbofish instantiations (`foo::(...)`). ### Compile-time functions ```text ::= "comptime" ; ``` The `comptime` keyword before a function assignment declares a compile-time function. This produces `Stmt::ComptimeFn(FnDecl, CodeBlock)`, which is distinct from `Stmt::FnAssign`. See [compile-time functions](/specs/syntax/compile/functions). ### Semantics :::warning The function-type semantics section is still under construction. Runtime calling conventions, ABI encoding, and stack-frame layout are not yet documented. ::: ## Generics Generics are polymorphic types enabling function and type reuse across different types. ### Type parameters ```text ::= [":" ("+" )*] ; ::= "<" ("," )* [","] ">" ; ``` Dependencies: * `` Each `` maps to a `TypeParam { name, constraints }` in the AST. Trait bound constraints are separated by `+` and stored as `constraints: Vec`. The `` is a comma-separated list of individual type parameters delimited by angle brackets. ### Nested generics The parser handles `>>` in nested generics (e.g. `map>`) by splitting the `>>` token into two `>` tokens when closing generic parameter lists. ### Semantics Generics are resolved at compile time through monomorphization. Generic functions and data types are monomorphized into distinct unique functions and data types. Function duplication can become problematic due to the EVM bytecode size limit, so a series of steps will be taken to allow for granular control over bytecode size. Those semantics are defined in the codesize document. ## Implementation Implementation blocks enable method-call syntax and trait satisfaction. 
### Implementation block ```text ::= "impl" [] [":" []] "{" ( | | | )* "}" ; ``` Dependencies: * `` * `` * `` * `` * `` The `` maps to `ImplBlock` in the AST: * `ty_name: Ident` — the type being implemented * `type_params: Vec` — type parameters brought into scope * `trait_impl: Option<(Ident, Vec)>` — optional trait being satisfied * `items: Vec` — function, constant, and type assignments Each body item maps to an `ImplItem` variant: | Syntax | AST variant | | ----------------------- | ----------------------- | | `fn name(...) { ... }` | `ImplItem::FnAssign` | | `const NAME: T = expr;` | `ImplItem::ConstAssign` | | `type Name = T;` | `ImplItem::TypeAssign` | The trait clause uses `:` (not `for`): ```edge impl MyType : MyTrait { fn method(self) -> T { ... } } ``` ### Semantics Associated functions, constants, and types are defined for a given type. If the type contains generics in any of its internal assignments, the type parameters must be brought into scope by annotating them directly following the type's identifier. If the impl block satisfies a trait's interface, only functions, constants, and types declared in the trait may be defined. All undefaulted trait declarations must be assigned in the impl block. ## Type system The type system builds on core primitive types inherent to the EVM with abstract data types for parametric polymorphism, nominative subtyping, and compile-time monomorphization. 
* [Primitive types](/specs/syntax/types/primitives) * [Type assignment](/specs/syntax/types/assignment) * [Array types](/specs/syntax/types/arrays) * [Product types](/specs/syntax/types/products) * [Sum types](/specs/syntax/types/sum) * [Generics](/specs/syntax/types/generics) * [Trait constraints](/specs/syntax/types/traits) * [Implementation](/specs/syntax/types/implementation) * [Function types](/specs/syntax/types/function) * [Event types](/specs/syntax/types/events) * [Application binary interface](/specs/syntax/types/abi) * [Contract objects](/specs/syntax/types/contracts) ## Primitive types ```text ::= "8" | "16" | "24" | "32" | "40" | "48" | "56" | "64" | "72" | "80" | "88" | "96" | "104" | "112" | "120" | "128" | "136" | "144" | "152" | "160" | "168" | "176" | "184" | "192" | "200" | "208" | "216" | "224" | "232" | "240" | "248" | "256" ; ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "10" | "11" | "12" | "13" | "14" | "15" | "16" | "17" | "18" | "19" | "20" | "21" | "22" | "23" | "24" | "25" | "26" | "27" | "28" | "29" | "30" | "31" | "32" ; ::= {"i" } ; ::= {"u" } ; ::= {"b" } ;
::= "addr" ; ::= "bool" ; ::= "bit" ; ::= | | |
; ::= | | | ; ``` The `` covers signed and unsigned integers, booleans, address, fixed bytes, and the single-bit type. Each maps directly to `TypeSig::Primitive(PrimitiveType)` in the AST. :::note Pointer types (` `) are not primitives — they are a separate `TypeSig::Pointer` variant that wraps any type with a storage location. See [type assignment](/specs/syntax/types/assignment) for the `` production. ::: ### Examples ```edge u8 u256 i8 i256 b4 b32 addr bool bit ``` ### Semantics Integers occupy the number of bits indicated by their size. Fixed bytes types occupy the number of bytes indicated by their size, or `size × 8` bits. Address occupies 160 bits. Booleans occupy eight bits. Bit occupies a single bit. ## Product types The product type is a compound type composed of zero or more internal types. ### Signature ```text ::= ":" ; ::= ["packed"] "{" [ ("," )* [","]] "}" ; ::= ["packed"] "(" ("," )* [","] ")" ; ``` Dependencies: * `` * `` The `` maps to `TypeSig::Struct` or `TypeSig::PackedStruct`. Each field produces a `StructField { name, ty }` in the AST. The `` maps to `TypeSig::Tuple` or `TypeSig::PackedTuple`. ### Instantiation ```text ::= ":" ; ::= [] "{" [ ("," )* [","]] "}" ; ::= [] "(" [ ("," )* [","]] ")" ; ``` Dependencies: * `` * `` * `` The `` produces `Expr::StructInstantiation(location, name, fields, span)`. The parser distinguishes struct instantiation from a code block by lookahead: if the opening `{` is followed by ` ":"`, it's a struct. The `` produces `Expr::TupleInstantiation(location, elements, span)`. A single expression in parentheses without a trailing comma is parsed as `Expr::Paren` (grouping), not a tuple. ### Field access ```text ::= "." ; ::= "." ; ``` Dependencies: * `` * `` The `` produces `Expr::FieldAccess(expr, field, span)`. The `` produces `Expr::TupleFieldAccess(expr, index, span)`. Both are postfix operations on any expression, not just identifiers. 
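
The two parser disambiguation rules described under Instantiation (struct versus code block, and tuple versus grouping) can be modeled as simple predicates. This is a hedged Python sketch over a simplified token model. The function names and the list-of-strings token representation are assumptions for illustration and do not mirror the Rust parser's actual API.

```python
from typing import List


def classify_brace(tokens_after_brace: List[str]) -> str:
    """Decide how `{` is parsed from the two tokens that follow it."""
    if (len(tokens_after_brace) >= 2
            and tokens_after_brace[0].isidentifier()
            and tokens_after_brace[1] == ":"):
        # `{ ident : ...` is treated as a struct instantiation.
        return "struct instantiation"
    return "code block"


def classify_parens(element_count: int, trailing_comma: bool) -> str:
    """Decide whether `( ... )` is a grouping or a tuple instantiation."""
    if element_count == 1 and not trailing_comma:
        # A single expression without a trailing comma is only grouping.
        return "Expr::Paren"
    return "Expr::TupleInstantiation"


print(classify_brace(["a", ":", "1"]))   # tokens following `{` in `{ a: 1 }`
print(classify_brace(["x", "+", "1"]))   # tokens following `{` in `{ x + 1 }`
print(classify_parens(1, trailing_comma=False))
print(classify_parens(1, trailing_comma=True))
```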
### Examples

```edge
type PrimitiveStruct = {
    a: u8,
    b: u8,
    c: u8,
};

const primitiveStruct: PrimitiveStruct = PrimitiveStruct { a: 1, b: 2, c: 3 };
const a = primitiveStruct.a;

type PackedTuple = packed (u8, u8, u8);

const packedTuple: PackedTuple = (1, 2, 3);
const one = packedTuple.0;
```

### Semantics

The struct field signature maps a type identifier to a type signature. The field may be accessed by the struct's identifier and field identifier separated by a dot.

Prefixing the signature with the `packed` keyword will pack the fields by their bit size; otherwise each field is padded to its own 256-bit word.

:::warning
Packed tuple IR lowering is not yet implemented. The `packed` keyword on tuples is accepted by the parser but currently has no effect on code generation.
:::

```edge
type Rgb = packed { r: u8, g: u8, b: u8 };

let rgb = Rgb { r: 1, g: 2, b: 3 };
// rgb = 0x010203
```

:::warning
Stack-allocated struct optimization (for single-word structs) is not yet implemented. All struct instantiations currently allocate memory regardless of size, and no compiler error is generated for missing data location annotations.
:::

Memory instantiation consists of allocating new memory, optionally bitpacking fields, storing the struct in memory, and leaving the pointer to it on the stack.

```edge
type MemoryRgb = { r: u8, g: u8, b: u8 };

let memoryRgb = MemoryRgb{ r: 1, g: 2, b: 3 };
// ptr = ..
// mstore(ptr, 1)
// mstore(add(32, ptr), 2)
// mstore(add(64, ptr), 3)
```

Persistent and transient storage structs must be instantiated at the file level. If anything other than zero values is assigned, storage writes will be injected into the initcode to be run on deployment.
```edge type Storage = { a: u8, b: u8, c: packed { a: u8, b: u8 } } const storage = @default(); fn main() { storage.a = 1; // sstore(0, 1) storage.b = 2; // sstore(1, 2) storage.c.a = 3; // ca = shl(8, 3) storage.c.b = 4; // sstore(2, or(ca, 4)) } ``` Packing rules for buffer locations pack everything exactly by its bit length. Packing rules for map locations right-align the last field; for each preceding field, left-shift by the combined bit size of all fields to its right. If a field's bit size would overflow the current word, it begins a new word. :::warning Packed struct layout currently supports single-word packing only. Fields whose combined bit size exceeds 256 bits will not be correctly packed across multiple words. Multi-word packed structs are a planned feature. ::: ```edge type Storage = { a: u128, b: u8, c: addr, d: u256 } const storage = Storage { a: 1, b: 2, c: 0x3, d: 4, }; ``` | Slot | Value | | ---- | ------------------------------------------------------------------ | | 0x00 | 0x0000000000000000000000000000000200000000000000000000000000000001 | | 0x01 | 0x0000000000000000000000000000000000000000000000000000000000000003 | | 0x02 | 0x0000000000000000000000000000000000000000000000000000000000000004 | ## Sum types The sum type is a union of multiple types where the value represents exactly one of the inner variants. ### Signature ```text ::= ["(" ")"] ; ::= ["|"] ("|" )* ; ``` Dependencies: * `` * `` The `` declares a sum type — a data structure that holds one of its declared members. Each `` is named by an identifier, optionally followed by exactly one payload type in parentheses. A leading `|` is permitted for formatting convenience. Each member maps to `UnionMember { name, inner: Option }` in the AST. The overall signature maps to `TypeSig::Union(Vec)`. ### Instantiation ```text ::= "::" "(" [ ("," )* [","]] ")" ; ``` Dependencies: * `` * `` The `` creates a union value. 
It consists of the union type name, `::`, the variant name, and arguments in parentheses. This produces `Expr::UnionInstantiation(type_name, variant_name, args, span)`. :::note Although each variant carries at most one type in its signature, the instantiation syntax accepts multiple comma-separated expressions. For variants with a tuple payload, these expressions correspond to the tuple elements. ::: ### Union pattern ```text ::= "::" ["(" ("," )* [","] ")"] ; ``` Dependencies: * `` The `` matches a specific variant by type name and member name, optionally binding payload values to identifiers. It maps to `UnionPattern { union_name, member_name, bindings }` in the AST. ### Pattern match expression ```text ::= "matches" ; ``` Dependencies: * `` * `` The `matches` keyword produces `Expr::PatternMatch(expr, pattern, span)` and can be used anywhere an expression is valid. ### Semantics A union where no member has an internal type is effectively an enumeration over integers: ```edge type Mutex = Locked | Unlocked; // Mutex::Locked == 0 // Mutex::Unlocked == 1 ``` Unions where any members have an internal type become proper type unions. Each variant may carry **at most one** payload type: ```edge type StackUnion = A(u8) | B(u248); type MemoryUnion = A(u256) | B | C(u8); ``` :::note Data-carrying variants are heap-allocated: the discriminant is stored at the base memory address and the single payload is stored at `base + 32`. The union value for a data-carrying variant is the base memory pointer, not an inline integer. Unit variants (no payload) are represented as an inline integer discriminant. ::: A union pattern consists of the type name and the member name separated by `::`. 
This pattern may be used in both `match` statements and `if` conditions: ```edge type Option = None | Some(T); impl Option { fn unwrap(self) -> T { match self { Option::Some(inner) => return inner, Option::None => revert(), }; } fn unwrapOr(self, default: T) -> T { let mut value = default; if self matches Option::Some(inner) { value = inner; } return value; } } ``` ## Trait constraints Traits are interface-like declarations that constrain generic types to implement specific methods or contain specific properties. ### Declaration ```text ::= ["pub"] "trait" [] [":" ("+" )*] "{" ( | ";" | ";" | ";" | ";" | ";" | )* "}" ; ``` Dependencies: * `` * `` * `` * `` * `` * `` * `` * `` The `` maps to `TraitDecl` in the AST. It contains a name, optional type parameters, optional supertraits, and a body of trait items. The `is_pub` flag tracks visibility. Each body item maps to a `TraitItem` variant: | Syntax | AST variant | | --------------------------- | ------------------------ | | `type Name;` | `TraitItem::TypeDecl` | | `type Name = T;` | `TraitItem::TypeAssign` | | `const NAME: T;` | `TraitItem::ConstDecl` | | `const NAME: T = expr;` | `TraitItem::ConstAssign` | | `fn name(...) -> T;` | `TraitItem::FnDecl` | | `fn name(...) -> T { ... }` | `TraitItem::FnAssign` | ### Supertrait constraints ```text ::= ":" ("+" )* ; ``` Dependencies: * `` Supertraits are separated by `+`, indicating that all listed traits must be implemented. This matches the `+` separator used for type parameter bounds (e.g. `fn f()`). Supertraits are stored as `supertraits: Vec` in `TraitDecl`. ### Semantics Traits can be defined with associated types, constants, and functions. The trait declaration allows optional assignment for each item as a default. Declarations without a default assignment must be provided in the implementation. Default assignments can be overridden in trait implementations. Types can depend on trait constraints, and traits can also depend on other traits (supertraits). 
Supertraits assert that types implementing a given trait also implement all of its parent traits. :::warning Trait-solving semantics are still being drafted. The compiler does not yet validate trait implementations or enforce supertrait constraints. ::: ## Branching Branching refers to blocks of code that may be executed based on a condition. ### If / else if / else ```text ::= "if" "(" ")" ("else" "if" "(" ")" )* ["else" ] ; ``` Dependencies: * `` * `` Produces `Stmt::IfElse(Vec<(Expr, CodeBlock)>, Option)`. Each condition-block pair is an element in the vector; the optional else block is the second field. ### If match ```text ::= "if" "matches" ; ``` Dependencies: * `` * `` * `` :::note[Implementation detail] `Stmt::IfMatch` exists as a variant in the AST but is **dead code**. The parser never produces it directly — instead, it always emits `Stmt::IfElse` with an `Expr::PatternMatch` as the condition. Contributors should be aware that any logic gated on `Stmt::IfMatch` will not be reached under normal compilation. ::: ### Pattern match expression ```text ::= "matches" ; ``` The `matches` keyword produces `Expr::PatternMatch(expr, pattern, span)` and works as a boolean expression usable anywhere — including as an `if` condition, ternary operand, or `let` binding value: ```edge let is_some = value matches Option::Some(x); ``` ### Match ```text ::= | | "_" ; ::= "=>" ( | | "return" []) ; ::= "match" "{" [ ("," )* [","]] "}" ; ``` Dependencies: * `` * `` * `` Each `` maps to a `MatchPattern` variant: | Pattern | AST variant | | -------------------- | ----------------------------------- | | `Type::Variant(...)` | `MatchPattern::Union(UnionPattern)` | | `name` | `MatchPattern::Ident(Ident)` | | `_` | `MatchPattern::Wildcard` | Each `` maps to `MatchArm { pattern, body: CodeBlock }`. :::note At the AST level, all arm bodies are normalized to `CodeBlock`. Bare expressions and `return` statements are wrapped in synthetic code blocks by the parser. 
::: :::warning Compile-time exhaustiveness checking is not yet implemented. Non-exhaustive match blocks do not produce a compiler error. If no arm matches at runtime and no default arm is present, the program reverts. ::: ### Ternary ```text ::= "?" ":" ; ``` Dependencies: * `` Produces `Expr::Ternary(condition, then_expr, else_expr, span)`. The ternary is right-associative — `a ? b : c ? d : e` parses as `a ? b : (c ? d : e)`. ### Semantics #### If / else if The condition expression is evaluated. If it is true, the subsequent block executes. Otherwise the next `else if` condition is checked. If no condition is true and an `else` block is present, it executes. ```edge fn main() { let n = 3; if (n == 1) { // .. } else if (n == 2) { // .. } else { // .. } } ``` #### If match The `if match` branch brings into scope any identifiers bound by the pattern's payload bindings: ```edge type Union = A(u8) | B; fn main() { let u = Union::A(1); if u matches Union::A(n) { assert(n == 1); } } ``` #### Match The `match` statement evaluates the target expression and compares it against each arm's pattern in order. The first matching arm's body executes. An identifier pattern (`name`) binds the matched value irrefutably. A wildcard pattern (`_`) discards the value. Both serve as catch-all arms. ```edge type Ua = A | B; fn main() { let u = Ua::B; match u { Ua::A => {}, Ua::B => {}, } } ``` :::warning Type narrowing of wildcard/identifier bindings is not yet implemented. ::: #### Ternary The condition must evaluate to a boolean. If true, the second expression is evaluated; otherwise the third. ```edge fn main() { let condition = true; let b = condition ? 1 : 2; } ``` #### Short circuiting For boolean expressions composed of logical operators: * `expr0 && expr1` — if `expr0` is `false`, short-circuit to `false` * `expr0 || expr1` — if `expr0` is `true`, short-circuit to `true` For `if / else if` chains, if an earlier branch is taken, subsequent conditions are not evaluated. 
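The short-circuit rules above can be illustrated with a small Python model. This is only a sketch: `traced`, `and_then`, `or_else`, and `if_chain` are hypothetical names used for illustration and are not part of Edge or its compiler. Each operand is wrapped in a zero-argument function so the model can record which operands were actually evaluated.

```python
def traced(name, value, log):
    """Return a thunk that appends `name` to `log` when it is evaluated."""
    def thunk():
        log.append(name)
        return value
    return thunk


def and_then(lhs, rhs):
    """Model `expr0 && expr1`: rhs is evaluated only if lhs is true."""
    return rhs() if lhs() else False


def or_else(lhs, rhs):
    """Model `expr0 || expr1`: rhs is evaluated only if lhs is false."""
    return True if lhs() else rhs()


def if_chain(conditions):
    """Model an if / else-if chain: return the index of the branch taken."""
    for index, condition in enumerate(conditions):
        if condition():
            return index  # later conditions are never evaluated
    return None


log = []
result_and = and_then(traced("expr0", False, log), traced("expr1", True, log))
and_log = list(log)

log.clear()
result_or = or_else(traced("expr0", True, log), traced("expr1", False, log))
or_log = list(log)

log.clear()
taken = if_chain([traced("c0", False, log),
                  traced("c1", True, log),
                  traced("c2", True, log)])
chain_log = list(log)
```

In each case the log stops at the operand that decided the result, so `expr1` and `c2` are never evaluated.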
## Code blocks A code block is a sequence of items with its own scope. It may appear standalone or as the body of a function, loop, or branch. ### Declaration ```text ::= "{" (( | ) ";")* [] "}" ; ``` Dependencies: * `` * `` The `` maps to `CodeBlock { stmts: Vec, span }`. Each item is either `BlockItem::Stmt(Box)` or `BlockItem::Expr(Expr)`. ### Tail expressions A code block's final item may omit its trailing semicolon to act as the block's **return value** (Rust-style tail expression). When the last item in a `` is an `` with no terminating `;`, the block evaluates to that expression's value. ```edge let result = { let x = 2; x * x // no semicolon — this is the block's value (4) }; ``` If the trailing semicolon is present, the block evaluates to `unit` (i.e. the expression's value is discarded). :::note At the AST level, tail expressions are not distinguished from regular expression statements — both are stored as `BlockItem::Expr(expr)`. The semantic difference (block evaluates to this value) is determined by position: only the last item in the block, if it is an expression without a semicolon, acts as the return value. ::: ### Semantics Code blocks represent a distinct scope. Identifiers declared within a code block are dropped when the block ends. Blocks may be nested arbitrarily. Orphan semicolons (e.g. after `match {}`) are silently skipped by the parser. ## Loops Loops are blocks of code that may be executed repeatedly based on conditions. ### Loop control ```text ::= "break" ";" ; ::= "continue" ";" ; ``` The `break` keyword exits the loop immediately. The `continue` keyword skips to the next iteration. :::warning `break` and `continue` are parsed into the AST but **silently dropped during IR lowering**. Using them in a loop will compile as if the statement were absent. This is a known limitation. ::: ### Loop block ```text ::= "{" (( | | | ) ";")* "}" ; ``` Dependencies: * `` * `` The `` maps to `LoopBlock { items: Vec, span }`. 
Each item is a `LoopItem` variant: | Variant | Description | | --------------------------- | ----------- | | `LoopItem::Expr(Expr)` | Expression | | `LoopItem::Stmt(Box)` | Statement | | `LoopItem::Break(Span)` | `break` | | `LoopItem::Continue(Span)` | `continue` | :::note Loop blocks have their own `LoopItem` enum with dedicated `Break`/`Continue` variants, separate from `Stmt::Break`/`Stmt::Continue`. The top-level statement variants exist for `break`/`continue` outside loops (which would be a semantic error), while `LoopItem` variants are used inside loop bodies. ::: ### Core loop ```text ::= "loop" ; ``` The simplest loop form. Produces `Stmt::Loop(LoopBlock)`. At the IR level, all loop forms are lowered to a `DoWhile` representation. ### For loop ```text ::= "for" "(" [ | ] ";" [] ";" [ | ] ")" ; ``` Dependencies: * `` * `` Produces `Stmt::ForLoop(init, condition, update, body)` where each of `init`, `condition`, and `update` is individually optional. ### While loop ```text ::= "while" "(" ")" ; ``` Dependencies: * `` Produces `Stmt::WhileLoop(condition, body)`. ### Do-while loop ```text ::= "do" "while" "(" ")" ";" ; ``` Dependencies: * `` Produces `Stmt::DoWhile(body, condition)`. The body executes at least once before the condition is checked. :::note The `do-while` loop requires a trailing semicolon after the closing parenthesis. This distinguishes it from other loop forms and matches the parser's expectation. ::: ### Examples ```edge fn example() { // core loop let mut i = 0; loop { if (i >= 10) { return; } i = i + 1; } // for loop for (let mut j = 0; j < 10; j = j + 1) { // ... } // while loop let mut k = 0; while (k < 10) { k = k + 1; } // do-while loop let mut m = 0; do { m = m + 1; } while (m < 10); } ``` ### Semantics :::warning Loop semantics (desugaring rules, IR lowering details) are still under construction. ::: ## Control flow Control flow is composed of code blocks, loops, branches, and pattern matching. 
* [Code blocks](/specs/syntax/control/code) * [Loops](/specs/syntax/control/loops) * [Branching](/specs/syntax/control/branching) ## Compile-time branching ```text ::= "comptime" ; ``` Dependencies: * `` The `` produces `Stmt::ComptimeBranch(Box)`. The `comptime` keyword may precede any statement, but meaningful conditional compilation only occurs with branching statements (`if`, `if matches`, `match`). ### Semantics Since comptime must be resolved at compile time, the branching expression must itself be a literal, constant, or expression resolvable at compile time. Branches that are not matched will be removed from the compiled output. ```edge use std::{ builtin::HardFork, op::{tstore, tload, sstore, sload}, }; const SLOT: u256 = 0; fn store(value: u256) { comptime if (@hardFork() == HardFork::Cancun) { tstore(SLOT, value); } else { sstore(SLOT, value); } } fn load() -> u256 { comptime match @hardFork() { HardFork::Cancun => tload(SLOT), _ => sload(SLOT), } } ``` ## Constants ### Declaration ```text ::= "const" [":" ] ; ``` Dependencies: * `` * `` The `` maps to `ConstDecl { name, ty: Option, span }`. ### Assignment ```text ::= "=" ";" ; ``` Dependencies: * `` The `` produces `Stmt::ConstAssign(ConstDecl, Expr, Span)`. :::note The expression must be resolvable at compile time, but this constraint is enforced semantically, not by the grammar. ::: ### Semantics Constants must be resolvable at compile time by assigning a literal, another constant, or an expression that can be evaluated at compile time. The type of a constant is inferred from its assignment when no explicit type annotation is provided. ```edge const A: u8 = 1; const B = 1; const C = B; const D = a(); comptime fn a() -> u8 { 1 } ``` ## Compile-time functions ```text ::= "comptime" ; ``` Dependencies: * `` The `` produces `Stmt::ComptimeFn(FnDecl, CodeBlock)`, which is distinct from `Stmt::FnAssign`. This distinction affects how later compiler phases evaluate and inline the function. 
### Scope restrictions

`comptime fn` is only valid at module/top-level scope. It cannot appear inside `contract`, `impl`, or `trait` bodies.

### Semantics

Since comptime functions must be resolved at compile time, the function body must contain only expressions resolvable at compile time.

```edge
comptime fn a() -> u8 {
    1
}

comptime fn b(arg: u8) -> u8 {
    arg * 2
}

comptime fn c(arg: u8) -> u8 {
    a() + b(arg)
}

const A = c(1);
const B = c(A);
```

## Literals

### Characters

```text
 ::= "0" | "1" ;
 ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
 ::= | "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "a" | "b" | "c" | "d" | "e" | "f" | "A" | "B" | "C" | "D" | "E" | "F" ;
 ::= | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" ;
 ::= | ;
 ::= ? any valid Unicode scalar value ? ;
```

### Numeric literals

```text
 ::= { ( | "_")+ } ;
 ::= { "0x" ( | "_")+ } ;
 ::= | ;
```

Numeric literals are composed of decimal or hexadecimal digits. Each literal may contain underscores for readability. Hexadecimal literals are prefixed with `0x`.

The parser stores integer literals as `Lit::Int(u64, Option, Span)`.

:::warning[Known limitations]

* **Integer range cap (`u64`):** The parser stores integer literal values as a Rust `u64`. Any literal value larger than 2⁶⁴ − 1 cannot be represented as a compile-time constant — even though the language defaults to `u256`. Large constants must currently be expressed through arithmetic or runtime construction.
* **Type suffixes silently discarded:** Integer type suffixes such as `1u8` or `0xffu128` are recognized by the lexer but the suffix annotation is silently dropped. The AST always stores the literal as `Lit::Int(value, None, span)`. Type is inferred from context or defaults to `u256`.

:::

### Binary literals

```text
 ::= { "0b" ( | "_")+ } ;
```

Binary literals are prefixed with `0b` and produce `Lit::Bin(Vec, Span)`.
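To make the underscore handling and the `u64` range cap concrete, here is a minimal Python sketch of how a decimal or `0x` literal could be turned into a value. It is not the Edge lexer: `parse_int_literal` and `U64_MAX` are hypothetical names, type suffixes and `0b` literals are deliberately not handled, and rejecting a literal made only of underscores is an assumption.

```python
import string

U64_MAX = 2**64 - 1  # largest value a Rust `u64` can hold


def parse_int_literal(text: str) -> int:
    """Parse a decimal or `0x` hexadecimal literal, ignoring underscores.

    Hypothetical helper that models the documented behaviour only.
    """
    if text.startswith("0x"):
        digits, base, allowed = text[2:], 16, string.hexdigits
    else:
        digits, base, allowed = text, 10, string.digits

    # Underscores are purely visual separators.
    digits = digits.replace("_", "")
    if not digits or any(ch not in allowed for ch in digits):
        raise ValueError(f"not a numeric literal: {text!r}")

    value = int(digits, base)
    # Model the documented cap: values above 2**64 - 1 are not representable.
    if value > U64_MAX:
        raise OverflowError(f"literal does not fit in u64: {text!r}")
    return value


million = parse_int_literal("1_000_000")
mixed_case_hex = parse_int_literal("0xff_FF")
largest = parse_int_literal("18446744073709551615")
```

Under these assumptions `18446744073709551616` (2⁶⁴) is rejected, mirroring the range cap described in the limitations note.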
### String literals ```text ::= ('"' (!'"' )* '"') | ("'" (!"'" )* "'") ; ``` String literals may use either double or single quotes. Both forms support escape sequences: `\n`, `\t`, `\r`, `\\`, `\"`, `\'`. String literals produce `Lit::Str(String, Span)`. ### Boolean literals ```text ::= "true" | "false" ; ``` :::note `Lit::Bool(bool, Span)` exists in the AST, but the lexer never constructs it. The `true` and `false` keywords are currently parsed as keyword identifiers and resolve to integer constants (1 and 0 respectively) during compilation. ::: ### Literal ```text ::= | | | | ; ``` The `` maps to `Lit` variants in the AST: | Syntax | AST variant | | -------------------- | -------------------------------------------- | | `42`, `0xFF` | `Lit::Int(u64, Option, Span)` | | `"hello"` | `Lit::Str(String, Span)` | | `true`, `false` | (see note above) | | `0xDEADBEEF` (bytes) | `Lit::Hex(Vec, Span)` | | `0b10101010` | `Lit::Bin(Vec, Span)` | ### Semantics Numeric literals may contain arbitrary underscores. The type of a numeric literal is inferred from context; if no type can be inferred, it defaults to `u256`. Both numeric and boolean literals are roughly translated to pushing the value onto the EVM stack. String literals represent string instantiation, which behaves as a packed `u8` array instantiation. ```edge const A = 1; const B = 0xffFFff; const C = true; const D = "asdf"; const E = "💩"; ``` ## Compile time Compile time, also referred to as comptime, covers expressions, functions, and branches that are resolved during compilation. Comptime expressions and functions resolve to constant values at compile time, while comptime branches provide conditional compilation. 
* [Literals](/specs/syntax/compile/literals)
* [Constants](/specs/syntax/compile/constants)
* [Branching](/specs/syntax/compile/branching)
* [Functions](/specs/syntax/compile/functions)

## Basics

This page walks through the fundamental building blocks of Edge: contracts, storage, functions, types, expressions, and transient storage. All code snippets are exact copies from files in [`examples/`](https://github.com/refcell/edge-rs/tree/main/examples).

### A simple counter contract

From `examples/counter.edge` — the simplest possible contract: an on-chain counter.

```edge
abi ICounter {
    fn increment();
    fn decrement();
    fn get() -> (u256);
    fn reset();
}

contract Counter {
    // Storage slot for the counter value
    let count: &s u256;

    // Increment the counter by 1
    pub fn increment() {
        let val: u256 = count + 1;
        count = val;
    }

    // Decrement the counter by 1 (saturating at 0)
    pub fn decrement() {
        let val: u256 = count - 1;
        count = val;
    }

    // Return the current count
    pub fn get() -> (u256) {
        return count;
    }

    // Reset the counter to zero
    pub fn reset() {
        count = 0;
    }
}
```

:::warning
The `decrement` function performs `count - 1` with no underflow guard. Unsigned subtraction past zero reverts, so the counter does **not** saturate at 0. The upstream source comment is misleading.
:::

Key points:

* `let count: &s u256` declares a **persistent storage** field with the `&s` (storage) annotation.
* `pub fn` marks a function as publicly callable. Inside a `contract` block, `pub fn` implicitly creates a dispatch entry (equivalent to `pub ext fn`).
* Return types use parentheses: `-> (u256)`.
* The `abi` block declares the external interface; the `contract` block implements it.
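The difference between what `decrement` does and what its comment promises can be modelled in a few lines of Python. This is a sketch for illustration: `Revert`, `checked_sub`, and `saturating_sub` are hypothetical names, not Edge built-ins, and `checked_sub` simply encodes the reverting behaviour described in the warning earlier in this section.

```python
class Revert(Exception):
    """Stands in for a transaction revert in this model."""


def checked_sub(a: int, b: int) -> int:
    """Subtraction that reverts on underflow (the behaviour the warning describes)."""
    if b > a:
        raise Revert("subtraction underflow")
    return a - b


def saturating_sub(a: int, b: int) -> int:
    """Subtraction that clamps at zero (what the source comment claims)."""
    return a - b if a >= b else 0


normal = checked_sub(5, 1)        # both variants agree away from zero
clamped = saturating_sub(0, 1)    # saturating variant stays at 0

try:
    checked_sub(0, 1)
    reverted = False
except Revert:
    reverted = True               # checked variant aborts instead of clamping
```

The two variants only differ when the counter is already zero, which is exactly the case the comment in `examples/counter.edge` gets wrong.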
### Primitive types and data locations From `examples/types.edge` — a tour of all primitive types and data location annotations: ```edge // Primitive types let a: u8; let b: u256; let c: i128; let d: b32; let e: addr; let f: bool; let g: bit; // Data location annotations let stored: &s u256; // storage let transient_val: &t u256; // transient storage (EIP-1153) let in_memory: &m u256; // memory // Type aliases type TokenId = u256; type Owner = addr; type Balance = u256; // Constants const ZERO: u256 = 0; const MAX_SUPPLY: u256 = 1000000; ``` | Annotation | Opcodes | Lifetime | | ---------- | ------------------------- | ------------------------------ | | `&s` | SLOAD / SSTORE | Persists across transactions | | `&t` | TLOAD / TSTORE (EIP-1153) | Cleared after each transaction | | `&m` | MLOAD / MSTORE | Within execution only | ### Expressions and operators From `examples/expressions.edge` — all operator categories: ```edge // --- Arithmetic --- fn arithmetic(a: u256, b: u256) -> (u256) { let sum: u256 = a + b; let diff: u256 = a - b; let product: u256 = a * b; let quotient: u256 = a / b; let remainder: u256 = a % b; let power: u256 = a ** b; return sum; } // --- Comparison and logic --- fn comparisons(x: u256, y: u256) -> (bool) { let eq: bool = x == y; let lt: bool = x < y; let gt: bool = x > y; return eq; } // --- Bitwise --- fn bitwise(x: u256, y: u256) -> (u256) { let and_result: u256 = x & y; let or_result: u256 = x | y; let xor_result: u256 = x ^ y; let shifted: u256 = x << 2; return and_result; } // --- Nested expressions --- fn complex(a: u256, b: u256, c: u256) -> (u256) { return a + b * c; } ``` :::note These are free functions (not inside a `contract` block). Edge supports top-level function definitions. ::: ### Transient storage From `examples/transient.edge` — transient storage (`&t`) is erased at the end of every transaction (EIP-1153). It uses `TLOAD`/`TSTORE` opcodes (100 gas each) and is useful for reentrancy locks and within-transaction caches. 
```edge contract ReentrancyGuard { // Transient lock — cleared automatically after each tx let locked: &t u256; // Persistent counter let count: &s u256; // Set the transient lock pub fn enter() { locked = 1; } // Clear the transient lock pub fn exit() { locked = 0; } // Read the lock state pub fn get_locked() -> (u256) { return locked; } // Increment the persistent counter pub fn increment() { count = count + 1; } // Read the persistent counter pub fn get_count() -> (u256) { return count; } } ``` Mix `&s` and `&t` in the same contract as needed. The compiler generates the appropriate opcodes for each. ### Built-in globals Two commonly used EVM builtins: | Builtin | Type | Description | | -------------- | ------ | ---------------------------------------------- | | `@caller()` | `addr` | Address of the immediate caller (`msg.sender`) | | `@callvalue()` | `u256` | ETH value sent with the call (`msg.value`) | The `@` prefix distinguishes built-in context accessors from regular function calls. ### Quick reference | Concept | Syntax | | ------------------------ | -------------------------------------- | | Persistent storage field | `let x: &s u256;` | | Transient storage field | `let x: &t u256;` | | Memory field | `let x: &m u256;` | | Type alias | `type TokenId = u256;` | | Constant | `const ZERO: u256 = 0;` | | Public function | `pub fn name() { ... }` | | Function with return | `pub fn get() -> (u256) { return x; }` | | ABI definition | `abi IName { fn foo() -> (u256); }` | | Caller address | `@caller()` | | ETH value sent | `@callvalue()` | ## ERC20 A complete ERC-20 fungible token in Edge. This page covers two implementations: 1. **`examples/erc20.edge`** — a minimal, self-contained ERC-20 (shown in the [Complete contract](#complete-contract) section and used for the walkthrough below). 2. **`std/tokens/erc20.edge`** — the full standard library implementation with metadata getters, the `MAX_UINT` infinite-approval pattern, and `mod`/`use` imports. 
Source files: [`examples/erc20.edge`](https://github.com/refcell/edge-rs/blob/main/examples/erc20.edge) · [`std/tokens/erc20.edge`](https://github.com/refcell/edge-rs/blob/main/std/tokens/erc20.edge)

:::note
The walkthrough below uses code from `examples/erc20.edge` unless explicitly noted otherwise. Where the standard library version differs, the difference is called out.
:::

***

### Events

Events are declared at the top level with the `event` keyword. Fields marked `indexed` appear in log topics, which lets off-chain clients filter for them.

```edge
event Transfer(indexed from: addr, indexed to: addr, amount: u256);
event Approval(indexed owner: addr, indexed spender: addr, amount: u256);
```

* `Transfer` — emitted on every token movement, including mint (`from = 0`) and burn (`to = 0`).
* `Approval` — emitted when an owner sets a spender's allowance.

***

### External interface

The `abi` block defines the ERC-20 public interface.

```edge
abi IERC20 {
    fn totalSupply() -> (u256);
    fn balanceOf(account: addr) -> (u256);
    fn transfer(to: addr, amount: u256) -> (bool);
    fn allowance(owner: addr, spender: addr) -> (u256);
    fn approve(spender: addr, amount: u256) -> (bool);
    fn transferFrom(from: addr, to: addr, amount: u256) -> (bool);
}
```

:::note
The standard library version (`std/tokens/erc20.edge`) extends this ABI with `fn name() -> (b32)`, `fn symbol() -> (b32)`, and `fn decimals() -> (u8)`.
:::

***

### Contract and storage layout

The `contract` block holds persistent storage fields and all function definitions. Storage fields use the `&s` (storage pointer) qualifier.

```edge
contract ERC20 {
    const DECIMALS: u8 = 18;

    let name: &s b32;
    let symbol: &s b32;
    let total_supply: &s u256;
    let balances: &s map;
    let allowances: &s map>;

    // ... functions follow
}
```

**Edge-specific features here:**

* `&s` — marks a field as contract storage (persists on-chain).
* `map` — Edge's built-in mapping type, equivalent to Solidity's `mapping(K => V)`.
* `map>` — nested mapping for two-dimensional allowance lookup. * `b32` — a 32-byte value, used for compact string storage (name, symbol). * `const` — compile-time constant, not stored on-chain. :::note The standard library version also declares `const MAX_UINT: u256 = 0xfff...fff;` as a sentinel for infinite allowances (solmate pattern). ::: *** ### Public read functions ```edge pub fn totalSupply() -> (u256) { return total_supply; } pub fn balanceOf(account: addr) -> (u256) { return balances[account]; } pub fn allowance(owner: addr, spender: addr) -> (u256) { return allowances[owner][spender]; } ``` Map fields are accessed with `map[key]` syntax. Nested maps use chained brackets: `allowances[owner][spender]`. *** ### Transfer `transfer` moves tokens from the caller to a recipient. The caller's address is retrieved via `@caller()`. ```edge pub fn transfer(to: addr, amount: u256) -> (bool) { let from: addr = @caller(); _transfer(from, to, amount); return true; } ``` The actual balance update and event emission are delegated to the internal `_transfer` helper. *** ### Approve `approve` sets how many tokens a spender may transfer on the caller's behalf. ```edge pub fn approve(spender: addr, amount: u256) -> (bool) { let owner: addr = @caller(); _approve(owner, spender, amount); return true; } ``` *** ### TransferFrom `transferFrom` lets an approved spender move tokens from another account. It checks and decrements the allowance, then calls `_transfer`. ```edge pub fn transferFrom(from: addr, to: addr, amount: u256) -> (bool) { let caller: addr = @caller(); let current_allowance: u256 = allowances[from][caller]; allowances[from][caller] = current_allowance - amount; _transfer(from, to, amount); return true; } ``` :::note The standard library version (`std/tokens/erc20.edge`) adds the solmate infinite-approval optimization: if `current_allowance == MAX_UINT`, the allowance is not decremented, saving a storage write. 
::: *** ### Internal helpers Internal functions (no `pub`) are only callable from within the contract. #### `_transfer` ```edge fn _transfer(from: addr, to: addr, amount: u256) { let from_balance: u256 = balances[from]; balances[from] = from_balance - amount; balances[to] = balances[to] + amount; emit Transfer(from, to, amount); } ``` Subtraction reverts on underflow — no explicit balance check needed. Events are emitted with `emit EventName(args...)`. #### `_approve` ```edge fn _approve(owner: addr, spender: addr, amount: u256) { allowances[owner][spender] = amount; emit Approval(owner, spender, amount); } ``` #### `_mint` ```edge fn _mint(to: addr, amount: u256) { total_supply = total_supply + amount; balances[to] = balances[to] + amount; emit Transfer(0, to, amount); } ``` Mint is represented as a transfer from the zero address (`0`). #### `_burn` ```edge fn _burn(from: addr, amount: u256) { balances[from] = balances[from] - amount; total_supply = total_supply - amount; emit Transfer(from, 0, amount); } ``` Burn is represented as a transfer to the zero address (`0`). Underflow on `balances[from]` provides implicit balance enforcement. 
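The bookkeeping done by `_transfer`, `_mint`, and `_burn` can be checked with a small Python model. This is a sketch under stated assumptions, not a translation of the compiler's output: `Erc20Model`, `Revert`, `checked_add`, and `checked_sub` are hypothetical names, addresses are plain integers, and overflow-checked addition is an assumption (the text above only states that subtraction reverts on underflow).

```python
from collections import defaultdict

U256_MAX = 2**256 - 1
ZERO_ADDRESS = 0


class Revert(Exception):
    """Stands in for a transaction revert in this model."""


def checked_sub(a: int, b: int) -> int:
    if b > a:
        raise Revert("underflow")
    return a - b


def checked_add(a: int, b: int) -> int:
    # Assumption: additions are overflow-checked like subtractions.
    if a + b > U256_MAX:
        raise Revert("overflow")
    return a + b


class Erc20Model:
    def __init__(self):
        self.total_supply = 0
        self.balances = defaultdict(int)  # unset keys read as zero, like a map
        self.events = []                  # recorded (name, from, to, amount) tuples

    def transfer(self, sender, to, amount):
        # Underflow here is the implicit balance check.
        self.balances[sender] = checked_sub(self.balances[sender], amount)
        self.balances[to] = checked_add(self.balances[to], amount)
        self.events.append(("Transfer", sender, to, amount))

    def mint(self, to, amount):
        self.total_supply = checked_add(self.total_supply, amount)
        self.balances[to] = checked_add(self.balances[to], amount)
        self.events.append(("Transfer", ZERO_ADDRESS, to, amount))

    def burn(self, owner, amount):
        self.balances[owner] = checked_sub(self.balances[owner], amount)
        self.total_supply = checked_sub(self.total_supply, amount)
        self.events.append(("Transfer", owner, ZERO_ADDRESS, amount))


token = Erc20Model()
token.mint(0xA, 100)
token.transfer(0xA, 0xB, 30)
token.burn(0xB, 10)
```

After these three calls the sum of all balances still equals `total_supply`, and an attempt to transfer more than an account holds raises `Revert` before any balance is changed.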
*** ### Complete contract The full minimal ERC-20 from `examples/erc20.edge`: ```edge event Transfer(indexed from: addr, indexed to: addr, amount: u256); event Approval(indexed owner: addr, indexed spender: addr, amount: u256); abi IERC20 { fn totalSupply() -> (u256); fn balanceOf(account: addr) -> (u256); fn transfer(to: addr, amount: u256) -> (bool); fn allowance(owner: addr, spender: addr) -> (u256); fn approve(spender: addr, amount: u256) -> (bool); fn transferFrom(from: addr, to: addr, amount: u256) -> (bool); } contract ERC20 { const DECIMALS: u8 = 18; let name: &s b32; let symbol: &s b32; let total_supply: &s u256; let balances: &s map; let allowances: &s map>; pub fn totalSupply() -> (u256) { return total_supply; } pub fn balanceOf(account: addr) -> (u256) { return balances[account]; } pub fn transfer(to: addr, amount: u256) -> (bool) { let from: addr = @caller(); _transfer(from, to, amount); return true; } pub fn allowance(owner: addr, spender: addr) -> (u256) { return allowances[owner][spender]; } pub fn approve(spender: addr, amount: u256) -> (bool) { let owner: addr = @caller(); _approve(owner, spender, amount); return true; } pub fn transferFrom(from: addr, to: addr, amount: u256) -> (bool) { let caller: addr = @caller(); let current_allowance: u256 = allowances[from][caller]; allowances[from][caller] = current_allowance - amount; _transfer(from, to, amount); return true; } fn _transfer(from: addr, to: addr, amount: u256) { let from_balance: u256 = balances[from]; balances[from] = from_balance - amount; balances[to] = balances[to] + amount; emit Transfer(from, to, amount); } fn _approve(owner: addr, spender: addr, amount: u256) { allowances[owner][spender] = amount; emit Approval(owner, spender, amount); } fn _mint(to: addr, amount: u256) { total_supply = total_supply + amount; balances[to] = balances[to] + amount; emit Transfer(0, to, amount); } fn _burn(from: addr, amount: u256) { balances[from] = balances[from] - amount; total_supply = total_supply 
- amount; emit Transfer(from, 0, amount); } } ``` *** ### Edge syntax summary | Feature | Edge syntax | Notes | | ----------------- | ----------------------------------------- | ----------------------------------- | | Storage field | `let x: &s T` | `&s` makes it persistent | | Mapping | `map` | Indexed with `map[key]` | | Nested mapping | `map>` | Accessed as `map[k1][k2]` | | Caller address | `@caller()` | Built-in context accessor | | Emit event | `emit Transfer(from, to, amount)` | Positional args match declaration | | Event declaration | `event Transfer(indexed from: addr, ...)` | `indexed` fields go into log topics | | ABI definition | `abi IERC20 { fn ... }` | Defines external call interface | | Public function | `pub fn name() -> (b32)` | Callable externally | | Internal function | `fn _transfer(...)` | No `pub`, contract-internal only | | Constant | `const DECIMALS: u8 = 18` | Compile-time, not stored | ## Syntax showcase The following are Edge language source code examples, organized by category. Full source files are in the [`examples/`](https://github.com/refcell/edge-rs/tree/main/examples) directory, with standard library modules in [`std/`](https://github.com/refcell/edge-rs/tree/main/std). 
### Basics * [Basics](/specs/showcase/basics): Core Edge constructs — variables, functions, contracts, storage, types, operators * [ERC20](/specs/showcase/erc20): A complete ERC-20 token walkthrough ### Introductory examples These top-level example files cover the fundamentals of Edge syntax: | Example | What it covers | | --------------------------- | -------------------------------------------------------------------------------------------------------------------------------- | | `examples/counter.edge` | `abi`, `contract`, `&s` storage pointers, `pub fn` visibility | | `examples/erc20.edge` | `event`, `indexed`, `map` storage mappings, `emit` | | `examples/expressions.edge` | Arithmetic, comparison, and bitwise operators | | `examples/types.edge` | Primitive types (`u8`..`u256`, `i128`, `b32`, `addr`, `bool`, `bit`), data locations (`&s`, `&t`, `&m`), `type` aliases, `const` | | `examples/transient.edge` | Transient storage (`&t`) with EIP-1153 TLOAD/TSTORE | ### Token standards Full and partial implementations of ERC token standards: | File | Standard | | ------------------------------ | ------------------------------------------------------ | | `examples/tokens/erc20.edge` | ERC-20 airdrop contract with `mod`/`use` imports | | `examples/tokens/erc721.edge` | ERC-721 NFT collection with stdlib trait usage | | `examples/tokens/erc4626.edge` | ERC-4626 tokenized vault with multi-module composition | | `std/tokens/erc20.edge` | Full ERC-20 reference implementation (solmate pattern) | | `std/tokens/erc721.edge` | ERC-721 base contract with `abi`/`trait` definitions | | `std/tokens/erc1155.edge` | ERC-1155 multi-token standard | | `std/tokens/weth.edge` | Wrapped ETH with `@caller()`, `@callvalue()` builtins | ### Library primitives Foundational modules imported by other examples: | File | Purpose | | ------------------------------- | -------------------------------------------------------------------- | | `std/math.edge` | WAD/RAY fixed-point math, safe arithmetic 
| | `std/auth.edge` | Ownership traits (`IOwned`, `IAuth`) and contracts (`Owned`, `Auth`) | | `std/tokens/safe_transfer.edge` | Safe ERC-20 and ETH transfer helpers | ### Utility libraries Stateless utility functions: | File | Purpose | | ----------------------- | ----------------------------------------------------------- | | `std/utils/merkle.edge` | Merkle proof verification with fixed arrays and bitwise ops | | `std/utils/bits.edge` | Bit manipulation: popcount, leading/trailing zeros | | `std/utils/bytes.edge` | Bytes32 utilities: address extraction, packing, masking | ### Access control | File | Pattern | | -------------------------- | ----------------------------------------------------------------- | | `std/access/ownable.edge` | Single-owner with 2-step transfer, `trait`/`impl` | | `std/access/roles.edge` | Role-based access control with nested `map>` | | `std/access/pausable.edge` | Pausable pattern with `bool` state machine | ### Finance / DeFi | File | Pattern | | --------------------------- | ------------------------------------------ | | `std/finance/amm.edge` | Constant product AMM (x · y = k) | | `std/finance/staking.edge` | ERC-20 staking with per-second rewards | | `std/finance/multisig.edge` | N-of-M multisig with sum types and `match` | ### Design patterns | File | Pattern | | ------------------------------------ | --------------------------------------------------- | | `std/patterns/reentrancy_guard.edge` | Reentrancy protection with `&t` transient storage | | `std/patterns/timelock.edge` | Time-locked operations with sum types carrying data | | `std/patterns/factory.edge` | CREATE2 deterministic deployment factory | ### Type system deep dives Dedicated examples for each major type system feature: | File | Feature | | ------------------------------ | --------------------------------------------------------------- | | `examples/types/structs.edge` | Product types: structs, packed structs, tuples, generic structs | | `examples/types/enums.edge` 
| Sum types: enums, unions with data, `Option`, `Result` | | `examples/types/generics.edge` | Generics, trait constraints, monomorphization | | `examples/types/arrays.edge` | Fixed arrays, packed arrays, slices, iteration | | `examples/types/comptime.edge` | Compile-time evaluation: `const`, `comptime fn` | ## Codesize & optimization This document details Edge's optimization pipeline. The compiler transforms source programs through a multi-stage IR optimization pipeline before generating EVM bytecode. ### Optimization levels Edge supports four optimization levels controlled via `--opt-level N` (default: 0): | Level | Description | Egglog iterations | Rulesets | | ----- | ---------------------- | -------------------- | ----------------------------------------------------------------------------------- | | O0 | No equality saturation | 0 (skipped entirely) | Only pre-egglog `var_opt` passes + store forwarding | | O1 | Fast, safe only | 3 | peepholes, u256-const-fold, const-prop, dead-code, range-analysis, type-propagation | | O2 | Full suite | 5 | O1 rulesets + arithmetic-opt, storage-opt, memory-opt, cse-rules | | O3 | Aggressive | 10 | Same rulesets as O2 | At O0, the program still benefits from pre-egglog variable optimization (dead variable elimination, store forwarding, constant propagation) but equality saturation is skipped entirely for maximum compilation speed. ### Gas vs size optimization Pass `--optimize-for gas` (default) or `--optimize-for size` to control what the optimizer minimizes: * **Gas mode** — Each IR node is weighted by its EVM gas cost. 
Key costs:

  | Node                                                                            | Gas cost  | Note                                    |
  | ------------------------------------------------------------------------------- | --------- | --------------------------------------- |
  | `ADD`, `SUB`, `LT`, `GT`, `EQ`, `AND`, `OR`, `XOR`, `SHL`, `SHR`, `SAR`, `BYTE` | 3         | W\_verylow                              |
  | `MUL`, `DIV`, `SDIV`, `MOD`, `SMOD`                                             | 5         | W\_low                                  |
  | `EXP`                                                                           | 60        | 10 + \~50/byte                          |
  | `CheckedAdd`, `CheckedSub`                                                      | 20        | Higher than unchecked to prefer elision |
  | `CheckedMul`                                                                    | 30        | Higher than unchecked to prefer elision |
  | `SLOAD`                                                                         | 2100      | Cold SLOAD (EIP-2929)                   |
  | `SSTORE`                                                                        | 5000      |                                         |
  | `TLOAD`, `TSTORE`                                                               | 100       | EIP-1153                                |
  | `MLOAD`, `MSTORE`, `CalldataLoad`                                               | 3         |                                         |
  | `Call` (internal)                                                               | 1,000,000 | Forces extractor to prefer inlined body |
  | `LOG`                                                                           | 375       |                                         |
  | `ExtCall`                                                                       | 100       |                                         |
  | `LetBind`                                                                       | 3         | MSTORE cost                             |
  | `Var`                                                                           | 3         | MLOAD cost                              |
  | `VarStore`                                                                      | 6         | PUSH offset + MSTORE                    |

  The optimizer selects the equivalent program form with the lowest total gas.

* **Size mode** — Every IR node costs 1, regardless of opcode. The optimizer minimizes instruction count, which reduces deployment cost and helps stay within the EVM's 24 KB (24,576-byte, EIP-170) contract size limit.

Both modes use egglog's `TreeAdditiveCostModel` extractor, which picks the cheapest equivalent program discovered during equality saturation.

### Pipeline stages

#### Stage 1: Pre-egglog variable optimization (`var_opt`)

This stage runs before equality saturation and performs tree-level transforms that require occurrence counting, something egglog pattern matching cannot express directly.
Transforms are applied bottom-up in a single traversal: | Transform | Condition | Effect | | ------------------------- | --------------------------------------------------- | -------------------------------------------------------------------------- | | Dead variable elimination | `reads=0, writes=0` | Removes `LetBind`; preserves side effects via `Concat` | | Single-use inlining | `reads=1, writes=0, not in loop, pure init` | Substitutes init at the single read site, drops `LetBind` | | Last-store forwarding | `reads=1, writes=1, not in loop` | Forwards `VarStore` value directly to the `Var` read site, eliminates both | | Constant propagation | `writes=0, not in loop, init is a literal constant` | Substitutes the constant at all read sites, drops `LetBind` | | Function inlining | O1+ only | Substitutes function body at each call site, renames locals, recurses | | Early drop insertion | Always (post-optimize) | Inserts `Drop` markers before halting branches for unused variables | **Function inlining** at O1+ (`inline_calls`): for each `Call("f", args)`, the compiler looks up the function body, substitutes actual arguments for formal parameters, renames all local variables with a unique suffix to avoid collisions, and recursively inlines nested calls in the substituted body. This eliminates `Call` nodes from the IR tree before egglog runs, enabling subsequent dead-variable elimination and constant propagation across inlined boundaries. #### Stage 2: Storage LICM — loop invariant code motion After `var_opt` and before egglog, `storage_hoist` hoists storage accesses with constant slot indices out of loop bodies: 1. Emits `let $var = SLOAD(slot)` before the loop. 2. Replaces `SLOAD(slot)` → `$var` and `SSTORE(slot, val)` → `$var = val` inside the loop. 3. Emits `SSTORE(slot, $var)` write-back after the loop (only for slots that were written). This eliminates repeated expensive `SLOAD`/`SSTORE` opcodes (2100 and 5000 gas respectively) in hot loops. 
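
As a rough illustration of why hoisting pays off, the sketch below estimates the storage-related cost of a loop that reads and writes one constant slot per iteration, before and after hoisting. It uses the cost-model weights from the gas table in this document (`SLOAD` 2100, `SSTORE` 5000, `Var` 3, `VarStore` 6). The function names are hypothetical, and the model ignores warm/cold access differences and refunds, so treat the numbers as cost-model estimates rather than on-chain gas.

```rust
// Cost-model weights taken from the gas table in this document.
const SLOAD: u64 = 2100;
const SSTORE: u64 = 5000;
const VAR: u64 = 3; // read of a local variable
const VAR_STORE: u64 = 6; // write to a local variable

/// Storage cost of a loop that performs one SLOAD and one SSTORE
/// on the same constant slot in every iteration (no hoisting).
fn unhoisted_cost(iterations: u64) -> u64 {
    iterations * (SLOAD + SSTORE)
}

/// Same loop after hoisting: one SLOAD before the loop, one SSTORE
/// write-back after it, and only local reads/writes inside the body.
fn hoisted_cost(iterations: u64) -> u64 {
    SLOAD + SSTORE + iterations * (VAR + VAR_STORE)
}
```

Under these weights a 10-iteration loop drops from 71,000 to 7,190, while a single-iteration loop is marginally more expensive after hoisting (7,109 versus 7,100), which is why the benefit is framed in terms of hot loops.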
Both persistent storage (`SLOAD`/`SSTORE`) and transient storage (`TLOAD`/`TSTORE`) are hoisted. Loops containing external calls or nested loops are not hoisted, as they may access storage via unknown aliases. #### Stage 3: Equality saturation (egglog) The core optimization engine. The IR is serialized to s-expressions and submitted to an egglog e-graph along with \~330 rewrite rules across 12 files: | Rule file | Ruleset | Rules | Purpose | | ------------------------ | ------------------ | ----- | ------------------------------------------------------------------------------------------------ | | `peepholes.egg` | `peepholes` | 52 | Algebraic identities (x+0=x, x×1=x, double-negation, etc.) | | `arithmetic.egg` | `arithmetic-opt` | 31 | Strength reductions, shift/mask patterns | | `storage.egg` | `storage-opt` | 16 | SStore→SLoad forwarding, cross-slot rules (state-threaded) | | `memory.egg` | `memory-opt` | 6 | Memory read/write simplifications | | `dead_code.egg` | `dead-code` | 45 | Dead code and unreachable branch elimination | | `range_analysis.egg` | `range-analysis` | 64 | Min/max bound propagation for U256 values | | `u256_const_fold.egg` | `u256-const-fold` | 28 | Constant folding using full 256-bit arithmetic | | `type_propagation.egg` | `type-propagation` | 59 | Type information propagation (analysis rules) | | `checked_arithmetic.egg` | `range-analysis` | 27 | Elides `CheckedAdd`/`CheckedSub`/`CheckedMul` → unchecked when range analysis proves no overflow | | `cse.egg` | `cse-rules` | 0 | See note below | | `inline.egg` | `peepholes` | 1 | `Call(name, args) + Function(name, ...)` → substituted body | | `const_prop.egg` | `const-prop` | 1 | Constant propagation through `LetBind`/`Var` chains | :::note **`cse.egg` contains 0 rewrite rules.** Common subexpression elimination is achieved for free via e-graph hash-consing: structurally identical expressions automatically share the same e-class, so CSE requires no explicit rules. 
The file exists as a named placeholder in the ruleset schedule. ::: Rulesets are run in an analysis-first schedule: cheap analysis rulesets (`dead-code`, `range-analysis`) saturate first so their facts are available to guarded rewrite rules in `peepholes`, `arithmetic`, and `checked-arithmetic`. **Call node costs:** In the egglog cost model, `Call` nodes are assigned a cost of 1,000,000. This astronomically high cost ensures the extractor always prefers the inlined function body form over the `Call` form when both are equivalent, making function inlining effectively unconditional at O1+. **Immutable variable facts:** Variables that are never mutated (no `VarStore`) are declared as `(ImmutableVar "name")` facts before egglog runs. This allows the `const-prop` ruleset to safely propagate their values through `LetBind`/`Var` chains without worrying about mutation aliasing. #### Stage 4: Post-egglog passes After egglog extraction, additional passes run: * **Cleanup** — Simplifies state parameter chains (which become bloated during egglog) back to a sentinel placeholder, since codegen does not use state parameters for ordering. Also eliminates dead code after halting instructions (`RETURN`, `REVERT`) in `Concat` chains. * **Store forwarding** — Propagates `SSTORE` values forward to subsequent `SLOAD`s of the same constant slot in straight-line `Concat` chains and eliminates dead intermediate stores. At O0 (when egglog is skipped), this is the only storage optimization that runs. At O1+, egglog's `storage-opt` ruleset handles equivalent optimizations during equality saturation, and this pass handles remaining cases in the post-egglog IR. * **Dead function elimination** — After the runtime is optimized, the compiler collects all `Call` names still present in the runtime (transitively) and discards any internal functions no longer referenced. Each surviving function is optimized independently through egglog. 
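
Dead function elimination is essentially a reachability walk over the call graph. The following is a minimal sketch of that idea, not the compiler's actual code: the `reachable_functions` name and the map-based call-graph representation are assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Returns every function name reachable from the runtime.
///
/// `runtime_calls` lists the names called directly by the runtime;
/// `call_graph` maps each function to the names it calls.
fn reachable_functions(
    runtime_calls: &[&str],
    call_graph: &HashMap<&str, Vec<&str>>,
) -> HashSet<String> {
    let mut live: HashSet<String> = HashSet::new();
    let mut worklist: Vec<&str> = runtime_calls.to_vec();

    while let Some(name) = worklist.pop() {
        // `insert` returns false when the name was already visited,
        // which also keeps recursive call cycles from looping forever.
        if live.insert(name.to_string()) {
            if let Some(callees) = call_graph.get(name) {
                worklist.extend(callees.iter().copied());
            }
        }
    }
    live
}
```

Any internal function whose name is missing from the returned set can be discarded. A function called only by other dead functions is never visited, so it is dropped as well.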
### Dead code elimination Dead code elimination is handled at multiple stages: * **Egglog `dead-code` ruleset** — Eliminates unreachable branches during equality saturation (e.g., `if true { A } else { B }` → `A`). * **`var_opt` dead variable pass** — Removes `LetBind` nodes whose variable is never read or written. * **Post-egglog cleanup** — Removes instructions following a halting operation (`RETURN`/`REVERT`) in a `Concat` chain. * **Dead function elimination** — Removes internal functions unreachable from the contract's runtime after inlining. ### Checked arithmetic `CheckedAdd`, `CheckedSub`, and `CheckedMul` compile to arithmetic operations that `REVERT` on overflow. At O1+, the `checked-arithmetic` egglog ruleset uses range analysis to elide these checks: when bounds propagation proves that no overflow is possible (e.g., both operands are statically bounded below their respective overflow thresholds), the checked operation is rewritten to a plain `Add`/`Sub`/`Mul`. This eliminates the overhead of the overflow check while preserving safety guarantees. ### Bytecode peephole optimizer In addition to IR-level optimization, the Edge compiler applies an egglog-based peephole optimizer at the bytecode level. This pass operates on basic blocks of generated EVM bytecode and applies 66 rewrite rules across four rulesets: | Ruleset | Rules | Examples | | ----------------------- | ----- | --------------------------------------------------- | | `bytecode-peepholes` | 15 | DUP deduplication, SWAP cancellation, commutativity | | `bytecode-const-fold` | 10 | `PUSH i, PUSH j, ADD` → `PUSH (i+j)` | | `bytecode-strength-red` | 38 | Identity elimination, MUL→SHL, DIV→SHR, MOD→AND | | `bytecode-dead-push` | 3 | `PUSH x, POP` → ε | ## Namespaces A namespace contains valid identifiers for items that may be used. Edge uses a hierarchical module-based namespace system. ### Module namespaces Each file is implicitly a module, forming a namespace for all items declared within it. 
Items in a module are accessed using `::` path syntax: ```edge // Access an item from another module super::moduleA::TypeA ``` Items must be explicitly imported into the current scope with `use` before they can be referenced by their short name. See [Scoping](/specs/semantics/scoping) for details on how `use`, `super::`, and `pub use` bring items into scope. ### Item namespaces The following kinds of items occupy the module namespace: * **Types** — declared with `type` * **Functions** — declared with `fn` * **Constants** — declared with `const` * **Traits** — declared with `trait` * **Implementation blocks** — declared with `impl` * **Submodules** — declared with `mod` ### Storage field namespaces Contract storage fields are declared with `let` and a data location annotation, mapping a field name to a sequential storage slot and a type: ```edge contract Token { let balance: &s u256; let owner: &s addr; } ``` Storage field names occupy a separate namespace from local variables and functions. At the IR level, each storage field is represented as a `StorageField(name, slot_index, type)` node. Name resolution maps the source field name to its concrete slot index at compile time (slots are assigned sequentially starting at 0). ### Name resolution at the IR level By the time source code is lowered to the Edge IR, all names are fully resolved: * Function calls are represented as `Call("fully_resolved_name", args)`. * Local variables are represented as `LetBind("unique_name", ...)` and `Var("unique_name")`. * Storage fields are represented as `StorageField("name", slot_index, type)`. The IR uses plain strings for all names. Name uniqueness (preventing collisions between locals in different scopes or after function inlining) is the responsibility of the frontend lowering pass, which renames variables as needed. 
When function inlining runs (at optimization level O1+), local variable names in inlined function bodies are renamed with a unique suffix (e.g., `_s0`, `_s1`) to prevent collisions with names at the call site.

:::note
The full cross-module name resolution rules and the interaction of namespaces with `pub use` re-exports are still being expanded in the specification.
:::

## Semantics

The semantics section covers language features that are not tied to specific syntax constructs: general compiler behaviors, name resolution, and access control.

* [Codesize & optimization](/specs/semantics/codesize)
* [Namespaces](/specs/semantics/namespaces)
* [Scoping](/specs/semantics/scoping)
* [Visibility](/specs/semantics/visibility)

### Optimization overview

Edge compiles through a functional IR (intermediate representation) that is optimized before code generation. The optimization pipeline has three phases:

1. **Pre-egglog (`var_opt` + storage LICM)** — Tree-level transforms that require occurrence counting: dead variable elimination, single-use inlining, last-store forwarding, constant propagation, function inlining (O1+), and early drop insertion. **Storage LICM** (`storage_hoist`) hoists loop-invariant `SLOAD`/`SSTORE` accesses out of loop bodies *before* egglog runs.
2. **Equality saturation (egglog)** — The core optimization engine. Applies \~330 rewrite rules across 12 rule files in an iterative schedule determined by the optimization level (O1–O3; skipped entirely at O0). Discovers algebraic equivalences, eliminates dead code, folds 256-bit constants, propagates types and value ranges, eliminates redundant storage accesses, and elides checked arithmetic when provably safe.
3. **Post-egglog** — State-chain cleanup, removal of dead code after halting instructions, store forwarding (the only storage optimization at O0; at O1+ it handles cases left over after egglog), and dead function elimination.

The compiler supports two cost models: `--optimize-for gas` (default) minimizes EVM execution cost; `--optimize-for size` minimizes instruction count.
See [Codesize & optimization](/specs/semantics/codesize) for full details. ## Scoping Items are brought into scope by import or declaration. ### Module The module scope contains items explicitly imported from another scope or explicitly declared in the current module scope. Items may be accessed directly by their identifier with no other annotations. Files are implicitly modules. :::warning The examples below use `mod`, `pub use`, and path syntax (`super::moduleA::TypeA`) to illustrate scoping concepts. These features are **planned but not yet implemented** in the parser — see [Modules](/specs/syntax/modules) for details. ::: ```edge mod moduleA { // `TypeA` declared. type TypeA = u8; // `TypeA` may be accessed as follows: const CONST_A: TypeA = 0u8; } mod moduleB { // import `TypeA` into the local module scope use super::moduleA::TypeA; // `TypeA` may now be accessed as follows: const CONST_A: TypeA = 0u8; } mod moduleC { // publicly import `TypeA` into the local module scope. "pub" enables exporting. pub use super::moduleA::TypeA; } mod moduleD { // publicly import `moduleA` into the local module scope. "pub" enables exporting. pub use super::moduleA; } mod moduleF { // `TypeA` may be accessed in one of the following ways. const CONST_A: super::moduleA::TypeA = 0u8; const CONST_B: super::moduleC::TypeA = 0u8; const CONST_C: super::moduleD::moduleA::TypeA = 0u8; } ``` ### Implementation The implementation block scope contains items explicitly imported from another scope or explicitly declared in the current implementation block scope. Items may be accessed either directly or under the Self namespace. 
```edge
type MyStruct = { inner: T };

type MyError = Overflow | Underflow;

trait TryPlusOne: Add {
    type Error;

    fn tryPlusOne(self: Self) -> Result;
}

impl MyStruct {
    fn new(inner: T) -> Self {
        // Bind the field to the argument value, not to the type parameter.
        return Self { inner: inner };
    }
}

impl MyStruct: Add {
    fn add(lhs: Self, rhs: Self) -> Self {
        return Self { inner: lhs.inner + rhs.inner };
    }
}

impl MyStruct: TryPlusOne {
    type Error = MyError;

    fn tryPlusOne(self: Self) -> Result {
        if self.inner > max() - 1 {
            return Result::Err(Error::Overflow);
        }
        // Wrap the sum so both branches return a `Result`.
        return Result::Ok(Add::add(self, Self { inner: 1 }));
    }
}
```

### Function

The function scope implicitly imports items from parent scopes up to the parent module. Items may be explicitly declared or imported from external modules.

```edge
mod moduleA {
    const CONST_A = 0u8;
    const CONST_B = 1u8;
    const CONST_C = 2u8;
}

use moduleA::CONST_A;

const CONST_D = 3u8;

fn func() -> u8 {
    use moduleA::CONST_B;

    fn innerFunc() -> u8 {
        return CONST_A + CONST_B + moduleA::CONST_C + CONST_D;
    }

    return innerFunc();
}
```

### Blocks

Code blocks, branch blocks, loop blocks, and match blocks implicitly import items from parent scopes up to the parent module. Items may also be explicitly imported from external modules or declared within each block.

### IR lowering and name uniqueness

When source code is lowered to the Edge IR, all scoped names are resolved to unique flat strings. The IR uses plain string identifiers for all `LetBind` variables, `Function` definitions, and `Call` targets; there is no nested scope structure in the IR.

The frontend lowering pass is responsible for:

* Resolving module paths (`super::moduleA::TypeA`) to their canonical names.
* Ensuring local variables declared in different scopes (or after function inlining) have unique names by appending suffixes as needed.
* Lowering inner functions (e.g., `fn innerFunc()` declared inside `fn func()`) to top-level named `Function` nodes in the IR.

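
The suffix-based renaming can be pictured with a small sketch. This is not the compiler's implementation: the `Renamer` type and `fresh_scope` method are hypothetical, and the only detail borrowed from this document is the `_s0`, `_s1` suffix shape.

```rust
use std::collections::HashMap;

/// Hands out collision-free names by appending a numeric suffix.
struct Renamer {
    next_id: usize,
}

impl Renamer {
    fn new() -> Self {
        Renamer { next_id: 0 }
    }

    /// Builds a rename map for one scope (or one inlined body):
    /// every local in `locals` receives the same fresh `_s<N>` suffix.
    fn fresh_scope(&mut self, locals: &[&str]) -> HashMap<String, String> {
        let suffix = format!("_s{}", self.next_id);
        self.next_id += 1;
        locals
            .iter()
            .map(|name| (name.to_string(), format!("{}{}", name, suffix)))
            .collect()
    }
}
```

Two scopes that both declare `i` end up with `i_s0` and `i_s1`, so the flat string namespace of the IR never sees a clash.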
At optimization level O1+, the inliner renames all local variables in inlined function bodies (appending a unique suffix like `_s0`) to prevent collisions with variables at the call site. See [Visibility](/specs/semantics/visibility) for how `pub`, `pub ext`, and `pub mut` interact with scope boundaries and EVM dispatch. ## Visibility Visibility controls which items are accessible from outside their declaring scope. Edge has four visibility levels: | Modifier | Visibility | | --------- | ---------------------------------------------------------------------------------------------------------------------------------- | | *(none)* | **Private** — accessible only within the declaring scope (module or implementation block) | | `pub` | **Public** — accessible from sibling modules; inside a `contract` block, also generates a dispatch entry (equivalent to `pub ext`) | | `pub ext` | **External** — explicitly callable via EVM ABI dispatch; generates a function selector entry in the contract's dispatch table | | `pub mut` | **Mutable public** — accessible externally with write (state-mutating) permissions | ### Private (default) Items with no visibility modifier are private to their declaring module or implementation block. They are not accessible from other modules and are not exported in any ABI. ```edge fn helperFunction() -> u256 { // Only accessible within this module return 42; } ``` ### `pub` — Module-public The `pub` modifier makes an item accessible from other modules. It can also be used with `use` to re-export an imported item: ```edge mod moduleA { pub type TypeA = u256; } mod moduleB { // Re-export TypeA so it is accessible as moduleB::TypeA pub use super::moduleA::TypeA; } ``` ### `pub ext` — External (ABI-visible) The `pub ext` modifier marks a function as externally callable via the EVM ABI. 
The compiler generates a 4-byte function selector (the first 4 bytes of the keccak256 hash of the function signature) and adds a dispatch entry in the contract's runtime that routes incoming calls to this function.

```edge
contract Token {
    let balance: &s u256;

    pub ext fn transfer(to: addr, amount: u256) -> bool {
        // Callable by anyone via EVM CALL
        return true;
    }
}
```

Both `pub` and `pub ext` functions receive selector entries in the dispatch table. Inside a `contract` block, `pub fn` implicitly creates a dispatch entry (equivalent to `pub ext fn`). Private functions (no modifier) receive no selector entry; their code runs only when it is called from, or inlined into, a dispatched function.

:::warning
The `pub ext` modifier is parsed and used for dispatch table generation, but full ABI metadata emission (JSON ABI) is not yet implemented. The modifier correctly generates selector entries and calldata decoding in the compiled bytecode.
:::

### `pub mut` — Mutable public

The `pub mut` modifier marks a function as externally callable with permission to mutate contract state. This is a variant of external visibility that additionally signals state-mutating intent, which is intended to be reflected in the ABI and to inform tooling (e.g., transaction simulation, static analysis).

```edge
contract Owned {
    let owner: &s addr;

    pub mut fn setOwner(newOwner: addr) {
        // State-mutating external function
    }
}
```

:::warning
The `pub mut` modifier is parsed and treated as an external function for dispatch purposes. Distinguishing `pub ext` (view) from `pub mut` (mutating) in the generated ABI metadata is not yet fully implemented; both currently generate dispatch entries with identical codegen behavior.
:::

### Visibility and EVM dispatch

At the IR level, the contract runtime is structured as a dispatcher that reads the first 4 bytes of calldata (the function selector), matches it against registered selectors, and jumps to the corresponding function body. Only `pub`, `pub ext`, and `pub mut` functions generate selector entries.
Private functions (no modifier) are internal and may be inlined by the optimizer. The dispatch strategy depends on the number of public functions: | Public functions | Strategy | | ---------------- | -------------------------------------- | | \< 4 | Linear if-else chain — O(N) | | ≥ 4 | Balanced binary search tree — O(log N) | :::note The complete visibility rules, including interaction with trait implementations and cross-contract calls, are still being finalized in the specification. ::: ## Contributing This document covers contribution guidelines for [edge-rs](https://github.com/refcell/edge-rs). ### Issues Search [existing issues](https://github.com/refcell/edge-rs/issues) before opening a new one. When reporting a bug, include the `edgec --version` output, the source file that triggered the problem, and the full error output. ### Pull requests 1. Fork the repository and create a branch from `main`. 2. Build the project: `just build` (or `cargo build --workspace`). 3. Run the test suite before submitting: `just test` (or `cargo test --workspace`). 4. Run the linter: `just lint`. 5. Open a pull request against `main` with a clear description of the change. ### Development workflow The repository includes a [`Justfile`](https://github.com/casey/just) with common workflows: ```bash just build # build all crates just test # run all tests just lint # run all lints (format, clippy, deny, docs) just e2e # run end-to-end tests just bench # run benchmarks just check-examples # parse all example contracts just check-stdlib # parse all stdlib contracts just docs # serve the documentation site locally just docs-build # build the documentation site ``` :::note `just lint` runs four checks: `check-format` (requires `cargo +nightly fmt`), `check-clippy`, `check-deny`, and `check-docs`. All four must pass for CI to be green. 
::: ### Crate structure The compiler is organized into these crates: | Crate | Path | Purpose | | ------------------ | --------------------- | --------------------------------- | | `edge-lexer` | `crates/lexer/` | Tokenization | | `edge-parser` | `crates/parser/` | Parsing | | `edge-ast` | `crates/ast/` | AST types | | `edge-types` | `crates/types/` | Shared type definitions | | `edge-typeck` | `crates/typeck/` | Type checking | | `edge-diagnostics` | `crates/diagnostics/` | Error reporting | | `edge-ir` | `crates/ir/` | IR lowering + egglog optimization | | `edge-codegen` | `crates/codegen/` | Bytecode generation + optimizer | | `edge-driver` | `crates/driver/` | Pipeline orchestration | | `edge-lsp` | `crates/lsp/` | Language server | | `edge-evm-tests` | `crates/evm-tests/` | EVM test host | | `edge-bench` | `crates/bench/` | Benchmarks | | `edge-e2e` | `crates/e2e/` | End-to-end tests | For a detailed walkthrough of the compiler pipeline, see [Compiler Architecture](/compiler/overview). ### Labels Issues and PRs are tagged to indicate their status and area: * **bug** — something is broken * **enhancement** — new feature or improvement * **documentation** — docs-only changes * **good first issue** — suitable for first-time contributors ### Assistance For questions or discussion, open an issue on [GitHub](https://github.com/refcell/edge-rs/issues) or see the [Contact](/contact/contact) page for other ways to reach the team. ## Contact If none of the below methods work, reach out to [refcell](https://github.com/refcell) on [x.com @ andreaslbigger](https://x.com/andreaslbigger). ### Telegram channel :::note Telegram details have not been published in this repository yet. Until they are, use GitHub issues or X for project contact. ::: ### Opening an issue For bugs, compiler regressions, documentation fixes, or feature requests, open an issue in the [edge-rs issue tracker](https://github.com/refcell/edge-rs/issues). 
## Compiler architecture Edge compiles `.edge` source files into EVM bytecode through a multi-stage pipeline. The compiler uses **egglog** (equality saturation) as its core optimization framework at two distinct stages: once on the high-level IR and once on the emitted bytecode. ### Pipeline overview ```text Source (.edge) │ ▼ ┌──────────┐ │ Lexer │ crates/lexer/ └────┬─────┘ │ Vec ▼ ┌──────────┐ │ Parser │ crates/parser/ └────┬─────┘ │ Program (AST) ▼ ┌───────────────────┐ │ Import Resolution │ crates/driver/ (Compiler::resolve_imports) └────┬──────────────┘ │ Program (AST, stdlib items merged) ▼ ┌────────────┐ │ Type Check │ crates/typeck/ └────┬───────┘ │ (checked marker — errors gate pipeline, result discarded) ▼ ┌────────────────┐ │ AST → IR Lower │ crates/ir/src/to_egglog.rs └────┬───────────┘ │ EvmProgram (RcExpr trees) ▼ ┌───────────────────┐ │ Rust Pre-Passes │ var_opt, storage_hoist └────┬──────────────┘ │ EvmProgram (optimized) ▼ ┌──────────────────────┐ │ Egglog EqSat (IR) │ crates/ir/src/optimizations/*.egg └────┬─────────────────┘ [skipped at O0] │ EvmProgram (extracted best-cost) ▼ ┌─────────┐ │ Cleanup │ crates/ir/src/cleanup.rs └────┬────┘ │ EvmProgram (simplified state params) ▼ ┌──────────────────────┐ │ ASM Emit (optional) │ crates/codegen/ [if --emit=asm, exits here] └────┬─────────────────┘ │ EvmProgram ▼ ┌────────────────┐ │ Expr Compiler │ crates/codegen/src/expr_compiler.rs └────┬───────────┘ │ Vec ▼ ┌──────────────────────────┐ │ Egglog EqSat (Bytecode) │ crates/codegen/src/bytecode_opt/ └────┬─────────────────────┘ [skipped at O0] │ Vec (optimized) ▼ ┌───────────────────────┐ │ Subroutine Extraction │ crates/codegen/src/subroutine_extract.rs └────┬──────────────────┘ [size mode only, O2+] │ Vec ▼ ┌───────────┐ │ Assembler │ crates/codegen/src/assembler.rs └────┬──────┘ │ Vec (runtime bytecode) ▼ ┌─────────────────────┐ │ Constructor Wrapper │ crates/codegen/src/contract.rs └────┬────────────────┘ │ Vec (deployment bytecode) ▼ Final Output 
``` ### Crate map | Crate | Path | Purpose | | ------------------ | --------------------- | ----------------------------------------------------------- | | `edge-lexer` | `crates/lexer/` | Tokenization with context-sensitive disambiguation | | `edge-parser` | `crates/parser/` | Recursive descent + Pratt expression parsing | | `edge-ast` | `crates/ast/` | AST type definitions | | `edge-types` | `crates/types/` | Shared type definitions (tokens, spans, literals) | | `edge-typeck` | `crates/typeck/` | Type checking, storage layout, selector generation | | `edge-diagnostics` | `crates/diagnostics/` | Error reporting infrastructure | | `edge-ir` | `crates/ir/` | Egglog-based IR: lowering, optimization rules, extraction | | `edge-codegen` | `crates/codegen/` | EVM bytecode generation, bytecode optimizer, assembler | | `edge-driver` | `crates/driver/` | Pipeline orchestration, compiler config, session management | | `edge-lsp` | `crates/lsp/` | LSP server: parse + type check diagnostics with spans | | `edge-evm-tests` | `crates/evm-tests/` | EVM test host (revm-based) | | `edge-bench` | `crates/bench/` | Benchmarks | | `edge-e2e` | `crates/e2e/` | End-to-end tests | | `edgec` | `bin/edgec/` | CLI binary | | `edgeup` | `bin/edgeup/` | Installer and version manager binary | *** ### Stage 1: Lexing **File:** `crates/lexer/src/lexer.rs` Converts source text into tokens. The lexer tracks a `Context` (Global vs Contract) to disambiguate EVM type names from opcode names (e.g., `bytes32` as a type vs `byte` as an opcode). * Hex literal parsing (`0x`/`0b`) requires the `'0'` match arm to come before the generic digit arm * Lookback token enables context-sensitive tokenization * Outputs `Iterator>` ### Stage 2: Parsing **File:** `crates/parser/src/parser.rs` Recursive descent parser with Pratt parsing for operator precedence. Eagerly lexes all tokens into a `Vec` (dropping whitespace/comments) for O(1) random access. 
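
To make the Pratt technique concrete, here is a minimal, self-contained sketch of precedence-driven parsing over an eagerly lexed token vector. It is not the Edge parser: the `Token`, `Expr`, and `Parser` types and the two-operator grammar are hypothetical, chosen only to show how binding powers decide the tree shape.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(u64),
    Plus,
    Star,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Num(u64),
    Bin(char, Box<Expr>, Box<Expr>),
}

struct Parser {
    tokens: Vec<Token>, // eagerly lexed, so peeking is an O(1) index
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    /// Binding power of an infix operator: higher binds tighter.
    fn binding_power(tok: &Token) -> Option<(u8, char)> {
        match tok {
            Token::Plus => Some((1, '+')),
            Token::Star => Some((2, '*')),
            Token::Num(_) => None,
        }
    }

    /// Parses an expression, consuming only operators that bind
    /// tighter than `min_bp`.
    fn parse_expr(&mut self, min_bp: u8) -> Option<Expr> {
        let mut lhs = match self.tokens.get(self.pos)? {
            Token::Num(n) => Expr::Num(*n),
            _ => return None, // an expression must start with an operand
        };
        self.pos += 1;

        while let Some(tok) = self.tokens.get(self.pos) {
            let (bp, op) = match Self::binding_power(tok) {
                Some(pair) => pair,
                None => break,
            };
            if bp <= min_bp {
                break; // equal power stops here, giving left associativity
            }
            self.pos += 1;
            let rhs = self.parse_expr(bp)?;
            lhs = Expr::Bin(op, Box::new(lhs), Box::new(rhs));
        }
        Some(lhs)
    }
}
```

Calling `Parser::new(tokens).parse_expr(0)` on the tokens for `1 + 2 * 3` yields a `+` node whose right child is the `*` node, because `*` has the higher binding power.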
Key design decisions:

* SHL/SHR operand swap happens at parse time: `a << b` becomes `Bop(Shl, b, a)` to match EVM stack order
* Produces `Program { stmts: Vec, span }`

### Stage 3: Type checking

**File:** `crates/typeck/src/checker.rs`

Walks the AST, resolves and validates types, computes storage layouts (sequential slot assignment), and generates 4-byte function selectors as the first four bytes of `keccak256("name(type1,type2,...)")`. Errors gate the pipeline; no typed AST is threaded into later stages.

### Stage 4: AST → IR lowering

**File:** `crates/ir/src/to_egglog.rs` (\~1700 lines)

The most complex stage. Converts `edge_ast::Program` into `EvmProgram`, an IR designed for egglog equality saturation.

#### Core IR type: `EvmExpr`

The IR is a tree of `EvmExpr` nodes (25 variants), reference-counted as `RcExpr = Rc`:

| Category         | Variants |
| ---------------- | -------- |
| Leaf / Constants | `Const(EvmConstant, EvmType, EvmContext)`, `Selector(String)`, `StorageField(String, usize, EvmType)` |
| Leaf / Arguments | `Arg(EvmType, EvmContext)`, `Empty(EvmType, EvmContext)` |
| Variables        | `LetBind(name, init, body)`, `Var(name)`, `VarStore(name, val)`, `Drop(name)` |
| Operators        | `Bop(op, lhs, rhs)`, `Uop(op, arg)`, `Top(op, a, b, c)` — binary ops include `Add`, `Sub`, `Mul`, `Div`, `Mod`, `Exp`, `And`, `Or`, `Xor`, `Shl`, `Shr`, `Eq`, `Lt`, `Gt`, `LogAnd`, `LogOr`, and storage/mem reads; unary ops include `IsZero`, `Not`, `Clz`; ternary ops include `Select`, `CalldataCopy`, and storage/mem writes |
| Control flow     | `If(pred, inputs, then_body, else_body)`, `DoWhile(inputs, pred_and_body)` |
| Sequencing       | `Concat(first, second)`, `Get(tuple, index)` |
| EVM environment  | `EnvRead(EvmEnvOp, state)`, `EnvRead1(EvmEnvOp, arg, state)` |
| Functions        | `Function(name, in_type, out_type, body)`, `Call(name, args)`, `InlineAsm(inputs, hex_ops, num_outputs)` |
| Effects          | `Log(n, topics, offset, size, state)`, `Revert(offset, size, state)`, `ReturnOp(offset, size, state)`, `ExtCall(...)` |

#### `EvmContext` — context-sensitive optimization

Leaf nodes (`Const`, `Arg`, `Empty`) carry an `EvmContext` parameter that tells egglog where the expression lives. This enables context-sensitive rewrites without requiring a separate pass.

| Variant                       | Meaning                                                         |
| ----------------------------- | --------------------------------------------------------------- |
| `InFunction(String)`          | Inside a named function body                                    |
| `InBranch(bool, pred, input)` | Inside a conditional branch (true/false side, predicate, input) |
| `InLoop(input, pred_output)`  | Inside a do-while loop body                                     |

#### Key design decisions

* **State threading**: IR uses explicit `StateT` tokens for side-effect ordering. Codegen ignores state parameters entirely, relying on `Concat` sequencing instead.
* **Memory-backed variables**: `VarDecl` creates `LetBind`/`Var`/`VarStore`/`Drop` nodes. `LetBind(name, init, body)` allocates, `Var(name)` reads, `VarStore(name, val)` writes, `Drop(name)` marks lifetime end.
* **Store-forwarding at source**: First `VarStore` init is extracted directly into `LetBind` when safe, avoiding the `LetBind(x, 0, Concat(VarStore(x, real), ...))` pattern.
* **Function inlining**: Functions are inlined at call sites. `inline_depth` counter ensures `return` inside inlined functions produces just the value (not `RETURN` opcode).
* **Checked arithmetic**: User `+`, `-`, `*` lower to `OpCheckedAdd`/`OpCheckedSub`/`OpCheckedMul`. Internal compiler arithmetic (mapping slots, memory offsets) uses unchecked ops.
* **Mapping slots**: `keccak256(key . base_slot)` — MSTORE key at offset 0, MSTORE slot at offset 32, KECCAK256(0, 64).
* **DoWhile ordering**: `pred_and_body = Concat(body, cond)` — body side effects run before condition re-evaluation.

### Stage 5: Rust pre-passes

Two Rust passes run before egglog, handling transforms that pattern matching cannot express:

#### 5a: Variable optimization (`crates/ir/src/var_opt.rs`)

Counting-based transforms requiring occurrence analysis:

| Transform                 | Description                                                                           |
| ------------------------- | ------------------------------------------------------------------------------------- |
| Dead variable elimination | Remove `LetBind(x, pure_init, body)` where `x` has 0 reads                            |
| Single-use inlining       | Replace `Var(x)` with init value when read\_count == 1, not in loop                   |
| Constant propagation      | Propagate constant inits through multi-use variables                                  |
| Allocation analysis       | Decide stack vs memory mode per variable                                              |
| Early Drop insertion      | Insert `Drop(var)` before RETURN/REVERT in branches that don't reference the variable |
| Immutable var collection  | Emit `ImmutableVar` facts for egglog bound propagation                                |

**Allocation modes:**

* **Stack** (DUP-based, 3 gas/read): `write_count == 0`, not in loop, `read_count <= 8`
* **Memory** (MSTORE/MLOAD, 6 gas each): everything else
* Drop-based free-list reclaims memory slots

#### 5b: Storage hoisting (`crates/ir/src/storage_hoist.rs`)

LICM (Loop-Invariant Code Motion) for `SLoad`/`SStore` in loops:

1. Identifies constant-slot storage ops inside `DoWhile` bodies
2. Hoists them into `LetBind` locals before the loop
3. Replaces `SLoad`/`SStore` with `Var`/`VarStore` inside the loop body
4. Emits write-backs after loop exit

This pass is critical because it prevents egglog's storage forwarding rules from firing unsoundly across loop back-edges. Also performs straight-line `SStore` → `SLoad` forwarding and dead store elimination.

**Bail conditions:** `ExtCall` in loop body, nested loops.
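The four hoisting steps and the bail conditions can be sketched on a toy instruction list. This is illustrative Python, not the compiler's Rust: the tuple encoding, the `hoist_storage` name, and the `slot_N` local names are all hypothetical, and the sketch adds one conservative assumption of its own (it bails when any storage op uses a non-constant slot).

```python
def hoist_storage(body):
    """Hoist constant-slot storage ops out of one loop body.

    `body` is a list of tuples: ("sload", slot), ("sstore", slot, value),
    ("extcall",), ("loop", inner_body), or any other op (left untouched).
    Returns (pre_loop, new_body, post_loop), or None when the pass bails.
    """
    storage_ops = [op for op in body if op[0] in ("sload", "sstore")]
    # Bail conditions from the text: external calls and nested loops.
    if any(op[0] in ("extcall", "loop") for op in body):
        return None
    # Extra assumption of this sketch: a dynamic slot could alias a hoisted one.
    if any(not isinstance(op[1], int) for op in storage_ops):
        return None

    slots = []        # constant slots in first-use order
    written = set()   # slots that need a write-back after the loop
    for op in storage_ops:
        if op[1] not in slots:
            slots.append(op[1])
        if op[0] == "sstore":
            written.add(op[1])

    local = {slot: f"slot_{slot}" for slot in slots}
    # Step 2: one LetBind per slot, initialised by a single SLoad before the loop.
    pre_loop = [("letbind", local[s], ("sload", s)) for s in slots]
    # Step 3: storage ops inside the loop become Var / VarStore.
    new_body = []
    for op in body:
        if op[0] == "sload":
            new_body.append(("var", local[op[1]]))
        elif op[0] == "sstore":
            new_body.append(("varstore", local[op[1]], op[2]))
        else:
            new_body.append(op)
    # Step 4: write back only the slots the loop actually stored to.
    post_loop = [("sstore", s, ("var", local[s])) for s in slots if s in written]
    return pre_loop, new_body, post_loop
```

For a body that loads slot 0, stores to slot 0, and loads slot 1, the sketch produces two pre-loop bindings and a single write-back for slot 0.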
### Stage 6: Egglog equality saturation (stage 1 — IR level)

**Files:** `crates/ir/src/optimizations/*.egg`, `crates/ir/src/schedule.rs`, `crates/ir/src/costs.rs`

The IR is serialized to S-expressions (`sexp.rs`), fed into an egglog e-graph with 330 optimization rules across 12 rule files, and the best-cost result is extracted.

#### Schedule by optimization level

| Level | Schedule |
| ----- | -------- |
| O0    | Skip egglog entirely |
| O1    | `saturate(dead-code + range-analysis + type-propagation)` once, then 3× `peepholes → u256-const-fold → saturate(const-prop + u256-const-fold) → saturate(dead-code + range-analysis + type-propagation)` |
| O2    | `saturate(dead-code + range-analysis + type-propagation)` once, then 5× `peepholes → arithmetic-opt → u256-const-fold → saturate(const-prop + u256-const-fold) → storage-opt → memory-opt → saturate(dead-code + range-analysis + type-propagation) → cse-rules` |
| O3+   | Same as O2 but 10× iterations |

#### Cost model

Two modes controlled by `--optimize-for`:

| Expression                          | Gas cost | Size cost |
| ----------------------------------- | -------- | --------- |
| Cheap arith (ADD, SUB, LT, GT, ...) | 3        | 1         |
| Expensive arith (MUL, DIV, ...)     | 5        | 1         |
| CheckedAdd/CheckedSub               | 20       | 1         |
| CheckedMul                          | 30       | 1         |
| SLoad                               | 2100     | 1         |
| SStore                              | 5000     | 1         |
| TLoad/TStore                        | 100      | 1         |
| MLoad/MStore                        | 3        | 1         |
| Keccak256                           | 36       | 1         |
| Const/Selector                      | 3        | 1         |
| LetBind                             | 3        | 1         |
| Var                                 | 3        | 1         |
| ExtCall                             | 100      | 1         |
| Log                                 | 375      | 1         |
| If/DoWhile                          | 10       | 1         |

In size mode, every node costs 1, minimizing total node count. In gas mode, costs reflect EVM gas prices, so the extractor avoids expensive operations.
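To see how the two cost modes rank equivalent expressions, the sketch below sums per-node costs over a small tuple-encoded tree. It is illustrative Python with a hypothetical `cost` helper; the numbers come from the table above, except that `Shl` is assumed to fall in the "cheap arith" tier.

```python
# Per-node gas costs taken from the cost table; Shl = 3 is an assumption.
GAS_COST = {
    "Add": 3, "Sub": 3, "Shl": 3,      # cheap arithmetic
    "Mul": 5, "Div": 5,                # expensive arithmetic
    "CheckedAdd": 20, "CheckedMul": 30,
    "SLoad": 2100, "SStore": 5000,
    "Const": 3, "Var": 3,
}

def cost(expr, mode="gas"):
    """Total cost of a tuple-encoded tree such as ("Mul", ("Var", "x"), ("Const", 8)).

    Children are the tuple-valued fields; names and literal payloads are skipped.
    """
    op, *fields = expr
    own = GAS_COST[op] if mode == "gas" else 1   # size mode: every node costs 1
    return own + sum(cost(f, mode) for f in fields if isinstance(f, tuple))

mul_form = ("Mul", ("Var", "x"), ("Const", 8))
shl_form = ("Shl", ("Const", 3), ("Var", "x"))   # x * 8 rewritten as a shift
```

Under these assumptions the shift form is cheaper in gas mode (9 vs 11), while both forms tie at 3 nodes in size mode, so a size-mode extractor has no reason to prefer one over the other.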
#### Egglog optimization rules

##### Peepholes (`peepholes.egg` — ruleset: `peepholes`)

52 rules for algebraic simplification:

| Category               | Examples                                                    | Count | `:subsume` |
| ---------------------- | ----------------------------------------------------------- | ----- | ---------- |
| Identity removal       | `x + 0 → x`, `x * 1 → x`, `x - 0 → x`, `x / 1 → x`          | 7     | Yes        |
| Zero/annihilation      | `0 * x → 0`, `0 / x → 0`                                    | 4     | Yes        |
| SmallInt const-fold    | `SmallInt(i) + SmallInt(j) → SmallInt(i+j)` (all arith ops) | 5     | Yes        |
| Comparison fold        | `LT(SmallInt i, SmallInt j)` → `0` or `1`                   | 6     | Yes        |
| IsZero of constants    | `IsZero(0) → 1`, `IsZero(n) → 0`                            | 2     | Yes        |
| Boolean simplification | `true && x → x`, `false \|\| x → x`, etc.                   | 6     | Yes        |
| Double negation        | `IsZero(IsZero(x)) → x`, `NOT(NOT(x)) → x`                  | 2     | Yes        |
| Self-cancellation      | `x - x → 0`, `x ^ x → 0`, `x == x → 1`                      | 3     | No         |
| Select simplification  | `Select(c, x, x) → x`                                       | 1     | Yes        |
| Constant-condition If  | `If(true, ...) → then`, `If(false, ...) → else`             | 4     | Yes        |
| Reassociation          | `(x + i) + j → x + (i+j)`, `(x * i) * j → x * (i*j)`        | 6     | No         |
| Commutativity          | `ADD(a, b) ↔ ADD(b, a)` (for ADD, MUL, EQ, AND, OR, XOR)    | 6     | No         |

Checked arithmetic peepholes (8 additional rules): `CheckedAdd(x, 0) → x`, `CheckedMul(x, 1) → x`, `CheckedSub(x, x) → 0`, etc.
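A few of these rewrites can be illustrated with a plain bottom-up simplifier. This is a hedged Python sketch, not egglog: equality saturation keeps both sides of every rule in the e-graph and picks the cheapest form at extraction time, whereas the hypothetical `simplify` below rewrites destructively. Expressions are encoded as `(op, lhs, rhs)` tuples with `int` constants and `str` variable names.

```python
U256 = 2**256

def simplify(e):
    """Apply identity, annihilation, self-cancellation and constant-fold rules bottom-up."""
    if not isinstance(e, tuple):
        return e                                  # constant or variable name
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if isinstance(a, int) and isinstance(b, int): # constant folding (mod 2^256)
        if op == "Add":
            return (a + b) % U256
        if op == "Mul":
            return (a * b) % U256
    if op == "Add" and b == 0:                    # x + 0 -> x
        return a
    if op == "Add" and a == 0:                    # 0 + x -> x
        return b
    if op == "Mul" and (a == 0 or b == 0):        # 0 * x -> 0
        return 0
    if op == "Mul" and b == 1:                    # x * 1 -> x
        return a
    if op == "Mul" and a == 1:                    # 1 * x -> x
        return b
    if op == "Sub" and b == 0:                    # x - 0 -> x
        return a
    if op == "Sub" and a == b:                    # x - x -> 0
        return 0
    return (op, a, b)
```

For example, `simplify(("Add", ("Mul", "x", 1), 0))` collapses to `"x"`, and `simplify(("Mul", ("Add", 2, 3), "y"))` folds the inner constant to give `("Mul", 5, "y")`.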
##### Arithmetic optimization (`arithmetic.egg` — ruleset: `arithmetic-opt`)

31 rules for strength reduction and algebraic identities:

| Category              | Examples                                                                     | Count |
| --------------------- | ---------------------------------------------------------------------------- | ----- |
| MUL → SHL             | `val * 2^n → SHL(n, val)` (computed via `log2`)                              | 2     |
| DIV → SHR             | `val / 2^n → SHR(n, val)`                                                    | 1     |
| MOD → AND             | `val % 2^n → AND(val, 2^n - 1)`                                              | 1     |
| EXP reduction         | `x**0 → 1`, `x**1 → x`, `x**2 → x*x`, `x**3 → x*(x*x)`, `x**4 → (x*x)*(x*x)` | 5     |
| Bitwise identity      | `x & x → x`, `x \| x → x`                                                    | 2     |
| Bitwise zero/identity | `x ^ 0 → x`, `x & 0 → 0`, `x \| 0 → x`                                       | 6     |
| Shift by zero         | `SHL(0, e) → e`, `SHR(0, e) → e`                                             | 3     |
| EQ to ISZERO          | `EQ(e, 0) → IsZero(e)`                                                       | 2     |
| Absorption laws       | `x & (x \| y) → x`, `x \| (x & y) → x`                                       | 4     |
| Bitwise const-fold    | `AND/OR/XOR/SHL/SHR(SmallInt, SmallInt) → SmallInt`                          | 5     |

##### Storage optimization (`storage.egg` — ruleset: `storage-opt`)

16 rules for storage load/store optimization:

| Rule                 | Pattern                                    | Result                | `:subsume` |
| -------------------- | ------------------------------------------ | --------------------- | ---------- |
| Load-after-store     | `SLoad(slot, SStore(slot, val, st))`       | `val`                 | Yes        |
| Redundant store      | `SStore(slot, SLoad(slot, st), st)`        | `st`                  | Yes        |
| Dead store           | `SStore(slot, v, SStore(slot, v2, st))`    | `SStore(slot, v, st)` | Yes        |
| Cross-slot load      | `SLoad(s1, SStore(s2, v, st))` where s1≠s2 | `SLoad(s1, st)`       | No         |
| SLoad through MStore | `SLoad(slot, MStore(..))`                  | `SLoad(slot, state)`  | Yes        |
| SLoad through TStore | `SLoad(slot, TStore(..))`                  | `SLoad(slot, state)`  | Yes        |
| SLoad through Log    | `SLoad(slot, Log(..))`                     | `SLoad(slot, state)`  | Yes        |

Same 8 rules mirrored for `TLoad`/`TStore` (transient storage).
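The load-forwarding rows of the storage table amount to walking the state chain behind an `SLoad`. The sketch below is illustrative Python with a hypothetical `forward_sload` helper; it assumes state-threaded terms are tuples whose last field is the previous state, and it only treats two slots as distinct when both are integer constants.

```python
def forward_sload(slot, state):
    """Resolve SLoad(slot, state) against a chain of state-threaded writes."""
    while isinstance(state, tuple):
        kind = state[0]
        if kind == "SStore":
            _, written_slot, value, prev = state
            if written_slot == slot:
                return value                      # load-after-store
            if isinstance(written_slot, int) and isinstance(slot, int):
                state = prev                      # cross-slot: provably different slots
                continue
            break                                 # possible alias: stop forwarding
        if kind in ("MStore", "TStore", "Log"):
            state = state[-1]                     # these never touch persistent storage
            continue
        break                                     # unknown effect (e.g. ExtCall)
    return ("SLoad", slot, state)

# SStore(0, a) then MStore(0, m) then SStore(1, b), starting from initial state "S0".
chain = ("SStore", 1, "b", ("MStore", 0, "m", ("SStore", 0, "a", "S0")))
```

With this chain, a load of slot 0 skips the unrelated `SStore` and the `MStore` and resolves to `"a"`, while a load of slot 2 falls through to `("SLoad", 2, "S0")`.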
##### Memory optimization (`memory.egg` — ruleset: `memory-opt`)

6 rules mirroring storage optimization for memory:

* `MLoad(off, MStore(off, val, st)) → val`
* `MStore(off, MLoad(off, st), st) → st`
* `MStore(off, v, MStore(off, v2, st)) → MStore(off, v, st)`
* `MLoad` forwarded through `SStore`, `TStore`, `Log`

##### Constant propagation (`const_prop.egg` — ruleset: `const-prop`)

1 rule for propagating constants through immutable variable chains. Works in tandem with `u256-const-fold` inside a `saturate` loop to propagate freshly folded constants.

##### Function inlining (`inline.egg` — ruleset: `peepholes`)

1 rule for inlining small function bodies at call sites.

##### Dead code elimination (`dead_code.egg` — ruleset: `dead-code`)

45 rules combining `IsPure` analysis with elimination rewrites:

| Rule                 | Pattern                            | Result               |
| -------------------- | ---------------------------------- | -------------------- |
| Empty concat         | `Concat(Empty, rest)`              | `rest`               |
| Empty concat (right) | `Concat(inner, Empty)`             | `inner`              |
| Pure dead code       | `Concat(pure_inner, rest)`         | `rest`               |
| Nested pure dead     | `Concat(Concat(prev, pure), rest)` | `Concat(prev, rest)` |
| Dead variable        | `LetBind(x, pure_init, Drop(x))`   | `Empty`              |

**Pure expressions:** `Const`, `Arg`, `Empty`, `Selector`, `Var`, `Drop`, all `Uop`s, most `Bop`s (including `SLoad`, `TLoad`, `MLoad`), `Keccak256`, `Select`, `EnvRead`. Also `Concat(pure, pure)` and `If(pure, pure, pure)`.

**Not pure:** `SStore`, `TStore`, `MStore`, `MStore8`, `Log`, `Revert`, `ReturnOp`, `ExtCall`, `CheckedAdd/Sub/Mul` (can revert), `VarStore`, `LetBind`.
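The interaction between the purity lists and the `Concat` rules can be sketched as a whitelist check plus a recursive cleanup. This is illustrative Python: `PURE_OPS` is a partial, hand-picked subset of the pure expressions listed above, anything not on it is conservatively treated as impure, and `is_pure`/`drop_dead` are hypothetical names.

```python
# Partial whitelist drawn from the "Pure expressions" list; everything else is impure.
PURE_OPS = {
    "Const", "Arg", "Empty", "Selector", "Var", "IsZero", "Not", "Clz",
    "Add", "Sub", "Mul", "SLoad", "TLoad", "MLoad", "Keccak256", "Select",
    "EnvRead", "Concat", "If",
}
EMPTY = ("Empty",)

def is_pure(e):
    """A node is pure when its op is whitelisted and all tuple-valued children are pure."""
    op, *fields = e
    return op in PURE_OPS and all(is_pure(f) for f in fields if isinstance(f, tuple))

def drop_dead(e):
    """Remove pure (and empty) statements from Concat chains."""
    if e[0] != "Concat":
        return e
    first, rest = drop_dead(e[1]), drop_dead(e[2])
    if is_pure(first):
        return rest          # Concat(pure_inner, rest) -> rest, covers Concat(Empty, rest)
    if rest == EMPTY:
        return first         # Concat(inner, Empty) -> inner
    return ("Concat", first, rest)

store = ("SStore", ("Const", 0), ("Var", "x"))
prog = ("Concat", ("Add", ("Const", 1), ("Var", "x")), ("Concat", store, EMPTY))
```

Here `drop_dead(prog)` discards the unused `Add` and the trailing `Empty`, leaving only the `SStore`. A `CheckedAdd` in the same position would be kept, because it can revert.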
##### Range analysis (`range_analysis.egg` — ruleset: `range-analysis`)

64 rules combining lattice-based interval tracking with guarded rewrites:

**Lattice functions:**

* `upper-bound(EvmExpr) → i64` with `:merge (min old new)`
* `lower-bound(EvmExpr) → i64` with `:merge (max old new)`
* `u256-upper-bound`/`u256-lower-bound` for full U256 range
* `max-bits(EvmExpr) → i64`
* Relations: `NonZero(EvmExpr)`, `IsBool(EvmExpr)`

**ImmutableVar bound propagation:** When `ImmutableVar(name)` is asserted (by `var_opt`), all bounds from `LetBind` init are propagated to `Var(name)` reads. This enables checked arithmetic elision through variables.

**Analysis-guarded rewrites:**

* `x / x → 1` when `NonZero(x)`
* `x % x → 0` when `NonZero(x)`
* `IsZero(IsZero(x)) → x` when `IsBool(x)`
* `bool & 1 → bool` when `IsBool`

##### U256 constant folding (`u256_const_fold.egg` — ruleset: `u256-const-fold`)

28 rules for folding operations on `LargeInt` (U256) constants using a custom egglog U256 sort:

* Arithmetic: ADD, SUB, MUL, DIV, MOD, EXP
* Bitwise: AND, OR, XOR, SHL, SHR
* Comparison: EQ, LT, GT, ISZERO (6 rules with two branches each)
* Power-of-2 strength reduction on U256 values (MUL/DIV/MOD → SHL/SHR/AND)
* `LargeInt → SmallInt` normalization when value fits in i64

##### Checked arithmetic elision (`checked_arithmetic.egg` — ruleset: `range-analysis`)

27 rules combining elision and bound propagation:

| Rule                           | Guard                                       | Result        |
| ------------------------------ | ------------------------------------------- | ------------- |
| `CheckedAdd(a, b) → Add(a, b)` | `u256-add-no-overflow(upper(a), upper(b))`  | Unchecked add |
| `CheckedSub(a, b) → Sub(a, b)` | `u256-sub-no-underflow(lower(a), upper(b))` | Unchecked sub |
| `CheckedMul(a, b) → Mul(a, b)` | `u256-mul-no-overflow(upper(a), upper(b))`  | Unchecked mul |

Checked ops also propagate bounds (since no overflow is guaranteed): `CheckedAdd(a, b)` upper = upper(a) + upper(b), enabling cascading elision.
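Cascading elision for `CheckedAdd` can be illustrated with a small upper-bound analysis. This is a Python sketch under stated assumptions: expressions are `(op, a, b)` tuples with `int` constants and `str` variables, `bounds` is a hypothetical map of known variable upper bounds (standing in for the `upper-bound` lattice function), and `None` means "unknown".

```python
U256_MAX = 2**256 - 1

def upper(e, bounds):
    """Best known upper bound of an expression, or None when unknown."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return bounds.get(e)
    op, a, b = e
    ua, ub = upper(a, bounds), upper(b, bounds)
    if ua is None or ub is None:
        return None
    if op == "CheckedAdd":
        return min(ua + ub, U256_MAX)      # a checked add reverts instead of wrapping
    if op == "Add" and ua + ub <= U256_MAX:
        return ua + ub                     # only sound when the add cannot wrap
    return None

def elide(e, bounds):
    """Rewrite CheckedAdd to Add wherever the bounds rule out overflow."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e[0], elide(e[1], bounds), elide(e[2], bounds)
    if op == "CheckedAdd":
        ua, ub = upper(a, bounds), upper(b, bounds)
        if ua is not None and ub is not None and ua + ub <= U256_MAX:
            return ("Add", a, b)
    return (op, a, b)
```

With `bounds = {"i": 100, "j": 255}`, the nested expression `CheckedAdd(CheckedAdd(i, j), 1)` becomes `Add(Add(i, j), 1)`: the inner bound of 355 is what lets the outer check be dropped as well.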
##### Type propagation (`type_propagation.egg` — ruleset: `type-propagation`)

59 purely additive analysis rules populating `HasType` and `FunctionHasType` relations. No rewrites. Used by other passes for type-aware optimization.

##### CSE (`cse.egg` — ruleset: `cse-rules`)

No explicit rules. CSE is automatic via egglog's e-graph hash-consing. Commutativity rules in `peepholes.egg` ensure `ADD(a,b)` and `ADD(b,a)` share an e-class.

### Stage 7: Post-egglog cleanup

**File:** `crates/ir/src/cleanup.rs`

Two passes after egglog extraction:

1. **State simplification**: Replaces all nested state parameters (massive `SStore`/`SLoad` chains) with a simple `Arg(StateT)` sentinel. Codegen ignores state params entirely.
2. **Dead code after halt**: Removes unreachable code after `ReturnOp`/`Revert` in `Concat` chains.

Also runs straight-line `SStore` → `SLoad` forwarding one more time to catch patterns egglog's cross-slot rules couldn't handle.

### Stage 8: Expression compiler

**File:** `crates/codegen/src/expr_compiler.rs`

Walks the `EvmExpr` tree and emits EVM opcodes into an `Assembler`. Since the EVM is a stack machine, children are compiled first (postorder), then the operator.

**Key state:**

* `let_bindings: HashMap` — variable name → memory offset
* `next_let_offset: usize` — high-water mark starting at `0x80`
* `free_slots: Vec` — reclaimed by `Drop`
* `stack_vars: HashMap` — maps variable name to absolute stack position when it was pushed
* `stack_depth: usize` — tracks current stack depth for DUP indexing
* `overflow_revert_label` — shared trampoline for all checked arithmetic

**Memory layout:** Fixed offsets starting at `0x80`. No free memory pointer — the codegen never reads `0x40`. Memory slots are reclaimed via `free_slots` after `Drop`, so memory usage is not monotonically increasing — slots are reused for subsequent variables in the same scope.
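The slot-reuse behaviour described under **Memory layout** can be sketched as a tiny allocator. The field names mirror the key state listed above, but the class itself is hypothetical Python, and two details are assumptions of the sketch: each variable occupies one 32-byte word, and freed slots are reused in last-in-first-out order.

```python
class SlotAllocator:
    """Fixed-offset memory slots with a free list, starting at 0x80."""

    def __init__(self, base=0x80, word=32):
        self.next_let_offset = base   # high-water mark
        self.word = word              # assumed slot size in bytes
        self.free_slots = []          # offsets reclaimed by Drop
        self.let_bindings = {}        # variable name -> memory offset

    def bind(self, name):
        """Allocate a slot for `name`, preferring a reclaimed one."""
        if self.free_slots:
            offset = self.free_slots.pop()
        else:
            offset = self.next_let_offset
            self.next_let_offset += self.word
        self.let_bindings[name] = offset
        return offset

    def drop(self, name):
        """End the variable's lifetime and return its slot to the free list."""
        self.free_slots.append(self.let_bindings.pop(name))
```

Binding `a` and `b` yields offsets `0x80` and `0xa0`. After `drop("a")`, a new variable `c` lands back at `0x80` and the high-water mark stays at `0xc0`, which is why memory usage does not grow monotonically.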
**Checked arithmetic codegen:**

* `CheckedAdd`: `b > result` overflow detection (6 extra opcodes)
* `CheckedSub`: `a < b` pre-check (5 extra opcodes)
* `CheckedMul`: `result/a != b` with `a==0` short-circuit (\~12 extra opcodes)
* Shared `overflow_revert` trampoline (`PUSH0, PUSH0, REVERT`) emitted once

**Branch handling:** `compile_if` saves/restores `stack_vars`, `let_bindings`, `free_slots` per branch so that Drop/slot-reuse in one branch doesn't affect the other. Halting branches get special stack depth handling.

### Stage 9: Bytecode optimization (stage 2 — egglog)

**Files:** `crates/codegen/src/bytecode_opt/`

A second egglog pass operating on `AsmInstruction` sequences. Splits code into basic blocks, optimizes each through egglog, and reassembles.

#### Schedule

| Level | Schedule                                                                                   |
| ----- | ------------------------------------------------------------------------------------------ |
| O0    | None                                                                                       |
| O1    | 3× `bytecode-peepholes → bytecode-dead-push`                                               |
| O2    | 5× `bytecode-peepholes → bytecode-const-fold → bytecode-strength-red → bytecode-dead-push` |
| O3+   | 10× all rulesets                                                                           |

#### Bytecode rewrite rules (\~66 rules)

| Category              | Examples                                                          | Count |
| --------------------- | ----------------------------------------------------------------- | ----- |
| DUP dedup             | `PUSH x, PUSH x → PUSH x, DUP1`                                   | 3     |
| Cancellation          | `SWAPn SWAPn → ε`, `NOT NOT → ε`, `DUPn POP → ε`                  | 8     |
| Commutative swap elim | `SWAP1 ADD → ADD` (also MUL, AND, OR, XOR, EQ)                    | 6     |
| Const fold            | `PUSH(i) PUSH(j) ADD → PUSH(i+j)` (8 ops)                         | 10    |
| Strength reduction    | `PUSH(0) ADD → ε`, `PUSH(1) MUL → ε`, `PUSH(2) MUL → PUSH(1) SHL` | \~20  |
| MOD → AND             | `PUSH(2^n) MOD → PUSH(2^n - 1) AND`                               | 8     |
| Dead push             | `PUSH(x) POP → ε`                                                 | 3     |

**Pre-pass:** Dead code elimination after `RETURN`/`REVERT`/`STOP`.

**Post-pass:** Label aliasing (consecutive labels → keep last).
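Four of the bytecode rewrites can be shown with a sliding-window peephole pass over one basic block. This is a hand-written Python sketch rather than the egglog rules themselves: instructions are tuples such as `("PUSH", 5)` or `("ADD",)`, the block is assumed to contain no labels or jumps, and `peephole_once`/`optimize` are hypothetical names.

```python
COMMUTATIVE = {"ADD", "MUL", "AND", "OR", "XOR", "EQ"}
MASK = 2**256 - 1

def peephole_once(code):
    """One left-to-right pass applying const fold, dead push, swap elim and PUSH 0 ADD."""
    out, i = [], 0
    while i < len(code):
        a = code[i]
        b = code[i + 1] if i + 1 < len(code) else None
        c = code[i + 2] if i + 2 < len(code) else None
        if a[0] == "PUSH" and b is not None and b[0] == "PUSH" and c == ("ADD",):
            out.append(("PUSH", (a[1] + b[1]) & MASK))   # PUSH i, PUSH j, ADD -> PUSH i+j
            i += 3
        elif a[0] == "PUSH" and b == ("POP",):
            i += 2                                       # PUSH x, POP -> nothing
        elif a == ("SWAP1",) and b is not None and b[0] in COMMUTATIVE:
            out.append(b)                                # SWAP1, ADD -> ADD
            i += 2
        elif a == ("PUSH", 0) and b == ("ADD",):
            i += 2                                       # PUSH 0, ADD -> nothing
        else:
            out.append(a)
            i += 1
    return out

def optimize(code, rounds=5):
    """Repeat the pass up to `rounds` times, stopping early at a fixed point."""
    for _ in range(rounds):
        new = peephole_once(code)
        if new == code:
            break
        code = new
    return code
```

The `rounds` parameter loosely mirrors the 3×/5×/10× iteration counts in the schedule table, although the real schedule runs rulesets, not a single fused pass.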
#### Bytecode cost model

| Tier        | Gas   | Opcodes                                                                                |
| ----------- | ----- | -------------------------------------------------------------------------------------- |
| Gzero       | 0     | STOP, RETURN, REVERT, INVALID                                                          |
| Gbase       | 2     | POP, ADDRESS, ORIGIN, CALLER, CALLVALUE, ...                                           |
| Gverylow    | 3     | ADD, SUB, LT, GT, EQ, AND, OR, XOR, SHL, SHR, MLOAD, MSTORE, PUSH, DUP, SWAP (default) |
| Glow        | 5     | MUL, DIV, SDIV, MOD, SMOD, SIGNEXTEND                                                  |
| Medium      | 8     | ADDMOD, MULMOD                                                                         |
| KECCAK256   | 36    |                                                                                        |
| EXP         | 60    |                                                                                        |
| Warm access | 100   | BALANCE, EXTCODESIZE, TLOAD, TSTORE, CALL, ...                                         |
| SLOAD       | 2100  |                                                                                        |
| SSTORE      | 5000  |                                                                                        |
| LOG         | 750   |                                                                                        |
| CREATE      | 32000 |                                                                                        |

### Stage 10: Subroutine extraction

**File:** `crates/codegen/src/subroutine_extract.rs`

Size-mode only (O2+). Detects repeated instruction sequences and extracts them into JUMP-based subroutines, trading \~30 gas/call for code size reduction.

**Algorithm:**

1. Find straight-line regions (between labels/jumps)
2. Find all repeated subsequences (min 3 occurrences, min 15 bytes, min 5 instructions)
3. Greedy selection of most profitable non-overlapping candidates
4. Rewrite: replace inline code with calls, append subroutine bodies

**Calling convention:** `PushLabel(ret) + JumpTo(sub) + JUMPDEST(ret)`. Subroutine uses SWAP chains for stack management. `PushLabel(label)` only emits `PUSH addr` (the return address), while `JumpTo(label)` emits `PUSH addr + JUMP`. The distinction matters: `PushLabel` is used to push a return address onto the stack *before* jumping, while `JumpTo` performs the actual transfer of control.

**Profitability formula:**

```
savings = N * byte_size - (byte_size + sub_overhead) - N * call_overhead
```

Where `call_overhead = 8` bytes and `sub_overhead = 2 + inputs + outputs`. Only candidates with `savings > 0` and ≥ 3 non-overlapping occurrences are extracted.
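As a worked instance of the profitability formula, the sketch below evaluates it for a few candidates. The formula and the constants come from the text above; the function names are hypothetical, and the separate candidate filters (minimum 15 bytes, minimum 5 instructions) are not modelled.

```python
CALL_OVERHEAD = 8  # bytes added at every call site

def savings(n, byte_size, inputs, outputs):
    """Bytes saved by extracting a sequence of `byte_size` bytes that occurs `n` times."""
    sub_overhead = 2 + inputs + outputs
    return n * byte_size - (byte_size + sub_overhead) - n * CALL_OVERHEAD

def is_profitable(n, byte_size, inputs, outputs):
    """Extraction requires positive savings and at least 3 occurrences."""
    return n >= 3 and savings(n, byte_size, inputs, outputs) > 0
```

A 15-byte sequence with one input and one output that occurs 3 times saves `45 - 19 - 24 = 2` bytes, so it barely qualifies. With two inputs and two outputs the savings drop to 0 and the candidate is rejected, while a 20-byte sequence occurring 5 times with a single output saves 37 bytes.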
### Stage 11: Assembler

**File:** `crates/codegen/src/assembler.rs`

Converts `AsmInstruction` sequences into final bytecode with label resolution:

* **Short jumps**: `PUSH1` for contracts \< 256 bytes, `PUSH2` otherwise
* **Two-pass assembly**: first pass computes label offsets, second pass emits bytes
* **`AsmInstruction` variants**: `Op(Opcode)`, `Push(Vec)`, `Label(String)`, `JumpTo(String)`, `JumpITo(String)`, `PushLabel(String)`, `Comment(String)`, `Raw(Vec)` — `Raw` carries verbatim bytecode bytes (used by `InlineAsm` IR nodes to embed hand-coded EVM sequences); the assembler emits these bytes as-is and the optimizer never touches them.
* **Label generation**: `fresh_label(prefix)` generates `"{prefix}_{N}"` strings using a monotonic counter, used for all synthetic jump targets (subroutine return addresses, if/else branches, etc.)

### Stage 12: Constructor wrapper

**File:** `crates/codegen/src/contract.rs`

Produces two-part deployment bytecode:

1. **Constructor** (init code): Runs constructor body, then `CODECOPY` + `RETURN` to deploy runtime
2. **Runtime**: Dispatcher + inlined function bodies

**Dispatcher:** IR-driven dispatch — the selector if-else chain is encoded in the IR itself. `ExprCompiler` compiles it directly to EVM opcodes via `compile_expr(contract.runtime)`. There is no separate binary-search or jump-table builder; the dispatch shape is determined by the IR lowering pass upstream.
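Returning to the two-pass label resolution described for the assembler in Stage 11, the idea can be sketched in a few lines. This is illustrative Python with a simplified instruction encoding: it assumes every label emits a `JUMPDEST` byte and always uses `PUSH2` for label addresses, so it does not model the `PUSH1` short-jump selection, `JumpITo`, `PushLabel`, `Comment`, or `Raw`.

```python
JUMPDEST, PUSH2, JUMP = 0x5B, 0x61, 0x56   # standard EVM opcode values

def assemble(instrs):
    """Assemble ("op", byte), ("label", name) and ("jumpto", name) into bytecode."""
    # Pass 1: compute the byte offset of every label.
    offsets, pc = {}, 0
    for ins in instrs:
        if ins[0] == "label":
            offsets[ins[1]] = pc
            pc += 1          # JUMPDEST
        elif ins[0] == "jumpto":
            pc += 4          # PUSH2 <hi> <lo> JUMP
        else:
            pc += 1          # single-byte opcode
    # Pass 2: emit bytes, now that forward references can be resolved.
    out = bytearray()
    for ins in instrs:
        if ins[0] == "label":
            out.append(JUMPDEST)
        elif ins[0] == "jumpto":
            out.append(PUSH2)
            out += offsets[ins[1]].to_bytes(2, "big")
            out.append(JUMP)
        else:
            out.append(ins[1])
    return bytes(out)

program = [("jumpto", "end"), ("op", 0x00), ("label", "end"), ("op", 0x00)]
```

The forward jump to `end` is only resolvable because pass 1 has already placed the label at offset 5, after the 4-byte jump sequence and the skipped `STOP`.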
### Egglog advanced features

The compiler makes heavy use of egglog's advanced capabilities:

| Feature             | Usage                                                                                                    |
| ------------------- | -------------------------------------------------------------------------------------------------------- |
| Merge functions     | `upper-bound` uses `:merge (min old new)`, `lower-bound` uses `:merge (max old new)` — lattice semantics |
| Subsumption         | `:subsume` on identity removal, constant folding, annihilation rules to keep e-graph lean                |
| Computed functions  | `log2`, `&`, bitwise ops for generalized power-of-2 detection                                            |
| Custom sorts        | `U256Sort` for full 256-bit arithmetic in egglog                                                         |
| Sentinel context    | `InFunction("__opt__")` for self-cancellation rules (`x-x→0`)                                            |
| Analysis scheduling | `saturate(seq(run dead-code)(run range-analysis))` before optimization rulesets                          |
| Dual cost models    | Parameterized `:cost` annotations on the schema, switched at compile time                                |

### Summary statistics

| Component                | Rule count      | Ruleset                                          |
| ------------------------ | --------------- | ------------------------------------------------ |
| `peepholes.egg`          | 52              | `peepholes`                                      |
| `arithmetic.egg`         | 31              | `arithmetic-opt`                                 |
| `checked_arithmetic.egg` | 27              | `range-analysis`, `peepholes`, `u256-const-fold` |
| `storage.egg`            | 16              | `storage-opt`                                    |
| `memory.egg`             | 6               | `memory-opt`                                     |
| `dead_code.egg`          | 45              | `dead-code`                                      |
| `range_analysis.egg`     | 64              | `range-analysis`                                 |
| `u256_const_fold.egg`    | 28              | `u256-const-fold`                                |
| `type_propagation.egg`   | 59              | `type-propagation`                               |
| `const_prop.egg`         | 1               | `const-prop`                                     |
| `inline.egg`             | 1               | `peepholes`                                      |
| `cse.egg`                | 0 (implicit)    | `cse-rules`                                      |
| **Stage 1 total**        | **330**         | **12 rulesets**                                  |
| Bytecode rules           | \~66            | 4 rulesets                                       |
| **Grand total**          | **\~396 rules** | **16 rulesets**                                  |

## Quickstart

This page gives you the fastest path from a fresh checkout to compiling an `.edge` contract. For a deeper walkthrough of the internals, see [Compiler Architecture](/compiler/overview).
### Install

Install `edgeup`, the Edge toolchain manager:

```bash
curl -fsSL https://raw.githubusercontent.com/refcell/edge-rs/main/etc/install.sh | sh
```

Then install the Edge compiler:

```bash
edgeup install
```

`edgeup` detects your shell (bash, zsh, or fish) and appends the toolchain directory to your `PATH` automatically. Restart your shell or run the printed `source` command before continuing.

**Supported platforms:** Linux x86\_64, macOS x86\_64, macOS arm64 (Apple Silicon). Windows is not supported.

### Compile a contract

Pass a source file directly to `edgec` to compile it. By default, the compiler prints EVM bytecode as a hex string to stdout.

```bash
edgec examples/counter.edge                   # print hex bytecode to stdout
edgec examples/counter.edge -o counter.bin    # write raw bytecode bytes to a file
edgec -v examples/expressions.edge            # compile with WARN-level logging
```

### Inspect intermediate stages

The compiler can stop after earlier phases of the pipeline using subcommands or the `--emit` flag:

```bash
# Subcommands
edgec lex examples/counter.edge      # print tokens
edgec parse examples/counter.edge    # print AST
edgec check examples/counter.edge    # type-check only, no output

# --emit flag variants
edgec --emit tokens examples/counter.edge      # lex only
edgec --emit ast examples/counter.edge         # parse only
edgec --emit ir examples/counter.edge          # IR (s-expression)
edgec --emit pretty-ir examples/counter.edge   # IR (pretty-printed)
edgec --emit asm examples/counter.edge         # pre-final assembly
edgec --emit bytecode examples/counter.edge    # EVM bytecode (default)
```

### ABI output

Passing `--emit abi` prints the contract ABI as JSON to stdout, compatible with the Ethereum ABI specification. This is useful for generating interface files consumed by frontends, deployment scripts, and other tooling.

```bash
edgec --emit abi examples/counter.edge
# [{"type":"function","name":"increment","inputs":[],"outputs":[],"stateMutability":"view"}, ...]
```

### Standard JSON I/O

The `edgec standard-json` command implements the standard JSON IPC protocol used by Foundry and solc-compatible toolchains. It reads a JSON request from stdin containing source files and compiler settings, compiles every source, and writes a JSON response to stdout with ABI and bytecode fields for each contract. The command always exits 0; compilation errors are reported inside the JSON response rather than as a non-zero exit code. This is the interface that the `foundry-compilers` crate uses to drive external compilers.

```bash
echo '{"language":"Edge","sources":{"counter.edge":{"content":"..."}}}' | edgec standard-json
# {"sources":{"counter.edge":{"id":0}},"contracts":{"counter.edge":{"Counter":{"abi":[...],"evm":{"bytecode":{"object":"604d..."},...}}}}}
```

### Optimization flags

Control the optimization level and target metric:

```bash
edgec -O2 examples/counter.edge                   # optimization level 0–3 (default: 0)
edgec --optimize-for size examples/counter.edge   # optimize for bytecode size
edgec --optimize-for gas examples/counter.edge    # optimize for gas cost (default)
```

### Standard library path

The compiler embeds the Edge standard library at build time. To use a local checkout of the stdlib instead, set `--std-path` or the `EDGE_STD_PATH` environment variable:

```bash
edgec --std-path ./std examples/counter.edge
EDGE_STD_PATH=./std edgec examples/counter.edge
```

### Language server

Edge ships an LSP server for editor integration. Start it with:

```bash
edgec lsp
```

The server communicates over stdin/stdout and provides parse and type-check diagnostics with precise source spans.

:::warning
Hover, completions, and go-to-definition are not yet implemented.
:::

### Explore the reference programs

The repository includes a growing set of example contracts under [`examples/`](https://github.com/refcell/edge-rs/tree/main/examples), ranging from small syntax samples to larger token-style contracts.