Chester Reader Architecture

Design of Chester’s parsers (“readers”) that transform source code into abstract syntax trees.

Overview

Chester currently has two parser implementations:

ReaderV1: The original parser using FastParse combinators
ReaderV2: The newer implementation using a token-based state machine

Both parsers produce semantically identical ASTs using different internal approaches.

Core Design Principles

Context-Free Parsing: Uniform rules for all expressions; identifiers treated consistently
Separation of Concerns: Parse syntax without imposing semantics
Uniform Symbol Treatment: No special keywords - just identifiers and operators
Flat Operator Sequences: Operator precedence handled later in the semantic phase
Newline Significance: }\n terminates expressions in blocks
Block Return Values: Last expression in a block is its return value

ReaderV1 Implementation

ReaderV1 uses the FastParse library to implement a parser combinator approach.

Key Components

TokenParsers: Small parsers for basic lexemes (identifiers, literals, operators)
Combinators: Composable functions that build larger parsers from smaller ones
ParsingContext: Tracks parsing state (e.g., whether currently in an operator sequence)
ExprMeta: Metadata handling for source positions and comments

Characteristics

Declarative grammar definitions
FastParse-based error reporting
Recursive descent parsing model

Implementation Structure

ReaderV1 consists of:

Expression Parsers: Methods like parseExpr, parseAtom, and parseOperator form the core of the parser. They use FastParse combinators to build complex parsers from simpler ones.
Context Tracking: A ParsingContext object tracks the current parsing state, including whether we’re in an operator sequence, a block, or other specialized contexts.
Source Position Tracking: Dedicated methods map character positions to line/column positions for error reporting, with special handling for UTF-16 surrogate pairs.
Whitespace and Comment Handling: Dedicated parsers for whitespace, line endings, and comments ensure these elements are preserved in the AST.
Parser Extensions: Custom extension methods for FastParse parsers add support for metadata attachment, relaxed parsing, and error recovery.
Parser Composition: The implementation composes smaller parsers into larger ones, following FastParse’s combinator approach.

ReaderV2 Implementation

ReaderV2 uses a custom tokenizer and a state machine-based approach for parsing, with significant improvements to block termination detection and object expression parsing.

Key Components

Lexer: Converts source code into a stream of tokens for efficient parsing
ReaderState: Tracks current token position, history, and pending whitespace/comments
ReaderContext: Contains context flags like newLineAfterBlockMeansEnds for parsing decisions
Token: Represents tokens like identifiers, operators, literals, with source position information
Token Handlers: Specialized methods for parsing different token types and structures

Characteristics

Pre-tokenization for efficient token stream processing
Separate lexing and parsing phases for cleaner code organization
Context-aware parsing with explicit state tracking
Enhanced UTF-16 aware Unicode and emoji handling
Robust block termination detection with the }\n pattern
Comprehensive object expression support with multiple key types
Optimized comment handling and attachment

Implementation Structure

ReaderV2 consists of:

Two-Phase Parsing: Separates tokenization from parsing, with a dedicated Tokenizer creating a stream of tokens before parsing begins.
State Management: The parser maintains state through two complementary objects:
- ReaderState: Tracks token position, history, and pending whitespace/comments
- ReaderContext: Contains context flags like newLineAfterBlockMeansEnds for syntactic decisions
- Together they enable precise tracking of parser state and contextual information
Context-Aware Processing: Context flags enable important syntactic decisions like proper block termination with the }\n pattern, while maintaining uniform symbol treatment.
Optimized Comment Handling: Non-recursive methods like skipComments() and pullComments() efficiently manage comment attachment, replacing the previous recursive approach.
Robust Block Termination: The special }\n pattern detection is implemented in the checkForRBraceNewlinePattern() method, which uses the newLineAfterBlockMeansEnds flag from ReaderContext to determine when blocks should end.
Enhanced Object Expressions: Support for multiple key types:
- Identifier keys (e.g., { x = 1 })
- String literal keys (e.g., { "x" = 1 })
- Symbol literal keys (e.g., { 'x = 1 })
- Both = and => operators in object clauses
Error Handling: The parser produces structured ParseError objects with detailed source position information and recovery mechanisms.
Bottom-Up Construction: Parsing builds expressions from atoms and then extends them through continuation-based parsing in parseRest().

Key Similarities Between Implementations

Both parsers:

Track source positions for error reporting
Preserve comments in the AST
Handle the }\n block termination pattern
Produce flat operator sequences without precedence handling
Parse the same language syntax
Use context tracking for parsing decisions
Generate identical AST structures

Key Differences Between Implementations

Feature	ReaderV1	ReaderV2
Parsing Approach	Parser combinators (FastParse)	Token-based state machine
Error Recovery	Limited	Enhanced with token-based recovery
Token Creation	On-demand during parsing	Separate tokenization phase
State Handling	Implicit in parse context	Explicit in ReaderState
Code Structure	Grammar-centric	Process-centric
Performance	Good	Better (especially on large files)
Unicode Support	Basic	Enhanced with better UTF-16 handling

Chester’s test framework validates parser correctness and compatibility between V1 and V2 implementations. This framework, defined in reader/shared/src/test/scala/chester/reader/parseAndCheck.scala, provides several key testing functions:

Core Testing Functions

Parser-Specific Testing:
- parseV1(input): Parses input with V1 parser only and returns the result
- parseV2(input): Parses input with V2 parser only and returns the result
- parseAndCheckV1(input, expected): Tests V1 parser against expected output
- parseAndCheckV2(input, expected): Tests V2 parser against expected output
Cross-Parser Verification:
- parseAndCheckBoth(input, expected): Tests both parsers and ensures they produce identical results
- Tests backward compatibility and feature parity
Top-Level Parsing:
- parseTopLevelV1/V2 and parseAndCheckTopLevelV1/V2/Both: Similar functions for testing top-level parsing
- Handle file-level parsing with multiple expressions

Error Reporting

The testing framework provides error reporting with:

Detailed error messages showing exact failure position
Visual pointer to error location in source code
Context-aware error descriptions
Comparison between expected and actual AST structures

Serialization Verification

The framework also tests that parsed expressions can be correctly serialized and deserialized:

Verifies JSON serialization with read[Expr](write[Expr](value))
Confirms binary serialization with readBinary[Expr](writeBinary[Expr](value))
Ensures AST structures maintain integrity through serialization cycles

Test Organization

Parser tests are organized into several categories:

Expression Tests: Verify parsing of individual expression types
Integration Tests: Test combined language features
Regression Tests: Ensure previously fixed issues don’t reoccur
Migration Tests: Track progress in supporting V1 features in V2

File-Based Testing

In addition to the core testing functions, Chester implements file-based integration tests:

FileParserTest.scala: Tests ReaderV2 against a suite of test files in tests/parser directory
FileParserTestV1.scala: Tests ReaderV1 against the same test suite for comparison

These file-based tests:

Ensure consistency when parsing complete Chester files
Verify parser behavior across a wide range of syntax combinations
Automatically generate expected output for regression testing
Maintain backward compatibility during parser evolution

Future Development

ReaderV2 is the focus of ongoing development, with priorities including:

Completing error recovery implementation
Adding source maps support
Migrating any remaining V1-only tests
Expanding test coverage
Optimizing token handling for better performance

See devlog.md for chronological implementation details.

The Chester Programming Language

Chester Reader Architecture

Overview

Core Design Principles

ReaderV1 Implementation

Key Components

Characteristics

Implementation Structure

ReaderV2 Implementation

Key Components

Characteristics

Implementation Structure

Key Similarities Between Implementations

Key Differences Between Implementations

Testing Infrastructure

Core Testing Functions

Error Reporting

Serialization Verification

Test Organization

File-Based Testing

Future Development