mirror of https://github.com/yhirose/cpp-peglib.git synced 2026-05-14 02:38:10 -04:00

History

yhirose dffbcb6edd Add language independent test framework (#339 )		2026-04-19 21:49:18 -04:00
..
mini-js	Add language independent test framework (#339 )	2026-04-19 21:49:18 -04:00
schema	Add language independent test framework (#339 )	2026-04-19 21:49:18 -04:00
tests	Add language independent test framework (#339 )	2026-04-19 21:49:18 -04:00
README.md	Add language independent test framework (#339 )	2026-04-19 21:49:18 -04:00

README.md

PEG Parser Test Specification

A language-independent test suite for PEG parser implementations that support the cpp-peglib extended PEG grammar syntax.

Purpose

This specification defines a common set of test cases in JSON format that any conforming PEG parser implementation can use to verify compatibility. The same .peg grammar files and test inputs should produce identical results across C++, Rust, Go, or any other language implementation.

Directory Structure

spec/
  schema/
    test-case.schema.json       # JSON Schema for test case files
  tests/
    core/                       # Standard PEG operators
      sequence.json
      choice.json
      repetition.json
      literal.json
      character_class.json
      lookahead.json
      token_boundary.json
      ignore.json
      dot.json
    extensions/                 # cpp-peglib extended syntax
      whitespace.json
      word.json
      cut.json
      dictionary.json
      macro.json
      precedence.json
      capture_backreference.json
      left_recursion.json
      case_insensitive.json
      hex_unicode.json
    error/                      # Error reporting and recovery
      error_report.json
      error_recovery.json
      error_message.json
      infinite_loop.json
    ast/                        # AST generation and optimization
      ast_generation.json
      ast_optimization.json
    semantic/                   # Semantic actions
      token_conversion.json
      choice_detection.json
      arithmetic.json
      enter_leave.json
      predicate.json
      state_variables.json
    packrat/                    # Packrat parsing
      packrat.json
    unicode/                    # Unicode support
      unicode.json

Test Case Format

Each JSON file contains an array of test cases. A minimal test case specifies a grammar, input, and expected result:

[
  {
    "name": "basic_sequence",
    "grammar": "S <- 'hello' ' ' 'world'",
    "cases": [
      { "input": "hello world", "match": true },
      { "input": "hello", "match": false }
    ]
  }
]

Grammar

The grammar field contains PEG grammar text, exactly as it would appear in a .peg file. Multi-line grammars use \n for line breaks.

Cases

Each entry in cases specifies:

Field	Type	Description
`input`	string	Input text to parse
`match`	boolean	Whether parsing should succeed
`match_length`	integer	(optional) Expected number of characters consumed
`expected_value`	any	(optional) Expected semantic value from the start rule
`expected_ast`	string	(optional) Expected AST text dump
`expected_trace`	string[]	(optional) Expected enter/leave event sequence
`expected_error`	object	(optional) Expected error `{ "line": N, "col": N, "message": "..." }`

Semantic Actions

The actions field defines semantic actions declaratively:

{
  "grammar": "Expr <- Term (('+' / '-') Term)*\nTerm <- < [0-9]+ >",
  "actions": {
    "Term": { "op": "token_to_number", "type": "int" },
    "Expr": {
      "op": "reduce",
      "initial": { "child": 0 },
      "step": {
        "operator_child": 1,
        "value_child": 2,
        "operators": { "+": "add", "-": "subtract" }
      }
    }
  },
  "cases": [
    { "input": "1+2-3", "expected_value": 0 }
  ]
}

Available Action Operations

Operation	Description
`token_to_number`	Convert matched token to a number
`token_to_string`	Return matched token as a string
`choice`	Return the matched choice index (0-based)
`size`	Return the number of child semantic values
`join_tokens`	Join token values with a separator
`reduce`	Fold child values with binary operators
`passthrough`	Return the Nth child value (default: 0)

State Variables

The vars field declares mutable state accessible from handlers:

{
  "vars": {
    "indent": { "type": "int", "init": 0 },
    "require_upper": { "type": "bool", "init": false },
    "banned": { "type": "string[]", "init": ["bad", "evil"] }
  }
}

Supported variable types: bool, int, string[].

Handlers (enter/leave/predicate)

The handlers field defines enter, leave, and predicate callbacks:

{
  "handlers": {
    "Block": {
      "enter": { "set": { "indent": "indent + 2" } },
      "leave": { "set": { "indent": "indent - 2" } }
    },
    "Samedent": {
      "predicate": {
        "check": { "sv.length": { "eq": "indent" } },
        "error": "different indent..."
      }
    }
  }
}

Enter/Leave Actions

Field	Description
`set`	Object mapping variable names to new values. Values can be constants or simple expressions like `"var + N"` or `"var - N"`

Predicate

Field	Description
`when`	(optional) Variable name; predicate only applies when the variable is truthy
`check`	Condition to evaluate (see below)
`error`	Error message on failure

Check Conditions

The left side of a check is a property of the parse result:

Property	Description
`sv`	Matched string
`sv.length`	Length of matched string
`token_string`	Token text
`token_number`	Token converted to number

The right side is a comparison:

Comparison	Description
`eq`	Equal to a value or variable
`matches`	Regex match
`between`	`[min, max]` inclusive range
`in`	Contained in a string[] variable
`not_in`	Not contained in a string[] variable

Trace Verification

To verify enter/leave ordering:

{
  "trace": {
    "rules": ["A", "B"],
    "events": ["enter", "leave"]
  },
  "cases": [
    {
      "input": "ab",
      "match": true,
      "expected_trace": ["enter_A", "leave_A", "enter_B", "leave_B"]
    }
  ]
}

The test harness hooks enter/leave on the specified rules and records events as "enter_RULE" / "leave_RULE" strings.

AST Verification

To verify AST output:

{
  "ast": { "optimize": true },
  "cases": [
    {
      "input": "1+2*3",
      "expected_ast": "+ Expr/0\n  - Term[Number] (1)\n  + Expr/0[Term]\n    - Number (2)\n    - Term[Number] (3)"
    }
  ]
}

The AST text format uses:

+ RuleName — non-terminal node (has children)
- RuleName — terminal node (token)
/N — choice index (0-based)
[OriginalRule] — original rule name before AST optimization
(token) — token content
Indentation: 2 spaces per depth level

Implementing a Test Harness

Each language implementation provides a test harness that:

Reads JSON test files
Creates a parser from the grammar field
Binds actions to native semantic action functions
Initializes vars as mutable state
Binds handlers (enter/leave/predicate) to native callbacks
Sets up trace hooks if specified
Enables AST mode if ast is specified
Runs each case and verifies against expected results

The harness is typically a few hundred lines of code. Once written, adding new test cases requires no harness changes.

Scope

This specification covers the PEG grammar format and its parsing behavior. It does not cover:

Language-specific parser combinator APIs (e.g., C++ Definition, seq, cho)
Language-specific binding details beyond what the action DSL expresses
Performance characteristics

These aspects are tested by each implementation's own native test suite.