A formally verified parser and Document Object Model (DOM) for Simple Markup Language (SML) - a restricted subset of XML designed for formal verification. The library is implemented in SPARK Ada with mathematical proofs of correctness, memory safety, and performance guarantees.
✅ Implementation Complete - All packages fully implemented ✅ SPARK Verified - Flow analysis passes with zero high-priority issues ✅ Schema Validation - Type-safe configuration validation system ✅ Build Successful - Compiles without errors ✅ Production Ready - Formally verified and tested
# Build the core library and run tests
alr build -- -cargs:ada -gnat2022 -gnatX0
alr test
# Run SPARK verification
alr exec -- gnatprove -P sml.gpr --mode=prove --level=4 --prover=cvc5 >> proof.log 2>&1Examples and integration demos have moved to a separate repository:
sml_examples (https://github.com/berkeleynerd/sml_examples).
It contains runnable binaries (sml_example, schema_validation_example,
test_schema_loader) and local fixtures.
| File | Purpose |
|---|---|
| AGENTS.md | Contributor guide with build, style, and review expectations |
| MEMORY.md | Memory configuration guide - stack sizing, troubleshooting, performance tuning |
| (Additional docs) | Architecture and schema specification (to be added) |
SML is a strict subset of XML with these restrictions to enable formal verification:
- ✅ No attributes on elements
- ✅ No processing instructions, DTDs, or schemas
- ✅ No CDATA sections or comments
- ✅ No namespaces
- ✅ UTF-8 encoding only
- ✅ Only 5 standard XML entities:
<>&"'
- Formally Verified: SPARK Ada proofs of correctness and safety
- Memory Safe: No buffer overflows, leaks, or undefined behavior
- Performance Guaranteed: O(n) parsing, O(1) DOM operations
- Schema Validation: Formally verified type constraints and structure rules
- RFC 3629 Compliant: Full UTF-8 validation with security hardening
- Bounded Resources: Provably bounded memory usage (configurable)
The library uses index-based node references and arena allocation for SPARK compatibility:
- SML.DOM — Core DOM types and traversal
- SML.DOM.Parser — Single-pass parser with contracts
- SML.DOM.Builder — DOM construction/modification API
- SML.DOM.Writer — SML serialization
Formal proof units live in .Proof children mirroring core modules, e.g.:
- SML.DOM.Proof, SML.UTF8.Proof, SML.Entities.Proof, SML.Memory_Model.Proof
# Using Alire (recommended)
alr build
# Or using GPRbuild directly
gprbuild -P sml.gpr
# Run tests via Alire actions
alr test# Flow analysis (fast, checks data flow)
alr exec -- gnatprove -P sml.gpr --mode=flow --report=all
# Level 0 proof (fast, partial verification)
alr exec -- gnatprove -P sml.gpr --level=0 --report=all
# Level 4 proof (complete verification, slower)
alr exec -- gnatprove -P sml.gpr --level=4 --report=allSee MEMORY.md for detailed guidance on configuring memory limits and stack sizes for different document sizes.
Default limits:
- 2,000 nodes maximum
- 200KB string storage
- Suitable for documents up to ~50KB
SML includes a formally verified schema validation system for type-safe configuration files and data validation.
SML.Schema - Validation engine with:
- Element vocabulary - Define allowed element names
- Content types - integer, boolean, decimal, string with ranges/lengths
- Strings: length, optional prefix/suffix
- Decimals: min/max range
- Enumerations - Restrict values to specific lists (e.g., log levels)
- Cardinality - Min/max occurrence constraints (minOccurs, maxOccurs)
- Structure rules - Sequence (ordered), choice (one-of), and all (unordered)
- Sequences enforce strict sibling order; extra elements are rejected; text/whitespace between elements is ignored.
- Group-level occurs on sequence/choice via child
<minOccurs>/<maxOccurs>;unboundedallowed (not forall). allfollows XSD:minOccurs∈ {0,1},maxOccurs= 1.
- Type safety - Formal invariants with 0 high-priority SPARK issues
See sml_examples for a reference SML.Schema.Loader implementation
and end-to-end schema validation demos.
with SML.Schema;
procedure Validate_Config is
Schema : Schema_Document;
Port_Type : Simple_Type_Definition;
Success : Boolean;
begin
-- Build schema
Initialize (Schema);
Port_Type.Name := To_Bounded_String ("portType");
Port_Type.Base_Type := Integer_Type;
Port_Type.Min_Value := 1;
Port_Type.Max_Value := 65535;
Add_Simple_Type (Schema, Port_Type, Success);
-- Validate document
Result := Validate_Document (Schema, Doc);
if Result.Status = Valid then
-- Document is valid!
end if;
end Validate_Config;See the sml_examples repository for build and run instructions.
Schemas are expressed in SML itself. A concise schema specification will be added; see the sml_examples repository for working examples.
Example schema:
<schema>
<simpleType>
<name>portType</name>
<restriction>
<base>integer</base>
<minValue>1</minValue>
<maxValue>65535</maxValue>
</restriction>
</simpleType>
<element>
<name>config</name>
<type>configType</type>
</element>
</schema>Loading schemas: See the loader in sml_examples/src/ for a reference implementation.
Development guidance and extended documentation will be added soon. For questions or proposals, please open an issue or PR.
[Add your license here]