Polyglot documentation snippet test generator - extract code snippets from docs and generate tests

Primary language: Rust

This README is a work in progress; this is the initial release.

poly_doctest

A polyglot documentation snippet test generator that extracts code snippets from markdown documentation and generates test files.

Features

  • Polyglot Support: Extensible architecture supports any programming language
  • Markdown Processing: Extracts code snippets from fenced code blocks
  • HIDE Line Support: Mark setup lines with the HIDE: prefix; the prefix is stripped and the line's content is kept in the generated test
  • Multiple Sources: Supports both local files and remote Git repositories (GitHub, GitLab)
  • Built-in CLI: Ready-to-use command-line interface with flexible options
  • Smart Test Naming: Automatically generates meaningful test names from markdown headings

Architecture

This crate provides the core functionality for extracting documentation snippets and generating tests. Language-specific generators implement the LangGenerator trait to produce language-appropriate test files.

Core Modules

  • processor: Markdown parsing, code extraction, and HIDE line processing
  • source: Unified interface for local and remote documentation sources
  • generator: Language generator trait and orchestration logic
  • cli: Command-line interface with argument parsing
  • model: Data structures for code snippets and source files
  • error: Unified error handling across all modules

Core Components

  • LangGenerator: Trait for implementing language-specific test generators
  • DocsSource: Enum supporting local directories and remote Git repositories
  • CodeSnippet: Processed code snippet with auto-generated test name
  • SourceFileSnippets: Collection of snippets from a single source file
  • CliArgs: Structured command-line arguments with validation

Usage

As a Library

This crate is designed to be used by language-specific generator crates:

use poly_doctest::{run_cli, LangGenerator, Result, SourceFileSnippets};
use std::path::{Path, PathBuf};

#[derive(Default)]
pub struct MyLanguageGenerator;

impl LangGenerator for MyLanguageGenerator {
    fn code_fence_languages(&self) -> &[&str] {
        &["mylang", "ml"]  // Language identifiers in markdown code fences
    }

    fn default_output(&self) -> PathBuf {
        PathBuf::from("tests/docs")
    }

    fn generate(&self, source_files: &[SourceFileSnippets], output_path: &Path) -> Result<()> {
        // Generate language-specific test files
        for source_file in source_files {
            for snippet in &source_file.snippets {
                // Generate test file using snippet.name and snippet.code
                println!("Test: {} -> {}", snippet.name, snippet.code);
            }
        }
        Ok(())
    }
}

fn main() -> anyhow::Result<()> {
    let generator = MyLanguageGenerator::default();
    run_cli(generator)?;
    Ok(())
}

Command Line Interface

The built-in CLI supports multiple source types and options:

# Local directory (recursive)
my-generator --local ./docs --recursive --output ./tests

# Remote GitHub repository
my-generator --remote https://github.com/owner/repo/tree/main/docs --output ./tests

# Custom hide prefix
my-generator --local ./docs --hide-prefix "SKIP:" --output ./tests

Documentation Format

Code Fence Requirements

Documentation snippets must use the following format to be processed:

```language test
// Your code here
```

  • Language identifier: Must match one of the languages supported by your generator
  • test keyword: Required to indicate this block should generate a test
  • Content: Any valid code for the target language
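
The two requirements above can be sketched as a check on the fence's info string (the text after the opening backticks). The function name is_test_fence is hypothetical, and the sketch assumes the language comes first and the test keyword follows as a whitespace-separated word; the crate's actual parser may be stricter or more lenient.

```rust
/// Return true when a fence info string such as "rust test" names one of
/// the supported languages and also carries the `test` keyword.
pub fn is_test_fence(info: &str, languages: &[&str]) -> bool {
    let mut words = info.split_whitespace();
    match words.next() {
        // The first word is the language; any later word may be `test`.
        Some(lang) if languages.contains(&lang) => words.any(|w| w == "test"),
        _ => false,
    }
}
```

Under these assumptions, a block opened with "rust test" qualifies for a generator that lists "rust", while a plain "rust" block or a "python test" block is skipped.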

HIDE Line Processing

Lines that start with the HIDE: prefix (configurable with --hide-prefix) have the prefix removed and their content kept in the generated test:

```rust test
HIDE:use std::collections::HashMap;
HIDE: // This comment will be kept
let mut map = HashMap::new();
map.insert("key", "value");
assert_eq!(map.get("key"), Some(&"value"));
```

The generated test will contain:

use std::collections::HashMap;
// This comment will be kept
let mut map = HashMap::new();
map.insert("key", "value");
assert_eq!(map.get("key"), Some(&"value"));
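
The transformation shown above can be sketched as a small line filter. The function name strip_hide_prefix is hypothetical, and dropping the whitespace that follows the prefix is an assumption inferred from the example, where "HIDE: // This comment" becomes "// This comment".

```rust
/// Strip `prefix` from every line that starts with it and keep the rest of
/// the line. Whitespace directly after the prefix is dropped (assumption).
pub fn strip_hide_prefix(code: &str, prefix: &str) -> String {
    // An empty prefix would match every line, so treat it as "no prefix".
    if prefix.is_empty() {
        return code.to_string();
    }
    code.lines()
        .map(|line| match line.strip_prefix(prefix) {
            Some(rest) => rest.trim_start(),
            None => line,
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Lines without the prefix pass through unchanged, including their indentation, so the same function works with a custom prefix such as "SKIP:".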

Test Naming Strategy

Test names are generated automatically according to these rules:

  1. With headings: the accumulated heading path plus a sequential counter that runs across the entire document
  2. Without headings: the file name plus the counter (filename_01, filename_02, etc.)
  3. Sanitization: Non-alphanumeric characters are converted to underscores

Important: Headings accumulate as the document is traversed. A later heading is appended to the path rather than replacing an earlier one, and the counter is global across all snippets in a single document.

Example:

## Section A

```rust test
let x = 1;  // Generated name: section_a_01
```

## Section B

### Subsection

```rust test
let y = 2;  // Generated name: section_a_section_b_subsection_02
```

```rust test
let z = 3;  // Generated name: section_a_section_b_subsection_03
```

This generates: section_a_01, section_a_section_b_subsection_02, section_a_section_b_subsection_03
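
The naming rules can be sketched as a small state machine that reproduces the names in the example. TestNamer and sanitize are hypothetical names, not the crate's API. Lowercasing is inferred from the example output, and the sketch assumes headings are only ever appended, which is the behaviour the example shows.

```rust
/// Lowercase the text and replace every character that is not an ASCII
/// letter or digit with `_` (lowercasing is inferred from the example).
pub fn sanitize(text: &str) -> String {
    text.chars()
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_lowercase() } else { '_' })
        .collect()
}

/// Tracks the accumulated heading path and the document-wide counter.
pub struct TestNamer {
    file_stem: String,
    headings: Vec<String>,
    counter: usize,
}

impl TestNamer {
    pub fn new(file_stem: &str) -> Self {
        Self { file_stem: sanitize(file_stem), headings: Vec::new(), counter: 0 }
    }

    /// Record a heading; headings are appended and never removed.
    pub fn heading(&mut self, title: &str) {
        self.headings.push(sanitize(title));
    }

    /// Produce the name for the next snippet in the document.
    pub fn next_name(&mut self) -> String {
        self.counter += 1;
        let base = if self.headings.is_empty() {
            self.file_stem.clone()
        } else {
            self.headings.join("_")
        };
        // Two-digit, zero-padded counter shared by the whole document.
        format!("{}_{:02}", base, self.counter)
    }
}
```

Feeding it the headings and snippets from the example in document order yields section_a_01, section_a_section_b_subsection_02, and section_a_section_b_subsection_03; a document with no headings falls back to the file stem.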

Remote Repository Support

Documentation can also be extracted from remote Git repositories:

GitHub

# Full repository
--remote https://github.com/owner/repo

# Specific branch and path  
--remote https://github.com/owner/repo/tree/develop/documentation

GitLab

# GitLab repositories (auto-detected from URL)
--remote https://gitlab.com/owner/repo/tree/main/docs
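
To make the URL shapes above concrete, the sketch below splits a --remote value into host, owner, repository, branch, and path. RemoteSpec and parse_remote are hypothetical names, not the crate's DocsSource API. The sketch handles only the https forms shown above and assumes the branch name contains no slash.

```rust
/// Hypothetical parsed form of a `--remote` URL; not the crate's own type.
#[derive(Debug, PartialEq)]
pub struct RemoteSpec {
    pub host: String,
    pub owner: String,
    pub repo: String,
    pub branch: Option<String>,
    pub path: Option<String>,
}

/// Split `https://<host>/<owner>/<repo>[/tree/<branch>[/<path>]]` into parts.
/// Returns None for any other shape. Assumes the branch contains no `/`.
pub fn parse_remote(url: &str) -> Option<RemoteSpec> {
    let rest = url.strip_prefix("https://")?;
    let mut parts = rest.trim_end_matches('/').split('/');
    let host = parts.next()?.to_string();
    let owner = parts.next()?.to_string();
    let repo = parts.next()?.to_string();
    let (branch, path) = match parts.next() {
        // Bare repository URL: no branch or sub-path given.
        None => (None, None),
        Some("tree") => {
            let branch = parts.next()?.to_string();
            let tail: Vec<&str> = parts.collect();
            let path = if tail.is_empty() { None } else { Some(tail.join("/")) };
            (Some(branch), path)
        }
        Some(_) => return None,
    };
    Some(RemoteSpec { host, owner, repo, branch, path })
}
```

The host field is what would allow GitHub and GitLab to be told apart; how the crate actually detects the provider and resolves a default branch is not specified here.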

API Reference

Main Functions

  • run_cli(generator): Run with command-line argument parsing
  • run_with_args(generator, args): Run with pre-parsed arguments
  • generate_docs_with_options(): Low-level generation with full control

Examples

See tests/rust_doctest/rustgen.rs for a complete Rust generator implementation that creates test modules with proper imports and test functions.

License

MIT OR Apache-2.0