ICM (Internal Coherence Maximization) is a Python tool for unsupervised elicitation of language models. Based on the paper "Unsupervised Elicitation of Language Models", ICM fine-tunes pretrained language models on their own generated labels without external supervision.
- Unsupervised Learning: Generate high-quality labeled datasets without human supervision
- Mutual Predictability: Find labels that are logically consistent and mutually predictable
- Multiple Task Types: Support for classification, comparison, mathematical reasoning, and more
- Flexible Export: Export to various formats (DPO, CSV, JSON) and push to Hugging Face
```bash
git clone https://github.com/codelion/icm.git
cd icm
pip install -e .
pip install -r requirements.txt
```

Generate a labeled dataset using ICM:
```bash
icm run --model google/gemma-3-1b-it --dataset truthful_qa --task-type truthfulqa --max-examples 100
icm export --input-path icm_results/truthfulqa_dialoGPT_20240115_143022.jsonl --output-path truthfulqa_dpo.jsonl --format dpo
icm push --input-path truthfulqa_dpo.jsonl --hf-repo-id your-username/icm-truthfulqa-dataset
```

| Use Case | Dataset | Link |
|---|---|---|
| Fine-tuning the model | dpo dataset | |
ICM uses two key components:
- Mutual Predictability: Measures how well the model can predict each label given all other labels
- Logical Consistency: Enforces simple logical constraints to prevent degenerate solutions
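For intuition, a toy inconsistency counter (the function name and data layout are illustrative, not ICM's actual API) might flag two True labels that assert different answers to the same question:

```python
def inconsistency_count(labeled):
    """Count pairs of True-labeled claims that assert different
    answers to the same question (a degenerate, inconsistent state).
    Names and data layout here are illustrative, not ICM internals."""
    count = 0
    seen_true = {}
    for question, answer, label in labeled:
        if label != "True":
            continue
        answers = seen_true.setdefault(question, set())
        # every previously accepted, conflicting answer is one inconsistency
        count += sum(1 for other in answers if other != answer)
        answers.add(answer)
    return count
```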
The algorithm uses simulated annealing to search for optimal label assignments that maximize:
U(D) = α × P_θ(D) - I(D)
Where:
- `P_θ(D)` is the mutual predictability score
- `I(D)` is the inconsistency penalty
- `α` balances the two terms
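The objective and the annealing acceptance rule can be sketched in a few lines (function names and signatures are hypothetical, not ICM's API):

```python
import math
import random

def utility(alpha, predictability, inconsistency):
    # U(D) = alpha * P_theta(D) - I(D)
    return alpha * predictability - inconsistency

def accept(delta_u, temperature, rng=random.random):
    # Simulated annealing: always accept improvements; accept worse
    # label assignments with probability exp(delta_U / T), which
    # shrinks toward zero as the temperature cools.
    if delta_u >= 0:
        return True
    return rng() < math.exp(delta_u / temperature)
```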
```bash
# Fully automatic - detects config='multiple_choice' and split='validation'
icm run --model google/gemma-3-1b-it --dataset truthful_qa --task-type truthfulqa

# Or explicitly specify parameters
icm run --model google/gemma-3-1b-it --dataset truthful_qa --config multiple_choice --split validation --task-type truthfulqa

# Fully automatic - detects config='main'
icm run --model google/gemma-3-1b-it --dataset gsm8k --task-type gsm8k

# Or explicitly specify parameters
icm run --model google/gemma-3-1b-it --dataset gsm8k --config main --task-type gsm8k

# Custom local dataset
icm run --model google/gemma-3-1b-it --dataset path/to/dataset.jsonl --task-type classification
```

ICM can generate synthetic datasets for testing and experimentation. These are perfect for:
- Testing ICM: Validate the algorithm on simple, verifiable tasks
- Quick experiments: Generate datasets instantly without external dependencies
- Educational purposes: Understand how ICM works with clear logical relationships
Generates simple addition problems with both correct and incorrect solutions:
Example Output:

```
Question: What is 42 + 17?
Claim: 42 + 17 = 59
I think this Claim is [True/False]
```
How it works:
- Picks random numbers between 1 and 100
- Creates correct solutions (True labels)
- Creates incorrect solutions with random errors (False labels)
- Doubles the requested size: `--synthetic-size 500` creates 1000 examples (500 correct + 500 incorrect)
- Perfectly balanced: 50% True, 50% False labels
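The math generator described above can be sketched as follows (the function name and the error model are illustrative, not the tool's actual implementation):

```python
import random

def make_math_examples(size, seed=42):
    """Sketch of the math synthetic generator: each requested example
    yields one correct (True) and one incorrect (False) claim."""
    rng = random.Random(seed)
    examples = []
    for _ in range(size):
        a, b = rng.randint(1, 100), rng.randint(1, 100)
        correct = a + b
        # introduce a random error for the False variant
        wrong = correct + rng.choice([-10, -2, -1, 1, 2, 10])
        for value, label in ((correct, "True"), (wrong, "False")):
            examples.append({
                "input": (f"Question: What is {a} + {b}?\n"
                          f"Claim: {a} + {b} = {value}\n"
                          "I think this Claim is [True/False]"),
                "metadata": {"gold_label": label, "task": "math"},
            })
    return examples
```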
Generates number comparison tasks:
Example Output:

```
Query: Which number is larger?
Response A: 73
Response B: 45
Claim: Response A is larger than Response B
I think this Claim is [True/False]
```
How it works:
- Random pairs of numbers
- True/False based on actual comparison
- Single example per iteration (not doubled)
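A sketch of the comparison generator (names are illustrative; one example per draw, labeled by the actual numeric comparison):

```python
import random

def make_comparison_example(rng):
    """Build a single comparison example; the gold label reflects
    whether Response A really is the larger number."""
    a, b = rng.randint(1, 100), rng.randint(1, 100)
    text = ("Query: Which number is larger?\n"
            f"Response A: {a}\n"
            f"Response B: {b}\n"
            "Claim: Response A is larger than Response B\n"
            "I think this Claim is [True/False]")
    return {
        "input": text,
        "metadata": {"gold_label": "True" if a > b else "False",
                     "task": "comparison"},
    }
```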
```bash
# Math problems - creates 1000 examples (500 pairs)
icm run --model google/gemma-3-1b-it --synthetic math --synthetic-size 500

# Number comparisons - creates 300 examples
icm run --model google/gemma-3-1b-it --synthetic comparison --synthetic-size 300

# Quick test with defaults (100 examples)
icm run --model google/gemma-3-1b-it --synthetic math
```

- Instant generation: No need to download or configure external datasets
- Verifiable ground truth: Clear logical relationships for validation
- Reproducible: Consistent results with same seed
- Perfect for testing: Simple tasks ideal for algorithm validation
- No dependencies: Works offline without internet connection
All synthetic examples follow the standard ICM format:
```json
{
  "input": "Question: What is 42 + 17?\nClaim: 42 + 17 = 59\nI think this Claim is [True/False]",
  "metadata": {
    "gold_label": "True",
    "task": "math"
  }
}
```

Run ICM on a dataset to generate labeled examples.
Required Arguments:
- `--model`: Model name or path (e.g., `google/gemma-3-1b-it`)
Dataset Arguments:
- `--dataset`: Dataset name or path
- `--task-type`: Task type (`auto`, `classification`, `comparison`, `truthfulqa`, `gsm8k`)
- `--split`: Dataset split (default: `train`)
- `--max-examples`: Maximum examples to process
Synthetic Dataset Options:
- `--synthetic`: Create synthetic dataset (`math`, `comparison`)
- `--synthetic-size`: Number of synthetic examples to generate (default: 100)
ICM Algorithm Parameters:
- `--alpha`: Weight for mutual predictability vs consistency (default: 100.0)
- `--initial-temperature`: Starting temperature for simulated annealing (default: 3.0)
- `--final-temperature`: Ending temperature (default: 0.001)
- `--cooling-rate`: Temperature cooling rate (default: 0.98)
- `--initial-examples`: Number of initial random examples (default: 20)
- `--max-iterations`: Maximum search iterations (default: 1000)
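These defaults suggest a geometric cooling schedule. As a sketch, assuming `T_k = max(final, initial * cooling_rate ** k)` (this clamped form is an assumption about the implementation, not confirmed by the source):

```python
def temperature(step, initial=3.0, final=0.001, cooling_rate=0.98):
    # Geometric cooling clamped at the final temperature:
    # T_k = max(final, initial * cooling_rate ** step)
    return max(final, initial * cooling_rate ** step)
```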
Generation Parameters:
- `--generation-temperature`: Temperature for text generation (default: 0.2)
- `--generation-top-p`: Top-p for nucleus sampling (default: 0.9)
- `--generation-max-tokens`: Maximum tokens to generate (default: 512)
System Parameters:
- `--device`: Computation device (`cuda`, `cpu`, `auto`)
- `--seed`: Random seed for reproducibility (default: 42)
- `--log-level`: Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`)
Export ICM results to various formats.
Required Arguments:
- `--input-path`: Path to ICM result file
- `--output-path`: Output file path
- `--format`: Export format (`json`, `dpo`, `csv`, `analysis`)
Optional Arguments:
- `--include-stats`: Include statistics in JSON export
- `--create-pairs`: Create chosen/rejected pairs for DPO format
- `--hf-push`: Push to Hugging Face after export
- `--hf-repo-id`: Hugging Face repository ID
- `--private`: Make Hugging Face repository private
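Conceptually, `--create-pairs` groups claims by prompt and pairs a True-labeled claim (chosen) with a False-labeled one (rejected). A minimal sketch, with field names that are illustrative rather than the exporter's actual schema:

```python
def to_dpo_pairs(examples):
    """Pair each prompt's True-labeled claim (chosen) with its
    False-labeled claim (rejected). Field names are illustrative."""
    by_prompt = {}
    for ex in examples:
        by_prompt.setdefault(ex["prompt"], {})[ex["label"]] = ex["claim"]
    pairs = []
    for prompt, claims in by_prompt.items():
        # only prompts with both labels yield a usable preference pair
        if "True" in claims and "False" in claims:
            pairs.append({"prompt": prompt,
                          "chosen": claims["True"],
                          "rejected": claims["False"]})
    return pairs
```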
Push files to Hugging Face Hub.
Required Arguments:
- `--input-path`: Local file path to upload
- `--hf-repo-id`: Hugging Face repository ID (e.g., `username/dataset-name`)
Optional Arguments:
- `--file-name`: Custom filename in repository
- `--private`: Make repository private
List all saved ICM results.
```bash
icm list --results-dir icm_results
```

Analyze ICM results and show statistics.
```bash
# Analyze all results
icm analyze

# Analyze specific result file
icm analyze --result-file icm_results/truthfulqa_gpt2_20240115_143022.jsonl
```

Clean old result files, keeping only the latest N results.
```bash
icm clean --keep-latest 10
```

Create a config.json file:
```json
{
  "search_params": {
    "alpha": 30.0,
    "initial_temperature": 15.0,
    "final_temperature": 0.005,
    "max_iterations": 2000
  },
  "model_params": {
    "generation_temperature": 0.8,
    "generation_top_p": 0.95
  },
  "system_params": {
    "device": "cuda",
    "seed": 123
  }
}
```

Set common parameters via environment variables:
```bash
export ICM_MODEL="google/gemma-3-1b-it"
export ICM_DEVICE="cuda"
export ICM_LOG_LEVEL="INFO"
```

```python
from icm import ICMSearcher, load_icm_dataset

# Load dataset
dataset = load_icm_dataset("truthful_qa", task_type="truthfulqa")

# Create searcher
searcher = ICMSearcher(
    model_name="google/gemma-3-1b-it",
    alpha=50.0,
    max_iterations=1000
)

# Run ICM search
result = searcher.search(dataset, max_examples=100)

# Access results
print(f"Generated {len(result.labeled_examples)} labeled examples")
print(f"Final score: {result.score:.4f}")
```

```python
from icm import ICMSearcher, ICMDataset, ICMExample
from icm.consistency import LogicalConsistencyChecker, MathConsistencyRule

# Create custom dataset
examples = [
    ICMExample("What is 2+2?", {"category": "math"}),
    ICMExample("What is 3+3?", {"category": "math"})
]
dataset = ICMDataset(examples)

# Custom consistency checker
checker = LogicalConsistencyChecker([MathConsistencyRule()])

# Advanced searcher
searcher = ICMSearcher(
    model_name="google/gemma-3-1b-it",
    alpha=30.0,
    initial_temperature=20.0,
    consistency_checker=checker,
    seed=42
)

result = searcher.search(dataset)
```

```python
from icm.storage import ICMStorage
from icm.exporters import ICMExporter

# Save results
storage = ICMStorage("my_results")
storage.save_result(result, "experiment_1")

# Export to DPO format
exporter = ICMExporter(storage)
exporter.export_to_dpo_format(
    result.labeled_examples,
    "training_data.jsonl"
)

# Push to Hugging Face
exporter.export_to_huggingface(
    result.labeled_examples,
    repo_id="username/my-icm-dataset",
    task_type="classification",
    model_name="google/gemma-3-1b-it"
)
```

```bash
# Create synthetic math dataset
icm run --model google/gemma-3-1b-it --synthetic math --synthetic-size 500 --max-iterations 500

# Use real GSM8K dataset
icm run --model google/gemma-3-1b-it --dataset gsm8k --task-type gsm8k --max-examples 200

# Generate preference dataset
icm run --model google/gemma-3-1b-it --dataset anthropic/hh-rlhf --task-type comparison --alpha 30.0

# Export to DPO format for training
icm export --input-path results.jsonl --output-path dpo_data.jsonl --format dpo --create-pairs

# Export analysis report
icm export --input-path results.jsonl --output-path analysis.json --format analysis --include-examples
```

CUDA Out of Memory:

```bash
# Use smaller model, MPS (Apple Silicon), or CPU
icm run --model google/gemma-3-1b-it --device cpu
# or on Apple Silicon:
icm run --model google/gemma-3-1b-it --device mps
```

Model Loading Errors:

```bash
# Verify model name and check internet connection
icm run --model google/gemma-3-1b-it --log-level DEBUG
```

Poor Quality Results:

```bash
# Increase alpha or iterations
icm run --model your-model --alpha 100.0 --max-iterations 2000
```

Dataset Configuration Errors:

```bash
# ICM now auto-detects both config and split for known datasets
# TruthfulQA: automatically uses config='multiple_choice' and split='validation'
# GSM8K: automatically uses config='main' and split='train'

# Your commands should work automatically:
icm run --model google/gemma-3-1b-it --dataset truthful_qa --task-type truthfulqa
icm run --model google/gemma-3-1b-it --dataset gsm8k --task-type gsm8k

# Or specify manually if needed:
icm run --model google/gemma-3-1b-it --dataset truthful_qa --config multiple_choice --split validation --task-type truthfulqa
icm run --model google/gemma-3-1b-it --dataset gsm8k --config main --task-type gsm8k
```

Memory Usage Issues:

```bash
# ICM uses memory-efficient sampling to handle large datasets.
# If you still encounter memory issues, reduce the dataset size:
icm run --model google/gemma-3-1b-it --dataset large-dataset --max-examples 50

# Or use a smaller model:
icm run --model distilgpt2 --dataset your-dataset --max-examples 100
```

Enable detailed logging:

```bash
icm run --model google/gemma-3-1b-it --dataset your-data --log-level DEBUG --log-file debug.log
```

```bash
git clone https://github.com/codelion/icm.git
cd icm
pip install -e ".[dev]"
pytest tests/
```

If you use ICM in your research, please cite:
```bibtex
@software{icm,
  title = {ICM: Internal Coherence Maximization},
  author = {Asankhaya Sharma},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/codelion/icm}
}
```