ICM (Internal Coherence Maximization) is a Python tool for unsupervised elicitation of language models. Based on the paper "Unsupervised Elicitation of Language Models", ICM fine-tunes pretrained language models on their own generated labels without external supervision.
- Unsupervised Learning: Generate high-quality labeled datasets without human supervision
- Mutual Predictability: Find labels that are logically consistent and mutually predictable
- Multiple Task Types: Support for classification, comparison, mathematical reasoning, and more
- Flexible Export: Export to various formats (DPO, CSV, JSON) and push to Hugging Face
```bash
git clone https://github.com/codelion/icm.git
cd icm
pip install -e .
pip install -r requirements.txt
```

Generate a labeled dataset using ICM:
```bash
icm run --model google/gemma-3-1b-it --dataset truthful_qa --task-type truthfulqa --max-examples 100
icm export --input-path icm_results/truthfulqa_dialoGPT_20240115_143022.jsonl --output-path truthfulqa_dpo.jsonl --format dpo
icm push --input-path truthfulqa_dpo.jsonl --hf-repo-id your-username/icm-truthfulqa-dataset
```

| Use Case | Dataset | Link |
|---|---|---|
| Fine-tuning the model | dpo dataset | |
ICM uses two key components:
- Mutual Predictability: Measures how well the model can predict each label given all other labels
- Logical Consistency: Enforces simple logical constraints to prevent degenerate solutions
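For intuition, a toy inconsistency counter (the function name and data layout are illustrative, not ICM's actual API) might flag two True labels that assert different answers to the same question:

```python
def inconsistency_count(labeled):
    """Count pairs of True-labeled claims that assert different
    answers to the same question (a degenerate, inconsistent state).
    Names and data layout here are illustrative, not ICM internals."""
    count = 0
    seen_true = {}
    for question, answer, label in labeled:
        if label != "True":
            continue
        answers = seen_true.setdefault(question, set())
        # every previously accepted, conflicting answer is one inconsistency
        count += sum(1 for other in answers if other != answer)
        answers.add(answer)
    return count
```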
The algorithm uses simulated annealing to search for optimal label assignments that maximize:
U(D) = α × P_θ(D) - I(D)
Where:
- `P_θ(D)` is the mutual predictability score
- `I(D)` is the inconsistency penalty
- `α` balances the two terms
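The objective and the annealing acceptance rule can be sketched in a few lines (function names and signatures are hypothetical, not ICM's API):

```python
import math
import random

def utility(alpha, predictability, inconsistency):
    # U(D) = alpha * P_theta(D) - I(D)
    return alpha * predictability - inconsistency

def accept(delta_u, temperature, rng=random.random):
    # Simulated annealing: always accept improvements; accept worse
    # label assignments with probability exp(delta_U / T), which
    # shrinks toward zero as the temperature cools.
    if delta_u >= 0:
        return True
    return rng() < math.exp(delta_u / temperature)
```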
```bash
# Fully automatic - detects config='multiple_choice' and split='validation'
icm run --model google/gemma-3-1b-it --dataset truthful_qa --task-type truthfulqa

# Or explicitly specify parameters
icm run --model google/gemma-3-1b-it --dataset truthful_qa --config multiple_choice --split validation --task-type truthfulqa

# Fully automatic - detects config='main'
icm run --model google/gemma-3-1b-it --dataset gsm8k --task-type gsm8k

# Or explicitly specify parameters
icm run --model google/gemma-3-1b-it --dataset gsm8k --config main --task-type gsm8k

# Custom local dataset
icm run --model google/gemma-3-1b-it --dataset path/to/dataset.jsonl --task-type classification
```

ICM can generate synthetic datasets for testing and experimentation. These are perfect for:
- Testing ICM: Validate the algorithm on simple, verifiable tasks
- Quick experiments: Generate datasets instantly without external dependencies
- Educational purposes: Understand how ICM works with clear logical relationships
Generates simple addition problems with both correct and incorrect solutions:
Example Output:

```
Question: What is 42 + 17?
Claim: 42 + 17 = 59
I think this Claim is [True/False]
```
How it works:
- Picks random numbers between 1 and 100
- Creates correct solutions (True labels)
- Creates incorrect solutions with random errors (False labels)
- Doubles the requested size: `--synthetic-size 500` creates 1000 examples (500 correct + 500 incorrect)
- Perfectly balanced: 50% True, 50% False labels
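The math generator described above can be sketched as follows (the function name and the error model are illustrative, not the tool's actual implementation):

```python
import random

def make_math_examples(size, seed=42):
    """Sketch of the math synthetic generator: each requested example
    yields one correct (True) and one incorrect (False) claim."""
    rng = random.Random(seed)
    examples = []
    for _ in range(size):
        a, b = rng.randint(1, 100), rng.randint(1, 100)
        correct = a + b
        # introduce a random error for the False variant
        wrong = correct + rng.choice([-10, -2, -1, 1, 2, 10])
        for value, label in ((correct, "True"), (wrong, "False")):
            examples.append({
                "input": (f"Question: What is {a} + {b}?\n"
                          f"Claim: {a} + {b} = {value}\n"
                          "I think this Claim is [True/False]"),
                "metadata": {"gold_label": label, "task": "math"},
            })
    return examples
```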
Generates number comparison tasks:
Example Output:

```
Query: Which number is larger?
Response A: 73
Response B: 45
Claim: Response A is larger than Response B
I think this Claim is [True/False]
```
How it works:
- Random pairs of numbers
- True/False based on actual comparison
- Single example per iteration (not doubled)
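A sketch of the comparison generator (names are illustrative; one example per draw, labeled by the actual numeric comparison):

```python
import random

def make_comparison_example(rng):
    """Build a single comparison example; the gold label reflects
    whether Response A really is the larger number."""
    a, b = rng.randint(1, 100), rng.randint(1, 100)
    text = ("Query: Which number is larger?\n"
            f"Response A: {a}\n"
            f"Response B: {b}\n"
            "Claim: Response A is larger than Response B\n"
            "I think this Claim is [True/False]")
    return {
        "input": text,
        "metadata": {"gold_label": "True" if a > b else "False",
                     "task": "comparison"},
    }
```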
```bash
# Math problems - creates 1000 examples (500 pairs)
icm run --model google/gemma-3-1b-it --synthetic math --synthetic-size 500

# Number comparisons - creates 300 examples
icm run --model google/gemma-3-1b-it --synthetic comparison --synthetic-size 300

# Quick test with defaults (100 examples)
icm run --model google/gemma-3-1b-it --synthetic math
```

- Instant generation: No need to download or configure external datasets
- Verifiable ground truth: Clear logical relationships for validation
- Reproducible: Consistent results with same seed
- Perfect for testing: Simple tasks ideal for algorithm validation
- No dependencies: Works offline without internet connection
All synthetic examples follow the standard ICM format:
```json
{
  "input": "Question: What is 42 + 17?\nClaim: 42 + 17 = 59\nI think this Claim is [True/False]",
  "metadata": {
    "gold_label": "True",
    "task": "math"
  }
}
```

Run ICM on a dataset to generate labeled examples.
Required Arguments:
- `--model`: Model name or path (e.g., `google/gemma-3-1b-it`)
Dataset Arguments:
- `--dataset`: Dataset name or path
- `--task-type`: Task type (`auto`, `classification`, `comparison`, `truthfulqa`, `gsm8k`)
- `--split`: Dataset split (default: `train`)
- `--max-examples`: Maximum examples to process
Synthetic Dataset Options:
- `--synthetic`: Create synthetic dataset (`math`, `comparison`)
- `--synthetic-size`: Number of synthetic examples to generate (default: 100)
ICM Algorithm Parameters:
- `--alpha`: Weight for mutual predictability vs consistency (default: 100.0)
- `--initial-temperature`: Starting temperature for simulated annealing (default: 3.0)
- `--final-temperature`: Ending temperature (default: 0.001)
- `--cooling-rate`: Temperature cooling rate (default: 0.98)
- `--initial-examples`: Number of initial random examples (default: 20)
- `--max-iterations`: Maximum search iterations (default: 1000)
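These defaults suggest a geometric cooling schedule. As a sketch, assuming `T_k = max(final, initial * cooling_rate ** k)` (this clamped form is an assumption about the implementation, not confirmed by the source):

```python
def temperature(step, initial=3.0, final=0.001, cooling_rate=0.98):
    # Geometric cooling clamped at the final temperature:
    # T_k = max(final, initial * cooling_rate ** step)
    return max(final, initial * cooling_rate ** step)
```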
Generation Parameters:
- `--generation-temperature`: Temperature for text generation (default: 0.2)
- `--generation-top-p`: Top-p for nucleus sampling (default: 0.9)
- `--generation-max-tokens`: Maximum tokens to generate (default: 512)
System Parameters:
- `--device`: Computation device (`cuda`, `cpu`, `auto`)
- `--seed`: Random seed for reproducibility (default: 42)
- `--log-level`: Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`)
Export ICM results to various formats.
Required Arguments:
- `--input-path`: Path to ICM result file
- `--output-path`: Output file path
- `--format`: Export format (`json`, `dpo`, `csv`, `analysis`)
Optional Arguments:
- `--include-stats`: Include statistics in JSON export
- `--create-pairs`: Create chosen/rejected pairs for DPO format
- `--hf-push`: Push to Hugging Face after export
- `--hf-repo-id`: Hugging Face repository ID
- `--private`: Make Hugging Face repository private
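Conceptually, `--create-pairs` groups claims by prompt and pairs a True-labeled claim (chosen) with a False-labeled one (rejected). A minimal sketch, with field names that are illustrative rather than the exporter's actual schema:

```python
def to_dpo_pairs(examples):
    """Pair each prompt's True-labeled claim (chosen) with its
    False-labeled claim (rejected). Field names are illustrative."""
    by_prompt = {}
    for ex in examples:
        by_prompt.setdefault(ex["prompt"], {})[ex["label"]] = ex["claim"]
    pairs = []
    for prompt, claims in by_prompt.items():
        # only prompts with both labels yield a usable preference pair
        if "True" in claims and "False" in claims:
            pairs.append({"prompt": prompt,
                          "chosen": claims["True"],
                          "rejected": claims["False"]})
    return pairs
```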
Push files to Hugging Face Hub.
Required Arguments:
- `--input-path`: Local file path to upload
- `--hf-repo-id`: Hugging Face repository ID (e.g., `username/dataset-name`)
Optional Arguments:
- `--file-name`: Custom filename in repository
- `--private`: Make repository private
List all saved ICM results.
```bash
icm list --results-dir icm_results
```

Analyze ICM results and show statistics.
```bash
# Analyze all results
icm analyze

# Analyze specific result file
icm analyze --result-file icm_results/truthfulqa_gpt2_20240115_143022.jsonl
```

Clean old result files, keeping only the latest N results.
```bash
icm clean --keep-latest 10
```

Create a config.json file:
```json
{
  "search_params": {
    "alpha": 30.0,
    "initial_temperature": 15.0,
    "final_temperature": 0.005,
    "max_iterations": 2000
  },
  "model_params": {
    "generation_temperature": 0.8,
    "generation_top_p": 0.95
  },
  "system_params": {
    "device": "cuda",
    "seed": 123
  }
}
```

Set common parameters via environment variables:
```bash
export ICM_MODEL="google/gemma-3-1b-it"
export ICM_DEVICE="cuda"
export ICM_LOG_LEVEL="INFO"
```

```python
from icm import ICMSearcher, load_icm_dataset

# Load dataset
dataset = load_icm_dataset("truthful_qa", task_type="truthfulqa")

# Create searcher
searcher = ICMSearcher(
    model_name="google/gemma-3-1b-it",
    alpha=50.0,
    max_iterations=1000
)

# Run ICM search
result = searcher.search(dataset, max_examples=100)

# Access results
print(f"Generated {len(result.labeled_examples)} labeled examples")
print(f"Final score: {result.score:.4f}")
```

```python
from icm import ICMSearcher, ICMDataset, ICMExample
from icm.consistency import LogicalConsistencyChecker, MathConsistencyRule

# Create custom dataset
examples = [
    ICMExample("What is 2+2?", {"category": "math"}),
    ICMExample("What is 3+3?", {"category": "math"})
]
dataset = ICMDataset(examples)

# Custom consistency checker
checker = LogicalConsistencyChecker([MathConsistencyRule()])

# Advanced searcher
searcher = ICMSearcher(
    model_name="google/gemma-3-1b-it",
    alpha=30.0,
    initial_temperature=20.0,
    consistency_checker=checker,
    seed=42
)

result = searcher.search(dataset)
```

```python
from icm.storage import ICMStorage
from icm.exporters import ICMExporter

# Save results
storage = ICMStorage("my_results")
storage.save_result(result, "experiment_1")

# Export to DPO format
exporter = ICMExporter(storage)
exporter.export_to_dpo_format(
    result.labeled_examples,
    "training_data.jsonl"
)

# Push to Hugging Face
exporter.export_to_huggingface(
    result.labeled_examples,
    repo_id="username/my-icm-dataset",
    task_type="classification",
    model_name="google/gemma-3-1b-it"
)
```

```bash
# Create synthetic math dataset
icm run --model google/gemma-3-1b-it --synthetic math --synthetic-size 500 --max-iterations 500

# Use real GSM8K dataset
icm run --model google/gemma-3-1b-it --dataset gsm8k --task-type gsm8k --max-examples 200

# Generate preference dataset
icm run --model google/gemma-3-1b-it --dataset anthropic/hh-rlhf --task-type comparison --alpha 30.0

# Export to DPO format for training
icm export --input-path results.jsonl --output-path dpo_data.jsonl --format dpo --create-pairs

# Export analysis report
icm export --input-path results.jsonl --output-path analysis.json --format analysis --include-examples
```

CUDA Out of Memory:

```bash
# Use smaller model, MPS (Apple Silicon), or CPU
icm run --model google/gemma-3-1b-it --device cpu
# or on Apple Silicon:
icm run --model google/gemma-3-1b-it --device mps
```

Model Loading Errors:

```bash
# Verify model name and check internet connection
icm run --model google/gemma-3-1b-it --log-level DEBUG
```

Poor Quality Results:

```bash
# Increase alpha or iterations
icm run --model your-model --alpha 100.0 --max-iterations 2000
```

Dataset Configuration Errors:

```bash
# ICM now auto-detects both config and split for known datasets
# TruthfulQA: automatically uses config='multiple_choice' and split='validation'
# GSM8K: automatically uses config='main' and split='train'

# Your commands should work automatically:
icm run --model google/gemma-3-1b-it --dataset truthful_qa --task-type truthfulqa
icm run --model google/gemma-3-1b-it --dataset gsm8k --task-type gsm8k

# Or specify manually if needed:
icm run --model google/gemma-3-1b-it --dataset truthful_qa --config multiple_choice --split validation --task-type truthfulqa
icm run --model google/gemma-3-1b-it --dataset gsm8k --config main --task-type gsm8k
```

Memory Usage Issues:

```bash
# ICM uses memory-efficient sampling to handle large datasets.
# If you still encounter memory issues, reduce the dataset size:
icm run --model google/gemma-3-1b-it --dataset large-dataset --max-examples 50

# Or use a smaller model:
icm run --model distilgpt2 --dataset your-dataset --max-examples 100
```

Enable detailed logging:

```bash
icm run --model google/gemma-3-1b-it --dataset your-data --log-level DEBUG --log-file debug.log
```

```bash
git clone https://github.com/codelion/icm.git
cd icm
pip install -e ".[dev]"
pytest tests/
```

If you use ICM in your research, please cite:
```bibtex
@software{icm,
  title = {ICM: Internal Coherence Maximization},
  author = {Asankhaya Sharma},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/codelion/icm}
}
```