An implementation of EMMA (End-to-End Multimodal Model for Autonomous Driving) built on the Claude API. It follows the EMMA paper but uses Claude, rather than the original Gemini model, for trajectory prediction and scene understanding.
- End-to-end autonomous driving trajectory prediction
- Integration with nuScenes dataset
- Real-time visualization tools for predictions
- Scene understanding and critical object detection
- Comprehensive evaluation metrics
- Command-line interface for different operations
- First, install system dependencies and set up the Python environment:

  ```bash
  # Make the setup script executable
  chmod +x scripts/setup.sh

  # Run the setup script
  ./scripts/setup.sh
  ```

- Copy the environment file and configure your credentials:

  ```bash
  cp .env.example .env
  ```

- Allow direnv:

  ```bash
  direnv allow
  ```
If you prefer to install dependencies manually:
- Install system dependencies (Ubuntu/Debian):

  ```bash
  sudo apt-get update
  sudo apt-get install -y \
      build-essential \
      python3-dev \
      gcc \
      pkg-config \
      libfreetype6-dev \
      libpng-dev \
      python3-matplotlib
  ```

- Create and activate a virtual environment:

  ```bash
  uv venv --python python3.10
  source .venv/bin/activate
  ```

- Install Python dependencies:

  ```bash
  uv pip install -r requirements.txt
  uv pip install -r requirements-dev.txt
  ```
Copy `.env.example` to `.env` and fill in your credentials:

```bash
cp .env.example .env
```
Required environment variables:

- `ANTHROPIC_API_KEY`: Your Claude API key
- `NUIMAGES_DATAROOT`: Path to your nuScenes dataset
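For reference, a filled-in `.env` using the variables above might look like this (both values are placeholders, not real credentials or paths):

```bash
# Example .env (placeholder values only; replace with your own)
ANTHROPIC_API_KEY=sk-ant-your-key-here
NUIMAGES_DATAROOT=/data/sets/nuimages
```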
This project uses `direnv` to automatically manage environment variables and virtual environments.
- Install direnv:

  ```bash
  # On macOS
  brew install direnv

  # On Ubuntu/Debian
  sudo apt-get install direnv

  # On Fedora
  sudo dnf install direnv
  ```
- Add the direnv hook to your shell configuration:

  For bash (`~/.bashrc`):

  ```bash
  eval "$(direnv hook bash)"
  ```

  For zsh (`~/.zshrc`):

  ```bash
  eval "$(direnv hook zsh)"
  ```

  For fish (`~/.config/fish/config.fish`):

  ```fish
  direnv hook fish | source
  ```

- Allow direnv in the project directory:

  ```bash
  direnv allow
  ```
The included `.envrc` will automatically:

- Create and activate a Python virtual environment using `uv`
- Set up the `PYTHONPATH`
- Load environment variables from `.env`
- Configure development paths
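For illustration, a minimal `.envrc` consistent with the list above might look like the sketch below. This is an assumption about its shape, not the file shipped in the repository; it relies on direnv's standard `dotenv` helper and a `uv`-managed `.venv`:

```bash
# Hypothetical .envrc sketch; the repository ships its own version

# Create the uv-managed virtual environment on first use, then activate it
if [ ! -d ".venv" ]; then
    uv venv --python python3.10
fi
source .venv/bin/activate

# Load credentials and dataset paths from .env
dotenv

# Make the project sources importable
export PYTHONPATH="${PWD}:${PYTHONPATH}"
```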
Note: Make sure to copy `.env.example` to `.env` and fill in your credentials:

```bash
cp .env.example .env
```
This project uses the nuScenes image data (nuImages) for autonomous driving predictions. The dataset comes in several versions; for development and testing, we recommend the mini dataset (~4GB).
After downloading, your data directory should look like this:
```
/data/sets/nuimages/
    samples/    - Sensor data for keyframes (annotated images)
    sweeps/     - Sensor data for intermediate frames (unannotated images)
    v1.0-mini/  - JSON tables with metadata and annotations
```
- Create an account at the nuScenes website and accept the Terms of Use.
- Download the following files for the mini set:
  - `v1.0-mini` (metadata and annotations)
  - `samples` (keyframe images)
  - `sweeps` (intermediate frame images)
- Extract the archives to your data directory without overwriting folders that occur in multiple archives.
- Update your `.env` file with the dataset path:

  ```bash
  NUIMAGES_DATAROOT={WORKSPACE_DIR}/data/sets/nuimages
  ```
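Once extracted, you can sanity-check the layout with a few lines of Python. This is a convenience snippet for this README, not part of the repository:

```python
import os
from pathlib import Path

# Falls back to the default location shown above if the variable is unset
dataroot = Path(os.environ.get("NUIMAGES_DATAROOT", "/data/sets/nuimages"))

for sub in ("samples", "sweeps", "v1.0-mini"):
    status = "ok" if (dataroot / sub).is_dir() else "MISSING"
    print(f"{dataroot / sub}: {status}")
```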
Install and test the nuscenes-devkit:
```bash
# Install devkit
uv pip install nuscenes-devkit
```

```python
# Verify setup (in a Python shell)
from nuimages import NuImages

nusc = NuImages(version='v1.0-mini', dataroot='{WORKSPACE_DIR}/data/sets/nuimages', verbose=True, lazy=True)
```
Note: While the full nuScenes dataset includes lidar, radar, and map data, this project focuses only on the image data for Claude-based predictions.
- Process a single sample:

  ```bash
  python -m src.scripts.cli predict sample_token_123 --output-dir outputs
  ```

- Run evaluation:

  ```bash
  python -m src.scripts.cli evaluate --num-samples 100 --output-dir eval_results
  ```

- Visualize predictions:

  ```bash
  python -m src.scripts.cli visualize sample_token_123 predictions/sample_123.json
  ```
```python
from src.model.emma import ClaudeEMMA
from src.data.nuimages_loader import NuImagesLoader
from src.visualization.visualizer import EMMAVisualizer

# Initialize components
emma = ClaudeEMMA(api_key="your-api-key")
nuim_loader = NuImagesLoader(dataroot="path/to/nuimages")
visualizer = EMMAVisualizer()

# Process a sample (any nuImages sample token works here)
sample_token = "sample_token_123"
sample_data = nuim_loader.get_sample_data(sample_token)
prediction = emma.predict_trajectory(
    camera_images=sample_data.images,
    ego_history=sample_data.ego_history,
    command=sample_data.command
)

# Visualize results
visualizer.visualize_prediction(
    front_image=sample_data.images['CAM_FRONT'],
    prediction=prediction,
    ground_truth=sample_data.ground_truth,
    save_path="prediction.png"
)
```
Run tests:

```bash
pytest tests/
```

Format, lint, and type-check the code:

```bash
black src/ tests/
ruff check src/ tests/
mypy src/
```
The implementation includes several metrics for evaluating prediction quality:
- Average Displacement Error (ADE)
- Final Displacement Error (FDE)
- Trajectory Smoothness
- Scene Understanding Accuracy
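For reference, ADE is the mean L2 distance between predicted and ground-truth waypoints over the prediction horizon, and FDE is the L2 distance at the final waypoint. A minimal NumPy sketch is shown below; it is not the repository's implementation, and the function name and the `(T, 2)` waypoint layout are assumptions:

```python
import numpy as np

def displacement_errors(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Return (ADE, FDE) for trajectories of shape (T, 2) holding (x, y) waypoints."""
    dists = np.linalg.norm(pred - gt, axis=-1)  # L2 distance at each waypoint
    return float(dists.mean()), float(dists[-1])

# Example: a prediction that drifts slightly from the ground truth
gt = np.array([[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]])
pred = np.array([[0.1, 1.0], [0.2, 2.1], [0.3, 3.1]])
ade, fde = displacement_errors(pred, gt)
print(f"ADE = {ade:.3f} m, FDE = {fde:.3f} m")
```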
The visualization module provides:
- Camera View:
  - Object detections with distance annotations
  - Critical object highlighting
- Bird's Eye View:
  - Predicted trajectory
  - Ground truth trajectory (when available)
  - Critical objects with velocity vectors
  - Ego vehicle position and orientation
- Text Description:
  - Scene analysis
  - Critical object list
  - Reasoning explanation
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
MIT License - see LICENSE file for details.
If you use this implementation in your research, please cite both the original EMMA paper and this implementation:
```bibtex
@article{hwang2024emma,
  title={EMMA: End-to-End Multimodal Model for Autonomous Driving},
  author={Hwang, Jyh-Jing and others},
  journal={arXiv preprint arXiv:2410.23262},
  year={2024}
}
```