
DAViD: Data-efficient and Accurate Vision Models from Synthetic Data

The repo accompanies the ICCV 2025 paper DAViD: Data-efficient and Accurate Vision Models from Synthetic Data and contains instructions for downloading and using the SynthHuman dataset and models described in the paper.

📊 The SynthHuman Dataset

[Example images: Face Data, Full Body Data, Upper Body Data]

The SynthHuman dataset contains approximately 300,000 images of synthetic humans with ground-truth annotations for foreground alpha masks, absolute depth, surface normals and camera intrinsics. There are approximately 100,000 images for each of three camera scenarios: face, upper-body and full-body. The data is generated using the latest version of our synthetic data generation pipeline, which has been used to create a number of datasets: Face Synthetics, SimpleEgo and SynthMoCap. Ground-truth annotations are per-pixel with perfect accuracy due to the graphics-based rendering pipeline:

[Example ground-truth annotations: Face GT, Full Body GT]

Data Format

The dataset contains 298008 samples. The first 98040 samples feature the face, the next 99976 samples feature the full body and the final 99992 samples feature the upper body. Each sample is made up of the following files (a minimal loading sketch follows the list):

  • rgb_0000000.png - RGB image
  • alpha_0000000.png - foreground alpha mask
  • depth_0000000.exr - absolute z-depth image in cm
  • normal_0000000.exr - surface normal image (XYZ)
  • cam_0000000.txt - camera intrinsics (see below)
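
Below is a minimal loading sketch using OpenCV and NumPy. It is illustrative rather than part of the released scripts (visualize_data.py, described below, is the reference loader); the flat directory layout and the OPENCV_IO_ENABLE_OPENEXR requirement are assumptions about your setup.

import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # some OpenCV builds need this before import for EXR support
import cv2
import numpy as np

sample_dir = "SynthHuman"  # hypothetical path to the unzipped data; adjust to your layout
idx = "0000000"
# 8-bit RGB image (OpenCV loads BGR, so convert) and foreground alpha mask scaled to [0, 1]
rgb = cv2.cvtColor(cv2.imread(f"{sample_dir}/rgb_{idx}.png"), cv2.COLOR_BGR2RGB)
alpha = cv2.imread(f"{sample_dir}/alpha_{idx}.png", cv2.IMREAD_GRAYSCALE) / 255.0
# float EXR images: absolute z-depth in cm and surface normals (XYZ);
# channel count/order can vary with the OpenCV build, so inspect .shape after loading
depth = cv2.imread(f"{sample_dir}/depth_{idx}.exr", cv2.IMREAD_UNCHANGED)
normal = cv2.imread(f"{sample_dir}/normal_{idx}.exr", cv2.IMREAD_UNCHANGED)
# 3x3 camera intrinsics (see below)
K = np.loadtxt(f"{sample_dir}/cam_{idx}.txt")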

The camera text file includes the standard intrinsic matrix:

f_x 0.0 c_x
0.0 f_y c_y
0.0 0.0 1.0

where f_x and f_y are the focal lengths and c_x, c_y the principal point, all in pixel units. The matrix can be loaded directly with np.loadtxt(path_to_camera_txt).
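
As a worked example of using the intrinsics, the sketch below back-projects the z-depth image into camera-space 3D points under a standard pinhole model. It is an illustration of the stated conventions (z-depth in cm), not an official utility.

import numpy as np

def backproject(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Return an HxWx3 array of camera-space points (same units as depth, i.e. cm)."""
    if depth.ndim == 3:            # if the EXR loads with repeated channels, keep one
        depth = depth[..., 0]
    f_x, f_y = K[0, 0], K[1, 1]
    c_x, c_y = K[0, 2], K[1, 2]
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - c_x) / f_x * depth    # pinhole model: X = (u - c_x) * Z / f_x
    y = (v - c_y) / f_y * depth
    return np.stack([x, y, depth], axis=-1)

K = np.loadtxt("cam_0000000.txt")  # 3x3 intrinsic matrix from the sample's camera file
points = backproject(depth, K)     # 'depth' as loaded in the sketch above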

Downloading the Dataset

The dataset is split into 60 zip files to make downloading easier. Each zip file contains 5000 samples and is at most 8.75GB; the total download size is approximately 330GB. To download the dataset, run download_data.py TARGET_DIRECTORY [--single-sample] [--single-chunk], which downloads and unzips the archives into the target folder. You can optionally download a single sample or a single chunk to take a quick look at the data.
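
For example, to fetch a single chunk for a quick look (the target directory here is just an illustration):

python download_data.py ./SynthHuman --single-chunk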

Loading the Dataset

You can visualize samples from the dataset using visualize_data.py SYNTHHUMAN_DIRECTORY [--start-idx N]. This script shows examples of how to load the image files correctly and display the data.

Dataset License

The SynthHuman dataset is licensed under the CDLA-2.0. The download and visualization scripts are licensed under the MIT License.

🔓 Released Models

We release models for the following tasks:

Task                           Version  ONNX Model  Model Card
Soft Foreground Segmentation   Base     Download    Model Card
                               Large    Download
Relative Depth Estimation      Base     Download    Model Card
                               Large    Download
Surface Normal Estimation      Base     Download    Model Card
                               Large    Download
Multi-Task Model               Large    Download    Model Card

🚀 Run the Demo

This demo supports running:

  • Relative depth estimation
  • Soft foreground segmentation
  • Surface normal estimation

To install the requirements for running the demo:

pip install -r requirement.txt

You can run either:

  1. A multi-task model that performs all tasks simultaneously
python demo.py \
  --image path/to/input.jpg \
  --multitask-model models/multitask.onnx
  2. Individual models, one per task
python demo.py \
  --image path/to/input.jpg \
  --depth-model models/depth.onnx \
  --foreground-model models/foreground.onnx \
  --normal-model models/normal.onnx

🧠 Notes:

  • The script expects ONNX models. Ensure the model paths are correct.
  • If both multi-task and individual models are provided, results from both will be shown and compared.
  • Foreground masks are used for improved visualization of depth and normals.
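
Beyond demo.py, the released models can also be run directly with onnxruntime. The sketch below is a rough outline, not the demo's actual pre/post-processing: the input resolution, [0, 1] scaling and NCHW layout are assumptions that should be checked against the model cards.

import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("models/normal.onnx", providers=["CPUExecutionProvider"])
inp = session.get_inputs()[0]                 # query the model's real input name and shape
# Assumed preprocessing: RGB, resized, scaled to [0, 1], NCHW float32 -- verify per model card.
img = cv2.cvtColor(cv2.imread("path/to/input.jpg"), cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
img = cv2.resize(img, (512, 512))             # placeholder resolution
batch = img.transpose(2, 0, 1)[None]          # HWC -> 1x3xHxW
outputs = session.run(None, {inp.name: batch})
for out in outputs:                           # multi-task models may return several outputs
    print(out.shape)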

Here is an example output image after running the demo:

Example results

Model License

DAViD models and runtime code are licensed under the MIT License.

📖 Citation

If you use the SynthHuman Dataset or any of the DAViD models in your research, please cite the following:

@misc{saleh2025david,
    title={{DAViD}: Data-efficient and Accurate Vision Models from Synthetic Data},
    author={Fatemeh Saleh and Sadegh Aliakbarian and Charlie Hewitt and Lohit Petikam and Xiao-Xian and Antonio Criminisi and Thomas J. Cashman and Tadas Baltrušaitis},
    year={2025},
    eprint={2507.15365},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2507.15365},
}