
extrapolaTED: Unveiling the Unspoken

Bringing TED experiences to every topic with Gen AI.


extrapolaTED is a pioneering application designed to craft TED-like lectures on any conceivable topic on demand. Through a fusion of AI and human genius, we're pushing the boundaries of education and information dissemination. Our mission is to democratize knowledge, making recent scientific achievements known and inspiring to all.

🚀 Features

  • Boundless Exploration: Venture into any subject with TED-like insights powered by Generative AI.
  • Holistic Understanding: Utilize a rich tapestry of multi-modal resources, including arXiv papers and Wikipedia.
  • Creative Generation: Witness the blend of ChatGPT and Stable Diffusion in producing captivating stories (see the sketch after this list).
  • Voice Synthesis: Experience smooth, natural narrations courtesy of ElevenLabs API.
  • Multi-step API Calls: Harness the power of Wordware for efficient API orchestration.
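
To give a taste of how the creative-generation and voice-synthesis features fit together, here is a minimal sketch of one generation step, assuming the openai, diffusers, and requests packages, a CUDA GPU, and API keys in the environment. The model names, prompts, voice ID, and environment-variable names below are illustrative placeholders, not the project's exact configuration:

```python
# Hedged sketch: story, image, and narration generation for a single topic.
# Model names, the voice ID, file names, and env vars are placeholders.
import os
import requests
import torch
from openai import OpenAI
from diffusers import StableDiffusionPipeline

topic = "gravitational wave astronomy"

# 1. Story: ask a chat model for a short TED-style narrative.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
story = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You write engaging TED-style talks."},
        {"role": "user", "content": f"Write a two-paragraph TED-style talk about {topic}."},
    ],
).choices[0].message.content

# 2. Visual: render an illustrative image with Stable Diffusion.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipe(f"An inspiring illustration of {topic}, digital art").images[0]
image.save("slide.png")

# 3. Voice: narrate the story via the ElevenLabs text-to-speech REST endpoint.
voice_id = "YOUR_VOICE_ID"  # placeholder
audio = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
    headers={"xi-api-key": os.environ["ELEVEN_API_KEY"]},
    json={"text": story, "model_id": "eleven_monolingual_v1"},
)
with open("narration.mp3", "wb") as f:
    f.write(audio.content)
```

In the actual pipeline these steps are orchestrated through Wordware rather than a single script; the sketch only shows the shape of the calls involved.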

💡 Methodology

Technology

Embark on a technological odyssey encompassing retrieval, embedding, story, and image generation:

  • USearch is our go-to technology for Semantic Vector Search (see the retrieval sketch after this list).
  • UForm takes charge of Vision Language Understanding.
  • Wordware facilitates multi-step API calls, forming a cohesive workflow.
  • ChatGPT transforms raw data into engaging narratives, bringing topics to life.
  • Stable Diffusion creates captivating visuals that resonate with the generated content.
  • ElevenLabs API gives a recognizable voice to our content, making the learning experience more immersive.
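
As an illustration of the retrieval side, here is a minimal sketch that embeds a query with the e5-base-v2 encoder and searches the prebuilt arXiv index from the data/ directory. It assumes sentence-transformers and usearch are installed, and that the index keys correspond to row positions in title_abstract.parquet with a "title" column; both of those are assumptions about the data layout, not a documented contract:

```python
# Hedged sketch: semantic retrieval over the prebuilt arXiv abstracts index.
# Assumes index keys map to row positions in title_abstract.parquet.
import pandas as pd
from sentence_transformers import SentenceTransformer
from usearch.index import Index

# Load the e5-base-v2 encoder; e5 models expect a "query: " prefix on queries.
encoder = SentenceTransformer("intfloat/e5-base-v2")
query_vector = encoder.encode("query: recent advances in superconductivity")

# Open the prebuilt USearch index shipped in the data directory.
index = Index(ndim=query_vector.shape[0], metric="cos")
index.load("data/ann-arxiv-2m/abstract.e5-base-v2.usearch")

# Retrieve the ten nearest abstracts and map their keys back to parquet rows.
matches = index.search(query_vector, 10)
abstracts = pd.read_parquet("data/ann-arxiv-2m/title_abstract.parquet")
for match in matches:
    row = abstracts.iloc[int(match.key)]
    print(f"{match.distance:.3f}  {row['title']}")
```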

Data

Dive deep into the heart of extrapolaTED with a wide array of datasets that serve as the bedrock of our content generation:

  • TED Dataset: With over 1,000 transcripts, this dataset provides a profound understanding of the topics already covered in TED talks, aiding in exploring new territories.
  • Arxiv Abstracts (unum-cloud/ann-arxiv-2m): A treasure trove of 2 million vectorized abstracts summarizing the latest strides in scientific research.
  • WIT - Wikipedia Images Dataset: A rich collection of well over 3 million images aiding in the visual representation of generated content (see the image-matching sketch after this list).
  • Wikipedia Abstracts (wikipedia): The 6 million abstracts in this dataset are a solid foundation for textual content, providing ground-truth retrieval of factual information.
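
To show how the multi-modal side can work, here is a hedged sketch that scores a candidate Wikipedia image against a text query with UForm's English vision-language model. The checkpoint name follows UForm's published models, the image path is a placeholder, and the API shown follows the uform 1.x interface, which may differ in newer releases:

```python
# Hedged sketch: cross-modal matching of a text query against a candidate image
# with UForm's English vision-language model. The image path is a placeholder,
# and the API follows uform 1.x; newer releases may differ slightly.
import uform
import torch.nn.functional as F
from PIL import Image

model = uform.get_model("unum-cloud/uform-vl-english")

text = "the glass pyramid of the Louvre at night"
image = Image.open("example_wikipedia_image.jpg")

# Encode both modalities into the same embedding space.
text_embedding = model.encode_text(model.preprocess_text(text))
image_embedding = model.encode_image(model.preprocess_image(image))

# Cosine similarity tells us how well the image matches the query.
similarity = F.cosine_similarity(text_embedding, image_embedding).item()
print(f"text-image similarity: {similarity:.3f}")
```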

🛠️ Setup

Getting started with extrapolaTED is a breeze:

  1. Environment Setup:

    • Using Anaconda:
      conda env create -f conda.yml
      conda activate extrapolaTED
    • Or manually with pip:
      conda create -n extrapolated python=3.10
      conda activate extrapolated
      pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
      pip3 install -r requirements.txt
  2. Data Preparation:

    • Execute the following scripts and notebooks to download and prepare the necessary data:
      ./download_arxiv_texts.sh
      ./download_wiki_images.sh
      jupyter nbconvert --to notebook --execute --inplace prepare_arxiv_texts.ipynb
      jupyter nbconvert --to notebook --execute --inplace prepare_ted_texts.ipynb
      python prepare_wiki_images.py
      jupyter nbconvert --to notebook --execute --inplace prepare_wiki_texts.ipynb
  3. Server Startup:

    • Launch the server that powers retrieval augmentation (a hypothetical query sketch follows this list):
      python server.py
  4. Exploration:

    • Open your Jupyter Notebook and generate TED-like lectures on your desired topics.
    • Or simply open the Wordware Prompt Pipeline and start playing with it!
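
For orientation, here is a hypothetical client-side sketch of querying the local retrieval server from a notebook. The port, endpoint path, parameter names, and response shape are illustrative assumptions, not the actual interface of server.py, so check that file for the real routes:

```python
# Hypothetical sketch of calling the local retrieval server from a notebook.
# The port, route, parameters, and response format are assumptions;
# consult server.py for the actual interface.
import requests

response = requests.get(
    "http://localhost:8000/search",
    params={"query": "the physics of gravitational waves", "top_k": 5},
    timeout=30,
)
response.raise_for_status()
for hit in response.json():
    print(hit)
```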

Your journey toward creating insightful and illuminating TED-like lectures begins now!

🌟 The Minds Behind

  • Ash Vardanian: The architect behind USearch, UForm, and the retrieval pipelines.
  • Tyler Neylon: The creator of Explacy and the maestro of prompting and video generation.
  • Robert Chandler: The visionary behind Wordware, the platform that powers language-model apps.

📁 Project Structure

Navigate through the well-organized directory structure to explore the different facets of extrapolaTED:

├── README.md
├── requirements.txt
├── conda.yml
├── mount_disks.sh
├── download_arxiv_texts.sh
├── download_wiki_images.py
├── download_wiki_images.sh
├── prepare_embeddings.py
├── prepare_arxiv_texts.ipynb
├── prepare_ted_texts.ipynb
├── prepare_wiki_images.py
├── prepare_wiki_texts.ipynb
├── server.py
├── data
│   ├── ann-arxiv-2m
│   │   ├── abstract.e5-base-v2.fbin
│   │   ├── abstract.e5-base-v2.usearch
│   │   ├── title_abstract.parquet
│   │   └── title_abstract.tsv
│   ├── ann-wiki-6m
│   │   ├── abstract.e5-base-v2.fbin
│   │   ├── abstract.e5-base-v2.usearch
│   │   ├── downloads
│   │   ├── title_abstract.parquet
│   │   └── wikipedia
│   └── ann-wiki-images-3m
│       ├── abstract.e5-base-v2.fbin
│       ├── abstract.e5-base-v2.usearch
│       ├── abstract.uform-vl-english.fbin
│       ├── images.uform-vl-english.fbin
│       └── title_abstract.parquet
├── article_generation
│   ├── add_images_to_transcript.ipynb
│   ├── betterfy_prompt.txt
│   ├── example_input_1_french_architecture.json
│   ├── example_transcript_1_french_architecture.json
│   ├── learn_to_generate_articles.ipynb
│   ├── learn_to_generate_images.ipynb
│   └── raw_articles
│       ├── architecture_of_paris.txt
│       ├── french_architecture.txt
│       ├── grand_palais.txt
│       ├── jean_nouvel.txt
│       └── paris_architecture_of_the_belle_epoque.txt
└── use_wordware_ouput
    ├── grav_wave_astronomy.json
    ├── make_silence.sh
    ├── make_video.py
    └── superconductors.json

extrapolaTED: Where the quest for knowledge never ends.