Bringing TED experiences to every topic with Gen AI.
extrapolaTED is a pioneering application designed to craft TED-like lectures on any conceivable topic on demand. Through a fusion of AI and human genius, we're pushing the boundaries of education and information dissemination. Our mission is to democratize knowledge, making every recent scientific achievement known and inspiring to all.
- Boundless Exploration: Venture into any subject with TED-like insights powered by Generative AI.
- Holistic Understanding: Utilize a rich tapestry of multi-modal resources, including Arxiv papers and Wikipedia.
- Creative Generation: Witness the blend of ChatGPT and Stable Diffusion in producing captivating stories.
- Voice Synthesis: Experience smooth, natural narrations courtesy of the ElevenLabs API.
- Multi-step API Calls: Harness the power of Wordware for efficient API orchestration.
Embark on a technological odyssey encompassing retrieval, embedding, story, and image generation:
- USearch is our go-to technology for Semantic Vector Search.
- UForm takes charge of Vision Language Understanding.
- Wordware facilitates multi-step API calls, forming a cohesive workflow.
- ChatGPT transforms raw data into engaging narratives, bringing topics to life.
- Stable Diffusion creates captivating visuals that resonate with the generated content.
- ElevenLabs API gives a recognizable voice to our content, making the learning experience more immersive.
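To make the retrieval step more concrete, here is a minimal sketch of semantic vector search. It assumes toy 3-dimensional embeddings and uses a brute-force cosine scan in plain Python as a stand-in; the project itself relies on USearch for this step. The names `cosine_similarity`, `top_k`, and the sample vectors are hypothetical and exist only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Return the k (key, score) pairs most similar to the query."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones would come from a text encoder.
corpus = {
    "gravitational waves": [0.9, 0.1, 0.0],
    "superconductors": [0.1, 0.9, 0.1],
    "french architecture": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]
results = top_k(query, corpus, k=2)
for key, score in results:
    print(f"{key}: {score:.3f}")
```

A brute-force scan like this is linear in the corpus size, which is why a dedicated approximate nearest-neighbor index is the practical choice for millions of vectors.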
Dive deep into the heart of extrapolaTED with a wide array of datasets that serve as the bedrock of our content generation:
- TED Dataset: With over 1,000 transcripts, this dataset provides a profound understanding of the topics already covered in TED talks, aiding in exploring new territories.
- Arxiv Abstracts (unum-cloud/ann-arxiv-2m): A treasure trove of 2 million vectorized abstracts summarizing the latest strides in scientific research.
- WIT - Wikipedia Images Dataset: A rich collection of well over 3 million images aiding in the visual representation of generated content.
- Wikipedia Abstracts (wikipedia): The 6 million abstracts in this dataset are a solid foundation for textual content, providing ground-truth retrieval of factual information.
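One way the retrieved abstracts could ground the generated story is by placing them in the prompt as numbered sources. The sketch below is an assumption about how such a prompt might be assembled, not the project's actual prompt; `build_prompt`, its wording, and the sample passages are hypothetical.

```python
def build_prompt(topic, passages, max_chars=500):
    """Assemble a grounded prompt: numbered source excerpts, then the task.

    `passages` is a list of (title, abstract) pairs, for example the top
    results of a semantic search over the abstract datasets.
    """
    lines = ["Use only the numbered sources below as factual grounding.", ""]
    for number, (title, abstract) in enumerate(passages, start=1):
        excerpt = abstract[:max_chars]  # keep each source short
        lines.append(f"[{number}] {title}: {excerpt}")
    lines.append("")
    lines.append(f"Write a TED-like lecture about: {topic}")
    return "\n".join(lines)


# Hypothetical retrieved passages, for illustration only.
passages = [
    ("Grand Palais", "An exhibition hall in Paris built for the 1900 exposition."),
    ("Jean Nouvel", "A French architect known for contemporary public buildings."),
]
prompt = build_prompt("French architecture", passages)
print(prompt)
```

Numbering the sources makes it possible to ask the language model to cite them, which helps when checking the lecture against the retrieved facts.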
Get started with extrapolaTED in a breeze:
- Environment Setup:
  - Using Anaconda:
      conda env create -f conda.yml
      conda activate extrapolaTED
  - Or manually with pip:
      conda create -n extrapolated python=3.10
      conda activate extrapolated
      pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
      pip3 install -r requirements.txt
- Data Preparation:
  - Execute the following scripts and notebooks to download and prepare the necessary data:
      ./download_arxiv_texts.sh
      ./download_wiki_images.sh
      jupyter nbconvert --to notebook --execute prepare_arxiv_texts.ipynb
      jupyter nbconvert --to notebook --execute prepare_ted_texts.ipynb
      python prepare_wiki_images.py
      jupyter nbconvert --to notebook --execute prepare_wiki_texts.ipynb
- Server Startup:
  - Launch the server to power retrieval augmentation:
      python server.py
- Exploration:
  - Open your Jupyter Notebook and generate TED-like lectures on your desired topics.
  - Or simply open the Wordware Prompt Pipeline and start playing with it!
Your journey toward creating insightful and illuminating TED-like lectures begins now!
- Ash Vardanian: The architect behind USearch, UForm, and the retrieval pipelines.
- Tyler Neylon: The person behind Explacy and maestro of prompting and video generation.
- Robert Chandler: The visionary behind Wordware, the platform that empowers language model apps.
Navigate through the well-organized directory structure to explore the different facets of extrapolaTED:
├── README.md
├── requirements.txt
├── conda.yml
├── mount_disks.sh
├── download_arxiv_texts.sh
├── download_wiki_images.py
├── download_wiki_images.sh
├── prepare_embeddings.py
├── prepare_arxiv_texts.ipynb
├── prepare_ted_texts.ipynb
├── prepare_wiki_images.py
├── prepare_wiki_texts.ipynb
├── server.py
├── data
│   ├── ann-arxiv-2m
│   │   ├── abstract.e5-base-v2.fbin
│   │   ├── abstract.e5-base-v2.usearch
│   │   ├── title_abstract.parquet
│   │   └── title_abstract.tsv
│   ├── ann-wiki-6m
│   │   ├── abstract.e5-base-v2.fbin
│   │   ├── abstract.e5-base-v2.usearch
│   │   ├── downloads
│   │   ├── title_abstract.parquet
│   │   └── wikipedia
│   └── ann-wiki-images-3m
│       ├── abstract.e5-base-v2.fbin
│       ├── abstract.e5-base-v2.usearch
│       ├── abstract.uform-vl-english.fbin
│       ├── images.uform-vl-english.fbin
│       └── title_abstract.parquet
├── article_generation
│   ├── add_images_to_transcript.ipynb
│   ├── betterfy_prompt.txt
│   ├── example_input_1_french_architecture.json
│   ├── example_transcript_1_french_architecture.json
│   ├── learn_to_generate_articles.ipynb
│   ├── learn_to_generate_images.ipynb
│   └── raw_articles
│       ├── architecture_of_paris.txt
│       ├── french_architecture.txt
│       ├── grand_palais.txt
│       ├── jean_nouvel.txt
│       └── paris_architecture_of_the_belle_epoque.txt
└── use_wordware_ouput
    ├── grav_wave_astronomy.json
    ├── make_silence.sh
    ├── make_video.py
    └── superconductors.json
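The `.fbin` files in the `data` directory hold the embedding matrices. The sketch below shows how such a file could be inspected with only the standard library. It assumes the common binary layout used for vector benchmarks: two little-endian 32-bit unsigned integers (row count, then dimensions) followed by row-major float32 values. That layout is an assumption, not something this README confirms, and the helper names `write_fbin`, `read_fbin_header`, and `read_fbin_row` are hypothetical.

```python
import os
import struct
import tempfile


def write_fbin(path, vectors):
    """Write vectors in the assumed layout: uint32 rows, uint32 cols, float32 data."""
    rows = len(vectors)
    cols = len(vectors[0]) if vectors else 0
    with open(path, "wb") as f:
        f.write(struct.pack("<II", rows, cols))
        for vec in vectors:
            f.write(struct.pack(f"<{cols}f", *vec))


def read_fbin_header(path):
    """Return (rows, cols) from the 8-byte header."""
    with open(path, "rb") as f:
        rows, cols = struct.unpack("<II", f.read(8))
    return rows, cols


def read_fbin_row(path, index):
    """Read one vector by seeking to it, without loading the whole file."""
    rows, cols = read_fbin_header(path)
    if not 0 <= index < rows:
        raise IndexError(f"row {index} out of range for {rows} rows")
    with open(path, "rb") as f:
        f.seek(8 + index * cols * 4)  # header + preceding float32 rows
        return list(struct.unpack(f"<{cols}f", f.read(cols * 4)))


# Round-trip a tiny matrix through a temporary file to exercise the helpers.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.fbin")
    write_fbin(demo_path, [[1.0, 2.0], [3.0, 4.0], [0.5, 0.25]])
    header = read_fbin_header(demo_path)
    last_row = read_fbin_row(demo_path, 2)

print(header, last_row)
```

Seeking to a single row keeps memory use constant, which matters when a file holds millions of vectors.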
extrapolaTED: Where the quest for knowledge never ends.