
Tsunami | Auto Scraping / Cleaning / LLM Analysis

Tsunami lets you automate scraping data from numerous sources and then feed it into large language models for analysis.

Tsunami:

  1. Reads your instructions from the project config JSON: data sources, models to use, prompts, etc. (a sketch follows this list)
  2. Downloads data from sources
  3. Cleans the data, formatting documents into readable versions without extra tokens
  4. Sends each doc/file to be analyzed by an LLM (with your specified prompt)
  5. Has a model merge the analyses, n responses at a time
  6. Repeats step 5 until fewer than m responses remain, then merges those final m responses into one final analysis (see the reduction sketch after this list)

A workspace is created in ./workspace/{project_name} containing all doc/data downloads, each response, and the final analysis. Cost data is output after each response completes, including the cumulative cost.
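
To make step 1 concrete, here is a sketch of what a project config might look like. Every field name below is a hypothetical illustration, not the actual schema; see the DOCS-EXAMPLES folder for real configs.

    {
      "project_name": "example_project",
      "models": ["claude-3-haiku"],
      "prompt": "Summarize the key claims in this document.",
      "data_sources": {
        "youtube": ["https://www.youtube.com/watch?v=VIDEO_ID"],
        "arxiv_queries": ["llm data pipelines"],
        "github_repos": ["https://github.com/dnbt777/Tsunami"]
      },
      "merge_batch_size": 4,
      "final_merge_threshold": 8
    }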
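
Steps 5 and 6 amount to a tree-style reduction over the responses. Below is a minimal Python sketch of that control flow, assuming a hypothetical merge_with_llm() helper and n, m >= 2; this illustrates the idea, not Tsunami's actual code.

    def reduce_responses(responses, n, m, merge_with_llm):
        # Merge n responses at a time until fewer than m remain.
        # Assumes n >= 2 and m >= 2 so the loop always terminates.
        while len(responses) >= m:
            responses = [
                merge_with_llm(responses[i:i + n])
                for i in range(0, len(responses), n)
            ]
        # Merge the final (fewer than m) responses into one analysis.
        return merge_with_llm(responses)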

⚠️⚠️WARNING⚠️⚠️

⚠️⚠️⚠️ AUTOMATED ANALYSIS OF LARGE AMOUNTS OF DATA CAN BE EXTREMELY EXPENSIVE ⚠️⚠️⚠️

Make sure you know what you are doing and use cheaper models, such as Haiku, until you are familiar with the program.

Terms/Conditions

A condition of using this program is that you take responsibility for all costs incurred through any and all API usage. Do not use the program if you do not accept these terms.

Quick start

  1. git clone https://github.com/dnbt777/Tsunami
  2. Run pip install -r requirements.txt
  3. Create a file called ".env" in the format below and fill it in with your keys/region/username (see the note after this list):

     AWS_ACCESS_KEY_ID=
     AWS_SECRET_ACCESS_KEY=
     AWS_REGION=
     AWS_USERNAME=

  4. Run the example script with python ./example_project.py -download -analyze
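
For reference, a .env like the one above is usually loaded with python-dotenv, after which boto3 reads the AWS credentials from the environment. A minimal sketch of that common pattern follows; it is an assumption about standard practice, not necessarily Tsunami's exact loading code.

    import os
    import boto3
    from dotenv import load_dotenv

    load_dotenv()  # copies the .env entries into os.environ

    # boto3 picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the
    # environment automatically; the region is passed explicitly here.
    bedrock = boto3.client("bedrock-runtime", region_name=os.environ["AWS_REGION"])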

Currently supported

Models:

  • Claude (AWS bedrock)

Request a model via DM or by opening an Issue.

Data sources:

  • YouTube
    • Individual video links
    • Playlist links
  • arXiv semantic search queries
  • PubMed semantic search queries
  • GitHub
    • Repo links
    • Repo search queries

Usage - Documentation/Examples

See guides in the DOCS-EXAMPLES folder

Support

Submit an issue, DM me on Twitter (https://twitter.com/dnbt777), or DM me on GitHub.

TODO

  • Documentation
  • RAG
  • Add more models
  • Save logs
  • Add more data sources