Tsunami automates scraping data from numerous sources and feeding it into large language models for analysis.
- Reads your instructions from the project config JSON (data sources, models to use, prompts, etc.)
- Downloads data from sources
- Cleans the data, formatting documents into readable versions without extra tokens
- Sends each doc/file to be analyzed by an LLM (with your specified prompt)
- Has a model merge the analyses, n responses at a time (this merge step is sketched below)
- Repeats the merge step until fewer than m responses remain, then merges the remaining responses into a final analysis

A workspace is created in ./workspace/{project_name} containing all doc/data downloads, each response, and the final analysis. Cost data, including the cumulative cost, is printed after each response completes.
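The merge step works like a reduction over the per-document analyses. The sketch below is purely illustrative and is not Tsunami's actual code; the function and parameter names are hypothetical, and `merge_with_llm` stands in for whatever merge prompt Tsunami sends to the model.

```python
def hierarchical_merge(responses, n, m, merge_with_llm):
    """Merge analyses n at a time until fewer than m remain,
    then merge the remainder into a single final analysis."""
    while len(responses) >= m:
        merged = []
        for i in range(0, len(responses), n):
            batch = responses[i:i + n]            # up to n analyses per batch
            merged.append(merge_with_llm(batch))  # one LLM call per batch
        responses = merged
    return merge_with_llm(responses)              # final merge of the remaining analyses
```

Each pass shrinks the number of responses by roughly a factor of n, so the total number of merge calls stays small even for large document sets.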
Make sure you know what you are doing and use cheaper models, such as Haiku, until you are familiar with the program.
A condition for using this program is that you take responsibility for all costs incurred through any/all API usage. Do not use the program if you don't accept these terms.
- Clone the repo: `git clone https://github.com/dnbt777/Tsunami`
- Install the requirements: `pip install -r requirements.txt`
- Create a file called `.env` in the format below and fill it in with your keys, region, and username (a quick way to verify these credentials is sketched after the setup steps):
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=
AWS_USERNAME=
- Run the example script with `python ./example_project.py -download -analyze`
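Before kicking off a full project, you may want to confirm that your `.env` credentials actually work with Bedrock. The standalone check below is not part of Tsunami; it assumes you have `boto3` and `python-dotenv` installed and that Claude 3 Haiku is enabled for your account in your region.

```python
import os
import boto3
from dotenv import load_dotenv

load_dotenv()  # loads the AWS_* values from your .env into the environment

# boto3 picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment
client = boto3.client("bedrock-runtime", region_name=os.environ["AWS_REGION"])

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # a cheap model, per the warning above
    messages=[{"role": "user", "content": [{"text": "Reply with OK"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```

If this prints a short reply, your credentials and region are set up correctly and the example project should be able to call Bedrock.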
- Claude (AWS Bedrock)
Request a model via DM or by opening an Issue.
- YouTube
  - Individual video links
  - Playlist links
- arXiv semantic search queries
- PubMed semantic search queries
- GitHub
  - Repo links
  - Repo search queries
See the guides in the DOCS-EXAMPLES folder.
Submit an issue, DM me on Twitter (https://twitter.com/dnbt777), or DM me on GitHub.
- Documentation
- RAG
- Add more models
- Save logs
- Add more data sources