- https://www.bu.edu/peaclab/files/2023/08/USENIX_23_Poster.pdf
- https://kili-technology.com/large-language-models-llms/9-open-sourced-datasets-for-training-large-language-models
- National Vulnerability Database: https://nvd.nist.gov/
- Exploit Database: https://www.exploit-db.com/
- Step 0 – define objectives and data source(s) to use -> define title and objectives (on Google Sheets) by March 8th
- Step 1 – choose LLM; plan the development tasks; create project repository -> presentation on March 22nd
- Step 2 – development of the project: software for data collection, LLM exploration and integration -> presentation on May 26th
- Step 3 – final article writing and final repository updates -> until June 16th
- One-shot or few-shot learning, i.e., simply providing context and examples to the LLM in the prompt, as well as other forms of prompt engineering
- Retrieval-augmented generation (RAG), encoding available knowledge in vector databases or other searchable forms
- LLM refinement by re-training or using RLHF (reinforcement learning from human feedback)
- Other architectures for integrating LLMs with external tools: ReWOO, agent-based LLMs, planning, etc.
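As a rough illustration of the first two approaches in the list above, the following Python sketch builds a few-shot prompt and a retrieval-augmented prompt. It is a minimal sketch under stated assumptions: all function names (`build_few_shot_prompt`, `retrieve`, `build_rag_prompt`) are hypothetical, the "Q:/A:" prompt layout is just one possible convention, and a bag-of-words cosine similarity stands in for the embedding model and vector database a real project would likely use. No LLM is called; the resulting strings would be passed to whichever model the project chooses.

```python
import math
import re
from collections import Counter


def build_few_shot_prompt(examples, question):
    """Format (question, answer) example pairs, then the new question."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query.

    Assumption: word-count vectors replace real embeddings here.
    """
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(query_vec, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(query, documents, k=1):
    """Prepend the retrieved passages to the question as context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    docs = [
        "Squats strengthen the legs and glutes.",
        "Oats are a good source of dietary fiber.",
        "Push-ups train the chest and triceps.",
    ]
    print(build_few_shot_prompt([("2+2?", "4")], "3+3?"))
    print(build_rag_prompt("Which exercise trains the chest?", docs))
```

Swapping `retrieve` for a lookup against a vector database would keep the rest of the sketch unchanged, which is one reason to keep retrieval and prompt construction in separate functions.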
AI Assistants for Data Tasks with Gemma
- Large Language Models (LLMs) have captured the world's attention and imagination.
- Much of their potential lies in their ability to be adapted to accomplish specialized tasks for a seemingly unlimited number of use cases.
- There’s a massive opportunity to uncover the best methods and approaches for adapting LLMs to new and specialized use cases.
- The goal of this competition is to create a notebook that demonstrates how to use the Gemma LLM to accomplish a data-science-oriented task: https://www.kaggle.com/competitions/data-assistants-with-gemma/overview
Possible tasks:
- Explain or teach basic data science concepts.
- Answer common questions about the Python programming language.
- Summarize Kaggle solution write-ups.
- Explain or teach concepts from Kaggle competition solution write-ups.
- Answer common questions about the Kaggle platform.
Possible ideas:
- Develop a chatbot that can provide advice and discuss topics related to healthy eating habits (e.g. integrating with scientific or other specialized literature)
- Develop an AI assistant that works as a personal trainer (PT), helping to define training plans and discussing the advantages and disadvantages of specific exercises and fitness plans (e.g. to lose weight, for specific health issues or fitness targets, etc.)
- Develop a mathematics tutor for given school levels
- Develop a chatbot that acts as a political or sports commentator
- Develop a chatbot that can suggest specific service providers or products in different areas (e.g. integrating the LLM with a web browser or with a product or service catalog)