📚🤖 S2QA: Question Answering on research papers from Semantic Scholar
Have you ever wondered what research papers have to say about your burning questions? Look no further than Semantic Scholar Q&A with GPT-4! 🙌
This Python script lets you enter a question, and it uses the power of Semantic Scholar and OpenAI to generate an answer based on the content of the top research papers. 🤖🔍
🚀 What's New!
- Ollama support. See notebook
- S2QA now supports llama-index! Check out the data loader here on llama hub
- Updated UI with chat interface:
  - Keeps the context
  - Shows only used references
- Index once for each research space; vectors and index are cached
- Full-text support!
💻 Demo Link:
https://shauryr.github.io/s2qa-demo/
👏 Acknowledgements
This project is in collaboration with the Semantic Scholar Team. I am thankful for their support and feedback.
📘 Notebooks
- ollama.ipynb ⭐ Ollama support demo
- chat_qa.ipynb ⭐ sends the context to ChatGPT (using revChatGPT) to generate answers. This gives the best answers and is free. It also seems less prone to hallucinations than the other pipelines in this repo.
- utils.py - contains the functions needed for search and GPT-3 prompting
- s2qa_sources_langchain.ipynb - gets better answers with langchain mapreduce, but this is very expensive and prone to hallucinations.
- s2qa_nb.ipynb - the first notebook. Very basic QA; prone to hallucinations.
🚨 Caution
Caution should be exercised when using generative AI for scientific text, as the output generated by such models may not be rigorously fact-checked or peer-reviewed, and may therefore contain errors or inaccuracies that could mislead researchers and readers. Be responsible and use this tool with caution.
🤖 Examples
❓ Question
what is the impact of transformers on language
🤖 ChatGPT Response:
Transformers have had a significant impact on language in a variety of ways. One of the most prominent areas of impact is in natural language processing (NLP), which has been revolutionized by the development of transformer-based language models. These models have been used for a variety of NLP tasks, including machine translation [1], code summarization [2], detecting patronizing and condescending language [3], and spoken language understanding tasks [4].
Moreover, transformer-based models have been shown to be effective in domain adaptation for bidirectional transformers [7], multi-modal transformer models have been explored for image captioning [8], and pre-training objectives have been used in software engineering to boost performance in code-related tasks [6]. Additionally, transformers have also been used to incorporate document timestamps into language models, which has shown to have a positive impact on downstream classification tasks and masked language modeling [5].
Overall, transformers have had a significant impact on language, particularly in the field of natural language processing, where transformer-based models have been used for a variety of tasks with great success. They have also been explored in other areas such as software engineering and image captioning, showing the versatility of the technology.
[1] A Study on the journey of Natural Language Processing models: from Symbolic Natural Language Processing to Bidirectional Encoder Representations from Transformers - International Journal of Scientific Research in Computer Science Engineering and Information Technology, 2021
[2] Code Summarization: Do Transformers Really Understand Code? - , 2022
[3] PALI-NLP at SemEval-2022 Task 4: Discriminative Fine-tuning of Transformers for Patronizing and Condescending Language Detection - International Workshop on Semantic Evaluation, 2022
[4] Benchmarking Transformers-based models on French Spoken Language Understanding tasks - Interspeech, 2022
[5] Temporal Language Modeling for Short Text Document Classification with Transformers - Conference on Computer Science and Information Systems, 2022
[6] Automating Code-Related Tasks Through Transformers: The Impact of Pre-training - ArXiv, 2023
[7] Exploiting Auxiliary Data for Offensive Language Detection with Bidirectional Transformers - WOAH, 2021
[8] What Does a Language-And-Vision Transformer See: The Impact of Semantic Information on Visual Representations - Frontiers in Artificial Intelligence, 2021
🤖 Answers with sources and langchain mapreduce
🧰 Requirements
- OpenAI API key (if you are using langchain)
- Semantic Scholar Academic Graph API key - https://www.semanticscholar.org/product/api
These can be added in constants.py.
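As a rough illustration of how the keys might be wired up, here is a minimal sketch that reads them from environment variables instead of hard-coding them. The variable names `OPENAI_API_KEY` and `S2_API_KEY` and the helper `s2_headers` are assumptions made for this example and may not match what constants.py actually defines. The `x-api-key` header is the one the Semantic Scholar API expects for authenticated requests.

```python
import os

# Hypothetical names: adjust them to whatever constants.py actually defines.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
S2_API_KEY = os.environ.get("S2_API_KEY", "")


def s2_headers(api_key=S2_API_KEY):
    """Return HTTP headers for Semantic Scholar requests.

    An empty key yields no auth header, so requests go out unauthenticated.
    """
    return {"x-api-key": api_key} if api_key else {}
```

Reading from the environment keeps the keys out of version control; pasting them directly into constants.py works too, as long as the file is not committed with real keys.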
The main third-party package requirements are tiktoken, openai, transformers and langchain.
To install all the required packages:
pip install -r requirements.txt
📍 Pipeline
1️⃣ Searching: We begin by searching the vast and ever-growing database of Semantic Scholar to find the most up-to-date and relevant papers and articles related to your question.
2️⃣ Re-Ranking: We then use SPECTER to embed these papers and re-rank the search results, ensuring that the most informative and relevant articles appear at the top of your search results.
3️⃣ Answering: Finally, we use the powerful natural language processing capabilities of GPT-3 to generate informative and accurate answers to your question, using custom prompts to ensure the best results.
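The three steps above can be sketched roughly as follows. This is not the code from utils.py: the function names, the requested fields, and the prompt wording are all assumptions made for illustration. The sketch uses the public Semantic Scholar Graph API paper search endpoint, and it assumes the embeddings (for example from SPECTER) have already been computed, so the re-ranking step only shows the cosine-similarity ordering.

```python
import math
from urllib.parse import urlencode

# Public Semantic Scholar Graph API paper search endpoint.
S2_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(question, limit=10):
    """Step 1: build the paper search request URL for a question."""
    params = {
        "query": question,
        "limit": limit,
        "fields": "title,abstract,year,venue",
    }
    return f"{S2_SEARCH_URL}?{urlencode(params)}"


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query_vec, papers, top_k=3):
    """Step 2: order papers by similarity of their embedding to the query.

    Each paper is a dict with a precomputed "embedding" list (assumed here).
    """
    ranked = sorted(
        papers,
        key=lambda p: cosine(query_vec, p["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, papers):
    """Step 3: assemble a prompt that asks for an answer with [n] citations."""
    context = "\n".join(
        f"[{i}] {p['title']}: {p['abstract']}"
        for i, p in enumerate(papers, start=1)
    )
    return (
        "Answer the question using only the numbered abstracts below, "
        "and cite them as [n].\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The string returned by `build_prompt` would then be sent to the language model of your choice. Numbering the abstracts in the prompt is what lets the answer carry citation markers like the ones in the example response.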
🖊️ Customizable
- Try other open embedding methods on huggingface to see better re-ranking results.
- Try other prompts or refine the current prompt to get even better answers.
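One way to make both customizations easy is to pass the embedding function and the prompt template in as parameters. The sketch below is an assumption about how that could look, not how the repo is structured: `rerank_with`, `keyword_embed`, `PROMPTS`, and `render_prompt` are hypothetical names, and `keyword_embed` is a toy stand-in for a real Hugging Face embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank_with(embed, question, abstracts):
    """Order abstracts by similarity to the question under any embedder.

    `embed` is any callable mapping text to a vector, so a different
    embedding model can be swapped in without touching this function.
    """
    q = embed(question)
    return sorted(abstracts, key=lambda t: cosine(q, embed(t)), reverse=True)


# Toy embedder (illustration only): counts of a few fixed keywords.
VOCAB = ["transformer", "translation", "protein"]


def keyword_embed(text):
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


# Hypothetical prompt templates; edit or add entries to experiment.
PROMPTS = {
    "concise": (
        "Answer in two sentences, citing sources as [n].\n\n"
        "{context}\n\nQuestion: {question}"
    ),
    "detailed": (
        "Write a detailed answer and cite every claim as [n].\n\n"
        "{context}\n\nQuestion: {question}"
    ),
}


def render_prompt(name, question, context):
    """Fill the chosen template with the question and retrieved context."""
    return PROMPTS[name].format(question=question, context=context)
```

To try a real model, replace `keyword_embed` with a function that returns that model's embedding for a string; the re-ranking code stays the same.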