This is a Question-Answering Bert project for HTML content. The model simply answers the questions using the text of the HTML context, then post-process the answer and return the html snippet/card that contains the predicted answer after removing the unnecessary items like styles, similar to Google featured snippets.
you can read my article about it here
{"html_url": "the url for the article",
"question": "user question",
"article": "HTML article to extract answer from"
}
{"html_url": "the url for the whole article",
"question": "user question",
"article": "HTML article to extract answer from",
"html_snippet": "HTML chunk/section that holds the answer of the question",
"text_snippet": "text chunk/section that holds the answer of the question",
"images": "list of images exists in the article",
"reader": "indicates if model managed to answer or not"
}
Here I use distilbert which is pre-trained on the QA task and only works with text, not HTML, however, I want a model that can returns the HTML version of the answer, to do that I have to search for the answer in the HTML content and find the container element which has the answer in it.
I parsed the HTML as a tree and started looking in each branch for the model predicted answer.
you can use different models by changing the model name in predictor.py
This is deployed using cortex-project.
$ bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/0.18/get-cli.sh)"
cortex deploy
cortex get reader
cortex log reader