tldrstory: AI-powered understanding of headlines and story text

tldrstory is a framework for AI-powered understanding of headlines and text content related to stories. It applies zero-shot labeling over text, which allows content to be categorized dynamically. The framework also builds a txtai index that enables text similarity search. A customizable Streamlit application and a FastAPI backend service allow users to review and analyze the processed data.

Examples

The following links are example applications built with tldrstory.

Election 2020 (Configuration files)

Installation

The easiest way to install is via pip and PyPI:

pip install tldrstory

You can also install tldrstory directly from GitHub. Using a Python Virtual Environment is recommended.

pip install git+https://github.com/neuml/tldrstory

Python 3.6+ is supported.

See the troubleshooting link for help resolving environment-specific install issues.

Indexing

Configures indexing of content. Pulling data via the Reddit API is currently supported. See this link for more information on setting up a Reddit API account; read-only access is all that is needed.

name

name: string

Application name

schedule

schedule: string

Cron-style string that enables scheduled running of the indexing job. See this link for more information on cron strings.
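As a rough illustration of what a cron-style string contains, the sketch below splits one into named fields. It assumes the common five-field layout (minute, hour, day of month, month, day of week). The helper name `split_cron` is hypothetical and is not part of tldrstory, whose scheduler may accept other forms.

```python
# Hypothetical helper, not part of tldrstory: names the parts of a
# five-field cron string so a schedule value is easier to read.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Splits a cron string on whitespace and labels each field."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


# "0 * * * *" reads as: minute 0 of every hour, every day.
fields = split_cron("0 * * * *")
```

A string with the wrong number of fields raises `ValueError`, which is a cheap way to catch a malformed schedule before the job is deployed.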

path

path: string

Where to store model output. The path will be created if it doesn't already exist.
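The create-if-missing behavior described here can be sketched with the standard library. This is an illustration of the idea only, not tldrstory's own code; `ensure_path` and the directory names are hypothetical.

```python
import os
import tempfile


def ensure_path(path):
    """Creates the directory (and any parents) if it does not exist yet."""
    os.makedirs(path, exist_ok=True)
    return path


# Use a temporary base directory so the example leaves no files behind
# in the working directory.
base = tempfile.mkdtemp()
target = ensure_path(os.path.join(base, "models", "example"))

# Calling it again on an existing path is a no-op rather than an error.
ensure_path(target)
```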

api

api.subreddit: name of subreddit to pull from 
api.sort: sort type
api.time: time range
api.queries: list of text queries to run
api.ignore: list of url patterns to ignore

Runs a series of Reddit API queries. See the PRAW documentation for more details.
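To illustrate how a list of URL patterns to ignore might be applied to query results, here is a minimal sketch. It assumes the patterns are regular expressions matched anywhere in the URL, which this README does not confirm. The function `keep_url`, the patterns, and the URLs are all made up for the example.

```python
import re


def keep_url(url, ignore_patterns):
    """Returns True when the URL matches none of the ignore patterns."""
    return not any(re.search(pattern, url) for pattern in ignore_patterns)


# Hypothetical ignore list: skip links back to Reddit and direct image links.
ignore = [r"reddit\.com", r"\.(png|jpg)$"]

urls = [
    "https://example.com/news/story",
    "https://www.reddit.com/r/politics/comments/abc",
    "https://example.com/images/chart.png",
]

kept = [url for url in urls if keep_url(url, ignore)]
```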

labels

labels: dict

Label configuration for zero-shot classifier. This configuration sets a category along with a list of topic values.

Example:

labels:
  topic:
    values: [Label 1, Label 2]

The example above configures the category "topic" with two possible labels, "Label 1" and "Label 2". Any label can be set here; a large-scale NLP model is used to categorize input text into those labels.
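The structure of the labels section can be made concrete with a short sketch. The dictionary below is what the YAML example would look like once parsed, and the hypothetical helper `candidate_labels` collects the list of labels that would be handed to a zero-shot classifier for each category. The classifier call itself is left out, since it depends on the model in use.

```python
# Parsed form of the YAML example above.
config = {
    "labels": {
        "topic": {"values": ["Label 1", "Label 2"]},
    }
}


def candidate_labels(config):
    """Maps each category name to its list of candidate labels."""
    return {
        category: list(settings["values"])
        for category, settings in config["labels"].items()
    }


labels = candidate_labels(config)
```

Adding another category is a matter of adding another key next to `topic` with its own `values` list.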

embeddings

embeddings: dict

Configures a txtai index used for searching topics. See the txtai configuration documentation for more details.

API

Configures a FastAPI-backed interface for pulling indexed data.

path

path: string

Path to a model index.

Application

The default application is powered by Streamlit and driven by a YAML configuration file. The configuration file sets the application name, API endpoint for pulling content, and component configuration. A custom Streamlit application or any other application can be used in place of this to pull content from the API endpoint directly.

name

name: string

Application name

api

api: url

API endpoint for pulling content.

layout

description: string

Markdown string that is used to build a sidebar description.

queries

queries.name: Queries drop down header
queries.values: List of values to use for queries drop down

Configures the query drop down box with a list of pre-canned queries. If a value of "Latest" is present, selecting it queries for the last N articles. If a value of "--Search--" is present, selecting it presents another text box for entering custom queries.
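The three kinds of drop down values can be summarized in a small dispatch function. This is a sketch of the behavior described above, not the application's actual code; `resolve_query` and its return format are hypothetical.

```python
def resolve_query(selection, custom_text=""):
    """Translates a drop down selection into an (action, query) pair.

    "Latest"      -> fetch the most recent articles, no query text
    "--Search--"  -> search using the text typed into the extra text box
    anything else -> search using the pre-canned value itself
    """
    if selection == "Latest":
        return ("latest", None)
    if selection == "--Search--":
        return ("search", custom_text.strip())
    return ("search", selection)


canned = resolve_query("Economy")
latest = resolve_query("Latest")
custom = resolve_query("--Search--", "  mail-in voting ")
```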

filters

filters: list

List of slider filters. This should map to the zero-shot labels configured in the indexing section.

chart

chart.name: Chart name
chart.x: Chart x-axis column
chart.y: Chart y-axis column
chart.scale: Color scale for list of colors
chart.colors: List of colors

Configures a scatter plot that graphs two label values against each other. This chart can be used to plot applied labels and color them.

table

"column name": dynamic range of coloring

Data table that shows result details. In addition to the default columns, this section allows adding columns based on the zero-shot labels applied. The default mode is to show the numeric value of the label, but a range of text labels can also be applied.

For example:

[0, 5.0, Label 1, "color: #F00"]
[5.0, 10.0, Label 2, "color: #0F0"]

The above would output the text "Label 1" in red for values between 0 and 5. Values between 5 and 10 would output the text "Label 2" in green.
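The range lookup described above can be sketched as follows. Two details are assumptions made for the example: each range is treated as including its lower bound and excluding its upper bound, and values outside every range fall back to the plain number with no styling. The function name `render` is hypothetical.

```python
# Each entry: [lower bound, upper bound, text to show, CSS style].
ranges = [
    [0, 5.0, "Label 1", "color: #F00"],
    [5.0, 10.0, "Label 2", "color: #0F0"],
]


def render(value, ranges):
    """Returns the (text, style) pair for a numeric label value."""
    for low, high, text, style in ranges:
        # Assumption: ranges are half-open, so 5.0 belongs to the second range.
        if low <= value < high:
            return text, style
    # Assumption: unmatched values are shown as the raw number, unstyled.
    return str(value), ""


low_value = render(2.5, ranges)
boundary = render(5.0, ranges)
unmatched = render(12, ranges)
```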
