/textSQL

Primary LanguageJavaScriptMIT LicenseMIT

Natural Language → SQL

Demo on US Census Data: CensusGPT.com

PRs Welcome Github Stars License GitHub commit activity

How it works:

With CensusGPT, ask any questions related to census data.

These natural language questions get converted to SQL using GPT-3.5 and are then used to query the census database.

Here are some examples:

Screenshot 2023-03-10 at 12 55 44 AM

Roadmap:

We're splitting the roadmap for this project broadly into three categories

Visualizations:

Currently, textSQL only supports visualizing zipcodes and cities on the map using Mapbox. But data can be visualized in many other interesting ways like Bar Charts, Heatmaps and Pie charts. Not every kind of data can be (or should be) visualized on a map. For example, a query like "What percent of total crime in San Francisco is burglary vs in New York City" is perfect for visualizing as a stacked bar chart, but really hard to visualize on map.

[coming soon] Heatmap:

Screenshot 2023-03-10 at 12 58 33 AM

[coming soon] Bar Chart:

Screenshot 2023-03-10 at 1 00 03 AM

Datasets:

A lot of the users of this project have asked for historical census data (trends), weather, health, transportation and real-estate data. Feel free to create a pull request or drop a link to your dataset in this Discord.

More data → Better CensusGPT

Query Interface:

Users build complex queries progressively. They start with a simple query like "Which neighborhoods in LA have the best schools?" and then progressively add details like "with median income that is under $100,000". One of the most powerful things that GPT-3.5 turbo enables is iterating on a query.

Turning search into a chat interface will allow the users to do just that -- iterate on a query and progressively build it.

How to Contribute:

Join our discord

ReadMe for the backend here

ReadMe for the frontend here

Note: Census data, like any other dataset, has its limitations and potential biases. Some data may not be collected or reported uniformly across different regions or time periods, which can affect the comparability of results. Users should keep these limitations in mind when interpreting the results of their queries and exercise caution when making decisions based on census data.