
A tool to extract quotes and other useful information from a text.

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Quotation Tool

Abstract: The QuotationTool extracts quotes from a text. It also reports who the speakers are, where the quotes (and the speakers) are located within the text, which named entities were identified, and other details that can be useful for your text analysis.

Setup

This tool has been designed for use with minimal setup from users. You can run it in the cloud, and any packages it depends on will be installed for you automatically. To launch and use the tool, you just need to click the icon below.

  1. This is the preferred link. CILogon authentication is required; you can sign in with your institutional login or a Google/Microsoft account. Binder

If you are unable to access the tool via the first link above, use the second link below. It runs on the free version of Binder, which has less CPU and memory capacity (up to 2 GB only).

  1. This link is for people without Australian institutional affiliations
    Binder

Note: The tool may take a few minutes to launch, as Binder needs to install its dependencies.

Setting up on your own computer

If you know your way around the command line and are comfortable installing software, you might want to set up your own computer to run this notebook.

First, install the Anaconda Python distribution (on Windows, you may also need to install Git). Then open your terminal (on macOS) or your Git command line (on Windows) and follow the steps below to set up an environment with all the required packages:

Load the data

Using this tool, you can extract quotes directly from a text file (or a number of text files). Alternatively, you can extract quotes from a text column inside an Excel spreadsheet or CSV file. You just need to upload your files (.txt, .xlsx or .csv) and access them via the Notebook.
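
If you want to check your files before uploading them, the sketch below shows one way to read .txt and .csv files into a list of texts using only the Python standard library. It is not the tool's own loader: the function name `load_texts` and the default column name `text` are assumptions made for illustration, and .xlsx files are left out because they need an extra package to read.

```python
import csv
from pathlib import Path


def load_texts(paths, text_column="text"):
    """Return a list of (source_name, text) pairs from .txt and .csv files.

    `load_texts` and the default column name "text" are hypothetical;
    adjust `text_column` to match the header in your own file.
    """
    texts = []
    for path in map(Path, paths):
        suffix = path.suffix.lower()
        if suffix == ".txt":
            # A text file is treated as a single document.
            texts.append((path.name, path.read_text(encoding="utf-8")))
        elif suffix == ".csv":
            # Each row of the chosen column is treated as one document.
            with path.open(newline="", encoding="utf-8") as handle:
                for row_number, row in enumerate(csv.DictReader(handle), start=1):
                    texts.append((f"{path.name}#{row_number}", row[text_column]))
        else:
            raise ValueError(f"Unsupported file type: {path.name}")
    return texts
```

Calling `load_texts(["article.txt", "articles.csv"])` (hypothetical file names) returns one entry for the text file and one entry per row of the CSV file, so you can confirm that the text column contains what you expect.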

Note: If you have a large number of text files (more than 10MB in total), we suggest you compress (zip) them and upload the zip file instead. If you need assistance on how to compress your file, please check the user guide.
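
If you prefer to create the zip file with Python rather than your operating system's file manager, the following is a minimal sketch using the standard `zipfile` module. The function names `zip_text_files` and `read_zipped_texts` are made up for this example; the second function only shows how the archived texts can be read back and is not how the tool itself unpacks uploads.

```python
import zipfile
from pathlib import Path


def zip_text_files(folder, archive_path):
    """Compress every .txt file directly inside `folder` into one zip archive."""
    folder = Path(folder)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.glob("*.txt")):
            # Store only the file name so the archive has a flat layout.
            archive.write(path, arcname=path.name)
    return Path(archive_path)


def read_zipped_texts(archive_path):
    """Return {file name: text} for every .txt member of a zip archive."""
    texts = {}
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".txt"):
                texts[name] = archive.read(name).decode("utf-8")
    return texts
```

For example, `zip_text_files("my_texts", "my_texts.zip")` (hypothetical paths) produces a single archive that you can upload in place of the individual files.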

Extract and Display the Quotes

Once your files have been uploaded, you can use the QuotationTool to extract quotes from the text. The quotes, along with their metadata, are stored as a table in a pandas DataFrame.
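
To give a feel for what a table of quotes and metadata can look like, here is a deliberately naive sketch that finds text between double quotation marks and stores each match in a pandas DataFrame. It is not the QuotationTool's method, which is adapted from the GenderGapTracker code and also identifies speakers and named entities. The function name `extract_quotes_naive` and the column names are assumptions chosen for this example.

```python
import re

import pandas as pd

# Matches text between straight or curly double quotation marks.
QUOTE_PATTERN = re.compile(r'["“]([^"“”]+)["”]')

# Hypothetical column names; the tool's real output may differ.
COLUMNS = ["text_id", "quote", "quote_start", "quote_end"]


def extract_quotes_naive(text_id, text):
    """Return a DataFrame with one row per quoted span found in `text`."""
    rows = []
    for match in QUOTE_PATTERN.finditer(text):
        rows.append({
            "text_id": text_id,
            "quote": match.group(1),
            # Character offsets of the quote content within the text.
            "quote_start": match.start(1),
            "quote_end": match.end(1),
        })
    return pd.DataFrame(rows, columns=COLUMNS)


sample = 'Ann said, "We are ready." Bob replied, "Not yet."'
quotes_df = extract_quotes_naive("sample_1", sample)
print(quotes_df)
```

Because the result is an ordinary DataFrame, you can filter, sort or export it with the usual pandas methods, for example `quotes_df.to_csv("quotes.csv", index=False)`.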

Additionally, using the interactive tool, you can display the text, along with the extracted quotes, speakers and named entities, on the Notebook for further analysis.

Reference

This code has been adapted (with permission) from the GenderGapTracker GitHub page and modified to run on a Jupyter Notebook. The quotation tool’s accuracy rate is evaluated in this article.