Enterpret-Claims-Extractor is a Python library that extracts verifiable atomic claims from records of unstructured text, such as conversations, feedback, surveys, or similar textual data.
- Extract atomic claims from various types of records
- Support for CSV file input
- Custom record definition
- Visualization of extracted claims and their sources
Install Claims Extractor using pip:
pip install enterpret-claims-extractor
Provide your OpenAI API key in one of the following ways:
- Create a .env file in the root directory of your application and add your OpenAI API key:
OPENAI_API_KEY=sk-*************************
- Run the following command in the terminal:
export OPENAI_API_KEY=sk-*************************
- Initialise the extractor, and it will prompt you for your OPENAI_API_KEY.
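The fallback from an environment variable to an interactive prompt can be sketched roughly as follows. This is a minimal illustration using only the standard library; `resolve_api_key` is a hypothetical helper, not part of enterpret_claims_extractor, and the library's actual lookup order may differ.

```python
import os
from getpass import getpass


def resolve_api_key(environ=os.environ, prompt=getpass):
    """Return the OpenAI API key from the environment, or ask for it.

    Hypothetical helper for illustration; not the library's own code.
    """
    key = environ.get("OPENAI_API_KEY", "").strip()
    if key:
        return key
    # Fall back to an interactive prompt that does not echo the key.
    key = prompt("OPENAI_API_KEY: ").strip()
    if not key:
        raise ValueError("OPENAI_API_KEY is required")
    return key
```

Calling `resolve_api_key()` before creating the extractor makes a missing key fail early with a clear message. Passing `environ` and `prompt` as parameters keeps the helper easy to test.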
from enterpret_claims_extractor.extractor import ClaimsExtractor
from enterpret_claims_extractor.utils import read_records_from_csv
There are two main ways to use the Claims Extractor:
- Reading records from a CSV file
- Defining records manually
# Initialize the extractor
extractor = ClaimsExtractor()
# Read records from a CSV file
records = read_records_from_csv('./data/main_df.csv', row_ids=['3c5dfb85-23bb-5cd6-bc75-f652802d3721'])
You can also define records manually as a list of dictionaries:
# Define records manually
records = [
{
"id": "3c5dfb85-23bb-5cd6-bc75-f652802d3721",
"url": "https://abcd.com/1",
"type": "RecordTypeConversation",
"source": "Example Source 1",
"timestamp": "2023-07-04T12:00:00Z",
"content": "User: Hello\nAgent: Hi there! How can I assist you today?\nUser: I'm having trouble with my order.\nAgent: I'm sorry to hear that. Can you provide me with your order number?"
},
]
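When records are written by hand, a quick check for missing fields can catch mistakes before any API call is made. The sketch below assumes that the six keys in the example record are all required; the README does not say which fields are mandatory, and `find_invalid_records` is a hypothetical helper rather than a library function.

```python
# Assumed from the example record; the library may accept fewer or more fields.
REQUIRED_FIELDS = ("id", "url", "type", "source", "timestamp", "content")


def find_invalid_records(records):
    """Map each incomplete record to the list of fields it is missing."""
    problems = {}
    for position, record in enumerate(records):
        # A field counts as missing when it is absent or empty.
        missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if missing:
            label = record.get("id") or f"<record {position}>"
            problems[label] = missing
    return problems
```

An empty result means every record carries all of the assumed fields.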
# Extract claims
results = extractor.extract_claims(records)
# Print results
print(results)
[
{
'record_id': '3c5dfb85-23bb-5cd6-bc75-f652802d3721',
'claim_indices': [2, 3, 5]
}
]
Use the record ID to view the extracted claims:
extracted_claims = extractor.view_extracted_claims("3c5dfb85-23bb-5cd6-bc75-f652802d3721")
print(extracted_claims)
{
2: 'Expansion to multi-channel + multi-modal feedback analysis',
3: 'Setup of relevant dashboards and training sessions',
5: 'Backfill not counting towards consumption quota'
}
Use the same record ID to view the source sentence of each claim:
claims_sources = extractor.view_claim_source("3c5dfb85-23bb-5cd6-bc75-f652802d3721")
print(claims_sources)
{
2: 'We are thrilled to expand the value of Enterpret from multi-channel textual feedback to multi-channel + multi-modal feedback analysis for the HopSkipDrive team.',
3: "We'll get started on the Amazon Connect integration in our next sprint only (starts next Wednesday), and work with you closely for questions and clarifications to get that live, before setting up relevant dashboards and training sessions on the support call data.",
5: 'Also, confirming, that as part of building the integration, we will backfill all support calls from January to March 2024, which will not count towards the consumption quota.'
}
The tokenized input for a record is available through tokenized_inputs:
tokenized_input = extractor.tokenized_inputs['3c5dfb85-23bb-5cd6-bc75-f652802d3721']
print(tokenized_input)
{
1: 'Agent: Hi Corey McMahon - Confirming that our partnership amendment to include the support calls is now fully executed.',
2: 'We are thrilled to expand the value of Enterpret from multi-channel textual feedback to multi-channel + multi-modal feedback analysis for the HopSkipDrive team.',
3: "We'll get started on the Amazon Connect integration in our next sprint only (starts next Wednesday), and work with you closely for questions and clarifications to get that live, before setting up relevant dashboards and training sessions on the support call data.",
4: 'Matt Miller will coordinate from our end throughout the process.',
5: 'Also, confirming, that as part of building the integration, we will backfill all support calls from January to March 2024, which will not count towards the consumption quota.',
6: 'User: Matt Miller I will be OOO next week, so if there is anything you need ahead of time let me know.',
7: 'Otherwise, we can discuss when I get back.'
}
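Judging by the example outputs, each entry in claim_indices points at a sentence of the tokenized input, and the claim sources are those same sentences. The sketch below joins the three pieces into one row per claim. It works on plain dictionaries copied from the outputs above (shortened to two sentences), so it runs without the library; `summarize_claims` is a hypothetical helper.

```python
# Sample data copied from the example outputs above (subset only).
claim_indices = [5]
claims = {5: "Backfill not counting towards consumption quota"}
tokenized_input = {
    4: "Matt Miller will coordinate from our end throughout the process.",
    5: "Also, confirming, that as part of building the integration, we will "
       "backfill all support calls from January to March 2024, which will not "
       "count towards the consumption quota.",
}


def summarize_claims(claim_indices, claims, tokenized_input):
    """Pair each extracted claim with the sentence it was taken from."""
    return [
        {
            "index": index,
            "claim": claims[index],
            "source": tokenized_input[index],  # sentence at the same index
        }
        for index in claim_indices
    ]


for row in summarize_claims(claim_indices, claims, tokenized_input):
    print(f"[{row['index']}] {row['claim']}\n    from: {row['source']}")
```

With the real library, the three arguments would presumably come from `extract_claims`, `view_extracted_claims`, and `tokenized_inputs` respectively.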
ClaimsExtractor is the main class for extracting claims from records. Its methods:
- extract_claims(records): Extracts claims from the given records.
- view_extracted_claims(record_id): Returns the extracted claims for a specific record.
- view_claim_source(record_id): Returns the source of the claims based on their indices.
read_records_from_csv is a utility function that reads records from a CSV file. Its parameters:
- file_path: Path to the CSV file.
- row_ids: (Optional) List of specific row IDs to read from the CSV.
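To make the effect of row_ids concrete, here is a stand-in built on the standard csv module. It is not the library's implementation: the column name `id`, the sample columns, and the `rec-1`/`rec-2` values are assumptions made for illustration, since the CSV schema the library expects is not described here.

```python
import csv
import io

# Made-up sample data; the real CSV layout may differ.
SAMPLE_CSV = (
    "id,type,content\n"
    "rec-1,RecordTypeConversation,Hello\n"
    "rec-2,RecordTypeConversation,Goodbye\n"
)


def read_rows(csv_text, row_ids=None, id_column="id"):
    """Return CSV rows as dictionaries, optionally limited to given IDs."""
    # None means "no filter"; an empty list means "match nothing".
    wanted = set(row_ids) if row_ids is not None else None
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if wanted is None or row[id_column] in wanted]
```

Here `read_rows(SAMPLE_CSV)` returns every row, while `read_rows(SAMPLE_CSV, row_ids=["rec-2"])` returns only the matching one.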