
API for text quality analysis, personal information extraction, data cleaning, toxic comment classification and removal, gibberish detection, and much more.

Primary language: HTML. License: Apache License 2.0 (Apache-2.0).

DataBerry Cluster

" For a Safe, Secure and Positivity enriched Digital World "

DataBerry Cluster offers a wide range of APIs across multiple domains, built around a core vision of problem solving.

    1) DataBerry Text Quality Analysis API for real-time data quality validation
    2) DataBerry Personal Identifier API for real-time identification of personal information
    3) DataBerry Text Translator API for real-time translation of text into multiple languages using the Google Translate library
    4) DataBerry Text Cleaner API for automated cleaning of input text
    5) DataBerry Detoxifier API for removing toxic comments from input text
    6) Domain-Specific Sentiment Classification

Choose the endpoints from DataBerry Cluster that match your requirements.

API Reference

DataBerry Cluster API

Endpoint               Method   Description
/text_quality          POST     Text Quality Analysis API
/personal_identifier   POST     Personal Identifier API
/translate             POST     Text Translator API
/datacleaner           POST     Text Data Cleaner API
/detoxify              POST     Detoxifier API
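Below is a minimal Python client sketch for the endpoints above. It uses only the standard library. It rests on a few assumptions. Requests are sent as JSON bodies with a `Content-Type: application/json` header. The default base URL is the Heroku host given in the Text Quality section. Later sections list `https://databerry_cluster.com` instead, so pass a different `base_url` if needed. `build_request` and `call_endpoint` are hypothetical helper names and are not part of the API.

```python
import json
import urllib.request

# Assumption: base URL taken from the Text Quality section; other sections
# list https://databerry_cluster.com, so override base_url if required.
BASE_URL = "https://databerrycluster.herokuapp.com"


def build_request(endpoint: str, payload: dict,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for a DataBerry Cluster endpoint (no I/O)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def call_endpoint(endpoint: str, payload: dict, base_url: str = BASE_URL,
                  timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    request = build_request(endpoint, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `call_endpoint("/text_quality", {"text": "Hello world."})` sends the request and returns the decoded JSON dictionary, provided the service is reachable. Building the request separately keeps the payload logic testable without a network connection.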

1) Text Quality Analysis API

Given a text input, the API returns:

  • Query Language
  • Number of Words
  • Query Length
  • Gibberish Score {in active development}
  • Toxicity Level / Score
  • Text Sentiment Score {in active development}

Input URL

URL: https://databerrycluster.herokuapp.com/text_quality

JSON Input

{"text": "The patient ordered a pizza and soon after eating it went to ICU in the most dreadfully horrible way and much more story to go."}

JSON Output:

{
    "count_of_words": 26,
    "count_unique_words": 24,
    "language": "english",
    "query_length": 127,
    "toxicity": 0
}
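The word and length counts in the sample output can be reproduced locally under one assumption. The assumption is that text is tokenized roughly like NLTK's `word_tokenize`, so punctuation such as the final "." counts as its own token, and uniqueness is case-sensitive ("The" and "the" differ). Under that assumption the sample gives 26 tokens, 24 unique tokens, and 127 characters. This is only a sketch of how the counting might work, not the API's code. It also does not cover language detection or toxicity scoring. `text_metrics` is a hypothetical name.

```python
import re


def text_metrics(text: str) -> dict:
    """Approximate the count fields returned by /text_quality.

    Assumption: words are split like NLTK's word_tokenize (punctuation marks
    become separate tokens) and unique words are compared case-sensitively.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return {
        "count_of_words": len(tokens),
        "count_unique_words": len(set(tokens)),
        "query_length": len(text),  # raw character count, spaces included
    }
```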

2) Personal Identifier API

Given a text input, the API extracts personal information such as:

  • Dates (YYYY-MM-DD format)
  • Email
  • Gender
  • Phone Number
  • Names {in active development}
  • Geo-location {in active development}
  • Designation {in active development}
  • Education Qualification {in active development}

Input URL

URL: https://databerry_cluster.com/personal_identifier

JSON Input

{"text":"Hi Sukanthen,i was born on Oct 15 1999 in Neyveli, @ sukanthen1999@gmail.com. I am a male human with phone number 2A +1-541-754-3010."}

JSON Output:

{
    "dates": "1999-10-15",
    "email": "sukanthen1999@gmail.com.",
    "gender": "male",
    "phone_number": "+1-541-754-3010"
}
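Below is a rough, regex-based sketch of how the four currently supported fields could be pulled from the sample input. The patterns are hypothetical illustrations. The API's actual extraction rules are not documented here. Dates are only recognized in the "Oct 15 1999" style and are normalized with `datetime.strptime`. Unlike the sample output above, this email pattern stops before the sentence-final period. `extract_personal_info` is a hypothetical name.

```python
import re
from datetime import datetime

# Hypothetical patterns for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d-]{7,}\d")  # 9+ digits/hyphens, optional leading +
DATE_RE = re.compile(r"\b([A-Z][a-z]{2}) (\d{1,2}),? (\d{4})\b")  # e.g. "Oct 15 1999"


def extract_personal_info(text: str) -> dict:
    result = {}
    email = EMAIL_RE.search(text)
    if email:
        result["email"] = email.group()
    phone = PHONE_RE.search(text)
    if phone:
        result["phone_number"] = phone.group()
    for month, day, year in DATE_RE.findall(text):
        try:
            parsed = datetime.strptime(f"{month} {day} {year}", "%b %d %Y")
        except ValueError:
            continue  # a capitalized word that is not a month abbreviation
        result["dates"] = parsed.strftime("%Y-%m-%d")  # first valid date only
        break
    # Check "female" first so it is not misread as "male".
    if re.search(r"\bfemale\b", text, re.IGNORECASE):
        result["gender"] = "female"
    elif re.search(r"\bmale\b", text, re.IGNORECASE):
        result["gender"] = "male"
    return result
```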

3) DataBerry Cluster Cleaner API

Given a text input, the Cleaner API removes stopwords, HTML tags and user-defined words from it.

Input URL

URL: https://databerry_cluster.com/datacleaner

JSON Input

{
	"text":"Ravi is a boy in Kerala.<p> I have a bad head ache. </p> <p align=center> But the problem is due to overeating and watching TV for long hours </p>",
	"remove_html":"True",
	"remove_stopwords":"True",
	"stopwords_list":"None"
}

JSON Output:

{
    "cleaned_text": "Ravi boy Kerala . I bad head ache . But problem due overeating watching TV long hours"
}
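Below is a sketch of the cleaning steps, assuming tags are stripped with a simple regex and punctuation is split into separate tokens. It also assumes stopwords are matched case-sensitively, which is why "I" and "But" survive in the sample output. The stopword set here is a tiny hypothetical subset. A real implementation would more likely use a full list such as NLTK's English stopwords. Even so, this subset reproduces the sample output above. The function takes Python booleans and a list instead of the "True"/"None" strings used in the JSON. `clean_text` is a hypothetical name.

```python
import re

# Tiny illustrative stopword subset (assumption, not the API's list).
DEFAULT_STOPWORDS = {"a", "an", "and", "for", "have", "in", "is", "of", "the", "to"}


def clean_text(text, remove_html=True, remove_stopwords=True, stopwords_list=None):
    if remove_html:
        text = re.sub(r"<[^>]+>", " ", text)  # drop tags such as <p> or <p align=center>
    tokens = re.findall(r"\w+|[^\w\s]", text)  # punctuation becomes its own token
    stopwords = set(DEFAULT_STOPWORDS) if remove_stopwords else set()
    if stopwords_list:  # extra user-defined words to remove
        stopwords.update(stopwords_list)
    # Case-sensitive comparison: "I" and "But" are kept.
    return " ".join(t for t in tokens if t not in stopwords)
```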

4) Detoxifier API

Given an input query containing toxic comments, the API detects the toxic words and replaces every letter of each such word with an asterisk (*), e.g. moron --> *****.

Input URL

URL: https://databerry_cluster.com/detoxify

JSON Input

# Dialogue from the American series Silicon Valley (Season 1)
{"text":"Hello, Richard Hendricks. I'm a total fucking retard."}

JSON Output:

{
    "cleaned_text": "Hello, Richard Hendricks. I'm a total ******* ******."
}
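Below is a sketch of the masking step alone. It assumes the toxic words are already known. The API's detection method is not described here; it could be a lexicon or a classifier. The hypothetical `toxic_words` set stands in for that step. Each matched word is replaced by asterisks of the same length, and surrounding punctuation is left untouched. `mask_toxic_words` is a hypothetical name.

```python
import re


def mask_toxic_words(text: str, toxic_words: set) -> str:
    """Replace every letter of each listed word with '*', keeping its length."""
    lowered = {w.lower() for w in toxic_words}  # case-insensitive lookup

    def mask(match):
        word = match.group()
        return "*" * len(word) if word.lower() in lowered else word

    return re.sub(r"\b\w+\b", mask, text)
```

For example, with the hypothetical list `{"moron"}`, the input "Hello, you MORON!" becomes "Hello, you *****!".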

5) Validator API

Identity verification is an important part of KYC (Know Your Customer) checks, but it takes time. To validate a user straight away, it is good practice to first check that the identity numbers they type in are valid. Examples include Aadhaar card, PAN card, driving license and credit card numbers. Given an identification number and its type, the Validator API reports within milliseconds whether it is a valid ID.

Input URL

URL: https://databerry_cluster.com/e-validation

JSON Input

{
"data":"ABCXY1234Z",
"type_data":"pancard"
}

JSON Output:

{
    "validID":1
}
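Below is a sketch of format-level checks for two of the ID types mentioned above. A PAN is ten characters: five uppercase letters, four digits, then one uppercase letter. Payment card numbers carry a Luhn check digit. Only "pancard" appears in the source. The "creditcard" type name, the 12-digit minimum, and returning `{"validID": 0}` for invalid input are all assumptions. These checks confirm that a number is well-formed, not that it was actually issued. Aadhaar and driving license checks are not covered. `validate_id` is a hypothetical name.

```python
import re

PAN_RE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")  # e.g. ABCXY1234Z


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers (spaces and hyphens ignored)."""
    digits = number.replace(" ", "").replace("-", "")
    # Assumed minimum length of 12 digits for a card number.
    if not (digits.isascii() and digits.isdigit()) or len(digits) < 12:
        return False
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def validate_id(data: str, type_data: str) -> dict:
    """Return {"validID": 1} for a well-formed ID, else {"validID": 0}."""
    if type_data == "pancard":
        valid = PAN_RE.fullmatch(data) is not None
    elif type_data == "creditcard":  # hypothetical type name
        valid = luhn_valid(data)
    else:
        raise ValueError(f"unsupported type_data: {type_data!r}")
    return {"validID": int(valid)}
```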

