/text-analysis-on-finra-docs

The notebook applies several text mining techniques to extract features from a corpus of legal documents, then uses those features to answer several research questions and identify the factors that drive FINRA arbitration decisions.

Primary Language: Jupyter Notebook

FINRA operates the largest securities dispute resolution forum in the United States and has extensive experience in providing a fair, efficient and effective venue for handling securities-related disputes.

Arbitration and mediation are two distinct ways of resolving securities and business disputes between and among investors, brokerage firms and individual brokers, and offer a prompt and inexpensive means of resolving issues.

Our data set comprises all such cases, extracted from the FINRA website and converted to machine-readable text files for easier processing.

Research Methodologies

This research examines how different types of claimant representation affect case outcomes, and in particular how attorney representation compares with non-attorney representation. The working hypothesis was as follows: a claimant is more likely to achieve a favorable outcome by hiring an attorney representative than by hiring a non-attorney representative or no representative at all (appearing as a pro se claimant).

Grouping on the Basis of Claimant Representation

  1. Pro se cases: cases in which the claimant's name matched the claimant representative's name and there was only one representative.
  2. One non-attorney cases: cases with a single non-attorney representative (NAR). This group comprised registered NARs, self-styled claimant advocates, and a few other unidentified types of representatives who could be friends, family members, or corporate representatives.
  3. One attorney cases: cases with a single attorney representative who is not an NAR.
  4. Multiple attorney cases: cases with more than one claimant representative.
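
The grouping rules above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's actual code: the `Representative` class, the `is_attorney` flag, the name normalization, and the group labels are all assumptions introduced here, since the source does not say how attorney status or name matches were determined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Representative:
    name: str
    is_attorney: bool  # hypothetical flag; how it is derived is not stated in the source


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def representation_group(claimant: str, reps: List[Representative]) -> str:
    """Assign a case to one of the four representation groups."""
    if not reps:
        # The source does not describe cases with no listed representative.
        return "unknown"
    if len(reps) > 1:
        return "multiple_attorney"
    rep = reps[0]
    if normalize(rep.name) == normalize(claimant):
        # Claimant represents themselves and is the only representative.
        return "pro_se"
    return "one_attorney" if rep.is_attorney else "one_non_attorney"
```

Exact name matching after normalization is the simplest possible rule; real award documents may need fuzzier matching (middle initials, suffixes), which this sketch does not attempt.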

Binning into Claims

The various claims were binned into 16 comprehensive types to aid further analysis.
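
One way such binning could work is keyword matching against the free-text claim description. The sketch below is an assumption about the approach, and the three bins and their keywords are hypothetical placeholders; the source does not list the 16 types or say how claims were assigned to them.

```python
import re
from typing import Dict, List

# Hypothetical bins and keywords, for illustration only.
# The actual 16 claim types used in the notebook are not listed in the source.
CLAIM_BINS: Dict[str, List[str]] = {
    "breach_of_fiduciary_duty": ["fiduciary"],
    "negligence": ["negligence", "negligent"],
    "misrepresentation": ["misrepresentation", "omission"],
}


def bin_claims(claim_text: str) -> List[str]:
    """Return every bin whose keywords appear in the claim text, or ["other"]."""
    text = claim_text.lower()
    matched = [
        bin_name
        for bin_name, keywords in CLAIM_BINS.items()
        # \b anchors the keyword at a word start, so "omission" matches "omissions".
        if any(re.search(r"\b" + re.escape(keyword), text) for keyword in keywords)
    ]
    return matched or ["other"]
```

A case can fall into several bins at once, because a single award often lists more than one cause of action.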

Additional Features

Several other features, such as the length of the case, the region of the case, and the type of claimant, were also mined from the text corpus and added to the research data set.
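
As a rough sketch of how two of these features might be derived: the code below assumes that case length is the number of days between a filing date and an award date, and that region is looked up from a state code. Both definitions, the function names, and the partial `STATE_TO_REGION` table are assumptions made for illustration; the source does not define them.

```python
from datetime import date

# Partial, assumed lookup table; a full version would cover every state.
STATE_TO_REGION = {
    "NY": "Northeast",
    "IL": "Midwest",
    "TX": "South",
    "CA": "West",
}


def case_length_days(filed: date, decided: date) -> int:
    """Length of a case in days (assumed definition: award date minus filing date)."""
    return (decided - filed).days


def region_of(state_code: str) -> str:
    """Map a two-letter state code to a region, or "Unknown" if it is not in the table."""
    return STATE_TO_REGION.get(state_code.strip().upper(), "Unknown")
```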

CART analysis results

  1. Regression analysis showed a significant difference between the likelihood of winning as a pro se claimant and as a represented claimant, but little distinction between the win rates of attorney and non-attorney representatives. The pro se group fared much worse than either the non-attorney or the attorney representative group.

  2. Statistical tests such as the Fisher test also indicated a significant difference between the pro se and one attorney cases. There was no significant difference in case outcomes between the one attorney and multiple attorney groups.

  3. Region also had a significant impact on case outcomes. Using the Midwest as the baseline, we found that claimants were more successful in the West and most successful in the Northeast, all other factors being equal. The South did not yield a statistically significant difference for the purpose of our analysis.
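
The group comparison in item 2 can be illustrated with a two-sided Fisher's exact test on a 2x2 table of wins and losses for two representation groups. The sketch below assumes that the test in question is Fisher's exact test and implements it with the standard library only; the function name and table layout are choices made here, not taken from the notebook. In practice `scipy.stats.fisher_exact` computes the same p-value.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are two groups (e.g. pro se vs. one attorney); columns are
    outcomes (e.g. win vs. loss). All margins are treated as fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Placeholder counts, invented for illustration; these are not results from the study.
p = fisher_exact_two_sided(3, 1, 1, 3)
```

A small p-value would indicate that the win rates of the two groups differ by more than chance alone would explain, which is the kind of evidence item 2 refers to.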