IdentityChecker-KYC

This repository contains source code for checking a new customer's background to decide whether it is appropriate to open a bank account for them. The check confirms the customer's information in several stages. The project runs on a Spark cluster because processing speed is one of its most important requirements. This document explains each of the stages.

Installation

To run the project, you have to install the following components:

pip install suds
pip install googletrans==3.1.0a0

Checking Turkish Identification Number

I have used two different validation algorithms. The first one checks whether the identification number has a valid format.
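The format check can be sketched as below. This is a minimal sketch that assumes the widely documented checksum rule for Turkish identification numbers (11 digits, no leading zero, and two check digits derived from the first nine); the function name `is_valid_tckn_format` is hypothetical, and the project's own implementation may differ.

```python
def is_valid_tckn_format(tckn: str) -> bool:
    """Return True if the string passes the assumed checksum rule."""
    # Must be exactly 11 ASCII digits and must not start with zero.
    if len(tckn) != 11 or not (tckn.isascii() and tckn.isdigit()) or tckn[0] == "0":
        return False

    digits = [int(ch) for ch in tckn]
    odd_sum = sum(digits[0:9:2])   # 1st, 3rd, 5th, 7th, 9th digits
    even_sum = sum(digits[1:8:2])  # 2nd, 4th, 6th, 8th digits

    # 10th digit: (odd_sum * 7 - even_sum) mod 10
    tenth = (odd_sum * 7 - even_sum) % 10
    # 11th digit: sum of the first ten digits mod 10
    eleventh = sum(digits[:10]) % 10

    return digits[9] == tenth and digits[10] == eleventh


print(is_valid_tckn_format("10000000146"))  # True
print(is_valid_tckn_format("10000000147"))  # False
```

This check only shows that the number is well formed; it says nothing about whether the number was actually issued to anyone.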

The second validation algorithm checks whether the identification number is valid for the given customer details. It needs the identification number, name, surname and birth year. If any of this information is incorrect, it returns false.
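A check like this is commonly done through a SOAP call, which is what suds is for. The following is a rough sketch only: the WSDL URL, the operation name `TCKimlikNoDogrula` and its parameter names are assumptions about the public population registry service and are not confirmed by this project; `verify_identity`, `make_client` and `turkish_upper` are hypothetical helper names. Upper-casing the names before sending them is also an assumption.

```python
# Assumed WSDL address of the public verification service (not confirmed here).
WSDL_URL = "https://tckimlik.nvi.gov.tr/Service/KPSPublic.asmx?WSDL"


def turkish_upper(text: str) -> str:
    """Upper-case text using the Turkish rule that dotted i becomes dotted İ."""
    return text.replace("i", "İ").upper()


def make_client(wsdl_url: str = WSDL_URL):
    """Create a suds SOAP client; imported lazily so the rest works without suds."""
    from suds.client import Client
    return Client(wsdl_url)


def verify_identity(client, tckn: str, name: str, surname: str, birth_year: int) -> bool:
    """Ask the service whether the four fields belong to the same person."""
    result = client.service.TCKimlikNoDogrula(
        TCKimlikNo=int(tckn),
        Ad=turkish_upper(name),
        Soyad=turkish_upper(surname),
        DogumYili=int(birth_year),
    )
    return bool(result)
```

With a working network connection, usage would look like `verify_identity(make_client(), "10000000146", "Ali", "Işık", 1990)`. Passing the client in as an argument makes it easy to swap in a stub when testing offline.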

Checking Similarities

I have created an algorithm to calculate how similar two customers are. It compares customers by looking at the words in their records and the dispersion of those words. It counts the matched words and the total words, then calculates the overall similarity as a percentage.
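The counting step described above can be illustrated with a small sketch. It is not the project's exact algorithm: it assumes whitespace-separated words, ignores case, and does not model word dispersion. The function name `word_similarity` is hypothetical.

```python
from collections import Counter


def word_similarity(first: str, second: str) -> float:
    """Percentage of words shared by two strings (0.0 to 100.0)."""
    words_a = Counter(first.casefold().split())
    words_b = Counter(second.casefold().split())

    total = sum(words_a.values()) + sum(words_b.values())
    if total == 0:
        return 0.0

    # Multiset intersection: a word counts as matched once per shared occurrence.
    matched = sum((words_a & words_b).values())

    # Each matched word appears in both strings, hence the factor of two.
    return 100.0 * 2 * matched / total


print(word_similarity("Ahmet Yilmaz", "Mehmet Yilmaz"))  # 50.0
```

Dividing by the combined word count keeps the score symmetric, so comparing A with B gives the same result as comparing B with A.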

Checking Foreign Customers

This project also supports foreign customers' information. To make that possible, I used a translator. If the customer's origin is AR or RU, the algorithm translates the information into the Latin alphabet before checking it against the data.
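The translation step might look like the sketch below, which uses the `Translator` class from googletrans. The mapping from origin codes to the language codes `ar` and `ru`, the choice of English as the target language, and the function name `to_latin` are all assumptions made for illustration.

```python
# Assumed mapping from customer origin codes to googletrans language codes.
ORIGIN_TO_LANG = {"AR": "ar", "RU": "ru"}


def to_latin(text: str, origin: str, translator=None) -> str:
    """Translate text to Latin script when the origin needs it; otherwise return it unchanged."""
    lang = ORIGIN_TO_LANG.get(origin.upper())
    if lang is None:
        return text  # already in the Latin alphabet, nothing to do

    if translator is None:
        # Imported lazily so other origins work without googletrans installed.
        from googletrans import Translator
        translator = Translator()

    # translate() returns an object whose .text attribute holds the result.
    return translator.translate(text, src=lang, dest="en").text
```

A call such as `to_latin("Иван Петров", "RU")` needs network access, because googletrans sends the text to an online translation service. Machine translation of personal names is not always consistent, so the result is best treated as an input to the similarity check rather than an exact match key.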

Processing Data

I have used a Spark cluster. First, I created a schema for the customer data and the criminal data. Then I read all of the data and created a loop that checks the similarity between the test customer and every record in the cluster. The result is a set of tables containing the similarity results.
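The processing stage could be sketched with PySpark as follows. The column names `id` and `full_name`, the CSV input format, and the function names are assumptions. Note that this sketch replaces the loop with a user-defined function, so Spark scores every record in parallel; it also uses a simplified word-overlap score, not the project's own similarity algorithm.

```python
def word_overlap(first: str, second: str) -> float:
    """Simplified stand-in score: percentage of distinct words the two strings share."""
    a, b = set(first.casefold().split()), set(second.casefold().split())
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


def similarity_table(spark, csv_path: str, test_name: str):
    """Return a DataFrame of all records with a similarity column, best match first."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    # Explicit schema (assumed columns) so Spark does not have to infer types.
    schema = StructType([
        StructField("id", StringType(), False),
        StructField("full_name", StringType(), True),
    ])
    records = spark.read.csv(csv_path, schema=schema, header=True)

    # Wrap the scoring function as a UDF that compares each row to the test customer.
    score = F.udf(lambda name: word_overlap(test_name, name or ""), DoubleType())

    return (records
            .withColumn("similarity", score(F.col("full_name")))
            .orderBy(F.col("similarity").desc()))
```

Given an existing `SparkSession` named `spark`, calling `similarity_table(spark, "customers.csv", "Ahmet Yilmaz").show()` would print the resulting table; the file name here is only a placeholder. The same function can be called once for the customer data and once for the criminal data.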