/2-Miilion-Invoice-Data-Analysis

Data Cleaning, Web Scraping, Distributed System

Primary Language: Python

Invoice Data Analysis Using Python and SQL

Classify invoice data into different types to produce a cleaner dataset for further study.

About this project

An invoice is an essential record of everyday transactions. Whether we purchase a coffee, a cigarette, or a cake to celebrate a friend's birthday, the list of items on an invoice can often be ambiguous and difficult to interpret. It is crucial, however, to have a clear understanding of the products or services we buy and the corresponding amounts we pay for them. To address this issue, I am eager to explore methods for accurately labeling invoice data. By categorizing invoice data, we can gain valuable insights into customer behaviors and market trends, allowing us to develop more effective business strategies.

What I have achieved

  • Created a database to store 2 million invoice records

  • Classified 40,000 unique items in one day using a distributed system
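
The second point can be illustrated with a minimal sketch, assuming the work is split across processes with Python's `multiprocessing` module (listed under the libraries below). It first collapses repeated item names so each distinct item is classified only once, then maps a classifier over the unique items in parallel. The keyword table and the helper names `classify_item`, `unique_items`, and `classify_parallel` are hypothetical placeholders, not the project's actual classification logic or categories.

```python
from multiprocessing import Pool

# Hypothetical keyword-to-category table; the project's real labels
# are not given in this README.
CATEGORY_KEYWORDS = {
    "coffee": "beverage",
    "latte": "beverage",
    "cake": "food",
    "cigarette": "tobacco",
}


def classify_item(name):
    """Return a category for one item name, or 'unknown' if no keyword matches."""
    lowered = name.lower()
    for keyword, category in CATEGORY_KEYWORDS.items():
        if keyword in lowered:
            return category
    return "unknown"


def unique_items(invoice_rows):
    """Normalize item names and drop duplicates and blanks."""
    return sorted({row.strip().lower() for row in invoice_rows if row.strip()})


def classify_parallel(items, workers=2):
    """Classify items across worker processes and return {item: category}."""
    with Pool(processes=workers) as pool:
        # chunksize batches items per task to reduce inter-process overhead.
        labels = pool.map(classify_item, items, chunksize=1000)
    return dict(zip(items, labels))


if __name__ == "__main__":
    # Tiny made-up sample standing in for the item column of the invoice table.
    rows = ["Iced Latte", "iced latte", "Birthday Cake", "Cigarette pack", "USB cable"]
    items = unique_items(rows)
    print(classify_parallel(items))
```

Deduplicating before classifying is what keeps the job tractable: the expensive step runs once per unique item rather than once per invoice row, and the resulting labels can be joined back onto the full table afterwards.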

Technologies used ⚙️:

  • Python
  • MSSQL
  • Excel
Python Libraries:
  • Selenium | Multiprocessing | Pandas | NumPy

Share My Results

  • Workflow

[Figure: workflow diagram]