The Tunisian Company of Electricity and Gas (STEG) is a public, non-administrative company responsible for delivering electricity and gas across Tunisia. The company suffered losses on the order of 200 million Tunisian Dinars due to fraudulent manipulation of meters by consumers.
The challenge aims to detect clients involved in fraudulent activities using each client's billing history. The findings should help the company increase its revenues and reduce losses.
- Quick start
- What's included
- Installations
- Results
- Possible improvements
- Creator
- Copyright and license
The challenge launched by the Tunisian Company of Electricity and Gas (STEG) provides two CSV files. The first contains a list of clients, each labelled as fraudulent or not, with their locations and categories. The second contains the invoices issued to each client over time.
The goal of the current repo is to address the following questions:
- What is the rate of fraudulent clients in the dataset?
- How are the fraudulent clients distributed across regions/districts?
- In which period did most fraudulent clients join the company?
- What variables/combination of variables can be used to predict if a client is fraudulent or not?
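The first three questions can be approached with a few pandas aggregations. The snippet below is a minimal sketch on a toy table, not the notebook's code: the column names (`client_id`, `region`, `creation_date`, `target`) and the values are assumptions made for illustration and may differ from the actual CSV files.

```python
import pandas as pd

# Toy stand-in for the annotated client file.
# Column names and values are hypothetical, chosen only for illustration.
clients = pd.DataFrame({
    "client_id": [1, 2, 3, 4, 5, 6],
    "region": ["A", "A", "B", "B", "B", "C"],
    "creation_date": pd.to_datetime([
        "2005-03-01", "2005-07-15", "2010-01-20",
        "2010-11-02", "2012-06-30", "2012-09-09",
    ]),
    "target": [1, 0, 0, 1, 0, 0],  # 1 = fraudulent, 0 = not fraudulent
})

# Overall rate of fraudulent clients (mean of a 0/1 label).
fraud_rate = clients["target"].mean()

# Fraud rate and number of clients per region.
by_region = clients.groupby("region")["target"].agg(["mean", "count"])

# Number of fraudulent clients per joining year.
fraud_by_year = (
    clients.loc[clients["target"] == 1, "creation_date"]
    .dt.year
    .value_counts()
    .sort_index()
)

print(f"Fraud rate: {fraud_rate:.2%}")
print(by_region)
print(fraud_by_year)
```

Working with the mean of a 0/1 label keeps the rate computation identical at every level of grouping, so the same pattern extends to districts or client categories.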
The notebook contains the code and the results for each step of the modelling process. It is organised in two main steps:
- Exploratory Data Analysis
  - Data type/description
  - Clients distribution per region
  - Clients distribution per category
- Data Modeling
  - Data cleaning
  - Feature engineering
  - Feature selection
  - Fraud detection using XGBoost
The data analysis in the current notebook requires the latest versions of pandas and NumPy. The XGBoost package is also required to classify the clients using gradient-boosted decision trees.
pip install pandas
pip install numpy
pip install xgboost
The hyper-parameters of the model were picked randomly. Optimising these hyper-parameters (e.g., using Optuna) would probably improve the current notebook's results.
Bousbiat Hafsa
Code released under the MIT License.