Data Scientist

M.Tech (Computer Science) postgraduate with 5+ years of hands-on experience in Data Science and SAP ABAP technologies, solving complex business problems by leveraging AI and ML. Focus areas include CRM, OCR, and healthcare technologies.

Technical Skills

  • Python (Programming Language)
  • Machine Learning
  • TensorFlow
  • Data Visualization
  • Natural Language Processing
  • Convolutional Neural Network
  • SQL
  • SAP ABAP

Education

Work Experience

Data Scientist @ Cognier Insights Private Limited (May 2021 - Present)

  • Developed a CNN-LSTM model for converting handwritten text into machine-encoded (digital) text.
  • Developed a customer churn prediction model for a General Insurance company using ensemble ML modelling techniques.
  • Developed an ML model for forecasting high-demand periods for cab bookings based on climatic conditions.

Software Developer @ Appsian Tech Pvt Ltd (May 2019 - April 2021)

  • Analyzed requirements & developed objects based on customer use cases.
  • Prepared technical design documents & implemented code after approval.
  • Debugged SAP programs & implemented logic enhancements.
  • Conducted performance grading & testing of ABAP/4 programs & reports.

Projects

CNN-LSTM Model for Converting Handwritten Text into Machine-Encoded (Digital) Text in the Optical Character Recognition (OCR) Domain

Publication

  • Developed a Handwritten Character Recognition model for a medical company.
  • The client wanted to convert handwritten doctor prescriptions into digital text.
  • While this can be done manually, it requires significant effort and time, so the client wanted to automate the process. We therefore created a neural network model that takes images as input, reads the text in them, and converts it into digital text.
  • To solve this problem statement, we used a CNN-LSTM model.
  • Developed model layers, configured activation functions, and implemented normalization and dropout layers to prevent overfitting.
  • Implemented Recurrent Neural Network (RNN) LSTM layers to process sequential data.
  • Executed model training over multiple epochs, continually monitoring loss curves and validation metrics.
  • Conducted rigorous validation of the trained model using separate validation and test datasets.
  • Collaborated with front-end developers to integrate the character recognition model into the user interface.
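
The layer stack described above can be sketched roughly as follows in Keras. This is a minimal illustration only: the input size (32x128 grayscale), filter counts, dropout rates, and the 80-class output are assumptions, not the configuration used in the project, and the training loss is left out because it is not stated here.

```python
from tensorflow.keras import layers, models


def build_cnn_lstm(img_height=32, img_width=128, num_classes=80):
    """Build an illustrative CNN-LSTM for reading a line of handwritten text.

    All sizes are hypothetical defaults chosen for the sketch.
    """
    inputs = layers.Input(shape=(img_height, img_width, 1))

    # Convolutional feature extractor with normalization after each conv.
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Dropout(0.25)(x)  # dropout to limit overfitting

    # Treat image width as the time axis: (H, W, C) -> (W, H, C) -> (W, H*C).
    x = layers.Permute((2, 1, 3))(x)
    x = layers.Reshape((img_width // 4, (img_height // 4) * 64))(x)

    # LSTM reads the column features left to right as a sequence.
    x = layers.LSTM(128, return_sequences=True)(x)
    x = layers.Dropout(0.25)(x)

    # One character distribution per time step.
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)


model = build_cnn_lstm()
```

Calling `model.summary()` prints the layer-by-layer shapes, which is a quick way to confirm that the width axis is what reaches the LSTM as the sequence dimension.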

Handwritten Character Recognition

  • Developed and implemented a machine learning model to predict customer churn for a general insurance company, leveraging advanced data mining techniques.
  • The client wanted to identify customers who are likely to churn and understand the impact on the business, so that it could take proactive retention measures targeted at those customers.
  • For this problem, we built machine learning models using suitable classification algorithms such as logistic regression, decision tree, random forest, and gradient boosting machines. These algorithms were trained on the prepared data.
  • Once the models were trained, their performance was assessed using evaluation metrics like accuracy, precision, recall, F1-score, and ROC-AUC.
  • Additionally, we built an ANN model to achieve better accuracy scores.
  • Finally, the churn prediction model was deployed into the production environment and integrated with the customer management system.
  • This helped in identifying at-risk customers and enabled the business to take proactive retention measures.
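
For reference, the evaluation metrics named above can be computed from true labels and predicted churn probabilities as in the sketch below. The labels, scores, and the 0.5 threshold are made-up illustrative values, and the helper name `classification_metrics` is hypothetical, not code from the project.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Compute accuracy, precision, recall, F1, and ROC-AUC for binary labels."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))

    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # ROC-AUC as the probability that a random churner is scored above a
    # random non-churner (ties count as half). Assumes both classes occur.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    roc_auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "roc_auc": roc_auc}


# Illustrative data: 1 = churned, 0 = retained.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]
metrics = classification_metrics(y_true, y_score)
```

For churn, recall is often the metric to watch, since a missed churner usually costs more than an unnecessary retention offer; ROC-AUC is threshold-independent and useful for comparing models before a threshold is chosen.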

Customer Churn Prediction

  • Developed a forecasting ML model for an app-based online cab booking company that operates both bikes and cars in the US.
  • For a cab booking company, understanding cab supply and demand can increase service efficiency and enhance the user experience by minimizing waiting time.
  • Therefore, they wanted to use machine learning for business growth.
  • Initially, we thought the data might be seasonal, so we used time series forecasting models such as ARIMA (Autoregressive Integrated Moving Average) and SARIMA (Seasonal Autoregressive Integrated Moving Average). However, we found that the data was not seasonal.
  • Consequently, we moved to traditional regression ML algorithms such as Decision Tree, Random Forest, XGBoost, SVM, KNN, Gradient Boosting, and AdaBoost.
  • The Random Forest Regressor performed exceptionally well on the given data. Therefore, we tuned the Random Forest Regressor model using GridSearchCV and K-Fold Cross-Validation, increasing accuracy by 2%.
  • The performance of the trained models was evaluated using appropriate metrics such as Root Mean Squared Error (RMSE), R², Mean Squared Error (MSE), and Mean Squared Logarithmic Error (MSLE).
  • Finally, the trained model was deployed into a production environment and integrated with the cab booking system to provide real-time demand forecasts.
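
The tuning and evaluation steps above can be sketched with scikit-learn as follows. Everything about the data is an assumption for illustration: the features (temperature, humidity, hour), the synthetic demand formula, and the parameter grid are hypothetical and do not reflect the client's dataset or the grid actually searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import (mean_squared_error, mean_squared_log_error,
                             r2_score)
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# Synthetic stand-in for weather-driven booking demand (hypothetical).
rng = np.random.default_rng(0)
n = 400
temperature = rng.uniform(0, 35, n)
humidity = rng.uniform(20, 100, n)
hour = rng.integers(0, 24, n)
rush_hour = np.isin(hour, [8, 9, 17, 18])
demand = (50 + 3 * temperature - 0.4 * humidity + 20 * rush_hour
          + rng.normal(0, 5, n))
demand = np.clip(demand, 0, None)  # bookings cannot be negative

X = np.column_stack([temperature, humidity, hour])
X_train, X_test, y_train, y_test = train_test_split(
    X, demand, test_size=0.25, random_state=0)

# Small illustrative grid, scored with 5-fold cross-validation.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 8],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

# Evaluate the best model on the held-out test split.
pred = np.clip(search.best_estimator_.predict(X_test), 0, None)
mse = mean_squared_error(y_test, pred)
metrics = {
    "MSE": mse,
    "RMSE": mse ** 0.5,
    "R2": r2_score(y_test, pred),
    "MSLE": mean_squared_log_error(y_test, pred),
}
print(search.best_params_, metrics)
```

Keeping a test split outside the cross-validation loop means the reported metrics are not biased by the hyperparameter search itself.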

![Forecasting cab booking demand](/assets/img/Cab booking.png)