Data Scientist

utribedi18@gmail.com | (682) 529-1360 | LinkedIn | GitHub | Medium

📌 About Me

Hey there! I'm a curious mind with a passion for all things data. Over the years, I've dabbled in data science, NLP, machine learning, and data engineering, always looking for that "aha!" moment when data reveals its secrets. Whether I'm crafting innovative solutions, untangling data knots, or collaborating with a diverse team, I bring my all to the table. So, if you're looking for someone who's as much about the numbers as the narrative behind them, let's chat!


💼 Experience

NLP Engineer

RevSavvy | Remote
August 2023 – Present

  • Architected a custom PDF parsing service integrated with a RoBERTa-based NLP model via a REST API.
  • Set new benchmarks in software release note analysis with precise labeling and confidence scoring.

Data Engineer

CVS Health | Irving, Texas
Jan 2023 – May 2023

  • Improved data quality and consistency through ETL optimization, reducing data errors by 20%.
  • Automated data pipelines with PySpark and Apache Airflow DAGs, improving system efficiency and reliability.
  • Automated the production deployment of machine learning models using Kubeflow, setting up workflows to train, test, and deploy multiple models optimized for performance and scalability.
  • Collaborated with Marketing and Sales Teams to identify customer segments and tailor data-driven strategies for the creation of new offers, coupons, and discounts, boosting customer engagement.

Data Scientist

Capstone Project @Canton & Company
Aug 2022 – Dec 2022

  • Led a team of 6 data scientists to identify market segments in the healthcare domain, the players in each segment, their capabilities, and the types of clients each player serves.
  • Developed an AI-based generic web scraper for any client website using a custom BERT-based language model, improving the market landscape assessment process by optimizing 70% of manual tasks.

Graduate Research Assistant

University of Texas at Arlington
Aug 2022 – Dec 2022

Passenger dwell time cost-benefit analysis research

Under the patronage of the Department of Homeland Security (DHS), Centre for Accelerating Operational Efficiency (CAOE), University of Arizona, and University of Texas at Arlington

  • Spearheaded the data science part of the research under the mentorship of Dr. Ross Maciejewski and Dr. Randy Napier: analyzed factors contributing to increased passenger dwell times at airports across multiple datasets and used them to build predictive ML models.

Graduate Teaching Assistant

University of Texas at Arlington
Jan 2022 – Jul 2022

  • Teaching course contents: Delivering lectures on specialized topics as the professor deemed necessary, whether alongside the professor or in the professor's absence.
  • Preparing course materials: Preparing course materials in the form of PowerPoint presentations, handouts, and exam questions.
  • Grading: Helping the professor in evaluating answer sheets and grading.
  • Subjects: Advanced Statistics, Systems Analysis and Design, Enterprise Resource Planning, Operations and Supply Chain Management.

Data Scientist

Upwork
May 2020 – May 2021

  • Analyzed ~2.9M customer churn records to optimize a client's marketing campaigns, leading to a 25% increase in marketing ROI. Presented visualizations and actionable, data-driven insights through Tableau dashboards, enabling effective tracking of various KPIs.
  • Leveraged R (dplyr, ggplot2) and Python (scikit-learn) to conduct statistical analysis and build predictive models (Linear Regression, Random Forest, XGBoost) for a supply chain efficiency project, reducing demand forecasting MAPE by 15%. This improvement led to a 20% decrease in stock-outs and overstock situations.
  • Executed A/B tests to evaluate different business strategies, contributing to data-driven decision-making for clients.
  • Streamlined ETL processes and data pipeline management with SQL, BigQuery, and Apache Spark, cutting data processing time by 40% and leading to a 10% sales uptick from timely, data-informed campaigns.

📚 Publication

  • Stock Market Prediction using Fusion of ARIMA and SVR
    International Conference on Computational Methods in Engineering & Health Sciences (ICCMEH 2023)
    Forecasting stock prices accurately and reliably remains a major challenge for researchers in the financial sector. This paper evaluates prediction methods for the stock market based on the closing price of a stock, using historical price data extracted from Yahoo Finance. It compares the performance of traditional statistical analysis methods with a fusion of ARIMA and SVR for closing price prediction. [Link]

🏆 Certification

  • AWS Certified Solutions Architect – Associate

Amazon Web Services
Issued May 2023
Demonstrated the ability to design distributed systems on AWS. This certification validates knowledge in deploying robust and cost-effective applications on the AWS platform. [Link]


🛠 Skills

  • Programming: Python, Java, R, SQL
  • Data Science Libraries: NumPy, Pandas, scikit-learn, SciPy, TensorFlow, Keras, PyTorch, Selenium, Matplotlib, NLTK, spaCy, TextBlob
  • Tools: Tableau, AWS, Apache Spark, Snowflake, Docker, Apache Airflow, Kafka, MLflow, Kubeflow, Jupyter, Jenkins, JIRA, Git, MS Excel
  • Topics: Data Visualization, CI/CD, Statistical Modeling, Machine Learning (regression, clustering, and classification algorithms including Random Forest, Logistic Regression, XGBoost, K-Means Clustering, etc.), Natural Language Processing, Large Language Models (BERT, Bard, GPT), LangChain, Pinecone

🎓 Education

MS in Business Analytics

University of Texas at Arlington
Dec 2022
GPA: 3.64

BTech in Electronics and Instrumentation Engineering

Manipal Institute of Technology
May 2020