data-exploration-and-preprocessing

There are 21 repositories under the data-exploration-and-preprocessing topic.

  • AI-Northstar-Tech/vector-io

    The only vector tooling you'll need. Star the repo and look out for an email to try out a brand-new Vector Data Exploration demo! Use the universal VDF format for vector datasets to easily export and import data from all vector databases, and re-embed it using any model.

    Language: Jupyter Notebook
  • SayamAlt/Company-Bankruptcy-Prediction

    Successfully developed a machine learning model which can accurately predict whether a firm will become bankrupt or not, depending on various features such as net value growth rate, borrowing dependency, cash/total assets, etc.

    Language: Jupyter Notebook
  • nafisalawalidris/Employee-Attrition-Control

    The Employee Attrition Control project uses data analysis and predictive modeling to understand and address employee turnover. It provides insights and recommendations to reduce attrition and improve employee satisfaction and retention.

    Language: Jupyter Notebook
  • andysontran/health-factors-ml-pred

    CHL5230 - Applied Machine Learning for Health Data (Fall 2023) @ University of Toronto: Datathon #1 - Lifestyle and Health Factors

    Language: Jupyter Notebook
  • AngelX62/Data-Science-Job-Clean

    Data was downloaded from Kaggle.

    Language: Jupyter Notebook
  • andysontran/icu-mortality-ml-pred

    CHL5230 - Applied Machine Learning for Health Data (Fall 2023) @ University of Toronto: Datathon #4 - ICU Mortality Prediction Using ML

    Language: Jupyter Notebook
  • andysontran/mhealth-wearable-ml-pred

    CHL5230 - Applied Machine Learning for Health Data (Fall 2023) @ University of Toronto: Datathon #5 - mHealth and ML

    Language: Jupyter Notebook
  • andysontran/stroke-ml-pred

    CHL5230 - Applied Machine Learning for Health Data (Fall 2023) @ University of Toronto: Datathon #2 - Early Prediction of Heart Failure

    Language: Jupyter Notebook
  • Aniket2021448/Movie-recommender-system

    A machine learning project implemented from scratch, involving web scraping, data engineering, exploratory data analysis, NLP, and ML, to build a content-based movie recommender system.

    Language: HTML
  • saikaryekar/PySpark-Plane-Dataset-Exploration

    Explored a dataset of planes while learning PySpark commands.

    Language: Jupyter Notebook
  • venkat-a/Happiness-Prediction

    Prediction of happy customers based on happiness survey data.

    Language: Jupyter Notebook
  • SaiSurajMatta/Covid-19-Data-Exploration-Project

    An SQL-based exploration of COVID-19 data and vaccination progress using the Covid-Deaths dataset for insights into global pandemic trends.

  • SayamAlt/Bank-Customer-Churn-Prediction-using-PySpark

    Successfully established a machine learning model using PySpark which can classify whether a bank customer will churn or not, with an accuracy of more than 86% on the test set.

    Language: Jupyter Notebook
  • SayamAlt/Credit-Card-Approval-Prediction

    Successfully developed a machine learning model which can predict, with up to 100% accuracy, whether an applicant's credit card application would be approved or not, based on several demographic features such as applicant age, total income, marital status, total years of work experience, etc.

    Language: Jupyter Notebook
  • SayamAlt/Employee-Attrition-Prediction

    Successfully established a machine learning model which can accurately predict whether an employee of a given company will leave it in the near future or not, based on several employee details and employment metrics.

    Language: Jupyter Notebook
  • SayamAlt/Financial-News-Sentiment-Analysis

    Successfully developed a fine-tuned DistilBERT transformer model which can predict the overall sentiment of a piece of financial news with an accuracy of nearly 81.5%.

    Language: Jupyter Notebook
  • SayamAlt/Global-News-Headlines-Text-Summarization

    Successfully established a text summarization model using Seq2Seq modeling with Luong Attention, which can give a short and concise summary of the global news headlines.

    Language: Jupyter Notebook
  • SayamAlt/Symptoms-Disease-Text-Classification

    Successfully developed a fine-tuned BERT transformer model which can classify symptoms to their corresponding diseases with an accuracy of up to 89%.

    Language: Jupyter Notebook
  • SayamAlt/Taxi-Trip-Fare-Prediction

    Successfully created a machine learning model which can accurately predict the fare of a taxi trip based on several features such as trip duration, tip amount, etc.

    Language: Jupyter Notebook
  • srosalino/Prediction_of_Seoul_Bikes_Demand

    The objective of this project is to predict the number of bicycles that need to be made available each hour in order to make the service as efficient as possible.

    Language: Jupyter Notebook