data-science-utils

Utility code for data science projects, covering data cleaning, ETL, EDA, NLP, visualization, feature engineering, feature selection, and related tasks.

Primary language: Python. License: MIT.

Project Organization

├── README.md          <- The top-level README for developers using this project.
├── gists              <- Gists of commonly used code (change to the root
│                         directory, connect to a database, profile data, etc.)
├── io                 <- Code for input/output utilities
├── etl                <- For building reproducible ETL pipelines, including data
│                         checks and transformers
├── ml                 <- Machine learning utility code (feature engineering, etc.)
├── pandas             <- pandas-related utility code
│   ├── analysis                  
│   ├── cleaning
│   ├── engineering
│   ├── text    
│   ├── datetime     
│   ├── optimization       
│   └── profiling      
├── text               <- Code for dealing with text. Includes distributed loading of a text corpus,
│                         entity statement extraction, sentiment analysis, etc.
├── __init__.py        <- Makes this directory an importable Python package
├── project_utils.py   <- For project-specific utilities
└── LICENSE
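
The `gists` entry above mentions snippets such as changing to the project root directory. The repository's own code is not shown here, so the following is only a minimal sketch of what such a helper could look like. The function name `find_project_root` and the choice of marker files are assumptions made for illustration, not part of this project's API.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_project_root(
    start: Optional[Path] = None,
    markers: Iterable[str] = ("README.md", "LICENSE"),
) -> Path:
    """Hypothetical helper: walk upward from `start` (default: the current
    working directory) and return the first directory that contains any of
    the marker files."""
    current = (start or Path.cwd()).resolve()
    markers = tuple(markers)
    # Check the starting directory first, then each ancestor in turn.
    for candidate in (current, *current.parents):
        if any((candidate / name).exists() for name in markers):
            return candidate
    raise FileNotFoundError(
        f"No directory containing any of {markers} found at or above {current}"
    )
```

A notebook stored in a subfolder could then call `os.chdir(find_project_root())` before using relative paths. Searching for marker files rather than hard-coding a number of `..` steps keeps the snippet working when files are moved between folders.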