Holistic Data Modeling Practices, A to Z: All the Steps a Data Scientist Should Know and Apply
Our goal is to develop and share a map (curriculum) of the foundations and useful topics of data analysis with statistical and machine learning methods, built from scratch. When building a data-driven model, it is crucial to explore the data first (data mining and feature engineering) from a holistic, comprehensive, statistical perspective; better models follow from this exploration. Here we apply data science methods and tools in a workshop-style, exploratory format, covering practices such as imputation, multivariate visualization, clustering, predictive analytics, and manifold learning, alongside basic methods and powerful packages such as Seaborn, Plotly, scikit-learn, and Keras.
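As a minimal sketch of one practice listed above, the snippet below shows mean imputation of missing values with scikit-learn's `SimpleImputer`. The toy array is hypothetical and chosen only for illustration; the workshops cover imputation in more depth.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data with missing entries (NaN).
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# Replace each NaN with its column mean: (1+7)/2 = 4.0 and (2+3)/2 = 2.5.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
```

The same interface accepts other strategies (`"median"`, `"most_frequent"`, `"constant"`), so the choice of imputation rule becomes one more modeling decision to explore.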
Our author group includes a statistician, a programmer, an applied mathematician, an AI expert, and a chameleon (;), so we approach data from multiple angles, posing inquiries with both classical and up-to-date methods and tools. Designing search grids and pipelines, writing Python functions, automating these practices, designing models, and applying them to real-world data sets are among the practices covered here. This evolving material is also essential for a data scientist to know when touching data for the first time. We include the tools used in business intelligence and analytics as well.
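The "search grid and pipeline" practice mentioned above can be sketched as follows: a scikit-learn `Pipeline` (scaling followed by logistic regression) tuned with `GridSearchCV`. The synthetic data set and the parameter grid are hypothetical choices for illustration, not the workshops' actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical synthetic data standing in for a real-world data set.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# A pipeline keeps preprocessing and the model as one estimator,
# so cross-validation fits the scaler only on each training fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Search grid over the regularization strength of the final step.
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
```

Wrapping the whole workflow in one object like this is also what makes the automation step straightforward: the fitted `grid` can be reused, inspected (`grid.best_params_`), or dropped into a Python function.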
The methods in our workshops are listed in the table of contents and under the related notebook folders. We will expand the material as we progress and apply new methods. We owe thanks to the data scientists whose resources we have cited in our notes. Eventually, all files will be reorganized and reshaped into workshops and notebooks, one for each method here. We are learning and working hard.
Data Science Group, Rochester, March 2020