DS-links

A lot of useful DS links

Table of Contents

1. Team leading

2. Books

3. Business (incl. BI)

4. Courses

4.1 Business courses

4.2 DS, ML, dev courses

4.3 DS schools

5. Lists of Tools

6. DS Libraries and Instruments

sktime, a Python toolkit for machine learning with time series, provides:
- State-of-the-art algorithms for time series classification, regression, and forecasting (ported from the Java-based tsml toolkit),
- Transformers for time series: single-series transformations (e.g. detrending or deseasonalization), series-as-features transformations (e.g. feature extractors), and tools to compose different transformers,
- Pipelining for transformers and models,
- Model tuning,
- Ensembling of models, e.g. a fully customizable random forest for time-series classification and regression, and ensembling for multivariate problems.
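
To make the idea of single-series transformations and transformer composition concrete, here is a minimal plain-Python sketch. It is not the API of any particular library: the names `detrend_linear`, `difference`, and `compose` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

Series = List[float]
Transform = Callable[[Sequence[float]], Series]


def detrend_linear(y: Sequence[float]) -> Series:
    """Subtract the least-squares straight line fitted to the series."""
    n = len(y)
    if n < 2:
        # A single point (or empty series) has no trend to estimate.
        return [0.0 for _ in y]
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    cov = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return [v - (intercept + slope * t) for t, v in enumerate(y)]


def difference(y: Sequence[float]) -> Series:
    """First difference: y[t] - y[t-1] (output is one element shorter)."""
    return [b - a for a, b in zip(y, y[1:])]


def compose(*steps: Transform) -> Transform:
    """Chain transforms left to right into a single transform."""
    def run(y: Sequence[float]) -> Series:
        out = list(y)
        for step in steps:
            out = step(out)
        return out
    return run


# Hypothetical pipeline: difference the series, then remove any linear trend.
pipeline = compose(difference, detrend_linear)
residuals = pipeline([1.0, 4.0, 9.0, 16.0, 25.0])
```

Treating each transformer as a plain function from series to series keeps composition trivial. Real toolkits typically add fitting, inverse transforms, and parameter tuning on top of this idea.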

7. Notebooks

7.1 General

7.2 EDA

7.3 Time-Series and Anomaly Detection

8. General DS Links

8.1 General

8.2 Time Series and Anomaly Detection

8.3 GAN

8.4 AutoML

8.5 Recommender systems

9. Testing in DS

10. Metrics in DS Projects

11. Causal Inference and Explainable AI

12. Reproducibility and Automatization

The ML REPA track is traditionally dedicated to the tools and practices of experiment management, reproducibility, and process automation in machine learning.
Its topics overlap with those of other tracks, such as ML Infra, SysML, and Lean Data Science. All these topics are related, and the aim of ML REPA is to show how to build a process for developing ML solutions, how to organize teamwork, and which tools can help.
Kedro is an open-source Python framework that applies software engineering best practices to data and machine-learning pipelines. You can use it, for example, to optimise the process of taking a machine learning model into a production environment. You can use Kedro to organise a single-user project running in a local environment, or to collaborate within a team on an enterprise-level project.
Data Scientists and ML Engineers use BentoML to:
- Accelerate and standardize the process of taking ML models to production
- Build scalable, high-performance prediction services
- Continuously deploy, monitor, and operate prediction services in production

MLOps

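  • PyTorch code for reproducibility:
import random

import numpy as np
import torch

def set_deterministic(seed=666, precision=10):
  # Seed the NumPy, Python, and PyTorch (CPU and all GPUs) generators
  np.random.seed(seed)
  random.seed(seed)
  torch.manual_seed(seed)
  torch.cuda.manual_seed_all(seed)
  # Disable the cuDNN autotuner and request deterministic cuDNN kernels
  torch.backends.cudnn.benchmark = False
  torch.backends.cudnn.deterministic = True
  # Affects only how tensors are printed, not numerical precision
  torch.set_printoptions(precision=precision)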
  • TensorFlow code for reproducibility:
import os
import random

import numpy as np
import tensorflow as tf

def set_random_seeds(seed_value):
    # 1. Set the `PYTHONHASHSEED` environment variable to a fixed value.
    #    Hash randomization is fixed at interpreter startup, so this only
    #    affects Python processes launched after this call.
    os.environ['PYTHONHASHSEED'] = str(seed_value)
    # 2. Set the `python` built-in pseudo-random generator to a fixed value
    random.seed(seed_value)
    # 3. Set the `numpy` pseudo-random generator to a fixed value
    np.random.seed(seed_value)
    # 4. Set the `tensorflow` pseudo-random generator to a fixed value
    tf.random.set_seed(seed_value)
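
A quick way to check that a seeding helper like the ones above actually works is to run the same experiment several times and compare the results. The sketch below uses only the standard library so it runs anywhere. `seed_stdlib`, `toy_experiment`, and `is_reproducible` are hypothetical names, and the toy experiment stands in for a real training run.

```python
import random
from typing import Any, Callable


def seed_stdlib(seed: int) -> None:
    """Hypothetical seeding helper: seeds only Python's built-in generator."""
    random.seed(seed)


def toy_experiment() -> list:
    """Stand-in for a training run: draws from the global generator."""
    return [random.random() for _ in range(5)]


def is_reproducible(
    experiment: Callable[[], Any],
    set_seed: Callable[[int], None],
    seed: int = 666,
    runs: int = 3,
) -> bool:
    """Re-seed before every run and check that all runs give equal results."""
    results = []
    for _ in range(runs):
        set_seed(seed)
        results.append(experiment())
    return all(r == results[0] for r in results)


seeded_ok = is_reproducible(toy_experiment, seed_stdlib)
# With a no-op seeding function, repeated runs should differ.
unseeded_ok = is_reproducible(toy_experiment, lambda seed: None)
```

With a real framework, `set_seed` would be the framework-specific helper. For tensors or arrays, the `==` comparison would need to be replaced with an array-aware equality check.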

13. AI Products Architecture and System Design