/data-science-roadmap

An ordering with comments, summaries and analysis for the data science universe.

MIT License


A basic to advanced ordering for data science students and professionals.



Live the future
Professional and academic in agile technologies for software
development, data architecture, microservices and graphic design.

Special thanks to:
Thiago Corrêa, Vinicius Rodrigues & Lucas Barra



The "Roadmap to Data Science" is an ordering (with comments, summaries and analyses), from basic to advanced, for students and professionals. This "interactive list" is designed to offer you the best possible path to excellence in technology, specifically in data.
Here you will find the best "chronology" to follow on your study journey.

NOTES:
This repository is free for the community to change. My initial idea was to build a trail that follows my own studies, so nothing prevents you from doing the same and improving on what I created.
I am not open to partnerships or monetization of this content; everything is purely academic.
Most images are clickable and lead to pages with reliable information outside the repository.
At the end of each topic/subject there will be a link to go back to the table of contents (to streamline browsing time).

Summary

Introduction - Python - Datas - Culture - Scientific Methodologies - Scrum - CRISP-DM
NumPy - pandas - Matplotlib - Seaborn - Package installation - Git


Introduction



source: ead.pucpr.br/blog/ciencia-de-dados-o-que-e

What is Data Science?

Data science is an interdisciplinary field that uses a variety of tools and algorithms to identify patterns and insights from raw data. Economic, financial and social data, structured and unstructured, can be extracted and transformed into knowledge in order to detect patterns that will help companies.

This science makes it possible to identify trends and produce information that companies can use to make better decisions and create more innovative products and services.

What is the profile of a Data Science professional?

Working with Data Science involves calculations, statistics and algorithms. In other words, having an affinity with the exact sciences is essential for the profession.

In addition, data scientists must have strategic and analytical thinking to extract data and transform it into relevant information for the companies in which they work.

← Back to the top


Python


source: cienciaedados.com/por-que-cientistas-de-dados-escolhem-python

Why Python?

Large community – Python has a large and growing community. If you get stuck, you can count on its experts to help you find a solution to coding problems (even in specific niches) as well as answers to questions about Data Science and Data Analytics.

Growing number of data analysis libraries – Python offers a wide variety of data science libraries (e.g. NumPy, SciPy, StatsModels, scikit-learn, pandas), and the ecosystem keeps growing quickly. Constraints (on optimization methods/functions) that were missing a year ago are no longer an issue, and you can usually find a robust solution that works reliably.

Jupyter Notebook – this is simply a great tool. You can run blocks of code in separate cells, play with the data, move cells up or down, and see your results right below each cell. It really is like a magical organizer that Data Scientists (and people who run code) have always dreamed of. You can also write in R, SQL, Scala, and other languages with Jupyter Notebook, which makes the workflow much easier and more efficient.

Python is easy to learn – Python's main advantage is that anyone can learn it quickly and easily. The language was designed to be simple.
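As a small illustration of that simplicity, here is a minimal sketch that summarizes a list of numbers using only the standard library. The variable names and the sample values are hypothetical and chosen only for this example.

```python
from statistics import mean, median

# Hypothetical sample: monthly sales figures (illustrative values only).
monthly_sales = [120, 135, 150, 110, 160]

# Summary statistics read almost like plain English.
average_sales = mean(monthly_sales)
middle_sales = median(monthly_sales)

# A list comprehension keeps only the months above the average.
above_average = [value for value in monthly_sales if value > average_sales]

print(f"mean={average_sales}, median={middle_sales}, above average={above_average}")
```

The whole analysis fits in a few readable lines, with no type declarations or boilerplate.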

Scalability – Relative to other languages/packages for Data Science (like MATLAB, Stata, R), Python is often faster. It is true that Java and Scala are much faster than Python, but with Anaconda (Continuum Analytics) Python may be the right solution.

Visualization/Graphics – Python isn't as good as R (yet), but more and more APIs (e.g. Plotly) and data visualization libraries are making R's advantage negligible. You can do really cool stuff with Python.

Variables
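A minimal sketch of how variables behave in Python, assuming nothing beyond the language itself. All names and values below are hypothetical and serve only as an illustration.

```python
# A name is bound to a value with "="; no type declaration is needed.
course = "Data Science"   # str
lessons = 12              # int
hours_per_lesson = 1.5    # float
finished = False          # bool

# Variables can be combined in expressions.
total_hours = lessons * hours_per_lesson

# A name can be rebound later, even to a value of a different type.
lessons = "twelve"

print(course, total_hours, type(lessons).__name__)
```

Python infers each type from the assigned value, which is why the same name can later hold a string instead of an integer.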

← Back to the top