A Data Science repository to facilitate studies.
[WARNING] The links cited here were collected from various sites and are for study use. I am sharing them so that others can learn as I do. I hope I have not infringed any copyright; if your website, repository, or any other link is here and you would prefer it were not, please contact us so that it can be removed. THANKS FOR UNDERSTANDING!
This part is for Younger Padawans or Jedi in Data Science
- TO-DO
- Motivation
- Introductory Area
- What is Data Science
- Mentors and Role Models
- Themes
- Learning Platforms
- Searches
- GitHub Projects
- Good Separate Texts
- Awesome Lists
- Notes
- Images
Repository updates:
- Open sites and verify that they are active and that they are part of the repository.
- Add [EN-US] or [PT-BR] to the links
This is a repository of shortcuts to start studying Data Science.
An important addendum: I intend to focus on the security area, so this repository will have a part with links and information sheets focused on Information Security.
We start with "What is Data Science?": basic readings to help you understand what Data Science is and what you must study to become a professional that companies want.
The next steps are separated into: courses (MOOCs) for learning; data sets for study; blogs that cover the hottest and most up-to-date topics in the area; computing links for general knowledge; YouTube links to the best courses mentioned by the community; tools for producing analyses; online magazines; links to books to study and to buy; sites that host competitions to sharpen a Data Scientist's skills; tutorials for the main tools; some of the best lists and repositories for studying Data Science; and search links.
The idea of the list is not to make me a Unicorn Data Scientist. It is that, once I am a Data Scientist, I will have an overview of everything the area encompasses and can talk to the various professionals involved, so that we manage to define the best projects and approaches to work on.
The references gathered here appeared throughout my studies and various searches, and are meant to make studying and understanding easier. This is not a step-by-step path to becoming a Data Scientist, but it serves as a guide for those seeking knowledge in the field, and as a repository for easy access without filling your browser with disorganized bookmarks.
It follows a structure that addresses three roles:
- Data Engineer
- Data Scientist
- Machine Learning Engineer
Data Engineer: Responsible for taking raw data from various sources and placing it in a Data Lake, a repository that the other team members will access. In short, this role brings in, processes, and makes data from different sources available in one place for the team. In some places the role is known as Big Data Developer: a Data Engineer who also knows tools for working with large data sets, such as Apache Spark or Apache Hadoop. The function of the Data Lake is only to store the data; the treatment is done when the data is read. The motivation is that a Data Warehouse holds data that has already been processed and cleaned, which takes longer to load and can consequently lose data and value. A Data Pipeline architecture supports both batch and real-time processing. When talking about a distributed system, we are not necessarily talking only about the Hadoop architecture.
Data Scientist: Responsible for taking the data provided by the Data Engineer and making sense of it. This role takes the data, which is likely to be inside a Data Lake, and creates Machine Learning models, such as prediction models or classification algorithms, to solve problems. The Data Scientist refines these models and tries to find the best possible one to improve the results. This is the person who reasons about the data, thinks about the best solutions, and tries to solve the problems.
Machine Learning Engineer: Puts the model that the Data Scientist created into production. Basically, this role takes the Data Scientist's model and makes it run in a scalable way.
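The idea that a Data Lake only stores raw data, with the treatment applied when the data is read, can be sketched in a few lines. This is a minimal illustration using only the standard library, not a real Data Lake: the `RAW_EVENTS` records, their field names, and the `read_with_schema` helper are all hypothetical.

```python
import json

# Hypothetical raw events as they might land in a data lake: stored as-is,
# with inconsistent fields and types (all values here are made up).
RAW_EVENTS = [
    '{"user": "ana", "amount": "19.90", "ts": "2020-01-03"}',
    '{"user": "bob", "amount": 5, "country": "BR"}',
    'not valid json',
]


def read_with_schema(lines):
    """Apply a schema at read time: parse, cast types, and skip bad records."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # the raw line stays stored; this reader just skips it
        if not isinstance(record, dict) or "user" not in record:
            continue
        yield {
            "user": str(record["user"]),
            "amount": float(record.get("amount", 0.0)),
            "country": record.get("country", "unknown"),
        }


clean_rows = list(read_with_schema(RAW_EVENTS))
total_amount = sum(row["amount"] for row in clean_rows)
print(clean_rows)
print(total_amount)
```

The raw lines are never modified, so a different reader could later apply a different schema to the same stored data. A Data Warehouse would instead clean and cast the records before storing them.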
First Learn Python:
You don't need to master Python at a professional level, but to enter this world you need some basic knowledge:
- Don't know Python yet? I highly recommend Gustavo Guanabara from Curso em Vídeo
- All about Python: Real Python
- Link to tools worth learning: developer script
Second Prepare the PC:
I suggest Anaconda Navigator for those who are starting their studies in this field. It is Open Source, supports the Python and R programming languages, and comes with the tools you need to get started.
- Python can be downloaded through the Anaconda distribution: Download Anaconda
- You can also use an IDE. I particularly recommend Visual Studio Code
Third Libraries:
Several libraries make data analysis easier. The following are some must-have libraries for studying Data Science:
- Numpy - Library for arrays and mathematical functions.
- Matplotlib - For plotting graphs and visualizing data.
- OpenCV - For viewing and editing images via Python.
- Virtualenv - A tool to create isolated Python environments. The basic problem it solves is one of dependencies and versions, and indirectly permissions.
- Pandas - An Open Source, BSD-licensed library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
- CharmPy - A high-level parallel and distributed programming framework with a simple and powerful API, based on migratable Python objects and remote method invocation; built on top of an adaptive C/C++ runtime system that provides speed, scalability, and dynamic load balancing.
- Pip - It is a package management system used to install and manage software packages written in the Python programming language.
- SciPy - It is an Open Source library in Python language that was made for mathematicians, scientists and engineers.
- Urllib - A Python standard library package for opening, fetching, and parsing URLs.
- Beautiful Soup - A Python package for parsing HTML and XML documents.
- Papermill - A tool to parameterize, execute and analyze Jupyter Notebooks.
- Nteract - It is a dynamic tool to give flexibility when writing code, exploring data and creating text to share insights about the data.
- RISE - "Live" Reveal.js Jupyter / IPython slide show extension.
- Scikit-learn - Python library with a wide range of machine learning algorithms.
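As a small taste of the list above, here is a sketch that uses only `urllib`, which ships with Python, so nothing needs to be installed. It builds and parses a URL without touching the network; the host `example.com` and the query parameters are placeholders chosen for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder endpoint and parameters (assumptions for illustration only).
base = "https://example.com/search"
params = {"q": "data science", "lang": "pt-BR"}

# urlencode escapes the values so they are safe to place in a query string.
url = f"{base}?{urlencode(params)}"

# urlparse splits the URL into components; parse_qs decodes the query string
# into a dict that maps each key to a list of values.
parts = urlparse(url)
query = parse_qs(parts.query)

print(url)
print(parts.netloc, parts.path)
print(query)

# To actually download a page you would call urllib.request.urlopen(url);
# that step is left out here so the example runs offline.
```

Once a page is downloaded, a parser such as Beautiful Soup is the usual next step for extracting data from the HTML.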
Note: For those who want to work with neural networks / Deep Learning, that is a separate track. There are four major frameworks: TensorFlow, Keras, PyTorch and Theano, with TensorFlow being the best known and most used.
Fourth TensorFlow:
First, get to know the TensorFlow Playground
After reading this material, it's time for installation:
- Install CUDA Toolkit, and check that the system variables are correct.
- Install CUDA Toolkit drivers
- Install cuDNN
- Install TensorFlow, CPU or GPU version (preferably only one installation)
When installing, follow TensorFlow's own step-by-step instructions.
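Once the installation steps are done, a quick way to see which libraries Python can actually find is the check below. It is a minimal sketch using only the standard library; the `is_installed` helper and the `CANDIDATES` list are illustrative names, not part of any installer, and the check says nothing about whether the GPU setup works.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Illustrative list; adjust it to the libraries you expect in your environment.
CANDIDATES = ["numpy", "pandas", "tensorflow"]

status = {name: is_installed(name) for name in CANDIDATES}
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A "missing" result usually means the library was installed into a different environment than the one running the script, which is the kind of problem isolated environments are meant to prevent.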
An addendum: as you can see, the focus is on NVIDIA. I am not familiar with the setup for AMD GPUs.
Installed? Tested? Did it work? Now you don't know where to start? Here are some tips:
- TensorFlow's own tutorials
- A tutorial for the MNIST dataset
- A tutorial for the CIFAR-10 dataset
The tutorials make this clear, but I will reinforce it: learn to use TensorBoard, the manager and viewer for TensorFlow neural networks. You can even save the current state of a network to reload it later.
Fifth: Some more things
- A Kaggle account is mandatory.
- A short list of datasets to use as a guide is available on Wikipedia
- Historical and classic datasets are available in the UCI Machine Learning Repository
Valuable Tips: Below are some tips to help you on your way:
- Learn about the path taken by Leonardo Ferreira, who became a data scientist in a year and a half and reached the 30th position worldwide on [Kaggle](https://www.linkedin.com/pulse/data-science-from-zero-kaggle-kernel-master-leonardo-ferreira/).
- Organize your studies, and don't mix subjects or try to absorb too much at once.
- Have profiles on LinkedIn, GitHub, Kaggle and HackerRank. Have a Twitter developer account for text mining.
- Go deeper by learning from various sources. For example, when studying Python, read e-books and other materials from different sources, because each course has a different teaching style. If you get stuck on a subject, that is okay and common: look for other explanations until you understand.
- If you want and can, invest in paid courses and specializations.
- Use and learn with concept maps
It is more or less mandatory to understand:
- Predictive modeling.
- Naive Bayes.
- Time Series Analysis and Visualization.
- Exploratory data analysis.
- Statistics.
- Univariate analysis.
- Bivariate analysis.
- Charts: when and how to use them.
- Qualitative and Quantitative Variables.
- Basic mathematical requirements.
- Notions of analytical and numerical optimization.
- Discover tools for data extraction on the web.
- Basics of linear algebra, eigenvectors, eigenvalues, base changes, among others.
- Basic probability and statistics: conditional probability, basic formulas, most common distributions, basic metrics, regression, R^2, p-value, inference, among others.
- It is worth knowing at least the basics of: Amazon AWS servers, Amazon QuickSight, and Microsoft Power BI.
- And also: version control, markdown, git, GitHub, R and RStudio.
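Two items from the list above, regression and R^2, can be worked through by hand before reaching for a library. The sketch below fits a least-squares line and computes R^2 using only the standard library; the `hours` and `scores` values are made-up numbers for illustration, and the function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """R^2 = 1 - SS_res / SS_tot: share of the variance explained by the line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data: hours studied versus a score.
hours = [1, 2, 3, 4, 5]
scores = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(hours, scores)
r2 = r_squared(hours, scores, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} r2={r2:.4f}")
```

In practice you would likely use Scikit-learn or SciPy for this, but writing it once by hand makes clear what the metric measures: an R^2 close to 1 means the line leaves little unexplained variance.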
As with anything you want to learn, you should get involved with it. One tip: do not try to understand everything at once; take it easy. Take part in communities. The Python community in Brazil is one of the strongest and most active I have ever seen.
And that's it: this is the area's introductory package. You are now able to start experimenting in the area.
Comments:
- The roadmap above is not the only way to learn, and not necessarily the best one; it reflects the knowledge I intend to acquire and what I have noted in meetings, lectures, and conversations throughout my career in the area.
Below is a list of favorite sites that have a variety of topics related to computing in general:
- How to create your Data Scientist portfolio and publicize your work [PT-BR]
- What’s the Difference Between a Data Analyst, Data Scientist, and Machine Learning Engineer? [EN-US]
- After all, what is Data Science [PT-BR]
- Becoming a data scientist - Resume via Metromap [EN-US]
- What is Data Science [EN-US]
- Theories behind Data Science [EN-US]
- [Data Science from Zero to Kaggle Kernels Master](https://medium.com/ensina-ai/ci%C3%AAncia-de-dados-do-zero-%C3%A0-kaggle-kernels-master-7f735d7fceb2) [PT-BR]
- [12 common mistakes in Data Science that compromise decision making](https://cio.com.br/12-erros-comuns-em-ciencia-de-dados-que-comprometem-a-tomada-de-decision/) [PT-BR]
- [10 types of data professionals: from data engineers to big data DevOps and data analysts, which of these classifications would you fit in?](https://medium.com/@luis.anderson.sp/10-tipos-of-data-professionals-of-data-engineers-to-big-data-devops-and-analysts-of-94259531270f?ref=datahackers) [PT-BR]
- What is Machine Learning and how to learn without spending anything [PT-BR]
- TO START IN DATA SCIENCE [PT-BR]
- Through this text: Jupyter is now a full-fledged IDE. Learn and put into practice: nbdev and [@jupyterlab/debugger](https://github.com/jupyterlab/debugger)
It is always good to keep a short list of role models who inspire you:
- Paulo Vasconcellos - Brazilian Data Scientist [PT-BR]
- Déborah Mesquita [PT-BR]
- Lucas Caton [PT-BR]
- Greg Reda [EN-US]
- Kevin Davenport [EN-US]
- Julia Evans [EN-US]
- Meta Analysis [EN-US]
- Sentdex
- Deepkapha [EN-US]
- The File Drawer [EN-US]
- Hilary Parker [EN-US]
- Kenny Bastani [EN-US]
- Adventures in Data Land [EN-US]
- Shane Lynn [EN-US]
- John Myles White [EN-US]
- Daniel Forsyth [EN-US]
- Learning Lover [EN-US]
- Data-Mania [EN-US]
- Noah Weber
- ClaoudML
- Shane Lynn
- Andrew Ng
Study Links.
Computer Science study projects:
- Mathematics [EN-US] - E-prints from Cornell University related to Mathematics.
- Quantitative Finance [EN-US] - E-prints from Cornell University related to Quantitative Finance.
- Statistics [EN-US] - E-prints from Cornell University related to Statistics.
- Econometrics [PT-BR] - Econometrics is a study that uses mathematical and statistical methods to evaluate theories on economics and finance.
- Math and Science Done Right [EN-US] - Studies in Mathematics, Science and Engineering through small interactive learning experiences.
- Terence Tao Website [EN-US] - Updates on research and expository papers, discussion of open problems, and other maths-related topics.
The Protocols and Structures for Inference (PSI) project aims to develop an architecture for presenting machine learning algorithms:
- Research: Protocols and Structures for Inference: A RESTful API for Machine Learning - James Montgomery [EN-US] - Building a Machine Learning API structure.
- Protocols and Structures for Inference Project [EN-US] - GitHub Project.
- Evolutionary Database Design [EN-US] - Database structure.
- PSI RESTful API
Below is a list of Tools, Environments and Libraries for Data Scientists:
- Scikit-Learn - Machine learning in Python.
- NumPy - The fundamental package for scientific computing with Python. It supports large, multidimensional arrays and matrices and includes a variety of high-level math functions to operate on these arrays.
- SciPy - SciPy works with NumPy arrays and provides efficient routines for numerical integration and optimization.
- TensorFlow - TensorFlow is an open source software library for machine intelligence.
- nbviewer - Render Jupyter Notebooks as static web pages.
- Matplotlib - 2D plotting library in Python that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
- seaborn - Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphs.
- Auto PY to EXE - Convert .py to .exe using a simple graphical interface.
- plot.ly - Data visualization library for Python.
- Caffe - Deep learning framework made with expression, speed and modularity in mind.
- Albumentations - A fast, framework-agnostic image augmentation library that implements a diverse set of augmentation techniques.
- Ember Charts - A powerful and easy to use charting library for Ember.js.
- amCharts - Libraries and tools for all your Data Visualization needs.
- AnyChart - It's a set of flexible HTML5 JavaScript libraries for all your data visualization needs.
- cartodb - Mapping tool.
- Cube - System to collect timestamp events and derive metrics.
- d3plus - Data visualization made easy.
- D3js - Data-Driven Documents - JavaScript library for manipulating documents based on data.
- dygraphs - Flexible and fast open source JavaScript charting library.
- exhibit - Allows you to easily create web pages with advanced text search and filtering features, with interactive maps, timelines and other visualizations.
- Gatherplot - Generalized scatter plots for nominal data.
- ggplot2 - System for declaratively creating graphics, based on The Grammar of Graphics.
- Glue - Python library for exploring relationships within and between related data sets.
- Google Chart Gallery - Provides a variety of charts designed to meet your data visualization needs.
- jqplot - Plot and graphics plugin for the jQuery JavaScript framework.
- nvd3 - Build reusable charts and chart components.
- Opendata-tools - List of tools to explore, publish and share public data sets.
- OpenRefine - A powerful and free tool for working with messy data.
- raw - The missing link between spreadsheets and data visualization.
- techanjs - A visual and technical analysis and graphics library based on D3. Create interactive financial charts for modern and mobile browsers.
- Timeline - Open source tool that allows anyone to build interactive and visually rich timelines.
- variancecharts - Allows engineers, designers, journalists, scientists and analysts to create elegant and personalized data graphics for the Web, using only HTML and CSS.
- life - Visualization of open source data.
- Wrangler - Interactive tool for cleaning and transforming data.
- r2d3 - It is an experiment in expressing statistical thinking with interactive design.
- NetworkX - Python package for creating, manipulating and studying the structure, dynamics and functions of complex networks.
- Redash - Built to allow quick and easy access to billions of records.
- C3 - D3-based reusable chart library.
- Heroku - It is a platform as a service (PaaS) that allows developers to create, run and operate applications entirely in the cloud.
- OpenStack - It is open source software, capable of managing the components of multiple virtualized infrastructures.
- DigitalOcean - Provides developers and companies with a reliable and easy-to-use cloud computing platform for virtual servers (Droplets), object storage (Spaces) and more.
- Google Cloud Platform - It is a cloud computing suite offered by Google, operating on the same infrastructure that the company uses for its products aimed at users, including Google Search and YouTube.
- Amazon Web Services Cloud - A set of cloud computing services that together form the cloud platform offered by Amazon.
- nbextensions - This repository contains a collection of extensions that add functionality to the Jupyter notebook.
- tqdm - Instantly make your loops show an intelligent progress meter.
- hchart - This generic function can chart various R objects on the fly.
- pyswarms - A research toolkit for particle swarm optimization in Python.
- MoviePy - Python module for video editing, video composition, video processing or to create advanced effects.
- requests-toolbelt
- nltk
- stanza
- Nbviewer
- Auto PY to EXE
- SciPy
- chartjs
Below is a list of sites to put the knowledge of Data Analysis into practice:
- Exercise List for Python [PT-BR] - This is a list with suggestions for programs for beginners in programming.
- URI Online Judge [EN-US] - The main objective is to promote the practice of programming and knowledge sharing.
- Kaggle [EN-US] - Kaggle is the place to do Data Science projects.
- DrivenData [EN-US] - Data Science competitions to save the world.
- Analytics Vidhya [EN-US] - The last battleground for Data Scientists.
- The Data Science Game [EN-US] - An international student challenge.
- InnoCentive [EN-US] - Global pioneer in crowdsourcing innovation.
- TuneedIT [EN-US] - Challenges of Machine Learning Algorithms and Data Mining.
Machine learning studies addressing Front-End:
Machine learning studies addressing Back-End:
Machine learning studies addressing Big Data:
Machine learning studies addressing Theory:
Data Science Desktop:
Study related links to learn first:
Data Science for business study:
Data Science Security study:
Study Links.
Below is a list of sites that offer a variety of free and paid courses:
- edX [EN-US] - Flexible programming learning.
- Coursera [EN-US] - Learn skills from the best universities for free.
- Udacity [EN-US] - Courses and Certifications.
- Edraak [EN-US] - Edraak, is a massive open online course platform (MOOC), which is an initiative of the Queen Rania Foundation (QRF).
- Open HPI [EN-US] - MOOC´S for Reading and Learning.
- MIT OPEN COURSEWARE [EN-US] - It is a web-based publication of virtually all MIT course content, open and available to the world.
- cK-12 [EN-US] - 100% free and personalized learning for each student.
- Udemy [EN-US] - The largest selection of courses in the world.
- SKILLSHARE [EN-US] - Skillshare is an online learning community with thousands of classes in design, business, technology and more.
- Codecademy [EN-US] - Learn the technical skills you need for the job you want.
- P2PU [EN-US] - connects educational resources open to career paths in an equitable and empowering way.
- Saylor Academy [EN-US] - Saylor Academy is a non-profit initiative working since 2008 to offer free and open online courses for everyone who wants to learn.
- Academic Earth [EN-US] - Find free online courses, lectures and videos from top colleges like Yale, MIT and Stanford.
- Learn To Be [EN-US] - Non-profit organization that brings 1-on-1, online tutoring for young people.
- Floqq - FLOQQ is the largest Spanish-language video search engine course.
- Course Talk [CA-ES] - Discover the best courses on the web based on your interests and student feedback.
- Marginal Revolution University [EN-US] - Creates free and engaging economic videos taught by top professors.
- Alison [EN-US] - Free online courses with certificates.
- Data Science Academy [PT-BR] - Community of experts in Data Science.
- SOLYD [PT-BR] - Online training and courses.
- DataCamp [EN-US] - Learn Data Science online.
- Google for Education [EN-US] - Google's Python class.
- VEDUCA [PT-BR] - Here you study for free and you can earn your certificate for a price that fits in your pocket.
- Fundação Bradesco [PT-BR] - Escola Virtual is an educational portal that offers free distance-learning courses.
- Khan Academy [EN-US] - Offers hands-on exercises, instructional videos and a personalized learning dashboard that enables students to study at their own pace inside and outside the classroom.
- EADCCNA [PT-BR] - Variety of online courses in IT.
- Teaching Channel [PT-BR] - Free courses and books in the public domain.
- 4Linux [PT-BR] - Linux and open software courses.
- Impacta [PT-BR] - IT, Management and Design Courses.
- Microsoft Academy [EN-US] - Microsoft Professional Program for Artificial Intelligence.
- Microsoft Virtual Academy [EN-US] - Free Microsoft training provided by experts.
- [Universia](http://noticias.universia.com.br/destaque/especial/2013/07/10/1035282/8/700-cursos-online-gratis-das-melhores-universidades-do-mundo/cursos -online-gratis-de-ci% C3% Computer-science% C3% A7% C3% A3o-e-intelig% C3% AIncia-artificial.html) [EN-BR] - Universia Brasil brought together 700 online courses free from the best universities in Brazil and the world. Check out courses in Computer Science and Artificial Intelligence.
- Duolingo [EN-US] - Learn languages for free, forever.
- e-stude [EN-US] - E-learning platform aimed at training software development teams.
- Google Developers [EN-US] - The Machine Learning Crash Course.
- Acclaim [EN-US] - Complete a series of online Data Science courses.
- Data School [EN-US] - Data Science Courses.
- Dataquest [EN-US] - Learn Python, R, SQL, data visualization, data analysis and machine learning.
Below is a list of paid and free books:
- Artificial Intelligence: A Machine Learning Approach [PT-BR]
- Data Science from Zero. First Rules with Python [PT-BR]
- Python For Data Analysis [PT-BR]
- Introduction to Data Mining. Basic Concepts, Algorithms and Applications [PT-BR]
- Introduction to Data Mining With Applications in R [PT-BR]
- Python Data Science Handbook [EN-US]
- The Data Science Handbook [EN-US]
- The Art of Data Usability [EN-US]
- Think Like a Data Scientist [EN-US]
- R in Action, Second Edition [EN-US]
- Introducing Data Science [EN-US]
- Practical Data Science with R [EN-US]
- Exploring Data Science [EN-US]
- Exploring the Data Jungle [EN-US]
- Python® for R Users: A Data Science Approach [EN-US]
- Classic Computer Science Problems in Python [EN-US]
- R for Data Science [EN-US]
- An Introduction to Statistical Learning - with Applications in R [EN-US]
- Pattern Recognition and Machine Learning (Information Science and Statistics) [EN-US]
- Syncfusion - Ebooks [EN-US]
- Free Programming Books [EN-US]
- Free Software Testing Books [EN-US]
- Go Books [EN-US]
- R Books [EN-US]
- Mind Expanding Books [EN-US]
- Book Authoring [EN-US]
- Elixir Books [EN-US]
Courses related to computing:
- Free Microsoft Power BI Course (Workload: 54 Hours)
- Free Big Data Fundamentals Course (Hours: 8 Hours)
- Free Python Fundamentals Course for Data Analysis (Hours: 54 Hours)
- Free Course on Introduction to Data Science (Hours: 8 Hours)
- Free Course on Fundamentals of Artificial Intelligence (Hours: 8 Hours)
Courses related to computing:
- Python Fundamentals for Data Analysis
- Data Analysis and Interpretation Specialization (Wesleyan University)
- Data Management and Visualization
- Data Analysis Tools
- Regression Modeling in Practice
- Machine Learning for Data Analysis
- Data Analysis and Interpretation Capstone
- Introduction to Computer Science and Programming Using Python (MIT)
- Using Python for Research (Harvard)
- Intro to Python for Data Science
- Deep Learning Prerequisites: The Numpy Stack in Python
- Making Graphs in Python using Matplotlib for Beginners
- Google's Python Class
Below is a list of Data Science blogs, podcasts, and publications:
- Blog Mining Data [PT-BR] - This project aims to help you learn more about Data Science and related areas in a practical and quick way.
- The Statistician [PT-BR] - Blog whose mission is to promote statistics in a simple, fun and accessible way, like you've never seen before.
- Pizza de Dados [PT-BR] - The Brazilian podcast on Data Science.
- Post-Graduate [PT-BR] - Content and daily humor for graduate students.
- Hackernoon [EN-US] - Hacker Noon is everything that hackers need at noon.
- Towards Data Science [EN-US] - Sharing concepts, ideas and code.
- Data Science Central [EN-US] - Industry online resource for data professionals.
- Mining the Social Web [EN-US] - A companion to the book, with the simple objective of bringing social web mining to the mainstream.
- Becoming a Data Scientist [EN-US] - Documenting the path from SQL Data Analyst pursuing an engineering master's degree to Data Scientist.
- AllThings Data Science [EN-US] - All things about Data Science.
- MDM - A Geeks Point Of View [EN-US] - Technology blog on master data management and every buzz around it.
- The Open Source Data Science Masters [EN-US] - The open source curriculum for learning Data Science.
- Data Science London [EN-US] - Data Science London is a non-profit organization dedicated to the free and open dissemination of Data Science.
- Open Source Research [EN-US] - PhD student in the field of Operations Research at Berkeley.
- Louis Dorard [EN-US] - A tech guy with a penchant for the web and data, big and small.
- Machine Learning Mastery [EN-US] - About helping professional programmers to confidently apply machine learning algorithms to solve complex problems.
- Data Science Weekly [EN-US] - A free weekly newsletter with curated news, articles and works related to Data Science.
- Revolution Analytics [EN-US] - Daily news on the use of open source R for big data analysis, predictive modeling, data science and visualization.
- R Bloggers [EN-US] - R-Bloggers.com is a blog aggregator of content contributed by bloggers who write about R.
- Datascope Analytics [EN-US] - Data-driven consulting and design.
- Yet Another Data Blog [EN-US] - Reflections on Collective Intelligence, Data Disputes, Data Science, Predictive Modeling, Start-ups and a repository of ideas.
- KDNuggets [EN-US] - Leader in Business Analysis, Big Data, Data Mining, Data Science and Machine Learning.
- Data Scientist [EN-US] - Developed for data scientists to collaborate in sharing knowledge and experiences.
- What´s The Big Data [EN-US] - Explores its impact on information technology, the business world, government agencies and our lives.
- Decisions & Discovery [EN-US] - Focusing on science, data science, business, technology,
- New Data Scientist [EN-US] - How a social scientist jumps into the world of big data.
- Data Science 101 [EN-US] - Learning to be a Data Scientist.
- Data Scientist Journey [EN-US] - Digital nomad couple talking about Data Science.
- Dataists [EN-US] - More than seeing your model there are no heteroscedastic errors.
- Data-Magnum [EN-US] - Provides the information, education and assessment necessary for the planning and successful implementation of Big Data projects.
- The MapR Blog [EN-US] - Find insights, best practices and useful resources to help you leverage data more effectively in growing your business.
- P-value [EN-US] - Reflections on data science, machine learning and statistics.
- DATA MINERS BLOG [EN-US] - A place to read about topics of interest to data miners and to ask questions of the data mining experts at Data Miners.
- FlowingData [EN-US] - Visualization and Statistics.
- O'reilly Learning Blog [EN-US] - Perspectives on learning tools, technologies and methods.
- Dominodatalab [EN-US] - Includes the post on Data Science.
- i am trask [EN-US] - Crafts for Machine Learning.
- Vademecum of Practical Data Science [EN-US] - It aims to share some of the problems, solutions, workarounds and best practices that helped the authors on their data journey.
- Dataconomy [EN-US] - On the new emerging data economy.
- Vidhya Analytics [EN-US] - A complete website on data science and analysis study material.
- Colah's Blog [EN-US] - To understand neural networks.
- Sebastian's Blog [EN-US] - To understand NLP and transfer learning.
- DATAVERSITY [EN-US] - Data Education for Business and IT Professionals.
- Science and Data [PT-BR] - The objective is to talk about the fascinating adventure of Data Science.
- Institute of Applied Artificial Intelligence [PT-BR] - It is a non-profit organization where young students receive free education on artificial intelligence, develop projects.
- BiaData Bussiness [PT-BR] - Information about Big Data.
- Portal Action [PT-BR] - The largest statistical portal in Brazil.
- HackerRank [EN-US] - It is a technology hiring platform that is the standard for assessing the skills of developers for more than 1,000 companies worldwide.
- SQL Magazine [PT-BR] - Content about SQL.
- DATAQUEST [EN-US] - Data science, data analysis and tutorials and data engineering articles.
- Data Elixir [EN-US] - It is a curator of the best news, resources and inspirations from Data Science.
- Simply Statistics [EN-US] - News and texts on statistics.
- ClaoudML [EN-US] - Free data science and machine learning resources.
- PyData [EN-US] - Forum for the international community of users and developers of data analysis tools to share ideas and learn from each other.
- freeCodeCamp [EN-US] - Learn new developer skills.
- Vooo [EN-US] - News and texts on Data Science.
- Bitfactor [EN-US] - Thoughts about design, technology and other very important things.
- The Fashion Robot [EN-US] - About inspiring technologies in the fashion industry.
- OpenMined [EN-US] - An open source community focused on researching, developing and elevating tools for secure artificial intelligence.
- Shivam Bansal's [EN-US] - Data Scientist and Kaggle Kernels Grandmaster.
- 7WDATA [EN-US] - It's the Hotspot about new news of all things.
- mathbabe [EN-US] - Exploring and venting on quantitative issues.
- Hipsters Ponto Tech [PT-BR] - Podcast where the Caelum and Alura people enter into heated discussions about programming, design, ux, gadgets, startups and the latest technology.
- Artificial Neural Networks
- Statistical Handouts
- Simply Statistics
- Machine Learning Mastery
- Acclaim Data Science
- Dataversity
- Khan Academy
- HackerRank
- KDnuggets
- Como funciona a inteligência artificial [PT-BR]
Below is a list of YouTube channels, videos I liked, and playlists to study and keep up to date:
- Channel Google Open Online Education [EN-US] - Online courses offered by Google and tools that allow you to create your own courses.
- Channel Pycursos [PT-BR] - Python specialists in the most diverse areas, from Web Development to Data Science and Big Data.
- Channel Sentdex [EN-US] - Python programming tutorials that go beyond the basics: machine learning, finance, data analysis, robotics, web development, game development and more.
- Channel Professor Mateus Mota [PT-BR] - Tutorials on Python and Data Science.
- Channel SANDECO [EN-BR] - Develop Data Science and Machine Learning applications using large volumes of data held in Big Data stores.
- Channel Deep Learning Brazil [PT-BR] - Aims to spread the topic of deep learning in Brazil.
- Channel Siraj Raval [EN-US] - Learn to develop and build Artificial Intelligence, Games, music, chatbots, art, using Python.
- Channel Data School [EN-US] - Learning Data Science to get your first job as a Data Scientist.
- Channel Professor Fernando Amaral [PT-BR] - Content on Machine Learning, Big Data, NoSQL and related subjects.
- Playlist USP Channel [PT-BR] - USP Classes | Artificial Intelligence in health: the use of Machine Learning.
- Playlist Zurubabel [EN-BR] - R programming course.
- Playlist Hugo Larochelle [EN-US] - Neural networks class, University of Sherbrooke.
- Playlist Tomer Ben David [EN-US] - Data Science Primer.
- Playlist mrshmt [EN-US] - Learning from the Data.
- Playlist Logo Code - Programming and Artificial Intelligence [EN-US] - Python Natural Language Processing - Intro to spaCy.
- Playlist sentdex [EN-US] - Introduction and Basics - Python Reddit API Wrapper (PRAW) tutorial.
- Playlist Google Developers [EN-US] - Machine Learning Recipes with Josh Gordon.
- Video What is machine learning? [EN-US] - The goal of AI is to create a machine that mimics the human mind and, for this, it needs learning capabilities.
- Video [Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning](https://www.youtube.com/watch?v=n1ViNeWhC24) [EN-US] - Self-taught learning and unsupervised feature learning.
- Video Deep Learning: Intelligence from Big Data [EN-US] - A machine learning approach inspired by the brain.
- Video Introduction to Deep Learning with Python [EN-US] - Talks about deep learning.
- Video What is machine learning, and how does it work? [EN-US] - Defines machine learning, gives some examples and explains at a high level how it works.
- Video Neural Nets for Newbies by Melanie Warrick (May 2015) [EN-US] - This talk is aimed at anyone who is passionate about understanding algorithms and code to identify and leverage patterns in data.
- Video What comes after NoSQL? NewSQL: a new era of challenges in scalable data processing [EN-BR] - This talk is about NewSQL.
Below is a list of favorite sites to stay informed:
- The Future of Things [EN-BR] - The future visible to all, Artificial Intelligence, Robotics, innovations and new medical technologies.
- Chupadados [EN-BR] - This project brings together Latin American stories about the massive collection and processing of data by governments, companies and ourselves to monitor cities, homes, pockets and bodies.
- PCWorld - Technology advice, with analysis, product guides, tests, reviews, tips and downloads.
- GSTI Portal - Content, a Q&A area, and information on job vacancies, competitions and certifications.
- The Next Web - Original and proudly opinionated perspectives on notable stories for Generation T.
- Intel IT Center - Resources for IT Leaders.
- indy100: discover - Various news about everything.
- Skynet Today [EN-US] - Accessible and informed coverage of the latest AI hype and panic.
- Hacker News Bulletin [EN-US] - Discover the latest trends, interesting news and useful tips on hackers, cybersecurity, technology and anonymous worldwide.
- Datatau [EN-US] - Like Hacker News, but for data.
- Fossbytes [EN-US] - Leading source of technology news, focusing on Linux distro releases, security and hacker news, tutorials, tips and tricks, VPNs and more.
- ICML [EN-US] - International Conference on Machine Learning
- EPJ Data Science [EN-US] - Publishing platform bringing together all academic disciplines related to science.
- Journal of Data Science [EN-US] - An international journal dedicated to the application of statistical methods in general.
- Big Data Research [EN-US] - It aims to promote and communicate advances in big data research, providing a fast and high-quality forum for researchers, practitioners and policy makers from the many different communities working on this topic.
- Journal of Big Data [EN-US] - Publishes high quality academic papers, methodologies and case studies covering a wide range of topics, from big data analysis to data-intensive computing and all big data research applications.
- Big Data & Society [EN-US] - It is a peer-reviewed academic journal that publishes interdisciplinary works mainly in the social sciences, humanities and computing and their intersections with the arts and natural sciences about the implications of big data for societies.
- Data Science Journal [EN-US] - Lets you easily search, browse and cite the latest articles published by academic societies in Japan, and access documents using the reference or the cited link.
- Coding Coach
- Vooo Data Science
- Bitbay
- Quanta Magazine
- Playing Numbers
- Towards Data Science
Magazine related links:
- Hackernoon - Hacker Noon Rips Out Medium’s Software, Replaces it With Their Own.
- The Startup - Medium's largest active publication, followed by more than 598K people.
- Concretebr - We develop digital products with innovation, agility and excellent practices, for the Brazilian and Latin American market.
- freecodecamp - Learn to code with free online courses, programming projects, and interview preparation for developer jobs.
- geeksforgeeks - A Computer Science portal for geeks.
- Machine Learning for Everyone
- Becoming Human
- Daniel Godoy [EN-US] and [PT-BR] - Data Scientist, developer, teacher and writer. Author of "Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide"
- Be a Data Scientist [PT-BR]
- I played the dice [PT-BR] - Statistics, Python, Machine Learning: sketches & projects by a Data Scientist journalist.
Below are lists with more content that increase the capacity to study:
Download Related Links.
Below is a list of sites that have a variety of datasets for study and learning:
- DATAQUEST [EN-US] - 18 places to find data sets for data science projects.
- Quora's Big Datasets Answer [EN-US] - Links to sites to find great data sets open to the public.
- ISPDados [PT-BR] - Open Data Page of the Public Security Institute. You will be able to access the databases of criminal records and police activity in the state of Rio de Janeiro.
- BRAZILIAN OPEN DATA PORTAL [PT-BR] - More than 6 thousand data sets.
- Google Trends [EN-US] - See what the world is searching for.
- Sorocaba Open Data [PT-BR] - This portal makes publicly available data that is generated by municipal departments and agencies.
- Sorocaba Transparency Portal [PT-BR] - Publication of data in open format.
- Open Data of Capes [PT-BR] - Here you will find data and information about Brazilian postgraduate courses, the training of teachers for basic education and other themes related to education.
- GEOCAPES [PT-BR] - Capes Georeferenced Information System.
- Academic Torrents [EN-US] - We are a distributed repository maintained by the community for datasets and scientific knowledge.
- Hadoop Illuminated [EN-US] - Publicly Available Big Data Sets.
- United States Census Bureau [EN-US] - Economic indicators from the US Census Bureau.
- US Government Data Sources [EN-US] - US government web services and XML data sources.
- [Enigma](http://enigma.com/) [EN-US] - Browse the world of public data - Quickly search and analyze billions of public records published by governments, companies and organizations.
- Datahub [EN-US] - Provides important and commonly used data as high quality, easy to use and open data packages.
- Amazon - Open Data on AWS [EN-US] - Open data search datasets.
- re3data [EN-US] - Data sharing made easy.
- DataCite [EN-US] - Center for research data.
- Quandl [EN-US] - The main source of financial, economic and alternative data sets, serving investment professionals.
- figshare [EN-US] - Get more citations for all of your academic research outputs; figshare content has been cited over 5,000 times.
- MAXMIND [EN-US] - GeoLite and legacy databases.
- Kaggle Datasets [EN-US] - Datasets for use on Kaggle.
- IGSR: The international genome sample resource [EN-US] - Providing ongoing support for the 1000 Genomes Project data.
- World Bank Open Data [EN-US] - Free and open access to global development data.
- Open Data Philly [EN-US] - It is a catalog of open data in the Philadelphia region.
- Grouplens [EN-US] - Sample data sets for movies (with ratings), books and wikis.
- UC Irvine Machine Learning Repository [EN-US] - Currently maintains 446 data sets as a service for the machine learning community.
- NOAA - National Centers for Environmental Information [EN-US] - Responsible for preserving, monitoring, assessing and providing public access to the nation's treasure of climate and historical weather data and information.
- MapLight [EN-US] - Tracks several data sets that you can search for evidence of the influence of money on politics.
- GHDx [EN-US] - A catalog of health and demographic data sets from around the world, including results from IHME.
- UNICEF Data [EN-US] - UNICEF data on statistics and monitoring.
- UN Data [EN-US] - UN data on statistics and monitoring.
- The GDELT Project [EN-US] - GDELT project monitors worldwide broadcast, print and web news from almost every corner of every country.
- San Francisco Government Open Data [EN-US] - Search hundreds of data sets for the City and County of San Francisco.
- Global Open Data Index [EN-US] - The Global Open Data Index provides the most comprehensive snapshot available of the state of publishing open government data.
- GHTorrent [EN-US] - A scalable, queryable, offline mirror of the data offered through the GitHub REST API.
- Microsoft Research Open Data [EN-US] - A collection of free Microsoft Research data sets to promote cutting-edge research in areas such as natural language processing, computer vision and domain-specific sciences.
- Open Government Data Platform India [EN-US] - It is a platform to support the Open Data initiative of the Government of India.
- UCI Machine Learning Repository [EN-US] - Center for Machine Learning and Intelligent Systems.
- Google Dataset Search [EN-US] - Google Data Sets.
- Brazil Datasets [EN-US] - Brazilian Data Set.
- Kaggle Datasets [EN-US] - Kaggle Dataset.
- Datasets [EN-US] - Is a lightweight library providing two main features.
Data Science Tutorials:
- Artificial Neural Networks [PT-BR] - An introductory tutorial on Artificial Neural Networks, especially Multi-Layer Perceptron networks trained with backpropagation.
- Data Science using Python and R [EN-US] - Ways to do Data Engineering and Machine Learning in R and Python
Project My Binder:
- Binder Examples [EN-US]
- Amazon Elastic Container Service [EN-US]
- Docker basics for Amazon ECS [EN-US]
- AWS and dockerized applications [PT-BR]
- Binder - Transform a GitHub repository into a collection of interactive notebooks [EN-US]
- Using Binder [EN-US]
- Introducing Binder 2.0 - Share your interactive research environment [EN-US]
- Video Running Docker Containers on AWS [PT-BR]
- Video Jupyter, Jupyter Lab and Binder - Overview and associated technologies [PT-BR]
JupyterLab Tutorials:
- How to configure JupyterLab on AWS [EN-US]
- Video Google Compute Engine and Jupyter Notebook Setup: Part - 1 [EN-US]
- Video Google Compute Engine and Jupyter Notebook Setup: Part - 2 [EN-US]
- JUPYTER - jupyter / docker-stacks [EN-US] - Docker images ready to run containing Jupyter applications.
- DOCKERHUB - jupyter / datascience-notebook [EN-US] - Jupyter Notebook Data Science Stack.
Below is a list of tools that make the job easier:
- Jupyter - Project Jupyter exists to develop open source software, open standards and services for interactive computing in dozens of programming languages.
- neptune.ml - Community-friendly platform that supports data scientists in creating and sharing machine learning models. Neptune facilitates teamwork, infrastructure management, model comparison and reproducibility.
- Steppy 1 - Lightweight Python library for fast and reproducible machine learning experimentation. It features a very simple interface that enables clean machine learning pipeline design.
- Steppy-toolkit 2 - Curated collection of neural networks, transformers and models that make your machine learning faster and more effective.
- Cloud Datalab Google - Easily explore, visualize, analyze and transform data using familiar languages, such as Python and SQL, interactively.
- Hortonworks Sandbox - It's a personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials.
- R - It is a free software environment for statistical computing and graphics.
- RStudio - Powerful IDE for R, free and open source, works on Windows, Mac and Linux.
- Weka - Application with graphical interface for reading data, pre-processing and machine learning algorithms.
- Anaconda Cloud - Anaconda Cloud is where data scientists share their work. You can search and download popular Python and R packages and notebooks to start your data science work.
- Data Science Toolbox - It is a virtual environment based on Ubuntu Linux that is specifically suited for doing data science.
- Datadog - Solutions, code and devops for high-scale data science.
- Kite Development Kit - It's a high-level data layer for Hadoop. It is an API and a set of tools that accelerate development. You configure how Kite stores your data on Hadoop, instead of creating and maintaining that infrastructure on your own.
- Domino Data Labs - Run, scale, share and deploy your models without any infrastructure or configuration.
- Apache Flink - A platform for efficient, distributed and general purpose data processing.
- Apache Hama - It is a top-level open source project from Apache, allowing you to do advanced analysis beyond MapReduce.
- Weka - It is a collection of machine learning algorithms for data mining tasks.
- Octave - A high-level interpreted language intended mainly for numerical computations; a free alternative to MATLAB.
- Apache Spark - Extremely fast cluster computing.
- Hydrosphere Mist - A service to expose Apache Spark analytics jobs and machine learning models as real-time, batch or reactive web services.
- Torch - It is a scientific computing framework with extensive support for machine learning algorithms that puts GPUs first.
- Neon - Nervana's Python based Deep Learning Framework - It is Intel's reference deep learning framework, committed to the best performance on all hardware. Designed for ease of use and extensibility.
- Skale - High Performance Distributed Data Processing in NodeJS.
- Aerosolve - A machine learning package designed for humans.
- Datawrapper 1 - An open source data visualization platform that helps everyone to create simple, correct and embeddable graphics.
- Datawrapper 2 - It's also on GitHub.
- Natural Language Toolkit - It is a leading platform for creating Python programs to work with human language data.
- nlp-toolkit for node.js - This module covers some basic principles and implementations of nlp.
- Julia - High-level, high-performance dynamic programming language for technical computing.
- IJulia - A Julia language backend combined with the Jupyter interactive environment.
- Apache Zeppelin - Web-based notebook that enables data-driven, interactive data analysis and collaborative documents with SQL, Scala and more.
- Featuretools - An open source framework for automated feature engineering written in Python.
- Optimus - Cleaning, pre-processing, feature engineering, exploratory data analysis and easy ML with a PySpark back-end.
- DVC - An open source data science version control system. It helps to track, organize and make data science projects reproducible.
- Markdown - Markdown Guide is a free, open source reference guide that explains how to use Markdown, the simple and easy to use markup language that you can use to format almost any document.
- Git - It's a free, open source distributed version control system designed to handle everything from small to very large projects, with speed and efficiency.
- Bitbucket - It's more than just Git code management. Bitbucket gives teams a place to plan projects, collaborate on code, test and deploy.
- GitHub - Development platform inspired by the way you work. From open source to business, you can host and analyze code, manage projects and build software.
- GitBook - Documentation made easy. Helps your team to write, collaborate and publish content online.
- Pivotal Tracker - It is the agile project management tool of choice for developers worldwide for real-time collaboration around a prioritized and shared backlog.
- Stack Overflow - It is the largest and most trusted online community for developers to learn, share their knowledge and build their careers.
- NotABug - Open source code collaboration platform for freely licensed projects.
- Kite - It is a cloud-based co-pilot that augments your programming environment.
- reddit - Offers the best of the internet in one place. Get a constant update of news, fun stories, photos, memes and videos just for you.
- Online Box Plot Generator - Box Plot Statistics Calculator.
- Grafana - Data visualization and monitoring with support for Graphite, InfluxDB, Prometheus, Elasticsearch and many other databases.
- Graph Viz - Leading platform for visualizing and exploring all types of graphs and networks. Gephi is open source and free.
- Tableau - Interactive data visualization focused on Business Intelligence.
- Colaboratory - It's a free Jupyter notebook environment that requires no configuration and runs entirely in the cloud.
- Vega - Vega is a declarative format for creating, saving and sharing visualization projects. With Vega, visualizations are described in JSON and generate interactive visualizations using HTML5 Canvas or SVG.
- Vega - VOYAGER - It is a visualization browser for exploring open data. It provides a gallery of recommended views, produced by the Compass view recommendation engine.
- Python Anywhere - Host, run and code Python in the cloud.
- Neo4j - It is a graph database management system.
- Docker - It is a software technology that provides containers, providing an additional layer of abstraction and automation of operating system level virtualization in Windows and Linux.
- Binder - A Binder is a Git repository equipped with the appropriate build files so that its content can be connected to a BinderHub instance. These repositories currently live mainly on GitHub, although support for more online hosts, such as GitLab or Bitbucket, is planned.
- IPython - Interactive interpreter for several programming languages, but especially focused on Python.
- Overleaf - LaTeX, Evolved. The easy-to-use, online and collaborative LaTeX editor.
- RED HAT - OpenShift - Deployment and management of container-based software. It is a supported distribution of Kubernetes using Docker and DevOps tools for accelerated application development.
- InfluxData
- Apache PredictionIO - Machine learning as a service.
- Google Colaboratory
- Jupyter
- Anaconda
- edgedb
Below is a list of downloads:
- LibGen or Library Genesis - It's a search engine for scientific articles and fiction books, with more than 2 million scientific articles (published by researchers from universities around the world) and 2.7 million fiction books in several languages, mainly English, but it is possible to find content in Portuguese.
- Sci-Hub - An online repository with more than 64 million scientific articles, available on its website. New documents are added daily through the domains of educational institutions, bypassing systems that restrict access for Internet users without paid subscriptions. It was founded by a neuroscientist from Kazakhstan. To get a scientific article, just enter its DOI (Digital Object Identifier, a standard for identifying digital objects) in the search field and the website will redirect you to the article. A good place to look up DOIs is ScienceDirect.
- Scielo - Scientific articles in Portuguese. Scielo is a digital library of FAPESP, CNPq, Pan American Health Organization, Virtual Health Library and the Support Foundation to the Federal University of SP, where thousands of articles from all areas can be found in Portuguese and easily downloaded.
- Z-Library - The Z library is one of the largest online libraries in the world. We aim to make literature accessible to everyone.
- startpage - The world's most private search engine.
- Open Library - This site allows you to borrow digital books in English.
- ScanLibs - IT Ebooks Free Download PDF, EPUB, MOBI! Elearning Video For Programming Free Download MP4, AVI!
- All IT ebooks - Free IT eBooks Download.
- Free Online Books
Projects to facilitate study:
- The Data Science & Engineering Society
- restsims
- Netron - A viewer for neural network, deep learning and machine learning models.
- Data Visualization Curriculum
- Deep Learning Models
- Data-Science--Cheat-Sheet
- Digital Tools for Citizen Science
- Big List of Naughty Strings
- Progress Bar
- Computer Vision and Image Processing Tutorials
- Albumentations
- PyMatting: A Python Library for Alpha Matting
- Dask - A flexible parallel computing library for analytics.
- cookiecutter-spacy-fastapi
Links from different sites:
- What is the importance of Exploratory Data Analysis? [PT-BR]
- Learning about web scraping in Python using BeautifulSoup [EN-US]
- Docker images ready to run containing Jupyter applications [EN-US]
- Stanford - CS 229 - Machine Learning, Deep Learning - Professor Thiago Marques [EN-US]
- Your company does not need a Data Lake to start a Machine Learning project [PT-BR]
- lambda, map and filter in Python [EN-US]
- List comprehension in Python [EN-US]
- Faster Data Science Education - KAGGLE [EN-US]
- Stanford Project and Poster Reports, Spring 2018 [EN-US]
- These notes and tutorials are intended to complement material from Stanford's CS230 Deep Learning class [EN-US]
- Guide: How to contribute in Open Source
- Object Oriented Programming in Python: How to use inheritance in Python [PT-BR]
- 18 machine learning platforms for developers [EN-US]
- 5 quick and easy visualizations of Python data with code [EN-US]
- The best guide to data classes in Python 3.7 [EN-US]
- Starting a Python Project with Anaconda [EN-US]
- PEP 8 - Python Code Style Guide [EN-US]
- The Python tutorial [PT-BR]
- Welcome to the Basemap Matplotlib 1 Toolkit documentation [EN-US]
- Drawing a background map for the Basemap Matplotlib 2 [EN-US]
- How to use Jupyter Notebooks and pandas to analyze data [EN-US]
- Artificial Neural Networks [PT-BR]
- How to know if your Machine Learning model is really working [PT-BR]
- The mathematics of machine learning [EN-US]
- When to use MLP, CNN and RNN neural networks [EN-US]
- 24 Ultimate Data Science Projects boost your knowledge and skills to access for free [EN-US]
- How to search efficiently [PT-BR]
- Graphic lies, misleading visuals: Reflections on the challenges and pitfalls of evidence-driven visual communication [EN-US]
- A series of Jupyter notebooks to help data scientists get started with Python and Neo4j [EN-US]
- Examples of Matplotlib Lines, Bars and Markers [EN-US]
- Bar Chart Annotations With Pandas and Matplotlib [EN-US]
- HashTran-DNN: A framework to improve the robustness of deep neural networks against adversarial malware samples [EN-US]
- What is this Apache Kafka? [EN-US]
- First steps at InfluxDB [PT-BR]
- This notebook walks through basic code samples for integrating various packages with Neo4j, including py2neo, ipython-cypher, pandas, networkx, igraph, and jgraph [EN-US]
- Bar graph in matplotlib [EN-US]
- How to use the FISH / QTCR / 5SS method to read scientific articles [PT-BR]
- Top 20 Python Libraries for Data Science [PT-BR]
- How to Become a Data Scientist Before Graduating [EN-US]
- Data Science career paths: different roles in the industry [EN-US]
- 12 Useful Techniques in Python for Data Manipulation [PT-BR]
- StackExchange Data Manager [EN-US]
- What to consider when choosing colors for data visualization [EN-US]
- git and github part 1: what are they and how to use them? [PT-BR]
- Six open source panels to organize your data [EN-US]
- Laboratory of Immaterial Materials and Data Harvesting [EN-US]
- The 10 most popular coding challenge sites for 2017 [EN-US]
- Jupyter IPython Notebook quick start guide [EN-US]
- Jupyter Notebook [EN-US]
- IPython [EN-US]
- The PHP framework for the development of Chatbot [EN-US]
- At the end of the day: what is Kubernetes? [PT-BR]
- JupyterLab - your personal data science workbench [EN-US]
- JupyterLab - a great tool for data scientists! [EN-US]
- A curated list of amazing Jupyter projects, libraries and resources [EN-US]
- Data Analysis in Python [EN-US]
- Hitchhiker's Guide to Exploratory Data Analysis [EN-US]
- Hitchhiker's Guide to Exploratory Data Analysis (Part 2) [EN-US]
- 5945852-2 Connectionist Psychology - PG Psychobiology USP / RP - Prof. Antonio C. Roque [PT-BR]
- Are we there yet? - Dataset [EN-US]
- Cornell University Library [EN-US]
- Kernel Density Estimation (KDE) [EN-US]
- Data Visualization Catalog - All Charts [ES-ES]
- Economics [EN-US]
- Learn Computing with Python’s documentation! [PT-BR]
- Kaggle Github Plugin [EN-US]
- Understanding the kernel trick [EN-US]
- The Art of Cleaning Your Data [EN-US]
- datascience-br [EN-US]
- Particle Swarm Optimization from Scratch with Python [EN-US]
- PSO Resources [EN-US]
- Deep Learning Book [PT-BR]
- Top Examples of Why Data Science is Not Just .fit().predict() [EN-US]
- 10 Data Science libraries for Python that nobody tells you [EN]
- The (semi) definitive Guide for Data Lakes [EN-US]
- Linear Regression and its role in Data Science [PT-BR]
- Radial Basis Functions: sin x [EN-US]
- How to run Linear regression in Python scikit-Learn [EN-US]
- 19 Data Science and Machine Learning Tools for people who Don’t Know Programming [EN-US]
- How-to: Create a Simple Hadoop Cluster with VirtualBox [EN-US]
- Hadoop for Beginners- Part 1 [EN-US]
- Kolmogorov-Smirnov Test [EN-US]
- Perform the Kolmogorov-Smirnov test for goodness of fit [EN-US]
- Demystifying Data Science For All [EN-US]
- Web Face Recognition [PT-BR]
- Study plan for machine learning with content in Portuguese. [PT-BR]
- 23 data science and data enrichment startups [PT-BR]
- How to understand Machine Learning through food [PT-BR]
- Compressing and Extracting Files in Python [EN-US]
- Play with neural networks! [EN-US]
- Create smart AWS diagrams [EN-US]
- Tidymodels [EN-US]
- Differences between segmentation and clustering [PT-BR]
- Feature Selection a Silver Bullet [PT-BR]
- Classifying Spotify Songs with SVM (With Python Codes) [PT-BR]
- How to Use Machine Learning to Predict Stock Prices on the Stock Exchange - The Complete Case Study. [PT-BR]
- Time Series Modeling with Python - SciPy-SP [PT-BR]
- Linear Regression [PT-BR]
- How to Create a Simple Model to Predict Time Series Using Machine Learning in Python [PT-BR]
- Applying Linear Regression with Scikit-Learn [PT-BR]
- 7 TECHNIQUES FOR REDUCING DIMENSIONALITY [PT-BR]
- 10 Machine Learning Algorithms you need to know [PT-BR]
- Understand the types of machine learning algorithms and their applications [PT-BR]
- Python Serverless Microframework for AWS [EN-US]
- 8 Machine Learning Algorithms in Python – You Must Learn [EN-US]
- 10 simple hacks to speed up your data analysis in Python [EN-US]
- 10 reasons why software development projects fail [EN-US]
- Creating an NLP model for classifying tweets with fklearn [EN-US]
- Ludwig is a toolbox that allows you to train and test deep learning models without the need to write code. [EN-US]
- Do you want to become a data engineer? Here is a comprehensive list of resources to get started [EN-US]
- The Problemeter: A spreadsheet that helps startups solve the right problem [EN-US]
- Beginners Guide to Modeling Python Topics [EN-US]
- Gensim Tutorial - A Complete Guide for Beginners [EN-US]
- Pre-processing of text data: A step by step in Python [EN-US]
- 10 more Data Science libraries for Python that nobody tells you [EN-BR]
- HTML microdata - Defines new HTML attributes to incorporate simple machine-readable data into HTML documents.
- Stacked Capsule Autoencoders
- DuckDB is an embeddable SQL OLAP database management system
- nltk
- Machine Learning Rest API in a Docker
- Building a Movie Recommendation Service with Apache Spark & Flask - Part 2
- Standardizing the World of Machine Learning Web Service APIs
- How to deploy ML models using Flask + Gunicorn + Nginx + Docker
- The Open Source Computer Science Degree
- Banguroo
- MACHINE LEARNING FOR COMPUTER SECURITY
- MLSec Project
- Fraud Detection
- How to Build a Simple Machine Learning Web App in Python
- dongweiming/data-analysis
- Kyso
- MapMyCab: How I Chose a Data Engineering Project
- INSIGHT DATA ENGINEERING FELLOWS PROGRAM
- The Data Engineering Cookbook
- Using iloc, loc, & ix to select rows and columns in Pandas DataFrames
- Flask by Example – Text Processing with Requests, BeautifulSoup, and NLTK
- Data Engineering — the Cousin of Data Science, is Troublesome
- Scrapy Vs Selenium Vs Beautiful Soup for Web Scraping
- WIBD Workshops 2018
- Predicting the Fake News using Python
- EasyOCR
- CRISP-DM, SEMMA and KDD: find out the best techniques for data exploration
- Between causes and effects: how to identify causality amid correlations
- Recurrent Neural Networks: a brief introduction
- Python Fundamentals for Data Analysis
- Introducing New Relic’s Dynamic Baseline Alerts - Time series analysis.
- TIME SERIES CHARACTERISTICS
- Novelty and Outlier Detection - Anomaly detection.
- Anomaly Detection, a short tutorial using Python - Anomaly detection.
- mxGraph - Graph / data flow widgets.
- Altair - Altair is a declarative statistical visualization library for Python, based on the powerful visualization grammar Vega-Lite - Visualization.
- Bokeh - Bokeh is a Python interactive visualization library that targets modern browsers for presentation - Visualization.
- Matplotlib D3 (mpld3) - The mpld3 project brings together Matplotlib, the popular Python-based graphing library, and D3js, the popular JavaScript library for creating interactive data visualizations for the web - Visualization.
- Data-Driven Documents - D3.js is a JavaScript library for manipulating documents based on data - Visualization.
- Using Flask to serve a machine learning model as a RESTful web service - Article.
- docker-ml-api
- Springer has released 65 Machine Learning and Data books for free
- Python Data Streamer [EN-US]
- docker_django_mongodb [EN-US]
- There is a bot for
- Gartner Hype Cycle Data Science 2019
- Cookiecutter Data Science
- Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data
- Gradient
- Standardizing Machine Learning World Web Service APIs - Article.
- Machine Learning Rest API on a Docker - Article.
- Using Flask to serve a machine learning model as a RESTful webservice
- Fastapi Vue.js - Python Javascript Integration
- Building a movie recommendation service with Apache Spark & Flask - Part 2 - Article.
- Nvidia Fundamentals of Deep Learning for Computer Vision
- Complex Data Mining
- Amdahl's Law
- What mathematics do I need to become a data scientist?
- Downloadable: Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Data Science PDF
- Awesome production machine learning [EN-US] - This repository contains a curated list of awesome open source libraries that will help you deploy, monitor, version, scale and secure production machine learning.
- Introduction to SVELTE - A JavaScript framework for building fast, beautiful and responsive user interfaces.
- 6 'non-tech' books for Data Science [EN-US]
- TOP 20 PYTHON LIBRARIES FOR DATA SCIENCE
- Grafana with InfluxDB and Telegraf to generate graphs
- DATA SCIENCE SCHOOL
- Open Source Government
- Data Science for a Cause
Below are lists with more content that multiply the reach of this list a thousandfold:
- Awesome Deep Learning Project Ideas
- Starting a Python Project with Anaconda
- Awesome Graph Classification
- Awesome Research Tools
- Awesome Microservices
- Awesome Data Engineering
- Awesome Big Data
- Awesome Jupyter
- Awesome Indexed - Search the Awesome data set.
- Awesome Search - Quick search for awesome lists.
- Awesome Machine Learning On Source Code [EN-US] - Cool links and research papers related to Machine Learning applied to source code (MLonCode).
- Awesome Data Science [EN-US] - An open source data science repository for learning and applying data science to real-world problems.
- Awesome [EN-US] - A curated list of awesome lists.
- Open Data Sources [EN-US] - Open Data Sources.
- GitHub free data source list [EN-US] - A large list of public data sets on GitHub.
- Public Git Archive [EN-US] - Public Git archive.
- Datasharing [EN-US] - The Leek group's guide to data sharing.
- Awesome Awesomeness [EN-US] - An impressive curated list.
- Awesome Machine Learning [EN-US] - A curated list of awesome Machine Learning frameworks, libraries and software.
- Lists [EN-US] - A list of useful, silly and awesome lists, curated on GitHub.
- Awesome dataviz [EN-US] - A curated list of awesome data visualization libraries and resources.
- Awesome Python [EN-US] - A curated list of awesome Python frameworks, libraries, software and resources.
- Python Web Scraping - Awesome Github repository.
- Data Science IPython Notebooks [EN-US] - Data Science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS and various command lines.
- Awesome R [EN-US] - A curated list of awesome R packages, frameworks and software.
- Awesome Public Datasets [EN-US] - A topic-centric list of high quality open data sets in public domains.
- Machine Learning & Deep Learning Tutorials [EN-US] - This repository contains a topic-wise curated list of Machine Learning and Deep Learning tutorials, articles and other resources. Other awesome lists can be found in this list.
- Awesome Artificial Intelligence use cases [EN-US] - A list of awesome and proven Artificial Intelligence use cases and applications.
- Top-down learning path: Machine Learning for Software Engineers [EN-US] - A complete daily study plan for becoming a machine learning engineer.
- Data Science Tutorials and Courses [EN-US] - Learn Data Science online from the best Data Science courses and tutorials, submitted and voted on by the programming community. Mathematics and Statistics courses required for Data Science are also included here.
- The Free Big Data Sources Everyone Should Know [EN-US] - Free Big Data sources that everyone should know.
- 20 Big Data Repositories You Should Check Out [EN-US] - 20 Big Data repositories that you should check out.
- Awesome Public Datasets [EN-US] - Public data sets.
- Awesome Public Datasets [EN-US] - A list of high quality open data sets in public domains.
- Galaxy Data Scientist Guide [PT-BR] - This repository was made by and for the community.
- Awesome production machine learning [EN-US] - This repository contains a curated list of awesome open source libraries that will help you deploy, monitor, version, scale and secure production machine learning.
Space to add Data Science notes:
ADD NOTES HERE
Yes! They are worth more than a thousand words.
In the image folder, you will find a compilation of images related to Data Science.
Remember!
Copying everything from Stack Overflow doesn't make you understand anything; it just makes you a good copier!