/data-science-resource-inventory

A Data Science repository, organized as a resource inventory, that helps you find sites related to the field.

Data Science Resource Inventory 💻

A Data Science repository to facilitate studies.

[WARNING] The links cited here were collected from various sites and are for study use. I am sharing them so that others can learn as I do. I hope I have not infringed any copyright; if your website, repository or any other link is listed here and you would prefer it not to be, please contact us so it can be removed. THANKS FOR UNDERSTANDING!

This part is for Younger Padawans or Jedi in Data Science



TO-DO

Repository updates:

  • Open the sites and verify that they are active and that they belong in the repository.
  • Add [EN-US] or [PT-BR] to the links.

GO TO INDEX


Motivation

This is a repository of shortcuts to start studying Data Science.

An important note: I intend to focus on the security area, so this repository has a part with links and information sheets focused on Information Security.

We start with What is Data Science?: basic readings to help you understand what Data Science is and what you must study to become a professional that companies want.

The next steps are separated into: Courses (MOOCs) for learning; Datasets for study; Blogs that bring together the hottest and most up-to-date topics in the field; computing links for general knowledge; YouTube links to the best courses mentioned by the community; tools for producing analyses; online magazines; links to books to study and to buy; sites that gather competitions to sharpen a Data Scientist's skills; tutorials for the main tools; some of the best lists and repositories for studying Data Science; and search links.

The idea of the list is not to make me a Unicorn Data Scientist, but to make sure that, as a Data Scientist, I have an overview of everything the field encompasses and can talk to the various professionals involved, so that we manage to define the best projects and approaches for my work.

The references gathered here came up during my studies and research, and are collected to make studying and understanding easier. This is not a step-by-step path to becoming a Data Scientist, but it serves as a guide for those seeking knowledge in the field, and as a repository for easy access without filling your browser with disorganized bookmarks.

It follows a structure that covers three roles:

  • Data Engineer
  • Data Scientist
  • Machine Learning Engineer

Data Engineer: Responsible for taking raw data from various sources and placing it in a Data Lake, a data store that the other team members will access. In other words, the Data Engineer brings in, processes and makes data from different sources available in one place for the team. In some places the role is known as Big Data Developer: a Data Engineer who makes the data available but who also knows Big Data tools, that is, programs that work with large data sets, such as Apache Spark or Apache Hadoop. The function of the Data Lake is only to store the data; any treatment is done when the data is read. The reasoning behind the Data Lake is that a Data Warehouse holds data that has already been processed and cleaned, which takes longer and can consequently lose data or value. A Data Pipeline architecture supports both batch and real-time processing. When talking about a distributed system, we are not necessarily talking only about the Hadoop architecture.
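The "store raw, treat at read time" idea described above can be sketched in a few lines. This is only an illustration using the Python standard library: the `RAW_EVENTS` sample, its field names and the `read_purchases` / `total_by_user` helpers are all made up for the example, and a real Data Lake would sit on a distributed storage system rather than on a string in memory.

```python
import json

# Hypothetical raw events as they might land in a data lake: one JSON
# document per line, stored untouched (note the inconsistent fields).
RAW_EVENTS = """\
{"user": "ana", "amount": "12.50", "ts": "2020-01-03"}
{"user": "bruno", "amount": 7, "ts": "2020-01-03", "coupon": "X1"}
{"user": "ana", "amount": null, "ts": "2020-01-04"}
"""


def read_purchases(raw_text):
    """Clean the data at read time and yield (user, amount) tuples."""
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        amount = record.get("amount")
        if amount is None:
            continue  # the reader, not the store, decides how to treat gaps
        yield record["user"], float(amount)


def total_by_user(raw_text):
    """Aggregate the cleaned purchases into a total per user."""
    totals = {}
    for user, amount in read_purchases(raw_text):
        totals[user] = totals.get(user, 0.0) + amount
    return totals


print(total_by_user(RAW_EVENTS))
```

The stored text is never modified; a different reader could apply different cleaning rules to the same raw lines, which is the trade-off against a Data Warehouse that cleans before storing.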

Data Scientist: Responsible for taking the data provided by the Data Engineer and making sense of it. The Data Scientist takes the data, which is likely to be inside a Data Lake, and creates Machine Learning models, such as prediction models or classification algorithms, to solve problems. They refine these models and try to find the best one possible to improve the results. This is the person who tries to make sense of the data, think about the best solutions and solve the problems.

Machine Learning Engineer: Puts the model that the Data Scientist created into production. Basically, this role takes the Data Scientist's model and deploys it in a scalable way.

GO TO INDEX


Introductory Area

First Learn Python:

You don't need to know Python at a professional level, but to enter this world you need some basic knowledge:

Second Prepare the PC:

I suggest Anaconda Navigator for those who are starting their studies in this field. It is open source, supports the Python and R programming languages, and has all the necessary tools.

Third Libraries:

Many libraries can be used to make data analysis easier. The following are some must-have libraries for studying Data Science:

  • Numpy - Library for arrays and mathematical functions.
  • Matplotlib - For plotting graphs and visualizing data.
  • OpenCV - For viewing and editing images via Python.
  • Virtualenv - It is a tool to create isolated Python environments. The basic problem it solves is that of dependencies and versions and, indirectly, permissions.
  • Pandas - It is an Open Source, BSD-licensed library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
  • CharmPy - It is a high-level parallel and distributed programming framework with a simple and powerful API, based on migratable Python objects and remote method invocation; built on top of an adaptive C/C++ runtime system that provides speed, scalability and dynamic load balancing.
  • Pip - It is a package management system used to install and manage software packages written in the Python programming language.
  • SciPy - It is an Open Source Python library made for mathematicians, scientists and engineers.
  • Urllib - It is a Python module for fetching URLs.
  • Beautiful Soup - It is a Python package for parsing HTML and XML documents.
  • Papermill - A tool to parameterize, execute and analyze Jupyter Notebooks.
  • Nteract - It is a dynamic tool to give flexibility when writing code, exploring data and creating text to share insights about the data.
  • RISE - "Live" Reveal.js Jupyter / IPython slide show extension.
  • Scikit-learn - Python machine learning library with a wide range of algorithms.
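As a small taste of what the URL and HTML tools in this list are used for, here is a minimal sketch that pulls the links out of a page. To stay dependency-free it uses the standard library's `html.parser` and `urllib.parse` instead of Beautiful Soup, and it parses an inline `SAMPLE_HTML` string instead of fetching a real page; the `LinkCollector` class, the `extract_links` helper and the example URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page address.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in the given HTML text."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Made-up page content standing in for a downloaded document.
SAMPLE_HTML = """
<ul>
  <li><a href="/datasets">Datasets</a></li>
  <li><a href="https://example.org/blog">Blog</a></li>
  <li><a name="no-href">Anchor without a link</a></li>
</ul>
"""

print(extract_links(SAMPLE_HTML, "https://example.com/index.html"))
```

With a real page, the HTML text would typically come from `urllib.request.urlopen(...)`, and Beautiful Soup would replace the hand-written parser class.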

Note: For those who want to work with neural networks / Deep Learning, that is a separate track. There are four major frameworks: TensorFlow, Keras, PyTorch and Theano, with TensorFlow being the best known and most used.

Fourth TensorFlow:

First, get to know the TensorFlow Playground.

After reading this material, it's time for installation:

  • Install CUDA Toolkit, and check that the system variables are correct.
  • Install CUDA Toolkit drivers
  • Install cuDNN
  • Install TensorFlow, CPU or GPU version (preferably only one installation)

When installing, follow TensorFlow's own step-by-step instructions.
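Once the steps above are done, a quick check like the following can confirm what was actually installed. It is only a sketch: the `describe_tensorflow` helper is a made-up name, it assumes a TensorFlow 2.x release (where `tf.config.list_physical_devices` is available), and it falls back to a plain message when TensorFlow cannot be imported.

```python
def describe_tensorflow():
    """Report the installed TensorFlow version and how many GPUs it sees."""
    try:
        import tensorflow as tf
    except ImportError:
        return "TensorFlow is not installed in this environment."
    # Assumes TensorFlow 2.x; an empty list means CPU-only execution.
    gpus = tf.config.list_physical_devices("GPU")
    return f"TensorFlow {tf.__version__}, GPUs visible: {len(gpus)}"


print(describe_tensorflow())
```

If the GPU version was installed but zero GPUs are reported, the CUDA Toolkit, driver and cuDNN steps are the first places to look.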

A note: as you can see, the focus is on NVIDIA. I am not familiar with the situation for AMD GPUs.

Installed? Tested it? Did it work? Don't know where to start now? Here are some tips:

The tutorials make this clear, but I will reinforce it: learn to use TensorBoard, the manager and viewer for TensorFlow neural networks. You can even save the current network state to reload later:

Fifth: Some more things

Valuable Tips: Below are some tips to help you on your way:

  1. Learn about the path taken by Leonardo Ferreira, who became a data scientist in a year and a half and ranks 30th worldwide as a data scientist on [Kaggle](https://www.linkedin.com/pulse/data-science-from-zero-kaggle-kernel-master-leonardo-ferreira/).
  2. Organize your studies, and don't mix topics or try to absorb too many lessons at once.
  3. Have profiles on LinkedIn, GitHub, Kaggle and HackerRank. Have a Twitter developer account for text mining.
  4. Go deeper by drawing on various sources. For example, when studying Python, read e-books and other materials from different sources, because each course has a different teaching approach. If you get stuck on some subject, that's OK, it is common: look for other explanations until you understand.
  5. If you want to and can, invest in paid courses and specializations.
  6. Use and learn with concept maps.

More or less mandatory topics to understand:

  • Predictive modeling.
  • Naive Bayes.
  • Time Series Analysis and Visualization.
  • Exploratory data analysis.
  • Statistics.
  • Univariate analysis.
  • Bivariate analysis.
  • Charts: when and how to use them.
  • Qualitative and Quantitative Variables.
  • Basic mathematical requirements.
  • Notions of analytical and numerical optimization.
  • Discover tools for data extraction on the web.
  • Basics of linear algebra: eigenvectors, eigenvalues, changes of basis, among others.
  • Basic probability and statistics: conditional probability, basic formulas, most common distributions, basic metrics, regression, R², p-value, inference, among others.
  • It is worth knowing at least the basics of: Amazon AWS, Amazon QuickSight and Microsoft Power BI.
  • And also: version control, markdown, git, GitHub, R and RStudio.

As with anything you want to learn, you should get involved with it. One tip always applies: do not try to understand everything at once; take it easy. Take part in communities. The Python community in Brazil is one of the strongest and most active I have ever seen.

And that's it: this is the field's Introductory Package. You are now able to start experimenting in the field.


Comments:

  • The roadmap above is not the only, and not necessarily the best, way to learn; it reflects the knowledge I intend to acquire and what I have noted in meetings, lectures and conversations during my career in the field.

GO TO INDEX


What is Data Science

Below is a list of favorite sites that have a variety of topics related to computing in general:

GO TO INDEX


Mentors and Role Models

It is always nice to add a short column for role models nearby that inspire you:

GO TO INDEX


Themes

Study Links.


Mathematics

Computer Science study projects:

  • Mathematics [EN-US] - E-prints from Cornell University related to Mathematics.
  • Quantitative Finance [EN-US] - E-prints from Cornell University related to Quantitative Finance.
  • Statistics [EN-US] - E-prints from Cornell University related to Statistics.
  • Econometrics [PT-BR] - Econometrics is a study that uses mathematical and statistical methods to evaluate theories on economics and finance.
  • Math and Science Done Right [EN-US] - Studies in Mathematics, Science and Engineering through small interactive learning experiences.
  • Terence Tao Website [EN-US] - Updates on research and expository papers, discussion of open problems, and other maths-related topics.

GO TO INDEX


API PSI

The Protocols and Structures for Inference (PSI) project aims to develop an architecture for presenting machine learning algorithms:

GO TO INDEX


Visualization

Below is a list of Tools, Environments and Libraries for Data Scientists:

  • Scikit-Learn - Machine learning in Python.
  • NumPy - Fundamental package for scientific computing with Python. It supports large, multidimensional arrays and matrices and includes a variety of high-level math functions to operate on these arrays.
  • SciPy - Works with NumPy arrays and provides efficient routines for numerical integration and optimization.
  • TensorFlow - An open source software library for machine intelligence.
  • nbviewer - Render Jupyter Notebooks as static web pages.
  • Matplotlib - 2D plotting library in Python that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
  • seaborn - Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphs.
  • Auto PY to EXE - Convert .py to .exe using a simple graphical interface.
  • plot.ly - Data visualization library for Python.
  • Caffe - Deep learning framework made with expression, speed and modularity in mind.
  • Albumentations - A fast, framework-agnostic image augmentation library that implements a diverse set of augmentation techniques.
  • Ember Charts - A powerful and easy to use graphics library for Ember.js.
  • amCharts - Libraries and tools for all your Data Visualization needs.
  • AnyChart - It's a set of flexible HTML5 JavaScript libraries for all your data visualization needs.
  • cartodb - Mapping tool.
  • Cube - System to collect timestamped events and derive metrics.
  • d3plus - Data visualization made easy.
  • D3js - Data-Driven Documents - JavaScript library for manipulating documents based on data.
  • dygraphs - Flexible and fast open source JavaScript graphics creation library.
  • exhibit - Allows you to easily create web pages with advanced text search and filtering features, with interactive maps, timelines and other visualizations.
  • Gatherplot - Generalized scatter plots for nominal data.
  • ggplot2 - System for declaratively creating graphics, based on The Grammar of Graphics.
  • Glue - Python library for exploring relationships within and between related data sets.
  • Google Chart Gallery - Provides a variety of charts designed to meet your data visualization needs.
  • jqplot - Plot and graphics plugin for the jQuery JavaScript framework.
  • nvd3 - Build reusable charts and chart components.
  • Opendata-tools - List of tools to explore, publish and share public data sets.
  • OpenRefine - A powerful and free tool for working with messy data.
  • raw - The missing link between spreadsheets and data visualization.
  • techanjs - A visual and technical analysis and graphics library based on D3. Create interactive financial charts for modern and mobile browsers.
  • Timeline - Open source tool that allows anyone to build interactive and visually rich timelines.
  • variancecharts - Allows engineers, designers, journalists, scientists and analysts to create elegant and personalized data graphics for the Web, using only HTML and CSS.
  • life - Visualization of open source data.
  • Wrangler - Interactive tool for cleaning and transforming data.
  • r2d3 - It is an experiment in expressing statistical thinking with interactive design.
  • NetworkX - Python package for creating, manipulating and studying the structure, dynamics and functions of complex networks.
  • Redash - Built to allow quick and easy access to billions of records.
  • C3 - D3-based reusable graphics library.
  • Heroku - It is a platform as a service (PaaS) that allows developers to create, run and operate applications entirely in the cloud.
  • OpenStack - It is open source software, capable of managing the components of multiple virtualized infrastructures.
  • DigitalOcean - Provides developers and companies with a reliable and easy-to-use cloud computing platform for virtual servers (Droplets), object storage (Spaces) and more.
  • Google Cloud Platform - It is a cloud computing suite offered by Google, operating on the same infrastructure that the company uses for its products aimed at users, including Google Search and YouTube.
  • Amazon Web Services Cloud - A set of cloud computing services that together form the cloud computing platform offered by Amazon.
  • nbextensions - This repository contains a collection of extensions that add functionality to the Jupyter notebook.
  • tqdm - Instantly make your loops show an intelligent progress meter.
  • hchart - This generic function can chart various R objects on the fly.
  • pyswarms - A research toolkit for particle swarm optimization in Python.
  • MoviePy - Python module for video editing, video composition, video processing or to create advanced effects.
  • requests-toolbelt
  • nltk
  • stanza
  • Nbviewer
  • Auto PY to EXE
  • SciPy
  • chartjs

GO TO INDEX


Competitions

Below is a list of sites to put the knowledge of Data Analysis into practice:

  • Exercise List for Python [PT-BR] - This is a list with suggestions for programs for beginners in programming.
  • URI Online Judge [EN-US] - The main objective is to promote the practice of programming and knowledge sharing.
  • Kaggle [EN-US] - Kaggle is the place to do Data Science projects.
  • DrivenData [EN-US] - Data Science competitions to save the world.
  • Analytics Vidhya [EN-US] - The ultimate battleground for Data Scientists.
  • The Data Science Game [EN-US] - An international student challenge.
  • InnoCentive [EN-US] - Global pioneer in crowdsourcing innovation.
  • TunedIT [EN-US] - Machine Learning and Data Mining algorithm challenges.

GO TO INDEX


Front End Development

Machine learning studies addressing Front-End:

GO TO INDEX


Back End Development

Machine learning studies addressing Back-End:

GO TO INDEX


Big Data

Machine learning studies addressing Big Data:

GO TO INDEX


Theory

Machine learning studies addressing Theory:

GO TO INDEX


Development Environment

Data Science Desktop:

GO TO INDEX


Learning

Study related links to learn first:

GO TO INDEX


Business

Data Science for business study:

GO TO INDEX


Security

Data Science Security study:

GO TO INDEX


Learning Platforms

Study Links.


Massive Open Online Courses

Below is a list of sites that offer a variety of free and paid courses:

  • edX [EN-US] - Flexible programming learning.
  • Coursera [EN-US] - Learn skills from the best universities for free.
  • Udacity [EN-US] - Courses and Certifications.
  • Edraak [EN-US] - Edraak, is a massive open online course platform (MOOC), which is an initiative of the Queen Rania Foundation (QRF).
  • Open HPI [EN-US] - MOOCs for Reading and Learning.
  • MIT OPEN COURSEWARE [EN-US] - It is a web-based publication of virtually all MIT course content, open and available to the world.
  • cK-12 [EN-US] - 100% free and personalized learning for each student.
  • Udemy [EN-US] - The largest selection of courses in the world.
  • SKILLSHARE [EN-US] - Skillshare is an online learning community with thousands of classes in design, business, technology and more.
  • Codecademy [EN-US] - Learn the technical skills you need for the job you want.
  • P2PU [EN-US] - connects educational resources open to career paths in an equitable and empowering way.
  • Saylor Academy [EN-US] - Saylor Academy is a non-profit initiative working since 2008 to offer free and open online courses for everyone who wants to learn.
  • Academic Earth [EN-US] - Find free online courses, lectures and videos from top colleges like Yale, MIT and Stanford.
  • Learn To Be [EN-US] - Non-profit organization that brings 1-on-1, online tutoring for young people.
  • Floqq - FLOQQ is the largest search engine for Spanish-language video courses.
  • Course Talk [CA-ES] - Discover the best courses on the web based on your interests and student feedback.
  • Marginal Revolution University [EN-US] - Creates free and engaging economics videos taught by top professors.
  • Alison [EN-US] - Free online courses with certificates.
  • Data Science Academy [PT-BR] - Community of experts in Data Science.
  • SOLYD [PT-BR] - Online training and courses.
  • DataCamp [EN-US] - Learn Data Science online.
  • Google for Education [EN-US] - Google's Python class.
  • VEDUCA [PT-BR] - Here you study for free and you can earn your certificate for a price that fits in your pocket.
  • Fundação Bradesco [PT-BR] - Escola Virtual is an educational portal that offers free distance-learning courses.
  • Khan Academy [EN-US] - Offers hands-on exercises, instructional videos and a personalized learning panel that enables students to study at their own pace inside and outside the classroom.
  • EADCCNA [PT-BR] - Variety of online courses in IT.
  • Teaching Channel [PT-BR] - Free courses and books in the public domain.
  • 4Linux [PT-BR] - Linux and open software courses.
  • Impacta [PT-BR] - IT, Management and Design Courses.
  • Microsoft Academy [EN-US] - Microsoft Professional Program for Artificial Intelligence.
  • Microsoft Virtual Academy [EN-US] - Free Microsoft training provided by experts.
  • [Universia](http://noticias.universia.com.br/destaque/especial/2013/07/10/1035282/8/700-cursos-online-gratis-das-melhores-universidades-do-mundo/cursos -online-gratis-de-ci% C3% Computer-science% C3% A7% C3% A3o-e-intelig% C3% AIncia-artificial.html) [EN-BR] - Universia Brasil brought together 700 free online courses from the best universities in Brazil and the world. Check out the courses in Computer Science and Artificial Intelligence.
  • Duolingo [EN-US] - Learn languages for free, forever.
  • e-stude [EN-US] - E-learning platform aimed at training software development teams.
  • Google Developers [EN-US] - Machine Learning Crash Course.
  • Acclaim [EN-US] - Complete a series of online Data Science courses.
  • Data School [EN-US] - Data Science Courses.
  • Dataquest [EN-BR] - Learn Python, R, SQL, data visualization, data analysis and machine learning.

GO TO INDEX


Books

Below is a list of paid and free books:

GO TO INDEX


Course Links

Courses related to computing:


Data Science Academy

Courses related to computing:

GO TO INDEX


Course

Courses related to computing:

GO TO INDEX


Blogs

Below is a list of blogs on Data Science topics:

  • Blog Mining Data [PT-BR] - This project aims to help you learn more about Data Science and related areas in a practical and quick way.
  • The Statistician [PT-BR] - Blog whose mission is to promote statistics in a simple, fun and affordable way, like you've never seen before.
  • Pizza de Dados [PT-BR] - The Brazilian podcast on Data Science.
  • Post-Graduate [PT-BR] - Content and daily humor for graduate students.
  • Hackernoon [EN-US] - Hacker Noon is everything that hackers need at noon.
  • Towards Data Science [EN-US] - Towards Data Science, Sharing concepts, ideas and codes.
  • Data Science Central [EN-US] - Industry online resource for data professionals.
  • Mining the Social Web [EN-US] - A companion to the book, with the simple objective of bringing social web mining into the mainstream.
  • Becoming a Data Scientist [EN-US] - Documenting the path from SQL Data Analyst, pursuing a Master's degree in Engineering, to Data Scientist.
  • AllThings Data Science [EN-US] - All things about Data Science.
  • MDM - A Geeks Point Of View [EN-US] - Technology blog on master data management and every buzz around it.
  • The Open Source Data Science Masters [EN-US] - The open source curriculum for learning Data Science.
  • Data Science London [EN-US] - Data Science London is a non-profit organization dedicated to the free and open dissemination of Data Science.
  • Open Source Research [EN-US] - PhD student in the field of Operations Research at Berkeley.
  • Louis Dorard [EN-US] - A tech guy with a penchant for the web and data, big and small.
  • Machine Learning Mastery [EN-US] - About helping professional programmers to confidently apply machine learning algorithms to solve complex problems.
  • Data Science Weekly [EN-US] - A free weekly newsletter with curated news, articles and works related to Data Science.
  • Revolution Analytics [EN-US] - Daily news on the use of open source R for big data analysis, predictive modeling, data science and visualization.
  • R Bloggers [EN-US] - R-Bloggers.com is an aggregator of content blogs contributed by bloggers who write about R.
  • Datascope Analytics [EN-US] - Data-driven consulting and design.
  • Yet Another Data Blog [EN-US] - Reflections on Collective Intelligence, Data Disputes, Data Science, Predictive Modeling, Start-ups and a repository of ideas.
  • KDNuggets [EN-US] - Leader in Business Analysis, Big Data, Data Mining, Data Science and Machine Learning.
  • Data Scientist [EN-US] - Developed for data scientists to collaborate in sharing knowledge and experiences.
  • What´s The Big Data [EN-US] - Explores its impact on information technology, the business world, government agencies and our lives.
  • Decisions & Discovery [EN-US] - Focusing on science, data science, business, technology,
  • New Data Scientist [EN-US] - How a social scientist jumps into the world of big data.
  • Data Science 101 [EN-US] - Learning to be a Data Scientist.
  • Data Scientist Journey [EN-US] - Digital nomad couple talking about Data Science.
  • Dataists [EN-US] - More than seeing your model there are no heteroscedastic errors.
  • Data-Magnum [EN-US] - Provides the information, education and assessment necessary for the planning and successful implementation of Big Data projects.
  • The MapR Blog [EN-US] - Find insights, best practices and useful resources to help you leverage data more effectively in growing your business.
  • P-value [EN-US] - Reflections on data science, machine learning and statistics.
  • DATA MINERS BLOG [EN-US] - A place to read about topics of interest to data miners and to ask questions of the data mining experts at Data Miners.
  • FlowingData [EN-US] - Visualization and Statistics.
  • O'reilly Learning Blog [EN-US] - Perspectives on learning tools, technologies and methods.
  • Dominodatalab [EN-US] - Includes the post on Data Science.
  • i am trask [EN-US] - Crafts for Machine Learning.
  • Vademecum of Practical Data Science [EN-US] - It aims to share some of the problems, solutions, workarounds and best practices that helped the authors on their Data Science journey.
  • Dataconomy [EN-US] - On the new emerging data economy.
  • Vidhya Analytics [EN-US] - A complete website on data science and analysis study material.
  • Colah's Blog [EN-US] - To understand neural networks.
  • Sebastian's Blog [EN-US] - To understand NLP and transfer learning.
  • DATAVERSITY [EN-US] - Data Education for Business and IT Professionals.
  • Science and Data [PT-BR] - The objective is to talk about the fascinating adventure of Data Science.
  • Institute of Applied Artificial Intelligence [PT-BR] - It is a non-profit organization where young students receive free education on artificial intelligence and develop projects.
  • BiaData Bussiness [PT-BR] - Information about Big Data.
  • Portal Action [PT-BR] - The largest statistical portal in Brazil.
  • HackerRank [EN-US] - It is a technology hiring platform that is the standard for assessing the skills of developers for more than 1,000 companies worldwide.
  • SQL Magazine [PT-BR] - Content about SQL.
  • DATAQUEST [EN-US] - Data science, data analysis and tutorials and data engineering articles.
  • Data Elixir [EN-US] - It is a curator of the best news, resources and inspirations from Data Science.
  • Simply Statistics [EN-US] - News and texts on statistics.
  • ClaoudML [EN-US] - Free data science and machine learning resources.
  • PyData [EN-US] - Forum for the international community of users and developers of data analysis tools to share ideas and learn from each other.
  • freeCodeCamp [EN-US] - Learn new developer skills.
  • Vooo [EN-US] - News and texts on Data Science.
  • Bitfactor [EN-US] - Thoughts about design, technology and other very important things.
  • The Fashion Robot [EN-US] - About inspiring technologies in the fashion industry.
  • OpenMined [EN-US] - An open source community focused on researching, developing and elevating tools for secure artificial intelligence.
  • Shivam Bansal's [EN-US] - Data Scientist and Kaggle Kernels Grandmaster.
  • 7WDATA [EN-US] - It's the Hotspot about new news of all things.
  • mathbabe [EN-US] - Exploring and venting on quantitative issues.
  • Hipsters Ponto Tech [PT-BR] - Podcast where the Caelum and Alura people enter into heated discussions about programming, design, ux, gadgets, startups and the latest technology.
  • Artificial Neural Networks
  • Statistical Handouts
  • Simply Statistics
  • Machine Learning Mastery
  • Acclaim Data Science
  • Dataversity
  • Khan Academy
  • HackerRank
  • KDnuggets
  • Como funciona a inteligência artificial [PT-BR]

GO TO INDEX


YouTube

Below is a list of YouTube channels, videos I liked and playlists to study and keep up to date:

GO TO INDEX


Magazines

Below is a list of favorite sites to stay informed:

  • The Future of Things [EN-BR] - The future visible to all, Artificial Intelligence, Robotics, innovations and new medical technologies.
  • Chupadados [EN-BR] - This project brings together Latin American stories about the massive collection and processing of data by governments, companies and ourselves to monitor cities, homes, pockets and bodies.
  • PCWorld - Technology consultant, with analysis and product guide, tests, reviews, tips and download.
  • GSTI Portal - Content, area to answer questions, information on job vacancies, competitions and certifications.
  • The Next Web - Original and proudly opinionated perspectives on notable stories for Generation T.
  • Intel IT Center - Resources for IT Leaders.
  • indy100: discover - Various news about everything.
  • Skynet Today [EN-US] - Accessible and informed coverage of the latest AI hype and panic.
  • Hacker News Bulletin [EN-US] - Discover the latest trends, interesting news and useful tips on hacking, hackers, cybersecurity, technology and anonymous worldwide.
  • Datatau [EN-US] - Like Hacker News, but for data.
  • Fossbytes [EN-US] - Leading source of technology news, focusing on Linux distro releases, security and hacker news, tutorials, tips and tricks, VPNs and more.
  • ICML [EN-US] - International Conference on Machine Learning
  • EPJ Data Science [EN-US] - Publishing platform to address this evolution, bringing together all academic disciplines related to science.
  • Journal of Data Science [EN-US] - An international magazine dedicated to the application of statistical methods in general.
  • Big Data Research [EN-US] - It aims to promote and communicate advances in big data research, providing a fast, high-quality forum for researchers, practitioners and policy makers from the many different communities working on this topic.
  • Journal of Big Data [EN-US] - Publishes high quality academic papers, methodologies and case studies covering a wide range of topics, from big data analysis to data-intensive computing and all big data research applications.
  • Big Data & Society [EN-US] - It is a peer-reviewed academic journal that publishes interdisciplinary works mainly in the social sciences, humanities and computing and their intersections with the arts and natural sciences about the implications of big data for societies.
  • Data Science Journal [EN-US] - Allows you to easily search, browse and cite the latest articles published by academic societies in Japan, and to access documents through reference or citation links.
  • Coding Coach
  • Vooo Data Science
  • Bitbay
  • Quanta Magazine
  • Playing Numbers
  • Towards Data Science

GO TO INDEX


Medium

Magazine related links:

  • Hackernoon - Hacker Noon Rips Out Medium’s Software, Replaces it With Their Own.
  • The Startup - Medium's largest active publication, followed by +598K people. Follow to join our community.
  • Concretebr - We develop digital products with innovation, agility and excellent practices, for the Brazilian and Latin American market.
  • freecodecamp - Learn to code with free online courses, programming projects, and interview preparation for developer jobs.
  • geeksforgeeks - A Computer Science portal for geeks.
  • Machine Learning for Everyone
  • Becoming Human
  • Daniel Godoy [EN-US] and [PT-BR] - Data Scientist, developer, teacher and writer. Author of "Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide"
  • Be a Data Scientist [PT-BR]
  • I played the dice [PT-BR] - Statistics, Python, Machine Learning: sketches & projects by a Data Scientist journalist.

GO TO INDEX


Podcast

Below are lists with more content that increase the capacity to study:

GO TO INDEX


Searches

Download Related Links.


Datasets

Below is a list of sites that have a variety of datasets for study and learning:

  • DATAQUEST [EN-US] - 18 places to find data sets for data science projects.
  • Quora's Big Datasets Answer [EN-US] - Links to sites to find great data sets open to the public.
  • ISPDados [PT-BR] - Open Data Page of the Public Security Institute. You will be able to access the databases of criminal records and police activity in the state of Rio de Janeiro.
  • BRAZILIAN OPEN DATA PORTAL [PT-BR] - More than 6 thousand data sets.
  • Google Trends [EN-US] - See what the world is searching for.
  • Sorocaba Open Data [PT-BR] - This portal makes publicly available data that is generated by municipal departments and agencies.
  • Sorocaba Transparency Portal [PT-BR] - Publication of data in open format.
  • Open Data of Capes [PT-BR] - Here you will find data and information about Brazilian postgraduate courses, the training of teachers for basic education and other themes related to education.
  • GEOCAPES [PT-BR] - Capes Georeferenced Information System.
  • Academic Torrents [EN-US] - We are a distributed repository maintained by the community for datasets and scientific knowledge.
  • Hadoop Illuminated [EN-US] - Publicly Available Big Data Sets.
  • United States Census Bureau [EN-US] - Economic indicators from the US Census Bureau.
  • US Government Data Sources [EN-US] - US government web services and XML data sources.
  • Enigma (http://enigma.com/) [EN-US] - Browse the world of public data: quickly search and analyze billions of public records published by governments, companies and organizations.
  • Datahub [EN-US] - Provides important and commonly used data as high quality, easy to use and open data packages.
  • Amazon - Open Data on AWS [EN-US] - Open data search datasets.
  • re3data [EN-US] - Data sharing made easy.
  • DataCite [EN-US] - Center for research data.
  • Quandl [EN-US] - The main source of financial, economic and alternative data sets, serving investment professionals.
  • figshare [EN-US] - Get more citations for all of your academic research outputs; figshare content has been cited more than 5,000 times.
  • MAXMIND [EN-US] - GeoLite and legacy databases.
  • Kaggle Datasets [EN-US] - Dataset for use in Kaggle.
  • IGSR: The international genome sample resource [EN-US] - Providing ongoing support for the 1000 Genomes Project data.
  • World Bank Open Data [EN-US] - Free and open access to global development data.
  • Open Data Philly [EN-US] - It is a catalog of open data in the Philadelphia region.
  • Grouplens [EN-US] - Sample data sets for movies (with ratings), books and wikis.
  • UC Irvine Machine Learning Repository [EN-US] - Currently maintains 446 data sets as a service for the machine learning community.
  • NOAA - National Centers for Environmental Information [EN-US] - Responsible for preserving, monitoring, evaluating and providing public access to the nation's treasury of climate and historical weather data and information.
  • MapLight [EN-US] - MapLight tracks several data sets that you can search for evidence of the influence of money on politics.
  • GHDx [EN-US] - A catalog of health and demographic data sets from around the world, including results from the IHME.
  • UNICEF Data [EN-US] - UNICEF data on statistics and monitoring.
  • UN Data [EN-US] - UN data on statistics and monitoring.
  • The GDELT Project [EN-US] - GDELT project monitors worldwide broadcast, print and web news from almost every corner of every country.
  • San Francisco Government Open Data [EN-US] - Search hundreds of data sets for the City and County of San Francisco.
  • Global Open Data Index [EN-US] - The Global Open Data Index provides the most comprehensive snapshot available of the state of publishing open government data.
  • GHTorrent [EN-US] - A scalable, queryable, offline mirror of the data offered through the GitHub REST API.
  • Microsoft Research Open Data [EN-US] - A collection of free Microsoft Research data sets to promote cutting-edge research in areas such as natural language processing, computer vision and domain-specific sciences.
  • Open Government Data Platform India [EN-US] - It is a platform to support the Open Data initiative of the Government of India.
  • UCI Machine Learning Repository [EN-US] - Center for Machine Learning and Intelligent Systems.
  • Google Dataset Search [EN-US] - Google Data Sets.
  • Brazil Datasets [EN-US] - Brazilian Data Set.
  • Kaggle Datasets [EN-US] - Kaggle Dataset.
  • Datasets [EN-US] - A lightweight library providing two main features: one-line dataloaders for many public datasets and efficient data pre-processing.

GO TO INDEX


Tutorials

Data Science Tutorials:

  • Artificial Neural Networks [PT-BR] - You will see on this page an introductory tutorial on Artificial Neural Networks, especially on the Multi Layer Perceptron networks trained with BackPropagation.
  • Data Science using Python and R [EN-US] - Ways to do Data Engineering and Machine Learning in R and Python

GO TO INDEX


Binder

Project My Binder:

GO TO INDEX


Jupyterlab on AWS

Jupyterlab Tutorials:

GO TO INDEX


Tools

Below is a list of tools that make the job easier:

  • Jupyter - Project Jupyter exists to develop open source software, open standards and services for interactive computing in dozens of programming languages.
  • neptune.ml - Community-compatible platform that supports data scientists in creating and sharing machine learning models. Neptune facilitates teamwork, infrastructure management, model comparison and reproducibility.
  • Steppy 1 - Lightweight Python library for fast and reproducible machine learning experimentation. It features a very simple interface that allows for clean machine learning pipeline design.
  • Steppy-toolkit 2 - Curated collection of neural networks, transformers and models that make your machine learning faster and more effective.
  • Cloud Datalab Google - Easily explore, visualize, analyze and transform data using familiar languages, such as Python and SQL, interactively.
  • Hortonworks Sandbox - It's a personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials.
  • R - It is a free software environment for statistical computing and graphics.
  • RStudio - Powerful IDE for R, free and open source, works on Windows, Mac and Linux.
  • Weka - Application with graphical interface for reading data, pre-processing and machine learning algorithms.
  • Anaconda Cloud - Anaconda Cloud is where data scientists share their work. You can search and download popular Python and R packages and notebooks to start your data science work.
  • Data Science Toolbox - It is a virtual environment based on Ubuntu Linux that is specifically suited for doing data science.
  • Datadog - Solutions, code and devops for high-scale data science.
  • Kite Development Kit - It's a high-level data layer for Hadoop. It is an API and a set of tools that accelerate development. You configure how Kite stores your data on Hadoop, instead of creating and maintaining that infrastructure on your own.
  • Domino Data Labs - Run, scale, share and deploy your models without any infrastructure or configuration.
  • Apache Flink - A platform for efficient, distributed and general purpose data processing.
  • Apache Hama - It is a top-level open source project from Apache, allowing you to do advanced analysis beyond MapReduce.
  • Weka - It is a collection of machine learning algorithms for data mining tasks.
  • Octave - It is a high-level interpreted language, intended mainly for numerical calculations; a free alternative to Matlab.
  • Apache Spark - Extremely fast cluster computing.
  • Hydrosphere Mist - A service to expose Apache Spark analytics jobs and machine learning models as real-time, batch or reactive web services.
  • Torch - It is a scientific computing framework with extensive support for machine learning algorithms that puts GPUs first.
  • Neon - Nervana's Python based Deep Learning Framework - It is Intel's reference deep learning framework, committed to the best performance on all hardware. Designed for ease of use and extensibility.
  • Skale - High Performance Distributed Data Processing in NodeJS.
  • Aerosolve - A machine learning package designed for humans.
  • Datawrapper 1 - An open source data visualization platform that helps everyone to create simple, correct and embeddable graphics.
  • Datawrapper 2 - It's also on GitHub.
  • Natural Language Toolkit - It is a leading platform for creating Python programs to work with human language data.
  • nlp-toolkit for node.js - This module covers some basic principles and implementations of nlp.
  • Julia - High-level, high-performance dynamic programming language for technical computing.
  • IJulia - A Julia language backend combined with the Jupyter interactive environment.
  • Apache Zeppelin - Web-based notebook that enables data-driven, interactive data analysis and collaborative documents with SQL, Scala and more.
  • Featuretools - An open source framework for automated feature engineering written in Python.
  • Optimus - Cleaning, pre-processing, feature engineering, exploratory data analysis and easy ML with PySpark back-end.
  • DVC - An open source data science version control system. It helps to track, organize and make data science projects reproducible.
  • Markdown - Markdown Guide is a free, open source reference guide that explains how to use Markdown, the simple and easy to use markup language that you can use to format almost any document.
  • Git - It's a free, open source distributed version control system designed to handle everything from small to very large projects, with speed and efficiency.
  • Bitbucket - It's more than just Git code management. Bitbucket gives teams a place to plan projects, collaborate on code, test and deploy.
  • GitHub - Development platform inspired by the way you work. From open source to business, you can host and analyze code, manage projects and build software.
  • GitBook - Documentation made easy. Helps your team to write, collaborate and publish content online.
  • Pivotal Tracker - It is the agile project management tool of choice for developers worldwide for real-time collaboration around a prioritized and shared backlog.
  • Stack Overflow - It is the largest and most trusted online community for developers to learn, share their knowledge and build their careers.
  • NotABug - Open source code collaboration platform for freely licensed projects.
  • Kite - It is a cloud-based co-pilot that augments your programming environment.
  • reddit - Offers the best of the internet in one place. Get a constant update of news, fun stories, photos, memes and videos just for you.
  • Online Box Plot Generator - Box Plot Statistics Calculator.
  • Grafana - Data visualization and monitoring with support for Graphite, InfluxDB, Prometheus, Elasticsearch and many other databases.
  • Graph Viz - Leading platform for visualization and exploration of all types of graphs and networks. Gephi is open source and free.
  • Tableau - Interactive data visualization focused on Business Intelligence.
  • Colaboratory - It's a free Jupyter notebook environment that requires no configuration and runs entirely in the cloud.
  • Vega - Vega is a declarative format for creating, saving and sharing visualization projects. With Vega, visualizations are described in JSON and generate interactive visualizations using HTML5 Canvas or SVG.
  • Vega - VOYAGER - It is a visualization browser for open-ended data exploration. It provides a gallery of recommended views, produced by the Compass view recommendation engine.
  • Python Anywhere - Host, run and code Python in the cloud.
  • Neo4j - It is a graph database management system.
  • Docker - It is a software technology that provides containers, providing an additional layer of abstraction and automation of operating system level virtualization in Windows and Linux.
  • Binder - It is a Git repository that has been equipped with the appropriate compilation files so that its content can be connected to a BinderHub instance. These repositories currently live mainly on GitHub, although we plan to support more online repositories, such as GitLab or BitBucket.
  • IPython - Interactive interpreter for several programming languages, but especially focused on Python.
  • Overleaf - LaTeX, Evolved. The easy-to-use, online and collaborative LaTeX editor.
  • RED HAT - OpenShift - Deployment and management of container-based software. It is a supported distribution of Kubernetes using Docker and DevOps tools for accelerated application development.
  • InfluxData
  • Apache PredictionIO - Machine learning as a service.
  • Google Colaboratory
  • Jupyter
  • Anaconda
  • edgedb

GO TO INDEX


Download

Below is a list of downloads:

  • LibGen or Library Genesis - A search engine for scientific articles and fiction books. It has more than 2 million scientific articles (published by researchers from universities around the world) and 2.7 million fiction books in several languages, mainly English, although content in Portuguese can also be found.
  • Sci-Hub - An online repository with more than 64 million scientific articles, available on its website. New documents are added daily through the domains of educational institutions, bypassing systems that restrict access to Internet users without paid subscriptions. It was founded by a neuroscientist from Kazakhstan. To get a scientific article, just place the DOI (Digital Object Identifier, a standard for identifying digital objects) in the search field and the website will redirect you to the article. A good website for finding DOIs is ScienceDirect.
  • Scielo - Scientific articles in Portuguese. Scielo is a digital library of FAPESP, CNPq, the Pan American Health Organization, the Virtual Health Library and the Support Foundation of the Federal University of São Paulo, where thousands of articles from all areas can be found in Portuguese and easily downloaded.
  • Z-Library - The Z library is one of the largest online libraries in the world. We aim to make literature accessible to everyone.
  • startpage - The world's most private search engine.
  • Open Library - This site allows you to borrow digital books in English.
  • ScanLibs - IT Ebooks Free Download PDF, EPUB, MOBI! Elearning Video For Programming Free Download MP4, AVI!
  • All IT ebooks - Free IT eBooks Download.
  • Free Online Books

GO TO INDEX


GitHub Projects

Projects to facilitate study:

GO TO INDEX


Good Separate Texts

Links from different sites:

GO TO INDEX


Awesome Lists

Below are lists with more content that increase the capacity of this list by 1000x:

GO TO INDEX


Notes

Space to add Data Science notes:

ADD NOTES HERE

GO TO INDEX


Images

Yes! They are worth more than a thousand words.

In the image folder, you will find a compilation of images related to Data Science.


Remember!

Copying everything from Stack Overflow doesn't make you understand anything; it just makes you a good copier!