Data science is the domain of study that deals with vast volumes of data using modern tools and techniques to find unseen patterns, derive meaningful information, and make business decisions. Data science uses complex machine learning algorithms to build predictive models.
The data used for analysis can come from many different sources and be presented in various formats.
The power of data science comes from a deep understanding of statistics and algorithms, programming and hacking, and communication skills. More importantly, data science is about applying these three skill sets in a disciplined and systematic manner.
The Data Science Lifecycle
Once the problem has been framed, the data science lifecycle consists of five distinct stages, each with its own tasks:
1). Capture: Data Acquisition, Data Entry, Signal Reception, Data Extraction. This stage involves gathering raw structured and unstructured data.
2). Maintain: Data Warehousing, Data Cleansing, Data Staging, Data Processing, Data Architecture. This stage covers taking the raw data and putting it in a form that can be used.
3). Process: Data Mining, Clustering/Classification, Data Modeling, Data Summarization. Data scientists take the prepared data and examine its patterns, ranges, and biases to determine how useful it will be in predictive analysis.
4). Analyze: Exploratory/Confirmatory, Predictive Analysis, Regression, Text Mining, Qualitative Analysis. Here is the real meat of the lifecycle. This stage involves performing the various analyses on the data.
5). Communicate: Data Reporting, Data Visualization, Business Intelligence, Decision Making. In this final step, analysts prepare the analyses in easily readable forms such as charts, graphs, and reports.
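To make the five stages concrete, here is a minimal sketch in Python that walks a tiny dataset through each one. The data, the column names (`ad_spend`, `sales`), and the function names are all made up for illustration; a real project would use much larger sources and dedicated libraries, but the order of the steps is the same.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data: advertising spend vs. sales, with one incomplete row.
RAW_CSV = """ad_spend,sales
10,25
20,45
,60
30,65
40,85
"""


def capture(text):
    """Capture: read raw rows from a source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def maintain(rows):
    """Maintain: drop incomplete rows and convert text fields to numbers."""
    cleaned = []
    for row in rows:
        if row["ad_spend"] and row["sales"]:
            cleaned.append((float(row["ad_spend"]), float(row["sales"])))
    return cleaned


def process(data):
    """Process: summarize the prepared data (row count, range, mean)."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    return {"rows": len(data), "spend_range": (min(xs), max(xs)), "mean_sales": mean(ys)}


def analyze(data):
    """Analyze: fit sales = slope * ad_spend + intercept by least squares."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in data) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def communicate(summary, slope, intercept):
    """Communicate: turn the results into a short, readable report."""
    return (
        f"Rows used: {summary['rows']}\n"
        f"Mean sales: {summary['mean_sales']:.1f}\n"
        f"Model: sales = {slope:.2f} * ad_spend + {intercept:.2f}"
    )


if __name__ == "__main__":
    data = maintain(capture(RAW_CSV))
    slope, intercept = analyze(data)
    print(communicate(process(data), slope, intercept))
```

Each function stands in for a whole stage: in practice, capture might pull from a database or an API, and communicate would usually produce charts or a dashboard rather than plain text.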
(i). Modern Python programming and version control systems.
(ii). SQL and databases, both relational and non-relational: PostgreSQL, MySQL, MongoDB, and DynamoDB.
(iii). Cloud services: AWS and GCP.
(iv). Application of Data Science.
(v). Task management systems like Jira.
(i). A Python programming environment, preferably Jupyter Notebook, but you can use any environment of your choice.
(ii). SQL Server, MySQL Workbench, and a PostgreSQL server. We will install these together, but you can also install DBeaver or DbVisualizer.
(iii). An account with AWS and with Google Cloud Platform (GCP).