This is the code repository for Statistics for Data Science, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.
This book takes you from knowing very little about statistics to being comfortable applying statistical methods to data science tasks. It starts with simple statistics and then moves on to the statistical methods used in data science algorithms. The R programs for the statistical computations are explained along with the logic behind them. You will come across mathematical concepts such as variance, standard deviation, probability, and matrix calculations. The book covers only what is required to apply statistics to data science tasks such as data cleaning, mining, and analysis, including the statistics needed for linear regression, regularization, model assessment, boosting, SVMs, and neural networks.
All of the code is organized into folders, one per chapter. Each folder is named after its chapter number, for example, Chapter02.
The code will look like the following:
MyFile <- "C:/GammingData/SlotsResults.csv"
MyData <- read.csv(file=MyFile, header=TRUE, sep=",")
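As a rough illustration of the kind of computation the book works through, the sketch below computes a sample mean, variance, and standard deviation in R. It is not taken from the book: the inline data frame stands in for the CSV file above, and the column name Payout and its values are hypothetical.

```r
# Hypothetical stand-in for data read from a CSV file; the column name
# Payout and its values are made up for illustration.
MyData <- data.frame(Payout = c(2, 4, 4, 4, 5, 5, 7, 9))

# Sample mean, variance, and standard deviation.
# var() and sd() in base R use the n - 1 (sample) denominator.
PayoutMean <- mean(MyData$Payout)
PayoutVar  <- var(MyData$Payout)
PayoutSD   <- sd(MyData$Payout)

print(c(mean = PayoutMean, variance = PayoutVar, sd = PayoutSD))
```

To try this on a real file, replace the data frame with the result of read.csv and substitute a numeric column that actually exists in that file.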
This book is intended for readers with a data development background who are considering a move into data science and want concise coverage of statistics, supported by insightful programs and simple explanations. Just bring your data development experience and an open mind!