# Data Analysis in Software Engineering (DASE)

Javier Dolado and Daniel Rodriguez

(DRAFT - Not ready yet `r Sys.Date()`)

This course covers several aspects of data analysis in Software Engineering (SE) and has been created by [Javier Dolado](http://www.sc.ehu.eus/jiwdocoj/) at the [University of the Basque Country](http://www.ehu.eus) and [Daniel Rodriguez](http://www.cc.uah.es/drg) at the [University of Alcala](http://www.uah.es/).

It is mainly based on [R](http://cran.r-project.org) and [RStudio](http://www.rstudio.com/), but we will also show some examples in [Weka](http://www.cs.waikato.ac.nz/ml/weka/) and other packages. [RMarkdown](http://rmarkdown.rstudio.com/) files can be compiled to HTML (and other formats) with RStudio's Knit.

It is structured as follows:

1. [Introduction to data analysis and model building](./sections/intro.Rmd)

2. [Data Sources](./sections/dataSources.Rmd)
    - Sources of information
    - Public repositories in software engineering

3. [Preprocessing Techniques](./sections/basicPreprocessing.Rmd)
    + Data types
    + Data Cleaning (duplicates, imbalance, noise)
    + Data Discretisation
    + Data Normalisation

4. [Exploratory Data Analysis](./sections/exploratoryDataAnalysis.Rmd)
    + Visualization

5. [Descriptive Statistics](./sections/descriptiveStatistics.Rmd)

6. [Basic Model Building](./sections/basicModelBuilding.Rmd) (Machine Learning Techniques)
    + Supervised
        + Regression and classification
        + Rules and Decision Trees
        + Nearest Neighbours (Lazy approaches)
        + Neural Networks
        + Probabilistic Classifiers
    + Unsupervised
        + Clustering
        + Association rules
    + Other approaches
        + Weak Classification, Semi-supervised learning

7. [Evaluation](./sections/evaluation.Rmd)
    - Descriptive statistics
    - Evaluation measures in machine learning
    - Graphical evaluation techniques (ROC and other visual evaluation techniques)
    - [Evaluation in Software Engineering](./sections/evaluationInSoftEng.Rmd)

8. [Advanced Model Building](./sections/advancedModelBuilding.Rmd) (Advanced algorithms)
    - Metalearners
    - Hybrid approaches

9. [Advanced Preprocessing Techniques](./sections/advancedPreprocessingTechniques.Rmd)
    + Noise
    + Feature Selection and Instance Selection
    + Imbalance
    + Missing values (Imputation methods)

10. [Classical Hypothesis Testing](./sections/classicalHypothesisTesting)
    + p-values
    + Equivalence Hypothesis Testing

11. [Time Series](./sections/timeSeries.Rmd)

12. [Social Network Analysis](./sections/SNAinSE.Rmd)

13. [Dealing with Large Volumes of Data](./sections/bigData.Rmd)
    + Apache Spark Introduction

Appendix A - [Introduction to R](./sections/rIntro.Rmd)

***

# Acknowledgements

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 324356.

***

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.