BU2010 Cleaning and importing

Introduction

Overview

This week you get to try out cleaning and importing data, as well as some simple data visualisation.

Advice

Once you are done, be sure to commit your changes (to save them to the repository) and to push them to GitHub (so that I can see the work you have done).

It's best to commit early and commit often: that way you can go back if you have made a mistake, and I can see all the work you have done. It doesn't matter if your early commits contain mistakes.

Task Explanation

Below is today's task. We will be working with a range of skills you have learned in recent weeks.

For this task, you need to provide the 'answers' in the r_code.R file you will find in this repository, not in the Markdown file you are currently reading.

Task Explanation and Overview

Before you start working on the actual R code, please clean the spreadsheet called heightdata.xlsx that came with this repository.

Your further tasks can be found in the R script file r_code.R that is part of this repository. You have to work on these tasks by yourself. Do not work with others.

Please work on these tasks in RStudio, not on the GitHub website. If you work in RStudio you can make sure your code works as it should. If you edit the file in the GitHub web interface instead, you will have to copy and paste the code into R for testing, an unnecessary step that can introduce mistakes.

Pay attention to the autocomplete options RStudio is offering you and use them to explore how R commands work. Also, remember how useful the cursor keys and the Tab key can be. Pressing F1 will bring up the documentation for the selected command in the Help tab.

Please don't forget to commit and to push your commit.

Commit early, commit often - that way you can go back if you have made a mistake and I can see all the work you have done. There's no problem if you commit and there's a mistake in your file or if you haven't done all tasks yet.

Your Tasks

For Task 1 you are asked to clean the spreadsheet called heightdata.xlsx. You can do this in Microsoft Excel or in other software that can read the xlsx file format, e.g. LibreOffice or Apache OpenOffice.

The other tasks can be found in the R script file r_code.R and should ideally be done in RStudio.

Task 1

Clean the spreadsheet called heightdata.xlsx that came with this repository. Leave the original spreadsheet untouched and save the cleaned spreadsheet under the name heightdata_cleaned.xlsx.

Task 2

Set the correct working directory.

Hint: I showed you how to do this in previous weeks. It can be done with a command or through the GUI (Graphical User Interface).
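
As a reminder of the command route, here is a minimal sketch. The path stored in project_dir is a hypothetical placeholder, not the real location of your repository, so replace it with your own folder.

```r
# Show the folder R is currently using as its working directory.
current_dir <- getwd()
print(current_dir)

# Hypothetical path: replace it with the folder that holds this repository.
project_dir <- "~/BU2010/cleaning-and-importing"

# Only switch if the folder exists, so a wrong path gives a message, not an error.
if (dir.exists(project_dir)) {
  setwd(project_dir)
} else {
  message("Folder not found: ", project_dir)
}
```

Running getwd() again afterwards is a quick way to confirm that the change worked.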

Task 3

Load the packages you need.
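
Which packages you need depends on how you approach the later tasks. As one possibility, the sketch below assumes readxl for reading the xlsx file; that choice is an assumption, not a requirement stated in this document.

```r
# Assumption: readxl is the package used to read .xlsx files.
needed_packages <- c("readxl")

for (pkg in needed_packages) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    # character.only = TRUE lets library() accept the name as a string.
    library(pkg, character.only = TRUE)
  } else {
    # Install once with install.packages(), then load with library().
    message("Package not installed: ", pkg)
  }
}
```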

Task 4

Import the cleaned spreadsheet.
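
One way this step might look is sketched below. It assumes the readxl package and a cleaned file saved in the working directory; the object name height_data is a hypothetical choice.

```r
# File name taken from Task 1; it must sit in the working directory.
cleaned_file <- "heightdata_cleaned.xlsx"

# height_data is a hypothetical object name used only in this sketch.
height_data <- NULL

if (requireNamespace("readxl", quietly = TRUE) && file.exists(cleaned_file)) {
  # read_excel() reads the first sheet by default.
  height_data <- readxl::read_excel(cleaned_file)
  # Inspect column names and types to check the import.
  str(height_data)
} else {
  message("Check that readxl is installed and that ", cleaned_file, " exists.")
}
```

If str() shows a column with an unexpected type, that usually points back to something left uncleaned in the spreadsheet.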

Task 5

Produce a histogram based on the height column.
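
A minimal sketch using base R's hist() is shown below. The data frame here holds made-up stand-in values so that the example runs on its own, and it assumes the column is named height; use your imported data instead.

```r
# Made-up stand-in values, NOT taken from heightdata.xlsx.
height_data <- data.frame(height = c(158, 162, 165, 170, 171, 175, 180, 183))

# hist() draws the plot and also returns the bin counts it used.
height_hist <- hist(
  height_data$height,
  main = "Distribution of height",
  xlab = "Height"
)
```

Other plotting packages would work as well; base hist() is used here only because it needs no extra packages.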