This projects was accomplished to showcase my abilities in cleaning and preparing data using Excel PowerQuery and analyzing using simple descriptive data analsysis features in Excel. The Dirty Data Sample was downloaded from Kaggle.com in the form of a xlsx file published by Shiva Vashishtha. The excel workbook composed of 2 sheets:
Dirty1: The actual dirty data that I'm supposed to work on.Clean1: the shape of the data which the publisher seeks.
- The data looked a little bit messy so I had to highlight some columns to better understand what I see. Here we see that the data takes the shape of a pivot table that shows:
Order IDcolumn.- The first row which will later express the
Segmentcolumn. - The second column which will later express the
Ship modecolumn. - And finally, the values of the pivot table scattered all around, these are so useful and are considered the
Salescolumn.
- It seemed like each order ID had only one value, this value expresses the ship mode as it's in the same column of its name, and also the segment as the value exists in the same highlighted segment.

By inspecting and understanding the data, I loaded the range into Excel PowerQuery; where I will make the data look better and smoother.
- Removed the top row: this ensures that the second row becomes the first for the next step.
- Make first row as column name: in this step the ship mode names became the columns names, and also changed the data type into decimal.
- Rename the ID column: as making the first row as the column name, the order ID column needs to be modified too.
- Merge ship mode columns: now the key step here, is to merge the similar columns together, so I merged all of the segment columns together, all
first Classtogether and so on with the others. - Merge segment columns: you might notice the existance of some columns like
Consumer Total,Corporate Total,Home office Total, these are the key elements of getting the segment column, but wait, we removed them earlier! No worries! Just rename the columns that contain the values of each segment total.Consumer TotaltoConsumerCorporate TotaltoCorporateHome office TotaltoHome office
- Unpivot columns: this step will return the columns to the natural state as needed.
- Unpivot ship mode columns: this will return 2 columns, a column that has all the segments per each order ID and another column with that has the value of sales relates to the ship mode of the order and blanc value of the ship mode not related; this column is the hero of the day
- Unpivot the segment column: this will also return 2 columns same as the previous one, but this time I'm going to remove the value column and only retain the segment.
- Filter the values: the value column which we called the hero of the day contains
blanccells, we'll filter to remove these blancs, this will result a clean visual of the data. - Remove non-necessary value: noticed the presence of grand total values that exist at the bottom of the table, removed them by remove bottom 12 rows.
- changed sales data type into currency and sorted by ascending ship mode
- loaded the clean data into excel worksheet
The results finally are going to look like this:
With the power of PivotTables, I created meany of them to see what the data sample would like to show us.
- _Consumers are the top customers of the shipment service with total 444 orders shown in the sample.
- On the other hand the average price paid by consumers is the least, wich explains their interst in dealing with our service.
- Consumers is also the greatest segment in terms of sales with about half of the values of sales.
- When it comes to the ship mode it seems like the standard class is the most selling service with more than half of the sales, it's also the most preferred to all segments.





