/data_science_with_r

Tidy Tuesday Projects (R)


Data Science with R: Projects

UK Gender Pay Gap

Code

UKGPG

US Drought

Code

drought_con

drought_map

Women's Rugby

Code

stamina w21bgp

Alternative Fuel

Code

alt_energy

Democracy in Crisis

Code

russia

WEB Du Bois Challenge 2022

Recreation: Left | Original: Right Code

web_du_bois_side_by_side

Tuskegee Airmen

Recreation: Left | Original: Right Code

Screenshot 2022-02-19 at 11 30 47

Top Dogs

Code

All 195 doggos shown in reactive Shiny app here, 2 examples below:

Russell Terriers Staffordshire Bull Terriers

The Mighty Cocoa Bean

Code

chocomap

US Bee Colonies: A Changing Picture

Code

bee_plot

Degrees of Statecraft

Code

degrees_of_statecraft

AfricaR Raster

Code

d10_africaraster

Registered Nurses (BLS)

"The areas of the blue wedges and the red outline are measured from the centre as the common vertex. The blue wedges, measured from the centre of the circle, represent area for area the median salary for nurses in each year (fig 1) and each state in 2021 (fig 2); the red outline measures the mean. We can see that across years there has been a close relationship between the median and mean salaries. At state level we can see the sometimes large discrepancies in salaries, although the close relationship between median and mean still holds. Hawaii is an exception to the general rule: there the median salary is larger than the mean, so we could expect that some individuals in Hawaii earn a lot less than others, dragging the mean downwards." Code
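The area encoding described in the caption can be sketched in a few lines of R. This is a minimal illustration, not the code behind the plot: `wedge_radius()` and `wedge_area()` are hypothetical helpers, and the salary figures are made up. With equal-angle wedges, area grows with the square of the radius, so the radius has to scale with the square root of the value.

```r
# Radius of an equal-angle wedge whose AREA equals `value`.
# Wedge area = 0.5 * theta * r^2, with theta = 2 * pi / n_wedges.
wedge_radius <- function(value, n_wedges) {
  theta <- 2 * pi / n_wedges
  sqrt(2 * value / theta)
}

# Inverse check: area of a wedge with the given radius.
wedge_area <- function(radius, n_wedges) {
  theta <- 2 * pi / n_wedges
  0.5 * theta * radius^2
}

# Made-up median salaries for four years (illustration only).
median_salary <- c(60000, 62000, 65000, 70000)

radii <- wedge_radius(median_salary, n_wedges = length(median_salary))

# Areas recovered from the radii match the salaries again.
recovered <- wedge_area(radii, n_wedges = length(median_salary))
```

Mapping the salary straight to the radius instead would exaggerate differences, because a wedge with twice the radius has four times the area.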

Florence Nightingale (Original)

FBFrHeNX0AIUjmc

Reproduction with registered nurse data (BLS)

nightingale

National Center for Education Statistics (NCES)

This graphic shows National Center for Education Statistics (NCES) data on Bachelor-level graduations of white and black students in the United States between 1940 and 2016. Black graduation rates have always lagged significantly behind white rates, never reaching parity. This inequality is the result of intergenerational capital (both financial and social) as well as continued social divides causing an unequal network of opportunities. Without intervention, this division will continue to prevail within the US. Code

hbcu_text_chart

NBER Papers

New research by NBER affiliates, circulated for discussion and comment. The NBER distributes more than 1,200 working papers each year. These papers have not been peer reviewed. Papers issued more than 18 months ago are open access. More recent papers are available without charge. This graphic clearly shows the trends towards Finance, Macroeconomics and Microeconomics throughout the decades of publication. Code

panel_alluvial_pewter

Emmy Awards

The two events with the most media coverage are the Primetime Emmy Awards and the Daytime Emmy Awards, which recognize outstanding work in American primetime and daytime entertainment programming, respectively.

This graphic shows the companies that have had at least 10 nominations in the Primetime Emmy Awards main categories: Outstanding Programs, Writing, Directing, Lead Actors and Actresses, and Supporting Actors and Actresses. These Primetime Emmy categories have stayed consistent over time, which lets us represent a lot of data (since 1949) and show how companies fed into these categories through nominations. Code

primetimeemmy

F1 Racing

A few teams often dominate F1, but only for a few years at a time. Code

F1

The Australian Bathing Birds Study

Bird baths are a familiar sight in Australian gardens but surprisingly little is known about the precise role they play in the lives of birds.

In a dry continent such as Australia, bird baths may be vital to supporting an otherwise stressed bird population. Cleary et al (2016) wanted to find out more, so they enlisted the help of thousands of citizen scientists across Australia to gather as much data as they could on how birds use bird baths. And so the Bathing Birds Study was born. Started by researchers at Deakin University and Griffith University in 2014, this study involved collecting data online from 2,500 citizen scientists on bathing birds all over Australia in Winter 2014 and Summer 2015 (20 minutes, three times per week for four weeks).

Notably, Winter saw more of a few specific birds (Noisy Miners, Magpies and Rainbow Lorikeets; see below), but Summer saw more bird species overall. Summer diversity is mapped geographically on the left, showing the predominant species surveyed in each bioregion in Summer. Code

Aussie_Birds

The Duke Lemur Center

The Duke Lemur Center hosts the most diverse population of lemurs on Earth, outside their native Madagascar. Lemurs are the most threatened group of mammals on the planet, and 95% of lemur species are at risk of extinction. By studying variables that affect their health, reproduction, and social dynamics, the center learns how to most effectively focus its conservation efforts. Here we show the median life expectancy of the 6 largest species. By normalising age at death within each species and looking at the resulting distributions, we can show that across species 95% of lemurs live between 2 months and 28 years (+/- 2 standard deviations). We can also see that 15% of the 2,872 lemurs who have died at Duke died before they were 6 months old. Code
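One way to read the normalisation step is as a within-species z-score of age at death. The sketch below takes that reading as an assumption; the data frame, its column names and the simulated ages are all hypothetical stand-ins, not the Duke records.

```r
set.seed(1)

# Simulated stand-in data: two made-up species with skewed ages at death (years).
lemurs <- data.frame(
  species = rep(c("species_a", "species_b"), each = 500),
  age_at_death_y = c(rgamma(500, shape = 2, scale = 6),
                     rgamma(500, shape = 2, scale = 9))
)

# Median life expectancy per species.
median_by_species <- tapply(lemurs$age_at_death_y, lemurs$species, median)

# Standardise age at death within each species (z-score).
lemurs$z <- ave(lemurs$age_at_death_y, lemurs$species,
                FUN = function(x) (x - mean(x)) / sd(x))

# Share of animals within +/- 2 standard deviations of their species mean.
share_within_2sd <- mean(abs(lemurs$z) <= 2)

# Share of animals that died before 6 months of age.
share_under_6m <- mean(lemurs$age_at_death_y < 0.5)
```

Standardising per species puts short-lived and long-lived species on one scale, so a single +/- 2 band can be compared across them.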

Move_it

Kenya Census : Internet Access

The 2019 Kenya Population and Housing Census was the eighth to be conducted since 1948 and was conducted from the 24th to 31st August 2019. Kenya leveraged technology to capture cartographic mapping, enumeration and data transmission, making the 2019 Census the first paperless census to be conducted in Kenya. Here we map internet usage across Kenya to show the low overall usage and vast disparities between counties (left) and between the sexes (right). Code

kenya_internet_access

Star Trek : Do Androids Dream of Stars

Using frequency analysis on Star Trek interactions we can analyse the different words Humanoids, Androids and Computers use in Star Trek. Using sentiment analysis we can see that the android Data uses far fewer emotive words across all types of interactions with Computer than Humanoids or even Computer do. Using both Computer plots we can see that Computer uses a lot of negative-sentiment and time words, raising the question of whether Computer is a glorified clock. Code

Do Androids Dream of Star Trek?

Bureau of Economic Analysis : Reporting on Infrastructure Spend

Infrastructure provides critical support for economic activity, and assessing its role requires reliable measures. This series of graphics provides an overview of U.S. infrastructure data in the National Economic Accounts. Data from the Bureau of Economic Analysis (BEA) shows that investment in some important types of basic infrastructure has barely kept up with depreciation, or has fallen behind it, in recent decades (fig. a), though some sub-categories look better (fig. b, showing spend in real terms). Code

See here for all 16 graphics:

Conservation Health Public Safety

Public Park Access : Linear Model of Trust for Public Land data

Since 2011, The Trust for Public Land (TPL) has kept track of green space availability across U.S. metros through the ParkScore index, which measures how well cities are meeting their residents’ green-space need based on five metrics: park access, acreage, investment, equity and amenities.

Even though we can measure against each separate metric, there is still no more powerful predictor than spend per resident ($), which when plotted on a logarithmic scale can predict 75% of the variance in total points (Adj. R Squared [mean of n = 2000 samples]) on our training cases and 79% on our test case. Code
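A minimal sketch of this kind of model, on simulated data, might look as follows. It assumes the "mean of n samples" refers to bootstrap resamples of the training rows, which the text does not confirm; the column names, the simulated relationship and the 80/20 split are hypothetical, and 500 resamples are used here only to keep the run short.

```r
set.seed(42)

# Simulated stand-in for the ParkScore data (one row per city-year).
n <- 300
parks <- data.frame(
  spend_per_resident = exp(rnorm(n, mean = log(90), sd = 0.6))
)
parks$total_points <- 20 + 25 * log10(parks$spend_per_resident) +
  rnorm(n, sd = 6)

# Hold out 20% of the rows as a test set.
test_idx <- sample(n, size = round(0.2 * n))
train <- parks[-test_idx, ]
test <- parks[test_idx, ]

# Mean adjusted R-squared over bootstrap resamples of the training rows.
n_resamples <- 500
adj_r2 <- replicate(n_resamples, {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  fit_boot <- lm(total_points ~ log10(spend_per_resident), data = boot)
  summary(fit_boot)$adj.r.squared
})
mean_adj_r2 <- mean(adj_r2)

# Out-of-sample R-squared for one fit on the full training set.
fit <- lm(total_points ~ log10(spend_per_resident), data = train)
pred <- predict(fit, newdata = test)
test_r2 <- 1 - sum((test$total_points - pred)^2) /
  sum((test$total_points - mean(test$total_points))^2)
```

Taking `log10()` inside the formula keeps the raw dollar values in the data frame, so `predict()` can be called on new spend figures without a manual transform.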

public_parks

Paralympics : When Paralympians started competing in the Olympics

Using International Paralympic Committee data on the Olympics since 1980, we can visualise the difference in when each sex started competing in Olympic sports, and their relative counts. Code

Paralympic

Olympics : Females Entering the Ring

Using Kaggle data on the Olympics since 1904, we can visualise the difference in when each sex started competing in Olympic combat sports, and their relative counts. Code

Olympic

US Drought : gganimation

Using data produced jointly by the National Drought Mitigation Center (NDMC) at the University of Nebraska-Lincoln, the National Oceanic and Atmospheric Administration (NOAA), and the U.S. Department of Agriculture (USDA), we map and animate drought levels in the US by county. Code

us_droughts_simp

Scooby Doo : Monster Motives

Using ScoobyPedia data, can we predict a monster's motives from the episode's IMDb score and year of release? Code

Netflix

Netflix : Text Analysis

Using Netflix catalogue data since 2019, we analyse the descriptive text to see which words are most commonly found together, and use these co-occurrences as a predictive feature for an item's genre or rating: Code
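Counting which words appear together can be sketched in base R as below. The three descriptions are invented for illustration, and a real analysis would probably remove stop words and use a text-mining package; this only shows the pair-counting idea.

```r
# Invented example descriptions (not taken from the Netflix catalogue).
descriptions <- c(
  "a young detective hunts a serial killer",
  "a detective and a young journalist chase a killer",
  "a family moves to a haunted house"
)

# Unique lower-case words per description, dropping very short tokens.
tokens <- lapply(strsplit(tolower(descriptions), "[^a-z]+"), function(w) {
  unique(w[nchar(w) > 2])
})

# Every unordered word pair that occurs within the same description.
pairs <- do.call(rbind, lapply(tokens, function(w) {
  if (length(w) < 2) return(NULL)
  t(combn(sort(w), 2))
}))

# Number of descriptions in which each pair co-occurs, most common first.
pair_counts <- sort(table(paste(pairs[, 1], pairs[, 2])), decreasing = TRUE)
```

Sorting the words before calling `combn()` makes each pair appear in one fixed order, so "detective killer" and "killer detective" are counted as the same pair.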

Netflix

SteamCharts : Risk of Rain

The data this week comes from Steam streaming by way of Kaggle and originally came from SteamCharts. The data was scraped and uploaded to Kaggle and we're using it here to display the average and peak concurrent players of Risk of Rain! Code

ROR

July 2021 : London Animal Rescue

The London fire brigade (LFB) was involved in 755 animal incidents in 2020 – more than two a day. The number of rescues rose by 20% compared with 2019 when there were 602, with the biggest rise coming in the number of non-domestic animals rescued, according to the data. Here we analyse that breakdown: Code

animal_rescue

June 2021 : Ask a Manager

Using a linear model to analyse Ask a Manager data from the technology sector in the UK to determine which is the largest explanatory variable toward differences in salaries within the sector: Code

ask_a_manager_panel

June 2021 : SurvivoR

Using multiple linear models to predict what personality traits and what viewership to expect in future seasons of Survivor, which has so far run for 40 seasons and is due to release its next in Fall 2021: Code

final_survivor_panel

June 2021 : Super Mario Kart R

Originally Mario Kart 64 was called Super Mario Kart R (Rendered), missed a trick I say! Using a Decision Tree fitted with recursive partitioning to predict whether sneaky KartRs used a shortcut: Code
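A decision tree of this kind can be fitted with the `rpart` package. The sketch below runs on simulated data: the column names, and the rule that fast runs usually involve a shortcut, are made up for illustration and do not come from the Mario Kart records.

```r
library(rpart)

set.seed(64)

# Simulated stand-in for the speed-run records.
n <- 400
runs <- data.frame(
  time_secs = runif(n, min = 20, max = 200),
  run_type = factor(sample(c("three_lap", "single_lap"), n, replace = TRUE))
)

# Made-up rule: runs under 80 seconds mostly used a shortcut.
p_shortcut <- ifelse(runs$time_secs < 80, 0.9, 0.1)
runs$shortcut <- factor(ifelse(runif(n) < p_shortcut, "Yes", "No"))

# Classification tree grown by recursive partitioning.
tree <- rpart(shortcut ~ time_secs + run_type, data = runs, method = "class")

# In-sample accuracy (optimistic; a held-out set gives a fairer estimate).
pred <- predict(tree, newdata = runs, type = "class")
accuracy <- mean(pred == runs$shortcut)
```

Calling `print(tree)` lists the splits the algorithm chose, which is usually the quickest way to check whether the tree has found the rule you expected.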

mario64_rpart_final_panel

May 2021 : Broadband Speed Availability

Using data from Microsoft via The Verge to create a county-level visualisation of access to broadband speeds (25 Mbps), while also showing the linear-model association of this access with poverty rates from SAIPE: Code

Broadband Availability

May 2021 : Water Point Data Exchange (WPDx)

Using data from the WPDx to visualise Uganda's water sources recorded in the central database: Code

Functional Water Sources

Uganda Panel FVV

May 2021 : CEO Departures (1996-2019)

Using data from Gentry et al. to analyse trends of reasons for departures of CEOs in the S&P1500 over the period 1996 to 2019: Code

S&P1500 CEO Departures 1996-2019

April 2021 : Establishing the United States : Using USPS and SlaveVoyages data to visualise the settlement and establishment of the United States

Using Richard Helbock's data (2021) on USPS offices established and discontinued, as well as SlaveVoyages data on where ships made port, to visualise both over time: Code

Lives to Letters

April 2021 : Deforestation : Exploring the relative change of forest area from 1993 to 2020

Modelling relative forest area change within countries, contributing to a 0.15Bn net loss: Code

Deforestation

April 2021 : BLM : Proportion of UK Pop and Arrests within respective ethnicities

Modelling the ONS 2018/19 dataset to show the delta in pop and arrest proportions: Code

Ethnic Arrests

March 2021 : UN Voting Since 1946, Variance of Consensus

Modelling the variation of disagreement and consensus within the UN across issues: Code

How United are the United Nations

March 2021 : Bechdel Test and the Oscars

Taking a look at female representation at the Oscars through scraping Oscar nominations and wins from 1971-2013 and representing what proportion pass the Bechdel Test: Code

Bechdel_at_the_Oscars

March 2021 : Trans-Atlantic Enslavement Routes

Using data analytics to model Trans-Atlantic enslavement routes using SlaveVoyages.org datasets. Joining Enslavement Routes and Enslaved Names datasets: Code

Interrupted

February 2021 : Proportion of Arrests Among British Ethnic Groups

Using Office for National Statistics (ONS) data to model and represent arrests in Britain across ethnicities. Joining ethnic population and arrests data per region: Code

AER11

February 2021 : The Du Bois Challenge (BLM2021)

Challenge and context: here

"One of the most powerful examples of data visualization was made 118 years ago by an all-black team led by W.E.B. Du Bois only 37 years after the end of slavery in the United States. “The Exhibit of American Negroes” was a sociological display at the 1900 Exposition Universelle in Paris and was a collaboration by noted African-American sociologist W. E. B. Du Bois, educator and social leader Booker T Washington, prominent black lawyer Thomas J. Calloway and students from historically black college Atlanta University. Any African American to be admitted to Harvard University in 1888 had to be exceptionally gifted. But that description doesn’t come close to capturing the talent of WEB Du Bois, a man who managed to write 21 books, as well as over 100 essays while being a professor and a relentless civil rights activist.

The goal of the challenge is to celebrate the data visualization legacy of W.E.B DuBois by recreating the visualizations from the 1900 Paris Exposition using modern tools."

Free-Libre Plate

Code

Original Plate to recreate using R

EsclavesCliff_Final

Comparative Increase of White and Colored Population in Georgia

Code

Original Plate to recreate using R

DuBois_Georgia_Final

January 2021 : ONS Income Distribution

Using Office for National Statistics (ONS) income distribution data for the UK to show percentiles and expand on the data the ONS represented. See datasets and original data vis, here

Code

Data Vis of ONS Income Distribution (Replication of ONS)

ONS Income Distribution in R

Data Vis of Full Data Set on Log Scale

Full ONS dataset on log scale