rladies/starter-kit

Convert Current-Chapters.md to a CSV

ledell opened this issue · 13 comments

This will make it easier to use this data in other applications (e.g. the rladies.org website). The best way to do it would be to write a quick script to parse the markdown file and turn it into a CSV. A template for what the CSV should look like is here: https://github.com/rladies/starter-kit/blob/master/Current-Chapters.csv

I got started on this issue. I transformed the data into a data frame with Country as the first column and everything else in the second column. Next, I need to break this column down into State, City, etc.

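library(readr)
library(dplyr)
library(tidytext)
library(stringr)
library(tidyr)

url <- "https://raw.githubusercontent.com/rladies/starter-kit/master/Current-Chapters.md"
# Read the whole file as one string so the "\n\n" paragraph breaks survive tokenizing
# (recent tidytext versions no longer join rows before regex tokenizing)
dat <- tibble(lines = read_file(url))

dat %>%
  # One token per "## <Country>" section; keep the original capitalization
  unnest_tokens(word, lines, token = "regex", pattern = "\\s+#{2}+\\s", to_lower = FALSE) %>%
  # Drop non-chapter text (tokens mentioning "ctrl+f" or " account ")
  filter(!grepl("ctrl\\+f|\\saccount\\s", word, ignore.case = TRUE)) %>%
  # Remove the first badge image ![](...) and the blank line after it
  mutate(word = str_replace(word, "!\\[\\]\\(.*?\\)\n\n", "")) %>%
  # The first paragraph is the country name; everything after it goes to "Else"
  separate(col = word, into = c("Country", "Else"), sep = "\n\n", extra = "merge")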
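The next step mentioned above, breaking the second column down, could start along these lines. This is a minimal sketch, not the script that was actually used for the CSV: it assumes each chapter sits on its own line inside `Else`, possibly prefixed by a `*` or `-` list marker, and it runs on a hypothetical toy `parsed` tibble rather than on real chapter data. Splitting further into State, City, etc. depends on the exact layout of Current-Chapters.md and is left open here.

```r
library(dplyr)
library(tidyr)
library(readr)

# Hypothetical toy input shaped like the parsed output above:
# one row per country, chapter lines still joined by "\n" in `Else`
parsed <- tibble(
  Country = c("Country A", "Country B"),
  Else    = c("City 1", "City 2\n\n* City 3")
)

chapters <- parsed %>%
  # One row per line; assumes each chapter sits on its own line
  separate_rows(Else, sep = "\n") %>%
  # Strip an assumed leading "*" or "-" list marker and surrounding spaces
  mutate(Else = trimws(sub("^[*-]\\s*", "", Else))) %>%
  # Blank lines between list items become empty strings; drop them
  filter(Else != "") %>%
  rename(Chapter = Else)

# Write to a temp file here; the real target would be Current-Chapters.csv
out <- tempfile(fileext = ".csv")
write_csv(chapters, out)
print(chapters)
```

In a real run, `parsed` would be the result of the pipeline above. `separate_rows()` repeats the `Country` value on every chapter row, which gives a one-row-per-chapter layout that maps directly onto a CSV.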

@stephaniehicks let me help!

@ledell what do you think about adding a status column? The badges in the .md are a good idea; this way we don't lose them.

Thanks @chucheria! I haven't had time to work on it and next week is really busy for me. If you could finish it, that would be awesome!

Looking good so far! When the transition is complete, I'd recommend that we remove the text from the .md file so we don't have two versions that can get out of sync. If we want to leave it there for a while in case people have linked to the file elsewhere, we could just edit the file with an update that says: "Chapter data has been moved to Current-Chapters.csv". Or we can delete it... either way.

@ledell I think the redirect is better, so people can find the new file (including us! I don't know where I've linked that file 😅)

The file is complete, @gdequeiroz and @lauracion can start using it from now on.

Great! Thank you all. I noticed that the .md file was updated 2 days ago and the .csv 4 days ago, so we will probably need to rerun the script.


@chucheria Can you manually add the two changes that @gdequeiroz added in the past few days? There are two commits that need to be added to the CSV. Then we should remove all data from Current-Chapters.md and replace it with a redirect so no one edits it anymore. Thanks!

I can also do it... I just didn't want to mess with your process.

I will wait for the "go for the .csv" before onboarding the few new chapter requests that came in over the past few days. Thank you!

I committed the changes plus the Sydney GitHub (which Danielle gave me for the social media page). I think both files have the same info now 😊

Thank u, Bea! We should now start using the .csv, correct?

The PR is not merged yet but yes!

Closed #34