A more complete collection of data journalism tutorials and a "how-to-learn" guide.
You mentioned a couple of times that you are interested in trying to learn some data journalism stuff. You may have meant a gentle swath of Excel tutorials or a few light OPRAs. Too bad. You are getting ALL THE DATA, by which I mean all of my data journalism bookmarks and other things regurgitated from the deep corners of my brain. You can open the file by saving it in a folder and double-clicking -- it should open in a browser, allowing you to see all of the links. Not the best formatted, but that's just the way stupid bookmark exports work.
When it comes to learning data journalism, some people (like me) prefer to browse tutorials or read blogs in their own time, and others (like, say, Steve) really need a project or an inspiration to get started on learning something. So you may find yourself putting this in your saved folder and just using it when the story comes up. That's okay. It's not like my learning has proceeded in a completely linear format, either. But fyi, a couple of these tutorials start with an example dataset that you could use, or one similar to something you'd use.
These are some places that have multiple tutorials, meta-collections, and/or a big, broad scope:
- My Excel/Datawrapper tutorial -- The aforementioned Excel/Datawrapper tutorial.
- Steve's data j repo -- Steve's repo of tutorials, mostly collected from NICAR sessions.
- NICAR 2018 -- Chrys Wu's collection of NICAR presentations. She puts one together every year, and this is the latest. Honestly, with both of these I'd just skim through and see what looks interesting and can be knocked off in a couple of hours.
- https://github.com/dannguyen/journalism-syllabi -- A sampling of data journalism course syllabi from a very prominent data journalism professor.
- How to set up your computer like a data journalist -- A very helpful introduction to setting up your computer with the packages and tools we use a lot. Requires admin access.
- Learning interactive journalism, a one-semester syllabus -- A guide to what you should learn, and in what order, if you want a general introduction to the field. It's far from the only free online course out there -- honestly, except for a few instances where you're having a hard time grasping something, you shouldn't need to pay to learn data journalism. I never have, although I've heard that a structured coding course can help some people if they're stuck on programming.
- Factfinder -- Not a tutorial, but your no. 1 source of data from the Census, which is the biggest, baddest dataset in the whole darn town.
Greatest hits:
These are the tutorials and tools I find myself going back to a lot or found extremely helpful. Some of these are pretty advanced, but I'm including them in case you need them in the future, not necessarily because I expect a newbie to pick them up.
First, the essential tools:
- Datawrapper, Plotly, Flourish, Carto -- The out-of-the-box tools we use to create basic charts and maps.
- Visual Vocabulary -- A helper guide to decide what chart to use. This is also helpful: http://chartmaker.visualisingdata.com/
- Tabula -- A so-easy-your-grandma-could-do-it way to pull tables out of PDFs and into spreadsheets.
- csvkit -- A suite of command-line tools that lets you process datasets too big for Excel. Yeah, it happens sometimes. Pretty easy to learn once you've grasped the command line. Another, prettier tutorial from NICAR here: https://github.com/utdata/csvkit-nicar2018/blob/master/README.md
- QGIS -- A free mapping application that can handle large, complex geographic datasets. My fav tutorial is here.
- http://mjwebster.github.io/DataJ/ - Mary Jo Webster, a professor at U Minnesota, has the best Excel tutorials I've found on the interwebs. This page also has its own collection of data journalism links.
- http://lenagroeger.s3.amazonaws.com/cuny-fall15/index.html -- Lena Groeger, another rather famous professor and ProPublica's top designer, has the complete course materials for her design for data journalism course here. Keep in mind that as a student you'd likely get Adobe's software for free or at a significant discount -- it's worth checking out. If you also have the chance to learn Photoshop/basic editing skills, that's another marketable skill to add to your resume.
- Also by Lena: Making data gifs
- The front-end checklist for designing websites
- SRCCON: Creating a style guide
- The complete, free Python data science handbook available online.
- First Python Notebook: an introduction to Jupyter, Pandas and Python data analysis using data from the California Civic Data Coalition.
- https://tswicegood.github.io/python-data-science-intro/ -- Hardly the only introduction to Python for data science, but I found it very accessible for journalists specifically. It walks you through everything from installing packages to analyzing a real dataset of car accidents in New Jersey. At 1.7 million rows, that dataset is bigger than Excel's limit of 1,048,576 rows per worksheet, so Excel is useless here.
- Carla Astudillo also has a great tutorial: https://github.com/CarlaAstudillo/pandas-nicar-2016/tree/6dfc8009e72ed866ba2cd19af66a9f2b515f8f72
- Visualizing data in Python
- Data Wrangling in Agate
- Your first news app with Flask
- Scraping the web with Beautiful Soup -- https://www.dataquest.io/blog/web-scraping-tutorial-python/
- Fundamentals of data visualization, a complete, free online book
- D3 MIT visualization course -- I would put this in the very advanced category, but when you're ready to learn D3, this is one of the best I've found -- especially for D3 version 4. If you want to scream in frustration over a block of code, learn D3. If you want to create incredible data visualizations that are impossible to do with out-of-the-box tools, learn D3.
- Another popular D3 tutorial: Aligned Left
- Intro to JS tutorial from NICAR: https://github.com/scottpham/JS2WorkshopNICAR2016
- An outline of mapping tools and tech
- Intro to Leaflet
- Math for beginning reporting
- Scraping without programming
- https://docs.google.com/document/d/1D82CY83sP42ik-f8GqCRQ6-E5Th3n-fcMmw0jkICA9Q/edit# -- A basic introduction to SQL, another language used to deal with datasets too big or too complicated for Excel. If you want to use it without setting up MySQL or other database software, use data.world.
- Hitchhiker's Guide to data science, machine learning, R, and Python
- Getting started on machine learning for reporting
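Excel caps a worksheet at 1,048,576 rows, which is why something like the 1.7-million-row New Jersey crash dataset in the Python tutorial above needs csvkit, pandas, or SQL instead. To show the basic idea, here's a minimal sketch using only Python's built-in `csv` module: it streams the file one row at a time instead of loading everything into memory, and tallies how many rows share each value of one column (roughly what a pandas `value_counts()` gives you). The function name `count_by_column`, the file `crashes.csv`, and the `county` column are hypothetical placeholders, not taken from any of the tutorials.

```python
import csv
from collections import Counter


def count_by_column(path, column):
    """Tally how many rows share each value of `column` in a CSV file.

    Rows are read one at a time, so memory use stays small even for files
    with more rows than Excel can open. `column` must match a header name.
    """
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        # DictReader uses the first line of the file as column names
        for row in csv.DictReader(f):
            counts[row[column]] += 1
    return counts
```

Usage, assuming a hypothetical `crashes.csv` with a `county` column: `count_by_column("crashes.csv", "county").most_common(10)` returns the ten counties with the most rows, as (county, count) pairs.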
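The SQL intro above suggests data.world if you don't want to set up MySQL. Another zero-install option (my substitution, not something the linked doc covers) is SQLite, which ships with Python as the `sqlite3` module. This minimal sketch loads a few rows into a throwaway in-memory database and runs the classic `GROUP BY` query behind so many "how many per X?" stories. The `violations` table, its columns, and the function name are made up for illustration.

```python
import sqlite3


def violations_by_city(rows):
    """Return (city, count) pairs, most violations first.

    `rows` is a list of (restaurant, city) tuples, one per violation.
    The table and column names here are hypothetical.
    """
    conn = sqlite3.connect(":memory:")  # temporary database, nothing to install
    try:
        conn.execute("CREATE TABLE violations (restaurant TEXT, city TEXT)")
        conn.executemany("INSERT INTO violations VALUES (?, ?)", rows)
        query = """
            SELECT city, COUNT(*) AS n
            FROM violations
            GROUP BY city
            ORDER BY n DESC, city
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

The same `SELECT ... GROUP BY ... ORDER BY` pattern works in MySQL or on data.world; only the connection step changes.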
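Two formulas from beginner reporting math come up constantly: percent change, (new - old) / old × 100, and a rate per 100,000 people, count / population × 100,000. Here's a minimal sketch of both in Python. The function names are my own, not from the "Math for beginning reporting" guide.

```python
def percent_change(old, new):
    """Percent change from `old` to `new`; 50 -> 75 is a 50% increase."""
    if old == 0:
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100


def rate_per(count, population, per=100_000):
    """Rate per `per` people, e.g. crimes per 100,000 residents."""
    return count / population * per
```

Comparing rates instead of raw counts keeps a big county from looking worse just because more people live there.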
One of the most important things to learn when learning data journalism is how to lean on others. Like me, sure, but even I don't know everything. Here are some of the best places to go to learn about the field, network, and see examples of good work:
- NICAR: I would highly, highly recommend joining the Investigative Reporters and Editors group if you haven't already. It's only $25 a year for students and comes with access to audio from conferences, video tutorials, clean datasets, and a discount to the NICAR conference when it rolls around.
- The NICAR-L listserv frequently has people chime in with questions and answers, talk about their projects, etc.
- Similarly, the News-Nerdery Slack has a ton of discussion on coding and data viz examples, and has a ladies-only channel. There's also a MuckRock Slack that has FOIA advice, a DocumentCloud Slack, the Lonely Coder's Club...lots of options.
- GitHub: Get an account, follow some people, look at some projects. There's a reason so many of the tutorials above use GitHub -- it's a hub that makes it easy to publish your code.
- Data.world kind of aims to be the GitHub of data. You can publish and analyze your own datasets and find examples from other people.
- Stack Exchange/Stack Overflow -- not data journalism-specific, but has many, many answers to coding questions. Remember, if you've been stuck on a problem, other people probably have too.
- https://twitter.com/EPetenko/lists/data-investigations -- my list of data, viz, investigative and other helpful outlets and journalists. For weekly roundups of the best projects, there's Rachel Schallom's Best in Visual Storytelling, Jeremy Singer-Vine's Data Is Plural, Ren LaForme's Poynter Try This, Sophie Warnes' Fair Warning (which is British but w/e), Peter Yeung's 1801, and OpenNews' Source.