
Data journalism and easy-to-replicate notebooks using Python, R, and web visualisations


Data Journalism

If you are a data journalist looking to improve your coding skills, or a developer supporting a newsroom, you have arrived at the right place.

This is a repository of articles and tutorials about doing data journalism, presented as IPython/Jupyter notebooks or web products. Apart from analysing data to present facts about the current, past, and sometimes future state of the world, the articles include programming instructions explaining how to repeat the analysis yourself. We live in a world where governments and the media, more often than not, serve the interests of a few. We believe that empowering people to do their own analysis and reach conclusions based on facts (data) makes us all more aware and stronger as a society.

The programming instructions will be given as web notebooks for the language or technology used (e.g. Python, R), or sometimes as web code that you can inspect in the repo. Notebooks are an ideal way of sharing code combined with textual explanations, charts, images, etc. We will sometimes use other technologies (mainly R, but also JavaScript or Spark), but we will tend to favour Python. Why Python? We think it has some characteristics that make it a good environment for data journalism:

  • It is a modern programming language, very clean and expressive, that promotes simplicity and elegance.
  • It can be used to write scripts (as we will use it most of the time) and also to build complex software systems.
  • There are lots of extensions (i.e. libraries) to perform all sorts of tasks, not just data analysis and visualisation ones, but also web scraping, web development, natural language processing, etc.
  • You can share your code as notebooks!

So our hope is that, while finding our articles' analysis and conclusions interesting, you will also learn how to repeat and extend them yourself and arrive at additional conclusions.

About me

My name is Jose A. Dianes and I am a data analyst and developer. Over the years I have been involved in all sorts of software projects, including real-time systems, enterprise web systems, and bioinformatics. Eventually I moved into data analysis and data products, where I solve scalability problems and deliver products that provide actionable knowledge.

You can contact me easily at my personal website.

Articles

Where we analyse the situation of infectious tuberculosis from 1990 to 2007 using WHO datasets.
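To give a flavour of this kind of analysis, here is a minimal stdlib-only sketch, not the notebook's actual code. It assumes a WHO-style table with one row per country and one column per year; the column names, the `SAMPLE_CSV` string and its numbers are hypothetical placeholders, not real WHO figures. It sums all countries per year and ranks countries by their change between two years.

```python
import csv
import io

# Hypothetical layout: one row per country, one column per year.
# The numbers below are made-up placeholders, NOT real WHO figures.
SAMPLE_CSV = """country,1990,2007
Country A,120,80
Country B,300,310
Country C,50,20
"""


def load_wide_table(text):
    """Parse a country-by-year CSV into {country: {year: value}}."""
    reader = csv.DictReader(io.StringIO(text))
    table = {}
    for row in reader:
        country = row.pop("country")
        # Remaining keys are year columns; convert both keys and values.
        table[country] = {int(year): float(value) for year, value in row.items()}
    return table


def yearly_totals(table):
    """Sum the values of all countries for each year, sorted by year."""
    totals = {}
    for by_year in table.values():
        for year, value in by_year.items():
            totals[year] = totals.get(year, 0.0) + value
    return dict(sorted(totals.items()))


def change_between(table, start, end):
    """Absolute change per country between two years, largest drop first."""
    changes = {c: by_year[end] - by_year[start] for c, by_year in table.items()}
    return sorted(changes.items(), key=lambda item: item[1])


if __name__ == "__main__":
    table = load_wide_table(SAMPLE_CSV)
    print(yearly_totals(table))
    print(change_between(table, 1990, 2007))
```

Replacing `SAMPLE_CSV` with the contents of a downloaded file would run the same steps on real data, provided its layout matches this assumption.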

Where we use Bokeh to represent the same dataset with a simple heatmap and look for visual clues.
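Heatmaps like this one are often drawn from a long table with one row per cell (country, year, value) rather than from the wide country-by-year layout. The sketch below is an assumption about that preparation step, not the notebook's code: it flattens a `{country: {year: value}}` dictionary into parallel column lists and gives each cell a colour from a hypothetical three-shade palette by min-max scaling its value.

```python
# Hypothetical three-shade grey palette, light (low values) to dark (high values).
PALETTE = ["#eeeeee", "#999999", "#333333"]


def to_heatmap_columns(table, palette=PALETTE):
    """Flatten {country: {year: value}} into parallel lists plus a colour per cell."""
    columns = {"country": [], "year": [], "value": []}
    for country in sorted(table):
        for year in sorted(table[country]):
            columns["country"].append(country)
            columns["year"].append(str(year))  # years as categorical axis labels
            columns["value"].append(table[country][year])

    values = columns["value"]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero when all values match
    # Min-max scale each value to [0, 1], then pick the matching palette bucket.
    columns["colour"] = [
        palette[min(int((v - low) / span * len(palette)), len(palette) - 1)]
        for v in values
    ]
    return columns
```

A dict of equal-length lists is the shape Bokeh's `ColumnDataSource(data=...)` accepts, so the result could feed a grid of `rect` glyphs; check the Bokeh documentation for the exact glyph arguments. Bokeh also provides colour mappers such as `LinearColorMapper` that can replace the manual palette lookup.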

Where we show how to use a RESTful-like API to get JSON data using Requests, save the JSON data to a file, do Exploratory Data Analysis using Pandas, and generate a data visualisation using Seaborn and mpld3. All this to explore the Wine.com catalog and get an impression of what the wine market looks like from Wine.com's perspective.
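The fetch-and-save steps might look roughly like the sketch below. The endpoint `API_URL`, the `price` field and the list-of-products shape are hypothetical (the actual Wine.com API details are not given here), and `price_summary` is a stdlib stand-in for the Pandas exploration, not the notebook's code. Requests is imported inside `fetch_catalog` so that the saving, loading and summarising functions work even where it is not installed.

```python
import json
from pathlib import Path

# Hypothetical placeholder endpoint: the real Wine.com API URL and key are not given here.
API_URL = "https://api.example.com/catalog"


def fetch_catalog(url=API_URL, params=None):
    """GET a JSON document with Requests (imported lazily so the rest runs without it)."""
    import requests

    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def save_json(data, path):
    """Write parsed JSON to disk so the analysis can be re-run offline."""
    Path(path).write_text(json.dumps(data, indent=2), encoding="utf-8")


def load_json(path):
    """Read a JSON file previously written by save_json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def price_summary(products, key="price"):
    """Tiny stand-in for EDA: count, min, max and mean of a numeric field."""
    prices = [p[key] for p in products if isinstance(p.get(key), (int, float))]
    if not prices:
        return {"count": 0}
    return {
        "count": len(prices),
        "min": min(prices),
        "max": max(prices),
        "mean": sum(prices) / len(prices),
    }
```

Saving the raw response first means the later analysis can be re-run without calling the API again. With Pandas installed, `pandas.DataFrame(products)["price"].describe()` gives a fuller summary of the same field.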

Contributing

Contributions are welcome! For bug reports or requests please submit an issue.

Contact

Feel free to contact me to discuss any issues, questions, or comments.

License

This repository contains a variety of content; some developed by Jose A. Dianes, and some from third-parties. The third-party content is distributed under the license provided by those parties.

The content developed by Jose A. Dianes is distributed under the following license:

Copyright 2016 Jose A Dianes

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.