Materials for D-Lab / UC Berkeley Graduate Division's Data Science for Social Justice summer workshop. These materials provide an introduction to Python, natural language processing, text analysis, word embeddings, and network analysis. They also include discussions on critical approaches to data science to promote social justice.

Data Science for Social Justice Workshop

DataHub Binder License: CC BY 4.0

Workshop Goals

This 8-week workshop introduces the essential tools and methods of data science alongside critical frameworks for examining them. You will use both to create a project of your own design and to tell stories that can counter the market-first mentality of data science. The workshop places a heavy emphasis on collaboration and peer-to-peer learning, with a significant group work component.

The course has three basic components. In the computational live sessions, we use Python to analyze a language dataset with computational tools. In the discussion live sessions, we engage with the literature on fairness, accountability, and transparency in data science and machine learning, taking a critical lens to how the tools produced by these fields affect people and society. Lastly, in the research talks, we hear from leading experts in the field about how they approach these topics.

You are not expected to have prior programming knowledge, and you are not evaluated on your coding skills. However, this course does make heavy use of Python, particularly through Jupyter Notebooks, so you will have the chance to work on your coding skills and your understanding of some techniques and approaches in natural language processing (NLP).

Installation Instructions

We will use the Python programming language in this workshop. To interact with Python, we will use Jupyter Notebooks, which provide an effective way to conduct data analysis and create visualizations in Python.

There are two ways to get the materials in this workshop working on your computer:

  1. The first is to run the materials directly on your personal machine by installing Python and Jupyter on your computer. This is often referred to as a local installation or running Python locally.
  2. The second is to run the materials in the cloud via a service called DataHub. With this option, you do not need to install anything. You simply log in to DataHub using a specific link (accessible via the DataHub buttons in this README), and you will automatically be redirected to a Jupyter notebook that runs in the cloud.

Local Installation

In a local installation, Python, Jupyter, and the other necessary packages all need to be installed on your computer. Anaconda is a software distribution commonly used to carry out all of these installations, and installing it is the easiest way to make sure you have the software needed to run the materials for this workshop. Complete the following steps:

  1. Download and install Anaconda. Click "Download" and choose the correct option for your operating system.

  2. Locate the downloaded file and install it. Follow the steps, using the default options.

  3. Next, you need to download the materials in this repository to your computer.

  • Click the green "Code" button in the top right of the repository information.
  • Click "Download ZIP".
  • Extract this file to a folder on your computer where you can easily find it again.
  4. A program called "Anaconda Navigator" should now be installed on your computer. Start this program. Then, under "JupyterLab", click the "Launch" button.

  5. On the left-hand side, use the file navigator to locate the workshop materials you downloaded.
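
With JupyterLab running, a quick check like the one below can confirm that the local installation worked. This is a minimal sketch, not part of the workshop materials: the helper name `missing_packages` and the package list are assumptions chosen for illustration, so substitute whichever packages the notebooks you open actually import.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list for illustration only; the packages the workshop
# notebooks need may differ from these.
packages_to_check = ["numpy", "pandas", "matplotlib"]

print("Python version:", sys.version.split()[0])
print("Missing packages:", missing_packages(packages_to_check))
```

An empty list of missing packages suggests the environment is ready. If a package is listed, it was not found in the Python environment that JupyterLab is using.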

Running Python on DataHub

You can also use the UC Berkeley DataHub to run the materials for these lessons. You can access the DataHub by clicking the following button:

DataHub

The DataHub downloads this repository, along with any necessary packages, and allows you to run the materials in a Jupyter instance on UC Berkeley's servers. No installation is necessary on your end: you only need an internet browser and a CalNet ID to log in. By using the DataHub, you can save your work and come back to it at any time. When you want to return to your saved work, go straight to the D-Lab DataHub, sign in, and click on the Data-Science-Social-Justice folder.

If you don't have a Berkeley CalNet ID, you can still run these lessons in the cloud, by clicking this button:

Binder

By using this button, however, you cannot save your work.

Run the Code

Now that you have all the required software and materials, you need to run the code:

  1. Open the Anaconda Navigator application. You should see the green snake logo appear on your screen. Note that this can take a few minutes to load up the first time.

  2. Click the "Launch" button under "JupyterLab" and navigate through your file system to the Data-Science-Social-Justice folder you downloaded above.

  3. Navigate to lessons -> week-1-2.

  4. Open the 01_Jupyter_and_Python.ipynb notebook to begin.

  5. Press Shift + Enter (or Ctrl + Enter) to run a cell.
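
If you would like to try running a cell before working through the notebook, a throwaway cell like the following is enough. It is an illustrative sketch only and is not taken from the workshop notebooks; the sentence and the variable names are made up. Type it into an empty cell and press Shift + Enter.

```python
from collections import Counter

# A made-up sentence, used only to have some text to count.
text = "data science for social justice and data for people"

# Split the sentence on whitespace and count how often each word appears.
word_counts = Counter(text.split())

# Show the two most frequent words along with their counts.
print(word_counts.most_common(2))
```

If the cell runs and prints a short list of word counts beneath it, your setup is working.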

About the UC Berkeley D-Lab

D-Lab works with Berkeley faculty, research staff, and students to advance data-intensive social science and humanities research. Our goal at D-Lab is to provide practical training, staff support, resources, and space to enable you to use Python for your own research applications. Our services cater to all skill levels, and no programming, statistical, or computer science background is necessary. We offer these services in the form of workshops, one-to-one consulting, and working groups that cover a variety of research topics, digital tools, and programming languages.

Visit the D-Lab homepage to learn more about us. You can view our calendar for upcoming events, learn about how to utilize our consulting and data services, and check out upcoming workshops.