This repo serves as a template for the SDS 237 Group Project. It contains template files and rubrics for the group contract, project checkpoints, and final project.
In May 2009, Data.gov, a web portal for accessing US government datasets, was launched by then-federal Chief Information Officer Vivek Kundra. Following this, in December 2009, the Obama administration issued the Open Government Directive, which required every federal agency to post at least three high-value datasets on Data.gov within 45 days. A few years later, in May 2013, President Obama signed an Executive Order to "Mak[e] Open and Machine Readable the New Default for Government Information."
The Order required the US Office of Management and Budget, in collaboration with the federal Chief Information Officer (CIO) and Chief Technology Officer (CTO), to issue and oversee an Open Data Policy. This policy required the following:
- Data needs to be published in machine-readable formats
- Data needs to be licensed openly
- Data needs to be described with metadata
For this assignment, I would like you to imagine that the Data.gov Program Management Office in the US General Services Administration's Technology Transformation Services, the Office of Government Information Services (OGIS), and the Office of Management and Budget (OMB) are jointly championing a new initiative to improve the accessibility of open government metadata. Administrative metadata, which describes how datasets are managed or stewarded, is published with most open government datasets. Similarly, descriptive metadata is published in the form of data dictionaries that provide official definitions of the observations and variables present in the dataset. However, user-friendly descriptions of how the dataset was produced, how standard definitions were chosen, how categories were divided, how measurements are taken, what assumptions and judgments are built into the data, why certain information may be missing, and where and how the data is referenced are often harder to come by and are rarely published in a succinct format accessible to diverse users.
To help advance this initiative, imagine that the offices will be contracting with several teams of consultants to develop a series of dataset user guides. Unlike many existing metadata formats, dataset user guides will be narrative documents. While advancing a similar municipal-level initiative, the NYC Open Data Team (with Julia Marden, Tiny Panther Consulting, Sharon Lintz, and Jo Polanco) defined dataset user guides as describing "the content of a dataset, how it was created, the agency who maintains it, and how users can begin to use the data."
This semester, your project group will develop a user guide for a federal dataset. In the in-class labs each Thursday, you will work through exercises that apply that week's course concepts to the study of this dataset. While these labs will include both qualitative and computational exercises, the point of this project is not to perform a statistical analysis of this dataset. Instead, you will study the dataset ethnographically and strive to communicate effectively how the dataset came to be, how it has been disseminated, how it can be used, and what some of its limitations are. To frame your writing, imagine that the audience for this user guide is an advanced statistics and data science student who will be engaging with this dataset for the first time. We will talk more about data documentation in later weeks of this course.
- Apply the course concepts in studying the provenance of a dataset
- Develop skill in analyzing social forces and systems
- Produce effective and descriptive data documentation that puts data in context
- Develop skill in authoring Markdown documents and using GitHub's version control features
- Plan and execute a substantive writing revision, while evaluating feedback
| Question | Response |
|---|---|
| How effectively does this user guide demonstrate understanding of course concepts? | |
| How effectively do the authors of this user guide analyze social systems and forces impacting the dataset? | |
| How effectively do the narratives and visuals presented in this user guide contextualize the dataset for a technical audience? | |
| In which areas of the user guide could the authors have offered additional details, thicker description, or a more nuanced interpretation? | |
| How effective is the explication of concepts and organization of content in this user guide? | |
Please see the license file for details.
Contact lpoirier@smith.edu