/csv_reports_tool

A tool to make CSV reports more readable and sortable.

Primary LanguagePython

CSV Reports Expansion Tool

Purpose

This is a script that is meant to make the CSV reports generated by ArchivesSpace more readable in some scenarios.

The CSV reports generated by ArchivesSpace in most cases show linked records as a list of JSON objects. This looks something like the following.

[{"name":"Halpern, Elizabeth","function":"creator","role":null},{"name":"Halpern, Elizabeth","function":"subject","role":null}]

This tool creates a separate row for each JSON object in the list and makes a new column for each key in the JSON object. This should allow for increased sorting ability.

Example

The following is an example of how this would look for one row of report data.

Before

accession_number record_title deaccessions names
*2010M-1 Annotation by A. L. (Alfred Leslie) Rowse on clipping [{"description":"It's gone, man.","notification_sent":"Yes","date":"2018-04-06","extent":"0.00 "}] [{"name":"Halpern, Elizabeth","function":"creator","role":null},{"name":"Halpern, Elizabeth","function":"subject","role":null}]

After

accession_number record_title deaccessions_date deaccessions_notification_sent deaccessions_extent deaccessions_description names_name names_function names_role
*2010M-1 Annotation by A. L. (Alfred Leslie) Rowse on clipping 4/6/2018 Yes 0 It's gone, man. Halpern, Elizabeth creator
*2010M-1 Annotation by A. L. (Alfred Leslie) Rowse on clipping 4/6/2018 Yes 0 It's gone, man. Halpern, Elizabeth subject

Choosing Which Columns to Expand

By default, all columns with JSON data will be expanded. However, you can specify columns not to expand through the option -x or --skip-column. The need for this is explained below.

When there are multiple types of linked records on a row, this tool will generate the cross product (i.e. every possible combination of data). This means that you will have a lot of duplicate data if you use this tool for a report that has many linked records of different types and choose to expand all of them. For instance, if you had an accession that is linked to 2 deaccessions, 4 subjects, and 3 names, you would end up with 2x4x3=24 rows in your expanded CSV report. Its easy to see how this could get exessive quickly.

This is why this tool includes an option to specify columns not to expand. If, in the example above, you only really needed to be able to sort by the data for the linked subjects, you could choose not to expand the deaccessions and names columns so that you only ended up with 4 columns for that accession in your report.

How to Run

With python installed, run python expand.py path/to/my_report.csv.

This will generate path/to/my_report_expanded.csv.

A setup file is also provided that can be used to generate an exe with py2exe with python setup.py py2exe.

This exe can be run like expand path/to/my_report.csv.