- With a single file output for use as histogram input:
python parser/parse.py -j data_all/ -o dumps/
- For multi-file output:
python parser/parse.py -j data_all/ -o dumps/ -m
Histograms triggered with -c flag:
- Run all available fields with defaults set:
python analysis/hist.py -c
- Specific field histogram with max y-axis = 750 and bucket size of 20:
python analysis/hist.py -f physicalDescription -m 750 -b 20 -c
- To generate a full-scale inset subplot for the title field:
python analysis/hist.py -f title -m 250 -b 10 -i dumps/output.csv -s -c
Bar chart triggered with -x flag:
- Generate a chart showing field population:
python analysis/hist.py -x
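For reference, the hist.py flags used in the commands above could be declared with argparse roughly as follows. This is a minimal sketch, not the actual contents of analysis/hist.py: the `dest` names, help strings, and `None` defaults are assumptions, and only the short flags themselves come from the examples above.

```python
import argparse


def build_parser():
    """Declare the short flags shown in the usage examples (names are assumed)."""
    parser = argparse.ArgumentParser(prog="hist.py")
    # -f: restrict processing to one field, e.g. "title"
    parser.add_argument("-f", dest="field", default=None,
                        help="single field to plot")
    # -m: maximum y-axis value; None here means "use the script's default"
    parser.add_argument("-m", dest="max_y", type=int, default=None,
                        help="max y-axis value")
    # -b: bucket (bin) size for the histogram
    parser.add_argument("-b", dest="bucket_size", type=int, default=None,
                        help="bucket size")
    # -i: input CSV, e.g. dumps/output.csv
    parser.add_argument("-i", dest="input_path", default=None,
                        help="input CSV file")
    # -s: add the full-scale inset subplot
    parser.add_argument("-s", dest="inset", action="store_true",
                        help="draw a full-scale inset subplot")
    # -c: produce histograms; -x: produce the field population bar chart
    parser.add_argument("-c", dest="histogram", action="store_true",
                        help="generate histograms")
    parser.add_argument("-x", dest="bar_chart", action="store_true",
                        help="generate the bar chart")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Boolean switches use `store_true` so they default to off, and the numeric options are typed as `int` so that a value such as `-m abc` is rejected at parse time.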
- Refactor
- Better handling of extract objects and CSV writing loop. Use extract_objects(args) again
- Currently deals with json_input only, amend to accept API calls as well as files
- Uses os.walk to process files in directory - amend to allow single file processing
- Want to amend output option to single or multiple files
- Currently only works with the flat data; nested data extraction still needs to be prepared
- Make sure the process overwrites output file on the initial run
- Need to sort header generation for single file parser - hard coded currently
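Several of the parser items above (single file processing, overwriting the output on the initial run, and header generation that is not hard coded) could be approached along these lines. This is a sketch under assumptions, not the current parser code: `iter_json_files`, `build_header`, and `write_csv` are hypothetical helper names, and records are assumed to be flat dicts held in a list.

```python
import csv
import os


def iter_json_files(path):
    """Yield JSON file paths from either a single file or a directory tree."""
    if os.path.isfile(path):
        # Single-file case: yield it directly, no directory walk needed.
        yield path
        return
    for root, _dirs, files in os.walk(path):
        for name in sorted(files):
            if name.lower().endswith(".json"):
                yield os.path.join(root, name)


def build_header(records):
    """Return the union of keys across flat records, in first-seen order."""
    header = []
    seen = set()
    for record in records:
        for key in record:
            if key not in seen:
                seen.add(key)
                header.append(key)
    return header


def write_csv(records, out_path):
    """Write a list of flat dict records to out_path and return the header."""
    header = build_header(records)
    # Mode "w" truncates an existing file, so each fresh run starts clean.
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        # restval fills in fields that a given record does not have.
        writer = csv.DictWriter(handle, fieldnames=header, restval="")
        writer.writeheader()
        writer.writerows(records)
    return header
```

Deriving the header from the records costs one extra pass over the data, which is why `records` is assumed to be a list rather than a one-shot generator.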
- Refactor
- implement argparse
- specify input file
- allow override of bin size
- allow override of max axis
- allow to process single field type
- Use pandas to deal with multiple files. The section of the parser that outputs individually named processed files is currently commented out, as I haven't worked out how to get pandas to process multiple inputs
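One possible way to handle the multiple-file case from the last item is to read each CSV separately and concatenate the results. This is a sketch, assuming the multi-file output consists of CSV files in one directory such as dumps/; `load_dumps` and the `source_file` column are hypothetical names, not part of the existing code.

```python
import glob
import os

import pandas as pd


def load_dumps(dump_dir, pattern="*.csv"):
    """Read every matching CSV in dump_dir into a single DataFrame."""
    paths = sorted(glob.glob(os.path.join(dump_dir, pattern)))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern} in {dump_dir}")
    frames = []
    for path in paths:
        frame = pd.read_csv(path)
        # Record which file each row came from so per-file views stay possible.
        frame["source_file"] = os.path.basename(path)
        frames.append(frame)
    # Columns missing from a given file are filled with NaN in the result.
    return pd.concat(frames, ignore_index=True, sort=False)
```

The combined frame can then be passed to the existing histogram code in place of the single output.csv, or grouped by `source_file` if per-file plots are wanted.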