/yaml-dredger

Have you ever had 1000+ YAML files and needed to know the common data structure?

Primary LanguageJavaScriptGNU General Public License v3.0GPL-3.0

Analyse YAML document structure

This utility can analyse the data structure of a large set of YAML files. It will provide a list of the entities, their attributes and relation to other entities.

Start by running:

npm install

You can run program using:

node yaml-dredger.js <filename or path> [lists...]

For example, process all files in the testdata directory. ['innings', 'deliveries'] are entity lists in the files.

node yaml-dredger.js ./testdata innings deliveries

My test data is cricket match data from https://cricsheet.org/format/

meta:
  data_version: 0.9
  created: 2013-02-22
  revision: 1
info:
  city: Southampton
  dates:
    - 2005-06-13
  gender: male
  match_type: T20
  overs: 20
  player_of_match:
    - KP Pietersen
  teams:
    - England
    - Australia
  umpires:
    - NJ Llong
    - JW Lloyds
  venue: The Rose Bowl
innings:
  - 1st innings:
      team: England
      deliveries:
        - 0.1:
            batsman: ME Trescothick
            bowler: B Lee
            non_striker: GO Jones
            runs:
              batsman: 0
              extras: 0
              total: 0
        - 0.2:
            batsman: ME Trescothick
            bowler: B Lee
            non_striker: GO Jones
            runs:
              batsman: 1
              extras: 0
              total: 1
        ...

Run this program on your directory of data and get a read of the entities, attributes and their relations. An Entity is marked with a star *, you get the data type and number of occurrences.

/usr/bin/node /home/neil/WebstormProjects/cleandata/app.js
 meta * 1
-- data_version (number) 3088
-- created (date) 3088
-- revision (number) 3088
 info * 1
-- city (string) 2591
-- dates * 3088
---- value (variant) 3088
-- gender (string) 3088
-- match_type (string) 3088
-- overs (number) 2656
-- player_of_match * 2849
---- value (string) 2849
-- teams * 3088
---- value (string) 3088
-- umpires * 3083
---- value (string) 3083
-- venue (string) 3088
 innings * 1
-- value (string) 3088
-- team (string) 3088
-- deliveries * 3088
---- value (string) 3088
---- batsman (string) 3088
---- bowler (string) 3088
---- non_striker (string) 3088
---- runs * 3088
------ batsman (number) 3088
------ extras (number) 3088
------ total (number) 3088
...