A parser which indexes unstructured collections of data representing William Branham's complete sermon library and structures them for loading into a data ingester.
This project is part of a three-part system which collectively stores, indexes, and then outputs a collection of sermons as JSON files:
- Original Sources: Sermon metadata from various online data sources
- Indexer: This project. It takes the information from the original sources and processes it into the output that makes up the golden dataset
- Golden Dataset: JSON files which are generated by the indexer and manually uploaded to the repository
To run this project and generate the intended output, you will need the latest version of Node.js. Once it is installed, run these commands:
git clone https://github.com/branham-player/indexer.git
git clone https://github.com/branham-player/original-sources.git
cd indexer
npm install
npm run all
The result of a successful execution is the presence of three new files in the root folder:
- full.json: A complete dataset which contains all of the information the program could gather from the original sources
- condensed.json: A shortened version of the full.json file which contains the most essential pieces of information for the everyday user
- months.json: Counts the number of years and months which are present in full.json