This is the summary documentation of my course study of Data Structures at Parsons' Data Visualization MS porgram in Fall 2021. Professor Hill guided us through the entire workflow of data processing in the backend environnment.
- Extracting existing content from web
- Data cleaning
- Fetching data via API
- SQL and NoSQL database schema design
- Setting up SQL(RDS) and NoSQL(DynamoDB) databases in AWS
- Writing data into SQL and NoSQL databases
- Sending query requests using
express
to databases - Surfacing query request output and packaging web content using templates
Week 1 - Use Node.js got
method to extract html body content of 10 Manhattan AA meeting links and save them into text files. Create a loop to loop through the urls and save the text files accordingly.
Week 2 - Use ceerio
to extract and save content. Observe the HTML element structure in one of the AA text files, use appropriate selectors to extract address content. Use various string methods to clean up content and save to a JSON file.
Week 3 - Use API to extract additional data using existing data as query component. Write data into a composite array structure and save to a JSON file.
Week 4 - Part 1: Learn and design a SQL database schema. Part 2: Create PostgreSQL database and write data into the database.
Week 5 - Part 1: Design a NoSQL data model that converts this running list of fashion investments into a more reader-friendly format with the ability to search, filter and aggregate. Part 2: Refine data schema design, create a NoSQL DynamoDB database and write data into the database.
Week 6 - Query data in PostgreSQL and NoSQL databases.
Week 7 - Complete parsing and cleaning of 10 Manhattan AA meeting data and write into SQL database.
Week 8 - Create sketches (wireframes) for the visualizations for AA meeting finder and process blog.
Week 10 - Finalize the design of the visualizations for AA meeting finder and process blog.
Week 11 - Create an application that allows querying and returning data for AA meeting and process blog.
For detailed documentation for each week's assignment, refer to the individual README.md files.