Spark pipeline to ingest and query data

postman-pipeline

Multiple approaches to building Spark pipelines for data ingestion and querying.

Points to achieve

  • Code should follow OOP principles
  • Non-blocking, parallel ingestion
  • Update products in the table using sku as the primary key (upsert)
  • Count products, aggregated by name
  • Multiple notebook runs must not truncate the created table
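The goals above can be sketched outside Spark in a few lines of plain Python. This is a minimal, hypothetical illustration (the class name, schema, and batch shape are assumptions, not taken from the repo): an OOP ingestor that submits batches to a thread pool without blocking the caller, upserts rows keyed on sku so reruns never truncate the table, and aggregates product counts by name.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

class ProductIngestor:
    """Hypothetical sketch: parallel batch ingestion with sku-keyed upserts."""

    def __init__(self):
        # sku -> product row; reruns update rows in place instead of truncating
        self.table = {}

    def upsert(self, row):
        # sku acts as the primary key: an existing row is overwritten
        self.table[row["sku"]] = row

    def ingest_batch(self, batch):
        for row in batch:
            self.upsert(row)
        return len(batch)

    def ingest_parallel(self, batches, workers=4):
        # submit() returns immediately, so ingestion is non-blocking per batch
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(self.ingest_batch, b) for b in batches]
            return sum(f.result() for f in futures)

    def counts_by_name(self):
        # aggregated count of products per name
        return Counter(row["name"] for row in self.table.values())
```

In the actual notebooks the same shape would map onto Spark primitives (DataFrames, merge/upsert writes, `groupBy("name").count()`); the sketch only shows the control flow the requirements describe.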

The Spark-MongoDB approach is best suited for this task, since data modelling is the main concern: querying and updating documents subject to constraints is comparatively easy in MongoDB.
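To make that last point concrete, here is an in-memory stand-in for MongoDB's upsert semantics, which is what keeps sku-keyed updates simple there. The function and schema are illustrative assumptions; with a real deployment the equivalent pymongo call would be `collection.update_one({"sku": sku}, {"$set": doc}, upsert=True)`.

```python
def mongo_style_upsert(collection, flt, update, upsert=True):
    """Stand-in for pymongo's update_one(flt, {"$set": ...}, upsert=True),
    run against a plain list of dicts instead of a real collection."""
    for doc in collection:
        # match on the filter fields (here the filter would be {"sku": ...})
        if all(doc.get(k) == v for k, v in flt.items()):
            doc.update(update["$set"])  # update the matched document in place
            return "updated"
    if upsert:
        # no match: insert a new document built from the filter plus the update
        new_doc = dict(flt)
        new_doc.update(update["$set"])
        collection.append(new_doc)
        return "inserted"
    return "noop"
```

Because the constraint (sku) lives in the filter and the change lives in `$set`, repeated runs converge on one document per sku rather than duplicating or truncating data.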