Homework
Find the latest version of each product in every run, and save it as snapshot.
- Product data stored under the data/homework folder.
- Read data/homework/initial_data.json for the first run.
- Read data/homework/cdc_data.json for the nex runs.
- Save results as json, parquet or etc.
- Note: You can use SQL, dataframe or dataset APIs, but type safe implementation is recommended.
Solution
The merged solution is saved to data/homework/snapshot_data folder.
brand | category | color | id | name | price | timestamp |
---|---|---|---|---|---|---|
lcw | shoes | blue | 1 | product1 | 100.0 | 1611141595000 |
adidas | shoes | bone white | 2 | product2_newName | 250.0 | 1611242395000 |
lcw | Mobile Phone | black | 3 | product3 | 10.0 | 1611141595000 |
defacto | hat | blue | 4 | product4_newName | 35.0 | 1611227995000 |
defacto | hat | green | 5 | product5 | 33.0 | 1611141595000 |
lcw | hat | blue | 6 | product6 | 55.0 | 1611141595000 |
dell | Laptop | white | 7 | product7 | 4500.0 | 1611141595000 |
apple | Laptop | white | 8 | product8_1 | 5700.0 | 1611227995000 |
apple | Laptop | black | 9 | product9 | 5000.0 | 1611141595000 |
apple | Mobile Phone | white | 10 | product10 | 1000.0 | 1611141595000 |
nike | shoes | bone white | 12 | product12 | 200.0 | 1611262396000 |