tetherless-world/mowgli-etl

Source: diffbot


Diffbot was mentioned in the Stanford KG seminar series. Mike Tang said their knowledge base is available for academic use.

Minor: investigate this. If it looks promising, email Mike and cc: Deborah.

The Diffbot API returns structured data (products, news articles, etc.) for a given page URL, which means we would have to find the page URLs ourselves. We could probably get product URLs from WebDataCommons. The Diffbot product API does return dimensions as part of "normalizedSpecs".
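A rough sketch of what the lookup might involve, in Python. The endpoint is Diffbot's v3 Product API as I understand it, but check the current docs. The response shape below (an "objects" list whose items carry "pageUrl" and "normalizedSpecs") and the contents of the sample are assumptions for illustration, not real output. The function names are made up, and nothing here makes a network call.

```python
from urllib.parse import urlencode

# Assumed endpoint for Diffbot's v3 Product API; verify against the current docs.
DIFFBOT_PRODUCT_ENDPOINT = "https://api.diffbot.com/v3/product"


def build_product_request_url(token, page_url):
    """Build the GET URL for one product page (hypothetical helper)."""
    query = urlencode({"token": token, "url": page_url})
    return DIFFBOT_PRODUCT_ENDPOINT + "?" + query


def extract_normalized_specs(response):
    """Return (page URL, normalizedSpecs) pairs from a parsed JSON response.

    Assumes the response has an "objects" list; objects without
    "normalizedSpecs" are skipped. The inner structure of the specs
    is passed through untouched because we have not inspected it yet.
    """
    pairs = []
    for obj in response.get("objects", []):
        specs = obj.get("normalizedSpecs")
        if specs:
            pairs.append((obj.get("pageUrl"), specs))
    return pairs


# Hypothetical response, invented for illustration only.
SAMPLE_RESPONSE = {
    "objects": [
        {
            "pageUrl": "https://example.com/p/1",
            "normalizedSpecs": {"width": "10 cm"},
        },
        {"pageUrl": "https://example.com/p/2"},
    ]
}
```

The page URLs passed to `build_product_request_url` would have to come from somewhere else, e.g. product URLs pulled from WebDataCommons.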

Good student project.