Discussion about this alternative crawler
I have created a proof-of-concept crawler to explore a better architecture for neume. Currently, it only crawls sound-protocol.
Note: This crawler is not perfect and cuts corners, but I hope we can incorporate some of these ideas into neume.
What are the changes?
Use an imperative programming paradigm
This means calling functions like callTokenuri directly. Instead of posting extraction-worker messages, we now use async/await to send messages. We still use extraction-worker and keep the same benefits, such as rate limiting and timeouts.
We won't have to deal with worker threads or memory leaks in the lifecycle, and onboarding developers will be easier. The existing architecture has proven a little difficult to explain.
We still have concurrency by using p-map.
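To illustrate the pattern described above, here is a minimal sketch written in Python rather than neume's JavaScript, purely so it can run with the standard library alone. It assumes a hypothetical `call_token_uri` stub in place of the real call, and `map_concurrent` is an illustrative stand-in for the role p-map plays in the POC, not its actual API.

```python
import asyncio


async def call_token_uri(token_id: int) -> str:
    # Hypothetical stand-in for a real RPC call; it only simulates latency.
    await asyncio.sleep(0.01)
    return f"ipfs://example/{token_id}"


async def map_concurrent(func, items, concurrency: int):
    # Run func over items with at most `concurrency` calls in flight.
    # Results come back in input order.
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(item):
        async with semaphore:
            return await func(item)

    return await asyncio.gather(*(run_one(item) for item in items))


async def crawl(token_ids):
    # Imperative flow: call the function and await its result directly,
    # instead of posting a message and handling the reply elsewhere.
    return await map_concurrent(call_token_uri, token_ids, concurrency=2)


if __name__ == "__main__":
    print(asyncio.run(crawl([1, 2, 3])))
```

The point of the sketch is that control flow reads top to bottom: each step awaits the previous one, and the concurrency limit is a single parameter rather than a property of a worker pool.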
Use sqlite as the database
Flat files fail us when we need to find or update data. A database of some kind is necessary for neume. For this POC I have used sqlite as a key-value database where the key is an ID and the value is JSON data. If we need to share our data using IPFS, we still can, because it is essentially just JSON.
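A rough sketch of the key-value idea, again in Python for the sake of a dependency-free example. The table name `tracks`, the column names, and the helper functions are assumptions made for illustration and are not taken from the POC.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # One table: the key is an ID, the value is a JSON document stored as text.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tracks (id TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    return conn


def put(conn: sqlite3.Connection, key: str, value: dict) -> None:
    # INSERT OR REPLACE makes updates as simple as inserts.
    conn.execute(
        "INSERT OR REPLACE INTO tracks (id, data) VALUES (?, ?)",
        (key, json.dumps(value)),
    )
    conn.commit()


def get(conn: sqlite3.Connection, key: str):
    # Returns the decoded JSON value, or None if the key is missing.
    row = conn.execute("SELECT data FROM tracks WHERE id = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else None


if __name__ == "__main__":
    store = open_store()
    put(store, "track-1", {"title": "demo"})
    put(store, "track-1", {"title": "demo", "artist": "someone"})
    print(get(store, "track-1"))
```

Because each value is plain JSON text, exporting rows for sharing amounts to reading the `data` column back out.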
Miscellaneous
- We don't have to read from disk every time
- It is easier to adapt this architecture to parallelise each step, since things are either in memory or easy to load from sqlite. neume-network/strategies#244