Nice dataset
Opened this issue · 17 comments
Just wanted to say thanks for this dataset. I've been doodling with it for the last few days and made this tool:
https://projects.stroep.nl/hicetnunc/
There is no actual issue, so you can close this after you've seen the message.
It would be great if the dataset were updated frequently. What would be needed for that?
Yes thank you for setting this up @pallada-92! I'm happy to donate a server if you'd like to get this running on a cron job and/or serve it up live. This is essential infrastructure for the hicetnunc community, IMO.
@glitch003 thank you, actually the current script is able to run in Docker on a server (with a minimum of 4 GB RAM and 300 GB disk), but sometimes it requires manual modification of the code if something unexpected happens (like the appearance of a new swap service like https://quipuswap.com/), so an SSH connection is needed for such cases at this point.
The current bottleneck is downloading artworks from IPFS. When I run the script regularly, it looks like https://cloudflare-ipfs.com/ and https://ipfs.io/ are throttling the download speed or just dropping some requests, so it may take up to a whole day to download newly added works. I also restart the download script manually when it hangs for a long time.
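For what it's worth, a gateway-fallback downloader along these lines could mitigate the slow or dropped requests. This is only a sketch: `fetch_ipfs` and the retry/backoff policy are my own illustration, not code from this repo.

```python
import time
import requests

# Public gateways mentioned above; order is the fallback order.
GATEWAYS = [
    "https://cloudflare-ipfs.com/ipfs/",
    "https://ipfs.io/ipfs/",
]

def fetch_ipfs(cid, timeout=30, retries=3):
    """Try each gateway in turn, with exponential backoff between rounds."""
    for attempt in range(retries):
        for gateway in GATEWAYS:
            try:
                resp = requests.get(gateway + cid, timeout=timeout)
                if resp.status_code == 200:
                    return resp.content
            except requests.RequestException:
                pass  # slow or dropped request: try the next gateway
        time.sleep(2 ** attempt)  # back off before the next round
    return None  # caller can record the CID as failed for a later rerun
```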
It would probably be a good idea to split the dataset into 3 independent parts: core data, thumbnails, and heuristics about external swaps, so that the first 2 parts can run on the server reliably in real time.
Over the last day and night I've downloaded new data and updated:
- dataset at https://github.com/hashquine/hicetnunc-dataset/tree/master/dataset,
- thumbs at https://github.com/hashquine/hashquine.github.io/tree/master/hicetnunc/token_thumbs and https://github.com/hashquine/hashquine.github.io/tree/master/hicetnunc/user_logos
Latest transaction is 2021-04-12 10:00:00 UTC (6 hours ago).
We will be working towards keeping it up to date with more regular updates.
Dataset, thumbs and docker image were updated.
Dataset, thumbs and docker image were updated.
Would be nice if you could update again. Or maybe automate this somehow.
Any news on this?
@pallada-92
Isn't the IPFS issue solvable by storing the failed downloads and re-running the script on that list at the end?
If @glitch003 can provide the server, I'm sure we can find a way to make this work!
@markknol I am updating the dataset now; the current step (thumbnail generation) is estimated to finish in 28 hours, and there may be further issues.
Isn't the IPFS issue solvable by storing the failed downloads and re-running the script on that list at the end?
@melMass This is exactly how the current mechanism is implemented. Actually, the IPFS download issues have gone away for some reason; IPFS is not the bottleneck now.
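For anyone curious, the failed-list mechanism described above roughly amounts to something like this sketch. The file location, function names, and the `fetch_ipfs` helper from the earlier sketch are assumptions, not the repo's actual code.

```python
import json
from pathlib import Path

FAILED_LIST = Path("cache/failed_downloads.json")  # hypothetical location

def load_failed():
    """Read the set of CIDs that failed on a previous run."""
    return set(json.loads(FAILED_LIST.read_text())) if FAILED_LIST.exists() else set()

def save_failed(cids):
    FAILED_LIST.write_text(json.dumps(sorted(cids)))

def download_all(cids, fetch):
    """Download every CID, remembering the ones that fail for a rerun."""
    failed = set()
    for cid in cids:
        if fetch(cid) is None:  # fetch returns None on failure (see sketch above)
            failed.add(cid)
    save_failed(failed)
    return failed

# At the end of a run, retry everything that failed earlier:
# download_all(load_failed(), fetch_ipfs)
```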
The most despairing part currently is making small fixes in the code to handle new edge cases or partially valid data. There are lots of assertions in the code, and every few days a new exception arises. It is hard to predict how much time and effort it will take to investigate what happened.
For example, I've spent two days of my vacation trying to figure out what happened between these two transactions:
- https://tzstats.com/opYoTN4LUvNq7F5oiq7a6frW1Q5UubQuXSrDmrqrJGJzHnKaJQQ
- https://tzstats.com/oo7Sr2daVXamKTDtajRRvJGmTmYAantjJeLoPkTD6hraVHXQR1B
After the first transaction there remain 195 tokens with objkt id 14143 in swap 14279, but the second collect call succeeded in taking 196 remaining tokens. I still have no idea what's going on; I've decided to remove the assertion and allow a negative remaining swap count, but this is an unsafe approach.
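A sketch of what "removing the assertion" can look like in practice: log a warning and carry on instead of aborting the whole update. The field names (`remaining`, `id`) are illustrative, not the repo's actual schema.

```python
import logging

log = logging.getLogger("hicetnunc-dataset")

def apply_collect(swap, amount):
    """Deduct collected tokens from a swap's remaining count.

    Instead of `assert swap["remaining"] >= amount`, record the anomaly
    and keep going, so one odd transaction does not stop the whole run.
    """
    swap["remaining"] -= amount
    if swap["remaining"] < 0:
        log.warning(
            "swap %s went negative (%d) after collect of %d tokens",
            swap.get("id"), swap["remaining"], amount,
        )
```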
Keeping this dataset up to date regularly seemed easy initially, but these unpredictable issues with failing assertions are terribly frustrating.
Another paradoxical fact is that I still haven't minted anything and don't even have a Tezos wallet :)
So I am giving up on this project; I hope to finish the last dataset update soon.
Hi there! I second the opinion that this is an extremely useful tool for the Hicetnunc community. I am very rusty with my coding, and although I'm trying to understand the code to lend a hand and help improve it, I don't know how useful I will be. So perhaps what I can do on my side is help with some money to compensate for your efforts, @pallada-92. If you open a Tezos wallet I can send you some xtz, if that would help you dedicate some time to this instead of dropping the project. Or perhaps with some xtz you could dedicate some time to explaining a bit more how the code works, so I or someone else can pick it up...
I have been running the code for over a day but always end up hitting a point where it fails and doesn't get past the 1st script. When I re-run it, it starts picking up not where it left off, but again from the last cache (I started with the one from April 8th, which was in the repository; I hadn't noticed there were some more recent updates elsewhere). I'm trying to see if I can make it get back to work where it last failed.
Thanks in any case for this great work!
@markknol I've just updated the dataset. Some files no longer fit within GitHub's 100 MB file limit, so the most recent data is available only in the "Releases" section, in the file "dataset.zip":
https://github.com/hashquine/hicetnunc-dataset/releases/tag/210510
I've removed lots of internal checks, so there may be incorrect or missing values that I've overlooked. If you notice any, let me know.
@pallada-92 Hey, never mind my previous comment about resuming the process... for some reason the 1st time it restarted it didn't continue along (or it did but I didn't check the timestamps correctly). I have had to restart/resume it again today and it has indeed continued from where it left off. However, since you have just updated the dataset, I will stop it so as not to overload the network. I will do the next update in some days, though; please let me know if you are going to do it (or I'll check here) so you don't have to do the work twice. I can do it next time. Thanks a lot for your work!
Thank you so much for updating!! On the website I have now switched to CSV instead of JSON, which saves some loading time, as it's indeed getting too big to manage otherwise. I made a custom version of the data with some unused columns stripped.
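In case it helps others, stripping unused columns and converting to CSV can be done in a few lines of pandas. The file path and column names below are placeholders; the real dataset's fields may differ.

```python
import pandas as pd

# Hypothetical slimming step: load a dataset export, keep only the
# columns the visualisation needs, and write a compact CSV.
df = pd.read_json("dataset/tokens.json", orient="index")
keep = ["token_id", "issuer", "mint_time", "price", "royalties"]
df[[c for c in keep if c in df.columns]].to_csv(
    "tokens_slim.csv", index=False
)
```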
So I am giving up on this project; I hope to finish the last dataset update soon.
What would motivate you to update it again? It would be great to have a new dataset.
@pallada-92 thank you so much for the dataset! That's awesome.
I've been trying to update it here but ran into a few errors which I'm not able to solve...
Any chance you will update it again? Or could you help me set up the code so I can update it myself?
I'm writing a case study about H=N and your data is helping me A LOT. Thanks again!