Slight differences in #entities for Interactive data sets
szarnyasg opened this issue · 1 comments
szarnyasg commented
We got a report through email:
We were able to download the recent LDBC graphs (from https://repository.surfsara.nl/datasets/cwi/snb). I checked the number of entities and it’s a bit different from the numbers reported in the paper here: https://arxiv.org/pdf/2001.02299.pdf (page 135). Are those numbers up-to-date?
E.g., if you check the output of wc -l social_network-csv_basic-sf1/static/organisation_0_0.csv
it returns 7 956 compared to 7 996 in the paper.
It seems the #entities are a bit off due to the headers being included (incorrectly) in the numbers reported in the spec.
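A minimal sketch of the effect, run on a toy file rather than the real data: a plain wc -l counts the header line of every CSV file, while skipping the first line of each file gives the actual number of entities. The function names and the sample file below are made up for illustration and are not part of the LDBC tooling.

```shell
# Hypothetical helpers for illustration only (not part of the LDBC tooling).

# Count all lines of the given CSV files, header lines included.
count_lines_with_header() {
    cat "$@" | wc -l | tr -d ' '
}

# Count data rows only: drop the first (header) line of each file, then count.
count_rows_without_header() {
    for f in "$@"; do
        tail -n +2 "$f"
    done | wc -l | tr -d ' '
}

# Toy input: one header line plus two data rows (assumed '|' separator).
demo_dir=$(mktemp -d)
printf 'id|name\n1|a\n2|b\n' > "$demo_dir/organisation_0_0.csv"

count_lines_with_header "$demo_dir"/organisation_*.csv    # prints 3
count_rows_without_header "$demo_dir"/organisation_*.csv  # prints 2

rm -rf "$demo_dir"
```

With several CSV files per entity, the header-inclusive count overshoots by one per file, so the gap depends on how many files the data set was split into.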
szarnyasg commented
To regenerate the table, run:
set -eu
# Download and extract the data set for each scale factor
for SF in 0.1 0.3 1 3 10 30 100 300 1000; do
    curl --silent --fail https://repository.surfsara.nl/datasets/cwi/snb/files/social_network-csv_basic-longdateformatter/social_network-csv_basic-longdateformatter-sf${SF}.tar.zst | tar -xv --use-compress-program=unzstd
done
set -eu
for ENTITY in static/organisation static/organisation_isLocatedIn_place static/place static/place_isPartOf_place static/tag static/tag_hasType_tagclass static/tagclass static/tagclass_isSubclassOf_tagclass dynamic/comment dynamic/comment_hasCreator_person dynamic/comment_hasTag_tag dynamic/comment_isLocatedIn_place dynamic/comment_replyOf_comment dynamic/comment_replyOf_post dynamic/forum dynamic/forum_containerOf_post dynamic/forum_hasMember_person dynamic/forum_hasModerator_person dynamic/forum_hasTag_tag dynamic/person dynamic/person_email_emailaddress dynamic/person_hasInterest_tag dynamic/person_isLocatedIn_place dynamic/person_knows_person dynamic/person_likes_comment dynamic/person_likes_post dynamic/person_speaks_language dynamic/person_studyAt_organisation dynamic/person_workAt_organisation dynamic/post dynamic/post_hasCreator_person dynamic/post_hasTag_tag dynamic/post_isLocatedIn_place; do
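    for SF in 0.1 0.3 1 3 10 30 100 300 1000; do
        # Skip the header line of each CSV file (tail -n +2) so that only data rows are counted
        echo -n "& \numprint{$(tail -qn +2 social_network-csv_basic-longdateformatter-sf${SF}/${ENTITY}_*.csv | wc -l)} "
    done
    # Terminate the LaTeX table row
    echo "\\\\"
done | tee out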