ldbc/ldbc_snb_docs

Slight differences in #entities for Interactive data sets

szarnyasg opened this issue · 1 comments

We got a report through email:

We were able to download the recent LDBC graphs (from https://repository.surfsara.nl/datasets/cwi/snb). I checked number of entities and it’s a bit different to the numbers reported in the paper here: https://arxiv.org/pdf/2001.02299.pdf (page 135). Are those numbers up-to-date?
E.g., if you check the wc -l social_network-csv_basic-sf1/static/organisation_0_0.csv it returns 7 956 compared to 7 996 in the paper.

It seems the #entities are slightly off because the numbers reported in the spec (incorrectly) include the CSV header lines.
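To illustrate the effect, here is a minimal sketch using a small made-up CSV file (the file contents and the temporary directory are hypothetical, not taken from the real data set). It compares a plain `wc -l` with a count that skips the header line first:

```shell
# Hypothetical sample file: one header line followed by three data rows.
tmpdir=$(mktemp -d)
sample="$tmpdir/organisation_0_0.csv"
printf 'id|type|name|url\n0|company|A|http://a\n1|company|B|http://b\n2|university|C|http://c\n' > "$sample"

# wc -l counts every line, including the header row.
with_header=$(wc -l < "$sample" | tr -d ' ')

# tail -n +2 starts output at line 2, so the header is skipped before counting.
without_header=$(tail -n +2 "$sample" | wc -l | tr -d ' ')

echo "lines including header: $with_header"       # prints 4
echo "entities (header excluded): $without_header" # prints 3

rm -r "$tmpdir"
```

Each CSV file contributes one extra line this way, so an entity split over several files would be over-counted by the number of files.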

To regenerate the table, run:

set -eu

# Download and unpack every scale factor.
for SF in 0.1 0.3 1 3 10 30 100 300 1000; do
    curl --silent --fail https://repository.surfsara.nl/datasets/cwi/snb/files/social_network-csv_basic-longdateformatter/social_network-csv_basic-longdateformatter-sf${SF}.tar.zst | tar -xv --use-compress-program=unzstd
done

set -eu

for ENTITY in static/organisation static/organisation_isLocatedIn_place  static/place  static/place_isPartOf_place  static/tag  static/tag_hasType_tagclass  static/tagclass  static/tagclass_isSubclassOf_tagclass  dynamic/comment  dynamic/comment_hasCreator_person  dynamic/comment_hasTag_tag  dynamic/comment_isLocatedIn_place  dynamic/comment_replyOf_comment  dynamic/comment_replyOf_post  dynamic/forum  dynamic/forum_containerOf_post  dynamic/forum_hasMember_person  dynamic/forum_hasModerator_person  dynamic/forum_hasTag_tag  dynamic/person  dynamic/person_email_emailaddress  dynamic/person_hasInterest_tag  dynamic/person_isLocatedIn_place  dynamic/person_knows_person  dynamic/person_likes_comment  dynamic/person_likes_post  dynamic/person_speaks_language  dynamic/person_studyAt_organisation  dynamic/person_workAt_organisation  dynamic/post  dynamic/post_hasCreator_person  dynamic/post_hasTag_tag  dynamic/post_isLocatedIn_place; do
    for SF in 0.1 0.3 1 3 10 30 100 300 1000; do
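        # Match only the numeric partition suffix (e.g. _0_0.csv), so that static/tag does not also pick up tag_hasType_tagclass_*.csv.
        echo -n "& \numprint{$(tail -qn +2 social_network-csv_basic-longdateformatter-sf${SF}/${ENTITY}_[0-9]*.csv | wc -l)} "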
    done
    echo "\\\\"
done | tee out
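
One thing to watch when counting per entity: a prefix glob such as `tag_*.csv` also matches `tag_hasType_tagclass_0_0.csv`, so node counts can silently include edge rows. Below is a minimal sketch of the difference, using made-up sample files. The `count_rows` helper is hypothetical, and it assumes a `tail` that supports `-q` (GNU coreutils or a recent BSD `tail`).

```shell
# Hypothetical helper: count data rows over all partitions of one entity,
# dropping the header line of each file. Assumes tail supports -q.
count_rows() {
    tail -q -n +2 "$1"_[0-9]*.csv | wc -l | tr -d ' '
}

tmpdir=$(mktemp -d)
# Two partitions of a node file and one edge file sharing the same prefix.
printf 'id|name\n1|a\n2|b\n' > "$tmpdir/tag_0_0.csv"
printf 'id|name\n3|c\n' > "$tmpdir/tag_1_0.csv"
printf 'Tag.id|TagClass.id\n1|10\n2|10\n3|11\n' > "$tmpdir/tag_hasType_tagclass_0_0.csv"

# Loose glob: also matches the edge file.
loose=$(tail -q -n +2 "$tmpdir"/tag_*.csv | wc -l | tr -d ' ')
# Strict glob: requires a digit right after the entity name.
strict=$(count_rows "$tmpdir/tag")
edges=$(count_rows "$tmpdir/tag_hasType_tagclass")

echo "tag_*.csv: $loose"                      # prints 6
echo "tag_[0-9]*.csv: $strict"                # prints 3
echo "tag_hasType_tagclass_[0-9]*.csv: $edges" # prints 3

rm -r "$tmpdir"
```

The `-q` flag keeps `tail` from printing a `==> file <==` banner between files, which would otherwise be counted as lines.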