unsplash/datasets

Question about the entities behind the "anonymous_user_id"

vii33 opened this issue · 2 comments

vii33 commented

Hey,
I'm trying to calculate the average number of images a user downloads.

As I know from my own photo stats, a lot of downloads are generated via API requests from external applications. You state in your API doc that external applications don't need to authenticate on a user level.

My question: for an external application like Trello, is a single anonymous user ID generated, or do you have a better approach to distinguish the users "behind" the external application?

Example from the test dataset
Could the user in the first row (942 downloads) really be one person, or could it be a whole logical entity like Trello?

anonymous_user_id downloads
5a055748-57d2-45c1-a882-5b9bb9313509 942
beb0923e-c17d-4a90-a8db-47b0f45fb0fc 897
85e5db9c-07c7-49bf-9e08-5cbd1603dd74 546
... ...

Thanks a lot for the answer and great job with the data set. 👍

Hi @vii33!

These conversions/downloads only cover our main website, unsplash.com, and only downloads that happen after a search. Technically, each anonymous_user_id is a unique device rather than a person. You're right that this needs some clarification.
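With that in mind, the average you're after is really "downloads per device". Here is a minimal sketch of how it could be computed, assuming the conversion rows have been loaded as dictionaries with an `anonymous_user_id` field and a `conversion_type` field. The sample rows and the `downloads_per_device` helper are made up for illustration and are not part of the dataset tooling.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample rows. The real rows would come from the dataset's
# conversions data; the field names used here are assumptions.
rows = [
    {"anonymous_user_id": "device-a", "conversion_type": "download"},
    {"anonymous_user_id": "device-a", "conversion_type": "download"},
    {"anonymous_user_id": "device-a", "conversion_type": "download"},
    {"anonymous_user_id": "device-b", "conversion_type": "download"},
    {"anonymous_user_id": "device-b", "conversion_type": "other"},
    {"anonymous_user_id": "device-c", "conversion_type": "download"},
    {"anonymous_user_id": "device-c", "conversion_type": "download"},
]


def downloads_per_device(rows):
    """Count download rows per anonymous_user_id (one ID = one device)."""
    return Counter(
        row["anonymous_user_id"]
        for row in rows
        if row["conversion_type"] == "download"
    )


counts = downloads_per_device(rows)

# Mean and median over devices that downloaded at least once.
average = mean(counts.values())
typical = median(counts.values())

print(f"devices: {len(counts)}, mean: {average}, median: {typical}")
```

Note that this only averages over devices with at least one download, and that a few very active devices can pull the mean up a lot, so the median is worth looking at too.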

We could look into including another table in the dataset with the downloads from all sources, but I don't think we'd be able to tie individual users of third-party apps to the downloads they make. That's mainly for the reason you mentioned: some third-party apps act as a proxy for the photo download and would "override" the little we know about the device that's actually downloading.

I think it could be an idea for a future version of the dataset though. I'll talk with the team and we'll see what we can do.

vii33 commented

Hey Timmy, all good. Thanks for clearing this up 👍