carissalow/rapids

Exclude certain apps when scraping/updating missing categories.

Closed this issue · 3 comments

yiyir commented

Is your feature request related to a problem? Please describe.
If an app cannot be found by Google, an error is thrown and the program exits, even if the app is already listed in the EXCLUDED_APPS config. This forces us to set both UPDATE_CATALOGUE_FILE and SCRAPE_MISSING_CATEGORIES to false, even though we want to auto-scrape most of the unidentified apps and skip only certain ones.

Describe the solution you'd like
Exclude apps in EXCLUDED_APPS from scraping, since we are not considering them for feature extraction.

Additional context
In the example config below, "com.upmc.rosa" is one app that can make the whole program exit.

PHONE_APPLICATIONS_FOREGROUND:
  CONTAINER: applications_foreground
  APPLICATION_CATEGORIES:
    CATALOGUE_SOURCE: FILE
    CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
    UPDATE_CATALOGUE_FILE: TRUE
    SCRAPE_MISSING_CATEGORIES: TRUE
  PROVIDERS:
    RAPIDS:
      COMPUTE: TRUE
      SINGLE_CATEGORIES: ["all", "email"]
      MULTIPLE_CATEGORIES:
        social: ["socialnetworks", "socialmediatools"]
        entertainment: ["entertainment", "gamingknowledge", "gamingcasual", "gamingadventure", "gamingstrategy", "gamingtoolscommunity", "gamingroleplaying", "gamingaction", "gaminglogic", "gamingsports", "gamingsimulation"]
      SINGLE_APPS: ["top1global", "com.facebook.moments", "com.google.android.youtube", "com.twitter.android"]
      EXCLUDED_CATEGORIES: []
      EXCLUDED_APPS: ["com.upmc.rosa"]
      FEATURES: ["count", "timeoffirstuse", "timeoflastuse", "frequencyentropy"]
      SRC_SCRIPT: src/features/phone_applications_foreground/rapids/main.py
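
For illustration, the requested behaviour could look roughly like the sketch below: drop every app listed in EXCLUDED_APPS before anything is sent to the scraper. This is not RAPIDS code. The function name `apps_to_scrape` and the list of uncategorised apps are assumptions made up for the example; only the EXCLUDED_APPS value comes from the config above.

```python
from typing import Iterable, List


def apps_to_scrape(missing_apps: Iterable[str], excluded_apps: Iterable[str]) -> List[str]:
    """Return the uncategorised apps minus the excluded ones, keeping their order.

    Hypothetical helper, not part of RAPIDS.
    """
    excluded = set(excluded_apps)  # set lookup keeps the filter cheap for long lists
    return [app for app in missing_apps if app not in excluded]


# EXCLUDED_APPS as in the example config; the missing-app list is made up.
excluded_apps = ["com.upmc.rosa"]
missing_apps = ["com.upmc.rosa", "com.twitter.android", "com.facebook.moments"]

print(apps_to_scrape(missing_apps, excluded_apps))
# → ['com.twitter.android', 'com.facebook.moments']
```

With a filter like this in front of the scraper, "com.upmc.rosa" would never be looked up, so it could not trigger the error.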

Thanks for reporting this, @yiyir. What commit are you using? You can check with: git rev-parse --short HEAD

yiyir commented

@JulioV 00a3335

This was a bug; the fix is now in 29cc3f0 (v1.1.1). We still scrape all apps, but we now correctly handle the case where an app doesn't exist. Thanks again for reporting.
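
For readers curious what "handle the case when they don't exist" can mean in practice, here is a minimal sketch of the general pattern: catch the lookup failure per app and record a fallback category instead of letting the exception end the run. It is not the code from 29cc3f0. `AppNotFoundError`, `scrape_categories`, `fake_fetch`, the `"unknown"` fallback, and the category assigned to the known app are all hypothetical names and values chosen for the example.

```python
from typing import Callable, Dict, Iterable


class AppNotFoundError(Exception):
    """Hypothetical error raised when no store page exists for a package name."""


def scrape_categories(
    apps: Iterable[str],
    fetch_category: Callable[[str], str],
    fallback: str = "unknown",
) -> Dict[str, str]:
    """Look up a category for each app; a missing app gets `fallback` instead of aborting."""
    categories = {}
    for app in apps:
        try:
            categories[app] = fetch_category(app)
        except AppNotFoundError:
            # One missing app should not stop the remaining lookups.
            categories[app] = fallback
    return categories


def fake_fetch(app: str) -> str:
    """Stand-in for a real scraper; knows a single app (made-up data)."""
    known = {"com.twitter.android": "socialnetworks"}
    if app not in known:
        raise AppNotFoundError(app)
    return known[app]


print(scrape_categories(["com.twitter.android", "com.upmc.rosa"], fake_fetch))
# → {'com.twitter.android': 'socialnetworks', 'com.upmc.rosa': 'unknown'}
```

Passing the lookup function in as an argument keeps the error handling testable without any network access.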