Point of interest detection: Home / Work

Installation

pip install -U pip
pip install -r requirements.txt

The CSV file technical_test_data.csv must be placed in the data/ folder.

Files

The file exploration.ipynb shows some statistics and how the solution was developed. The file solution.ipynb uses the functions in functions.py to display a scatter plot of each user's events along with their home and work places.

Solution description

The idea comes from a simple assumption: people go from home to work on weekdays and stay at home more often on weekends.

The first step is to extract the user's points of interest with the DBSCAN clustering algorithm and the haversine distance metric, which is well suited to geospatial data. We look for groups of points that are close to each other (around 10 meters, ignoring the horizontal precision of each measurement) and that contain enough data (at least 100 events must have occurred).
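As an illustration of this clustering step, here is a minimal plain-Python sketch of DBSCAN using the haversine distance. It is not the code from functions.py, which may well rely on a library implementation. The names haversine_m and dbscan, the Earth radius constant, and the toy coordinates are assumptions made for the example, and min_samples is set to 3 instead of 100 so that the toy data stays small.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dbscan(points, eps_m, min_samples):
    """Naive O(n^2) DBSCAN. Returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)  # None = not visited yet

    def neighbors(i):
        # A point counts as its own neighbor.
        return [j for j in range(len(points))
                if haversine_m(*points[i], *points[j]) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_samples:
            labels[i] = -1  # noise for now, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_samples:  # j is a core point: keep expanding
                queue.extend(nb_j)
    return labels


# Toy data: two tight groups a few meters wide and one isolated point.
points = [
    (48.00000, 2.00), (48.00002, 2.00), (48.00004, 2.00),
    (48.01000, 2.01), (48.01002, 2.01), (48.01004, 2.01),
    (48.50000, 2.50),
]
labels = dbscan(points, eps_m=10, min_samples=3)
print(labels)
```

Each non-negative label is a candidate point of interest. With real data, eps_m and min_samples would be set to the values described above (about 10 meters and 100 events).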

Then we take the two most visited places and compare how often each is visited on weekdays and on weekend days to separate home from work.
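One possible way to implement this matching is sketched below. The rule used here (the place with the larger share of weekend visits is labeled home) is an assumption derived from the stated hypothesis, not necessarily the exact rule in functions.py. The function name assign_home_work and the input format are hypothetical.

```python
from collections import Counter
from datetime import datetime


def assign_home_work(events):
    """events: iterable of (cluster_label, datetime). Noise (-1) is ignored.

    Returns {"home": label, "work": label}, or None if fewer than two
    clusters are available.
    """
    events = [(c, t) for c, t in events if c != -1]
    visits = Counter(c for c, _ in events)
    top_two = [c for c, _ in visits.most_common(2)]
    if len(top_two) < 2:
        return None

    def weekend_share(cluster):
        # weekday() returns 5 for Saturday and 6 for Sunday.
        weekend = sum(1 for c, t in events if c == cluster and t.weekday() >= 5)
        return weekend / visits[cluster]

    home = max(top_two, key=weekend_share)
    work = top_two[1] if home == top_two[0] else top_two[0]
    return {"home": home, "work": work}


# Toy week starting on Monday 2024-01-01.
sample = []
for day in range(1, 6):                                  # Monday to Friday
    sample.append((0, datetime(2024, 1, day, 10, 0)))    # daytime at place 0
    sample.append((1, datetime(2024, 1, day, 22, 0)))    # evening at place 1
sample.append((1, datetime(2024, 1, 6, 11, 0)))          # Saturday at place 1
sample.append((1, datetime(2024, 1, 7, 11, 0)))          # Sunday at place 1
sample.append((2, datetime(2024, 1, 3, 13, 0)))          # one-off visit elsewhere

result = assign_home_work(sample)
print(result)
```

If both places have the same weekend share, this sketch falls back to labeling the most visited one as home, which is an arbitrary choice.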

Users predictions

861100071

1853210804

3330315587

Future Work

  • Usage of other features:

    • time series: we did not use the motion of the users at all

      • When do they start to move from A to B?
      • How long do they stay at a given place?
      • The clustering step groups events that did not necessarily happen at the same time. We should consider points that are grouped temporally as well as spatially.
      • Instead of counting the occurrences of each point of interest, analyze their order: for instance, people are more likely to follow the pattern HOME/WORK/HOME during the week, whereas on weekends the number of points of interest can increase.
    • crc32: the home and work Wi-Fi spots should be constant and stable (and should also give good horizontal precision)

    • speed: the speed feature was barely used or analyzed. It contains some negative values, and its statistics did not make it clear how to use it properly

  • Tweak and explore DBSCAN possibilities

    • Take more time to find a proper haversine variant that accounts for horizontal precision
    • Fine-tune the value of eps
  • Use the latitude and longitude values to map points of interest to actual places (using an external API)
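
As a possible starting point for the time-series ideas listed above (how long a user stays somewhere, and in which order places are visited), the sketch below collapses a time-ordered stream of labeled events into consecutive stays. Nothing like this exists in the current solution. The function name extract_stays and the input format are hypothetical.

```python
from datetime import datetime
from itertools import groupby


def extract_stays(events):
    """events: list of (datetime, place_label).

    Returns a list of (place_label, first_seen, last_seen), one entry per
    run of consecutive events at the same place, in chronological order.
    """
    ordered = sorted(events, key=lambda e: e[0])  # sort by timestamp only
    stays = []
    for label, group in groupby(ordered, key=lambda e: e[1]):
        times = [t for t, _ in group]
        stays.append((label, times[0], times[-1]))
    return stays


day = [
    (datetime(2024, 1, 1, 7, 0), "home"),
    (datetime(2024, 1, 1, 8, 0), "home"),
    (datetime(2024, 1, 1, 9, 30), "work"),
    (datetime(2024, 1, 1, 17, 30), "work"),
    (datetime(2024, 1, 1, 19, 0), "home"),
]
stays = extract_stays(day)
pattern = [label for label, _, _ in stays]       # visit order
durations = [end - start for _, start, end in stays]
print(pattern)
```

The visit order gives the HOME/WORK/HOME kind of pattern directly, and the gap between the end of one stay and the start of the next gives a rough estimate of when the user was moving.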