openwpm/OpenWPM

Support running OpenWPM crawls on Windows

motin opened this issue · 2 comments

motin commented

This is, I hope, a path to supporting Windows. There may be some limitations, but it is a first step.

ToDo:

  • Once #648 is merged, there will be three dependencies that don't work on Windows:
    • leveldb
      • I have looked at the conda-forge recipe, and it appears it should now be straightforward to add Windows support to it (alternatively, Windows users would have to install it manually). Solved: conda-forge/leveldb-feedstock#12
    • plyvel
      • I believe we can swap plyvel for python-leveldb with almost no fuss.
    • python-virtualdriver
      • This is for running xvfb, which won't work on Windows anyway, so we just need to figure out a package management solution / environment.yaml that accommodates both (most likely just making the installation of python-xvfb a manual step, as installing xvfb is manual anyway; maybe moving to pip will work around this).
  • Make some tweaks in deploy_firefox so we're not manually making paths by concatenating strings
  • I also suggest tweaking deploy_firefox so that we let geckodriver set the profile path and then read it back. This will help with the goal of restoring stateful crawls and will make this work easier.
  • Find a replacement for the log interceptor, which uses mkfifo and is therefore Unix-only. This Stack Overflow thread has something that we can maybe drop in as a replacement. Alternatively, I used a different approach in faust-selenium and created something to constantly "tail" geckodriver.log (https://github.com/birdsarah/faust-selenium/blob/master/crawler/geckodriver_log_reader.py). Alternatively again, we could just save the geckodriver.log at the end and not weave it into our logging. @englehardt - what is the motivation for interleaving the geckodriver logs?
    • A first step could be to skip geckodriver logs on Windows; they're not crawl-essential as best as I can tell.
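
To make the "tail geckodriver.log" option from the list above concrete, here is a minimal sketch of a polling tailer that avoids mkfifo and should therefore behave the same on Windows and Unix. This is not the faust-selenium reader or the existing OpenWPM interceptor; `tail_log` and `start_log_tailer` are hypothetical names, and it assumes geckodriver writes complete lines to a regular file that we are allowed to open for reading while it is being written.

```python
import logging
import threading
import time
from pathlib import Path


def tail_log(path, logger, stop_event, poll_interval=0.1):
    """Forward lines appended to `path` to `logger` until `stop_event` is set.

    Hypothetical helper: polls a regular file instead of reading a FIFO.
    """
    path = Path(path)

    # Wait for the log file to appear. The stop flag is sampled *before* the
    # existence check so a file created just before shutdown is still read.
    while True:
        stopping = stop_event.is_set()
        if path.exists():
            break
        if stopping:
            return
        time.sleep(poll_interval)

    with path.open("r", encoding="utf-8", errors="replace") as handle:
        pending = ""  # holds a partially written line until its newline arrives
        while True:
            stopping = stop_event.is_set()
            chunk = handle.readline()
            if chunk:
                pending += chunk
                if pending.endswith("\n"):
                    logger.debug(pending.rstrip("\r\n"))
                    pending = ""
                continue
            if stopping:
                # Nothing left to read and shutdown was requested earlier.
                if pending:
                    logger.debug(pending)
                return
            time.sleep(poll_interval)


def start_log_tailer(path, logger):
    """Start tail_log in a daemon thread; returns (thread, stop_event)."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=tail_log, args=(path, logger, stop_event), daemon=True
    )
    thread.start()
    return thread, stop_event
```

The caller would set the returned event and join the thread once the browser has shut down. Skipping the tailer entirely on Windows, as suggested above, would just mean not calling `start_log_tailer` there.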

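For the two deploy_firefox items in the list above, a rough sketch of what I have in mind: build paths with `pathlib` rather than string concatenation, and read the profile directory back from the session capabilities instead of choosing it ourselves. The function names and the `browser_<id>` directory layout are made up for illustration and are not the current deploy_firefox code; the one thing relied on is that geckodriver reports the profile it created under the `moz:profile` capability.

```python
from pathlib import Path


def build_browser_paths(base_dir, browser_id):
    """Hypothetical helper: derive per-browser paths in an OS-independent way."""
    # Path's "/" operator inserts the correct separator on every platform.
    root = Path(base_dir) / f"browser_{browser_id}"
    return {
        "root": root,
        "geckodriver_log": root / "geckodriver.log",
    }


def profile_path_from_capabilities(capabilities):
    """Return the profile directory reported by geckodriver, or None.

    `capabilities` is assumed to be the dict of session capabilities.
    """
    raw = capabilities.get("moz:profile")
    return Path(raw) if raw else None
```

With Selenium, the dictionary would come from `driver.capabilities` once the session has started, so the profile location could be recorded per browser and reused later for stateful crawls.
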
Future (open issues):

  • Add CircleCI tests and test on Win, OSX, and Linux (at least once per PR - or once a week).

An alternate version of OpenWPM was created as a proof of concept and has been used to run crawls on Windows. It uses basically the same OpenWPM instrumentation extension, but replaces the socket with a WebSocket and uses Kafka for orchestrating the crawl: https://github.com/birdsarah/faust-selenium

Moved to issue ToDos.