openwpm/OpenWPM

Save Content Decode error when saving image


I added this line to config.py:

save_content: Union[bool, str] = "image,media,script,imageset"

I ran a crawl on one site, looked up the content hash for an image in the http_response table, and tried to retrieve the image with plyvel:

import plyvel

db = plyvel.DB('../datadir/leveldb')

img = db.get(b'8337212354871836e6763a41e615916c89bac5b3f1f0adf60ba43c7c806e1015')

with open("check.png","wb") as file:
	file.write(img)

db.close()

This saved a broken image, so I added a line from an answer I found on Stack Overflow:

import plyvel

db = plyvel.DB('../datadir/leveldb')

img = db.get(b'8337212354871836e6763a41e615916c89bac5b3f1f0adf60ba43c7c806e1015')

img_bytes = img.decode('utf8','ignore').encode('latin-1')

with open("check.png","wb") as file:
	file.write(img_bytes)

db.close()

However, this keeps giving me a UnicodeDecodeError.

Hey,
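I don't think decoding the image will do what you want. The image is binary data, not text: decoding it as UTF-8 with 'ignore' silently drops every byte that isn't valid UTF-8, so whatever you encode and write back out is no longer the original image.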
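To see why decoding won't help, here is a minimal illustration that applies the question's `decode('utf8','ignore').encode('latin-1')` round trip to the 8-byte PNG file signature and to a multi-byte UTF-8 sequence. The helper name `roundtrip` is hypothetical; only standard `bytes`/`str` methods are used.

```python
# Hypothetical helper reproducing the round trip from the question.
def roundtrip(data: bytes) -> bytes:
    return data.decode("utf8", "ignore").encode("latin-1")


# The first 8 bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# 0x89 is not valid UTF-8, so "ignore" silently drops it and the
# result no longer starts with a PNG signature.
damaged = roundtrip(PNG_SIGNATURE)
print(damaged)                   # b'PNG\r\n\x1a\n'
print(damaged == PNG_SIGNATURE)  # False

# Bytes that ARE valid UTF-8 but decode to a character above U+00FF
# cannot be encoded as latin-1, so the round trip raises instead.
try:
    roundtrip(b"\xe2\x82\xac")  # UTF-8 encoding of the euro sign
except UnicodeEncodeError as exc:
    print(type(exc).__name__)    # UnicodeEncodeError
```

Depending on the bytes, the round trip either corrupts the data silently or fails with a Unicode error. Writing the bytes returned by `db.get()` unchanged, as in the first snippet, avoids both problems.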

We have this test to check that content saving actually works:
https://github.com/mozilla/OpenWPM/blob/6742ebbec896a568459fd039ea41e027fac75a55/test/test_http_instrumentation.py#L957-L992

It does essentially the same thing as your first attempt, so I'm not sure why that didn't work for you.
Are you sure what type of file this is? Could you be using the wrong file extension?
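One way to answer both questions is to inspect the retrieved bytes directly. The sketch below guesses an extension from a few well-known "magic" byte signatures and checks the bytes against their LevelDB key. It ASSUMES that OpenWPM's content hash is a hex-encoded SHA-256 digest of the raw response body (the 64-hex-character keys are consistent with this, but that is an assumption here). The names `guess_extension` and `matches_hash` are hypothetical, and the signature table is a small illustrative subset.

```python
import hashlib

# Leading "magic" bytes for a few common web image formats.
# Illustrative subset only, not an exhaustive table.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def guess_extension(data: bytes) -> str:
    """Guess a file extension from the first bytes of the content."""
    for magic, ext in _SIGNATURES:
        if data.startswith(magic):
            return ext
    # WebP files are RIFF containers: b"RIFF", a 4-byte size, then b"WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return "bin"  # unknown type: fall back to a generic extension


def matches_hash(data: bytes, content_hash: str) -> bool:
    """Compare the bytes to a hex digest, ASSUMING the digest is SHA-256."""
    return hashlib.sha256(data).hexdigest() == content_hash.lower()
```

With plyvel, `data = db.get(key)` followed by `matches_hash(data, key.decode("ascii"))` and `guess_extension(data)` would show whether the stored bytes are intact and which extension to save them with.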

It turned out to be a different problem. I was running this on a virtual machine, which seems to have been the cause. Otherwise you are right: the test's method and my first method both work fine.

Thanks, closing this.