problem after converting a .har -> .warc and importing in webrecorder
wsdookadr opened this issue · 1 comments
Hi,
Thank you for writing har2warc. Below I describe what I tried, and one difference from what I expected that I can't explain.
I pulled the Docker image of splash published by @scrapinghub and ran it:
docker pull scrapinghub/splash
docker run -it -p 8050:8050 --name render_html scrapinghub/splash
Then I rendered a page using splash and exported the resulting .har (as indicated in splash's docs):
curl 'http://localhost:8050/render.har?url=https://www.digitalocean.com/community/tutorials/how-to-secure-haproxy-with-let-s-encrypt-on-centos-7&timeout=10&wait=7&response_body=1' > 1.har
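For automation, the same request URL could be built programmatically so that the target URL is percent-encoded (this matters once the target has its own query string). This is only a minimal sketch: the helper name `splash_har_url` is made up for illustration, and the parameters are simply the ones from the curl command above.

```python
from urllib.parse import urlencode


def splash_har_url(page_url, base="http://localhost:8050", timeout=10, wait=7):
    """Build a render.har request URL for a local splash instance.

    `splash_har_url` is a hypothetical helper; the parameters mirror the
    curl command above (url, timeout, wait, response_body).
    """
    params = {
        "url": page_url,        # percent-encoded by urlencode
        "timeout": timeout,
        "wait": wait,
        "response_body": 1,     # ask splash to include response bodies in the HAR
    }
    return f"{base}/render.har?{urlencode(params)}"


if __name__ == "__main__":
    print(splash_har_url("https://example.com/a?b=1&c=2"))
```

The resulting URL can be passed to curl or any HTTP client in place of the hand-written one.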
Then I converted the resulting .har to .warc:
har2warc 1.har 1.warc
After this I imported the 1.warc file into webrecorder.
When I viewed the page as stored in webrecorder, all styling seemed to be missing.
I understand that this does not involve only har2warc, and that the problem could originate in any of har2warc, splash, or webrecorder. I'm not sure which one to attribute this behaviour to.
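One check that might help narrow this down is to see whether the .har from splash contains the stylesheet responses at all, and whether they carry a body. Below is a minimal sketch, assuming the standard HAR layout (`log.entries[].response.content` with `mimeType` and `text`); the function name `summarize_har` is made up for illustration.

```python
import json
import sys
from collections import Counter


def summarize_har(har):
    """Map each MIME type to (number of responses, number with a body)."""
    total = Counter()
    with_body = Counter()
    for entry in har.get("log", {}).get("entries", []):
        content = entry.get("response", {}).get("content", {}) or {}
        # Drop parameters such as "; charset=utf-8" and normalise case.
        mime = (content.get("mimeType") or "").split(";")[0].strip().lower()
        mime = mime or "(unknown)"
        total[mime] += 1
        if content.get("text"):
            with_body[mime] += 1
    return {mime: (total[mime], with_body[mime]) for mime in total}


if __name__ == "__main__":
    # Usage: python summarize_har.py 1.har
    with open(sys.argv[1]) as f:
        summary = summarize_har(json.load(f))
    for mime, (count, bodies) in sorted(summary.items()):
        print(f"{mime}: {count} responses, {bodies} with body")
```

If `text/css` entries are absent or have no body, the gap is already present in the splash output; if they are present with bodies, the conversion or the replay step would be the next place to look.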
The general use case is automating a large archiving operation whose result should be a faithful reproduction of the original website, including sites with a lot of JavaScript-rendered content, which is common nowadays.
I'd be interested in your thoughts.
Thanks,
Stefan