how to remap url
FerX opened this issue · 4 comments
I have a site whose URLs contain random parameters,
so the download never finishes.
How can I remap them?
Thanks for your work.
Haven't thought about that yet, but it should work with the htmlAttributeFilters hooks. So if you're looking at normal links, you'd need to override the 'a.href' filter. It should look something like this:
new Telescopy({
    htmlAttributeFilters: {
        'a.href': {
            tag: 'a',
            filter: a => a.href && a.href.length,
            exec: (attributes, args, resource) => {
                let url = attributes.href;
                // parse url, remove random parameter and reformat here
                // mark as link to be processed
                attributes.href = resource.processResourceLink( url, "text/html" );
            }
        }
    }
})
The only other place that every URL to be processed passes through is Resource.prototype.processResourceLink, but there's no easy way to hook into it unless you override the prototype. Hope that helps!
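The "parse url, remove random parameter" step in the filter above could be sketched with the standard URL class built into Node.js. This is only a sketch: the parameter names `rand` and `sid` and the helper name `stripRandomParams` are made up for illustration, so swap in whatever the target site actually appends.

```javascript
// Hypothetical parameter names; replace them with the ones the site really uses.
const RANDOM_PARAMS = ['rand', 'sid'];

// Returns rawUrl as an absolute URL with the listed query parameters removed.
// baseUrl is only needed when rawUrl is relative.
function stripRandomParams(rawUrl, baseUrl) {
    const url = new URL(rawUrl, baseUrl);
    for (const name of RANDOM_PARAMS) {
        url.searchParams.delete(name);
    }
    return url.href;
}
```

Inside exec, the cleaned value would then be what gets passed to resource.processResourceLink instead of the raw attributes.href. Where the base URL for relative links comes from is left open here.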
Thanks for the advice, but I still have two problems:
1 - Some links are images. If I leave "text/html" they are not downloaded; unfortunately the links have no extension, so I don't know whether they are PNG or JPG.
2 - Besides downloading part of the main site, it is downloading all of Google. Isn't the crawl limited to the domain I specified?
Thanks.
- For the images you can do the same as described above, but with the key 'img.src'. I might add a better hook in the next version. If they have no extension, in my experience you can still save them all as .jpg and the browser will correct for it when viewing the mirrored website. Otherwise you need some logic to determine what they are before the download begins; that is the only way.
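A filter entry for 'img.src', modelled on the 'a.href' example earlier in the thread, might look like the sketch below. Passing "image/jpeg" as the second argument is an assumption, extrapolated from the "text/html" value in that example and from the suggestion to save everything as .jpg. The variable name imgSrcFilter is only for illustration.

```javascript
// Filter entry intended for the htmlAttributeFilters option under the key 'img.src'.
const imgSrcFilter = {
    tag: 'img',
    // only handle <img> tags that actually carry a src value
    filter: img => img.src && img.src.length,
    exec: (attributes, args, resource) => {
        // Assumption: an image MIME type goes where the link example used "text/html".
        attributes.src = resource.processResourceLink( attributes.src, "image/jpeg" );
    }
};
```

It would be passed alongside the link filter, for example as htmlAttributeFilters: { 'img.src': imgSrcFilter }.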
- If your filters are not working, I advise you to read the documentation and use the test tool on a specific URL:
telescopy-test-rules config.js http://example.com/some-specific-url
It will list all URLs it finds and whether your URL filter settings allow them.
thanks for your help.