Links do not take <base ...> into account
joakimbeng opened this issue · 1 comments
Hi and thanks for a great lib!
Given an html document like this:
<!doctype html>
<html>
<head>
<base href="/pages/">
</head>
<body>
<a href="page2.html">Hi</a>
</body>
</html>
The actual location for page2.html is /pages/page2.html because there is a <base> element, which sets the base URL for all relative URLs.
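To make the expected resolution concrete, here is a minimal sketch using the WHATWG URL class built into Node.js. The resolveWithBase helper and the https://example.com/ document URL are made up for illustration and are not part of Breakdance.

```javascript
// Hypothetical helper (not a Breakdance API): resolve an href the way a
// browser does when the document contains <base href="...">.
function resolveWithBase(href, baseHref, documentUrl) {
  // The <base> href may itself be relative, so resolve it against the
  // document URL first. Without a <base>, the document URL is the base.
  const base = baseHref ? new URL(baseHref, documentUrl) : new URL(documentUrl);
  return new URL(href, base);
}

// Assumed document URL, for illustration only.
const resolved = resolveWithBase('page2.html', '/pages/', 'https://example.com/index.html');
console.log(resolved.pathname); // /pages/page2.html
```

Without the <base> element, the same href would resolve relative to the directory of the document itself.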
But when I compile it to markdown with Breakdance it yields:
[Hi](page2.html)
When it should instead be:
[Hi](/pages/page2.html)
I had a hard time tracking down the domain option to the breakdance-util package, and because it depends on state and options from the compiler, I haven't figured out a good way to solve it.
My quick fix is to use cheerio myself, like this:
const cheerio = require('cheerio');
const url = require('url');
const $ = cheerio.load('...the html above...');
breakdance($.html(), {domain: url.resolve(myUrl, $('base').first().attr('href'))});
This works, but it would be better if Breakdance did support the <base> element, which I think it should.
What do you think?
it would be better if Breakdance did support the <base> element
Agreed! I missed this one. Since it's an HTML element, I think breakdance should support it natively. Do you want to do a PR? Otherwise I can implement this as soon as I have a chance.
Also, did you try something like the following?
var Breakdance = require('breakdance');
var breakdance = new Breakdance();
breakdance.set('base', function(node) {
  // get value from node
});
Even though we need to make sure we get the URL from base before any other URLs are handled, this should work, since AST nodes are created in the same order as the elements. Alternatively, we can update the HTML preprocessing logic.
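The ordering argument can be illustrated with a self-contained sketch. Everything here is a stand-in: the toy AST shape and the compile and resolveHref functions are hypothetical, and they do not reflect Breakdance's actual node format or compiler API. The sketch only shows that a depth-first walk in document order sees the base node before any link in the body.

```javascript
// Toy AST in document order. NOT Breakdance's real node format.
const ast = {
  type: 'html',
  nodes: [
    { type: 'head', nodes: [{ type: 'base', attribs: { href: '/pages/' } }] },
    { type: 'body', nodes: [{ type: 'a', attribs: { href: 'page2.html' }, text: 'Hi' }] }
  ]
};

// Hypothetical compiler: visits nodes depth-first, in source order, and
// keeps the base href in shared state.
function compile(node, state = { base: null, output: [] }) {
  if (node.type === 'base' && state.base === null) {
    // Only the first <base href> in the document counts.
    state.base = node.attribs.href;
  } else if (node.type === 'a') {
    const href = resolveHref(node.attribs.href, state.base);
    state.output.push(`[${node.text}](${href})`);
  }
  for (const child of node.nodes || []) compile(child, state);
  return state.output.join('\n');
}

// Resolve against a placeholder origin so that path-only bases work with
// the URL class, then strip the placeholder again.
function resolveHref(href, base) {
  if (!base) return href;
  const origin = 'http://placeholder.invalid';
  const resolved = new URL(href, new URL(base, origin));
  return resolved.origin === origin
    ? resolved.pathname + resolved.search + resolved.hash
    : resolved.href;
}

console.log(compile(ast)); // [Hi](/pages/page2.html)
```

A link that appeared before the base node would be emitted unresolved in this sketch, which is the case the preprocessing alternative would cover.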
(fwiw, the domain option is strange compared to how the other options are handled. I didn't like how that was done and knew it was going to come back to haunt me lol.)