breakdance/breakdance

Links do not take <base ...> into account

joakimbeng opened this issue · 1 comment

Hi and thanks for a great lib!

Given an html document like this:

<!doctype html>
<html>
  <head>
    <base href="/pages/">
  </head>
  <body>
    <a href="page2.html">Hi</a>
  </body>
</html>

The actual location of page2.html is /pages/page2.html, because there is a <base> element which sets the base URL for all relative URLs in the document.

But when I compile it to markdown with Breakdance it yields:

[Hi](page2.html)

When it should instead be:

[Hi](/pages/page2.html)
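
For reference, the expected result can be reproduced with Node's built-in URL class. This is only a sketch of how a browser resolves the link: the page URL below is a made-up placeholder, needed because the `<base href>` in the example is itself relative.

```javascript
// Hypothetical page URL, used only for illustration.
const documentUrl = 'https://example.com/docs/index.html';

// Step 1: the <base href> is resolved against the document URL.
const baseUrl = new URL('/pages/', documentUrl);

// Step 2: every relative href is resolved against that base URL.
const resolved = new URL('page2.html', baseUrl);

console.log(resolved.pathname); // /pages/page2.html
```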

I had a hard time tracking down the domain option to the breakdance-util package, and because it depends on state and options from the compiler I haven't figured out a good way to solve it.

My quick fix is to use cheerio myself like this:

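const cheerio = require('cheerio'), url = require('url'), breakdance = require('breakdance');
const $ = cheerio.load('...the html above...');
// Resolve the <base href> against the page URL (myUrl) and pass the result as the domain option
breakdance($.html(), {domain: url.resolve(myUrl, $('base').first().attr('href'))});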
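
One caveat with this workaround: when a document has no `<base>` element, `.attr('href')` returns `undefined`, and `url.resolve` would most likely throw on a non-string argument. A small guard avoids that. The helper name `domainOption` and its parameters are hypothetical and not part of Breakdance; the sketch uses the built-in URL class instead of `url.resolve`.

```javascript
// Hypothetical helper: computes the value to pass as the `domain` option.
// pageUrl  - absolute URL the HTML was loaded from
// baseHref - href of the first <base> element, or undefined if there is none
function domainOption(pageUrl, baseHref) {
  // Without a <base>, relative links resolve against the page URL itself.
  if (!baseHref) return pageUrl;
  // The <base href> may be relative, so resolve it against the page URL.
  return new URL(baseHref, pageUrl).href;
}

console.log(domainOption('https://example.com/docs/index.html', '/pages/'));
// https://example.com/pages/
```

The returned string could then be passed in place of the `url.resolve(...)` expression above.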

This works but it would be better if Breakdance did support the <base> element, which I think it should.

What do you think?

it would be better if Breakdance did support the <base> element

Agreed! I missed this one. Since <base> is a standard HTML element, I think breakdance should support it natively. Do you want to do a PR? Otherwise I can implement this as soon as I have a chance.

Also, did you try something like the following?

var Breakdance = require('breakdance');
var breakdance = new Breakdance();
breakdance.set('base', function(node) {
  // get value from node
});

Even though we need to make sure we get the URL from base before any other URLs are handled, this should work, since AST nodes are created in the same order as the elements. Alternatively, we can update the html preprocessing logic.
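
To illustrate why document order is enough, here is a rough, self-contained sketch. It does not use Breakdance's real AST or compiler API: the node shapes, the `state` object, and the `handlers` map are all hypothetical stand-ins, and the page URL is a placeholder.

```javascript
// Hypothetical, simplified node list in document order (not breakdance's real AST).
const nodes = [
  { type: 'base', attribs: { href: '/pages/' } },
  { type: 'a', attribs: { href: 'page2.html' }, text: 'Hi' }
];

// Hypothetical page URL, needed because a <base href> can itself be relative.
const state = { baseUrl: new URL('https://example.com/index.html') };

const handlers = {
  // Runs first because <base> appears before any <a> in the document.
  base(node) {
    state.baseUrl = new URL(node.attribs.href, state.baseUrl);
    return ''; // <base> produces no markdown output
  },
  // Resolves each link against whatever base has been recorded so far.
  // Only the path is emitted here; query strings and fragments are ignored.
  a(node) {
    const target = new URL(node.attribs.href, state.baseUrl);
    return `[${node.text}](${target.pathname})`;
  }
};

const markdown = nodes.map((node) => handlers[node.type](node)).join('');
console.log(markdown); // [Hi](/pages/page2.html)
```

Because the `base` handler has already updated the shared state by the time the `a` handler runs, no separate preprocessing pass is needed in this simplified model.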

(fwiw, the domain option is strange compared to how the other options are handled. I didn't like how that was done and knew it was going to come back to haunt me lol.)