Parse for pdf

Question

nikophil opened this issue 7 years ago · 5 comments

nikophil commented 7 years ago

Provide a way to parse the docs for PDF output, including parsing only some parts of the docs.

Answer 1 · 2018-12-27T14:58:32.000Z

Almost done

There are some prev/next links hardcoded in the RST of the best practices. I think they could be removed (especially since I now handle these links in the JSON, which makes them really useless).
If a link points from one page to another page of the same "book" that we want to print as a PDF, the parser renders it as an actual link (href="page.html") rather than as an anchor (href="#page"). This happens at least in the TOC, but it could apply to any cross-link inside the same book. I don't see any solution other than parsing all the HTML and replacing these links.
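A regex-based rewrite of those links could look roughly like the sketch below. It is a minimal Python illustration, not the builder's actual code: `rewrite_book_links`, the convention that the anchor for `page.html` is `#page`, and the assumption that fragment ids are unique across the whole book are all hypothetical. It only handles double-quoted `href` attributes with bare page names (no directories).

```python
import re

# Matches double-quoted href="name.html" or href="name.html#fragment".
# Hypothetical convention: in the combined PDF HTML, each page of the book is
# wrapped in an element whose id is the page name ("intro.html" -> id="intro").
HREF_RE = re.compile(r'href="(?P<page>[^"#]+)\.html(?:#(?P<frag>[^"]*))?"')


def rewrite_book_links(html: str, book_pages: set[str]) -> str:
    """Turn links to pages of the same book into in-document anchors.

    Links whose target page is not in ``book_pages`` are left untouched.
    """

    def replace(match: re.Match) -> str:
        page = match.group("page")
        if page not in book_pages:
            # Not part of this book: keep the real link.
            return match.group(0)
        frag = match.group("frag")
        # Jump to the section if a fragment was given (assumes ids are unique
        # across the whole book), otherwise to the start of the page.
        return f'href="#{frag or page}"'

    return HREF_RE.sub(replace, html)


sample = (
    '<a href="intro.html">Intro</a> '
    '<a href="setup.html#install">Install</a> '
    '<a href="other.html">Other</a>'
)
print(rewrite_book_links(sample, {"intro", "setup"}))
# <a href="#intro">Intro</a> <a href="#install">Install</a> <a href="other.html">Other</a>
```

Links to pages outside the book (including absolute URLs) keep their real `href`, so only in-book cross-references become anchors.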

Answer 2 · 2018-12-27T18:45:28.000Z

> There are some prev/next links hardcoded in the RST of the best practices. I think they could be removed (especially since I now handle these links in the JSON, which makes them really useless).

You mean, for example, the "Next: Creating a project" at the bottom of https://symfony.com/doc/current/best_practices/introduction.html, right?

In a perfect world, we would remove these and the auto-generated next/prev would handle this in HTML automatically. Let's just keep this on the "list" for now - we can see how the next/prev links look, and then hopefully remove these manual ones later.
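Auto-generating next/prev links only requires the pages in reading order. A minimal Python sketch, assuming a flat, already-ordered TOC; the function names, the JSON field names (`url`, `title`, `prev`, `next`) and the page file names are hypothetical, not the builder's actual format.

```python
import json


def _link(entry):
    """Keep only the fields a prev/next link needs (or None at the ends)."""
    return {"url": entry["url"], "title": entry["title"]} if entry else None


def add_prev_next(toc: list[dict]) -> list[dict]:
    """Return copies of the TOC entries with "prev"/"next" links added.

    Assumes ``toc`` is flat and already in reading order.
    """
    pages = []
    for i, entry in enumerate(toc):
        prev_entry = toc[i - 1] if i > 0 else None
        next_entry = toc[i + 1] if i + 1 < len(toc) else None
        pages.append({**entry, "prev": _link(prev_entry), "next": _link(next_entry)})
    return pages


toc = [
    {"url": "introduction.html", "title": "Introduction"},
    {"url": "creating_project.html", "title": "Creating a project"},
    {"url": "configuration.html", "title": "Configuration"},
]
print(json.dumps(add_prev_next(toc), indent=2))
```

With links derived from the TOC like this, a hardcoded "Next: Creating a project" line in the RST becomes redundant.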

Answer 3 · 2018-12-27T18:54:36.000Z

> If a link points from one page to another page of the same "book" that we want to print as a PDF, the parser renders it as an actual link (href="page.html") rather than as an anchor (href="#page"). This happens at least in the TOC, but it could apply to any cross-link inside the same book. I don't see any solution other than parsing all the HTML and replacing these links.

Technically, this is OK! The current PDF-generating code already contains a bunch of code (regexes, etc.) to find and fix the links. However, as this code is tightly coupled to Sphinx, I think we should re-implement it ourselves: basically, have an option that dumps one "section" into a single, final HTML file (or maybe a JSON file containing the HTML, so it can be parsed more easily) with all the links already fixed.
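The proposed option (dump one "section" into a single file, with the HTML inside JSON) could be sketched as follows. This is illustrative Python only; `build_section_document`, the `<section id="...">` wrapper and the `section`/`html` keys are hypothetical choices, and the sketch assumes cross-page links were already turned into `#page` anchors by a separate step (it does not rewrite links itself).

```python
import json
from html import escape


def build_section_document(section: str, pages: list[tuple[str, str]]) -> str:
    """Combine the pages of one section into a single JSON payload holding HTML.

    ``pages`` is a list of (page_name, body_html) pairs in reading order. Each
    body is wrapped in an element whose id is the page name, so in-book links
    of the form "#page_name" have a target in the combined document.
    """
    parts = [
        f'<section id="{escape(name)}">\n{body}\n</section>' for name, body in pages
    ]
    return json.dumps({"section": section, "html": "\n".join(parts)})


doc = build_section_document(
    "best_practices",
    [
        ("introduction", "<h1>Introduction</h1>"),
        ("creating_project", "<h1>Creating a project</h1>"),
    ],
)
print(json.loads(doc)["html"])
```

Emitting JSON rather than raw HTML keeps metadata (such as the section name) next to the markup, while the `html` value can still be written out as-is for the PDF renderer.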

Answer 4 · 2018-12-27T18:57:39.000Z

> You mean, for example, the "Next: Creating a project" at the bottom of https://symfony.com/doc/current/best_practices/introduction.html, right?

Yep, I was talking about that.

OK, let's keep that, but I'm pretty sure we'll get rid of it soon.

> Technically, this is OK! The current PDF-generating code already contains a bunch of code (regexes, etc.) to find and fix the links. However, as this code is tightly coupled to Sphinx, I think we should re-implement it ourselves: basically, have an option that dumps one "section" into a single, final HTML file (or maybe a JSON file containing the HTML, so it can be parsed more easily) with all the links already fixed.

What do you mean? I thought we were using PrinceXML to generate the PDF.

Answer 5 · 2019-01-02T13:06:02.000Z

https://github.com/weaverryan/docs-builder/issues/9