sphinx-contrib/plantuml

Support for FTP mode -> Performance boost

danwos opened this issue · 15 comments

PlantUML has an FTP mode, which avoids starting the JVM for each single PlantUML diagram.
It would be great to have support for this.

Our documentation has ~2.000 autogenerated PlantUML descriptions, which raises the build time from normally 8 min to 2.5 hours.
I think avoiding the JVM startup for each image can save a lot of build time.

I found some time to work on this.
I ran some tests to check whether the FTP mode could really speed things up.
For this I created 100 simple PlantUML model files as input.
They were then built either by calling java -jar plantuml.jar <file> for each file, or by uploading them to a locally running PlantUML instance in FTP mode: java -jar plantuml.jar -ftp.
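
A timing loop for the direct-call case could look roughly like this (file names and the jar path are assumptions, not the exact harness used):

import subprocess
import time

durations = []
for i in range(100):
    start = time.time()
    # one JVM start per diagram
    subprocess.run(['java', '-jar', 'plantuml.jar', f'model_{i}.puml'], check=True)
    durations.append(time.time() - start)

print('Overall time:', sum(durations))
print('longest: ', max(durations))
print('shortest: ', min(durations))
print('average:', sum(durations) / len(durations))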

Results for calling plantuml.jar directly:

Overall time: 69.99031257629395
longest:  0.7630829811096191
shortest:  0.634812593460083
average: 0.68

Results for using plantuml.jar -ftp:

Overall time: 21.60977053642273
longest:  0.8469300270080566
shortest:  0.042559146881103516
average: ~0.20

So the FTP mode is a lot faster: 22 secs vs. 70 secs in normal mode. That's roughly a factor of 3.
The reason is likely that FTP mode starts the Java runtime only once, whereas the "normal" solution has to start it for each single file.

I will try to prepare a PR that starts plantuml in FTP mode at the beginning of the build
and updates the related directives to upload PlantUML code to the local FTP server and download the final PNG file (if configured to do so).
Not sure yet whether only PNG is supported in FTP mode or SVG as well.

As the plantuml FTP mode is unstable for some commands, here is the code used for upload and download.
Maybe it is helpful for others:

from multiprocessing import Process
import subprocess
from ftplib import FTP
import time

# Start the PlantUML FTP server on port 4242 in a separate process
def start_ftp():
    subprocess.run(['java', '-jar', 'plantuml.jar', '-ftp:4242'])

def use_ftp():
    ftp = FTP()
    print(ftp.connect('127.0.0.1', 4242))
    print(ftp.login())
    ftp.set_pasv(False)
    print(ftp.retrlines("LIST"))  # Get directory content
    # Upload the PlantUML source; the server renders it
    with open('test_file.txt', 'rb') as fup:
        ftp.storlines("STOR test_file.txt", fup)
    # Download the rendered PNG
    with open('test_file.png', 'wb') as new_file:
        ftp.retrbinary('RETR test_file.png', new_file.write)

if __name__ == '__main__':
    p = Process(target=start_ftp)
    p.start()
    time.sleep(1.0)  # Dirty hack to be sure the server is already running
    use_ftp()
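
A more robust alternative to the fixed sleep could be to poll the port until the server accepts connections; a small sketch (reusing the imports above):

def wait_for_ftp(host='127.0.0.1', port=4242, timeout=10.0):
    # Retry connecting until the PlantUML FTP server is reachable or the timeout expires
    deadline = time.time() + timeout
    while True:
        try:
            ftp = FTP()
            ftp.connect(host, port)
            ftp.close()
            return
        except OSError:
            if time.time() > deadline:
                raise
            time.sleep(0.1)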
yuja commented

Hi, thanks for the benchmark data.

Assuming we'll have to use temporary files anyway, I think we can instead batch the execution of plantuml.

Pseudo code:

target_nodes = {}
# build a dict of incdir: [node, ...] while visiting the document tree

# then, generate images per batch
for incdir, nodes in target_nodes.items():
    for n in nodes:
        # dump to temporary file {hash}.puml
    subprocess.Popen([..., "-o", output_dir, abs_{hash0}.puml, abs_{hash1}.puml, ...], cwd=abs_incdir)
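
A minimal runnable sketch of that idea, assuming target_nodes maps an include dir to a list of raw PlantUML source strings (the function name and the plantuml.jar path are assumptions):

import hashlib
import subprocess
from pathlib import Path

def render_batches(target_nodes, output_dir):
    # Render all collected diagram sources with one plantuml call per include dir
    for incdir, sources in target_nodes.items():
        puml_files = []
        for source in sources:
            # dump each diagram source to <hash>.puml inside its include dir
            name = hashlib.sha1(source.encode('utf-8')).hexdigest() + '.puml'
            path = Path(incdir) / name
            path.write_text(source, encoding='utf-8')
            puml_files.append(str(path))
        # a single JVM start renders the whole batch into output_dir
        subprocess.run(
            ['java', '-jar', 'plantuml.jar', '-tpng', '-o', str(output_dir), *puml_files],
            cwd=incdir, check=True)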

Okay, the PR is ready: #48.
Luckily I was able to avoid temporary files.
Please have a look and tell me if anything is missing or implemented the wrong way.

I have regenerated my documentation with >1.500 diagrams.
Normal run: 66 minutes
Run with ftp mode: 46 minutes

So it saves me ~20 min, or ~30% of build time.
That's nice :)

yuja commented

Can you try my batch-rendering PoC?
I think it'll be faster than FTP since the plantuml process can leverage more CPU cores
to render multiple documents.

https://github.com/yuja/plantuml/tree/batch-rendering

Your FTP implementation is cool and the code quality is great, but if we can achieve
the same performance by reducing the number of plantuml calls, that should be better
than dealing with FTP and socket thingy.

I had some thoughts in the same direction, but wasn't sure whether parallel generation was even possible.
Nice to see such a great solution.

Unfortunately I was unable to build my project with your PoC.
I got a message that a copied plantuml file in _plantuml throws an error.
The reason: it uses !include colors.puml, and this relative path is no longer valid because colors.puml was not copied as well.
I'm not sure how to deal with such includes, or rather relative paths, as they can be used in different places (e.g. image references).
Maybe a config parameter that takes additional folders/files to copy to _plantuml could be a solution.
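
A rough and purely hypothetical sketch of such a parameter (neither the option name plantuml_batch_extra_files nor the handler exist anywhere yet, and the _plantuml location is assumed):

# conf.py (hypothetical option)
# plantuml_batch_extra_files = ['diagrams/colors.puml']

# extension side (sketch): copy the configured files into the batch directory
import shutil
from pathlib import Path

def copy_extra_files(app):
    batch_dir = Path(app.outdir) / '_plantuml'   # assumed batch working directory
    batch_dir.mkdir(parents=True, exist_ok=True)
    for rel_path in app.config.plantuml_batch_extra_files:
        shutil.copy(Path(app.confdir) / rel_path, batch_dir)

def setup(app):
    app.add_config_value('plantuml_batch_extra_files', [], 'env')
    app.connect('builder-inited', copy_extra_files)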

But if this works, it would be awesome and would really speed things up.

However, I think our two concepts are not competitors.
Both aim at the same goal, but with different approaches.
And I have the feeling that both could be combined to maximize the performance boost even more.

Argh sh**, I just deactivated !include colors.puml and realized that my case/documentation is somehow special.
The batch feature gets used, but it only handles 6 of ~1.500 diagrams.
The reason is that nearly all diagrams get generated on the fly by the extension sphinx-needs.
As I'm the maintainer of that extension, I know some details:
sphinx-needs creates the extension's uml node and sets its content directly,
and it looks like this happens after your batch feature is called.
sphinx-needs does all this on the doctree-resolved event, but your implementation already handles diagrams on doctree-read.
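
For illustration, the difference in event hookup looks roughly like this (the handler names are placeholders, not real functions from either extension):

def collect_plantuml_nodes(app, doctree):
    # placeholder: where the batch PoC gathers plantuml nodes, once per parsed source file
    pass

def insert_needs_uml_nodes(app, doctree, docname):
    # placeholder: where sphinx-needs fills its uml nodes, per document during resolution
    pass

def setup(app):
    # batch collection runs right after each source file is parsed ...
    app.connect('doctree-read', collect_plantuml_nodes)
    # ... while sphinx-needs only inserts its diagrams on this later event,
    # so the batch pass never sees them
    app.connect('doctree-resolved', insert_needs_uml_nodes)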

I made a small test and changed doctree-read to doctree-resolved.
The build runs, but it takes longer: 77 min instead of 66 (normal mode, not FTP).
I have the feeling that each plantuml diagram gets generated twice:
the first time when sphinx-needs creates the plantuml node, and then again when the batch handling starts.

However, the implementation looks really promising, as it adds only 10 min of extra build time (batch_size = 5).
So these 10 min would be the normal plantuml generation time if I weren't using sphinx-needs.

yuja commented

It uses !include colors.puml, and this relative path is no longer valid

Yup, it's on my TODO list. I should have mentioned that, sorry.

I made a small test and changed doctree-read to doctree-resolved.

That wouldn't work since doctree-resolved is emitted when building each doc.

https://www.sphinx-doc.org/en/master/extdev/appapi.html#sphinx-core-events

If your workload contains many plantuml diagrams in one page, maybe we can do
batching per page, which would also solve the include-path problem. But we'd need
a hybrid of FTP + batch to get rid of the Java runtime overhead in that case.

Ohh, I forgot that doctree-resolved gets fired several times. Thanks for the hint.
(I'm missing an event which gets fired after all doctrees are resolved.)
Maybe you could call the plantuml batch generation in the visitor functions, when they get called the first time?

Otherwise your proposal of running the batch per page also sounds valid.
I don't know what would be best...

However, let me know when there is something ready for testing.
And thanks a lot for spending time on that.

yuja commented

Can you try the updated version?

https://github.com/yuja/plantuml/tree/batch-rendering

It's still a PoC, but the !include issue should be addressed: any docs containing !include are excluded from the batch.

Maybe you could call the plantuml batch generation in the visitor functions,

Yup, implemented as such, but be aware that doctree-resolved and visitor calls can be intermixed.
So we can't collect dynamically generated nodes at once and batch them.

I have tested it and everything is working.
I made some tests with different batch_size values to see the impact on build time (CPU has 8 cores, ~1.500 diagrams):

005 -> 48 min
016 -> 44 mins
048 -> 42 mins
192 -> 42 mins

Do you have any suggestion for the perfect batch_size?

yuja commented

It's just a limit to not exceed the maximum length of command line arguments, so 100 or 1000 should be fine.
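
For illustration, the role of such a limit can be sketched like this (the function name and file-list variable are made up):

import subprocess

def render_in_batches(puml_files, batch_size=100):
    # Split the file list into chunks so a single command line never gets too long
    for i in range(0, len(puml_files), batch_size):
        chunk = puml_files[i:i + batch_size]
        subprocess.run(['java', '-jar', 'plantuml.jar', '-tpng', *chunk], check=True)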

192 -> 42 mins

So the number looks close to the FTP implementation. That sounds good.

yuja commented

Pushed 0.19 to PyPI, which includes the batch-rendering branch.

Nice, thanks.
If I find some time, I may try to reimplement the FTP mode for your batch solution.
I'm not sure it still brings enough benefit now, but as most of the work is already done, a small test is quickly made.

yuja commented

will try to reimplement the FTP mode for your batch solution

I suggest benchmarking batched FTP before implementing it seriously.
If the performance improvement isn't significant, including FTP mode would be just a maintenance burden.

yuja commented

Superseded by plantuml_batch_size option. Closing.