eafer/rdrview

Provide library


plata commented

It should be possible to put the core functionality into a library which can be reused by other projects.

eafer commented
kLabz commented

Not OP, but that's something I needed to integrate rdrview into another tool I'm building. In the meantime I changed the target platform to Node.js (not only because of rdrview; some other things were missing or not production-ready yet), so now I just use Mozilla's reader instead.

plata commented

Is this something that you actually need, or just a general idea?

Not really under my control, but I would love to see this being used by @TobiasFella in https://github.com/KDE/alligator.

Background info: the Flym feed reader uses Readability4J to show the complete article content for RSS feeds that only contain teaser text. I couldn't find a C/C++ library that provides the same or similar functionality.

eafer commented
eafer commented
kLabz commented

I am using Haxe, which can target different languages or VMs. I wanted to use HashLink, the new-ish Haxe VM, for this project, but it's still very rough for sys things (handling processes, etc.; even creating a web server was pre-alpha).

Having rdrview as a lib would have allowed me to write native bindings for the HashLink VM and would have eased the process, but I gave up because of all the other (current) shortcomings of HashLink for sys applications (as opposed to games, its primary target). I'm still using Haxe, but now targeting Node.js, which is a much better fit for this project.

plata commented

By the way, did you actually need rdrview as a library? Were you calling it often enough to get performance issues from spawning processes, or was there another reason?

I made a quick prototype where I call the executable: plata/alligator@30a5bf0

It takes some time when loading many feeds. However, I cannot tell if this is because it's not a library, if it's Internet access or if it's even related to rdrview at all.

ajusa commented

This is also something I would like to have - ideally a simple function that takes in an input HTML string, and returns or allocates the output HTML string. The main reason for this is performance, as there is some overhead when running a new process each time I want to get the content of a website.

Working with the strings directly also means that I can do the fetching however I want; I don't need to rely on the networking capabilities of this project (for example, when fetching hundreds of websites).

C is also super portable, so adding this functionality would allow less popular languages to use this implementation.

plata commented

@eafer Could you provide some hints on what would be required to build a C library, in case somebody would like to give it a try? Looking at the code, it isn't really obvious to me.

In the meantime, I've been looking around for other readability implementations. While there are several for Python, Java, JavaScript, etc., I haven't been able to find anything for C. Also, calling an executable is not an option in my use case (I'm not allowed to do so, and it raises issues for packaging and delivery).

eafer commented
plata commented

@eafer feel free to close this if there's no plan to provide a library.

I don't mind leaving it open in case other people show up asking for this.