Provide library

Question

plata opened this issue 3 years ago · 12 comments

It should be possible to put the core functionality into a library which can be reused by other projects.

Answer 1 · 2021-05-29T22:32:50.000Z

On Sat, May 29, 2021 at 12:46:07PM -0700, plata wrote: It should be possible to put the core functionality into a library which can be reused by other projects.

Sure, but I would have to decide on an API/ABI, preferably one that doesn't tie me to libxml, so it does require some work. Is this something that you actually need, or just a general idea?
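One common way to keep libxml out of a C API/ABI is an opaque handle: callers only ever hold a pointer, so the internal representation (for example an `xmlDocPtr`) can change without breaking them. The following is a minimal sketch under that assumption. The names `rdr_article`, `rdr_article_parse`, `rdr_article_content` and `rdr_article_free` are hypothetical and do not exist in rdrview, and the "parse" step is only a placeholder that copies the input.

```c
#include <stdlib.h>
#include <string.h>

/* ---- public header part: callers only ever see a pointer ---- */
typedef struct rdr_article rdr_article;   /* opaque to callers */

rdr_article *rdr_article_parse(const char *html, size_t len);
const char *rdr_article_content(const rdr_article *a);
void rdr_article_free(rdr_article *a);

/* ---- private implementation part ---- */
/* A real library could keep an xmlDocPtr in here; since the layout is
 * hidden, replacing the parser later does not break compiled callers. */
struct rdr_article {
    char *content;   /* placeholder: a copy of the input HTML */
};

rdr_article *rdr_article_parse(const char *html, size_t len)
{
    rdr_article *a = malloc(sizeof *a);
    if (!a)
        return NULL;
    a->content = malloc(len + 1);
    if (!a->content) {
        free(a);
        return NULL;
    }
    /* Placeholder: the readability algorithm would run here instead. */
    memcpy(a->content, html, len);
    a->content[len] = '\0';
    return a;
}

const char *rdr_article_content(const rdr_article *a)
{
    return a->content;
}

void rdr_article_free(rdr_article *a)
{
    if (!a)
        return;
    free(a->content);
    free(a);
}
```

Because `struct rdr_article` is only defined on the library side, fields can be added or libxml2 swapped out without recompiling code that uses the header.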

Answer 2 · 2021-05-30T05:52:50.000Z

Not OP, but that's something I needed in order to integrate rdrview into another tool I'm building. In the meantime I changed the target platform to Node.js (not only because of rdrview; some other things were missing or not production-ready yet), so now I just use Mozilla's reader instead.

Answer 3 · 2021-05-30T10:41:55.000Z

Is this something that you actually need, or just a general idea?

Not really under my control, but I would love to see this being used by @TobiasFella in https://github.com/KDE/alligator.

Background info: the Flym feed reader uses Readability4J to show the complete content for RSS feeds that only contain teaser text. I couldn't find a C/C++ library that provides the same or similar functionality.

Answer 4 · 2021-05-30T17:11:06.000Z

Not really under my control, but I would love to see this being used by @TobiasFella in https://github.com/KDE/alligator.

And is there an advantage if alligator can call rdrview as a library, as opposed to just using the CLI through system() or similar? I think you would have to parse a lot of pages per second to see a difference in performance; maybe you are thinking of prefetching all the new pages? I'm leaving this issue open either way. I'll try to make time for it if the alligator devs (or anyone else) can confirm that they need it.
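For comparison, the system()-style approach could look roughly like this sketch, which writes the HTML to a temporary file and reads the tool's standard output through `popen()`. It assumes a POSIX system. The helper name `run_filter` is hypothetical, and the command to pass (e.g. `rdrview -H`, assuming that flag prints the article HTML to stdout) should be checked against rdrview(1).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: runs `cmd <tempfile>` once per document and
 * returns its stdout as a malloc'd, NUL-terminated string, or NULL on
 * failure or non-zero exit status. The caller must free() the result. */
char *run_filter(const char *cmd, const char *html)
{
    char path[] = "/tmp/rdrXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;

    /* Save the input document to the temporary file. */
    FILE *in = fdopen(fd, "w");
    if (!in) {
        close(fd);
        unlink(path);
        return NULL;
    }
    int write_ok = fputs(html, in) != EOF;
    if (fclose(in) != 0 || !write_ok) {
        unlink(path);
        return NULL;
    }

    /* mkstemp uses portable filename characters (no quotes), so
     * single-quoting the path for the shell is safe. */
    char command[4096];
    int n = snprintf(command, sizeof command, "%s '%s'", cmd, path);
    if (n < 0 || (size_t)n >= sizeof command) {
        unlink(path);
        return NULL;
    }

    /* Spawns a shell plus the tool: the per-call cost a library avoids. */
    FILE *out = popen(command, "r");
    if (!out) {
        unlink(path);
        return NULL;
    }

    size_t cap = 4096, len = 0, got;
    char *buf = malloc(cap);
    while (buf && (got = fread(buf + len, 1, cap - len - 1, out)) > 0) {
        len += got;
        if (len + 1 == cap) {            /* buffer full: grow it */
            char *bigger = realloc(buf, cap * 2);
            if (!bigger) {
                free(buf);
                buf = NULL;
            } else {
                buf = bigger;
                cap *= 2;
            }
        }
    }
    int status = pclose(out);
    unlink(path);

    if (!buf || status != 0) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Each call forks a shell and the tool, so the cost grows with the number of pages; whether that matters depends on how many pages per second are being processed.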

Answer 5 · 2021-05-30T17:33:28.000Z

Not OP, but that's something I needed in order to integrate rdrview into another tool I'm building. In the meantime I changed the target platform to Node.js (not only because of rdrview; some other things were missing or not production-ready yet), so now I just use Mozilla's reader instead.

Thanks for the feedback. I think it's typical for such projects to be written in a higher-level language, like you ended up doing. Those languages usually have a (more mature?) readability implementation available. That's part of the reason I never bothered with this. By the way, did you actually need rdrview as a library? Were you calling it often enough to get performance issues from spawning processes, or was there another reason?

Answer 6 · 2021-05-30T18:57:47.000Z

I am using Haxe, which can target different languages or VMs. I wanted to use HashLink, the newish Haxe VM, for this project, but it's still very rough for sys things (handling processes, etc.; even creating a web server was pre-alpha).

Having rdrview as a library would have allowed me to write native bindings for the HashLink VM and would have eased the process, but I gave up because of all the other (current) shortcomings of HashLink for sys applications (as opposed to games, its primary target). I'm still using Haxe, but now targeting Node.js, which is a much better fit for this project.

Answer 7 · 2021-06-04T10:14:19.000Z

By the way, did you actually need rdrview as a library? Were you calling it often enough to get performance issues from spawning processes, or was there another reason?

I made a quick prototype where I call the executable: plata/alligator@30a5bf0

It takes some time when loading many feeds. However, I cannot tell whether this is because it's not a library, because of Internet access, or whether it's related to rdrview at all.

Answer 8 · 2021-10-28T16:06:47.000Z

This is also something I would like to have: ideally a simple function that takes an input HTML string and returns (or allocates) the output HTML string. The main reason is performance, as there is some overhead in running a new process each time I want to get the content of a website.
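A sketch of what such a string-in/string-out function could look like, assuming a hypothetical `rdr_extract()` that returns a status code and hands back a malloc'd string through an out-parameter, plus a matching `rdr_free()`. None of these names exist in rdrview, and the extraction is a naive stand-in (it returns whatever lies between `<body>` and `</body>`), not the readability algorithm.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes; none of these exist in rdrview. */
enum rdr_status { RDR_OK = 0, RDR_EINVAL, RDR_ENOMEM, RDR_ENOCONTENT };

/* Extracts the "readable" part of a NUL-terminated HTML string.
 * On success, *out receives a malloc'd NUL-terminated string that the
 * caller releases with rdr_free(). On failure, *out is set to NULL
 * (when out itself is non-NULL) and an error code is returned. */
int rdr_extract(const char *html, const char *base_url, char **out)
{
    if (!out)
        return RDR_EINVAL;
    *out = NULL;
    if (!html)
        return RDR_EINVAL;
    (void)base_url;  /* a real version would resolve relative links with it */

    /* Naive stand-in for the readability algorithm: take the text
     * between the first "<body>" and the following "</body>". */
    const char *start = strstr(html, "<body>");
    const char *end = start ? strstr(start, "</body>") : NULL;
    if (!end)
        return RDR_ENOCONTENT;
    start += strlen("<body>");

    size_t len = (size_t)(end - start);
    char *buf = malloc(len + 1);
    if (!buf)
        return RDR_ENOMEM;
    memcpy(buf, start, len);
    buf[len] = '\0';
    *out = buf;
    return RDR_OK;
}

/* Releases a string returned by rdr_extract(). */
void rdr_free(char *s)
{
    free(s);
}
```

Exporting `rdr_free()` rather than telling callers to use `free()` lets the library keep control of its allocator, which helps when the caller links against a different C runtime or binds to the function from another language. Since the input is a plain string, fetching stays entirely in the caller's hands.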

Working with the strings directly also means that I can do the fetching however I want, without relying on the networking capabilities of this project (for example, when fetching hundreds of websites).

C is also super portable, so adding this functionality would allow less popular languages to use this implementation.

Answer 9 · 2022-02-13T13:41:55.000Z

@eafer Could you provide some hints on what would be required to build a C library in case somebody would like to give it a try? Looking at the code, it isn't really obvious to me.

In the meantime, I've been looking around for other readability implementations. While there are several for Python, Java, Javascript etc., I've not been able to find anything for C. Also, calling an executable is not an option in my use case (I'm not allowed to do so and it raises issues for packaging/delivery).

Answer 10 · 2022-02-13T14:25:46.000Z

@eafer Could you provide some hints on what would be required to build a C library in case somebody would like to give it a try? Looking at the code, it isn't really obvious to me.

I haven't looked at the code in a while, but I think the biggest change that would be required is to run cleanups and bubble up errors on failure; right now the program just exits whenever there is a problem. My biggest concern, though, is that rdrview was always designed with the sandbox in mind, so it's possible that there are places where I was not sufficiently careful in the parsing. I'm also not very happy with making a library that depends on something as big as libxml2, so I would like to let callers somehow provide their own parser, but I don't know how realistic that is.
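A minimal sketch of the "run cleanups and bubble up errors" pattern, using hypothetical names (`article_build`, `article_release`, `RDR_ERR_*`) rather than anything from rdrview's code: instead of exiting, each function frees whatever it has acquired so far and returns an error code for the caller to handle.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes for a library build. */
enum { RDR_ERR_OK = 0, RDR_ERR_NOMEM = -1, RDR_ERR_INVAL = -2, RDR_ERR_EMPTY = -3 };

/* CLI style, which a library must avoid:
 *
 *     char *p = malloc(n);
 *     if (!p) { perror("malloc"); exit(1); }
 *
 * Library style: report the error and release what was acquired. */

struct article {
    char *title;
    char *content;
};

/* Copies src into a newly allocated string; returns NULL on failure. */
static char *dup_str(const char *src)
{
    size_t n = strlen(src) + 1;
    char *p = malloc(n);
    if (p)
        memcpy(p, src, n);
    return p;
}

/* Fills *out on success. On failure, frees partial results, leaves
 * *out untouched and returns a negative code instead of exiting. */
int article_build(const char *title, const char *content, struct article *out)
{
    int err = RDR_ERR_NOMEM;
    char *t = NULL, *c = NULL;

    if (!title || !content) {
        err = RDR_ERR_INVAL;
        goto fail;
    }
    t = dup_str(title);
    if (!t)
        goto fail;
    c = dup_str(content);
    if (!c)
        goto fail;
    if (c[0] == '\0') {          /* e.g. nothing readable was found */
        err = RDR_ERR_EMPTY;
        goto fail;
    }

    out->title = t;
    out->content = c;
    return RDR_ERR_OK;

fail:
    free(c);                     /* free(NULL) is a no-op */
    free(t);
    return err;
}

/* Releases the strings owned by a successfully built article. */
void article_release(struct article *a)
{
    free(a->title);
    free(a->content);
    a->title = a->content = NULL;
}
```

The single `fail:` label keeps the unwinding in one place, so adding another resource only means freeing it there as well.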

Answer 11 · 2023-10-30T11:55:21.000Z

@eafer feel free to close this if there's no plan to provide a library.

Answer 12 · 2023-10-30T17:58:56.000Z

I don't mind leaving it open in case other people show up asking for this.