rpm-software-management/createrepo_c

Simplify cr_get_header_byte_range() + possible efficiency improvment?

dralley opened this issue · 7 comments

I'm curious if headerSizeof() from librpm can be used to replace much of the existing logic for cr_get_header_byte_range().

The way I understand the algorithm of cr_header_byte_range() is

headerStart = 112 (some magic constant representing the size of the Lead?) + signature header size + padding to some kind of alignment
headerEnd = headerStart + index header size

createrepo_c already has to extract and parse the header from the package anyway, so why not just use headerSizeof() from rpmio to calculate the sizes of the headers, and simplify the implementation from ~100 lines to just a couple of lines (essentially just the pseudocode above)

As a side benefit this would avoid having to open the file an extra time and seek around.

Is my understanding of how it works correct? (especially the magic numbers. https://rpm-software-management.github.io/rpm/manual/format.html claims that the Lead is 96 bytes including the magic number, but it's not clear how we get from 96 to 112 or the constant 104 on line 275)

Separately: this (calculating the header byte range) kind of feels like functionality that librpm ought to provide rather than be implemented in createrepo_c. createrepo_c shouldn't be in the business of parsing RPM files and making assumptions about their structure?

I think it sound reasonable to remove part of the code in createrepo_c and replace it by calling function from rpm-libs.

In the first place we have to investigate whether both functions returns compatible outputs.

In the second part we have to ask RPM team whether they would expose headerSizeof() as API. May I ask you @pmatilai for an opinion?

I think it's already exposed? It's listed in the documentation.

@pmatilai see discussion

Maybe the reason is that there isn't a version of rpmReadPackageFile() / rpmReadHeader() that works with the signature header? If you can't calculate the size / padding, you can't finish the calculation. But you could probably still simplify it.

I did eventually figure out the magic constants, I'll make a PR eventually to clean the code up a bit.

The fact that there's no way to get back a pair of signature and normal headers, and that the API merges the headers together, makes this impossible.