This Space for Rent

The joys of open source®©™, part toomany

This last week I decided I’d shove my rpm reader/generator up to a new fractional release by tweaking the makepkg binary to generate rpm v3 format packages. I first just updated the on-disk data structure to v3 format (with a magic number at the start of each dictionary) but didn’t implement signature generation, because it didn’t seem to be mandatory.

Hah, no such luck. But that’s only a minor problem; I would simply implement md5 checksums because they’re simple to implement and they’re fairly resistant to a non-technical attacker (I was working on the logic that once I got this working I’d be able to fit in ‘PGP’ – actually GnuPG, because that’s the only free implementation these days – and be able to sign the packages if they found a cure for mortality and I could pull Mastodon out of the grave for modern hardware.)

The “documentation” for RPM says that the md5 checksum is for the header (but not the signature header if it’s a v5 signature) and payload. This, apparently, is not actually what it is; I’d modified xrpm so I could extract both the header + payload (xrpm -Dd: -D extracts the header, then -d extracts the payload) into a separate file, so I could tuck those two things in a safe place and hand-verify checksums. So I did this with an actual redhatcentos rpm, only to discover that the md5 checksum (at index 1007 in the signature dictionary) was not the same as the md5 checksum of the header+payload (or header+compressed payload, or header-sans-signature+payload, or header-sans-signature+uncompressed payload.)
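For the record, the hand-verification amounts to something like the sketch below. This is Python rather than anything xrpm actually does, and it leans on assumptions: the usual layout of a 96-byte lead, then the signature dictionary padded out to an 8-byte boundary, then the header and payload, and a checksum stored as a plain 16-byte binary entry. The function names are made up for illustration, and the tag number is a parameter so you can point it at whichever index you're suspicious of.

```python
import hashlib
import struct

LEAD_SIZE = 96               # fixed-size lead at the front of the file
MAGIC = b"\x8e\xad\xe8"      # magic number opening each dictionary
INTRO = ">3sB4sII"           # magic, version, reserved, entry count, store size
ENTRY = ">IIII"              # tag, type, offset into store, count


def read_intro(buf, start):
    """Return (entry count, store size) for the dictionary at `start`."""
    magic, _ver, _reserved, nindex, hsize = struct.unpack_from(INTRO, buf, start)
    if magic != MAGIC:
        raise ValueError("no dictionary magic at offset %d" % start)
    return nindex, hsize


def dictionary_end(buf, start):
    """Offset of the first byte after the dictionary's data store."""
    nindex, hsize = read_intro(buf, start)
    return start + 16 + 16 * nindex + hsize


def lookup_bin(buf, start, tag):
    """Raw bytes stored under `tag`, or None if the tag is absent."""
    nindex, _hsize = read_intro(buf, start)
    store = start + 16 + 16 * nindex
    for i in range(nindex):
        t, _type, offset, count = struct.unpack_from(ENTRY, buf, start + 16 + 16 * i)
        if t == tag:
            # Assumption: a binary entry, so `count` is a length in bytes.
            return buf[store + offset:store + offset + count]
    return None


def documented_md5_matches(buf, tag):
    """Compare the stored digest with md5(header + payload), per the docs."""
    sig_start = LEAD_SIZE
    # The signature dictionary is padded to an 8-byte boundary.
    hdr_start = (dictionary_end(buf, sig_start) + 7) & ~7
    stored = lookup_bin(buf, sig_start, tag)
    computed = hashlib.md5(buf[hdr_start:]).digest()
    return stored == computed
```

Feed it the whole file as bytes, e.g. `documented_md5_matches(open("some.rpm", "rb").read(), 1007)`, where `some.rpm` is a placeholder name; a `False` only tells you the documented recipe and that tag disagree, not why.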

I checked the rpm source code. Fucking mistake; it’s like the more opaque parts of discount on speedballs. Hundreds of tiny little functions, all alike, larded up with what looks like some sort of legacy lint configuration commentary, and all, needless to say, pretty close to completely unreadable. No, the only way I’m going to be able to figure out what’s being generated is to make a dinky little rpm that installs nothing, then write an automated check script that walks the thing byte by byte doing every possible variant of an md5 checksum until I can make a checksum that matches what the horrible reference code puts into index 1007. (I would not be surprised one bit if the checksum included parts of the signature dictionary, or if it salted the md5 sum with something stupid to discourage third-party reverse engineering.)
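The byte-by-byte walk is at least easy to sketch. What follows is a guess at the shape of such a check script in Python, not the script itself: `find_md5_spans` is a hypothetical name, and it only tries contiguous byte ranges of one file, so it will find nothing if the real checksum covers discontiguous pieces, decompressed data, or a salt.

```python
import hashlib


def find_md5_spans(data, target):
    """Yield every (start, end) where md5(data[start:end]) equals `target`.

    `target` is the raw 16-byte digest. The running hash is extended one
    byte at a time, so each start offset costs a single pass over the rest
    of the file; that is quadratic overall, which is fine for a dinky rpm
    and hopeless for a big one.
    """
    for start in range(len(data)):
        running = hashlib.md5()
        for end in range(start + 1, len(data) + 1):
            running.update(data[end - 1:end])
            # digest() does not finalize the hash; updates can continue.
            if running.digest() == target:
                yield start, end
```

A hit gives you the offsets to compare against the lead, signature, header, and payload boundaries; no hits at all is the cue to start suspecting the salted or stitched-together variants.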

But the rpm source is ~180MB of sccs on github, compared to xrpm’s 1.9MB, so it’s got to be 100 times as confusing. (And OF COURSE it uses the now-traditional three layer shitcake that GNU configure has become, just to ensure that trying to compile a fully debugged version and trace just exactly what’s being written into the md5 checksum generator won’t be happening unless you’ve gone back for a second helping of that lovely Open Source®©™ Flavor-Aid.)

At least xrpm itself can still pick apart current rpms, for whatever good that does in a world filled with GNU configure, CMake, and other abominations in the eyes of G-d.