implement profile-1.0 secret transformation
Apparently there are two possible types of transformation applied to the `-userKey`, as specified in the .lcpl: the "basic secret transformation", which does nothing to the user's passphrase (this is the one implemented in lcp-decrypt), and a secret transformation derived from a master key, implemented in about 10 lines of Python code from an old version of DeDRM that was forcibly removed from GitHub.
One can apply that very Python code to a passphrase and pass the result to lcp-decrypt as `-userKey` to successfully decrypt an EPUB that uses the profile-1.0 key transformation.
In the .lcpl file, this type of key transformation is specified in the following fragment:

```json
"encryption": {
  "profile": "http://readium.org/lcp/profile-1.0",
  "content_key": {
```

nvm, it just ran without decrypting
Actually, this works, but encryption.xml also specifies compression, which led me to a bug: lcp-decrypt does not decompress compressed files. So this works for uncompressed content files, but not for compressed ones.
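A minimal Python sketch of the missing step, assuming the resource has already been decrypted and unpadded, and assuming that a `Compression` entry with `Method="8"` in encryption.xml means raw DEFLATE data (no zlib header) with an `OriginalLength` attribute giving the inflated size. The name `maybe_inflate` is hypothetical and not part of lcp-decrypt.

```python
import zlib
from typing import Optional

# Compression Method values as assumed to appear in encryption.xml.
STORED = "0"   # not compressed
DEFLATE = "8"  # raw DEFLATE


def maybe_inflate(
    decrypted: bytes, method: Optional[str], original_length: Optional[int] = None
) -> bytes:
    """Decompress a decrypted resource if encryption.xml marks it as compressed."""
    if method is None or method == STORED:
        return decrypted  # not compressed: the decrypted bytes are the file
    if method != DEFLATE:
        raise ValueError(f"unsupported compression method: {method}")
    # Negative wbits tells zlib to expect raw DEFLATE data with no header or checksum.
    inflated = zlib.decompress(decrypted, wbits=-zlib.MAX_WBITS)
    if original_length is not None and len(inflated) != original_length:
        raise ValueError(
            f"inflated size {len(inflated)} != OriginalLength {original_length}"
        )
    return inflated
```

The order matters: decrypt first, then inflate, since the content was compressed before it was encrypted.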
made an issue for the compression, including a "reference implementation" here: #4
Oh wow, this is an interesting scheme 🤔 What is `input_hash` here, the SHA-256 of the clear-text user key?
Frankly, I'm not sure. I tried entering that very `input_hash` for my ebook in Thorium as a text user key, to no avail.
Reading through the DeDRM code the above excerpt comes from, it seems (at first glance) that it is treated as the SHA-256 of a clear-text user key. Or rather, DeDRM checks which algorithm is specified in the XML but implements only SHA-256.
Instead, for this specific ebook I took the `input_hash` (a hex string) that I had extracted from the reading app, applied this function to it, and then used the result with lcp-decrypt.
I would go with the trial-and-error route in lcp-decrypt: hash the key (according to the algorithm specified in the XML) and see if it passes the `key_check` from the XML. If not, try the user input AS A HASH. If that also fails, apply the above function, hash (SHA-256) the result, and see if that passes. Or something like that. That way the user can give either a text key or a hash to lcp-decrypt, and it simply tries each step of the transformation to find where the input needs to be fed in to get at the actual content key. I think the old version of DeDRM does the same thing (it just ignores compression, which is why it did not work for me).
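A minimal Python sketch of that trial order, assuming SHA-256 is the algorithm named in the license and assuming a hypothetical `passes_key_check(key) -> bool` callback that wraps whatever key check lcp-decrypt already performs. Only the first two steps (input as clear-text passphrase, input as hex-encoded hash) are shown; the third step, the profile-1.0 transform, is not included.

```python
import hashlib
import re
from typing import Callable, Iterator, Optional, Tuple

# A hex-encoded SHA-256 digest is exactly 64 hex characters.
HEX_SHA256 = re.compile(r"[0-9a-fA-F]{64}")


def candidate_user_keys(user_input: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (label, 32-byte key) candidates derived from the user's input."""
    # Step 1: treat the input as a clear-text passphrase and hash it.
    yield "passphrase", hashlib.sha256(user_input.encode("utf-8")).digest()
    # Step 2: if the input already looks like a hex SHA-256 digest, use it as the hash.
    if HEX_SHA256.fullmatch(user_input):
        yield "hex hash", bytes.fromhex(user_input)


def find_user_key(
    user_input: str, passes_key_check: Callable[[bytes], bool]
) -> Optional[Tuple[str, bytes]]:
    """Return the first (label, key) candidate that passes the key check, or None."""
    for label, key in candidate_user_keys(user_input):
        if passes_key_check(key):
            return label, key
    return None
```

Keeping candidate generation separate from the loop means further steps could be appended to `candidate_user_keys` without touching the check logic.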
I got a full copy of DeDRM, and it seems they do indeed "blindly" try all options until they find one that passes the key check. I don't have any files that'd use the "weird" scheme, so I can't really test any changes I make. If you have any, could you forward them to me? adrien AT bustany DOT org. I pushed fixes for a couple of other issues, so maybe just give the latest version a try too; with a bit of luck your issues might be fixed :)
With regard to @devvythelopper's initial comment:
The two profiles are listed here: https://readium.org/lcp-specs/registries/profiles. The basic one is for testing purposes only, while 1.0 is for production use.
The algorithm that determines the user key, however, is not public, as stated in https://www.edrlab.org/readium-lcp/principles:
> From the passphrase to the content decryption key
>
> A user knows a passphrase (something he has chosen or which has been given to him by a license provider).
>
> The software transforms the passphrase into a user key (h = hash(pp) then uk = userkey(h), with “userkey” a simple string transform). The user key can decrypt the content key provided in the user license. The content key can decrypt the content.
>
> The Readium LCP library software is mostly open-source, only uk = userkey(h) isn’t (in the open-source version it is void). Only trusted licence providers and trusted app developers know what this string transform is. Therefore one cannot take the open-source software and simply add a “save as clear epub” feature applied on ebooks provided by certified servers.
>
> Certified applications must be hardened, so that hackers don’t easily find the secret “userkey” transform.

The Python code from the DeDRM Calibre plugin posted above claims to do exactly this: the secret string transformation as per profile 1.0. Apparently, if you have an old copy of the DeDRM plugin, you can still decrypt profile 1.0 books.
The DeDRM repo, however, got a DMCA takedown notice, so they had to remove the code from the repo and even rewrite the history. But given that they included the master key in plain text (it could have been an environment variable instead), I'm not surprised. On a side note, I'm not sure it is wise to keep this code in the comment.
There is, however, profile 2.0, which doesn't even seem to be documented by Readium and is being adopted by more publishers. It seems (logically) to use a different transformation algorithm, which is as yet unknown, and I bet Readium would like it to stay that way.
Thanks @ienev for pointing this out. This means it is likely not smart to implement the `userkey(h)` transformation in lcp-decrypt at all. What makes sense to implement is just an algorithm that feeds whatever the user provides into the decryption process at different points and checks whether it succeeds.
It's then up to users to transform the string according to the Python code of the old version of DeDRM, or not at all...
What's not clear to me is: do we have examples of EPUBs that currently fail to decrypt with lcp-decrypt (i.e. after I pushed the fixes for incorrect path decoding, etc.)? If yes, I'm happy to take a look; otherwise it's all quite theoretical to me :)
> I pushed fixes for a couple of other issues, so maybe just give the latest version a try too, with a bit of luck your issues might be fixed :)

decrypting my ebook with the manually transformed key works with the current version, great!
> I don't have any files that'd use the "weird" scheme, so I can't really test any changes I make. If you have any, could you forward them to me?

As @ienev pointed out, it might be sensible not to implement that transformation after all. Instead, you might want to add the ability to load plugins; a plugin could then implement a hook that transforms the user key. But I don't know if that makes much of a difference, since one can also simply create a .py file and run the transformation.
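A minimal sketch of such a plugin hook using Python's standard `importlib`. The hook name `transform_user_key` and its `bytes -> bytes` signature are hypothetical, and no transformation is included: the plugin file would supply it.

```python
import importlib.util
from pathlib import Path
from typing import Callable

# Hypothetical name of the function a plugin file must define.
HOOK_NAME = "transform_user_key"


def load_user_key_hook(plugin_path: str) -> Callable[[bytes], bytes]:
    """Load a .py file and return its user-key transformation hook."""
    path = Path(plugin_path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load plugin from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the plugin file's top-level code
    hook = getattr(module, HOOK_NAME, None)
    if not callable(hook):
        raise AttributeError(f"{path} does not define {HOOK_NAME}(key: bytes) -> bytes")
    return hook
```

A tool could then call `hook(h)` on the hashed passphrase before the key check. As noted above, this is only marginally different from running a separate .py script first and passing its output as `-userKey`.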
In fact, given the legal situation, I think the most workable way for any sensible reader who wants to back up her ebooks in a format that is not bound to specific devices is to:
- get a small Python script, not hosted on GitHub, that does nothing but apply the mentioned transformation
- and use lcp-decrypt as it is now with the result of the transformation

Python, because you can download such a script even from an untrustworthy source and easily see what it does. If someone wrote such a script, I would be happy to test it.
> decrypting my ebook with the manually transformed key works with the current version, great!

🥳
> But I don't know if that makes much of a difference, since one can also simply create a .py file and run the transformation.

Agree, I don't want to deal with DMCA takedowns 😅 Though it'd actually be interesting to know if DeDRM got taken down precisely because of that LCP "master" key, or if this was just collateral damage. But if the key doesn't appear in the LCP standard, it's probably (supposed to be) secret 🤷
Feel free to close the ticket if you think it makes sense :)
> Agree, I don't want to deal with DMCA takedowns 😅 Though it'd actually be interesting to know if DeDRM got taken down precisely because of that LCP "master" key, or if this was just collateral damage. But if the key doesn't appear in the LCP standard, it's probably (supposed to be) secret 🤷

I would assume this is exactly the reason. As far as I understand it, taking down content under the DMCA means that the content is not in the public domain, and the only "content string" in DeDRM's code that was not released to the public domain is the master key. But then again, corporations and institutions can even patent algorithms (such as MP3), so maybe the whole transformation algorithm is protected despite how simple it is.

> Feel free to close the ticket if you think it makes sense :)

👍
> do we have examples of epubs that currently fail to decrypt with lcp-decrypt (ie after I pushed the fixes for incorrect path decoding etc)

I have a profile 2.0 one that can be decrypted neither by lcp-decrypt nor by DeDRM (the version that worked with profile 1.0).