Duden questions
siarsky opened this issue · 4 comments
The ease with which you describe reverse engineering the Duden format on your blog is breathtaking, even though everyone who has ever tried a similar project knows that it was a hard piece of work - chapeau!
I am curious whether you also know the other Duden formats (I am using a Mac):
- a .dbb file, representing a single "dictionary"; these can be added one by one to the application
- the app then imports all dictionaries into one huge .nbof file located in /Users/user/Library/Application Support/DudenBibliothek:
-rwxr-xr-x 1 user group 15360 Nov 29 14:19 dbmedia.bdb
-rwxr-xr-x 1 user group 3729228 Nov 29 14:19 dudenbib.fi1
-rwxr-xr-x 1 user group 24660476 Nov 29 14:19 dudenbib.fi2
-rwxr-xr-x 1 user group 6363524 Nov 29 14:19 dudenbib.fsa
-rwxr-xr-x 1 user group 348931072 Nov 29 14:39 dudenbib.nbof
Could IDX+BOF in your description correspond to NBOF, and FI1+FI2 to FSI?
Do you think lsd2dsl could be used to decompile these files? If so, how?
Thanks for your help!
siarsky
PS:
hexdump duden8.dbb
0000000 2d 1f ef 89 24 43 b7 82 3f b0 8b 9d 2d b1 ff 25
0000010 90 fa d0 6f 14 d7 6d eb 2d f7 3f 82 26 c3 04 c8
0000020 d3 6a 29 4b 3b c1 e2 01 67 2e 8a f0 a3 a2 b7 6b
0000030 71 b6 dc 4e b9 ba 18 a1 9a c3 25 ad 01 cf 30 a4
...
hexdump dbmedia.bdb
0000000 2d 1f ef 89 24 43 b7 82 3f b0 8b 9d 2d b1 ff 25
0000010 f8 4f 69 9b 17 aa 4e 4b 15 22 3f f7 63 8b bc 5b
0000020 90 07 e5 e5 53 36 e9 e2 b6 55 62 7f 83 b0 7c 67
0000030 71 b6 dc 4e b9 ba 18 a1 9a c3 25 ad 01 cf 30 a4
...
hexdump dudenbib.fi1
0000000 01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
0000010 05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00
0000020 09 00 00 00 0b 00 00 00 0c 00 00 00 0d 00 00 00
0000030 0e 00 00 00 0f 00 00 00 10 00 00 00 11 00 00 00
...
hexdump dudenbib.fi2
0000000 01 00 00 81 02 00 00 81 03 00 00 81 04 00 00 81
0000010 05 00 00 81 06 00 00 81 07 00 00 81 08 00 00 81
0000020 09 00 00 81 0a 00 00 80 09 00 00 01 0b 00 00 81
0000030 ef 04 00 01 0d 00 00 81 0e 00 00 81 0f 00 00 81
...
hexdump dudenbib.fsa
0000000 42 46 18 00 00 00 00 00 02 02 00 00 61 01 00 00
0000010 69 01 00 00 02 03 00 00 61 04 00 00 73 01 00 00
0000020 01 01 00 00 73 01 00 00 01 01 00 00 65 01 00 00
0000030 01 01 00 00 72 14 00 00 01 01 00 00 72 18 00 00
...
hexdump dudenbib.nbof
0000000 2d 1f ef 89 24 43 b7 82 3f b0 8b 9d 2d b1 ff 25
0000010 52 0f 59 d3 27 c3 13 34 d1 e4 13 eb cf 2c f8 27
0000020 20 e1 44 6a 7a c3 30 36 fa 7c 13 0a 2b 17 78 35
0000030 71 b6 dc 4e b9 ba 18 a1 9a c3 25 ad 01 cf 30 a4
...
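Two things stand out in the dumps above: the .dbb, .bdb and .nbof files all begin with the same 16 bytes (and share another identical run at offset 0x30), while dudenbib.fi1 looks like a plain array of little-endian 32-bit integers. Here is a minimal Python sketch to check both, using only bytes copied from the dumps. The helper names `common_prefix_len` and `as_uint32_le` are made up for illustration, and reading fi1 as uint32 values is an assumption, not something confirmed about the format.

```python
import struct

# First 32 bytes of each file, copied from the hexdumps above.
SHARED = "2d 1f ef 89 24 43 b7 82 3f b0 8b 9d 2d b1 ff 25"
DBB = bytes.fromhex(SHARED + "90 fa d0 6f 14 d7 6d eb 2d f7 3f 82 26 c3 04 c8")
BDB = bytes.fromhex(SHARED + "f8 4f 69 9b 17 aa 4e 4b 15 22 3f f7 63 8b bc 5b")
NBOF = bytes.fromhex(SHARED + "52 0f 59 d3 27 c3 13 34 d1 e4 13 eb cf 2c f8 27")

# First 16 bytes of dudenbib.fi1.
FI1 = bytes.fromhex("01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00")


def common_prefix_len(*blobs):
    """Count how many leading bytes are identical across all inputs."""
    length = 0
    for column in zip(*blobs):
        if len(set(column)) != 1:
            break
        length += 1
    return length


def as_uint32_le(data):
    """Interpret the data as little-endian unsigned 32-bit integers (an assumption)."""
    count = len(data) // 4
    return list(struct.unpack(f"<{count}I", data[: count * 4]))


print(common_prefix_len(DBB, BDB, NBOF))  # 16
print(as_uint32_le(FI1))                  # [1, 2, 3, 4]
```

A shared header followed by data that looks random suggests the three files use the same container, but the sketch says nothing about what that container is.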
Thank you for the kind words!
Unfortunately, I'm not familiar with the recent versions of the format, so I have no idea how many changes to the decompiler would be required.
I will take a look a bit later. Can't promise anything, but I'm curious too :)
If you need some test files, let me know how I can share them with you privately. I am starting up Ghidra as well :)
Well, it turns out the new dbb format is an encrypted SQLite database with the following tables:
tabSystem:
  random1
  random2
  random3
  express
  random4
  random5
  random6
tabDudenbibUrls:
  id
  url
tabBookDescription:
  bookid
  available
  desc
  version
  copyright
  baseimage
  additionsid
  homepage
  hasfields
  numarticles
tabGUIBitmaps:
  filename
  image
tabExternFiles:
  filename
  content
tabMap:
  bookid
  id
  numid
  type
tabHtmlText:
  numid
  lemma
  context
  type
  html
tabMetaFachgebiete:
  numid
  fachgebietid
tabFieldsTopLevel:
  bookid
  field
  desc
tabFieldValues:
  bookid
  field
  val
  desc
tabMarkers:
  artid
  bookid
  created
  html
tabTagging:
  artid
  bookid
  created
  tags
nbof is similar, but with some additional tables.
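For reference, a listing like the one above can be produced from any plain (already decrypted) SQLite file with Python's standard `sqlite3` module. This is only a sketch: the function name `list_schema` is hypothetical, the demo uses an in-memory database with two of the tables above, and the column types in the `CREATE TABLE` statements are guesses. It does not touch the decryption step.

```python
import sqlite3


def list_schema(conn):
    """Return {table name: [column names]} for every table in the database."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (name,) in tables:
        # Quote the identifier so unusual table names don't break the PRAGMA.
        quoted = '"' + name.replace('"', '""') + '"'
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[name] = [column[1] for column in columns]
    return schema


# Demo on an in-memory database; the column types here are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tabDudenbibUrls (id INTEGER, url TEXT)")
conn.execute(
    "CREATE TABLE tabMap (bookid INTEGER, id TEXT, numid INTEGER, type INTEGER)"
)

for table, columns in list_schema(conn).items():
    print(table + ":")
    for column in columns:
        print("  " + column)
```

Pointing `sqlite3.connect` at a file path instead of `":memory:"` would list that file's tables in the same way, provided the file is an unencrypted SQLite database.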
I don't know how I feel about this. The decompiler would need the decryption key, which needs to be extracted from the binary. If I provide a way to do that, it might trigger an arms race with Duden (the key is already slightly obfuscated to prevent grepping).
Given the key problem and the amount of work needed, I think I'll leave it be for now.
Thx for your help, I understand your point.