TheParmak/conti-leaks-englished

conti-leaks-englished/deepl/json/translated/combined_en-US.json

0x241B opened this issue · 14 comments

The file 'conti-leaks-englished/deepl/json/translated/combined_en-US.json' is invalid JSON. I tried to fix it myself, but there are thousands of errors. Could we get a reupload?
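A quick way to see where a file first breaks is to let Python's standard `json` module report the position of the parse error. This is a minimal sketch; the helper name `first_json_error` is hypothetical, and it only finds the first error in a file, because the parser stops there.

```python
import json
from typing import Optional, Tuple


def first_json_error(text: str) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, message) of the first JSON syntax error, or None if the text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # JSONDecodeError carries the 1-based line and column where parsing failed
        return err.lineno, err.colno, err.msg
    return None
```

For example, `first_json_error(open("combined_en-US.json", encoding="utf-8").read())` would return the location of the first problem, or `None` once the file is valid.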

I am taking a look, thanks for notifying.

Hey, yeah it's invalid because of the splitting. I'm trying to translate the JSONs one by one right now.

Wouldn't removing all the \n's solve it?


Unfortunately not. There are formatting issues like below:

{
  }, "ts": "2022-02-27T19:19:45.870724",
  { "from": "pumba@q3mcco35auwcstmt.onion",
  { "to": "tramp@q3mcco35auwcstmt.onion",
  { "body": "say I'll take her under special control"
}

There are also unescaped double quotes in the body text, as well as other unescaped slashes.
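These escaping problems disappear if the output is written by a JSON serializer rather than by editing text: Python's `json.dumps` escapes embedded double quotes and backslashes automatically. A minimal sketch; the field names mirror the snippet above, and the message body is made up for illustration.

```python
import json

# Hypothetical message whose body contains characters that must be escaped in JSON
message = {
    "ts": "2022-02-27T19:19:45.870724",
    "from": "pumba@q3mcco35auwcstmt.onion",
    "to": "tramp@q3mcco35auwcstmt.onion",
    "body": 'he said "ok" and sent C:\\temp\\file',
}

# json.dumps escapes inner quotes as \" and each backslash as \\;
# ensure_ascii=False keeps non-Latin text (e.g. the original Russian) readable
encoded = json.dumps(message, ensure_ascii=False)
print(encoded)

# Round-tripping confirms the output is valid JSON and the body is unchanged
assert json.loads(encoded) == message
```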

oh, I see.

It took longer than expected. Some files throw the "appears to already be in English" error. Just letting you know: don't fix that JSON.

What do you mean, don't fix that JSON? Did you fix it, are you fixing it, or is it just not fixable?

I'm retranslating the whole thing file by file. It's technically fixable, but it would take an unnecessary amount of time.

Do you have a more convenient way to talk? Like Discord, Twitter, or even netcat.

Damn, it seems like DeepL removed some of the commas. I'll open a PR anyway.

Can't you parse the untranslated JSON first, extract the body, translate it, and then merge it back into the original JSON?

That's easy. Doing it right now.
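The approach suggested above (parse the untranslated JSON, extract each body, translate it, merge it back) can be sketched as follows. This is a minimal sketch under stated assumptions: each file is taken to be a JSON array of message objects like the snippet earlier in the thread, and `translate_batch` is a hypothetical stand-in for whatever translator is used; it is not a DeepL API call.

```python
import json
from typing import Callable, List

# Hypothetical translator signature: one output string per input string, same order
Translator = Callable[[List[str]], List[str]]


def translate_bodies(messages: List[dict], translate_batch: Translator) -> List[dict]:
    """Translate only the "body" field of each message and merge it back into a copy."""
    # Remember which messages actually have a string body
    indices = [i for i, m in enumerate(messages) if isinstance(m.get("body"), str)]
    bodies = [messages[i]["body"] for i in indices]

    translated = translate_batch(bodies)
    if len(translated) != len(bodies):
        raise ValueError("translator returned a different number of strings")

    # Shallow-copy every message so ts/from/to stay exactly as in the original
    merged = [dict(m) for m in messages]
    for i, text in zip(indices, translated):
        merged[i]["body"] = text
    return merged


def translate_file(src: str, dst: str, translate_batch: Translator) -> None:
    # Assumes src holds a JSON array of message objects
    with open(src, encoding="utf-8") as fh:
        messages = json.load(fh)
    merged = translate_bodies(messages, translate_batch)
    # json.dump does all the quoting and escaping, so dst is always valid JSON
    with open(dst, "w", encoding="utf-8") as fh:
        json.dump(merged, fh, ensure_ascii=False, indent=2)
```

Because only the body strings pass through the translator, it never sees the JSON structure, so it cannot drop commas or break the quoting.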

Translated body in #7