ykdojo/editdojo

Find the difference between two sets of text with JavaScript

ykdojo opened this issue · 15 comments

Example:

  • Original text: I had aweesome breakfast tooday.
  • Edited text: I had awesome breakfast today.

Given these two sets of text, we should be able to find the difference with JavaScript and show it in the UI. This way, it'll be easy to see how the original text has been edited.

In this case, it should look like this: I had awe<e>some breakfast to<o>day.

(Where <e> and <o> have strikethrough over them)
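For reference, here is a minimal sketch of one way to compute that kind of character-level difference. It uses a longest-common-subsequence (LCS) table, which is a swapped-in textbook technique and not necessarily what this repo ends up using. The function name `diffChars` and the `{ type, text }` shape are assumptions made up for this example.

```javascript
// Hypothetical helper (not code from this repo): character-level diff
// based on a longest-common-subsequence (LCS) table.
function diffChars(original, edited) {
  const a = Array.from(original); // split by code point, not UTF-16 unit
  const b = Array.from(edited);

  // lcs[i][j] = length of the LCS of a.slice(i) and b.slice(j)
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Collect operations, merging consecutive characters of the same type.
  const ops = [];
  const push = (type, ch) => {
    const last = ops[ops.length - 1];
    if (last && last.type === type) last.text += ch;
    else ops.push({ type, text: ch });
  };

  // Walk both strings from the start, following the table.
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      push("equal", a[i]);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      push("delete", a[i]); // character only in the original
      i++;
    } else {
      push("insert", b[j]); // character only in the edited text
      j++;
    }
  }
  while (i < a.length) push("delete", a[i++]);
  while (j < b.length) push("insert", b[j++]);
  return ops;
}
```

Calling `diffChars("I had aweesome breakfast tooday.", "I had awesome breakfast today.")` marks one "e" and one "o" as deleted and everything else as equal. The table takes time and memory proportional to the product of the two lengths, which should be fine for short sentences like these.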

Hi @ykdojo. Can I be assigned to this, please?
Also, I think added characters in the edited text should be highlighted too, maybe with a greenish background color.

Original text: I had aweesome breakfast tooday.
Edited text: I had an awesome breakfast today.

So it may give an output like this:

<p>I had <span class="appended">an</span> awe<span class="deleted">e</span>some breakfast to<span class="deleted">o</span>day.</p>

Where the appended words/chars are wrapped in a span with class .appended, and deleted words/chars are wrapped in a span with class .deleted.
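A rough sketch of how that wrapping step could look, assuming the diff has already been computed as a list of `{ type, text }` operations. That shape, and the names `renderDiff` and `escapeHtml`, are made up for this example. Only the class names `appended` and `deleted` come from the proposal above.

```javascript
// Escape text so user input can't be interpreted as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// ops: [{ type: "equal" | "insert" | "delete", text: string }]
// (an assumed shape for illustration, not an existing API)
function renderDiff(ops) {
  const body = ops
    .map(({ type, text }) => {
      const safe = escapeHtml(text);
      if (type === "insert") return `<span class="appended">${safe}</span>`;
      if (type === "delete") return `<span class="deleted">${safe}</span>`;
      return safe; // unchanged text is emitted as-is
    })
    .join("");
  return `<p>${body}</p>`;
}

// Hand-written operations for the example sentence above.
const exampleOps = [
  { type: "equal", text: "I had " },
  { type: "insert", text: "an" },
  { type: "equal", text: " awe" },
  { type: "delete", text: "e" },
  { type: "equal", text: "some breakfast to" },
  { type: "delete", text: "o" },
  { type: "equal", text: "day." },
];

console.log(renderDiff(exampleOps));
```

Escaping matters here because the text comes from users. The styling (strikethrough, greenish background) can then live entirely in CSS rules for `.deleted` and `.appended`.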

That sounds perfect! I tried assigning this to you on GitHub, but it didn't work for some reason. Feel free to just start working on it and send a pull request when it's ready :)

Found this library, but not sure if it would be a good fit for this: https://github.com/jhchen/fast-diff

Maybe it would be better/simpler to just implement this ourselves? That way, it'll be easier to tweak it as we go, too. I just don't like how most of these libraries are really complex to read and edit... Anyway, I'll think about this some more.

Started working on this today.

Pretty raw and it might be hard to read, but I have some code here already: https://github.com/ykdojo/text_difference_finder

Hey @ykdojo did you look at my pull request (#25)?

@justafrank hey sorry I thought I'd submitted my comments already. I just sent them again.

I was thinking of implementing something on my own and comparing it to yours.

Here are some examples for testing, in English:

  • Hello I’m looking of the group for learn english.
    -> Hello, I’m looking for a group to learn English with.

  • what book are you reading ? i want to start this one… what do you think about ?
    -> What book are you reading? I want to start this one… What do you think about it?

  • who helps me for improving my English?
    -> Can anyone help me improve my English?

  • I am waiting for snow from this morning until now.
    -> I’ve been waiting for snow since this morning.

  • even I have French nationality. I still make mistake when I write my own sentences .
    -> Even though I’m French, I still make mistakes when I write in French.

And some test cases in Japanese:

  • 日本語でも英語でもドラマよく見ている。
    -> 日本語でも英語でもドラマをよく見ています。
  • きのうのばんごはんはとてもおいしかたです。
    -> きのうのばんごはんはとてもおいしかったです。
  • わたしのしごとのま、わたいはそうだいなけしきをみました
    -> ใ—ใ”ใจใฎใ‚ใ„ใพใซใใ‚Œใ„ใชใ‘ใ—ใใ‚’ใฟใพใ—ใŸใ€‚
  • ็Ÿฅใ‚‰ใชใ„ไบบใจใŠๅ–‹ใ‚ŠใฎใŒๆ€–ใ„ใงใ™ใ€‚
    -> ็Ÿฅใ‚‰ใชใ„ไบบใจใŠๅ–‹ใ‚Šใ™ใ‚‹ใฎใŒๆ€–ใ„ใงใ™ใ€‚
  • Duolingoใฏๆœฌใจใซๅ‡„ใ„ใ‚ขใƒ—ใƒชใงใ™ใ€‚
    -> Duolingoใฏๆœฌๅฝ“ใซๅ‡„ใ„ใ‚ขใƒ—ใƒชใงใ™ใ€‚
  • ไปŠๆœใฏใจใฆใ‚‚ๅฏ’ใ„๏ผๅ†ฌใฏใฏใ‚„ใๆฅใ‚‹ใ€‚
    -> ไปŠๆœใฏใจใฆใ‚‚ๅฏ’ใ„ใงใ™ใญ๏ผๅ†ฌใŒใ‚‚ใ†ใ™ใๆฅใใ†ใงใ™ใ€‚

I'm starting to think, ideally, there should be a custom solution for splitting a sentence into words for each language.

Splitting by spaces works well for English, but it won't work well for Japanese, for example.

I think splitting a Japanese sentence into individual characters works well enough for now though.
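To make the two splitting strategies concrete, here is a small sketch. The function name `tokenize` and its `mode` parameter are hypothetical and only meant to illustrate the idea: split on whitespace for languages like English, and into individual characters for languages like Japanese.

```javascript
// Hypothetical helper: split text into tokens for diffing.
// mode "word": split on whitespace, keeping each whitespace run as its
//              own token so the original text can be rebuilt with join("").
// mode "char": split into individual characters (by code point).
function tokenize(text, mode = "word") {
  if (mode === "char") {
    return Array.from(text);
  }
  // The capture group makes split() keep the separators in the result.
  return text.split(/(\s+)/).filter((token) => token.length > 0);
}

console.log(tokenize("I had awesome breakfast today.", "word"));
console.log(tokenize("本当に凄い", "char"));
```

`Array.from` is used instead of `text.split("")` so that characters outside the Basic Multilingual Plane (some kanji, emoji) are not cut in half. A real per-language word splitter for Japanese would need more than this, so character-level splitting is just the simple fallback.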

I changed a few lines of code, which split the strings by character. Having the option to split by words or character should be enough to cover most languages, I think.
Should I submit another pull request?

Sure, that sounds perfect.

I made my own version here, so I might pick yours or mine, or try to merge them later on.

So one of the reasons I wanted to make my own version is that I wanted to learn how it works myself, and another is just that I wanted to make a video about it.

Anyway, I think it's good to have multiple solutions available to choose from :)

Solved this in my last video. I'll close this issue for now. https://youtu.be/4SP_AY7GGxw

Also thanks for your code, @justafrank. I got a lot of inspiration from your code for that video.