Zirak/SO-ChatBot

Handling unformatted code

Closed this issue · 44 comments

Opening this up for discussion because I don't think it would be unreasonable for Cap to detect unformatted code in the chats and:
a) warn the user
b) migrate it to the bin and warn the user
c) migrate to the bin and mock the user
d) migrate to the bin and post a formatted version on behalf of the user

There should also be a throttle on it, because even regulars will often post, quickly edit, hit Ctrl+K, and send again. Maybe 10 seconds?

These are all just random ideas. What does everyone think?

If you find a reliable way to detect unformatted code, I'm in favor of trashing it and warning the user. We can do the mocking ourselves.

I agree with @honnza. I don't have many ideas for how you'd reliably detect unformatted code, though.

towc commented

Maybe a command that room owners could use to format other people's code automatically? Now that we have the clipboard accessible, we could add an extension that lets everyone copy a message's ID by clicking a button on that message. ROs can then call the command and paste the ID without much hassle, and the bot will do the rest. This way there's no need for automatic recognition, which could fail.

{
howto: Letter frequency, especially special chars vs. alnum

action: warn (actually, notify and teach about Ctrl+K); later, if the message is still unformatted, bin it
}
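The letter-frequency idea could look something like the sketch below. It is only an illustration: the set of "special" characters and the 0.25 cutoff are assumptions picked for the example, not values anyone here measured against the transcript.

```javascript
// Illustrative heuristic: compare how many "special" punctuation characters a
// message has relative to letters and digits. SPECIAL_CHARS and the 0.25
// threshold are assumptions, not values measured in this thread.
const SPECIAL_CHARS = new Set('{}()[];=<>/\\"\'.,:+-*&|!$'.split(''));

// Ratio of special characters to (special + alphanumeric) characters;
// whitespace and any other characters are ignored.
function specialCharRatio(text) {
  let special = 0;
  let alnum = 0;
  for (const ch of text) {
    if (/[A-Za-z0-9]/.test(ch)) {
      alnum++;
    } else if (SPECIAL_CHARS.has(ch)) {
      special++;
    }
  }
  const counted = special + alnum;
  return counted === 0 ? 0 : special / counted;
}

function looksLikeCodeByFrequency(text, threshold = 0.25) {
  return specialCharRatio(text) >= threshold;
}
```

For example, `var x = foo(1);` scores 4 special characters out of 12 counted, while an ordinary sentence scores far lower. Punctuation-heavy prose can still trip it, which fits the "warn first, bin later" action.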

Come up with specific rules for what defines "unformatted code", and it's possible to convert the rules into a regex. After a short chat conversation, it was suggested that "Both { and } occurring in multiline messages means the message needs work." Please suggest more criteria.
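As a sketch of how that rule could become a single regex: each lookahead below checks for one ingredient (a newline, an opening brace, a closing brace) anywhere in the message. The name `needsWork` is made up for the example.

```javascript
// Single-regex form of the rule "both { and } occurring in a multiline
// message means the message needs work". [\s\S] also matches newlines, so
// each lookahead scans the whole message from the start.
const NEEDS_WORK = /^(?=[\s\S]*\n)(?=[\s\S]*\{)(?=[\s\S]*\})/;

function needsWork(message) {
  return NEEDS_WORK.test(message);
}
```

Further criteria can be added as extra lookaheads, or kept as separate tests if the combined regex gets hard to read.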

ralt commented

I know that PHP_CodeSniffer detects code in comments, so it's definitely possible.

Right now I don't think we need a formal solution; this is meant to be a discussion about whether or not this feature would be useful. As a group, we can discuss the criteria for message migration _after_ we decide if it is useful or wanted.

Can CodeSniffer be reverse-engineered?

ralt commented

I'd just eval and check for syntax errors...
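A safer variant of "just eval it" is to compile the text without running it. In JavaScript, `new Function(body)` parses the body and throws a `SyntaxError` if it isn't valid, but never calls it. This is a sketch of that idea, not something the bot currently does.

```javascript
// Sketch of the "eval and check for syntax errors" idea. `new Function(body)`
// compiles the text as a function body and throws a SyntaxError if it does
// not parse, but the function is never called, so the message is not executed.
function parsesAsJavaScript(text) {
  try {
    new Function(text);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) {
      return false;
    }
    throw err; // anything else (e.g. a CSP blocking Function) is unexpected
  }
}
```

It shares the weaknesses raised below: CSS such as `body { color: red; }` and newbie code with syntax errors fail to parse, and a single word like `hello` parses fine as an identifier, so it would need another signal alongside it.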

"definitely" for "useful"

@ralt how would you eval css? :P

towc commented

@awalgarg this is the JS room; nobody who doesn't know what he's doing should post CSS

I support this idea; it's not like any harm could come from helpful guidance on formatting code. Feel free to request ownership of my code dump room if you'd like to help moderate it.

Are we sure we want to leave JavaScript code with syntax errors unformatted? And what about code in other languages? -1 on eval.

Mhmm, what about code from newbies that has syntax errors they need help fixing? @towc

towc commented

@rlemon I am totally for it. Just one addition: have the bot remind the user how to format code

better yet, redirect them to jsfiddle

towc commented

@awalgarg that's why I suggest having ROs do the validations. Not to automate the process, just to make it a lot easier for them

In fact, all we need is a bot command to trash chat posts

This is really a very broad topic. Can we break it down into smaller issues?

  • for a start, detect JavaScript-only code, which should be binned and posted on jsfiddle instead?

My suggestion is to start with a bin command. Auto-trigger can come next.
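A manual bin command could be as small as the sketch below. Everything here is hypothetical: `handleBinCommand`, the `user.isRoomOwner` flag and the `moveMessages` callback are placeholders, not SO-ChatBot's real command API; the target room name is the "Trash Can" mentioned later in this thread.

```javascript
// Hypothetical sketch of a manual "bin" command: a room owner passes one or
// more message IDs, and the bot hands them to a mover function. The
// moveMessages callback is a placeholder for whatever actually moves messages.
function handleBinCommand(args, user, moveMessages) {
  if (!user.isRoomOwner) {
    return 'Only room owners can bin messages.';
  }
  const ids = args.trim().split(/\s+/).filter(Boolean);
  if (ids.length === 0) {
    return 'Usage: bin <messageId> [messageId ...]';
  }
  const invalid = ids.filter((id) => !/^\d+$/.test(id));
  if (invalid.length > 0) {
    return 'Not message IDs: ' + invalid.join(', ');
  }
  moveMessages(ids.map(Number), 'Trash Can');
  return 'Binned ' + ids.length + ' message(s).';
}
```

Keeping the permission check and ID validation in the command means an auto-trigger added later could reuse the same mover without going through chat.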

@awalgarg this entire discussion is about whether it is a good idea, not the implementation of said idea. Everyone just took it a step further.

It is a good idea. I doubt anyone disagrees with that.

Yeah, as honnza said, we all agree it is a good idea since we have a known problem which needs a solution. Being programmers, we naturally started looking for an implementation ;)

I'm mostly waiting for the sleepy heads to wake up and chime in. Zirak, otherBotRunners, etc.

Detection sounds simple for 90% of cases.

Detect `$("`, `function()`, `<div` or `.controller("` and trash those to a "please post formatted code" room. I can write a simple ML model that'd detect code more reliably, or we can use an existing library, but honestly that's super overkill.
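The marker-based approach can be sketched like this. The patterns mirror the examples above (slightly loosened, e.g. accepting `$('` as well as `$("`); the list is an assumption and would need tuning against real transcripts.

```javascript
// Sketch of the "90% of cases" idea: flag a message if it contains any of a
// few tell-tale JavaScript/HTML fragments.
const CODE_MARKERS = [
  /\$\(["']/,          // jQuery selector call: $(" or $('
  /function\s*\(/,     // anonymous functions: function() / function (a, b)
  /<div\b/i,           // HTML markup
  /\.controller\(["']/ // AngularJS controller registration
];

function containsCodeMarker(text) {
  return CODE_MARKERS.some((re) => re.test(text));
}
```

Named declarations like `function foo(` slip through this list, which is the kind of trade-off that makes it a 90% solution rather than a complete one.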

Zirak commented

Sounds good. The thought crossed my mind a few times, but the genie always told me detecting 100% of cases was too difficult.

90% is good enough. I'll dig through the transcript and try to come up with something.

Zirak commented

After some fooling around, here's what I came up with:

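function isUnformattedCode (text) {
    var lines = text.split('\n');
    // Messages shorter than four lines are never flagged
    if (lines.length < 4) {
        return false;
    }

    // A "codey" line ends in a closing brace or starts with a closing HTML tag
    // (/\}$/ already covers a line that is just "}")
    var codeyLine = /\}$|^<\//;
    return lines.some(function (line) {
        return codeyLine.test(line);
    });
}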

Searched the transcript for !!format, ran that against all messages in the time block, and saw that it agreed with them and caught some more. Most importantly, miraculously it has yet to give me a false positive when tested against today's and yesterday's chat history.
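The transcript check described above can be turned into a small repeatable harness. This is a hypothetical sketch: it assumes the messages have already been exported as `{ id, text, flagged }` objects, where `flagged` means someone replied with !!format.

```javascript
// Hypothetical harness: compare a detector against messages that someone
// already flagged (e.g. with !!format) in the same time block. "extra" holds
// detections nobody flagged; each still needs a manual look to decide whether
// it is a genuine catch or a false positive.
function compareWithFlags(messages, detector) {
  const result = { agreed: 0, extra: [], missed: [] };
  for (const msg of messages) {
    const detected = detector(msg.text);
    if (detected && msg.flagged) {
      result.agreed++;
    } else if (detected) {
      result.extra.push(msg.id);
    } else if (msg.flagged) {
      result.missed.push(msg.id);
    }
  }
  return result;
}
```

Rerunning it on a fresh day of transcript after each regex tweak would show whether "yet to provide a false positive" still holds.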

Methinks the algo should look something like this:

  • Ignore if user is an owner/mod
  • Bin and teach <2k users
  • Teach >=2k users
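The three rules above map onto a small decision function. The 2k threshold is from the proposal; the shape of the user object (`isOwner`, `isMod`, `reputation`) and the action names are assumptions for illustration.

```javascript
// Sketch of the proposed policy. REP_THRESHOLD comes from the proposal; the
// user object's fields and the returned action names are illustrative only.
const REP_THRESHOLD = 2000;

function chooseAction(user) {
  if (user.isOwner || user.isMod) {
    return 'ignore';
  }
  if (user.reputation < REP_THRESHOLD) {
    return 'bin-and-teach';
  }
  return 'teach';
}
```

Keeping the threshold in one constant means the 1k-versus-2k question only changes a single number.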

By "teach" I mean a message like "Please don't post unformatted code - use Ctrl+K before sending (hit up to edit messages). See the FAQ [faq link]".

If the user sent a long message (>10 lines), it'll also have a "or use a paste service like [links]".
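A sketch of building the "teach" reply, using the wording above. `FAQ_LINK` and `PASTE_LINKS` are placeholders, since the actual links weren't specified.

```javascript
// Sketch of the "teach" reply. FAQ_LINK and PASTE_LINKS are placeholders for
// links that were not specified in the proposal.
const FAQ_LINK = '[faq link]';
const PASTE_LINKS = '[links]';
const LONG_MESSAGE_LINES = 10;

function teachMessage(text) {
  let reply = "Please don't post unformatted code - use Ctrl+K before sending " +
    '(hit up to edit messages)';
  // Long messages (>10 lines) also get pointed at a paste service
  if (text.split('\n').length > LONG_MESSAGE_LINES) {
    reply += ', or use a paste service like ' + PASTE_LINKS;
  }
  return reply + '. See the FAQ ' + FAQ_LINK;
}
```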

Thoughts?

Neat. The rules sound good to me. When will the maid be implementing this?
On May 27, 2015 12:37 AM, "Zirak" notifications@github.com wrote:


I know I'm late but I'd just like to chime in with my opinion:

Don't do the bin/teach cutoff at 2K, that's ridiculous. It needs to be much, much lower; I have ~1.3K rep and I'm a very knowledgeable person.

I had typed out why we shouldn't do this at all (seriously guys, binning unformatted code automatically? Why do we even need room owners these days? Just make Caprica automatically kick people too), but I'm going to let it go and suggest a sensible "smart user" rep level.

Zirak commented

@AmaanC I'll be home this weekend, will try and take a stab at it.

@Jhawins

Don't do the bin/teach cutoff at 2K

Not set in stone, we can take it back to 1k (which is also /welcome's lower threshold), but most regulars do have more than 2k.

why do we even need room owners these [if we have features like these]

"That's a room owner's job" isn't a reason to not implement this. Lacking this task, you won't find our room owners bored; we're room owners, not people who hunt down unformatted messages and lecture users on the basic etiquette of chat.

Binning unformatted messages and correcting people is one of the menial things you have to do to maintain a normal conversation. Why not automate it? It's a mechanical process, there's nearly no thought behind it, it's repetitive, and it's annoying. I don't do it as much as I used to because of these reasons.

just make Caprica automatically kick people too

I'd love to. Boy oh boy would I love to. Imagine not having to deal with help vampires. Imagine not having to deal with spammers or bigots. Wouldn't it be great? Wouldn't it be awesome if some automatic process took care of the mindless things, and left the more serious stuff to us?


SO Magic!

I appreciate you replying to everything but you know I have no defense lol.

Rep level still needs adjusting, though. We are a "rep != knowledge" community, so 2K rep is not a decent "teachable user" level.
On May 26, 2015 4:48 PM, "Zirak" notifications@github.com wrote:


Zirak commented

@awalgarg Sadly (AFAICT) that's server-side SO magic.

@Jhawins

Rep level still needs adjusting, though.

Sure, what do you think will be better? 1k as in /welcome?

I say remove the rep limit fully and implement a throttle. I have X seconds to edit the message and format it before the bot bitches at me.
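The "X seconds to edit" throttle can be sketched as a delayed re-check: wait out the grace period, then look at the message's current text, so a quick Ctrl+K edit cancels the warning. `getCurrentText`, `isUnformatted` and `onStillUnformatted` are hypothetical hooks, not SO-ChatBot APIs; the 10-second default matches the timeout mentioned elsewhere in this thread.

```javascript
// Sketch of a grace period: instead of reacting immediately, wait graceMs and
// re-check the message's *current* text. The three hooks are hypothetical.
function scheduleFormatCheck(message, options) {
  const { graceMs = 10000, getCurrentText, isUnformatted, onStillUnformatted } = options;
  if (!isUnformatted(message.text)) {
    return null; // nothing to watch
  }
  return setTimeout(() => {
    const latest = getCurrentText(message.id);
    if (isUnformatted(latest)) {
      onStillUnformatted(message);
    }
  }, graceMs);
}
```

Re-reading the message at the end of the window, rather than tracking every edit, keeps the logic simple: only the final state after the grace period matters.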

Zirak commented

That'll be in there anyway.

Maybe too heavy or unsupported, but might be relevant: https://github.com/tj/node-language-classifier
It uses the deprecated classifier internally.
I didn't check how it handles unsupported languages.

+1 for Zirak

Zirak commented

@gtomitsuka That seems to assume the input is a programming language, when we want to determine whether it is one. Obviously the dumb regexp above won't match a slew of languages, but it seems to get that 90%.

Zirak commented

Timeout is 10 seconds, messages will be binned to Trash Can, and the rep threshold is 2k (for lack of a better suggestion).

@Zirak, the limit should be three lines and not four. Sorry, I don't feel like this needs a new issue (considering how fresh the feature is). Re-open if you agree; otherwise I'll start a new issue.

(otherwise, good work, you made us proud, etc)
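To make the three-versus-four question easy to try out, the detector can take the line threshold as a parameter. This sketch keeps the regex from the snippet above; the `minLines` parameter is an addition for illustration.

```javascript
// The same detector with the line threshold exposed as a parameter. The regex
// is unchanged: a line ending in "}" or starting with "</" counts as codey.
function isUnformattedCode(text, minLines = 4) {
  const lines = text.split('\n');
  if (lines.length < minLines) {
    return false;
  }
  const codeyLine = /\}$|^<\//;
  return lines.some((line) => codeyLine.test(line));
}
```

With the default of 4, a three-line block like `if (x) {` / `  y();` / `}` is never flagged; passing 3 catches it.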

Zirak commented

Let's give this a couple more days and revisit if it gives too many false negatives?

@Zirak this is no longer working. Should we reopen this or start a new issue?

http://chat.stackoverflow.com/transcript/message/24729564#24729564

Bot sees every line as a new message.

http://i.stack.imgur.com/l9BIE.png

Not sure whether that is expected.