General thoughts/suggestions on NLTK content

Opened this issue 9 years ago · 1 comment

A few comments based on experience/reflection from day 1 of ResBaz - feel free to discuss, discard, or modify as appropriate...

When introducing significant whitespace, should we use the concept of a 'code block'? See e.g. http://en.wikipedia.org/wiki/Block_%28programming%29
When talking about lists, the exercise/demo uses a list (sent4 or sent7) that was defined elsewhere and is invisible to the learner. Would it be better to have the user create two new lists from scratch, and then see how they can be joined and manipulated?
Some examples use Python's print and others just type the variable name and have the interpreter display the contents. We should be consistent and (I suggest) always use print(variable): this way learners get used to the idea of using lots of print statements to look inside their variables, a useful debugging skill.
Defining variables, challenge 1: some people tried to solve the more generic problem and write a function that would recognise a ; in (any) array and store/print everything before it, but were stumped because they didn't yet have the skills/practice. Perhaps reword this example so it's clear we 'know' what is in the array and you want us to count and slice given that known content.
Similar to the last point: in the first challenge that follows, some folks tried to write a general function that would take four corpora as input and compare them (which proved difficult with their current practice/skills/knowledge), but I think the aim is just to write one function that takes one input text and returns the 15 most common words.
Occasionally an example in the explanation/notes is one that you wouldn't want the user to actually type because the output is too large, e.g. sorted(whole_corpora) or array[8:]. Check that all examples are "runnable" if the user tries them on the loaded texts.
The Python construct [len(w) for w in text] etc.: I think it might be better to write these out longform, especially as learners haven't been introduced to for loops yet.
The challenge to write code that finds all the words in a text that are more than seven letters long and occur more than seven times requires an 'and' conditional, which wasn't introduced earlier...
Variable names: sometimes we use w for "word" and sometimes we use word. We should be consistent; perhaps the longer form is easier for learners to follow than w.
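Reworded along those lines, the first two challenges might ask for something like the minimal sketch below. The sample sentence, the sample text and the function name most_common_words are hypothetical, chosen only for illustration. It uses collections.Counter from the standard library so it runs without the NLTK data downloads; in the lesson, NLTK's FreqDist (which also has a most_common method) would play the same role.

```python
from collections import Counter

# Challenge 1, reworded: the contents of the list are known in advance,
# so the learner only needs to count positions and slice.
# This sentence is a made-up example, not one of the loaded texts.
sentence = ["The", "cat", "sat", "on", "the", "mat", ";", "it", "was", "warm", "."]

# Counting from 0, the ";" is at position 6, so everything before it is sentence[:6].
before_semicolon = sentence[:6]
print(before_semicolon)  # ['The', 'cat', 'sat', 'on', 'the', 'mat']

# The position can be double-checked with list.index.
print(sentence.index(";"))  # 6


# Challenge 2, reworded: one function, one input text, the 15 most common words.
# most_common_words is a hypothetical name for this sketch.
def most_common_words(text, n=15):
    """Return the n most frequent tokens in text as (word, count) pairs."""
    counts = Counter(text)
    return counts.most_common(n)


sample_text = "the cat sat on the mat and the dog sat on the rug".split()
top_words = most_common_words(sample_text)
# Print only the first few results so the output stays short.
print(top_words[:3])  # [('the', 4), ('sat', 2), ('on', 2)]
```

Slicing the result before printing (top_words[:3]) is one way to keep an example "runnable" without flooding the screen when it is tried on a full text.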

Answer 1 · 2015-02-17T02:42:12.000Z

On 16 February 2015 at 20:27, Cameron McLean <notifications@github.com> wrote:

> When talking about lists, the exercise/demo uses a list (sent4 or sent7) that was defined elsewhere and is invisible to the learner. Would it be better to have the user create two new lists from scratch, and then see how they can be joined and manipulated?

I think we should repeat the lines declaring the variables, but commented out, as a reminder that they were declared previously.
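A minimal sketch combining both ideas: a commented-out reminder of an earlier declaration, followed by two small lists the learner builds from scratch and then joins and manipulates. The list contents are made up for illustration, and the "[...]" in the comment is a placeholder, not the actual value of sent4.

```python
# Reminder of an earlier declaration, commented out so it is not re-run:
# sent4 = [...]  (declared earlier, when the texts were loaded)

# Two new lists created from scratch, so the learner can see every value.
first_list = ["Monty", "Python"]
second_list = ["and", "the", "Holy", "Grail"]

# Join them with + to make a new, longer list.
title = first_list + second_list
print(title)  # ['Monty', 'Python', 'and', 'the', 'Holy', 'Grail']

# Add one more item to the end of the joined list.
title.append("!")
print(title)

# Index, slice and measure the joined list.
print(title[0])    # Monty
print(title[2:4])  # ['and', 'the']
print(len(title))  # 7
```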

> Some examples use Python's print and others just type the variable name and have the interpreter display the contents. We should be consistent and (I suggest) always use print(variable): this way learners get used to the idea of using lots of print statements to look inside their variables, a useful debugging skill.

You are right. I was removing the print statements in the day-1 material because I am a lazy programmer, but it's probably better to put them in, as you say.
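A small illustration of why print is the more transferable habit, assuming a made-up list of words: typing a bare name only echoes its value at the interactive prompt, whereas print shows it wherever the code runs, including in a saved script.

```python
words = ["Call", "me", "Ishmael", "."]

# At the interactive prompt this line displays the list,
# but in a saved script a bare expression like this shows nothing.
words

# print shows the value wherever the code runs, which makes it
# the habit to build for looking inside variables while debugging.
print(words)
for word in words:
    print(word, len(word))
```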

> The Python construct [len(w) for w in text] etc.: I think it might be better to write these out longform, especially as learners haven't been introduced to for loops yet.

That's fair. I suggested to Daniel that the most idiomatic Python be used, but that isn't always the easiest to understand.
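For comparison, a sketch of the idiomatic one-liner next to a longform version that spells out the loop and uses the more descriptive name word instead of w. The short list text is a made-up stand-in for one of the loaded texts.

```python
text = ["Call", "me", "Ishmael", "."]

# Idiomatic Python: a list comprehension.
lengths_short = [len(w) for w in text]

# The same thing written out longform with an explicit for loop,
# using "word" rather than "w" as the loop variable.
lengths_long = []
for word in text:
    lengths_long.append(len(word))

print(lengths_short)  # [4, 2, 7, 1]
print(lengths_long)   # [4, 2, 7, 1]
```

Both produce the same list; the longform version is more lines but each step is visible, which may suit learners who haven't met for loops yet.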

> The challenge to write code that finds all the words in a text that are more than seven letters long and occur more than seven times requires an 'and' conditional, which wasn't introduced earlier...

Frame it as the intro to the conditional?
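If the challenge is reframed as the introduction to conditionals, a longform version might look like the sketch below. It also doubles as an example of a code block: the lines under for and if belong to those statements because of their indentation. The sample text is made up (built by repetition so some words cross the "more than seven times" threshold), and collections.Counter stands in for the word counts that NLTK's FreqDist would provide in the lesson.

```python
from collections import Counter

# A made-up sample text; in the lesson this would be one of the loaded texts.
text = ["wonderful"] * 8 + ["adventure"] * 3 + ["the"] * 20

# Count how many times each word occurs.
counts = Counter(text)

long_frequent_words = []
for word in sorted(counts):
    # "and" means both conditions must be true for the word to be kept.
    if len(word) > 7 and counts[word] > 7:
        long_frequent_words.append(word)

print(long_frequent_words)  # ['wonderful']
```

Here "adventure" is long enough but too rare, and "the" is frequent but too short, so only "wonderful" passes both tests.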

> Variable names: sometimes we use w for "word" and sometimes we use word. We should be consistent; perhaps the longer form is easier for learners to follow than w.

Agreed.

cheers
L.