louis1001/c---

Basic features for a basic language

louis1001 opened this issue · 25 comments

What I think the language needs

  • Tag for start of program
  • Tag for end of program
  • Literal values (just numbers, strings and bool?)
  • Words for bools
  • Variable declaration (something like "let's say a is an int with value 10", but in aussie).
  • Operations +-*/ (could be symbols, maybe some words like "is this bigger?", but aussie)
  • Branching statements. If, else.
  • looping (while, for)
  • Early exit ("nevermind" or something).
  • Print statement

thinking control blocks should be using GT and LT < > little fucking boomerangs

Start of program tag - CARN
End of program tag - HOOROO
Try/catch - AAAH <> SHELLBERIGHT<>
IF else branching - YEHNAH<>NAHYEH<>
(Close alternative TIM<>TAM<> )

While loop - WALKABOUT( NOTBAD ) < Code >

variable assignment - SHAZZA x ISA “string”

  • BAZZA x ISA TRUEBLUE / FURPHY
  • DAZZA x ISA 10.0

early exit - FUCKINPIKER

General thoughts

(Note: this is pretty close to just a stream of consciousness; I have certainly played fast and loose with proper syntax-defining conventions; sorry about that.)

I think the language should be:

  • case-insensitive, and should ignore many kinds of punctuation, like dashes in places where they don't mean minus.
    This will allow Australian-English-like expressiveness and open up the possibility of programs that sound like things people might actually say.
  • dynamically typed, except where types are made explicit by the developer.
    see below
  • no semicolons. New line = new line.

I don't think there should be a Print() function as such. Instead, I think that any logical segment of code that evaluates to a literal value should cause the output of that value. Therefore the Hello World program, in full consists of

"G'day Mate"

(If this isn't viable, the print statement might be "Give us")

Any output line that does not end with the string "Mate" (prior to punctuation) has a 1 in 20 chance of having the suffix ", Mate" added (before the punctuation). Thus 1+1 will usually print 2 on the screen, but will sometimes print 2, Mate.
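A minimal Go sketch of the 1-in-20 ", Mate" rule, assuming Go as the implementation language (it is suggested later in the thread). `maybeAddMate` and its `roll` hook are hypothetical names; the roll is injected so the behaviour can be tested deterministically, and "ends with Mate" is checked case-insensitively on the last word, since the language is meant to be case-insensitive.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
	"unicode"
)

// maybeAddMate inserts ", Mate" before any trailing punctuation with probability
// 1 in 20, unless the text (ignoring that punctuation) already ends with "Mate".
// roll is a hypothetical hook that must return an int in [0, 20).
func maybeAddMate(line string, roll func() int) string {
	// Split off trailing punctuation such as "!" or "?!".
	body := strings.TrimRightFunc(line, unicode.IsPunct)
	tail := line[len(body):]
	lower := strings.ToLower(body)
	if lower == "mate" || strings.HasSuffix(lower, " mate") {
		return line // already ends with Mate: leave it alone
	}
	if roll() != 0 { // 19 times out of 20 nothing happens
		return line
	}
	return body + ", Mate" + tail
}

func main() {
	r := rand.New(rand.NewSource(1))
	for i := 0; i < 5; i++ {
		fmt.Println(maybeAddMate("2", func() int { return r.Intn(20) }))
	}
	fmt.Println(maybeAddMate("have a drink!", func() int { return 0 }))
}
```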

Question: should we use indenting like Python or braces to scope?
I've used spacing in my conditionals below.

Literal values

Here's a crazy idea: what if any parts of the code that are not
(a) inside quotes (always a string literal)
(b) pulled out as defined keywords by the parser, or
(c) pulled out as variable names defined by the developer
are considered to be literals?
Then, if the value parses as an int or float or bool it is that, otherwise it is a string. (Hopefully avoiding JavaScript-ish oddness)
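A rough Go sketch of the "int, then float, then bool, else string" rule (the same order the Inputs section later reuses for typed-in values). The regular expressions and the `classifyLiteral` name are assumptions; regexps are used instead of `strconv.ParseFloat` alone because `ParseFloat` happily accepts words like "NaN" and "Inf", which is exactly the kind of JavaScript-ish oddness to avoid. Only single-word boolean keywords are handled here.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Hypothetical patterns: optional minus, then digits (int) or digits with a dot (float).
var (
	intRe   = regexp.MustCompile(`^-?[0-9]+$`)
	floatRe = regexp.MustCompile(`^-?([0-9]+\.[0-9]*|\.[0-9]+)$`)
)

// Single-word boolean keywords from the Booleans section; multi-word ones such as
// "fair dinkum" would need to be recognised by the tokenizer first.
var boolWords = map[string]bool{"yeah": true, "deffo": true, "nah": false, "rooted": false, "wrong": false}

// classifyLiteral tries int, then float, then bool, and otherwise keeps the text as a string.
func classifyLiteral(tok string) any {
	s := strings.TrimSpace(tok)
	if intRe.MatchString(s) {
		if n, err := strconv.ParseInt(s, 10, 64); err == nil {
			return n
		}
		// Too big for int64: a real implementation might reach for a bigint here.
	}
	if floatRe.MatchString(s) {
		if f, err := strconv.ParseFloat(s, 64); err == nil {
			return f
		}
	}
	if b, ok := boolWords[strings.ToLower(s)]; ok {
		return b
	}
	return s
}

func main() {
	for _, tok := range []string{"10", "-2.5", "Yeah", "NaN", "G'day"} {
		v := classifyLiteral(tok)
		fmt.Printf("%q -> %T %v\n", tok, v, v)
	}
}
```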

Note: The hello world program above could therefore have been written without quotes, BUT the keyword MATE would have been ignored:

G'day Mate
>> G'day Mate

(though it might at random append ", Mate" anyway...)
Update: "mate" will be reflected in output anyway, like punctuation.

Tag for start of program

BLIMEY

Tag for end of program

BAIL

(but is this necessary as such? How about End Of File = end of program, but BAIL means exit/return?)

assignment

(STRUTH) I RECON

e.g. I RECON x = 10

Variable declaration

(something like "let's say x is an int with value 10", but in aussie).
[assignment operator] YA WANT

I RECON YA WANT x that can count stuff
and it's 10

(see below on "and" and "it's")

Types

usually dynamically typed, but the developer can specify with:

[assignment] YA WANT [variable name] [type specifier]

Where type specifier is one of these:

keywords                         type
Thingy                           object
that can count stuff             int
that can count heaps of stuff    bigint
that can measure stuff           float
that can say stuff               string
that can check out stuff         bool
that is an Esky                  List

i.e.

I RECON YA WANT x that can count stuff

means assign the type int to variable x, a.k.a. create variable x of type int.

Esky's are special.

esky literals can be implied by [].

I recon MyBeers=["XXXX", "Fosters", "VB", "Tooheys", "Coopers"]

populates MyBeers with a list of the specified strings (possibly creating the variable MyBeers if needed)

I Recon ya want foo that is an esky that can count stuff

creates foo to be a list holding type int

I Recon ya want bar that is an esky

creates bar to be a list of untyped variables. I think each one could be potentially of different types...

Note that an element of an esky could easily be another esky, making multidimensional array-like structures.

Booleans

Any of these keywords should evaluate to logical TRUE: YEAH, for sure, fair dinkum, deffo, RIDGY DIDGE
Any of these keywords should evaluate to logical FALSE: NAH, rooted, wrong

Any series of boolean constants in a row with no operations means IGNORE ALL BUT THE LAST ONE:
Yeah, Nah = FALSE
for sure rooted = FALSE
Nah, Fair Dinkum = TRUE

NOT - operator that means NOT. Important to allow NOT WRONG = TRUE
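A small Go sketch of the "last boolean wins" rule together with NOT. It assumes NOT flips only the keyword straight after it, that commas are ignored, and that two-word keywords ("fair dinkum", "for sure", "ridgy didge") are looked up before single words; `evalBoolRun` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Boolean keywords from the spec; two-word phrases are looked up as one key.
var boolPhrases = map[string]bool{
	"yeah": true, "for sure": true, "fair dinkum": true, "deffo": true, "ridgy didge": true,
	"nah": false, "rooted": false, "wrong": false,
}

// evalBoolRun evaluates a run of boolean keywords with no operators between them:
// only the last one counts, and a NOT flips the keyword straight after it.
func evalBoolRun(text string) (bool, error) {
	// Case-insensitive, and commas are ignored, so "Yeah, Nah" reads as "yeah nah".
	words := strings.Fields(strings.ToLower(strings.ReplaceAll(text, ",", " ")))
	result, seen, negate := false, false, false
	for i := 0; i < len(words); i++ {
		if words[i] == "not" {
			negate = !negate
			continue
		}
		v, ok := false, false
		// Try a two-word phrase ("fair dinkum") before a single word.
		if i+1 < len(words) {
			if v, ok = boolPhrases[words[i]+" "+words[i+1]]; ok {
				i++
			}
		}
		if !ok {
			if v, ok = boolPhrases[words[i]]; !ok {
				return false, fmt.Errorf("not a boolean keyword: %q", words[i])
			}
		}
		result, seen, negate = v != negate, true, false // v != negate is v XOR negate
	}
	if !seen {
		return false, errors.New("no boolean keywords found")
	}
	return result, nil
}

func main() {
	fmt.Println(evalBoolRun("Nah, Fair Dinkum"))
	fmt.Println(evalBoolRun("NOT WRONG"))
}
```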

Note: AND at the (effective) start of a line has a different meaning to the infix operator AND; see below. Otherwise AND, OR should work as expected.

Special keywords and synonyms

Crikey! is the comment starter. If it is inline, then the rest of the line is ignored. If it appears on a line by itself, then the following lines are ignored until the end comment keyword 'KN'OATH! is encountered.
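A minimal Go sketch of the Crikey! / 'KN'OATH! comment rules, run as a pre-pass over source lines. Assumptions (not in the spec): the line containing 'KN'OATH! is dropped as well, the leading apostrophe on 'KN'OATH! is optional, and string literals are not considered, so a quoted "Crikey!" would be cut too. `stripComments` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// stripComments removes Crikey! comments from source lines (case-insensitively).
// A line that is just "Crikey!" opens a block comment; lines are skipped until one
// containing 'KN'OATH!. Otherwise "Crikey!" and the rest of its line are dropped.
// Assumes ASCII source, so byte offsets in the lowercased copy match the original.
func stripComments(lines []string) []string {
	var out []string
	inBlock := false
	for _, line := range lines {
		lower := strings.ToLower(line)
		if inBlock {
			// Match "kn'oath!" so the leading apostrophe is optional.
			if strings.Contains(lower, "kn'oath!") {
				inBlock = false
			}
			continue
		}
		if strings.TrimSpace(lower) == "crikey!" {
			inBlock = true
			continue
		}
		if i := strings.Index(lower, "crikey!"); i >= 0 {
			line = strings.TrimRight(line[:i], " \t")
		}
		out = append(out, line)
	}
	return out
}

func main() {
	src := []string{
		"I recon x = 1   Crikey! set x",
		"Crikey!",
		"none of this runs",
		"'KN'OATH!",
		"x",
	}
	for _, line := range stripComments(src) {
		fmt.Println(line)
	}
}
```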

  • Hey - always ignore this keyword
  • Mate - always ignore this keyword
  • Oh (pronounced as a cross between "aww" "arr" as an American might say it) - ignore this keyword

eg: "Oh, mate - hey I recon x is 5"
parses to the same as "I recon x = 5" (set variable x to value 5)

  • IT'S evaluates as IT IS (and then the next two rules are applied)

  • IS evaluates the same as "="

  • IT evaluates the same as the previously referenced variable

  • AND at start of a line = first command from previous line
    thus

I RECON YA WANT x that can count stuff
AND IT'S 5

is the same as

I RECON YA WANT x that can count stuff
I RECON x = 5
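A word-level Go sketch of the AND / IT'S / IS / IT rewrites, reproducing the example above. The `knownCommands` list and `expandLine` are hypothetical, and tracking the "previously referenced variable" is left to the caller. Applied blindly like this, the IS rule would also rewrite the IS in "that is an Esky", so a real parser would need to apply it in context.

```go
package main

import (
	"fmt"
	"strings"
)

// knownCommands is a hypothetical list of multi-word commands a line can start with.
var knownCommands = []string{"i recon", "ya recon", "give us"}

// leadingCommand returns the known command that line starts with (original casing), or "".
func leadingCommand(line string) string {
	lower := strings.ToLower(line)
	for _, c := range knownCommands {
		if strings.HasPrefix(lower, c+" ") {
			return line[:len(c)]
		}
	}
	return ""
}

// expandLine rewrites one line: a leading AND becomes the previous line's command,
// IT'S becomes IT IS, IS becomes "=", and IT becomes lastVar.
func expandLine(prev, line, lastVar string) string {
	var out []string
	for i, w := range strings.Fields(line) {
		switch lw := strings.ToLower(w); {
		case i == 0 && lw == "and":
			// With no recognisable command on the previous line, the AND is just dropped.
			if cmd := leadingCommand(prev); cmd != "" {
				out = append(out, cmd)
			}
		case lw == "it's":
			out = append(out, lastVar, "=")
		case lw == "is":
			out = append(out, "=")
		case lw == "it":
			out = append(out, lastVar)
		default:
			out = append(out, w)
		}
	}
	return strings.Join(out, " ")
}

func main() {
	fmt.Println(expandLine("I RECON YA WANT x that can count stuff", "AND IT'S 5", "x"))
}
```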

Other Special keywords

constants

  • Slab: 24
  • Dozen: 12
  • Fortnight: 14 (days)

special prefix mathematical operations

  • 'Half a [value]' = divide value by 2 (e.g. "Half a dozen" = 6)
  • 'couple of [value]' = double the value (e.g. "Couple of slabs" = 48)
HEY MATE - YA RECON A COUPLE OF DOZEN IS A SLAB?
>> yeah, mate
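A Go sketch of evaluating quantity phrases such as "half a dozen" and "couple of slabs", enough to check the example above. It assumes the base value is the last word (a constant, with a plural "s" allowed, or a plain number), that prefixes apply right to left, and that "a", "an" and "of" are filler. "Couple of" is treated as exactly 2 here, following this section rather than the later joke that it is sometimes 3+. `evalQuantity` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

var constants = map[string]float64{"slab": 24, "dozen": 12, "fortnight": 14}

// evalQuantity evaluates phrases like "half a dozen" or "a couple of slabs".
func evalQuantity(phrase string) (float64, error) {
	words := strings.Fields(strings.ToLower(phrase))
	if len(words) == 0 {
		return 0, errors.New("empty phrase")
	}
	// The last word is the base value: a constant (plural allowed) or a number.
	last := words[len(words)-1]
	v, ok := constants[last]
	if !ok {
		v, ok = constants[strings.TrimSuffix(last, "s")]
	}
	if !ok {
		n, err := strconv.ParseFloat(last, 64)
		if err != nil {
			return 0, fmt.Errorf("unknown quantity %q", last)
		}
		v = n
	}
	// Apply prefix operations right to left, skipping filler words.
	rest := words[:len(words)-1]
	for i := len(rest) - 1; i >= 0; i-- {
		switch rest[i] {
		case "half":
			v /= 2
		case "couple":
			v *= 2
		case "a", "an", "of":
			// filler
		default:
			return 0, fmt.Errorf("unexpected word %q", rest[i])
		}
	}
	return v, nil
}

func main() {
	a, _ := evalQuantity("a couple of dozen")
	b, _ := evalQuantity("a slab")
	fmt.Println(a == b) // the "YA RECON A COUPLE OF DOZEN IS A SLAB?" check
}
```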

Functions

Chuck some Dice (random)

The pseudo-random number generator keyword is chuck some dice.
(seeding of the generator is handled internally by the language)

chuck some dice between [number1] and [number2]
chuck some dice (1,2)

are equivalent, and return a value between the two numbers, inclusive. If both numbers are integers, an integer will be returned. If one or more of the numbers is a float, any float in the range may be returned.

chuck some dice to pick from someEsky
chuck some dice (SomeEsky)

are equivalent, and return one element from the given esky (i.e. list) at random.

(item) is in [Esky]

Does the list contain this item?

Operations +-*/ (could be symbols, maybe some words like "is this bigger?", but aussie)

TODO

Branching statements. If, else.

YA RECON [statement][is]? (a question - if or case/switch; same thing)
Conditionals are all done with the equivalent of a CASE/SWITCH statement, so a boolean IF has Yeah and Nah cases:

STRUTH I RECON x = 1
AND I RECON y = chuck some dice (1,2)
YA RECON x=y?
  Yeah: 
    "got one"
  Nah: 
    "missed one"

A non-boolean case/switch with multiple matches for one code branch, and an 'otherwise'/'default' (Blow Me Down) clause.

I recon Beers=["XXXX", "Fosters", "VB", "Tooheys", "Coopers"]
AND thisBeer = chuck some dice(Beers)    Crikey! chuck a dice on a list chooses a random value from that list

YA RECON thisBeer is?
  "Fosters", "VB":
    "Flamin' hell!"
  "Coopers":
    "You Beauty!"
  Blow me down:
    "Yeah, dunno that one."

looping (while, for)

TODO

Early exit ("nevermind" or something).

(could be Fuck Off)

Exceptions

  • Spit the dummy - throw exception
  • Spit the Dummy about [error message]
  • Throwing an exception results in an output of "Bloody hell! [error message]" and the program exits.

I rate that ^^^

sleep operator should be .winks(40) / Dreamtime(40)

I rate that ^^^

sleep operator should be .winks(40)

If regular languages use syntax closer to:

time.sleep(60)

Then this could also be:

hit the sack for a min

I love the boomerangs for scoping!

Also by @howsitcohendevelopment : PIKER for early exit, use of WALKABOUT and TIM/TAM are bonza, mate! 👍

Update: I have added a random function called chuck some dice, and some comment delimiters - Crikey! and 'KN'OATH! to my spiel above, and fixed a little bit of formatting. I might fill in some of the other TODO sections a bit later today if I get the chance.

Specs part 2

Note: I have partially adopted the boomerangs for scope in this, though inconsistently. Nonetheless, here's what I've got:

strings

concatenation

String concatenation is achieved simply by juxtaposition. Whatever normally ignored stuff (whitespace, punctuation) is between the symbols is put between the concatenated strings.

I recon x is Aussie
I recon y is Oi!
x, x, x! y y y
>> Aussie, Aussie, Aussie! Oi! Oi! Oi!

(Note: Strings entered without quotes are trimmed of whitespace at each end)

If you want to concatenate strings with no spaces or punctuation between them, you need to use the keyword Rabbit-On, optionally with punctuation around it like added hyphens for readability.

I recon colour = "Red"
I recon whereabouts = "back"
I recon spider = bloody colour-rabbit-on-whereabouts

Watch out for the spider!
>> Watch out for the bloody Redback!

(A moment to inspect in detail what the hypothetical parser in my head has done here in the line I recon spider = bloody colour-rabbit-on-whereabouts:
it has:

  1. found the keyword(s) "I recon" and "rabbit-on" and the operator "=" (maybe with regex???)
  2. if there were any matches for numbers (including any negative signs and decimals), it would have tokenized them. In this case there weren't any.
  3. split apart the remaining items on the line at any whitespace or punctuation
    token list so far:
    [keyword: I Recon, punctuation:" ", "spider", punctuation:" ", operator:=, punctuation:" ", "bloody", punctuation:" ", "colour", punctuation:"-", keyword: rabbit-on, punctuation:"-", "whereabouts" ]
  4. searched through the remaining strings for variables defined so far in the code, finding "colour" and "whereabouts"
    [keyword: I Recon, punctuation:" ", "spider", punctuation:" ", operator:=, punctuation:" ", "bloody", punctuation:" ", variable:"colour", punctuation:"-", keyword: rabbit-on, punctuation:"-", variable: "whereabouts" ]
  5. Evaluation consists of
    1. parse keyword "I Recon", which expects the following subsequent tokens: (zero or more punctuation tokens), (new variable name), (zero or more punctuation tokens), (an "=" operator), (zero or more punctuation tokens), the right hand side (RHS).
    2. Evaluate the RHS:
    1. substituting the variable values on the RHS: ["bloody", punctuation:" ", "Red", punctuation:"-", keyword: rabbit-on, punctuation:"-", "back" ]
    2. evaluating keyword "rabbit-on", which concatenates the (string representation of) the items either side, ignoring the adjacent punctuation tokens: ["bloody", punctuation:" ", "Redback" ]
    3. concatenating the remainder: ["bloody Redback" ]
      3. create the variable and assign it
      )
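The walkthrough above can be sketched in Go as a tokenizer plus an RHS evaluator. This is a sketch under assumptions, not a parser design: word characters are letters, digits and apostrophes (so "G'day" stays one word), everything else is grouped into punctuation runs, "rabbit" "-" "on" is folded into a keyword, variable lookup is case-insensitive, and quoted strings and negative numbers are not handled. All names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

type tokKind int

const (
	word tokKind = iota
	punct
	rabbitOn
)

type token struct {
	kind tokKind
	text string
}

func isWordRune(r rune) bool { return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '\'' }

// tokenize splits a line into runs of word characters and runs of everything else,
// then folds "rabbit" "-" "on" into a single rabbit-on keyword token.
func tokenize(s string) []token {
	var toks []token
	runes := []rune(s)
	for i := 0; i < len(runes); {
		w, j := isWordRune(runes[i]), i
		for j < len(runes) && isWordRune(runes[j]) == w {
			j++
		}
		k := punct
		if w {
			k = word
		}
		toks = append(toks, token{k, string(runes[i:j])})
		i = j
	}
	var out []token
	for i := 0; i < len(toks); i++ {
		if i+2 < len(toks) && strings.EqualFold(toks[i].text, "rabbit") &&
			toks[i+1].text == "-" && strings.EqualFold(toks[i+2].text, "on") {
			out = append(out, token{rabbitOn, "rabbit-on"})
			i += 2
			continue
		}
		out = append(out, toks[i])
	}
	return out
}

// evaluate substitutes variables, glues the operands of rabbit-on together (dropping
// the punctuation next to the keyword) and joins everything else, punctuation included.
func evaluate(toks []token, vars map[string]string) string {
	var out []token
	glue := false // true right after rabbit-on: the next word joins the previous one
	for _, t := range toks {
		switch t.kind {
		case rabbitOn:
			for len(out) > 0 && out[len(out)-1].kind == punct {
				out = out[:len(out)-1]
			}
			glue = true
		case punct:
			if !glue {
				out = append(out, t)
			}
		case word:
			if v, ok := vars[strings.ToLower(t.text)]; ok {
				t.text = v
			}
			if glue && len(out) > 0 {
				out[len(out)-1].text += t.text
			} else {
				out = append(out, t)
			}
			glue = false
		}
	}
	var b strings.Builder
	for _, t := range out {
		b.WriteString(t.text)
	}
	return b.String()
}

func main() {
	vars := map[string]string{"colour": "Red", "whereabouts": "back"}
	fmt.Println(evaluate(tokenize("bloody colour-rabbit-on-whereabouts"), vars))
}
```

Plain juxtaposition falls out of the same evaluator: with x = Aussie and y = Oi!, the line `x, x, x! y y y` keeps its commas and spaces and produces the expected chant.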

escaping quotes

A literal quote character is written as !"!.

I said !"!Bloody Hell!!"!
>> I said "Bloody Hell!"
I recon wayTheySayIt is "Americans say !"!Brisbane!"! as !"!Bryz-bane!"! for some reason"
Pronunciation issue: wayTheySayIt.
>> Pronunciation issue: Americans say "Brisbane" as "Bryz-bane" for some reason.

and if you actually wanted to write !"! it would look like !!"!!.
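A Go sketch of the !"! escape. Scanning left to right and replacing each !"! with a quote reproduces all three examples above, including !!"!! decoding to !"!. `readQuoted` (for text inside quotes) and `unescapeBare` (for unquoted text) are hypothetical names. One ambiguity the rule leaves open: a quoted literal ending in "!" whose closing quote is followed directly by another "!" would be read as an escape.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// readQuoted reads a quoted literal starting at src[start] (which must be '"'),
// decoding each !"! as a literal quote. It returns the decoded text and the index
// just past the closing quote.
func readQuoted(src string, start int) (string, int, error) {
	if start >= len(src) || src[start] != '"' {
		return "", 0, errors.New("expected an opening quote")
	}
	var b strings.Builder
	for i := start + 1; i < len(src); {
		if strings.HasPrefix(src[i:], `!"!`) {
			b.WriteByte('"') // escaped quote
			i += 3
			continue
		}
		if src[i] == '"' {
			return b.String(), i + 1, nil // closing quote
		}
		b.WriteByte(src[i]) // byte-wise copy keeps UTF-8 intact
		i++
	}
	return "", 0, errors.New("unterminated string")
}

// unescapeBare decodes !"! in unquoted text, e.g. `I said !"!Bloody Hell!!"!`.
func unescapeBare(s string) string { return strings.ReplaceAll(s, `!"!`, `"`) }

func main() {
	s, _, _ := readQuoted(`"Americans say !"!Brisbane!"! as !"!Bryz-bane!"! for some reason"`, 0)
	fmt.Println(s)
	fmt.Println(unescapeBare(`I said !"!Bloody Hell!!"!`))
}
```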

String constants

Bugger all

The keyword/constant Bugger All is defined as the empty string for convenience.
(An empty pair of quotes "" will also technically work, but that usage is frowned upon.)

(note: if used in a comparison with a number, the empty string evaluates as 0, or 0.0)

Snag

This is the current system's newline string.

"Blimey" snag "What a bludger!"
Output:
Blimey
What a bludger!

looping

for loop: Walkabout

integers

([assignment]) [variable] is a walkabout from [startNumber] (down) to [endNumber]

variable begins at startNumber (usually an int, but it could theoretically be a float... or anything you can increment/decrement and compare, i.e. int or float so far).
variable is incremented by 1 (for "to") or decremented by 1 (for "down to") at the end of each loop, and the loop stops once it is greater than endNumber (for "to") or less than endNumber (for "down to").

I recon x is a walkabout from 7 to 12
<
  x
>

Output:
7
8
9
10
11
12
I recon y is a walkabout from 100 down to 0
<
  do ya recon y is in [0,5,10,25,50,75]?
    Yeah: Hit a watering hole at y - have a drink!
>
Beauty. Variable "y" is now y.

Output:
Hit a watering hole at 75 - have a drink!
Hit a watering hole at 50 - have a drink!
Hit a watering hole at 25 - have a drink!
Hit a watering hole at 10 - have a drink!
Hit a watering hole at 5 - have a drink, Mate!
Hit a watering hole at 0 - have a drink!
Beauty. Variable y is now -1.

(note: I made one of those outputs have the random ", Mate" suffix, just for fun)

{foreach} - traverse an esky (list)

([assignment]) [variable] is a walkabout (backwards) across [Esky]

the variable is set to each item in the list in turn. If "backwards" is used, it starts at the end index and works its way back to the start.
If the esky is empty, the loop executes zero times.
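Both walkabout forms can be sketched in Go with closures standing in for the boomerang block. `walkabout` returns the loop variable's value after the loop, which matches the example's "Variable y is now -1" after counting 100 down to 0; the function names and the closure-based shape are assumptions for illustration.

```go
package main

import "fmt"

// walkabout runs body for v = from, from±1, ... up to (or down to) to, inclusive,
// and returns v's value after the loop (one step past the end).
func walkabout(from, to int, down bool, body func(v int)) int {
	v := from
	if down {
		for ; v >= to; v-- {
			body(v)
		}
	} else {
		for ; v <= to; v++ {
			body(v)
		}
	}
	return v
}

// walkaboutAcross visits each element of an esky, optionally backwards.
// An empty esky runs the body zero times.
func walkaboutAcross[T any](esky []T, backwards bool, body func(item T)) {
	if backwards {
		for i := len(esky) - 1; i >= 0; i-- {
			body(esky[i])
		}
		return
	}
	for _, item := range esky {
		body(item)
	}
}

func main() {
	walkabout(7, 12, false, func(x int) { fmt.Println(x) })
	y := walkabout(100, 0, true, func(int) {})
	fmt.Println("Beauty. Variable y is now", y)
	walkaboutAcross([]string{"VB", "XXXX"}, true, func(b string) { fmt.Println(b) })
}
```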

do/while loop: whack this til the cows come home

Whack this < statements... > til the cows come home or [condition]

I recon x = 5
I recon finished = Nah
Whack this
<
  I recon x = x+0.1
  I recon finished is x {is greater than} 6   Crikey! I haven't defined is greater than keywords yet!
>
Til the cows come home or finished

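After the 11th iteration x will have reached about 6.1, so the statement x {is greater than} 6 will become Yeah (true); that value will be assigned to variable finished, and the loop will exit. (x starts at 5 and goes up by 0.1 per pass, and after 10 passes it is only 6, which is not greater than 6.)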

If the condition was already true at the start, no iterations would happen.
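A Go replay of the example loop, mainly to confirm the iteration count. Following the note that a condition which is already true gives no iterations, this sketch checks `finished` before each pass, so strictly it behaves like Go's `for cond {}` while-loop rather than a classic do/while (which always runs at least once). The function name is hypothetical.

```go
package main

import "fmt"

// whackTheExample replays the example: x starts at 5, goes up by 0.1 per pass,
// and the loop ends once x is greater than 6.
func whackTheExample() (iterations int, x float64) {
	x = 5
	finished := false
	for !finished { // condition checked before each pass
		x += 0.1
		finished = x > 6
		iterations++
	}
	return iterations, x
}

func main() {
	n, x := whackTheExample()
	fmt.Println(n, x)
}
```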

Try/Catch: Av a go (at)/bugger

(Implied exception message variable: the fuck up)

I recon x is a walkabout from -5 to 5
<
  Av a go at
  <
    1 divided by x is 1/x
  >
  Bugger
  <
    ya recon the fuck up is "Division by zero error"?
      Yeah:
        Ya got Buckley's of dividing by zero, ya nong!
      Nah:
        Spit the dummy about the fuck up
  >
>

Output:
1 divided by -5 is -0.2
1 divided by -4 is -0.25
1 divided by -3 is -0.333333333
1 divided by -2 is -0.5
1 divided by -1 is -1
Ya got Buckley's of dividing by zero, ya nong!
1 divided by 1 is 1
1 divided by 2 is 0.5
1 divided by 3 is 0.333333333
1 divided by 4 is 0.25
1 divided by 5 is 0.2
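Go has no try/catch, so this sketch maps Spit the dummy to `panic` with a private `dummy` type, and Av a go / Bugger to a deferred `recover`; re-spitting inside Bugger simply propagates further out. Float division by zero in Go yields ±Inf instead of failing, so the hypothetical `divide` helper spits the dummy explicitly. `main` also shows the uncaught case printing "Bloody hell! [error message]" and exiting, as described in the Exceptions section. Number formatting uses Go's shortest representation, so the thirds print with more digits than in the example output.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// dummy is the value carried by a thrown ("spat") exception.
type dummy struct{ theFuckUp string }

func spitTheDummy(msg string) { panic(dummy{msg}) }

// avAGo runs try; if it spits the dummy, bugger gets the message.
// Any other panic (e.g. a Go runtime error) is re-raised unchanged.
func avAGo(try func(), bugger func(theFuckUp string)) {
	defer func() {
		if r := recover(); r != nil {
			d, ok := r.(dummy)
			if !ok {
				panic(r)
			}
			bugger(d.theFuckUp)
		}
	}()
	try()
}

func divide(a, b float64) float64 {
	if b == 0 {
		spitTheDummy("Division by zero error")
	}
	return a / b
}

// run replays the example loop and collects its output lines.
func run() []string {
	var out []string
	for x := -5; x <= 5; x++ {
		avAGo(func() {
			q := divide(1, float64(x))
			out = append(out, "1 divided by "+strconv.Itoa(x)+" is "+strconv.FormatFloat(q, 'g', -1, 64))
		}, func(theFuckUp string) {
			if theFuckUp == "Division by zero error" {
				out = append(out, "Ya got Buckley's of dividing by zero, ya nong!")
			} else {
				spitTheDummy(theFuckUp)
			}
		})
	}
	return out
}

func main() {
	// Anything that escapes every Av a go ends the program with "Bloody hell! ...".
	defer func() {
		if r := recover(); r != nil {
			if d, ok := r.(dummy); ok {
				fmt.Println("Bloody hell! " + d.theFuckUp)
				os.Exit(1)
			}
			panic(r)
		}
	}()
	for _, line := range run() {
		fmt.Println(line)
	}
}
```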

Hahaha ya nong !!

Couple of other constants

  • Lobster = 20
  • Pineapple = 50
  • Buckleys = 0.001
  • Couple of = mostly returns 2 but sometimes 3+ (obv not a constant, but that’s Aus for ya)

default case of a switch/case could be known as the BRADBURY case

Auto Increment: Up ya
Decrement : Knock it off

tbc

All good suggestions again @howsitcohendevelopment (though I need a little more convincing on the Bradbury I think)

Just realised something about the list of keywords to ignore: best to add them to the list of punctuation, I think, which is just mostly ignored, but still appears in string concatenation

We then could and should also add general English filler words like "the", "a", "also"... and swear words like "fuck(ing)" and, yes, because reddit expects it, "cunt".

A friend has pointed out that any reference to the word "Fucking" should properly be spelled "Fucken" in Australian. Righty-o, let's make it so.

I will also correct the spelling where I wrote "Till" above to "Til" at his suggestion (thanks Daniel)

Glad to see this project shaping up!

Should we begin talking about requirements/specifications like the base language we are going to build this in?

ive always wrote "fuckin",

Yeh it's looking like it's getting there, major thanks to @MarkWhybird,

I'd like to suggest we code it in Rust / Go, I've been wanting to get my hands dirty with either.. zero experience mind you.. I've done some C about 6 years ago

I mostly use C# these days, but I don't mind trying something else... right now I'm just enjoying making up the specs :)

Yeah, this is looking good!

I still have to setup the readme.
Next step, I think, is use some of this specs to write some example programs.

The classics. Hello World, Fibonacci, a truth-machine... whatever you can think of. I'll make that a separate issue if any of you wants to work on that. And if you want to be collaborators, let me know.

The syntax looks fun. It's a little crazy. I'm especially wondering about making keywords consisting of many words, like I recon or til the cows come home.

Maybe we'll have to polish some stuff based on what's possible with our parsing.

I'm down with using Go.
I do love Rust, but for a fun project I feel like the troubles that come with it (compiler warnings everywhere) are a little too much.

I wrote up a whole way of doing both functions and inputs... but I really didn't like it, so I'm starting over. But I have more of an idea what I want to do now, so this second time should be quicker.

This bit is OK though:

Statements and scoping

Any place a statement can be written, it can be replaced with a series of statements in boomerang scope start/end indicators < >. If a returned type is expected, the effective output of the statements should be the desired result.

note: if it is simpler, we might just make all variables in a given program global from the moment of creation? That would make the boomerangs just statement groupers rather than scopers, though. Just a thought.

so, the I RECON variable assignment clause is somewhat more correctly defined as

I RECON {variable name} = {statement}

so I recon x = 1 is identical to

I recon x =
<
  1
>

or identical in result to

I recon x =
<
  I recon y = half a pineapple    Crikey! Ya remember a pineapple is 50, right?
  y - a slab                      Crikey! ...and ya remember a slab is 24, so result is 1.
>

addition

I'm thinking x+y where one or more is a non-numeric type should treat the + as punctuation rather than addition - so, UNLIKE JavaScript, "1"+2 outputs the string "1+2".
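A Go sketch of that rule with dynamically typed values held in `any`: two numbers (int or float64) are added, and anything else keeps the "+" as punctuation between the two values. The `add` name and the choice of Go types are assumptions.

```go
package main

import "fmt"

// add sums two numbers; if either side isn't numeric, the "+" is treated as
// punctuation and kept between the two values (so "1"+2 gives "1+2").
func add(a, b any) any {
	switch x := a.(type) {
	case int:
		switch y := b.(type) {
		case int:
			return x + y
		case float64:
			return float64(x) + y
		}
	case float64:
		switch y := b.(type) {
		case int:
			return x + float64(y)
		case float64:
			return x + y
		}
	}
	return fmt.Sprint(a) + "+" + fmt.Sprint(b)
}

func main() {
	fmt.Println(add(1, 2), add(1, 2.5), add("1", 2))
}
```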

I think these work:

Inputs

GIVE US {variable name} [{type}] [by asking {prompt}]

(Note: I have edited the initial thoughts entry above on types. They used to read, for example, "to count things"; I've changed them to e.g. "that can count things"; the wording works better across more situations, including this one.)

I've thought about this a lot, and I've made this compatible with the way parameters are defined for functions, too... see Functions below.

basic

Give us x

gives an input cursor on screen for the user to type into. No particular prompt is displayed; it is up to the coder to have put something meaningful on the previous line.
When they press return, whatever the user has typed is populated into variable x. x is created if necessary. In this example, x is an untyped/dynamically typed variable, since no type was specified. As such, the input will be checked in exactly the same way as code is for type: if it regexes as an integer (possibly with a negative in front), then it is that. Next a float, then a bool and finally a string.

with prompt

Give us x by asking "How many do ya want? "

As above, but the prompt will look like this:

How many do ya want? ░

typed

Give us x that can count things

Only the relevant type of input (integer in this example) is allowed. For simplicity, I suggest we allow anything to be typed, and attempt to parse it. If parsing as the specified type fails, we can print Yeah, Nah - I can't make "{typedString}" into a thingy {type}. Give it another go. and prompt again.

typed with prompt

Of course, the two things demoed above can be combined:

Give us x that can count things by asking "How many do ya want? "

Casting types

{varName} as a thingy {type} attempts to cast {varName} to the given type.

Casting a float to an int rounds it to the nearest integer.

(not sure how viable this bit will be, but anyway)
Strings being cast to constants of specific type can use those constant names. e.g. "A Dozen" can be successfully cast to 12.
The reverse is also true: while 11 as a thingy that can say stuff evaluates to "11", 12 as a thingy that can say stuff evaluates as "a dozen"
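A Go sketch of the casting rules. `math.Round` rounds halves away from zero, which is an assumption since the spec only says "nearest integer". The `namedNumbers` table is a hypothetical subset (a dozen, a slab); the spec only gives the dozen example.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// asCount casts a float to an int by rounding to the nearest integer.
func asCount(f float64) int64 { return int64(math.Round(f)) }

// namedNumbers maps counts to the phrases a cast to "that can say stuff" uses.
var namedNumbers = map[int64]string{12: "a dozen", 24: "a slab"}

// asSay casts a count to a string, preferring a constant's name when there is one.
func asSay(n int64) string {
	if s, ok := namedNumbers[n]; ok {
		return s
	}
	return strconv.FormatInt(n, 10)
}

// fromSay casts a string to a count, accepting names like "A Dozen" or "dozen".
func fromSay(s string) (int64, error) {
	lower := strings.ToLower(strings.TrimSpace(s))
	for n, name := range namedNumbers {
		if lower == name || lower == strings.TrimPrefix(name, "a ") {
			return n, nil
		}
	}
	return strconv.ParseInt(lower, 10, 64)
}

func main() {
	n, _ := fromSay("A Dozen")
	fmt.Println(asCount(2.6), asSay(11), asSay(12), n)
}
```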

Functions and function calls

Instructions on how to do named units of work are defined as THE HARD YAKKA.

The hard yakka for {function name} [to get a thingy {type}] is
  [inputs]
  {statements}

examples:

The hard yakka for HitTheSackForASec is
  Sleep(1000)     Crikey! I've used a traditional sleep function as an example; not saying we have to implement sleep()

That would be called by just typing the name of the function - HitTheSackForASec - elsewhere in code.

Here's a function with input howManySecs of type 'that can measure stuff' (i.e. float) and a multi-line body:

The hard yakka for HitTheSackForAFewSecs is
  Give us howManySecs that can measure stuff
<
  I recon milliseconds is howManySecs * 1000 as a thingy that can count stuff   Crikey! multiply by 1000 and cast (round) to an integer
  Sleep(milliseconds)
>

Calling syntax with parameters:
{functionName} (with|where) {paramName} = {value} [{paramName2} = {value}...]
(note though that punctuation such as commas is ignored as usual, so you can use them if you want, and IS becomes =, so that is also allowed)

e.g.

HitTheSackForAFewSecs where howManySecs = 10

Idea: if the function is called without all its parameters, prompt the user in a similar way to the above, using a 'by asking' clause of the parameter name if no prompt was supplied??? Thus, if the code has HitTheSackForAFewSecs but the howManySecs param is not supplied, the user gets the entry prompt howManySecs? ░.

Here's a function with two integer inputs and a float return value:

The hard yakka for average to get a thingy that can measure stuff is
  Give us firstInteger that can count stuff
  and secondInteger that can count stuff
<
  I recon theSum is firstInteger + secondInteger
  theSum/2.0
>

Call it like this:

the average where firstInteger is 3, secondInteger is 4
>> 3.5
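A Go sketch of named-parameter binding for the average example, including the idea of prompting for any parameter that was not supplied (using "paramName? " as the prompt). Values are kept as float64 for simplicity, and `bindParams`, `average` and the `ask` callback are hypothetical names.

```go
package main

import "fmt"

// bindParams fills each declared parameter from the supplied values; missing ones
// are requested via ask, using "<name>? " as the prompt.
func bindParams(declared []string, supplied map[string]float64, ask func(prompt string) float64) map[string]float64 {
	bound := make(map[string]float64, len(declared))
	for _, name := range declared {
		if v, ok := supplied[name]; ok {
			bound[name] = v
		} else {
			bound[name] = ask(name + "? ")
		}
	}
	return bound
}

// average mirrors the hard yakka example: the sum of the two inputs divided by 2.0.
func average(supplied map[string]float64, ask func(string) float64) float64 {
	p := bindParams([]string{"firstInteger", "secondInteger"}, supplied, ask)
	theSum := p["firstInteger"] + p["secondInteger"]
	return theSum / 2.0
}

func main() {
	fmt.Println(average(map[string]float64{"firstInteger": 3, "secondInteger": 4}, nil))
}
```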

Comparison

Greater than: Is a bigger bugger than
Less than: Is a littler bugger than

AND

If AND is the first token in a line, it means 'the same token as the first one in the previous line'
if AND is inline:

  • if it is between two Booleans, it is a logical AND operation
  • else it is punctuation

Does that make sense?

Something to consider: if BLIMEY designates the start of a program, is everything that precedes it a comment, regardless of what you place there?

Also, if you put a word after BLIMEY (like BLIMEY MATE) does that designate the program's name as MATE? Or does the language just ignore whatever follows on the line?

Also also, commas or whitespace?