toml-lang/toml

`nil` or `null` values

benolee opened this issue · 26 comments

(moved discussion from #11)

It seems like nil or null values must be allowed. For example,

# is this equivalent to {"foo":null,"bar":{"baz":null}} or {} ?
[foo]
[bar.baz]

in this case, it seems like it would make sense to be able to set them with the normal key = value syntax. Here are some alternatives for thought:

key = nil
key = null
key = # empty value ala bash

you don't need nil or null, just leave out that assignment.

Yeah, I'm not convinced of the usefulness of nil. TOML is intended for configuration, at which point @aaronblohowiak is right: just leave it out. I'm open to use cases and further convincing, though.

no further arguments here. I think it might be important to implementers to know if an empty "key group" should result in a key with no value (ie. nil or null depending on the language) or no key at all

@benolee Ah, good question. I'm going to say it should be an empty hash table.

Yeah, I'm not convinced of the usefulness of nil. TOML is intended for configuration, at which point @aaronblohowiak is right: just leave it out. I'm open to use cases and further convincing, though.

It's possible I'm using TOML for the wrong thing here, but I was going to use TOML as a bridge for query languages in my CLInvoice crate since I already have toml as a dependency for parsing user configurations, and it is a dead-simple markup language which I believe users would be able to learn without much trouble.

The reason for this is that CLInvoice is designed to be able to handle any permanent storage facility, whether or not it actually has a Structured Query Language of its own. Because of this, I needed to create a unified query 'language' based on the model and what operations made sense for it. Writing an adapter for CLInvoice explicitly provides support for this query 'language'.

Querying in CLInvoice is built on the backbone of the Match type, which can accept a list of types for HasNone, HasAny, or HasAll operations. Some types give Match values which may be None. For example, querying an InvoiceDate requires specifying an Option<chrono::DateTime<chrono::Local>> for its paid field. If I were using TOML, that means that TOML would have to be able to accept a list containing None/Nil and/or a concrete date, like so:

# rest of `InvoiceDate` query left out for simplicity's sake

[paid]
condition = 'HasAny'
value = [
  2020-04-01T03:00:00Z,
  None
]

The above would be equivalent to the English statement "match InvoiceDates that are either unpaid or were paid on 2020/04/01 at 3:00 UTC."

Obviously, for Match operations such as EqualTo that only accept one value, just leaving the value out is good enough to imply a None. But in list types there isn't a good way to specify a None in a given position.

I've switched to YAML, but YAML has some of its own issues (such as its fear of tabs and embedded types using way too much indentation). Accepting None in TOML would be very handy for the odd case such as this!


I had considered nom but writing a DSL for this project seems like it would lead to less ROI than serializing / deserializing a model + helpful errors in this case.

@Iron-E Although there won't be a None or Nil added to TOML (as far as I can see), you do have an option to use within the TOML syntax that would fit the bill. If you would never need to represent a hashmap value (and few relational database columns in this world store whole hashmaps), you could use an empty inline table to express a NULL value in your value list. For example, the sample you provided could be changed to look like this:

[paid]
condition = 'HasAny'
value = [
  2020-04-01T03:00:00Z,
  {}  # This represents NULL in the value list.
]

It's not a literal null, but it would do the trick. You could also use any value that isn't a datetime, like false, if you'd rather have something more lightweight than a table here.

In any case, you would need to handle non-datetime values gracefully, but you would need to do that with any hypothetical NULL anyway.

Yeah, I'm not convinced of the usefulness of nil. TOML is intended for configuration, at which point @aaronblohowiak is right: just leave it out. I'm open to use cases and further convincing, though.

Here's a use case: You want to read config from a TOML file, interpreting values as defaults, but you want environment variables with the same name to be able to override the defaults. An empty field thus indicates to your code that it must look in the environment, and it indicates to the user that an environment variable must be set.

[config]
my_db_host = '127.0.0.1'
my_db_user = 'user'
my_db_pass

@danhje Based on what you wrote, this use case has no real "defaults," a.k.a. values that are used in the absence of all other settings. Everything is set by environment variables first and foremost, followed by the settings in the TOML configuration file. Any missing setting must certainly lead to an error.

What this use case needs is just documentation. No value, not even an explicit null, would indicate that my_db_pass must be assigned by an environment variable. Worse, users may consider an explicit null to be a legitimate value for a password. An explicit null is equivalent to a missing setting, so why use an explicit null? In any case, you must explain your intention for password assignment, which is what comments are for. Or external documentation, if you don't want configuration comments.

Here's a pattern for this use case. This configuration would come with the installation for the users to fill out. All equivalent environment variable settings appear next to the configuration setting.

[config]
# Environment variable settings override the values here.

# Database host (Env: MY_DB_HOST)
my_db_host = '127.0.0.1'

# Database user account (Env: MY_DB_USER)
my_db_user = 'user'

# Database password cannot be set here.
# Required Env: MY_DB_PASS

Nobody reads documentation, and config comments are ugly. But more importantly, in my use case it’s not just about signaling to the user what variables are expected, I also want access to “empty” variables in code.

Consider how docker-compose interprets empty variables to mean that the variable should be mirrored from the host’s environment. In that case, leaving out the variable or using a comment isn’t an option. Docker Compose uses YAML, and my understanding is that leaving the value out really just results in an empty string, not a null, which I suppose is fine for my use case.

Here’s my use case, in a little bit more detail:

I want to create a variable / secret managing library for Python. The library is meant to be used for app development in large teams, where it’s difficult for each developer to keep track of all the environment variables that have to be set in order for the code to work. I want the users of my library to be able to centrally manage all these variables in a config file that could either be included or excluded from version control. I want the library to be able to give a friendly warning to a developer if a variable doesn’t have a default and isn’t found in the environment. So if you as a developer pull down some commit where a fellow developer, unbeknownst to you, has introduced new variables that need to be set, you’ll find out about it right away rather than when the app fails unexpectedly, possibly with a not-so-helpful error.

When working in interactive mode, I also want tab completion to present you with all variables from the config file, both set and unset.

I could force the users of my library to list expected variables in code rather than a config file, including default values, but this breaks the separation of config and code. In a project with hundreds of code files it also makes it harder to track down those expected variables, and it’s hard to enforce a central location for them.

If there’s a clever solution I haven’t thought of, I’d love to hear about it. But I think I’ll just use YAML instead. Which is a shame, since I was hoping to allow using pyproject.toml.

What about arrays such as

key = [1, 2, 3, null, 5]

I also find it a little weird that

key = [1, 2, 3, , 5]

is parsed just fine, ignoring the extra ",". This also parses fine

key = [1, 2, 3, "", 5]

@albertotb if that second example parses OK in some implementation you're using you should file a bug report because it should not

@albertotb if that second example parses OK in some implementation you're using you should file a bug report because it should not

It seems it was fixed in the latest Python implementation (0.10.2)

Nobody reads documentation, and config comments are ugly. But more importantly, in my use case it’s not just about signaling to the user what variables are expected, I also want access to “empty” variables in code.

Can't you just use config.get("value") which will automatically fall back to None? Or does your use case require differentiating between missing and null values?

Question regarding this: Instead of allowing null/nil/none, can it be specified what parsers "should" do by default if they see a null/nil/none value anyway? I.e., I think the standard should recommend to simply omit them from the serialized TOML document, or throw a type error, or to use a magic stringified value (I'd hope not), or something else (maybe putting an empty commented line with the key but no value?).

This is relevant since libraries are making different decisions on this. E.g. Fatal1ty/mashumaro#85 or samuelcolvin/rtoml#23. The latter is interesting - it claims to be fully compliant and pass all the TOML tests - but apparently it stringifies null values, somehow (wrongly I assume) indicating that this is the right thing to do.

@jonaslb TOML parsers will never see a null value, since those don't exist in TOML files. What you mean is a TOML serializer/writer.

About giving advice for them on how to represent types that don't map cleanly to a TOML type: I'm a bit skeptical about this since it might vary a lot on the use case. In the general case, "throw an error" is probably indeed the best course of action. But there may be applications, where, say, calling a to_dict() method on objects that have it and then serializing the result as a TOML table is entirely appropriate.

So I think the general rule is: when writing a serializer, document how it handles unexpected types.

And for TOML users: the best course of action is certainly not to pass any unexpected objects to your TOML writer in the first place. But if you want/need to do so anyway, make sure that it handles them in a way you consider appropriate.
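One way for a caller to keep unexpected `None` values away from a TOML writer is to clean the data first. The helper below, `strip_none`, is a hypothetical sketch of one possible policy (omit `None`-valued keys, reject `None` inside lists); it does not describe how any particular serializer behaves.

```python
def strip_none(value):
    """Return a copy of the data with None-valued keys dropped from tables.

    A None inside a list has no position-preserving TOML representation,
    so it is rejected instead of being silently removed.
    """
    if isinstance(value, dict):
        return {k: strip_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, (list, tuple)):
        if any(item is None for item in value):
            raise TypeError("None inside a list has no TOML representation")
        return [strip_none(item) for item in value]
    return value

cleaned = strip_none({"a": 1, "b": None, "c": {"d": None, "e": [1, 2]}})
print(cleaned)
```

The result can then be handed to whichever writer the project uses, which never sees a `None`.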

TOML is intended for configuration, at which point @aaronblohowiak is right: just leave it out. I'm open to use cases and further convincing, though.

Here's a use case: Layered configuration with global default config (read from, say, /etc/config/my-app.toml) and user settings that override/complement the defaults (read from, say, /home/user/.config/my-app/config.toml). In this scenario, it's currently impossible for the user to unset a default value. null would allow this.

marzer commented

@salim-b

The snippet you've quoted:

I'm open to use cases and further convincing, though.

Was written over ten years ago. There has been considerable deliberation on this point in the intervening years (including people giving examples exactly like yours), and sentiment has coalesced pretty firmly around "nulls are bad, actually" (see discussions in #146, #802, #803, #921, #975).

levkk commented

There is one good use case for nulls in TOML configuration files: sane defaults. Bear with me here.

Imagine that you have a setting like connect_timeout in your software that configures how long your application should wait before giving up on connecting to a server. Super important setting because servers go down all the time, doesn't mean your app should too. If you're distributing this app, you'd want to help your users by setting it to a value that's reasonable to use in production, e.g. 30 seconds. So you get the following definition:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct Config {
    // Falls back to the function below when the key is absent from the file.
    #[serde(default = "Config::default_config_timeout")]
    connect_timeout: u64,
}

impl Config {
    fn default_config_timeout() -> u64 {
        1000 * 30 // 30 seconds in milliseconds
    }
}

Everything is great and right in the world. If your users want to set it higher or lower, they can just:

connect_timeout = 1000

and everyone is happy.

But what if your users don't want a connect timeout? Their network is slow, they know it and they are in no rush, and why would they want to throw errors to their users when they know things will take a while? Their option is to either set it to a super large value like 1 year in milliseconds, which...well, works in practice, until you actually want to wait 1 year for something and Christmas day comes and your on-call gets a nasty page about an error they have never seen before, or for your software to support weird values like -1 which then require additional documentation and changing the obviously unsigned integer to a signed one just to store a negative number for one use case.

But what if we could set it to null instead?

connect_timeout = null

means there is no connect timeout and the app should wait forever, as desired by the user. Nulls are valid values in databases, software code and life in general: they mean there is nothing here, and that's how we like it.

marzer commented

or for your software to support weird values like -1 which then require additional documentation

Would it, though? Your software would need exactly the same amount of documentation regardless of what you chose for a sentinel, be it -1, null, nil, 0, or whatever else you can imagine. In all cases it's a single value that has special meaning, and would require exactly the same kind of verbiage. null isn't somehow special in this regard.

levkk commented

I think my main concern is using incorrect types, e.g. i64 can store an order of magnitude less values in it just so I can store a -1. Also someone could set it to -500 and the compiler wouldn't complain. We would have to validate it with logic. Meanwhile, a Duration::from_millis(config.connect_timeout) is validated by the compiler.

marzer commented

We would have to validate it with logic. Meanwhile, a Duration::from_millis(config.connect_timeout) is validated by the compiler.

You will always need runtime logic, with or without nulls. TOML data is heterogeneous so you can't somehow get compile-time validation without doing type-based logic on lookups first. You need to explicitly specify the type of config.connect_timeout yourself somewhere, which means you need to check that it's a match etc.

levkk commented

You need to explicitly specify the type of config.connect_timeout yourself somewhere, which means you need to check that it's a match etc.

Serde will take care of that. By forcing me to change the data type I need to make sure that the value is valid, but before, any value was valid... if deserialization is successful that is. So by forcing me to change the data type, I need to write more error-prone code.

Whether nulls belong in the TOML spec or not I think is a question of taste to be honest. It's hard for me to know what's the right decision here since your points are valid as well and having explicit nulls in a config file looks weird. That being said, null is a valid value for a data type so excluding it from the spec is not driven by correctness but probably by ergonomics and taste which are fine choices to make but nonetheless force the user to do something the TOML way instead of the optimal way.

marzer commented

Yeh, indeed it is a matter of taste. I'd like to clarify something though:

That being said, null is a valid value for a data type

No, it isn't. It's not in the spec, so it's not a valid value. It existing conceptually, or being in other languages, doesn't confer validity in TOML. The canonical way to express (something like) 'null' in TOML is to omit a value, so you still have that option.

instead of the optimal way.

What the 'optimal' way is happens to be a matter of taste too, FYI. IMO the most 'optimal' thing is what requires the least expression in the TOML config file itself - hard to beat "omit this KVP entirely" there.

I do recognize that the lack of null makes interop with other languages a decent bit harder in many cases, but I think it's also important to acknowledge that TOML is a config language first-and-foremost - any serialization concerns are for implementers to worry about, not users. Implementers can always jump through an extra hoop via a helper function (or similar), which is great if it keeps the language simpler for users.

The canonical way to express (something like) 'null' in TOML is to omit a value, so you still have that option.

That is a flawed concept by itself which doesn't work for the described use case (user overrides global default; default is not absent).

What the 'optimal' way is happens to be a matter of taste too, FYI. IMO the most 'optimal' thing is what requires the least expression in the TOML config file itself - hard to beat "omit this KVP entirely" there.

"omit this KVP entirely" is not a universally applicable way to express "undefinition" in TOML, so cannot be a good general best practice recommendation. If there were a null value in TOML, you would still be free to "omit this KVP entirely" instead of explicitly setting it to null for simple use cases. So I don't really see the damage null would bring to the TOML language. My opinion.

I'm sorry but I'm going to say that we're not revisiting this design choice at this point and an extended discussion about the consequences of that choice is something that I'd prefer folks have on a new discussion over on https://github.com/toml-lang/toml/discussions instead of in an issue that was closed a decade ago.

People in the 21st century are really still arguing against the concept of zero. "It's not useful to say you have None of something." Lmao.