mongodb/specifications

Connection string compatibility issues

vitaly-t opened this issue · 7 comments

I have been asked a number of times about compatibility between the general-purpose connection-string library and the MongoDB implementation. I have compiled the answers into a small page, Compatibility, which highlights 2 known issues.

I believe those 2 issues were an oversight when the original spec was written here. I don't know whether they will ever be addressed, but there is no harm in pointing them out in case another major update to the connection string spec is due.

Hi @vitaly-t

I am aware of some differences between the connection string specification and URI RFCs but not the ones mentioned in the page you linked.

> Invalid use of + within schema/protocol

https://tools.ietf.org/html/rfc3986#section-3.1 states that + is allowed in the scheme. Can you clarify what you think the incompatibility is?
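For reference, the scheme rule in that section is `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`. A minimal Ruby sketch of that rule follows; the constant and method names are made up for illustration and are not part of any driver:

```ruby
# Scheme grammar from RFC 3986, section 3.1:
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# SCHEME_PATTERN and valid_scheme? are hypothetical names for this sketch.
SCHEME_PATTERN = /\A[A-Za-z][A-Za-z0-9+\-.]*\z/

# Returns true when the string is a syntactically valid URI scheme.
def valid_scheme?(scheme)
  SCHEME_PATTERN.match?(scheme)
end

puts valid_scheme?("mongodb")      # true
puts valid_scheme?("mongodb+srv")  # true: "+" is allowed after the first letter
puts valid_scheme?("+srv")         # false: a scheme must start with a letter
```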

> When abbreviated to just a simple string, like some-name, MongoDB assumes it to be the database name.

https://tools.ietf.org/html/rfc3986#section-3 in my reading specifies that both scheme and hier-part are required, hence it seems to me that "some-name" is not a valid URI.
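One way to see this is with Ruby's standard `uri` library, which accepts a bare word only as a relative reference with no scheme. This is a small sketch; `scheme_of` is a hypothetical helper, and the behavior shown is that of the stdlib parser, not of any MongoDB driver:

```ruby
require "uri"

# Hypothetical helper: returns the scheme of a parsed string, or nil if absent.
def scheme_of(string)
  URI.parse(string).scheme
end

puts scheme_of("some-name").inspect                 # nil: parsed as a relative reference
puts scheme_of("mongodb://localhost/test").inspect  # "mongodb"
```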

@p-mongo Thank you for responding so quickly!

Your first question, about the use of +, has just sent me back on an old goose chase that I went on a long time ago, across multiple versions of the spec and across browser and server implementations. You see, the use of + is perhaps the most questionable thing, as it has changed over the years across different servers, browsers and even specs. There was an older version of the spec, which I'm trying to dig out again, that only stated that + should be treated the same as a space inside a URI, and nothing else. That changed in the latest spec. That legacy inconsistency still exists in some servers, and even browsers.

I guess you could say that avoiding + is simply a safe bet. But I'm gonna have to get back to you on this separately.

On the second part, the database name: support for an abbreviated connection string is an optional extension in any library. It is not really specified anywhere; it is simply implemented in a way that avoids any syntax ambiguity. You can find many examples here.

Almost all older implementations treat something like localhost or localhost:1234 as just the host detail, never as a database name; pg-connection-string is one example of such a parser.

As shown in the example, if you want just the database name, the shortest form should be /dbname. Also, since my library supports fully optional syntax, it allows the following forms as well: ///dbname, :///dbname, schema:///dbname.
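To make the convention concrete, here is a rough Ruby sketch of how such shorthand forms could be split into host and database. `split_shorthand` is a hypothetical helper written for this illustration; it is not the connection-string library's code, and it ignores credentials, query options and multiple hosts:

```ruby
# Hypothetical helper: splits a shorthand string into host and database,
# following the "plain string is the host, /name is the database" convention.
def split_shorthand(str)
  # Drop an optional "scheme", an optional ":" and the "//" authority marker.
  rest = str.sub(%r{\A(?:[A-Za-z][A-Za-z0-9+\-.]*)?:?//}, "")
  host, database = rest.split("/", 2)
  {
    host: host.to_s.empty? ? nil : host,
    database: database.to_s.empty? ? nil : database
  }
end

p split_shorthand("localhost:1234")   # host only
p split_shorthand("/dbname")          # database only
p split_shorthand("schema:///dbname") # database only, scheme ignored
p split_shorthand("host/dbname")      # both host and database
```

Under this convention a string without a slash can only ever be a host, which is the unambiguity being argued for here.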

Before we get too far into the details, can you please clarify the goal of this issue?

The connection string spec imposes some requirements that are incompatible with RFC 3986 (the current URI spec). Some of these requirements are necessary for MongoDB, such as support of multiple host addresses. Others could potentially be removed. For example, if I remember correctly, the database parsing requirements lead implementations to report errors in a potentially unexpected manner in some scenarios. There is also the requirement that a slash be present, which I believe doesn't exist in RFC 3986.

I personally am of the opinion that we could change the connection string spec to align with RFC 3986 in those aspects that don't affect the principal requirements of the connection string (specifying multiple host addresses and known option handling).

With that said, it is my understanding that you have created a library that parses MongoDB connection strings but also has its own functionality that is not covered by the connection string specification. If your library has behavior that diverges from connection string specification, this does not by itself create an issue with the connection string specification. If you would like to document the behavior differences, the appropriate location for that seems to me to be your library's documentation.

> Before we get too far into the details, can you please clarify the goal of this issue?

To get to the truth, by raising awareness, isn't it always the purpose? 😄

> Some of these requirements are necessary for MongoDB, such as support of multiple host addresses

I disagree, those things are extensions on the standard, as they do not create any conflict when parsing based on the standard.

> it is my understanding that you have created a library that parses MongoDB connection strings but also has its own functionality that is not covered by the connection string specification. If your library has behavior that diverges from connection string specification

My library is general-purpose, it was not created for MongoDB, but for everything, from the classic in-browser URI usage, to general server authentication, to all database drivers, etc.

And again, let's not call it "diverging" when talking about extensions on the standard that do not conflict with the standard. It's different 😉


Anyhow, I think I stand corrected on the matter of +, and I will make the corresponding changes in my own library instead. Thanks for that (Спасибо)! 😄

But I stand by the view that the shorthand syntax for the database name should be /dbname. If you think about it, the host details always come first, which is also why they come first in the URI syntax, so it is all very logical. Also consider that the short syntax for just host + database name is host/dbname; then it becomes clearer that a simple string has to be the host.

I believe all MongoDB drivers I have used provide a way to create a client by specifying the configuration in a manner idiomatic for the programming language being used. For example in Ruby you could do:

client = Mongo::Client.new(['localhost'])

A Ruby developer looking at this code knows what "localhost" means and what the line would do.

But, the behavior of passing a single seed like that is not standardized across drivers. In Python for example the behavior would be different from Ruby.

We have standardized the connection string and URI options in order for the following line to produce the same behavior across all drivers (4.4-compatible ones in this case):

client = Mongo::Client.new("mongodb://localhost/?directConnection=false")

Part of the reason for why connection string requires the scheme is that it is unambiguous, clear and obvious when a connection string is given, and thus the user of the driver can assume spec-compliant behavior. For this reason I do not expect a proposal to interpret "localhost" or "/database" as valid connection strings to get much support.
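As a rough illustration of that point (a sketch only, not the Ruby driver's actual logic), checking for the scheme prefix is enough to tell a connection string apart from any other argument. The names below are hypothetical; the two prefixes are the standard `mongodb://` and `mongodb+srv://` schemes:

```ruby
# Hypothetical constant and helper, written for this illustration only.
MONGODB_PREFIXES = %w[mongodb:// mongodb+srv://].freeze

# Returns true only when the value is a string that starts with a MongoDB scheme.
def connection_string?(value)
  value.is_a?(String) && value.start_with?(*MONGODB_PREFIXES)
end

puts connection_string?("mongodb://localhost/?directConnection=false")  # true
puts connection_string?("localhost")                                    # false
puts connection_string?("/database")                                    # false
```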

If you believe there is a valid use case for such a proposal, I suggest submitting it to https://feedback.mongodb.com/.

> But, the behavior of passing a single seed like that is not standardized across drivers

It's worse than that: it is a bit of a forest, or used to be. That was the very reason I wanted to create something that makes general sense and follows best practices, and that is basically what my connection-string is.

There are lots of projects on GitHub today that use my library to access MongoDB, so I want it to be compatible without breaking the standard, which again answers the question of why I opened this issue. Usually the use case is accessing multiple database types in a generic way, like in this example.

> client = Mongo::Client.new("mongodb://localhost/?directConnection=false")

Well, in my library the result would be the exact same. Here we do not have any database detail.

> Part of the reason for why connection string requires the scheme is that it is unambiguous, clear and obvious when a connection string is given

Only if you want to be an idealist. But when trying to unify a wild forest of various implementations under one roof, compromises have to be made. And the shorthand syntax is actually a very useful one.

So, I hope you will reconsider, and make the change, to show respect to many other connection string parsers out there that treat a simple string as host, the same way as connection-string does 😉

@p-mongo I ended up amending my library, to be precisely as per the latest URI spec - see v4.0.0 release notes.

As for the second part, about a plain string being the host and not the database, it is up to you, of course, whether you want to make it consistent with how other parsers work.

I also deleted the original article, as it is no longer valid.

I think it is ok to close the issue, if you want.