npgsql/EntityFramework6.Npgsql

Switch from position() to strpos() for citext support

charliefoxtwo opened this issue ยท 6 comments

Tested with Postgres 9.6.17 and EntityFramework6.Npgsql 6.3.0

The query users.Where(u => u.FullName.Contains("aaron")); will generate the SQL "Extent1"."fullName" LIKE E'%aaron%', which functions as expected when fullName is citext.

However, let's say you're looking for any users matching one of many possible names. You might write your query like this

var searchTerms = new[] { "aaron", "andrew" };
users.Where(u => searchTerms.Any(t => u.FullName.Contains(t)));

Now, the generated query looks like this

EXISTS(SELECT 1 AS "C1"
     FROM ((SELECT CAST(E'aaron' AS varchar) AS "C1" FROM (SELECT 1 AS "C") AS "SingleRowTable1")
           UNION ALL
           (SELECT CAST(E'andrew' AS varchar) AS "C1"
            FROM (SELECT 1 AS "C") AS "SingleRowTable2")) AS "UnionAll1"
     WHERE position("UnionAll1"."C1" in "Extent1"."fullName") > 0);

Note the switch to using the position function. If you have a user whose fullName is "Aaron", this query will not return that user, even though fullName is citext.

While the postgres documentation would indicate that position(a in b) is just syntactic sugar for strpos(b, a), it appears that the query parser converts position(a in b) to position(b, a), and as far as I can tell citext doesn't expose a citext overload for position(b, a). This can be done by manually creating a function, however one wouldn't expect to have to do this in order to get citext working with Npgsql.

My proposed solution would be that Npgsql switch from using position(a in b) to strpos(b, a) since it has the proper overloads created when installing citext.

Emill commented

Hi. Maybe send a bug report to postgresql that position(a in b) is not equal to strpos(b, a) in the case of citext? Anyway if it is as you say it should be easy to change to using strpos in Npgsql. PR?

I opened a bug with postgres, and the conclusion seems to be that it's not a bug but intended behavior. They're correct in that citext doesn't claim to overload the position function.

It appears this repo doesn't currently have any unit tests built around citext at all, so a PR for this may take a bit of time. I looked into it the other day and am in the process of sorting out how to create a citext column in the code-first model that the unit tests in this repo use, since the database needs to be created in order to install the citext extension but you can't create the database if it has a citext column without having the citext extension installed.

roji commented

I opened a bug with postgres, and the conclusion seems to be that it's not a bug but intended behavior. They're correct in that citext doesn't claim to overload the position function.

Thanks for opening that bug. FWIW I tend to agree with what they're saying there... It's not really feasible to duplicate all text-processing functions for citext, and the new non-deterministic collations will likely obsolete citext and will not have this problem (although they're still limited in some ways).

It appears this repo doesn't currently have any unit tests built around citext at all, so a PR for this may take a bit of time. I looked into it the other day and am in the process of sorting out how to create a citext column in the code-first model that the unit tests in this repo use, since the database needs to be created in order to install the citext extension but you can't create the database if it has a citext column without having the citext extension installed.

I think you may have missed CitextqueryTest - it should be pretty straightforward to add a test there (you already have a model there with a citext column).

roji commented

Apologies, I got confused and thought we were in EF Core and not in EF6. It's true that we're lacking a comprehensive test suite for EF6, including with regards to citext.

I'd be OK with merging this without a test though, as the change really is quite minimal and safe - let me know how you want to proceed.

I'll go ahead and take care of it sans unit tests. It might be worth bringing some parity between the EF Core and EF6 tests - maybe I'll take a look at that separately as well.

roji commented

Thanks for your contribution.

Re tests, EF Core has a very comprehensive test suite (well over 10k tests) specifically meant for different database providers to use; this makes it very easy to test that a given provider works correctly. EF6 never had this - as a provider you're on your own, which is one reason why the test coverage here is so lacking.

Note also that EF6 in general is "archived" to a large extent - no new development is going to take place on it. The same is true of the PostgreSQL provider - all innovation and effort is going into EF Core. We definitely still accept bugfixes and minor features, but I wouldn't invest time in new test infrastructure etc...