martinjw/dbschemareader

Reading the same schema from PostgreSQL takes a lot longer than from MSSQL

Opened this issue · 4 comments

I was running my test suite and noticed that the PostgreSQL tests take A LOT longer than the MSSQL tests.
To verify this I wrote a short benchmark with BenchmarkDotNet and got these results:

MSSQL

Method Mean Error StdDev Median Gen0 Gen1 Allocated
ReadAll 765.5 ms 18.91 ms 54.85 ms 750.3 ms 2000.0000 1000.0000 14.47 MB
AllTables 456.9 ms 12.77 ms 37.45 ms 457.6 ms 2000.0000 1000.0000 13.98 MB

PostgreSQL

Method Mean Error StdDev Gen0 Gen1 Allocated
ReadAll 2.835 s 0.0536 s 0.0638 s 1000.0000 - 7.03 MB
AllTables 2.653 s 0.0485 s 0.0454 s - - 4.48 MB

As you can see both ReadAll and AllTables take a lot longer for PostgreSQL (3.7x and 5.8x respectively).
Knowing the performance of PostgreSQL this result is really surprising to me and seems rather strange.

Based on my profiling the culprits seem to be the PrimaryKeys (1.2s) and ForeignKeys (1s) methods of the PostgreSQLAdapter.

I took a quick look at the SQL used to load the constraints and think a faster solution to the problem would be to load all the constraints simultaneously instead of executing the same (very expensive) SQL multiple times. Every run of the constraints SQL takes 70-100ms, which results in the above numbers if applied to every single loaded table and constraint type. Loading all constraints at once on the other hand only takes ~200ms and results in a few hundred results that can then easily be filtered in C#.

ReadAll internally calls AllTables, which internally calls TableBuilder.

TableBuilder does seeral calls on constraints- for the different types (primary key, foreign key, unique, check, null). For all, it does all tables - so it shouldn't be doing anything table by table.

For PostgreSql, 3 are the constraints use the same query (primary, foreign, unique), so yes, that is repeated, but maximum 3 times irregardless of number of tables. It may be possible to optimize this in the postgreSqlAdaptor class down to one, which is cached and reused between the (consecutive) calls- although you wouldn't want that cache to remain in memory.

Do you have an idea, why PostgreSQL takes more than thrice the time of MSSQL? Ordinarily, one'd expect the opposite ...

I guess the metadata tables are not optimized - for example, no indexes on schema or table name, so every query is a full table scan. Also sometimes the metadata tables contain vast amounts of system data- Oracle is a bad offender here, it is by far the slowest to read.