willbryant/kitchen_sync

--set-to-variables "search_path=xyz" doesn't work in the recent version

Closed this issue · 9 comments

Syncing MySQL to PostgreSQL.

Build from https://github.com/willbryant/kitchen_sync/tree/7cc0fc391ca4e27c26e8babc3909cbcbc417fe26 creates tables in public schema regardless the the option specified:

Kitchen Sync
CREATE TABLE "public"."announcements_announcement" (
  "id" integer NOT NULL GENERATED BY DEFAULT AS IDENTITY,
  "content" character varying(1000) NOT NULL,
  "active" boolean NOT NULL,
  PRIMARY KEY("id"))
...

Old build from https://github.com/willbryant/kitchen_sync/tree/acd346f14f19ec933f378d0d8f2ea9d6ca13f83a creates tables in xyz schema properly:

Kitchen Sync
CREATE TABLE "announcements_announcement" (
  "id" integer NOT NULL GENERATED BY DEFAULT AS IDENTITY,
  "content" character varying(1000) NOT NULL,
  "active" boolean NOT NULL,
  PRIMARY KEY("id"))
...

I'd be happy to use the old version but need enhancement from 7cc0fc3 :)

Thanks!

Hmm yeah feature #64 has made this not work, as now KS knows about schemas and mysql only has one schema per database.

Could you tell me more about your use case?

Sure. There's a PostgreSQL warehouse that aggregates multiple data sources in one database. Each data source has multiple tables and exists in a separate schema. Having all sources in the same database makes possible queries that include data from multiple sources at the same time.

To sync data from various data sources (e.g. MySQL), I run several jobs:

from: mysql://mysql-server/database1
to: postgresql://postgresql-server/warehouse
from: mysql://mysql-server/database2
to: postgresql://postgresql-server/warehouse

Data from each source should land at separate schemas (e.g. database1 and database2) to avoid mixing tables from various sources (can be many) in the same schema.

Right, I see.

The root of the issue is that in #64 I treated the default schema (public in PostgreSQL's case) the same as having no schema.

I did this so that tables in the default schema in PostgreSQL would be represented the same way as tables in MySQL, and so that if/when I add support for other database servers that support multiple schema per database the default schema of each will be used. For example, Microsoft SQL Server uses dbo instead of public but you'd probably expect a SQL Server -> PostgreSQL sync to put tables from dbo into public.

There's a bunch of ways to resolve this:

  1. Revert explicitly writing out the public. prefix for postgresql tables that are in the default schema (which will alway be the case for tables coming from mysql). This would obviously fix your use case, but it does mean that if the default search_path does not start with public, people who aren't trying to do the same thing as you may be surprised that KS puts the tables into an unexpected database. This would particularly be annoying/surprising for people syncing PostgreSQL-to-PostgreSQL. (They could at least work around by explicitly setting search_path back to something starting with public once they figure out what has happened though.)
  2. Differentiate between the default schema for database servers that support multiple schemas and the default schema for database servers that don't, in order to keep the behavior for the PostgreSQL-to-PostgreSQL case. This seems really ugly.
  3. Add an option or URL parameter for the default schema to put tables into. This would make your use case possible without introducing any surprises for people not deliberately placing default tables into a non-default schema. I think I'd be keener on this option if there was a nice clean popular syntax for adding schema names to DbUrls, perhaps just /database/schema - the JDBC adapters and so on do horrible stuff like ?currentSchema=....
  4. Add an option for N:N schema-name mappings. This is significantly more complicated than the other options, but it would allow not just for your use case but also use cases like PostgreSQL-to-PostgreSQL syncing if the user is trying to move data into a different schema, and for people to disagree with my assumption about the dbo to public example above.

Yeah, (4) sounds like the most flexible option indeed. (3) should also work perfectly, however it doesn't seem that libpq has such an option.

Is there any quick workaround I can use today with the current codebase?

(3) needs code changes anyway, because right now the code writes out the public. prefix anyway which would override any default schema. But, I think this is probably the best option for me to pursue.

I can't think of a way to work around this with the current codebase, but if you are in a desperate hurry you can branch off v2.3 and cherry-pick 7cc0fc3 - it applies cleanly except for one of the test files.

I had to add schema_name to the Table struct in https://github.com/willbryant/kitchen_sync/blob/master/src/schema.h, but otherwise worked perfectly. Thanks!

@easeev could you try origin/schema and

ks --from mysql://mysql-server/database1 --to postgresql://postgresql-server/warehouse/database1
ks --from mysql://mysql-server/database2 --to postgresql://postgresql-server/warehouse/database2

@willbryant works perfectly, thanks!

Merged to master.