embulk/embulk-input-jdbc

date type between mssql and mysql not working.

Opened this issue · 13 comments

env

embulk v0.9.23
java openjdk version "1.8.0_252"

situation

(MSSQL)
nullable date type
->
(MYSQL)
nullable date type

issue

destination date columns are filled with NULL value.
even all the source columns have values.

in:
  type: sqlserver
  host: ***
  user: ***
  password: ***
  database: ***
  query: |-
    SELECT
      [sdate],
   ***
    FROM *** WITH (NOLOCK)
out:
  type: mysql
  host: ***
  user: ***
  password:***
  database: ***
  table: ***
  mode: merge_direct
  options: {useUnicode: true, characterEncoding: UTF-8}
  column_options:
    sdate: {type: date null}

Hello, @makkaba

  • Could you double-check whether sdata contains NULL value?
  • Does your table has primary or unique constraints?
  • Could you tell us more detail about the problem? (What kind of data did you insert and update?)

I tested merge_direct feature in the following environment.
It seems worked well.

  • OS: macOS 10.15
  • MySQL: 8.0.19
  • Embulk: embulk 0.9.23
  • embulk-output-mysql: 0.8.7
in:
  type: inline
  schema:
    - { name: id, type: long }
    - { name: string_column, type: string }
    - { name: double_column, type: double }
    - { name: boolean_column, type: boolean }
    - { name: sdate, type: timestamp }
  data:

    # First data insertion.
    - { id: 1, string_column: test, double_column: 0.1, boolean_column: true, sdate: '2016-06-14 15:19:05' }
    - { id: 2, string_column: test, double_column: 0.1, boolean_column: true, sdate: '2016-06-14 15:19:05' }
    - { id: 3 }

    # 2nd data insertion
    #- { id: 1, string_column: test, double_column: 0.1, boolean_column: true, sdate: '2016-06-14 15:19:05' }
    #- { id: 2, string_column: test, double_column: 0.1, boolean_column: true }
    #- { id: 3, string_column: test, double_column: 0.1, boolean_column: true, sdate: '2016-06-14 15:19:05' }
#out: {type: stdout}
out:
  type: mysql
  host: localhost
  user: user
  password: password
  database: embulk_test
  table: merge_direct_test
  mode: merge_direct
  options: {useUnicode: true, characterEncoding: UTF-8}
  create_table_constraint: 'primary key(id)'
  column_options:
    purchase: {type: date null}

first insert results.

show fields from merge_direct_test;
+----------------+------------+------+-----+---------+-------+
| Field          | Type       | Null | Key | Default | Extra |
+----------------+------------+------+-----+---------+-------+
| id             | bigint     | NO   | PRI | NULL    |       |
| string_column  | text       | YES  |     | NULL    |       |
| double_column  | double     | YES  |     | NULL    |       |
| boolean_column | tinyint(1) | YES  |     | NULL    |       |
| sdate          | timestamp  | YES  |     | NULL    |       |
+----------------+------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
select * from merge_direct_test;
+----+---------------+---------------+----------------+---------------------+
| id | string_column | double_column | boolean_column | sdate               |
+----+---------------+---------------+----------------+---------------------+
|  1 | test          |           0.1 |              1 | 2016-06-14 15:19:05 |
|  2 | test          |           0.1 |              1 | 2016-06-14 15:19:05 |
|  3 | NULL          |          NULL |           NULL | NULL                |
+----+---------------+---------------+----------------+---------------------+
3 rows in set (0.00 sec)

2nd data insert.

  • I expect update the following
    • id 3 sdata: NULL -> '2016-06-14 15:19:05'
    • id 2 sdata: 2016-06-14 15:19:05 -> NULL
mysql> select * from merge_direct_test;
+----+---------------+---------------+----------------+---------------------+
| id | string_column | double_column | boolean_column | sdate               |
+----+---------------+---------------+----------------+---------------------+
|  1 | test          |           0.1 |              1 | 2016-06-14 15:19:05 |
|  2 | test          |           0.1 |              1 | NULL                |
|  3 | test          |           0.1 |              1 | 2016-06-14 15:19:05 |
+----+---------------+---------------+----------------+---------------------+
3 rows in set (0.00 sec)

@hiroyuki-sato
Thank you for testing!
but something is different.
in my case, the type is not timestamp.
yyyy-MM-dd style"date" type.

Hello, @makkaba

It worked too. The following test used date column.

test script

First insert

After insert

mysql> show fields from merge_direct_test;
+----------------+------------+------+-----+---------+-------+
| Field          | Type       | Null | Key | Default | Extra |
+----------------+------------+------+-----+---------+-------+
| id             | bigint     | NO   | PRI | NULL    |       |
| string_column  | text       | YES  |     | NULL    |       |
| double_column  | double     | YES  |     | NULL    |       |
| boolean_column | tinyint(1) | YES  |     | NULL    |       |
| sdate          | date       | YES  |     | NULL    |       |
+----------------+------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
mysql> select * from merge_direct_test;
+----+---------------+---------------+----------------+------------+
| id | string_column | double_column | boolean_column | sdate      |
+----+---------------+---------------+----------------+------------+
|  1 | test          |           0.1 |              1 | 2016-06-14 |
|  2 | test          |           0.1 |              1 | 2016-06-14 |
|  3 | NULL          |          NULL |           NULL | NULL       |
+----+---------------+---------------+----------------+------------+
3 rows in set (0.00 sec)

2nd insert

After update.

I expect to update the following

  • id 2 sdata: 2016-06-14 -> NULL
  • id 3 sdata: NULL -> 2016-06-14
mysql> select * from merge_direct_test;
+----+---------------+---------------+----------------+------------+
| id | string_column | double_column | boolean_column | sdate      |
+----+---------------+---------------+----------------+------------+
|  1 | test          |           0.1 |              1 | 2016-06-14 |
|  2 | test          |           0.1 |              1 | NULL       |
|  3 | test          |           0.1 |              1 | 2016-06-14 |
+----+---------------+---------------+----------------+------------+
3 rows in set (0.00 sec)

@hiroyuki-sato
thank you for research.

I have realized something new.

In amazon linux ubuntu, it works properly. (it means date value is inserted)

But in WSL(Window Subsystem Linux) Ubuntu in my local window machine,
all the date values are filled with NULL.

It can be WSL problem itself. or some environment problem
ex) timezone ..

Is there any factor which affects to convert date type?

Hello, @makkaba

Did you check consistency in both environments?
Does WSL environment have primary or unique consistency?

Have you ever executed my test script in your WSL environment?
If so, could you tell me the result?

@hiroyuki-sato
Consistency in both environments are not confirmed.
i assumed that it causes problem.

and this is testing in WSL below.

test1

i have tested your test script in WSL environment.
(timestamp -> date)

result

sdate field is filled with NULL value.

test2

chaging the type of sdate with 'date'

in:
  type: inline
  schema:
    - { name: id, type: long }
    - { name: string_column, type: string }
    - { name: double_column, type: double }
    - { name: boolean_column, type: boolean }
    - { name: sdate, type: date }
  data:

    # First data insertion.
    - { id: 1, string_column: test, double_column: 0.1, boolean_column: true, sdate: '2016-06-14' }
    - { id: 2, string_column: test, double_column: 0.1, boolean_column: true, sdate: '2016-06-14' }
    - { id: 3 }

result

org.embulk.exec.PartialExecutionException: org.jruby.exceptions.RaiseException: (RuntimeError) Unknown type :date
        at org.embulk.exec.BulkLoader$LoaderState.buildPartialExecuteException(BulkLoader.java:340)
        at org.embulk.exec.BulkLoader.doRun(BulkLoader.java:566)
        at org.embulk.exec.BulkLoader.access$000(BulkLoader.java:35)
        at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:353)
        at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:350)
        at org.embulk.spi.Exec.doWith(Exec.java:22)
        at org.embulk.exec.BulkLoader.run(BulkLoader.java:350)
        at org.embulk.EmbulkEmbed.run(EmbulkEmbed.java:242)
        at org.embulk.EmbulkRunner.runInternal(EmbulkRunner.java:291)
        at org.embulk.EmbulkRunner.run(EmbulkRunner.java:155)
        at org.embulk.cli.EmbulkRun.runSubcommand(EmbulkRun.java:431)
        at org.embulk.cli.EmbulkRun.run(EmbulkRun.java:90)
        at org.embulk.cli.Main.main(Main.java:64)
        Suppressed: java.lang.NullPointerException
                at org.embulk.exec.BulkLoader.doCleanup(BulkLoader.java:463)
                at org.embulk.exec.BulkLoader$3.run(BulkLoader.java:397)
                at org.embulk.exec.BulkLoader$3.run(BulkLoader.java:394)
                at org.embulk.spi.Exec.doWith(Exec.java:22)
                at org.embulk.exec.BulkLoader.cleanup(BulkLoader.java:394)
                at org.embulk.EmbulkEmbed.run(EmbulkEmbed.java:245)
                ... 5 more
Caused by: org.jruby.exceptions.RaiseException: (RuntimeError) Unknown type :date
        at RUBY.block in initialize(uri:classloader:/gems/embulk-0.9.23-java/lib/embulk/schema.rb:46)
        at org.jruby.RubyArray.each(org/jruby/RubyArray.java:1735)
        at RUBY.initialize(uri:classloader:/gems/embulk-0.9.23-java/lib/embulk/schema.rb:20)
        at RUBY.block in transaction(uri:classloader:/gems/embulk-0.9.23-java/lib/embulk/input_plugin.rb:60)
        at RUBY.resume(/home/ubuntu/.embulk/lib/gems/gems/embulk-input-inline-0.1.0/lib/embulk/input/inline.rb:21)
        at RUBY.transaction(/home/ubuntu/.embulk/lib/gems/gems/embulk-input-inline-0.1.0/lib/embulk/input/inline.rb:17)
        at RUBY.transaction(uri:classloader:/gems/embulk-0.9.23-java/lib/embulk/input_plugin.rb:58)

Error: org.jruby.exceptions.RaiseException: (RuntimeError) Unknown type :date

Hello, @makkaba

embulk-input-inline doesn't support date type.
Embulk doesn't support date type. You need to change that part to timestamp.

You need to use date type in embulk-output-mysql side only.

You can check the input side with embulk preview.

Hello, @makkaba
It may be a case-sensitive problem of column names.
If the actual column name is SDATE, embulk-output-mysql will distinguish it from the column sdate, and values in the sdate column will not inserted into the SDATE column.

@hiroyuki-sato
Thank you for the investigation!

hi @hito4t
unfortunately, this is not a case-sensitive problem.
tested with same yaml file, the results were different.
and i have tried only with lowercase "sdate"

in sqlserver : "sdate"
in mysql: "sdate"
in yaml file: "sdate"

@makkaba
Thank you for your reply.

I'd like to know the actual column names on MySQL.
Would you show the result of desc target_table on MySQL client?

ex)

mysql> desc merge_test;
+--------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+------------+------+-----+---------+-------+
| ID | int(11) | NO | PRI | 0 | |
| VALUE1 | varchar(8) | YES | | NULL | |
| VALUE2 | varchar(8) | YES | | NULL | |
+--------+------------+------+-----+---------+-------+
3 rows in set (0.03 sec)

mysql>

Hello, @makkaba

The above report is the embulk-input-inline exception. If embulk preview works well, execute embulk run again and paste an exception.

And also would you show the result of show variables like '%lower%'; on MySQL client?

mysql> show variables like '%lower%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| lower_case_file_system | OFF   |
| lower_case_table_names | 0     |
+------------------------+-------+
2 rows in set (0.01 sec)

i appreciate for your investigation.
@hiroyuki-sato
@hito4t

actual column name is
"sdate" in mysql.
(and also "sdate" in sqlserver. )


DESC target_tableXXXX;

sdate | date | YES | MUL |   |  
-- | -- | -- | -- | -- | --
timezone | varchar(128) | NO |   |   |  
order | int(128) | NO |   |   |  

show variables like '%lower%';

lower_case_file_system | OFF
-- | --
lower_case_table_names | 0

i assumed that it is problem of WSL environment variables or something.

Hello, @makkaba Thank you for your reply.

  • Does target_tableXXXX table have the primary key?
  • Could you also paste embulk run execution?