Netflix/aegisthus

Timestamp clustering keys lose precision

aslotnick opened this issue · 3 comments

First off, thanks for this project as I'm making great use of it to bring in C* data to our warehouse.

I'm running into an issue with timestamp clustering keys and was hoping someone could point me to the right part of the code base to investigate it further.

Say I have the following table:

CREATE TABLE t (
  col1 int,
  col2 timestamp,
  col3 varchar,
  PRIMARY KEY ((col1), col2)
);

I'm passing this DDL into Aegisthus using the "-D aegisthus.cql_schema" option. In the JSON output, the value of col2 is truncated to the minute: if the source value is "2016-09-21 12:48:05", the output looks like "2016-09-21 12\:48Z". This becomes a problem when two rows in the same partition have col2 values that differ only in their seconds. Their primary keys are distinct in Cassandra, but after the precision loss the output appears to contain duplicate primary keys.
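To illustrate the symptom, here is a small Java sketch. It assumes the truncation comes from a date pattern that stops at minutes; the pattern "yyyy-MM-dd HH:mmX" was chosen only because it reproduces the reported "2016-09-21 12:48Z" shape, and it is not confirmed to be what Aegisthus or Cassandra actually uses. Two col2 values 25 seconds apart collapse to the same string, while a millisecond pattern keeps them distinct.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampPrecisionDemo {

    // Hypothetical minute-precision pattern that reproduces the reported
    // "2016-09-21 12:48Z" shape; an assumption, not confirmed Aegisthus code.
    static final DateTimeFormatter MINUTE_PRECISION =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mmX").withZone(ZoneOffset.UTC);

    // Full-precision alternative that keeps seconds and milliseconds.
    static final DateTimeFormatter MILLIS_PRECISION =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSX").withZone(ZoneOffset.UTC);

    public static void main(String[] args) {
        // Two clustering-key values in the same partition, 25 seconds apart.
        Instant first = Instant.parse("2016-09-21T12:48:05Z");
        Instant second = Instant.parse("2016-09-21T12:48:30Z");

        // Minute precision renders both keys identically.
        System.out.println(MINUTE_PRECISION.format(first));  // 2016-09-21 12:48Z
        System.out.println(MINUTE_PRECISION.format(second)); // 2016-09-21 12:48Z

        // Millisecond precision keeps them distinct.
        System.out.println(MILLIS_PRECISION.format(first));  // 2016-09-21T12:48:05.000Z
        System.out.println(MILLIS_PRECISION.format(second)); // 2016-09-21T12:48:30.000Z
    }
}
```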

It's possible to work around this issue by changing the DDL to use "bigint" instead of "timestamp", which leads me to believe this is simply an output formatting issue.
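The bigint workaround is consistent with that reading. A CQL timestamp is stored as an 8-byte signed count of milliseconds since the Unix epoch, the same encoding as bigint, so declaring col2 as bigint should surface the raw millisecond value. Assuming that value is what lands in the JSON, a downstream job could restore a readable timestamp without losing precision, as in this sketch (the class and method names are illustrative only):

```java
import java.time.Instant;

public class BigintWorkaround {

    // Convert a clustering-key value read as bigint (epoch milliseconds,
    // assumed to be what Aegisthus emits when col2 is declared bigint)
    // back into a full-precision ISO-8601 string in UTC.
    static String epochMillisToIso(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        System.out.println(epochMillisToIso(1474462085000L)); // 2016-09-21T12:48:05Z
        System.out.println(epochMillisToIso(1474462085123L)); // 2016-09-21T12:48:05.123Z
    }
}
```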

Any idea where this might be happening, or how to fix it? My first guess was https://github.com/Netflix/aegisthus/blob/master/aegisthus-hadoop/src/main/java/com/netflix/aegisthus/output/SSTableOutputFormat.java#L104, but I'm not sure about that.

Thank you.

@aslotnick Interesting. Since you're looking at the JSON output, I would expect the problem to be in https://github.com/Netflix/aegisthus/blob/master/aegisthus-hadoop/src/main/java/com/netflix/aegisthus/output/JsonOutputFormat.java. SSTableOutputFormat shouldn't change the format at all. As soon as I get a free moment I will test this, if you haven't found the issue by then.

@aslotnick I would start looking here: https://github.com/Netflix/aegisthus/blob/master/aegisthus-hadoop/src/main/java/com/netflix/aegisthus/output/JsonOutputFormat.java#L151

That is where the JSON format converts each Cassandra atom into a string.
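If the conversion at that line turns out to be the culprit, one possible fix is to special-case timestamp columns and format the raw value at full precision instead of relying on the type's default string form. The sketch below assumes the atom's value is available as a ByteBuffer holding the standard 8-byte big-endian timestamp encoding; formatTimestampBytes is a hypothetical helper, and wiring it into JsonOutputFormat.java is not shown.

```java
import java.nio.ByteBuffer;
import java.time.Instant;

public class TimestampAtomFormatter {

    // Hypothetical helper: format the raw bytes of a CQL timestamp value
    // (an 8-byte big-endian count of milliseconds since the Unix epoch)
    // as ISO-8601 without dropping seconds or milliseconds.
    static String formatTimestampBytes(ByteBuffer bytes) {
        if (bytes.remaining() != 8) {
            throw new IllegalArgumentException("expected 8 bytes, got " + bytes.remaining());
        }
        // Absolute read, so the caller's buffer position is left untouched.
        long epochMillis = bytes.getLong(bytes.position());
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        // ByteBuffer defaults to big-endian, matching the assumed encoding.
        ByteBuffer value = ByteBuffer.allocate(8);
        value.putLong(0, 1474462085000L);
        System.out.println(formatTimestampBytes(value)); // 2016-09-21T12:48:05Z
    }
}
```

Reading with an absolute index rather than a relative getLong() avoids mutating a buffer that other code may read again.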

Closing all issues since the project is archived.