Data Structure in influxdb and timescaledb are different
Opened this issue · 7 comments
In my influx measurement, the timestamp tag data type is timestamp and that in timescale it is timestamp with timezone.
Also , there us one more field which is of type string seperated by commas and I want to convert it into an array in timescale.
Please help me in which file should I make the required chanages and how to run the code for the same.
Hello @komal-lunkad. Yes Outflux exports the time column from InfluxDB as timestamptz
type. Currently Outflux doesn't allow you to influence the schema much during migration or define your custom transformations. But you can do the following.
- Run the migrate command and transfer all your data. (Let's say it creates a hyperatable
a
with atime
column that's timestamptz, and a TEXT columnb
with values likea,b,c
- Modify the time column in TimescaleDB
ALTER TABLE a ALTER COLUMN time TYPE timestamp;
- Create a new column of type array, and populate it with the
string_to_array
function built in PostgreSQL:ALTER TABLE a ADD COLUMN b_arr text[]; UPDATE a SET b_arr = string_to_array(b, ',');
Thank you for the answer @blagojts.
Currently, we already have the table defined on production with the fields holding the above specified data types. So could you please help with how can we make through the code like if we can change any file or so in the code?
You should take a look at the Transformer
interface and the jsoncombiner
implementation https://github.com/timescale/outflux/tree/develop/internal/transformation
You can implement your own transformer that will run between extraction from InfluxDB and ingestion in TimescaleDB. The Prepare
method returns the desired data set definition after the data goes through the transformer. The Start method is what transforms the actual rows.
Then you just need to invoke it when the migration pipes are created
depending on some config flagsYou might add support for array data types, and that's a bit more tricky (not complex, just have to modify multiple files). Start with https://github.com/timescale/outflux/blob/develop/internal/idrf/data_type.go
IDRF stands for intermediate data representation format and its what Outflux uses to keep track of the data types and which columns are present in a data set. After you add the array data type in the enum you might need to update all functions referencing it
@blagojts We have tables in our influxdb with the default timestamp and want to migrate to timescaldb to a table with timestamptz. You said that outflux exports the time column as timestamptz, but that's not what we got when we migrated, we got timestamp.
We have other tables in timescaledb already in timestamptz so we are trying to find out how to best integrate the two to do operations mixing them.
In the code of outflux you do distinguish between timestamp and timestamptz, how can we tell the migration to use timestamptz?
Following up @polmonso comment, we got this error message:
existing table in target db is not compatible with required. Required column time of type IDRFTimestamp is not compatible with existing type Timestamptz
bump?
I ran into the same issue. I was able to resolve it by letting outflux create the schema, and then alter the time column type afterwards using:
ALTER TABLE temperatures ALTER COLUMN time TYPE TIMESTAMP WITH TIME ZONE USING time AT TIME ZONE 'UTC';