IntervalShardingAlgorithm performance is too bad
Ahoo-Wang opened this issue
Feature Request
When I was about to use IntervalShardingAlgorithm to integrate CosId, I checked the source code and found the following problems:
- Ease of use: the `IntervalShardingAlgorithm` implementation first converts the sharding value to a string and then to `LocalDateTime`, so the conversion success rate depends on the time formatting pattern.
- Performance: nested traversal is used to check whether each condition is met, accompanied by the `LocalDateTime` conversion on every check. The performance problem is fatal: throughput for `PreciseShardingValue` is even lower than 7000 ops/s, which is lower than the MySQL storage layer itself, so ShardingSphere-JDBC becomes the bottleneck. This is obviously unbearable.
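The gap is easy to see in a toy comparison. The sketch below (hypothetical names, not CosId's actual code) computes a monthly partition index directly from an epoch-millis value with date arithmetic, the kind of work a precise-sharding call needs, with no string formatting/parsing round-trip and no traversal over all candidate tables:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Illustrative sketch, not CosId's real implementation: resolve the
// partition index arithmetically from the timestamp instead of
// formatting the shard value to a string and parsing it back.
public class IntervalIndexSketch {
    // Assumed lower bound of the sharded time range.
    static final LocalDateTime LOWER = LocalDateTime.of(2021, 1, 1, 0, 0);

    // epoch millis -> zero-based monthly partition index
    static long monthIndex(long epochMillis) {
        LocalDateTime ts = LocalDateTime.ofEpochSecond(
                epochMillis / 1000, 0, ZoneOffset.UTC);
        return ChronoUnit.MONTHS.between(LOWER, ts);
    }

    public static void main(String[] args) {
        // 2021-03-15 falls in the third monthly partition (index 2).
        long ts = LocalDateTime.of(2021, 3, 15, 0, 0)
                .toEpochSecond(ZoneOffset.UTC) * 1000;
        System.out.println(monthIndex(ts)); // prints 2
    }
}
```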
Code implementation of the benchmark report
I even doubted whether there was a problem with the way the benchmark itself is implemented, but I have tried my best to eliminate the test noise.
```
gradle cosid-shardingsphere:jmh

# JMH version: 1.29
# VM version: JDK 11.0.13, OpenJDK 64-Bit Server VM, 11.0.13+8-LTS
# VM options: -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/work/CosId/cosid-shardingsphere/build/tmp/jmh -Duser.country=CN -Duser.language=zh -Duser.variant
# Blackhole mode: full + dont-inline hint
# Warmup: 1 iterations, 10 s each
# Measurement: 1 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time

Benchmark                                                          Mode  Cnt         Score  Error  Units
IntervalShardingAlgorithmBenchmark.cosid_precise_local_date_time  thrpt       66276995.822         ops/s
IntervalShardingAlgorithmBenchmark.cosid_precise_timestamp        thrpt       24841952.001         ops/s
IntervalShardingAlgorithmBenchmark.cosid_range_local_date_time    thrpt        3344013.803         ops/s
IntervalShardingAlgorithmBenchmark.cosid_range_timestamp          thrpt        2846453.298         ops/s
IntervalShardingAlgorithmBenchmark.office_precise_timestamp       thrpt           6286.861         ops/s
IntervalShardingAlgorithmBenchmark.office_range_timestamp         thrpt           2302.986         ops/s
```
So I re-implemented the time-range sharding algorithm based on the time interval to improve both ease of use and performance.
https://github.com/Ahoo-Wang/CosId/releases/tag/v1.4.5
- `DateIntervalShardingAlgorithm`
  - type: `COSID_INTERVAL_DATE`
- `LocalDateTimeIntervalShardingAlgorithm`
  - type: `COSID_INTERVAL_LDT`
- `TimestampIntervalShardingAlgorithm`
  - type: `COSID_INTERVAL_TS`
- `TimestampOfSecondIntervalShardingAlgorithm`
  - type: `COSID_INTERVAL_TS_SECOND`
- `SnowflakeIntervalShardingAlgorithm`
  - type: `COSID_INTERVAL_SNOWFLAKE`
Is your feature request related to a problem?
NO
Describe the feature you would like.
If you think this implementation is good, I can submit a PR
Thank you for the feature request and the CosId project; it is a good open source project.
I just need more information before we do it.
- What are the dependencies of CosId? How should it be handled if CosId's Guava version conflicts with ShardingSphere's?
- Must the CosId value be a number type in the database column? What about date or varchar types?
- Is it possible to merge `COSID_INTERVAL_DATE`, `COSID_INTERVAL_LDT`, `COSID_INTERVAL_TS` and `COSID_INTERVAL_TS_SECOND` together? How about using a properties key to distinguish them?
- What is the usage of `COSID_INTERVAL_SNOWFLAKE`? Why re-implement SNOWFLAKE again?
Thank you very much for your approval and reply.
- CosId-Core has no dependencies (I can remove the Guava dependency, or keep the Guava version consistent with ShardingSphere). But to solve the SnowflakeId `machineId` allocation problem, CosId-Redis needs to depend on Redis (`io.lettuce:lettuce-core`). For the ID segment mode, the segment distribution problem requires a dependency on JDBC (`java.sql.*`) or Redis (`io.lettuce:lettuce-core`).
- Yes, the distributed ID provided by CosId only supports returning `long` (because we will not choose date or varchar as the primary key). https://github.com/Ahoo-Wang/CosId/blob/main/cosid-core/src/main/java/me/ahoo/cosid/IdGenerator.java
- Yes, I can merge `COSID_INTERVAL_DATE`, `COSID_INTERVAL_LDT`, `COSID_INTERVAL_TS` and `COSID_INTERVAL_TS_SECOND` together. We don't need to distinguish between them; I will parse the value by checking the shard value type. https://github.com/Ahoo-Wang/CosId/releases/tag/v1.4.6
- CosId implements two types of distributed ID: SnowflakeId and SegmentId. But the SnowflakeId algorithm alone is not enough. For example, ShardingSphere did not consider the issue of `machineId` allocation when implementing SnowflakeId (ShardingSphere provides manual allocation, but that proves inefficient in flexible deployment scenarios), and CosId provides `MachineIdDistributor` to solve this problem. Of course there are other features; a more detailed introduction is available, such as the optimization of the segment mode.
- We know the partitioning method of SnowflakeId: the timestamp can be parsed back out of a SnowflakeId, i.e. a SnowflakeId can be used as a time value, so SnowflakeId can serve as the input of an INTERVAL sharding algorithm. (When there is no CreateTime column available for sharding [a very extreme situation], or when there is an extreme performance requirement, using the distributed ID primary key as the query range may be a better choice for the persistence layer.)
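To make the last point concrete: in the classic snowflake layout (41 timestamp bits, 10 machine bits, 12 sequence bits; the exact layout and epoch are configurable, so treat the constants below as assumptions, not CosId's actual values), the generation timestamp is recoverable with a shift, which is what lets an ID double as an interval-sharding value:

```java
// Illustrative sketch: recover the timestamp encoded in a snowflake ID.
public class SnowflakeTimestamp {
    // Assumed layout: 41 timestamp bits | 10 machine bits | 12 sequence bits.
    static final long EPOCH = 1288834974657L;   // Twitter's classic custom epoch (assumption)
    static final int TIMESTAMP_SHIFT = 10 + 12; // machine bits + sequence bits

    // Wall-clock millis at which the ID was generated.
    static long extractEpochMillis(long snowflakeId) {
        return (snowflakeId >>> TIMESTAMP_SHIFT) + EPOCH;
    }

    public static void main(String[] args) {
        // Compose an ID from millis-since-custom-epoch = 123456789, machine 5, sequence 7 ...
        long id = (123456789L << TIMESTAMP_SHIFT) | (5L << 12) | 7L;
        // ... and recover the timestamp: the machine and sequence bits fall away.
        System.out.println(extractEpochMillis(id) - EPOCH); // prints 123456789
    }
}
```

An interval sharding algorithm can then route on `extractEpochMillis(id)` exactly as it would on a CreateTime column.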
Thank you very much for your approval and reply.
- CosId-Core has no dependencies (I can remove the Guava dependency, or keep the Guava version consistent with ShardingSphere). But to solve the SnowflakeId `machineId` allocation problem, CosId-Redis needs to depend on Redis (`io.lettuce:lettuce-core`). For the ID segment mode, the segment distribution problem requires a dependency on JDBC (`java.sql.*`) or Redis (`io.lettuce:lettuce-core`).
Redis is not a reg-center component of ShardingSphere; could you consider using ZooKeeper or Etcd, as in the cluster mode of ShardingSphere? Maybe we need to integrate with ShardingSphere deeply.
- Yes, the distributed ID provided by CosId only supports returning `long` (because we will not choose date or varchar as the primary key). https://github.com/Ahoo-Wang/CosId/blob/main/cosid-core/src/main/java/me/ahoo/cosid/IdGenerator.java
OK
- Yes, I can merge COSID_INTERVAL_DATE, COSID_INTERVAL_LDT, COSID_INTERVAL_TS and COSID_INTERVAL_TS_SECOND together. We don't need to distinguish between them; I will parse the value by checking the shard value type. https://github.com/Ahoo-Wang/CosId/releases/tag/v1.4.6
How should we handle the case where a real timestamp is used as the business column value? Just use the original interval timestamp?
- CosId implements two types of distributed ID: SnowflakeId and SegmentId. But the SnowflakeId algorithm alone is not enough. For example, ShardingSphere did not consider the issue of `machineId` allocation when implementing SnowflakeId (ShardingSphere provides manual allocation, but that proves inefficient in flexible deployment scenarios), and CosId provides `MachineIdDistributor` to solve this problem. Of course there are other features; a more detailed introduction is available, such as the optimization of the segment mode.
If CosId's snowflake algorithm is good enough, ShardingSphere can use it to replace the original one, but it is better to keep the algorithm type as `SNOWFLAKE`; it is fine to add the new types `COSID_SEGMENT` and `COSID_SEGMENT_CHAIN`.
- We know the partitioning method of SnowflakeId: the timestamp can be parsed back out of a SnowflakeId, i.e. a SnowflakeId can be used as a time value, so SnowflakeId can serve as the input of an INTERVAL sharding algorithm. (When there is no CreateTime column available for sharding [a very extreme situation], or when there is an extreme performance requirement, using the distributed ID primary key as the query range may be a better choice for the persistence layer.)
Yes, totally agree.
Because the original snowflake algorithm has already been implemented by ShardingSphere, it is better to keep the name consistent; for new key generators and sharding algorithms, we can introduce the brand `COSID`.
The summaries are:
- Add 2 new key generators, `COSID_SEGMENT` and `COSID_SEGMENT_CHAIN`, and update the original `SNOWFLAKE`.
- Add sharding algorithms with `SNOWFLAKE` and `COSID_TIME_INTERVAL`.
- Integrate `COSID_SEGMENT` and `COSID_SEGMENT_CHAIN` with ShardingSphere deeply; just reuse the reg-center of cluster mode.
- Add 2 new key generators, COSID_SEGMENT and COSID_SEGMENT_CHAIN, and update the original SNOWFLAKE.
CosId provides a unified interface `IdGeneratorProvider`, without the need to specify whether the concrete implementation algorithm is `SnowflakeId`, `SegmentId` or `IdSegmentChain`. That is, the user does not need to specify TYPE as any specific algorithm; it may be better to define it as `COSID`. The parameter `id-name` (`Properties`) is passed in to get the specific algorithm from `IdGeneratorProvider`.
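A minimal sketch of that lookup idea (names here are illustrative, not CosId's real `IdGeneratorProvider` API): one registry maps an `id-name` to a concrete generator, so the ShardingSphere side only ever sees a single `COSID` type plus a property.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical provider registry: the concrete algorithm (snowflake,
// segment, segment chain, ...) is registered at bootstrap under a name,
// and the key generator resolves it by that name at runtime.
public class ProviderSketch {
    private static final Map<String, LongSupplier> REGISTRY = new ConcurrentHashMap<>();

    static void register(String idName, LongSupplier generator) {
        REGISTRY.put(idName, generator);
    }

    // A key generator configured as TYPE=COSID would resolve its backing algorithm here.
    static LongSupplier get(String idName) {
        LongSupplier generator = REGISTRY.get(idName);
        if (generator == null) {
            throw new IllegalArgumentException("No id generator registered under: " + idName);
        }
        return generator;
    }
}
```

With this shape, a table's key-generator config would carry only the type `COSID` plus an `id-name` property, and the choice of concrete algorithm stays a deployment-time registration.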
If CosId's snowflake algorithm is good enough, ShardingSphere can use it to replace the original one, but it is better to keep the algorithm type as SNOWFLAKE (and update the original SNOWFLAKE).
OK.
- Add sharding algorithms with SNOWFLAKE, and COSID_TIME_INTERVAL.
OK.
- Integrate COSID_SEGMENT and COSID_SEGMENT_CHAIN with ShardingSphere deeply, just reuse reg-center of cluster mode.
Redis is not a reg-center component of ShardingSphere; could you consider using ZooKeeper or Etcd, as in the cluster mode of ShardingSphere? Maybe we need to integrate with ShardingSphere deeply.
OK, I will consider using ZooKeeper to implement the `MachineIdDistributor` of `SnowflakeId` and the `IdSegmentDistributor` of `SegmentId`.
How should we handle the case where a real timestamp is used as the business column value? Just use the original interval timestamp?
I'm not quite sure what you mean by "real timestamp as business column value" and "original interval timestamp"; what is the difference? Could you elaborate more?
CosId provides a unified interface IdGeneratorProvider, without the need to specify whether the concrete implementation algorithm is SnowflakeId, SegmentId or IdSegmentChain. That is, the user does not need to specify TYPE as any specific algorithm; it may be better to define it as COSID. The parameter id-name (Properties) is passed in to get the specific algorithm from IdGeneratorProvider.
`SNOWFLAKE` is the special one; lots of users know the algorithm. It is not a good idea to change the old configuration to a new type.
I'm not quite sure what you mean by "real timestamp as business column value" and "original interval timestamp"; what is the difference? Could you elaborate more?
For example, the user just wants to use the format `yyyy-MM-dd` to persist the data.
SNOWFLAKE is the special one; lots of users know the algorithm. It is not a good idea to change the old configuration to a new type.
Agree.
I'm not quite sure what you mean by "real timestamp as business column value" and "original interval timestamp"; what is the difference? Could you elaborate more?
For example, the user just wants to use the format yyyy-MM-dd to persist the data.
You mean the `java.sql.Date` type?
If so, `DateIntervalShardingAlgorithm` already supports it.
If you mean a string type, it is not supported yet, but I can handle it.
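The type discussion above can be sketched as a small normalizer (illustrative only, not CosId's actual implementation): a merged interval algorithm inspects the runtime type of the sharding value and converts each supported type to a `LocalDateTime`, which is also where string support could later be slotted in.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical sketch of "parse it by checking the shard value type":
// normalize the supported sharding-column types to LocalDateTime (UTC assumed).
public class ShardValueNormalizer {
    static LocalDateTime toLocalDateTime(Object shardValue) {
        if (shardValue instanceof LocalDateTime) {
            return (LocalDateTime) shardValue;
        }
        if (shardValue instanceof java.util.Date) { // also covers java.sql.Date / Timestamp
            long millis = ((java.util.Date) shardValue).getTime();
            return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
        }
        if (shardValue instanceof Long) { // epoch-millis column
            return LocalDateTime.ofInstant(Instant.ofEpochMilli((Long) shardValue), ZoneOffset.UTC);
        }
        throw new IllegalArgumentException(
                "Unsupported shard value type: " + shardValue.getClass().getName());
    }
}
```

A string branch (parsing with a configured `DateTimeFormatter`) would fit as one more `instanceof` case without touching the callers.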
Great, we have reached an agreement on everything.
We can discuss the coding design in a pull request soon.
Nice!!! Thank you very much for your patience and suggestions.
I will finish partial coding work and submit PR within this week.
@Ahoo-Wang we need to integrate ShardingSphere's cluster mode to get `instanceId` instead of `work-id`, and I have opened issue #14254 to create `instanceId` for the 3 modes of ShardingSphere; therefore, I will reopen this issue until we finish all the work.
@menghaoranss ok, that's right.