zzt93/syncer

TODO list

zzt93 opened this issue · 3 comments

zzt93 commented

TODO

  • Cold-start (ETL) opt

    • hold buffer full handle
    • hold change after now
  • Test Framework more

    • join
    • performance test
  • Add /stat, /input endpoint for syncer

  • Timezone config

  • Convert MySQL integer as byte array

    • add auto conversion with meta info fetched: unsigned int: long, unsigned long: byte array
  • Dependency modules: do not package & load what is not used (e.g. mongo sync).

    • SLF4j
    • Nginx
  • IncludeBefore & IncludeUpdated config?

  • rpm and dpkg

  • Sync check: query input & output for comparing

    • Implement by a special SyncData?
    • Should have an HTTP endpoint to invoke it.
  • Warn if multiple schema.table entries have different rows

  • A better serializer than JSON, which loses the type info: PB? Avro?

  • Test redis output & nested sql

  • Config thread for each consumer

  • Support set parent of ES

  • Row image format support?

    • Add must-appear field restriction -- currently only the primary key
    • Opt: keep only changed fields in update events & the primary key in delete events -- include must-appear fields
  • Batch module & failure module are coupled with channel module

    • Filter chain?
    • Failure module as last channel?
  • Is MDC.put of eventId necessary?
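
For the integer conversion item above (unsigned int: long, unsigned long: byte array), a minimal sketch of what the conversion could look like. It assumes the raw column value arrives as a signed Java `int`/`long`; the class and method names are hypothetical and not part of syncer.

```java
import java.math.BigInteger;

public class UnsignedConversion {

  // Hypothetical helper: reinterpret a signed 32-bit value as MySQL INT UNSIGNED.
  static long unsignedIntToLong(int raw) {
    return Integer.toUnsignedLong(raw);
  }

  // Hypothetical helper: reinterpret a signed 64-bit value as MySQL BIGINT UNSIGNED
  // and return it as a big-endian two's-complement byte array.
  static byte[] unsignedLongToBytes(long raw) {
    return new BigInteger(Long.toUnsignedString(raw)).toByteArray();
  }

  public static void main(String[] args) {
    System.out.println(unsignedIntToLong(-1));                    // prints 4294967295
    System.out.println(new BigInteger(unsignedLongToBytes(-1L))); // prints 18446744073709551615
  }
}
```

The byte array keeps the full unsigned range, which a Java `long` cannot represent; whether a column is unsigned would still have to come from the fetched meta info.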

zzt93 commented

Done:

  • Output need customization of Spring EL -- Remove spring EL
  • Mysql input field check
  • Add new cold start: batch select (order by id) & batch insert
  • Shorten id:
    • change serverId to port/clientId?
      • serverId: not for unique purpose, but for debug -- removed, to save memory
    • variable integer encoding for position: xxx/123456/gap/xxx
    • shorten offset
  • Support set start binlog file name & position in config file (make it easier to rebuild)
  • Refactor clone & dup semantic -- change to create
  • Reduce memory footprint of StandardEvaluationContext (20% memory reduction)
  • Add file as data source: to read binlog file
  • Update failure log format: not escape json string
  • Order problem: make same id to same thread; strict mode: retry error item and all left; retry only error item
  • Output channel reconnection logic: MySQL & ES
  • Adjust logging level dynamically
  • Add health check endpoint
  • upsert for es output channel if 404
  • Add shutdown hook to do clean up: stop sending data to output target, avoid dup key exception
  • Update to Spring Boot 2.0 for better yaml prompt when config
  • Skip already-synced items on startup
  • Add kafka output channel
    • kafka msg consumer has to handle event idempotently;
    • send event using primary key as key
    • deploy SyncData SyncUtil as separate jar to maven central
  • Refactor config naming:
      input:
        masters:
          - connection:
              address: ${HOST_ADDRESS}
              port: 27018
            type: Mongo
            repos:
              - name: "chat"
                entities:
                - name: messages
                  fields: [time, content]
  • Package refactor:
    • For syncer-data deploy
    • Refactor config package
  • Add kafka version compatibility in readme.
  • Reduce useless dependency: remove spring boot
  • Refactor filter module design flaw & add nested if and/or enhance switcher
  • Use javassist/cglib/byte buddy JavaCompiler to generate code dynamically rather than spring el
  • Support config key like lower-hyphen
  • Binlog checksum type auto detection
  • kafka MESSAGE TOO LARGE
  • Share same table definition for multiple remote
  • Test framework
  • Refactor SyncData: update event should have before & fields data:
    • add updated() & updated(String name) method for use
    • add before to get before data
  • Test framework: add update/delete test
  • Update README config example: remove and link to test config dir.
  • Test framework: mongo
  • Check whether the registered MongoDB db/collection exists
  • Batch buffer bug
  • Opt logging: Ack log, MasterConnector
  • Connect to latest binlog flag (cold start usage)
    • de-register cold-start consumer?
    • or use same consumer, different filter?
  • Add consumerId in log
    • or report thread-consumer relation in http port
    • or change thread name to syncer-consumerId-filter-1
  • ConsumerId syntax check: not support -
  • FileBasedMap record last removed position if map is empty
  • Change from tailing oplog to using the change stream api: check mongo version on startup
  • ES output channel support nested obj
  • Alter table auto re-sync mysql column index so no need to restart
  • ES client upgrade (5.x, 7.x, not all features, 6.x all features) -- rest client & basic auth to replace xpack & low level rest client
  • Test framework:
    • Mongo update/delete
  • Change filter module to single thread, add partition key support in syncData which will be used in output module (multiple thread)
  • Order problem when id is changed: add scheduler key
    • Joining like this will inevitably cause data inconsistency because of the at-least-once semantics, so not doing it.
    • ES can make it by nested obj
    • Kafka need this
  • Filter module: do not shut down but use failure log -- pressure test continues
    • Degradation & Bound queue size: change to fixed sized queue
  • Column filter: _all
  • Cold start
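
The ordering items above (same id to same thread, partition key used in the multi-threaded output module) can be illustrated with a minimal sketch. It assumes one single-threaded executor per output worker and routes by the hash of the partition key; `KeyedDispatcher` and its methods are hypothetical names, not syncer's actual classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class KeyedDispatcher {

  // One single-threaded executor per worker, so tasks on a worker run in submission order.
  private final List<ExecutorService> workers = new ArrayList<>();

  public KeyedDispatcher(int workerCount) {
    for (int i = 0; i < workerCount; i++) {
      workers.add(Executors.newSingleThreadExecutor());
    }
  }

  // Same key always maps to the same index; floorMod keeps the result non-negative.
  static int workerIndex(Object partitionKey, int workerCount) {
    return Math.floorMod(partitionKey.hashCode(), workerCount);
  }

  // Events sharing a partition key are queued on the same worker.
  public Future<?> submit(Object partitionKey, Runnable task) {
    return workers.get(workerIndex(partitionKey, workers.size())).submit(task);
  }

  public void shutdown() {
    for (ExecutorService worker : workers) {
      worker.shutdown();
    }
  }
}
```

Order is preserved only per key; events with different keys may still be written in a different order than they were read.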
zzt93 commented

Testing & Implementing

  • Update position even when the consumer is not interested in the event

  • Share storage in k8sMode

    • Sync meta info to a ZK-like store
    • k8sMode needs an instanceId to differentiate instances
      • storage path /instanceId/syncer/xx
    • config file?
  • Kafka output: timestamp to long;

  • Mysql output: auto add id;

  • #8 [Test Pending] MySQL upsert support: for join table order problem -- ref

  • [Impl Pending] Update sync meta position when the consumer is not interested in the event?

    • Implement by a simple position flusher typed event?
    • emit when trying to shutdown?
    • emit after num not-interested events have happened
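
One possible shape for the pending position-flush item above, as a rough sketch only: count the events the consumer skipped and hand back a position to flush after every `flushEvery` skips, plus once on shutdown. `SkippedPositionTracker`, `flushEvery`, and the use of a plain `String` for the binlog position are all assumptions for illustration.

```java
import java.util.Optional;

public class SkippedPositionTracker {

  private final int flushEvery;   // hypothetical threshold: flush after this many skipped events
  private int skippedSinceFlush;
  private String latestPosition;

  public SkippedPositionTracker(int flushEvery) {
    if (flushEvery <= 0) {
      throw new IllegalArgumentException("flushEvery must be positive");
    }
    this.flushEvery = flushEvery;
  }

  // Call for each event no consumer is interested in; returns a position when a flush is due.
  public Optional<String> onSkipped(String position) {
    latestPosition = position;
    skippedSinceFlush++;
    if (skippedSinceFlush >= flushEvery) {
      skippedSinceFlush = 0;
      return Optional.of(position);
    }
    return Optional.empty();
  }

  // Call when shutting down; returns the last skipped position if it was not flushed yet.
  public Optional<String> onShutdown() {
    if (skippedSinceFlush == 0) {
      return Optional.empty();
    }
    skippedSinceFlush = 0;
    return Optional.ofNullable(latestPosition);
  }
}
```

The sketch is not thread-safe; it assumes a single thread reads events, and the caller decides how the returned position is turned into a flusher-typed event.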
zzt93 commented

Not Do

  • Schema mis-match problem -- fix by new cold-start method -- ETL
    • Write schema of all tables to local file, then parse all DDL to update it.
    • Start to load schema from files
    • Cold start
      • connect to latest binlog (can't resolve mis-match in this situation)
  • Netty as http client (idempotence is hard to achieve)
  • Support rpc output channel (idempotence is hard to achieve)
  • Support websocket for long lived connection (idempotence is hard to achieve)
  • Join by query extra data source in output?
  • Make output module non-blocking with callback, so reduce filter-output thread?
    • May cause disorder of event -- make it as config option: non-block-mode