OSArchiver: OpenStack databases archiver

OSArchiver is a Python package that archives and removes soft-deleted data from OpenStack databases. The package ships with a main script called osarchiver that reads a configuration file and runs the archivers.

Philosophy

  • OSArchiver doesn't have any knowledge of OpenStack business objects
  • OSArchiver relies purely on the common way OpenStack marks data as deleted: setting the 'deleted_at' column to a datetime. A row is therefore archivable/removable if its 'deleted_at' column is not NULL
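
The selection rule above can be illustrated with a small query. This is a minimal sketch, not OSArchiver's own code: it uses Python's built-in sqlite3 module as a stand-in for MySQL/MariaDB, and the `instances` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for an OpenStack MySQL/MariaDB database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO instances (id, deleted_at) VALUES (?, ?)",
    [(1, None), (2, "2020-01-15 10:00:00"), (3, None), (4, "2021-06-01 08:30:00")],
)

# A row is archivable/removable when its 'deleted_at' column is not NULL.
archivable_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM instances WHERE deleted_at IS NOT NULL ORDER BY id"
    )
]
print(archivable_ids)
```

Only the soft-deleted rows are selected; rows whose 'deleted_at' is still NULL are never candidates for archiving or deletion.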

Limitations

  • Supports only MySQL/MariaDB as the database backend.
  • Requires Python >= 3.5

Design

OSArchiver reads an INI configuration file in which you can define:

  • archivers: a section that holds one source and an optional list of destinations
  • sources: a section that defines where the data should be read from (basically the OpenStack DB)
  • destinations: a section that defines where the data should be archived

How it works:

                                   .----------.
        .--------------------------| Archiver |-----------------------------.
        |                          '----------'                             |
        |                                                                   |
        |                                                                   |
        |                                                                   |
        v                        _______________                            v
   .--------.                    \              \                   .--------------.
   | Source |-------------------->) ARCHIVE DATA )----------------->| Destinations |
   '--------'                    /______________/                   '--------------'
        |                                |                                  |
        |                                |                                  |
        |                                |                                  |
        |                                |                                  |
        |                                |                                  |
        |                                v                                  |
        |                  .--------------------------.                     |
        v                 ( No error and delete_data=1 )                    |
                           '--------------------------'                     |
    _.-----._                            |                    _.-----._     |
  .-         -.                          |                  .-         -.   |   ___   
  |-_       _-|                          |                  |-_       _-|   |  |   |\ 
  |  ~-----~  |                          |                  |  ~-----~  |<--'->|   ' ___   
  |           |                          |                  |           |      | SQL|   |\ 
  `._       _.'                          |                  `._       _.'      |____|   '-|---.
     "-----"                             |                     "-----"              | CSV |   |
  OpenStack DB                           v                  Archiving DB            |_____|   |
        ^                        _______________                                              v
        |                        \              \                                 .-----------------------.
        '-------------------------) DELETE DATA  )                               ( remote_store configured )
                                 /______________/                                 '-----------------------'
                                                                                              |
                                                                                              v
                                                                                         __________ 
                                                                                        [_|||||||_°]
                                                                                        [_|||||||_°]
                                                                                        [_|||||||_°]

                                                                                 Remote Storage (Swift, ...)

Installation

git clone https://github.com/ovh/osarchiver.git
cd osarchiver
pip install -r requirements.txt
pip install .

osarchiver script

# osarchiver --help
usage: osarchiver [-h] --config CONFIG [--log-file LOG_FILE]
                  [--log-level {info,warn,error,debug}] [--debug] [--dry-run]

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG       Configuration file to read
  --log-file LOG_FILE   Append log to the specified file
  --log-level {info,warn,error,debug}
                        Set log level
  --debug               Enable debug mode
  --dry-run             Display what would be done without really deleting or
                        writing data

Configuration

The configuration is an INI file containing several sections. You configure your different archivers in this configuration file. An example is available at the root of the repository.

DEFAULT section:

  • Description: default section that defines default/fallback values for options
  • Format [DEFAULT]
  • configuration parameters: any parameter of the archiver, source, destination and backend sections can be added to this section; its value is used as the fallback when the parameter is not set in a section.
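
The fallback behaviour described above can be sketched with Python's standard configparser module, whose [DEFAULT] section has the same semantics. Whether OSArchiver parses its file exactly this way is an assumption, and the option values below are illustrative only.

```python
import configparser

# Hypothetical configuration: the values are illustrative only.
CONFIG_TEXT = """
[DEFAULT]
retention: 12 MONTH
delete_limit: 500

[src:os_prod]
backend: db
retention: 6 MONTH

[dst:db]
backend: db
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# The source section overrides 'retention'; 'delete_limit' falls back to DEFAULT.
src_retention = parser.get("src:os_prod", "retention")
src_delete_limit = parser.getint("src:os_prod", "delete_limit")

# The destination section sets neither option, so both come from DEFAULT.
dst_retention = parser.get("dst:db", "retention")

print(src_retention, src_delete_limit, dst_retention)
```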

Archiver section:

  • Description: defines where to read the data from and where to archive and/or delete it.
  • Format [archiver:name]
  • configuration parameters:
    • src: name of the src section
    • dst: comma separated list of destination section names
    • enable: 1 or 0, if set to 0 the archiver is ignored and not run

Example:

[archiver:My_Archiver]
src: os_prod
dst: file, db

[src:os_prod]
...

[dst:file]
...

[dst:db]
....
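
As an illustration of how the sections in this example reference each other, the sketch below resolves an archiver to its source and destination sections with Python's configparser. The `resolve_archiver` helper is hypothetical (it is not part of OSArchiver), and the `backend` lines stand in for the options elided with "..." above.

```python
import configparser

# Same layout as the example above; 'backend' lines replace the elided options.
CONFIG_TEXT = """
[archiver:My_Archiver]
src: os_prod
dst: file, db

[src:os_prod]
backend: db

[dst:file]
backend: file

[dst:db]
backend: db
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)


def resolve_archiver(parser, name):
    """Return (source section, list of destination sections) for one archiver."""
    section = parser["archiver:" + name]
    src_section = "src:" + section["src"].strip()
    # 'dst' is optional and holds a comma separated list of destination names.
    raw_dst = section.get("dst", fallback="")
    dst_sections = ["dst:" + d.strip() for d in raw_dst.split(",") if d.strip()]
    # Every referenced section must exist in the configuration file.
    for wanted in [src_section] + dst_sections:
        if not parser.has_section(wanted):
            raise KeyError("missing section [%s]" % wanted)
    return src_section, dst_sections


src, dsts = resolve_archiver(parser, "My_Archiver")
print(src, dsts)
```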

Source section:

  • Description: defines where the OpenStack databases are. Only one backend (db) is supported for now, but it can easily be extended
  • Format [src:name]
  • configuration parameters:
    • backend: the name of backend to use, only db is supported
    • retention: how long data is kept in the source database, in SQL interval format (e.g. 12 MONTH)
    • archive_data: 0 or 1. If set to 1, a destination is expected and the data is archived; otherwise the archiving step is skipped and only the delete step runs.
    • delete_data: 0 or 1. If set to 1, the delete step is run. If the archive step fails, the delete step is not run, to prevent data loss.
    • backend specific options
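
The interaction between archive_data and delete_data can be summarised as a small control-flow sketch. This is not OSArchiver's implementation: `run_archiver` and the step callables are hypothetical names used only to show the ordering and the "no delete after a failed archive" rule.

```python
def run_archiver(archive_data, delete_data, archive_step, delete_step):
    """Run the archive step, then the delete step; return the steps executed."""
    executed = []
    if archive_data:
        try:
            archive_step()
        except Exception:
            # Archiving failed: skip the delete step so that no data is lost.
            return executed
        executed.append("archive")
    if delete_data:
        delete_step()
        executed.append("delete")
    return executed


def noop():
    """Stand-in for a step that succeeds."""


def failing_archive():
    """Stand-in for an archive step that fails."""
    raise RuntimeError("destination unreachable")


print(run_archiver(1, 1, noop, noop))
print(run_archiver(1, 1, failing_archive, noop))
print(run_archiver(0, 1, noop, noop))
```

With archive_data=0 the archive step is never attempted, so the delete step runs on its own.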

Destination section:

  • Description: defines where the data should be written. Two backends are supported for now (db for database and file [csv, sql]), and more may be added
  • Format [dst:name]
  • configuration parameters:
    • backend: the name of backend to use, db or file
    • backend specific options

Backends options:

db

  • Description: the database (MySQL/MariaDB) backend
  • options:
    • host: DB host to connect to
    • port: port the MariaDB server is listening on
    • user: login used to connect to the MariaDB server
    • password: password of the user
    • delete_limit: apply a LIMIT to the DELETE statement
    • select_limit: apply a LIMIT to the SELECT statement
    • bulk_insert: data is inserted into the DB every bulk_insert rows
    • deleted_column: name of the column that holds the soft-delete date. It is also used to filter the tables to archive: a table must have this column to be archived
    • where: the literal SQL WHERE clause applied to the SELECT statement. Ex: where=${deleted_column} <= SUBDATE(NOW(), INTERVAL ${retention})
    • foreign_key_check: true or false. If set to false, foreign key checks are disabled (default: true)
    • retention: how long data is kept in the database (SQL format: 12 MONTH, 1 DAY, etc.)
    • excluded_databases: comma, carriage return or semicolon separated list of regexps of databases to exclude when specifying '*' as database. The following databases are always ignored: 'mysql', 'performance_schema', 'information_schema'
    • excluded_tables: comma, carriage return or semicolon separated list of regexps of tables to exclude when specifying '*' as table. Ex: shadow_.*,.*_archived
    • db_suffix: an optional suffix to apply to the archiving DB. The default suffix '_archive' is applied if you archive on the same host as the source without setting a db_suffix or table_suffix (this avoids reading and writing on the same db.table)
    • table_suffix: apply a suffix to the archiving table if specified
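
Two of these options are easier to grasp with an example: how the ${...} placeholders in where expand, and how excluded_tables filters table names. The sketch below assumes the placeholders behave like Python's configparser.ExtendedInterpolation and that each regexp must match the whole table name; both are assumptions, and the table names are hypothetical.

```python
import configparser
import re

CONFIG_TEXT = """
[src:os_prod]
backend: db
deleted_column: deleted_at
retention: 12 MONTH
where: ${deleted_column} <= SUBDATE(NOW(), INTERVAL ${retention})
excluded_tables: shadow_.*,.*_archived
"""

# ExtendedInterpolation expands ${option} references within the same section.
parser = configparser.ConfigParser(
    interpolation=configparser.ExtendedInterpolation()
)
parser.read_string(CONFIG_TEXT)
section = parser["src:os_prod"]

where_clause = section["where"]

# Split on comma, semicolon or line break, then compile each regexp.
patterns = [
    re.compile(p.strip())
    for p in re.split(r"[,;\n]", section["excluded_tables"])
    if p.strip()
]


def is_excluded(table):
    """True if the table name fully matches one of the exclusion regexps."""
    return any(p.fullmatch(table) for p in patterns)


# Hypothetical table names.
tables = ["instances", "shadow_instances", "volumes_archived", "volumes"]
kept_tables = [t for t in tables if not is_excluded(t)]

print(where_clause)
print(kept_tables)
```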

file

  • Description: the file archiving destination type. It writes the archived data to files in one or several formats (supported: SQL, CSV)
  • options:
    • directory: the directory path where the data is archived. You may use the {date} keyword to automatically append the date to the directory path (e.g. /backup/archive_{date})
    • formats: a comma, semicolon or carriage return separated list that defines the formats in which to archive the data (csv, sql)
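
To make the {date} keyword concrete, here is a minimal sketch of a CSV writer that expands it in the directory path. The `archive_to_csv` helper, the date format, the one-file-per-table naming and the sample row are all assumptions made for illustration; OSArchiver's actual file layout may differ.

```python
import csv
import os
import tempfile
from datetime import datetime


def archive_to_csv(directory_template, table, columns, rows, now=None):
    """Write rows to <directory>/<table>.csv and return the file path."""
    now = now or datetime.now()
    # Replace the {date} keyword; the date format used here is an assumption.
    directory = directory_template.replace("{date}", now.strftime("%Y-%m-%d"))
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, table + ".csv")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)  # header line
        writer.writerows(rows)    # one line per archived row
    return path


# A temporary directory stands in for a path such as /backup.
base = tempfile.mkdtemp()
csv_path = archive_to_csv(
    os.path.join(base, "archive_{date}"),
    "instances",
    ["id", "deleted_at"],
    [(2, "2020-01-15 10:00:00")],
    now=datetime(2021, 6, 1),
)
print(csv_path)
```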

You've developed a cool new feature? Fixed an annoying bug? We'd be happy to hear from you!

Have a look in CONTRIBUTING.md

License

See https://github.com/ovh/osarchiver/blob/master/LICENSE