/rest-ftp-daemon

A pretty simple but configurable and efficient FTP-client daemon, driven through a RESTful API, used by France Télévisions in production

Primary LanguageJavaScriptMIT LicenseMIT

rest-ftp-daemon

Gem Version Code Climate Test Coverage Codeship

A pretty simple but configurable and efficient FTP-client daemon, driven through a RESTful API. Create transfer jobs by POSTing a simple JSON structure, be notified of their completion, watch their status on a dedicated dashboard.

Dashboard

Features

  • System and process features

    • environment-aware configuration in a YAML file
    • daemon process is tagged with its name and environment in process lists
    • global dashboard directly served within the daemon HTTP interface
    • support pooling of worker to dedicate workers to groups of jobs
  • File management ans transferts

    • allow authentication in FTP target in a standard URI-format
    • static path pointers in configuration to abstract local mounts or remote FTPs (endpoint tokens)
    • local source path and local/remote target path can use patterns to match multiple files (/dir/file*.jpg)
    • several file transfer protocols supported: FTPs, FTPes, sFTP
    • display bitrate to any pool or any FTP destination currently transferring (API and dashboard)
  • Job management

    • highly parrallel job processing using dedicated worker threads with their own context
    • jobs are taken into account as soon as they are submitted
    • each job carry its own attributes: build subdirectories (mkdir), overwrite target file, priority weight
    • dynamic evaluation of priorities, honoring any change on context until the job is picked
    • automatically clean-up jobs after a configurable amount of time (failed, finished)
  • Realtime status reporting

    • realtime transfer status reporting, with progress and errors
    • periodic update notifications sent along with transfer status and progress to an arbitrary URL (JSON resource POSTed)
    • metrics about pools, throughtput, and queues output to NewRelic

Project status and quick installation

Stability

Though it may need more robust tests, this gem has been used successfully in production for a while without any glitches at France Télévisions.

API Documentation

API documentation is self-hosted on /swagger.html

Expected features in a short-time range

  • Provide swagger-style API documentation
  • Authenticate API clients
  • Allow more transfer protocols (HTTP POST etc)
  • Expose JSON status of workers on GET /jobs/ for automated monitoring

Installation

With Ruby (version 2.3 or higher) and rubygems properly installed, you only need :

gem install rest-ftp-daemon

If that is not the case yet, see section Debian install preparation.

Subsystems

Conchita: jobs queues cleanup

Job queue can be set to automatically cleanup after a certain delay. Entries are removed from the queue when they have been idle (updated_at) for more than X seconds, and in any of the following statuses:

  • failed (conchita.clean_failed)
  • finished (conchita.clean_finished)
  • queued, (conchita.clean_queued)

Cleanup is done on a regular basis, every (conchita.timer) seconds.

Reporter: metrics collection

[TODO]

Usage and examples

Launching rest-ftp-daemon

You must provide a configuration file for the daemon to start, either explicitly using option --config or implicitly at /etc/rest-ftp-daemon.yml. A sample file is provided, issue --help to get more info.

You then simply start the daemon on its standard port, or on a specific port using -p

$ rest-ftp-daemon -p 3000 start

Check that the daemon is running and exposes a JSON status structure at http://localhost:3000/status.

The dashboard will provide an overview at http://localhost:3000/

If the daemon appears to exit quickly when launched, it may be caused by logfiles that can't be written (check files permissions or owner).

Launcher options :

Param Short Default Description
-p --port (automatic) Port to listen for API requests
-e production Environment name
--dev Equivalent to -e development
-d --daemonize false Wether to send the daemon to background
-f --foreground false Wether to keep the daemon running in the shell
-P --pid (automatic) Path of the file containing the PID
-u --user (none) User to run the daemon as
-g --group (none) Group of the user to run the daemon as
-h --help Show info about the current version and available options
-v --version Show the current version

Start a job to transfer a file named "file.iso" to a local FTP server

curl -H "Content-Type: application/json" -X POST -D /dev/stdout -d \
'{"source":"~/file.iso","target":"ftp://anonymous@localhost/incoming/dest2.iso"}' "http://localhost:3000/jobs"

Start a job using endpoint tokens

First define nas ans ftp1 in the configuration file :

defaults: &defaults

development:
  <<: *defaults

  endpoints:
    nas: "~/"
    ftp1: "ftp://anonymous@localhost/incoming/"

Those tokens will be expanded when the job is run:

curl -H "Content-Type: application/json" -X POST -D /dev/stdout -d \
'{"source":"~/file.dmg","priority":"3","target":"ftp://anonymous@localhost/incoming/dest4.dmg","notify":"http://requestb.in/1321axg1"}' "http://localhost:3000/jobs"

Start a job with a specific pool name

The daemon spawns groups of workers (worker pools) to work on groups of jobs (job pools). Any pool attribute not declared in configuration will land into the "default" pool.

curl -H "Content-Type: application/json" -X POST -D /dev/stdout -d \
'{"pool": "maxxxxx",source":"~/file.iso",target":"ftp://anonymous@localhost/incoming/dest2.iso"}' "http://localhost:3000/jobs"

This job will be handled by the "maxxxxx" workers only, or by the "default" worker is this pool is not declared.

Get info about a job with ID="q89j.1"

Both parameters q89j.1 and 1 will be accepted as ID in the API. Requests below are equivalent:

GET http://localhost:3000/jobs/q89j.1
GET http://localhost:3000/jobs/1

Configuration

Most of the configuration options live in a YAML configuration file, containing two main sections:

  • defaults section should be left as-is and will be used is no other environment-specific value is provided.
  • production section can receive personalized settings according to your environment-specific setup and paths.

Configuration priority is defined as follows (from most important to last resort):

  • command-line parameters
  • config file defaults section
  • config file environment section
  • application internal defaults

As a starting point, rest-ftp-daemon.yml.sample is an example config file that can be copied into the expected location /etc/rest-ftp-daemon.yml.

Default administrator credentials are admin/admin. Please change the password in this configuration file before starting any kind of production.

Here is the contents of the default configuration (oeverride by passing -c local.yml at startup)

daemonize: true
port: 3000
user: rftpd
# group: rftpd
# host: "myhost"
allow_reload: false

pools:                      # number of workers decidated to each pool value
  default: 2
  urgent: 1

reporter:                   # the subsytem in charge of reporting metrics, mainly to NewRelic
  debug: false
  timer: 10                 # report every X seconds

conchita:
  debug: false
  timer: 60                 # do the cleaning up every X seconds
  garbage_collector: true   # force a garbage collector cleanup when cleaning things up
  clean_failed: 3600        # after X seconds, clean jobs with status="failed"
  clean_finished: 3600      # //              //              //       finished
  clean_queued: 86400       # //              //              //       queued

transfer:
  debug: false
  mkdir: true               # build directory tree if missing
  tempfile: true            # transfer to temporary file, rename after sucessful transfer
  overwrite: false          # overwrite any target file with the same name
  timeout: 1800             # jobs running for longer than X seconds will be killed
  notify_after: 5           # wait at least X seconds between HTTP notifications

  debug_ftp: false
  debug_ftps: false
  debug_sftp: false

  retry_on:                 # job error values that will allow a retry
    - ftp_perm_error
    - net_temp_error
    - conn_reset_by_peer
    - conn_timed_out
    - conn_refused
    - sftp_auth_failed
    - conn_host_is_down
    - conn_unreachable
    - conn_failed
    - conn_openssl_error
  retry_max: 5              # maximum number of retries before giving up on that job
  retry_for: 1800           # maximum time window to retry failed jobs
  retry_after: 10           # delay to wait before tries

newrelic:
  debug: false
  # license: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  # app_name: "rftpd-bigbusiness-dev"     # app_name used for naming app (used as-is if provided)
  prefix: "rftpd"                         # app prefix to build app_name
  # platform: "bigbusiness"               # app platform to build app_name

logs:
  path:     "/var/log/"
  thin:     "rftpd-environment-thin.log"
  newrelic: "rftpd-environment-newrelic.log"
  queue:    "rftpd-environment-core.log"
  api:      "rftpd-environment-core.log"
  workers:  "rftpd-environment-core.log"
  transfer: "rftpd-environment-workers.log"
  conchita: "rftpd-environment-workers.log"
  reporter: "rftpd-environment-workers.log"
  notify:   "rftpd-environment-workers.log"

TODO for this document

  • Document /status
  • Document /routes
  • Document mkdir and overwrite options
  • Document stats

Debian install preparation

This project is available as a rubygem, requires Ruby 2.3.0 and RubyGems installed.

Using rbenv and ruby-build

You may use rbenv and ruby-build to get the right Ruby version. If this is your case, ensure that ruby-build definitions are up-to-date and include the right Ruby version. You may have to install some extra packages for the compilations to complete.

# apt-get install libffi-dev zlib1g-dev bison libreadline-dev
# git clone https://github.com/rbenv/rbenv.git ~/.rbenv
# git clone https://github.com/rbenv/ruby-build.git ~/.rbenv/plugins/ruby-build
# echo 'export PATH="$HOME/.rbenv/bin:$PATH"' >> ~/.bashrc
# echo 'eval "$(rbenv init -)"' >> ~/.bashrc
# rbenv install --list | grep '2.3'
# curl -fsSL https://gist.github.com/mislav/055441129184a1512bb5.txt | rbenv install --patch 2.2.3

Otherwise, you way have to update ruby-build to include Ruby 2.3 definitions. On Debian, 2.3 is not included in Wheezy and appears in Jessie's version of the package.

Dedicated user

Use a dedicated user for the daemon, switch to this user and enable rbenv

# adduser --disabled-password --gecos "" rftpd
# su rftpd -l

Ruby version

Install the right ruby version and activate it

# rbenv install 2.3.0
# rbenv local 2.3.0
# rbenv rehash

Daemon installation

Update RubyGems and install the gem from rubygems.org

# gem update --system
# gem install rest-ftp-daemon --no-ri --no-rdoc
# rbenv rehash
# rest-ftp-daemon start

Known bugs

  • As this project is based on the Psyck YAML parser, configuration merge from "defaults" section and environment-specific section are broken. A sub-tree defined for a specific environment, will overwrite the corresponding subtree from "defaults". Please repeat whole sections from "defaults".

  • As this project is based on Chamber, and it considers hyphens in filename as namespaces, the global /etc/rest-ftp-daemon.yml config file is not parsed (and thus, ignored). Until this is worked around, please specify a config filename on the commandline.

  • If you get fatal error: 'openssl/ssl.h' file not found when installing eventmachine on OSX El Capitan, you can try with:

gem install eventmachine -v '1.0.8' -- --with-cppflags=-I/usr/local/opt/openssl/include
bundle install
  • If you get uncommon.mk:189: recipe for target 'build-ext' failed on Debian, you can try with:
curl -fsSL https://gist.github.com/mislav/055441129184a1512bb5.txt | rbenv install --patch 2.2.3

Contributing

Contributions are more than welcome, be it for documentation, features, tests, refactoring, you name it. If you are unsure of where to start, the Code Climate report will provide you with improvement directions. And of course, if in doubt, do not hesitate to open an issue. (Please note that this project has adopted a code of conduct.)

If you want your contribution to adopted in the smoothest and fastest way, don't forget to:

  • provide sufficient documentation in you commit and pull request
  • add proper testing (we know full grown solid test coverage is still lacking and need to up the game)
  • use the RuboCop guidelines provided (there are all sorts of editor integration plugins available)

So,

  1. Fork the project
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Code
  • add proper tests if adding a feature
  • run the tests using rake
  • check for RuboCop style guide violations
  1. Commit your changes

  2. Push to the branch (git push origin my-new-feature)

  3. Create new Pull Request

About

Thanks to https://github.com/berkshelf/berkshelf-api for parts and ideas used in this project

This project has been initiated and originally written by Bruno MEDICI Consultant (http://bmconseil.com/)