A RESTBase queuing module for Apache Kafka
The purpose of the change propagation service is to execute actions based on events. The service listens to Kafka topics and executes handlers for events according to configurable rules. Currently, a rule can issue HTTP requests, produce new messages, or make an HTCP purge request. The list of supported actions is easy to extend: create a new module with internal HTTP endpoints and call those endpoints from the rules.
- Config-based rules for message processing. For more information about rule configuration, see the [Rule configuration](#rule-configuration) section.
- Automatic limited retries
- Global rule execution concurrency limiting
- Metrics and logging support
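
Global concurrency limiting caps how many rule executions run at the same time and queues the rest. The TypeScript sketch below only illustrates that idea and is not the service's implementation. It assumes each task reports completion through a `done` callback, and the names `ConcurrencyLimiter` and `Task` are hypothetical.

```typescript
// Hypothetical sketch of a global concurrency cap; not the service's real limiter.
// Assumption: a task signals completion by calling the `done` callback it receives.
type Task = (done: () => void) => void;

class ConcurrencyLimiter {
  private active = 0;
  private readonly queue: Task[] = [];
  private readonly limit: number;

  constructor(limit: number) {
    if (limit < 1) throw new Error("limit must be at least 1");
    this.limit = limit;
  }

  // Queue a task; it starts immediately if a slot is free.
  run(task: Task): void {
    this.queue.push(task);
    this.drain();
  }

  runningCount(): number {
    return this.active;
  }

  pendingCount(): number {
    return this.queue.length;
  }

  // Start queued tasks until the limit is reached or the queue is empty.
  private drain(): void {
    while (this.active < this.limit && this.queue.length > 0) {
      const task = this.queue.shift() as Task;
      this.active++;
      let finished = false;
      task(() => {
        if (finished) return; // ignore duplicate completion signals
        finished = true;
        this.active--;
        this.drain(); // a slot was freed, so start the next queued task
      });
    }
  }
}
```

With a limit of 2, a third task waits in the queue until one of the first two calls `done`.
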

A `Rule` is a semantically meaningful piece of service functionality. For example,
'Rerender RESTBase if the page was changed' and 'Update summary if RESTBase render was changed'
are both rules. To specify rules, add a property to the `template` property of the `kafka`
module config.
Each rule is executed by a single worker, but an internal load-balancing mechanism tries to distribute
rules evenly across workers.
A rule can contain the following properties:
- `topic`: the name of the topic to subscribe to.
- `match`: an optional predicate for a message. The rule is executed only if all of the `match` properties are satisfied by the message. Properties can be nested objects, constants or regexes. A regex can contain capture groups, and the captured values are later accessible in the `exec` part of the rule. Capture groups can be named using the `(?<name>group)` syntax; the captured value is then accessible under `match.property_name.capture_name` within the `exec` part. Named and unnamed captures cannot be mixed together.
- `match_not`: an optional predicate that must not match for the rule to be executed. It doesn't capture values and doesn't make them accessible to the `exec` part of the rule. `match_not` may be an array with logical OR semantics: if any of the array items matches, `match_not` matches.
- `exec`: an array of HTTP request templates that are executed sequentially if the rule matched. The templates follow the request templating syntax and are evaluated with a `context` that has a global `message` property containing the original message and a `match` property with the values extracted by `match`.
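
To make the `match` and `match_not` semantics concrete, here is a minimal TypeScript sketch of how such predicates could be evaluated. It is not the service's code. Several details are assumptions: regexes are written as `'/body/flags'` strings, as in the `purge_varnish` example rule; an array pattern requires every pattern item to match some item of the message array; a constant "captures" the matched value. The names `matchValue` and `evaluateRule` are hypothetical.

```typescript
// Illustrative sketch only; NOT the change propagation service's implementation.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type Result = { ok: true; captures: unknown } | { ok: false };

const FAIL: Result = { ok: false };

// Assumption: any string of the form '/body/flags' is treated as a regex.
function toRegExp(pattern: string): RegExp | null {
  const literal = /^\/(.+)\/([gimsuy]*)$/.exec(pattern);
  return literal === null ? null : new RegExp(literal[1], literal[2]);
}

function isObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Checks `value` against `pattern` and collects captured values on success.
function matchValue(pattern: Json, value: Json | undefined): Result {
  if (typeof pattern === "string") {
    const re = toRegExp(pattern);
    if (re === null) return value === pattern ? { ok: true, captures: value } : FAIL;
    if (typeof value !== "string") return FAIL;
    const m = re.exec(value);
    if (m === null) return FAIL;
    const groups = (m as RegExpExecArray & { groups?: { [name: string]: string } }).groups;
    // Named groups are exposed by name, unnamed groups positionally.
    return { ok: true, captures: groups !== undefined ? groups : m.slice(1) };
  }
  if (Array.isArray(pattern)) {
    // Assumption: each pattern item must match at least one message item.
    if (!Array.isArray(value)) return FAIL;
    const items: Json[] = value;
    const ok = pattern.every((p) => items.some((v) => matchValue(p, v).ok));
    return ok ? { ok: true, captures: items } : FAIL;
  }
  if (isObject(pattern)) {
    // Only the keys present in the pattern are checked; extra message keys are ignored.
    if (!isObject(value)) return FAIL;
    const captures: { [key: string]: unknown } = {};
    for (const key of Object.keys(pattern)) {
      const r = matchValue(pattern[key], value[key]);
      if (!r.ok) return FAIL;
      captures[key] = r.captures;
    }
    return { ok: true, captures };
  }
  return value === pattern ? { ok: true, captures: value } : FAIL;
}

interface Rule {
  match?: { [key: string]: Json };
  match_not?: { [key: string]: Json } | { [key: string]: Json }[];
}

// On success, `captures` is what the `exec` templates would see as `match`.
function evaluateRule(rule: Rule, message: Json): Result {
  const matched: Result =
    rule.match === undefined ? { ok: true, captures: {} } : matchValue(rule.match, message);
  if (!matched.ok) return FAIL;
  const exclusions: { [key: string]: Json }[] =
    rule.match_not === undefined ? [] : Array.isArray(rule.match_not) ? rule.match_not : [rule.match_not];
  // `match_not` arrays have OR semantics: any matching item blocks the rule.
  return exclusions.some((p) => matchValue(p, message).ok) ? FAIL : matched;
}
```

Under these assumptions, a message whose `meta.uri` is `https://en.wikipedia.org/api/rest_v1/page/html/Foo` satisfies the `purge_varnish` predicate and yields `match.meta.uri.rest === "page/html/Foo"`.
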

Here's an example of a rule that matches all `resource_change` messages emitted by RESTBase
and purges Varnish caches for the affected resources by issuing an HTTP request to a special internal module, which
converts it into an HTCP purge and makes the HTCP request:

```yaml
purge_varnish:
  topic: resource_change
  match:
    meta:
      uri: '/^https?:\/\/[^\/]+\/api\/rest_v1\/(?<rest>.+)$/'
    tags:
      - restbase
  exec:
    method: post
    uri: '/sys/purge/'
    body:
      - meta:
          uri: '//{{message.meta.domain}}/api/rest_v1/{{match.meta.uri.rest}}'
```
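
To see how the `exec` template in this rule resolves, here is a small TypeScript worked example. It does not use the service's templating engine. `expand` and `lookup` are hypothetical helpers that only substitute `{{dotted.path}}` placeholders, and the `resource_change` message is made up. The `match` context is built by hand from the same URI regex, with an unnamed group standing in for `(?<rest>.+)`.

```typescript
// Simplified stand-in for request templating: only `{{dotted.path}}` substitution
// inside a string. The real templating syntax supports more; this is an assumption.
type Ctx = { [key: string]: unknown };

// Walk a dotted path such as "message.meta.domain" through nested objects.
function lookup(ctx: Ctx, path: string): unknown {
  return path
    .split(".")
    .reduce<unknown>((obj, key) => (obj !== null && typeof obj === "object" ? (obj as Ctx)[key] : undefined), ctx);
}

// Replace every {{path}} placeholder; fail loudly if a path cannot be resolved.
function expand(template: string, ctx: Ctx): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_whole: string, path: string) => {
    const value = lookup(ctx, path);
    if (value === undefined) throw new Error("template path not found: " + path);
    return String(value);
  });
}

// Worked example for purge_varnish with a made-up resource_change message.
const sampleMessage = {
  meta: { uri: "https://en.wikipedia.org/api/rest_v1/page/html/Foo", domain: "en.wikipedia.org" },
  tags: ["restbase"],
};
// Same pattern as the rule; the unnamed group plays the role of (?<rest>.+).
const uriMatch = /^https?:\/\/[^\/]+\/api\/rest_v1\/(.+)$/.exec(sampleMessage.meta.uri);
if (uriMatch === null) throw new Error("message does not match the rule");
const templateCtx: Ctx = { message: sampleMessage, match: { meta: { uri: { rest: uriMatch[1] } } } };
const purgeUri = expand("//{{message.meta.domain}}/api/rest_v1/{{match.meta.uri.rest}}", templateCtx);
console.log(purgeUri); // //en.wikipedia.org/api/rest_v1/page/html/Foo
```
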
To test locally, you need to set up and start Apache Kafka and set the
`KAFKA_HOME` environment variable to point to the Kafka home directory.
Here's a sample setup script:

```bash
export KAFKA_HOME=<your desired kafka install path>
wget http://mirror.pekalatechnology.com/apache/kafka/0.9.0.1/kafka_2.10-0.9.0.1.tgz -O kafka.tgz
mkdir -p "$KAFKA_HOME" && tar xzf kafka.tgz -C "$KAFKA_HOME" --strip-components 1
echo "export KAFKA_HOME=$KAFKA_HOME" >> ~/.bash_profile
echo "export PATH=\$PATH:\$KAFKA_HOME/bin" >> ~/.bash_profile
```

Also, you need to enable topic deletion so that the test scripts can clean up Kafka state before each test run:

```bash
echo 'delete.topic.enable=true' >> "$KAFKA_HOME/config/server.properties"
```

Before starting the development version of change propagation or running
tests, start Zookeeper and Kafka with the `start-kafka`
npm script.
To stop Kafka and Zookeeper, run the `stop-kafka`
npm script.
To run the service locally, you need to have Kafka and Zookeeper installed
and running. An example of installation and configuration can be found in the testing
instructions above. After Kafka is installed, configured, and started with the `npm run start-kafka`
command, copy the example config and run the service:

```bash
cp config.example.yaml config.yaml
npm start
```

Before using the service, you also need to ensure that all topics used in your config
exist in Kafka. Topics should be prefixed with a datacenter name (the default is `default`).
Each topic must also have a retry topic, so if you are using a topic named `test_topic`, the following
topics must exist in Kafka:
- 'default.test_topic'
- 'default.change-prop.retry.test_topic'
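
The naming convention above, together with the automatic limited retries listed among the features, can be sketched in TypeScript as follows. The `<datacenter>.<topic>` and `<datacenter>.change-prop.retry.<topic>` patterns come from this README. The helper names, the retry message fields (`original_event`, `retries_left`) and the decrementing retry counter are hypothetical and only illustrate the idea.

```typescript
// Hypothetical helpers; only the topic name patterns are taken from this README.
function mainTopic(topic: string, datacenter: string = "default"): string {
  return datacenter + "." + topic;
}

function retryTopic(topic: string, datacenter: string = "default"): string {
  return datacenter + ".change-prop.retry." + topic;
}

// Every topic that must exist in Kafka for the topics referenced by a config.
function requiredTopics(topics: string[], datacenter: string = "default"): string[] {
  const result: string[] = [];
  for (const topic of topics) {
    result.push(mainTopic(topic, datacenter), retryTopic(topic, datacenter));
  }
  return result;
}

// Illustrative retry message shape (field names are assumptions).
interface RetryMessage {
  original_event: unknown;
  retries_left: number;
}

// A failed event goes back to the retry topic while retries remain;
// once the budget is exhausted it is dropped (null).
function planRetry(
  topic: string,
  event: unknown,
  retriesLeft: number,
  datacenter: string = "default",
): { topic: string; message: RetryMessage } | null {
  if (retriesLeft <= 0) return null;
  return {
    topic: retryTopic(topic, datacenter),
    message: { original_event: event, retries_left: retriesLeft - 1 },
  };
}
```

For example, `requiredTopics(["test_topic"])` returns `["default.test_topic", "default.change-prop.retry.test_topic"]`, the same two topics listed above.
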

The service is maintained by the Wikimedia Services Team. To report bugs, use the EventBus project on Phabricator or the #wikimedia-services IRC channel on freenode.