/apm-integration-testing

APM Integration Testing

Primary LanguagePythonApache License 2.0Apache-2.0

APM Integration Testing

This repo contains tools for end-to-end (eg agent -> apm server -> elasticsearch <- kibana) development and testing of Elastic APM.

Build Status

Prerequisites

The basic requirements for starting a local environment are:

  • Docker
  • Docker Compose
  • Python (version 3 preferred)

This repo is tested with Python 3 but starting/stopping environments work with Python 2.7 is supported on a best-efforrt basis. To change the default PYTHON version you have to set PYTHON environment variable to something like PYTHON=python2.

Docker

Installation instructions

Docker compose

Installation instructions

Python 3

Update Elastic Stack on new releases

Every time there is a new Elastic Stack release we have to update the configuration files to test the new release. These are the changes we have to make:

  • Update ELASTIC_STACK_VERSION param on Jenkinsfiles (.ci/*.groovy and .ci/Jenkinsfile files).
  • Update .ci/scripts/7.0-upgrade.sh Elastic stack used for the update (on branch 7.x)
  • Update SUPPORTED_VERSIONS
  • Update APM server versions

Running Local Enviroments

Starting an Environment

./scripts/compose.py provides a handy CLI for starting a testing environment using docker-compose. make venv creates a virtual environment with all of the python-based dependencies needed to run ./scripts/compose.py - it requires virtualenv in your PATH. Activate the virtualenv with source venv/bin/activate and use ./scripts/compose.py --help for information on subcommands and arguments. Finally, you can execute the following command to list all available parameter to start the environment ./scripts/compose.py start --help.

APM LocalEnv Quickstart

Logging in

By default, Security is enabled, which means you need to log into Kibana and/or authenticate with Elasticsearch. An assortment of users is provided to test different scenarios:

  • admin
  • apm_server_user
  • apm_user_ro
  • kibana_system_user
  • *_beat_user

The password for all default users is changeme.

Stopping an Environment

All services:

./scripts/compose.py stop

# OR

docker-compose down

All services and clean up volumes and networks:

make stop-env

Individual services:

docker-compose stop <service name>

Example environments

We have a list with the most common flags combination that we internally use when developing our APM solution. You can find the list here:

Persona Flags Motivation / Use Case Team Comments
CI Server python scripts/compose.py start 8.0.0 --java-agent-version ${COMMIT_SHA} --build-parallel --with-agent-java-spring --no-apm-server-self-instrument --force-build Prepare environment for running tests for APM Java agent APM Agents
CI Server python scripts/compose.py start 8.0.0 --apm-server-build https://github.com/elastic/apm-server@${COMMIT_HASH} --build-parallel --no-apm-server-self-instrument --with-agent-rumjs --with-agent-dotnet --with-agent-go-net-http --with-agent-nodejs-express --with-agent-ruby-rails --with-agent-java-spring --with-agent-python-django --with-agent-python-flask --force-build Prepare environment for running tests for APM Server APM Server
Demos & Screenshots ./scripts/compose.py start --release --with-opbeans-dotnet --with-opbeans-go --with-opbeans-java --opbeans-java-agent-branch=pr/588/head --force-build --with-opbeans-node --with-opbeans-python --with-opbeans-ruby --with-opbeans-rum --with-filebeat --with-metricbeat 7.3.0 demos, screenshots, ad hoc QA. It's also used to send heartbeat data to the cluster for Uptime PMM Used for snapshots when close to a release, without the --release flag
Development ./scripts/compose.py start 7.3 --bc --with-opbeans-python --opbeans-python-agent-local-repo=~/elastic/apm-agent-python Use current state of local agent repo with opbeans APM Agents
./scripts/compose.py start 7.3 --bc --start-opbeans-deps This flag would start all opbeans dependencies (postgres, redis, apm-server, ...), but not any opbeans instances APM Agents This would help when developing with a locally running opbeans. Currently, we start the environment with a --with-opbeans-python flag, then stop the opbeans-python container manually
Developer ./scripts/compose.py start main --no-apm-server Only use Kibana + ES in desired version for testing APM Server
Developer ./scripts/compose.py start --release 7.3 --no-apm-server Use released Kibana + ES, with custom agent and server running on host, for developing new features that span agent and server. APM Agents If --opbeans-go-agent-local-repo worked, we might be inclined to use that instead of running custom apps on the host. Would have been handy while developing support for breakdown graphs. Even then, it's probably still faster to iterate on the agent without involving Docker.
Developer ./scripts/compose.py start main --no-kibana Use newest ES/main, with custom kibana on host, for developing new features in kibana APM
Developer ./scripts/compose.py start 6.3 --with-kafka --with-zookeeper --apm-server-output=kafka --with-logstash --with-agent-python-flask Testing with kafka and logstash ingestion methods APM
Developer ./scripts/compose.py start main --no-kibana --with-opbeans-node --with-opbeans-rum --with-opbeans-x Developing UI features locally APM UI
Developer ./scripts/compose.py start main --docker-compose-path - --skip-download --no-kibana --with-opbeans-ruby --opbeans-ruby-agent-branch=main > docker-compose.yml Developing UI features againt specific configuration APM UI We sometimes explicity write a docker-compose.yml file and tinker with it until we get the desired configuration becore running docker-compose up
Developer scripts/compose.py start ${version} Manual testing of agent features APM Agents
Developer ./scripts/compose.py start main --with-opbeans-java --opbeans-java-agent-branch=pr/588/head --apm-server-build https://github.com/elastic/apm-server.git@main Test with in-progress agent/server features APM UI
Developer ./scripts/compose.py start 7.0 --release --apm-server-version=6.8.0 Upgrade/mixed version testing APM Then, without losing es data, upgrade/downgrade various components
Developer ./scripts/compose.py start --with-opbeans-python --with-opbeans-python01 --dyno main Spin up a scenario for testing load-generation. APM The management interface will be available at http://localhost:9000

Change default ports

Expose Kibana on http://localhost:1234:

./scripts/compose.py start main --kibana-port 1234

Opbeans

Opbeans are demo web applications that are instrumented with Elastic APM. Start opbeans-* services and their dependencies along with apm-server, elasticsearch, and kibana:

./scripts/compose.py start --all main

This will also start the opbeans-load-generator service which, by default, will generate random requests to all started backend Opbeans services. To disable load generation for a specific service, use the --no-opbeans-XYZ-loadgen flag.

Opbeans RUM does not need a load generation service, as it is itself generating load using a headless Chrome instance.

Start Opbeans with a specific agent branch

You can start Opbeans with an agent which is built from source from a specific branch or PR. This is currently only supported with the Go and the Java agent.

Example which builds the elastic/apm-agent-java#588 branch from source and uses an APM server built from main:

./scripts/compose.py start main --with-opbeans-java --opbeans-java-agent-branch=pr/588/head --apm-server-build https://github.com/elastic/apm-server.git@main

Note that it may take a while to build the agent from source.

Another example, which installs the APM Python Agent from the main branch for testing against opbeans-python (for example, for end to end log correlation testing):

./scripts/compose.py start main --with-opbeans-python --with-filebeat --opbeans-python-agent-branch=main --force-build

Note that we use --opbeans-python-agent-branch to define the Python agent branch for opbeans-python, rather than --python-agent-package, which only applies to the --with-python-agent-* flags for the small integration test apps.

Uploading Sourcemaps

The frontend app packaged with opbeans-node runs in a production build, which means the source code is minified. The APM server needs the corresponding sourcemap to unminify the code.

You can upload the sourcemap with this command:

./scripts/compose.py upload-sourcemap

In the standard setup, it will find the config options itself, but they can be overwritten. See

./scripts/compose.py upload-sourcemap --help

Kafka output

./scripts/compose.py start --with-kafka --with-zookeeper --apm-server-output=kafka --with-logstash main

Logstash will be configured to ingest events from Kafka.

Topics are named according to service name. To view events for 1234_service-12a3:

docker exec -it localtesting_6.3.0-SNAPSHOT_kafka kafka-console-consumer --bootstrap-server kafka:9092 --topic apm-1234_service-12a3 --from-beginning --max-messages 100

Onboarding events will go to the apm topic.

Note that index templates are not loaded automatically when using outputs other than Elasticsearch. Create them manually with:

./scripts/compose.py load-dashboards

If data was inserted before this point (eg an opbeans service was started) you'll probably have to delete the auto-created apm-* indexes and let them be recreated.

🦖 Load testing in Dyno mode

The APM Integration Test Suite includes the ability to create an environment that is useful for modeling various scenarios where services are failed or experiencing a variety of constraints, such as network, memory, or network pressure.

Starting the APM Integration Test suite with the ability to generate load and manipulate the performance characteristics of the individual chararteristics is called Dyno Mode.

To enable Dyno Mode, apped the --dyno flag to the arguments given to ./script/compose.py. When this flag is passed, the test suite will start as it normally would, but various additional components will be enabled which all the user to generate load for various services and to manipulate the performance of various components.

After starting in Dyno Mode, navigate to http://localhost:9000 in your browser. There, you should be presented with a page which shows the Opbeans which are running, along with any dependent services, such as Postgres or Redis.

A pane on the left-hand side of the window allows load-generate to be started and stopped by clicking the checkbox for the Opbean(s) you wish to apply load to. Unchecking the box for an Opbean in this pane will cause load-generation to cease.

After load generation is started, the number of requests can be adjusted by moving the W slider for the relevant Opbean up or down. To control the likelihood that a request will result in an error in the application which can be seen in APM, use the E slider to adjust the error rate. Moving the slider up will result in a higher percentage of requests being errors.

Supported Dyno Opbeans

Not all Opbeans are supported for use with Dyno

Opbean Supported
Python ✅
Go 🔲
.NET 🔲
Java 🔲
Node 🔲
Ruby 🔲

Using Dyno with a remote APM Server

It is possible to connect the infrastructure generated by compose.py to a remote APM Server. This makes possible to send APM data from the scenarios modeled with Dyno. The following command will launch Opbeans-python and Dyno locally, the Opbeans-python is configured as the environment local of the service dyno-service, and using the version name test-demo

APM_SERVER_URL=https://apm.example.com \
APM_TOKEN=MySuPerApMToKen \
python3 ./scripts/compose.py start 8.0.0 \
  --no-kibana \
  --no-elasticsearch \
  --dyno  \
  --apm-server-url "${APM_SERVER_URL}" \
  --apm-server-secret-token="${APM_TOKEN}" \
  --with-opbeans-python \
  --opbeans-python-service-environment local \
  --opbeans-node-service-version test-demo \
  --opbeans-python-service-name dyno-service

when the Docker container started you can connect to the Dyno UI at http://localhost:9000 and modify the scenario to cause errors, in this case we have disabled PostgreSQL to cause a database service error.

Then we can check the result of our changes in Dyno in the APM UI, in this case we can see that the error rate for postgreSQL is increasing.

Introducing failure into the network

For each service, different classes of network failure can be introduced and adjusted with their respective sliders. They are as follows:

Slider key Slider name Description
L Latency Adds latency to all data. The overall delay is equal to latency +/- jitter.
J Jitter Adds jitter to all data. The overall delay is equal to latency +/- jitter.
B Bandwidth The overall amount of bandwidth available to the connection
T Timeout Stops all data from getting through, and closes the connection after timeout. If timeout is 0, the connection won't close, and data will be delayed until the timeout is increased.
Sas Packet slice average size Slice TCP packets into this average size
Ssd Packet slice average delay Introduce delay between the transmission of each packet

Modifying system properties

The container for each sevice may be instantly resized to the amount of available CPU power or total memory up or down through use of the CPU and Memory slider respectively.

Enabling/disabling services

Unchecking the button immediately above the panel of sliders for each service will immediately cut off access to that service. The service itself will remain up but no traffic will be routed to it. Re-checking the box will immediately restore the network connectivity for the service.

Debugging problems

Load generation is on but no requests are arriving

Occasionally, all network routing will fail due to an unresolved bug with thread-safety in the proxy server. If you expect load-generation to be running but you do not see any traffic arriving at the Opbeans, it is possibly that the network proxy has crashed. To quickly restore it, run docker-compose restart toxi. Traffic should be immediatley restored.

Advanced topics

Dumping docker-compose.yml

./scripts/compose.py start main --docker-compose-path - --skip-download will dump the generated docker-compose.yml to standard out (-) without starting any containers or downloading images.

Omit --skip-download to just download images.

Testing compose

compose.py includes unittests, make test-compose to run.

Jaeger

APM Server can work as a drop-in replacement for a Jaeger collector and ingest traces directly from a Jaeger agent via gRPC.

To test Jaeger/gRPC, start apm-integration-testing, run the Jaeger Agent, and start the Jaeger hotrod demo:

./scripts/compose.py start 7.13 \
--apm-server-secret-token="abc123"
docker run --rm -it --name jaeger-agent --network apm-integration-testing -p6831:6831/udp \
-e REPORTER_GRPC_HOST_PORT=apm-server:8200 \
-e AGENT_TAGS="elastic-apm-auth=Bearer abc123" \
jaegertracing/jaeger-agent:latest
docker run --rm -it --network apm-integration-testing \
-e JAEGER_AGENT_HOST=jaeger-agent \
-e JAEGER_AGENT_PORT=6831 \
-p8080-8083:8080-8083 jaegertracing/example-hotrod:latest all

Finally, navigate to http://localhost:8080/ and click around to generate data.

Running Tests

Additional dependencies are required for running the integration tests:

  • python3
  • virtualenv

On a Mac with Homebrew:

brew install pyenv-virtualenv

All integration tests are written in python and live under tests/.

Several make targets exist to make their execution simpler:

  • test-server
  • test-kibana
  • test-agent-{go,node,python,ruby,...}

These targets will create a python virtual environment in venv with all of the dependencies need to run the suite.

Each target requires a running test environment, providing an apm-server, elasticsearch, and others depending on the particular suite.

Tests should always eventually be run within a Docker container to ensure a consistent, repeatable environment for reporting.

Prefix any of the test- targets with docker- to run them in a container eg: make docker-test-server.

Network issues diagnose

It is possible to diagnose network issues related with lost documents between APM Agent, APM server, or Elasticsearch.

In order to do so, you have to add the --with-packetbeat argument to your command line.

When you add this argument an additional Docker container running Packetbeat is attached to the APM Server Docker container, this container will grab information about the communication between APM Agent, APM server, and Elasticsearch that you can analyze in case of failure.

When a test fails, data related to Packetbeat and APM is dumped with elasticdump into a couple of files /app/tests/results/data-NAME_OF_THE_TEST.json and /app/tests/results/packetbeat-NAME_OF_THE_TEST.json

Continuous Integration

Jenkins runs the scripts from .ci/scripts and is viewable at https://apm-ci.elastic.co/.

Those scripts shut down any existing testing containers and start a fresh new environment before running tests unless the REUSE_CONTAINERS environment variable is set.

These are the scripts available to execute:

  • agent.sh: runs the tests for the given agent. You can choose the versions to run see the environment variables configuration.

  • all.sh: runs all tests on apm-server and every agent type.

  • common.sh: common scripts variables and functions. It does not execute anything.

  • dotnet.sh: runs .NET tests, you can choose the versions to run see the environment variables configuration.

  • go.sh: runs Go tests. You can choose the versions to run see the environment variables configuration.

  • java.sh: runs Java tests. Uou can choose the versions to run see the environment variables configuration.

  • kibana.sh: runs Kibana agent tests. You can choose the versions to run see the environment variables configuration.

  • nodejs.sh: runs Nodejs agent tests. You can choose the versions to run see the environment variables configuration.

  • opbeans.sh: runs the unit tests for the apm-integration-testing app and validates the linting. You can choose the versions to run see the environment variables configuration.

  • opbeans-app.sh: runs the apm-integration-testing app and validates the stack can be started. You can choose the versions to run see the environment variables configuration.

  • php.sh: runs PHP agent tests, you can choose the versions to run see the environment variables configuration.

  • python.sh: runs Python agent tests. You can choose the versions to run see the environment variables configuration.

  • ruby.sh: runs Ruby agent tests, you can choose the versions to run see the environment variables configuration.

  • server.sh: runs APM Server tests, you can choose the versions to run see the environment variables configuration.

  • unit-tests.sh: runs the unit tests for the apm-integration-testing app and validate the linting, you can choose the versions to run see the environment variables configuration.

Environment Variables

It is possible to configure some options and versions to run by defining environment variables before to launch the scripts.

  • COMPOSE_ARGS: replaces completely the default arguments compose.py used by scripts, see the compose.py help to know which ones you can use.

  • DISABLE_BUILD_PARALLEL: by default Docker images are built in parallel. If you set DISABLE_BUILD_PARALLEL=true then the Docker images will build in series, which helps to make the logs more readable.

  • BUILD_OPTS: aggregates arguments to default arguments passing to compose.py. See compose.py help to know which ones you can use.

  • ELASTIC_STACK_VERSION: selects the Elastic Stack version to use on tests. By default is is used the main branch. You can choose any branch or tag from the Github repo.

  • APM_SERVER_BRANCH: selects the APM Server version to use on tests. By default it uses the main branch. You can choose any branch or tag from the Github repo.

  • APM_AGENT_DOTNET_VERSION: selects the agent .NET version to use. By default it uses the main branch. See specify an agent version

  • APM_AGENT_GO_VERSION: selects the agent Go version to use. By default it uses the main branch. See specify an agent version

  • APM_AGENT_JAVA_VERSION: selects the agent Java version to use. By default it uses the main branch. See specify an agent version

  • APM_AGENT_NODEJS_VERSION: selects the agent Nodejs version to use. By default it uses the main branch. See specify an agent version

  • APM_AGENT_PHP_VERSION: selects the agent PHP version to use. By default it uses the main branch. See specify an agent version

  • APM_AGENT_PYTHON_VERSION: selects the agent Python version to use. By default it uses the main branch. See specify an agent version

  • APM_AGENT_RUBY_VERSION: selects the agent Ruby version to use. By default it uses the main branch. See specify an agent version

Specify an Agent Version

You can choose any release, branch, or tag from the Github repo. To do so, you must set the PKG environment variable to MODE;VERSION, where MODE can be:

  • github: to get VERSION from branches and tags.

  • release: to get VERSION from releases.

  • commit: to get VERSION from commits (only Java and Go agents).

e.g.

  • APM_AGENT_NODEJS_VERSION=github;v1.0.0 It will try to get v1.0.0 branch or tag from Github.

  • APM_AGENT_NODEJS_VERSION=github;main It will try to get main branch or tag from Github.

  • APM_AGENT_NODEJS_VERSION=release;v1.0.0 It will try to get v1.0.0 from releases repo.

  • APM_AGENT_RUBY_VERSION=release;latest It will try to get latest from releases repo.

  • APM_AGENT_JAVA_VERSION=commit;539f1725483804d32beb4f780eac72c238329cb1 It will try to get 539f1725483804d32beb4f780eac72c238329cb1 from repo commits.

Version Tests

Various combinations of versions of agents and the Elastic Stack are tested together to ensure compatibility.

The matrix is defined using apm_server.yml for one axis and then a per-agent specification for the other axis.

Certain exclusions are defined on a per agent basis.

For example, the nodejs matrix is defined in nodejs.yml.

When those tests run, scripts/ci/versions_nodejs.sh is invoked with the product of those files, eg scripts/ci/versions_nodejs.sh 'github;main' '6.3'.

The Elastic Stack version argument accepts an optional list of semi-colon separated arguments that will be passed to scripts/compose.py when building the test stack.

Agent Development

To run integration tests against unreleased agent code, start an environment where that agent code is used by the test application.

For example, to test an update to the Python agent from user elasticcontributor on their newfeature1 branch:

# start test deps: apm-server, elasticsearch, and the two python test applications
# the test applications will use elasticcontributor's newfeature1 apm agent
./scripts/compose.py start main --with-agent-python-django --with-agent-python-flask --python-agent-package=git+https://github.com/elasticcontributor/apm-agent-python.git@newfeature1 --force-build

# wait for healthiness
docker-compose ps

# run tests
make test-agent-python

Testing unrelease code for other agents follows a simliar pattern.

See version* in https://github.com/elastic/apm-integration-testing/tree/main/.ci/scripts for details on how CI tests specific agent/elastic stack version combinations.

Testing docker images

Tests are written using bats under the docker/tests dir

make -C docker test-<app>
make -C docker test-opbeans-<agent>
make -C docker test-<agent>

Test all the docker images for the Opbeans

make -C docker all-opbeans-tests

Test all the docker images for the agents

make -C docker all-agents-tests

Use Cases

Some of the uses cases that are not fully automated and required some manual actions.

To run Agents using an ApiKey

Run the script to start only Elasticsearch:

python scripts/compose.py start 8.0.0 --no-apm-server --no-kibana

Then, run the script to generate the API key:

apiKey=$(scripts/create-api-key.sh)
echo $apiKey

Finally, run the apm-its with the required services and the flag --elastic-apm-api-key ${apiKey}, for instance:

python scripts/compose.py start 8.0.0 --with-agent-dotnet --elastic-apm-api-key ${apiKey}