Make sure you have run this before the demo, because some steps take time and require a decent internet connection.
- Make sure you have your AWS account set up, access key created, and added as environment variables in `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. Protip: Use https://github.com/sorah/envchain to keep your environment variables safe.
- Create the Elastic Cloud instance with the same version as specified in variables.yml's `elastic_version`, enable Kibana as well as the GeoIP & user agent plugins, and set the environment variables with the values for `ELASTICSEARCH_HOST`, `ELASTICSEARCH_USER`, `ELASTICSEARCH_PASSWORD`, `KIBANA_HOST`, `KIBANA_ID`, `MYSQL_USER`, and `MYSQL_PASSWORD`.
- Change into the lightsail/ directory.
- Change the settings to a domain you have registered under Route53 in inventory, variables.tf, and variables.yml. Set the Hosted Zone for that domain and export the Zone ID under the environment variable `TF_VAR_zone_id`. If you haven't created the Hosted Zone yet, you should set it up in the AWS Console first and then set the environment variable.
- If you haven't installed the AWS plugin for Terraform, get it with `terraform init` first. Then create the keypair, DNS settings, and instances with `terraform apply`.
- Open HTTPS on the network configuration on the frontend and monitor instances, MySQL on the backend instance, and TCP 8200 on the monitoring instance (waiting for this Terraform issue).
- Apply the base configuration to all instances with `ansible-playbook configure_all.yml`.
- Apply the instance specific configurations with `ansible-playbook configure_backend.yml` and `ansible-playbook configure_monitor.yml`.
- Deploy the JAR with `ansible-playbook deploy_frontend.yml` (Ansible is also building it).
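Because the steps above depend on quite a few environment variables, a small preflight check can help before the demo. The following is a minimal sketch, not part of the repository: the function name `missing_vars` is hypothetical, and it assumes `printenv` is available and that the variables are exported so Terraform and Ansible can see them.

```shell
# Sketch: list the environment variables named in the setup steps
# that are unset or empty in the current (exported) environment.
# The variable names come from the steps above; the helper itself is hypothetical.
required_vars="AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY ELASTICSEARCH_HOST ELASTICSEARCH_USER ELASTICSEARCH_PASSWORD KIBANA_HOST KIBANA_ID MYSQL_USER MYSQL_PASSWORD TF_VAR_zone_id"

missing_vars() {
  for name in $required_vars; do
    # printenv only reports exported variables, which is what child
    # processes such as terraform and ansible-playbook will receive.
    if [ -z "$(printenv "$name")" ]; then
      echo "$name"
    fi
  done
}

# Example usage (left commented out so sourcing this file has no side effects):
# missing_vars
```

Calling `missing_vars` prints one variable name per line for everything that still needs to be set, and prints nothing when the environment is complete.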
When you are done, remove the instances, DNS settings, and key with `terraform destroy`.
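For reference, the command order described above can be collected in one place. This is only a sketch under assumptions: the helper names `run`, `setup_demo`, and `teardown_demo` and the `RUN` switch are hypothetical, the commands are expected to be run from the lightsail/ directory, and by default the commands are only printed rather than executed.

```shell
# Sketch: print the setup and teardown commands in the order given above.
# Set RUN=1 to actually execute them instead of printing (hypothetical switch).
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

setup_demo() {
  run terraform init &&
  run terraform apply &&
  # Manual step between these two commands: open HTTPS, MySQL, and
  # TCP 8200 in the network configuration as described in the list above.
  run ansible-playbook configure_all.yml &&
  run ansible-playbook configure_backend.yml &&
  run ansible-playbook configure_monitor.yml &&
  run ansible-playbook deploy_frontend.yml
}

teardown_demo() {
  run terraform destroy
}
```

Printing first makes it easy to double-check the order before anything is created in AWS. Because the network configuration is a manual step, running `setup_demo` end to end with `RUN=1` is probably less useful than running the commands one at a time.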
Prerequisite: Make sure MySQL is stopped, then restart the Java app so that it fails to come up.
- Heartbeat dashboard: Site is indeed down
- Metricbeat system dashboard: Check the system: which servers do you even have, what is running, the load, ...
- Kibana monitoring: You can also get an overview of the system here — specifically in the Beats section.
- Logs: Discuss why you don't want to parse the LOG and we are using JSON instead. Filter down to `application:frontend` and `json.severity:ERROR`. You can see that something is happening with MySQL, so check whether Heartbeat is collecting the data. It is, it's just not part of the dashboard.
- Custom visualization: Build a custom visualization to show that MySQL is down. For example a horizontal bar, filter on `tcp.port:3306` and split the `Date histogram` on `monitor.status`. So something is up with MySQL; time to look at the logs again.
- Filebeat system discover: In Filebeat filter to `fileset.module:system` and search for `mysql`. You will find an entry `Stopping MySQL Community Server...`, which leaves you wondering what happened there.
- Auditbeat discover: Search for `mysql` in the Auditbeat data. Find the actual command that shut down MySQL, which should be `sudo service mysql stop`. So now you know what happened and who did it.
- Restart MySQL via SSH and then the Java application with `ansible-playbook restart_frontend.yml`, so we have the event for annotations later on.
- Insert some data and let the audience go wild on the application.
- Packetbeat dashboards: Show Overview, flows, HTTP, and MySQL.
- Filebeat nginx dashboard: Show similar data as Packetbeat HTTP.
- Metricbeat visualization: Show the collection of application metrics with HTTP and JMX. Visualize it in the visual builder by dividing the `average` of `jolokia.metrics.memory.heap_usage.used` by the `max` of `jolokia.metrics.memory.heap_usage.max`. Also annotate from the `events` index with the fields `user, application, host` and use the row template `{{user}} ran {{application}} on {{host}}`.
- APM: Show the init calls; it's kind of expected that they are slow. But why is search doing so many SQL queries? David will need to fix that.
- SSH dashboard: Ask if anybody tried to SSH into the machine and show the status.
- Alerting: If asked about alerting, show the check for Heartbeat data (checks every minute if at least 2 pings failed in the last 5 minutes).
- Machine learning: Optionally add the default nginx jobs. If the service has been running for a while you can see the anomaly when it's down. Though it doesn't use the full potential since we don't have a recurring usage pattern on the site.
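If the audience asks how the alert condition from the Alerting step works, the logic (at least 2 failed pings in the last 5 minutes) can be illustrated outside of Kibana. The following is a rough sketch, not the actual watch definition: the function name `should_alert` and the input format (one `epoch_seconds status` pair per line) are hypothetical, and it assumes failed pings are reported with the status `down`.

```shell
# Sketch: decide whether the Heartbeat alert condition would fire.
# stdin: lines of "<epoch_seconds> <status>" (hypothetical input format)
# $1:    the current time in epoch seconds
should_alert() {
  now="$1"
  awk -v now="$now" -v window=300 -v threshold=2 '
    # Count pings that failed within the last 5 minutes (300 seconds).
    $2 == "down" && $1 > now - window && $1 <= now { failed++ }
    END {
      if (failed >= threshold) print "ALERT"
      else print "OK"
    }
  '
}

# Example usage (commented out so sourcing this file has no side effects):
# printf '%s\n' "1000 down" "1100 down" "1200 up" | should_alert 1250
```

In the real setup this evaluation would run once per minute against the Heartbeat data; here the current time is passed in explicitly so the window is easy to reason about.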