[Meta] Elastic Agent Inputs

Question


kvch opened this issue 2 years ago · 5 comments

What is an Elastic Input?

Elastic Inputs are binaries that are distributed with and controlled by Elastic Agent. As the name implies, they collect data from and about the endpoints where they are deployed.

Goal

The goal of this project is to design a developer-friendly inputs platform. The system should help people write new inputs and test them easily. This ticket lists the requirements and defines the development phases.

You can read about the motivation in the design document below.

Requirements

Input

Each input must speak the control protocol of Elastic Agent
Each input must provide a manifest file describing its features
Each input must run in a separate process
Each input must log to stdout and stderr in JSON format
Each input must be able to test a configuration and report errors without starting data collection
Each input must be able to send events to the shipper
Each input must provide a reference configuration to aid integration developers
Each input must accept a logger on init and enrich log events with metadata so that issues can be tracked down (elastic/beats#9177)
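A minimal sketch of how a single input could meet a few of these requirements: JSON log lines enriched with metadata, and a mode that validates the configuration and exits without starting data collection. Everything here is hypothetical. The real agent delivers configuration over its control protocol, so reading JSON from stdin, the `-check-config` flag, the `inputConfig` fields and the `input.type`/`input.id` log keys are stand-ins chosen for illustration. The sketch only writes logs to stderr.

```go
package main

import (
	"encoding/json"
	"errors"
	"flag"
	"fmt"
	"log/slog"
	"os"
)

// inputConfig is a hypothetical configuration for an example input.
type inputConfig struct {
	ID    string   `json:"id"`
	Paths []string `json:"paths"`
}

// validate reports every configuration error without touching any data source.
func (c inputConfig) validate() error {
	var errs []error
	if c.ID == "" {
		errs = append(errs, errors.New("id must not be empty"))
	}
	if len(c.Paths) == 0 {
		errs = append(errs, errors.New("at least one path is required"))
	}
	return errors.Join(errs...) // nil when errs is empty
}

// newLogger returns a JSON logger writing to stderr, enriched with metadata
// that identifies the input so its log lines can be told apart from others.
func newLogger(inputType, inputID string) *slog.Logger {
	h := slog.NewJSONHandler(os.Stderr, nil)
	return slog.New(h).With("input.type", inputType, "input.id", inputID)
}

func main() {
	checkOnly := flag.Bool("check-config", false, "validate the configuration and exit")
	flag.Parse()

	var cfg inputConfig
	if err := json.NewDecoder(os.Stdin).Decode(&cfg); err != nil {
		fmt.Fprintln(os.Stderr, "cannot parse config:", err)
		os.Exit(1)
	}

	log := newLogger("example", cfg.ID)
	if err := cfg.validate(); err != nil {
		log.Error("invalid configuration", "error", err.Error())
		os.Exit(1)
	}
	if *checkOnly {
		// Configuration is valid; exit before any collection starts.
		log.Info("configuration is valid")
		return
	}
	log.Info("starting data collection")
	// The collection and publishing loop would go here.
}
```

For example, piping `{"id":"a","paths":["/var/log/syslog"]}` into the binary with `-check-config` would emit one JSON log line and exit without collecting anything.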

Developer experience

Testing framework
- to check if the input understands the control protocol
- to check if the input can report its own state, metrics and logs
Input code generator, so developers can focus on production logic instead of wasting time on boilerplate
A guide for Elastic developers on migrating existing inputs
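The testing-framework item could start as small as a recording test double that captures the states an input reports. This sketch assumes a hypothetical `StatusReporter` interface; the state names resemble the unit states of the V2 control protocol, but the types and helpers (`recordingReporter`, `runInput`, `expectSequence`) are illustrative only and are not the `elastic-agent-client` API.

```go
package main

import (
	"fmt"
	"sync"
)

// State is an illustrative set of states an input could report to the agent.
type State int

const (
	Starting State = iota
	Healthy
	Degraded
	Failed
	Stopped
)

// StatusReporter is a hypothetical interface an input would use to report state.
type StatusReporter interface {
	UpdateState(s State, msg string)
}

// recordingReporter is a test double that records every state update.
type recordingReporter struct {
	mu     sync.Mutex
	states []State
}

func (r *recordingReporter) UpdateState(s State, _ string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.states = append(r.states, s)
}

// history returns a copy of the recorded states.
func (r *recordingReporter) history() []State {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]State(nil), r.states...)
}

// runInput stands in for an input's main loop: it reports Starting,
// then Healthy while publishing, and Stopped when it exits.
func runInput(rep StatusReporter, events int) {
	rep.UpdateState(Starting, "starting")
	rep.UpdateState(Healthy, fmt.Sprintf("publishing %d events", events))
	rep.UpdateState(Stopped, "done")
}

// expectSequence is the kind of assertion a testing framework could provide.
func expectSequence(got, want []State) error {
	if len(got) != len(want) {
		return fmt.Errorf("got %d state updates, want %d", len(got), len(want))
	}
	for i := range want {
		if got[i] != want[i] {
			return fmt.Errorf("update %d: got state %d, want %d", i, got[i], want[i])
		}
	}
	return nil
}

func main() {
	rep := &recordingReporter{}
	runInput(rep, 10)
	if err := expectSequence(rep.history(), []State{Starting, Healthy, Stopped}); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("PASS")
}
```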

Development plan

Phase I.

The goal of the first phase is to lay the foundation for the new developer tooling by developing the first new agent input. The first input developed will be the agent load generator described in elastic/elastic-agent-shipper#57. Developing a completely new input will allow this phase to focus on the basic input structure, developer tooling, and release process.

TODO

TBD.

Input development to support elastic/elastic-agent-shipper#57
elastic/beats#2
Use the new V2 protocol status reporting features to support: elastic/elastic-agent#100
elastic/beats#3
elastic/beats#5

Dependencies

Packaging

Packaging is done with mage. Every binary is cross-built in a Docker container, and the resulting binaries and generated files are then moved to a common folder.

A package contains inputs in separate binaries with their manifest files.

$ tar tvf elastic-agent-inputs-linux-arm64.tar.gz
inputs/
  filestream/
    filestream
    manifest.yml
  journald/
    journald
    manifest.yml
...

Anyone who wants to create their own inputs package must follow the pattern above. The package then has to be extracted into the folder where Agent can find it.

inputs/
  {input_name}/
    {input_name}
    manifest.yml

The packages could be installed manually by moving the inputs to the appropriate folder. However, we should also provide DEB, RPM and other packages that can be installed along with Elastic Agent.

More details here: elastic/elastic-agent#222

Phase II.

In this phase we focus on making input migration and development accessible to both internal and external developers. The filestream and system metrics inputs will also be migrated to the new framework.

TODO

Dependencies

elastic/elastic-agent#222

Phase III.

Phase III is going to be the first phase for moving real inputs to the new system. The inputs selected for this phase are easy to move because they already use the inputs v2 architecture. We could ask other teams to help with the inputs under x-pack/filebeat.

Phase IV.

This is the last phase of the development. I expect collaboration from all teams, as our team does not have the bandwidth to migrate all existing inputs. In this phase the Data plane team should focus on supporting other developers by reviewing their work, adjusting documentation if needed, etc.

TODO

Move remaining inputs...
Document how to add independent inputs written in arbitrary languages

Documentation

Document how to collect monitoring information from inputs
Document how to add new Golang inputs to our input manager
Document how to add independent inputs written in arbitrary programming languages

@cmacknz @belimawr Looking at this issue's content, I think that for the first steps we should focus on providing a framework that lets teams develop their own inputs. Providing building blocks should be enough to move forward without migrating filestream or any other input we own today.
If you agree with that, should we change the order of the steps listed here: #22 (comment)

cc @pierrehilbert @joshdover

Answer 3 · 2022-11-08T16:00:43.000Z

We definitely need to re-evaluate the plan for this project. I agree we don't need to migrate an existing input, but I think we do need to develop a reference input. It does not need to be a pre-existing input, and could continue to be the load generator we already started.

We need to make sure we discover all of the issues developers will hit when operating outside of the Beat framework. There will be questions like:

Should inputs implement any kind of buffering or queueing?
Can we make an easy to use framework for reporting errors and status back to the agent?
How should new inputs define their configuration and validate it?
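To make the buffering question concrete, here is one possible answer sketched under assumptions: a bounded in-memory queue between collection and publishing that drops and counts events when full instead of blocking. Blocking would apply backpressure to collection instead; which trade-off suits inputs is exactly what the question leaves open. `Event` and `boundedQueue` are hypothetical names, not part of any Elastic API.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Event is a placeholder for whatever an input publishes.
type Event struct{ Msg string }

// boundedQueue buffers up to a fixed number of events between the collector
// and the publisher. When it is full, TryPush drops the event and counts the
// drop rather than blocking the collector.
type boundedQueue struct {
	ch      chan Event
	dropped atomic.Int64
}

func newBoundedQueue(size int) *boundedQueue {
	return &boundedQueue{ch: make(chan Event, size)}
}

// TryPush enqueues e if there is room and reports whether it was accepted.
func (q *boundedQueue) TryPush(e Event) bool {
	select {
	case q.ch <- e:
		return true
	default:
		q.dropped.Add(1)
		return false
	}
}

// Drain returns all currently buffered events without blocking.
func (q *boundedQueue) Drain() []Event {
	var out []Event
	for {
		select {
		case e := <-q.ch:
			out = append(out, e)
		default:
			return out
		}
	}
}

func main() {
	q := newBoundedQueue(2)
	for i := 0; i < 3; i++ {
		q.TryPush(Event{Msg: fmt.Sprintf("event %d", i)})
	}
	fmt.Println(len(q.Drain()), "buffered,", q.dropped.Load(), "dropped") // 2 buffered, 1 dropped
}
```

Exposing the drop counter as a metric would also tie into the requirement that inputs report their own metrics.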

I think it will be hard to validate our answer to those questions without actually writing an input ourselves and testing it thoroughly.

Answer 4 · 2022-11-08T18:44:49.000Z

One more note: there is a requirement above that I am not sure still holds:

Each input must run in a separate process

I think the agent ended up grouping each type of input into a separate process; for example, if there are multiple filestream inputs, they will be grouped into the same process.
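The grouping described in this comment can be illustrated as one process per input type, shared by every configured input of that type. This is a simplified model of the behavior as described here, not the agent's actual implementation; `unit` and `groupByType` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// unit is a hypothetical, simplified view of one configured input.
type unit struct {
	ID   string
	Type string
}

// groupByType maps each input type to the IDs of the units that would share
// that type's process: one process per type, not one per configured input.
func groupByType(units []unit) map[string][]string {
	procs := make(map[string][]string)
	for _, u := range units {
		procs[u.Type] = append(procs[u.Type], u.ID)
	}
	return procs
}

func main() {
	procs := groupByType([]unit{
		{ID: "logs-1", Type: "filestream"},
		{ID: "logs-2", Type: "filestream"},
		{ID: "journal-1", Type: "journald"},
	})
	// Sort the types so the output order is deterministic.
	types := make([]string, 0, len(procs))
	for t := range procs {
		types = append(types, t)
	}
	sort.Strings(types)
	for _, t := range types {
		fmt.Printf("process %s: %v\n", t, procs[t])
	}
}
```

Under this model, three configured inputs yield only two processes, which is why the "separate process per input" requirement may need rewording.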

We should revisit some of the requirements based on what was actually implemented in the v2 agent, elastic/elastic-agent#1643 should provide a summary once it is complete.

Answer 5 · 2022-11-10T14:02:40.000Z

I agree with @cmacknz

I think it will be hard to validate our answer to those questions without actually writing an input ourselves and testing it thoroughly.

The "little" I worked on the load generator input's publishing pipeline already raised multiple questions, and they only seem to grow the more I learn about Beats internals. Libbeat takes on a lot of responsibility on behalf of developers (compare creating a new Beat with creating a new input). We need to go through the process ourselves to fully understand what an input is and what the "publishing pipeline" is that can be turned into a framework.
