A Splunk data model for cloud infrastructure data (AWS / GCP / Azure)
- Original Author: Rico Valdez, Rod Soto
- Current maintainers:
- Sourcetype: aws:cloudtrail, aws:cloudwatchlogs:vpcflow, google:gcp:pubsub:message, mscs:azure:audit
- Has index-time ops: false
- Provides a data model suitable for normalizing some of the data coming from AWS, GCP, and Azure.
- Blocks for compute (VM Instances), storage, network traffic, and authentication/authorization
- Includes additional eventtypes, field aliases, tags, and calculations to help populate the model.
- Provides data-model mapping for:
- Basic VM activity (start, stop, create, terminate)
- Basic bucket/object activity for storage use cases
- Network traffic, as provided by vpcflow logs, and gec_instance events for GCP
A Splunk data model is a type of knowledge object that applies an information structure to raw data at search time—regardless of the data's origin or format—and encodes the domain knowledge necessary to build a variety of specialized searches. Data models enhance the efficiency of your searches. You can increase their velocity with data-model acceleration, which creates summaries for the fields you report on, accelerating the dataset represented by those fields.
The Cloud Infrastructure Data Model normalizes and combines your machine data from AWS, Azure, and/or GCP, so you can write analytics that work across all three.
You should have the appropriate add-on(s) for your cloud provider(s) to bring data into Splunk and perform the basic field extractions. These are the following:
- AWS: Splunk Add-on for Amazon Web Services - https://splunkbase.splunk.com/app/1876/
- GCP: Splunk Add-on for Google Cloud Platform - https://splunkbase.splunk.com/app/3088/
- Azure: Splunk Add-on for Microsoft Cloud Services - https://splunkbase.splunk.com/app/3110/
Method 1:
- Download the .tgz package and install as a normal app.
- It may be necessary to move props.conf, tags.conf, and eventtypes.conf from the app default directory to the local directory. If it doesn't already exist, create one at the same level as the default directory.
Method 2:
- Clone or download the repo.
- Find the cloud_infrastructure.json file located in default/data/models.
- Navigate to Settings->Datamodels in your Splunk instance.
- Click the "Upload" button in the top right of the page and select the cloud_infrastrucutre.json file. You can install in whatever app context you like.
- Go back one level to the list of data models and click "Edit," then select "Edit Permissions."
- Select "All Apps" and assign permissions, so that everyone can read and admins can write. Click "Save."
- [optional] Click "Edit," then choose "Edit Data Model Acceleration," if you wish to accelerate the model.
- Drop the included props.conf, tags.conf, and eventtypes.conf in the
local
directory of the app context in which you installed or create/update the associated files in the local directory for the appropriate app. For example, create or modifylocal/props.conf
in the app directory for the Amazon add-on, to include the information under the AWS stanzas in the included props.conf file.
The following search will show you how the data model is being populated with compute data:
| datamodel Cloud_Infrastructure Compute search | table Compute*
You can re-run the search, replacing both instances of "Compute" with "Storage" or "Traffic." You can also look at a specific provider by inserting a search, such as:
| datamodel Cloud_Infrastructure Compute search | search sourcetype=aws:cloudtrail | table Compute*
You can also leverage the |tstats command to search this data as well:
| tstats values(Compute.dest) as machines from datamodel=Cloud_Infrastructure.Compute by Compute.region Compute.vendor
If the data is not showing up, confirm that the provided .conf files are properly deployed. They may need to be moved into an app local
directory to take precedence over existing directives.
Make sure the indexes containing your cloud_infrastructure data are searchable by default, or add the indexes to the definition for the eventtypes, as necessary.
If data is not properly mapped, consider making edits to the lines provided in props.conf.
- Because we-redefine some event types, you may get errors when starting splunk. To rectify, you can place the included props.conf, eventtypes.conf, and tags.conf into an app /local folder, which will give them precedence and resolve conflicts.
- Could always use more Testing/refinement of extractions and mappings
- Tags and eventtypes are used to populate the model. You can re-define these as desired
- Most events have authentication assocaited with them. AWS is scoped down to AssumeRole and ConsoleLogin events only. For google and ms, .....
Field Name | Description | |
---|---|---|
Compute | ||
account | Cloud provider account id | |
action | Describes an action taken on a resource | |
dest | Target of the action (instance or image) | |
event_name | Title for the event | |
http_user_agent | User agent presented with request | |
image_id | Image used to create/run instance | |
instance_id | This is the identifier for the instance associated with the event | |
instance_type | The type of the instance specified | |
msg | Error/code returned to caller | |
region | The region where the resource resides | |
src | Source IP address of the request | |
src_user | Identifier for the user making the request | |
status | Status of the instance | |
user_type | Type of user that made the request | |
vendor | The name of the cloud provider | |
vendor_product | The specific service/product generating the event | |
Storage | ||
account | Cloud provider account ID | |
acl_entity | The user or group the permission is granted to in the acl event | |
acl_permission | The permission specified in the acl event | |
action | Describes an action taken on a resource | |
bucket_name | The name of the storage object (bucket) | |
event_name | Title for the event | |
http_user_agent | User agent presented with request | |
msg | Error/code returned to caller | |
object_path | Full path to storage object | |
region | The region where the resource resides | |
src | Source IP address of the request | |
src_user | Identifier for the user making the request | |
user_type | Type of user that made the request | |
vendor | The name of the cloud provider | |
vendor_product | The specific service/product generating the event | |
Traffic | ||
action | Describes an action taken on a resource | |
bytes | Byte count recorded in event | |
dest | Destination for traffic. Aliased from dest_ip | |
dest_ip | Destination IP address for the traffic | |
dest_port | Destination port for the traffic | |
dest_translated_ip | The NATed IPv4 or IPv6 address to which a packet has been sent | |
dest_translated_port | The NATed port to which a packet has been sent. Note: Do not translate the values of this field to strings (tcp/80 is 80, not http). | |
dest_zone | The network zone of the destination. | |
direction | The direction the packet is traveling. | |
duration | The amount of time for the completion of the network event in seconds. | |
dvc | The device that reported the traffic event. | |
packets | The total count of packets reported in the event | |
protocol_version | Version of the OSI layer 3 protocol. | |
rule | The rule which defines the action that was taken in the network event. Note: This is a string value. Use rule_id for rule fields that are integer data types. The rule_id field is optional, so it is not included in the data model | |
src | Source of traffic. Aliased from src_ip | |
src_ip | Source IP address of the traffic | |
src_port | Source port of the traffic | |
src_translated_ip | The NATed IPv4 or IPv6 address from which a packet has been sent. | |
src_translated_port | The NATed port from which a packet has been sent. Note: Do not translate the values of this field to strings (tcp/80 is 80, not http). | |
src_zone | The network zone of the source | |
tcp_flag | The TCP flag or multiple flags specified in the event | |
transport | The OSI layer 4 (transport) protocol of the traffic observed, in lower case. | |
vendor | The name of the cloud provider | |
vendor_product | The specific service/product generating the event | |
vlan | The virtual local area network (VLAN) specified in the record | |
vpc | The virtual private cloud (VPC) specified in the record | |
Authentication | ||
action | Describes an action taken on a resource - for auth, this is typically success or failure | |
assumed_role_user | For events that assign temp credentials - the credential that is returned for use | |
dest | Destination/target of the authentication | |
dest_ip | Destination IP address for the traffic | |
event_name | Title for the event. | |
group_name | Specifies the group associated with the identity if provided | |
http_user_agent | User agent presented with request | |
mfa_auth | Typically a boolen, indicating if mfa is used for authenticaiton | |
msg | Error/code returned to caller | |
region | The region where the auth takes place | |
src | Source of traffic. Aliased from src_ip | |
src_user | Identifier for the user making the request | |
user_type | Type of user that made the request | |
vendor | The name of the cloud provider | |
vendor_product | The specific service/product generating the event |