ElasticIntel

Serverless, low cost, threat intel aggregation for enterprise or personal use, backed by ElasticSearch.

About

An alternative to expensive threat intel aggregation platforms which ingest the same data feeds you could get for free.

ElasticIntel is designed to provide a central, scalable and easily queryable repository for threat intelligence of all types.

Utilizes Amazon Web Services (Lambda, Elasticsearch Service, S3, SNS) to keep support needs minimal while maintaining scalability, resilience and performance.

Disclaimer

Currently documentation for this project is lacking due to time constraints. This is actively being fixed and should be much more verbose in a few days. Please check back soon if you're not ready to jump in blind :)

Features

  • Serverless - No maintenance required
  • Scalable (all services scale via AWS)
  • High performance API - API can be used to run extremely high volume queries
  • Flexible - Feeds can be added via a simple JSON feed configuration
  • Extensible - written in Python and easily extended with new modules
  • Cost-effective - Pay only for the backend services - don't worry about API limits
  • Automated Deployment - platform can be deployed from a single command
  • Works "out of the box" - comes pre-configured with 30+ open-source intel feeds

Why ElasticIntel

ElasticIntel is the answer to a frustration which arose when evaluating various paid threat intel products and feeds. After reviewing the data from several of these services, I found that 90% of the data they were returning was data from publicly (and freely) available sources, simply aggregated into one place.

Even more frustrating was the fact that nearly all of them wanted to charge insane amounts for API access to this same data, with volume limits that made it nearly impossible to query the data at any significant scale without paying even more.

Enrichment

  • Whois enrichment
  • < more to come >
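
As a rough illustration only, a minimal whois enrichment step could look like the sketch below; the python-whois package and the output field names are assumptions, not the project's actual enrichment module:

```python
# Rough sketch of a whois enrichment step. Assumptions: the python-whois
# package is installed, and the returned field names are illustrative,
# not the project's actual enrichment schema.
import whois


def enrich_domain(indicator):
    """Return basic whois attributes for a domain indicator."""
    record = whois.whois(indicator)
    return {
        "indicator": indicator,
        "registrar": record.registrar,
        "creation_date": str(record.creation_date),
        "name_servers": record.name_servers,
    }


if __name__ == "__main__":
    print(enrich_domain("example.com"))
```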

Architecture

  1. Feed Scheduler lambda - The feed scheduler lambda runs once an hour, just like a cron job. It downloads the configurations for all feeds, checks their scheduled download times and puts a download job into an SNS topic when a feed needs to be downloaded.

  2. Ingest Feed Lambda - The ingest lambda is triggered by messages arriving on an SNS topic. When a message arrives, the ingest lambda reads the message, parses out the information about the intel feed and downloads the feed itself. Once downloaded, the ingest lambda stores a copy of the feed in S3 and then parses out the data in the feed. Once the data is parsed, the ingest lambda puts the data into the intel index in Elasticsearch for easy querying.

  • intel objects are defined as a set of values (JSON)
  • intel feed objects define the feed itself (url, type (xml, csv, json), schedule)
  • intel feeds may be added simply by defining a new feed configuration in the feeds directory (see the sketch below)
  • for API-based intel feeds, modules may be added in the form of Python scripts and imported into the main feed manager
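
As a rough illustration of what such a feed configuration could contain (the exact field names here are assumptions; the files already in the feeds directory are the authoritative schema):

```python
# Illustrative only: a hypothetical feed configuration mirroring the fields
# described above (url, type, schedule). The real schema lives in the feeds
# directory and may differ.
import json

example_feed = json.loads("""
{
    "name": "example-ip-blocklist",
    "url": "https://example.com/blocklist.txt",
    "type": "csv",
    "schedule": "daily"
}
""")

print(example_feed["name"], example_feed["url"])
```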

Feed ingestion is done via a series of lambdas:

  • Feed scheduler:
    • the scheduler lambda runs once an hour, reads the various config files and determines if any feeds need to be pulled in
    • If a feed is determined to need refreshing, the scheduler lambda launches a new lambda to pull down that feed
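
A minimal sketch of that scheduler logic, assuming boto3 and an SNS topic ARN passed in via an environment variable; the config-loading and schedule-check helpers are illustrative stand-ins, not the project's actual code:

```python
# Minimal sketch of the scheduler step: load feed configs, decide which feeds
# are due, and publish one SNS message per due feed. The helpers and the
# INGEST_TOPIC_ARN environment variable are illustrative assumptions.
import json
import os

import boto3

sns = boto3.client("sns")
TOPIC_ARN = os.environ.get("INGEST_TOPIC_ARN", "")  # assumed env var


def load_feed_configs():
    """Stand-in for reading the JSON files in the feeds directory."""
    return [{"name": "example-ip-blocklist",
             "url": "https://example.com/blocklist.txt",
             "type": "csv",
             "schedule": "daily"}]


def feed_is_due(feed):
    """Stand-in for the real check against the feed's scheduled download times."""
    return feed.get("schedule") == "daily"


def handler(event, context):
    # One message per feed that needs refreshing; the ingest lambda is
    # subscribed to this topic and performs the actual download.
    for feed in load_feed_configs():
        if feed_is_due(feed):
            sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(feed))
```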

Feed ingestion

Feeds are ingested through the ingestfeed lambda function. This function is passed an event containing a feed dictionary, as well as the ES index where the indicators from the feed will be stored.

This function then reads the feed dictionary, downloads the appropriate data from the feed url, saves that data to an S3 bucket as a timestamped file, parses that data into intel objects and finally indexes the feed data in the specified ES index.
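
A hedged sketch of that flow, using boto3, requests and elasticsearch-py; the bucket, endpoint, index and field names are placeholders, and the line-based parser stands in for the real per-feed-type parsing:

```python
# Hedged sketch of the ingest flow: read the feed dict from the SNS event,
# download the feed, archive a timestamped copy to S3, then index parsed
# indicators into Elasticsearch. Bucket, endpoint, index and field names are
# placeholders; the line-based parse stands in for the real per-type parsers.
import json
from datetime import datetime

import boto3
import requests
from elasticsearch import Elasticsearch

s3 = boto3.client("s3")
es = Elasticsearch(["https://your-es-endpoint"])  # placeholder endpoint


def handler(event, context):
    feed = json.loads(event["Records"][0]["Sns"]["Message"])
    raw = requests.get(feed["url"], timeout=30).text

    # Archive a timestamped copy of the raw feed in S3.
    key = "{}/{}.txt".format(feed["name"], datetime.utcnow().isoformat())
    s3.put_object(Bucket="your-intel-bucket", Key=key, Body=raw.encode())

    # Naive parse: one indicator per line; real parsing depends on feed type.
    for line in raw.splitlines():
        if line and not line.startswith("#"):
            es.index(index="intel",
                     body={"indicator": line.strip(),
                           "feed": feed["name"],
                           "@timestamp": datetime.utcnow().isoformat()})
```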

Elasticsearch

It is important to note that intel is not deduplicated. Each feed is queried daily and some intel may appear in a feed across multiple days. This is by design, to allow a historical view of indicators.

However, this may not be the behavior you expect by default when querying against the data, so it is important to realize that the number of times an indicator shows up is not necessarily indicative of a high threat score.
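
For example, to see how often (and from which feeds) a given indicator has been recorded, an aggregation query is more informative than a raw document count. A minimal sketch using the elasticsearch-py client, where the endpoint, index name and field names are assumptions about the schema:

```python
# Illustrative query: count sightings of one indicator per feed. The endpoint,
# index name ("intel") and field names ("indicator", "feed") are assumptions
# about the schema.
from elasticsearch import Elasticsearch

es = Elasticsearch(["https://your-es-endpoint"])  # placeholder endpoint

resp = es.search(index="intel", body={
    "size": 0,
    "query": {"term": {"indicator": "198.51.100.7"}},
    "aggs": {"per_feed": {"terms": {"field": "feed"}}},
})

for bucket in resp["aggregations"]["per_feed"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```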

Setup

Requirements note

If pip3 fails on the crypto install, make sure libssl-dev is installed (sudo apt install libssl-dev).

Issues

  • Elasticsearch, while extremely powerful in its query language, has a very high barrier to entry. For actively slicing and dicing data, piping or copying data into Splunk may yield more malleable data.

  • Queries are best written in the developer tools section of Kibana

Recommended Reading

AWS Elasticsearch Service: http://docs.aws.amazon.com/elasticsearch-service/latest

Understanding Elasticsearch upgrades

AWS Elasticsearch Service takes a large amount of the hassle out of running your own Elasticsearch cluster. However, it is important to note that even with this abstraction, the variables left to the end user to manage are still important decisions.

  • Dedicated masters

    • Dedicated masters are used to handle all the operational chores of running an Elasticsearch cluster. They do not hold data, but manage indices, shards, etc. The project ships with some relatively sane defaults that should be plenty to get you off the ground and collecting intel. However, as usage and data size grow, it is important to make sure the dedicated master size and the count of dedicated masters are also increased. This is a manual process and is managed by changing variables in the terraform scripts; a sketch of inspecting these settings via the AWS API follows this list. Further reading: http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-managedomains.html
  • Upgrading or modifying the Elasticsearch service domain

    • When modifying or changing an Elasticsearch domain, a new cluster is spun up, data is copied over and then the old cluster is shut down. In doing this, you will incur charges for running both clusters for an hour; after the data is copied over, the old cluster is shut down and you are charged only for the newly running cluster.
  • Multi-zone awareness

    • The default ships with this disabled. For production it is recommended that this be enabled.
      • note: enabling multi-zone awareness requires an even number of instances and master nodes.
  • Migrating/Upgrading to a new version: see http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-version-migration.html

  • Sizing ElasticSearch Domains: https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/sizing-domains.html
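
As referenced in the dedicated masters note above, here is a hedged sketch of inspecting these cluster settings through the AWS API with boto3; the domain name is a placeholder, and in this project the values are normally managed through the Terraform variables rather than the API:

```python
# Hedged sketch: inspect the dedicated-master and zone-awareness settings of
# an Elasticsearch Service domain via boto3. The domain name is a placeholder;
# in this project these values are normally managed through the Terraform
# variables rather than the API.
import boto3

es_svc = boto3.client("es")

domain = es_svc.describe_elasticsearch_domain(DomainName="your-domain-name")
config = domain["DomainStatus"]["ElasticsearchClusterConfig"]

print("Dedicated masters enabled:", config.get("DedicatedMasterEnabled"))
print("Dedicated master count:", config.get("DedicatedMasterCount"))
print("Zone awareness enabled:", config.get("ZoneAwarenessEnabled"))
```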