airbnb/streamalert

[RFC] App invocation fails when last timestamp not present and using CRON as schedule.

gavinelder opened this issue · 0 comments

Background

When creating a new StreamAlert app that has never been invoked, you may run into an issue when using a CRON expression rather than a rate expression as the schedule.

This is because _last_timestamp is not set, so the application attempts to derive a starting _last_timestamp by converting the rate to seconds and subtracting that from the current timestamp.

Sadly, the application only knows how to handle RATE and fails on a valid CRON expression.

Note: running into this edge case should be rare. However, its rarity may make it hard to troubleshoot, so this issue is deliberately verbose and includes the relevant stack traces to aid discovery via search engines.

The workarounds are as follows:

  • Deploy the app with a rate schedule, then switch to cron after the first run.
  • Update the AWS SSM Parameter Store value manually. This requires detailed knowledge and is not recommended.
  • Handle this case within your app before _gather_logs() is called.

Potential Solutions

It is possible to use a library such as croniter to get the last theoretical invocation time of an application based on its schedule. This, however, would require importing an additional Python package, which may introduce issues in the future.

Another potential approach is to set self._last_timestamp on application deploy via the manage.py CLI. This would avoid both importing an additional library and handling the rate -> previous time conversion to determine this timestamp.

Another option is to leave this issue open as a known issue, document the workarounds, and address it at a later stage, since for many this will be non-blocking. CRON is only mentioned in the docs around scheduled queries, and most users deploying SA for the first time will use rate by following the guides.

Stack Error

From the stack trace it can be seen that there is an error surfacing from _evaluate_interval:

[ERROR] AppConfigError: Invalid 'rate' interval value: cron(0/10 * * * ? *)
Traceback (most recent call last):
  File "/var/task/streamalert/apps/main.py", line 30, in handler
    StreamAlertApp.get_app(event['app_type'])(event, context).gather()
  File "/var/task/streamalert/apps/app_base.py", line 389, in gather
    if not self._initialize():
  File "/var/task/streamalert/apps/app_base.py", line 220, in _initialize
    self._config.set_starting_timestamp(self.date_formatter())
  File "/var/task/streamalert/apps/config.py", line 74, in set_starting_timestamp
    self.start_last_timestamp = self._determine_last_time(date_format)
  File "/var/task/streamalert/apps/config.py", line 262, in _determine_last_time
    interval_time = self._evaluate_interval()
  File "/var/task/streamalert/apps/config.py", line 338, in _evaluate_interval
    '{}'.format(self._schedule))

streamalert/apps/config.py

Looking at where it is called: self.last_timestamp is None, so the code determines the interval and computes a time difference between the rate and the current time. For example, if the rate is 10 minutes, it sets the timestamp to 10 minutes in the past and uses that as the starting point.

    def _determine_last_time(self, date_format):
        """Determine the last time this function was executed and fallback on
        evaluating the rate value if there is no last timestamp available

        Returns:
            int: The unix timestamp for the starting point to fetch logs back to
        """
        if not self.last_timestamp:
            interval_time = self._evaluate_interval()
             ....
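
To make the fallback concrete, here is a minimal standalone sketch of the behaviour described above. It is not the StreamAlert implementation: the helper names rate_to_seconds and fallback_start_timestamp are hypothetical, ValueError stands in for AppConfigError to keep the example self-contained, and the regex is the one quoted further down in this issue.

```python
import re
import time

# Pattern as quoted in this issue (streamalert/apps/config.py).
AWS_RATE_RE = re.compile(r'^rate\(((1) (minute|hour|day)|'
                         r'([2-9]+|[1-9]\d+) (minutes|hours|days))\)$')

_SECONDS_PER_UNIT = {'minute': 60, 'hour': 3600, 'day': 86400}


def rate_to_seconds(schedule):
    """Hypothetical helper: convert 'rate(10 minutes)' into seconds."""
    match = AWS_RATE_RE.match(schedule)
    if not match:
        # A cron expression ends up here, as in the stack trace above.
        raise ValueError("Invalid 'rate' interval value: {}".format(schedule))
    # Groups 2/3 capture the singular form, groups 4/5 the plural form.
    value = match.group(2) or match.group(4)
    unit = (match.group(3) or match.group(5)).rstrip('s')
    return int(value) * _SECONDS_PER_UNIT[unit]


def fallback_start_timestamp(schedule, now=None):
    """Hypothetical helper: start one interval in the past when no
    last timestamp has been stored yet."""
    now = int(time.time()) if now is None else int(now)
    return now - rate_to_seconds(schedule)
```

With a rate of 10 minutes the starting point is 600 seconds before the current time, while a cron schedule raises before any timestamp can be computed.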

Within _evaluate_interval:

streamalert/apps/config.py#330

    def _evaluate_interval(self):
        """Get the interval at which this function is executing. This translates
        an AWS Rate Schedule Expression ('rate(2 hours)') into a second interval
        """
        rate_match = AWS_RATE_RE.match(self._schedule)

        if not rate_match:
            raise AppConfigError('Invalid \'rate\' interval value: '
                                 '{}'.format(self._schedule))

The regex is as follows (streamalert/apps/config.py#32):

AWS_RATE_RE = re.compile(r'^rate\(((1) (minute|hour|day)|'
                         r'([2-9]+|[1-9]\d+) (minutes|hours|days))\)$')
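
As a quick check of why the cron schedule is rejected, the pattern can be exercised on its own. This is only a sketch: is_rate_schedule is a hypothetical helper name, not part of StreamAlert.

```python
import re

# Pattern copied from the issue text above.
AWS_RATE_RE = re.compile(r'^rate\(((1) (minute|hour|day)|'
                         r'([2-9]+|[1-9]\d+) (minutes|hours|days))\)$')


def is_rate_schedule(schedule):
    """Hypothetical helper: True if the pattern accepts the schedule."""
    return AWS_RATE_RE.match(schedule) is not None


for schedule in ('rate(1 hour)', 'rate(10 minutes)', 'cron(0/10 * * * ? *)'):
    print(schedule, '->', is_rate_schedule(schedule))
```

The pattern is anchored on the literal rate( prefix, so any cron(...) expression fails to match and triggers the AppConfigError shown in the stack trace.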

Description

Steps to Reproduce

  1. Deploy new SA App.
  2. Use CRON as schedule.