updown.io – Website monitoring, simple and inexpensive

Known issues when monitoring an AWS WAF protected website

A tightly configured AWS Web Application Firewall (WAF) can block updown.io's requests to your website, especially when the Bot Control feature is enabled. The firewall is designed to stop bots, so updown.io needs to be explicitly allowed through.

Here is a list of some known problems and potential solutions to check. If you have suggestions for changes to this page please let us know.

My website is up but updown reports 403 Forbidden?

This likely means AWS is blocking our requests. To allow updown requests as precisely as possible, without letting too many other bots through, you first need to find out exactly which rule is blocking them.

Here are some steps to do that, provided by one of our clients who [investigated such an issue (read more)](https://github.com/freelawproject/courtlistener/issues/3336). They also apply to any bot other than updown that you would like to allow.

1. Identify which rule category is blocking the request.

This is done by enabling the ACL's logs so that they're sent to S3 and then querying them with Athena.
Enable the WAF logs and create the non-partitioned table for them as documented here.

With the table in place, it can be queried with:

WITH waf_data AS
    (SELECT from_unixtime(waf.timestamp / 1000) as time,
    waf.terminatingRuleId,
    waf.action,
    waf.httprequest.clientip as clientip,
    waf.httprequest.requestid as requestid,
    waf.httprequest.country as country,
    waf.httprequest.uri as uri,
    rulegroup.terminatingrule.ruleid as matchedRule,
    labels as Labels,
         map_agg(LOWER(f.name),
         f.value) AS kv
    FROM waf_logs waf,
    UNNEST(waf.httprequest.headers) AS h(f),
    UNNEST(waf.rulegrouplist) AS g(rulegroup)
    --WHERE rulegroup.terminatingrule.ruleid IS NOT NULL
    GROUP BY 1, 2, 3, 4, 5, 6, 7,8,9)
SELECT waf_data.time,
       waf_data.action,
       waf_data.terminatingRuleId,
       waf_data.matchedRule,
       waf_data.kv['user-agent'] as UserAgent,
       waf_data.clientip,
       waf_data.country,
       waf_data.Labels,
       waf_data.uri
FROM waf_data
WHERE
  waf_data.uri like '/up%'
  --waf_data.kv['user-agent'] like 'updown%'
ORDER BY time DESC
LIMIT 10

Note that this query is filtered on the URL path used for monitoring (/up here), with an alternative filter on the updown User-Agent left commented out, so you'll need to adjust it to your setup. The query should give a result similar to:

3 2023-11-01 01:24:21.000 BLOCK allow-monitoring-uri updown.io daemon 2.9 104.238.136.194 US [{name=awswaf:managed:token:absent}, {name=awswaf:managed:aws:bot-control:signal:non_browser_user_agent}] /up

Towards the end of that you see non_browser_user_agent, which is one of the categories of content described here. So in this example, that's the rule and category that will need to be adjusted in your WAF rules. The category is officially known as SignalNonBrowserUserAgent.

Caution: on a different setup at a different time, or even at the same time but from different updown locations, it's possible that our requests will end up in a different category. For example this client also found requests in the known_bot_data_center category.
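
Because the category can vary from one request to another, it can help to tally the Bot Control labels across all blocked monitoring requests instead of reading a single row. The following is a minimal Python sketch of that idea. It assumes the WAF logs have been downloaded as JSON lines with the usual `action`, `httpRequest.uri` and `labels` fields. The function name `tally_signal_labels` and the sample records are hypothetical and only there for illustration.

```python
import json
from collections import Counter

# Prefix shared by the Bot Control labels shown in the Athena result above.
BOT_CONTROL_PREFIX = "awswaf:managed:aws:bot-control:"


def tally_signal_labels(log_lines, uri_prefix="/up"):
    """Count Bot Control labels on BLOCKed requests whose URI starts with uri_prefix."""
    counts = Counter()
    for line in log_lines:
        record = json.loads(line)
        if record.get("action") != "BLOCK":
            continue
        uri = record.get("httpRequest", {}).get("uri", "")
        if not uri.startswith(uri_prefix):
            continue
        for label in record.get("labels", []):
            name = label.get("name", "")
            if name.startswith(BOT_CONTROL_PREFIX):
                # Keep only the part after the common prefix, e.g. "signal:...".
                counts[name[len(BOT_CONTROL_PREFIX):]] += 1
    return counts


# Hypothetical sample records shaped like WAF log entries (not real data).
sample_lines = [
    json.dumps({
        "action": "BLOCK",
        "httpRequest": {"uri": "/up"},
        "labels": [
            {"name": "awswaf:managed:token:absent"},
            {"name": "awswaf:managed:aws:bot-control:signal:non_browser_user_agent"},
        ],
    }),
    json.dumps({
        "action": "BLOCK",
        "httpRequest": {"uri": "/up"},
        "labels": [
            {"name": "awswaf:managed:aws:bot-control:signal:known_bot_data_center"},
        ],
    }),
    json.dumps({"action": "ALLOW", "httpRequest": {"uri": "/up"}, "labels": []}),
]

signal_counts = tally_signal_labels(sample_lines)
for name, count in sorted(signal_counts.items()):
    print(name, count)
```

Every label that shows up in the tally is a category you may need to handle in the WAF rules.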

2. Quick fix

In the bot rule configuration, change SignalNonBrowserUserAgent and KnownBotDataCenter to Count. This lets updown traffic through, but it also lets through traffic from many other bots. The following section shows how to be more granular.
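
If you manage the web ACL as JSON rather than through the console, the same change can probably be expressed as rule action overrides on the Bot Control managed rule group. The Python sketch below only builds and prints such a fragment. It assumes the WAFv2 `RuleActionOverrides` shape and the `AWSManagedRulesBotControlRuleSet` group name, and it uses the rule names exactly as written in this article, so check all of them against your own ACL before applying anything. `build_count_overrides` is a hypothetical helper name.

```python
import json

# Rule names as used in this article; verify them in your own web ACL.
RULES_TO_COUNT = ["SignalNonBrowserUserAgent", "KnownBotDataCenter"]


def build_count_overrides(rule_names):
    """Return a managed rule group statement that switches the given rules to Count."""
    return {
        "ManagedRuleGroupStatement": {
            "VendorName": "AWS",
            # Assumed name of the Bot Control managed rule group.
            "Name": "AWSManagedRulesBotControlRuleSet",
            "RuleActionOverrides": [
                {"Name": name, "ActionToUse": {"Count": {}}}
                for name in rule_names
            ],
        }
    }


statement = build_count_overrides(RULES_TO_COUNT)
print(json.dumps(statement, indent=2))
```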

3. Better fix

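Keep the two Bot Control rules above in Count mode so that matching requests are labeled rather than blocked, then add a new rule to the ACL after the Bot Control rule group. The new rule blocks every request carrying the SignalNonBrowserUserAgent or KnownBotDataCenter label, except on the path requested by the updown.io bot.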

The rule can look like this:

{
  "Name": "allow-monitoring-uri",
  "Priority": 3,
  "Action": {
    "Block": {}
  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "match_rule"
  },
  "Statement": {
    "AndStatement": {
      "Statements": [
        {
          "OrStatement": {
            "Statements": [
              {
                "LabelMatchStatement": {
                  "Scope": "LABEL",
                  "Key": "awswaf:managed:aws:bot-control:signal:non_browser_user_agent"
                }
              },
              {
                "LabelMatchStatement": {
                  "Scope": "LABEL",
                  "Key": "awswaf:managed:aws:bot-control:signal:known_bot_data_center"
                }
              }
            ]
          }
        },
        {
          "NotStatement": {
            "Statement": {
              "ByteMatchStatement": {
                "FieldToMatch": {
                  "UriPath": {}
                },
                "PositionalConstraint": "STARTS_WITH",
                "SearchString": "/up",
                "TextTransformations": [
                  {
                    "Type": "NONE",
                    "Priority": 0
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

In this example, the path updown monitors starts with /up and so the filtering was configured on this path. But you could also allow the bot through by using the user agent reported by Athena, above.
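
To sanity-check the logic before deploying it, the rule's condition can be mirrored in a few lines of Python. This is only a sketch of the boolean logic (block when one of the two labels is present and the path does not start with /up), not an emulation of AWS WAF. `would_block` is a hypothetical helper name.

```python
# Labels matched by the OrStatement of the rule above.
BLOCKED_LABELS = {
    "awswaf:managed:aws:bot-control:signal:non_browser_user_agent",
    "awswaf:managed:aws:bot-control:signal:known_bot_data_center",
}
# Path prefix exempted by the NotStatement / ByteMatchStatement.
MONITORING_PREFIX = "/up"


def would_block(labels, uri_path):
    """Mirror the rule: (label A OR label B) AND NOT path starts with /up."""
    has_bot_label = any(label in BLOCKED_LABELS for label in labels)
    is_monitoring_path = uri_path.startswith(MONITORING_PREFIX)
    return has_bot_label and not is_monitoring_path


updown_labels = [
    "awswaf:managed:token:absent",
    "awswaf:managed:aws:bot-control:signal:non_browser_user_agent",
]

print(would_block(updown_labels, "/up"))     # False: monitoring path is exempt
print(would_block(updown_labels, "/login"))  # True: bot label on another path
print(would_block([], "/login"))             # False: no bot label at all
```

Note that a prefix match on /up also exempts paths such as /upload, so a more specific monitoring path may be preferable if your site has other URLs starting with the same characters.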

Adrien Rey-Jarthon

Created on November 02, 2023