Custom Parser

When checking out the detailed metrics you may find that log entries aren’t being parsed. Maybe the log format has changed or you’re logging additional data the author didn’t anticipate. The best thing is to add your own parser.

Types of Parsers

There are several type of parsers and they are used in stages. Some are designed to work with the raw log entries while others are designed to take pre-parsed data and add or enrich it. This way you can do branching and not every parser needs to now how to read a syslog message.

Their Local Path will tell you what stage they kick in at. Use sudo cscli parsers list to display the details. s00-raw works with the ‘raw’ files while s01 and s02 work further down the pipeline. Currently, you can only create s00 and s01 level parsers.

Integrating with Scenarios

Useful parsers supply data that Scenarios are interested in. You can create a parser that watches the system logs for ‘FOOBAR’ entries, extracts the ‘FOOBAR-LEVEL`, and passes it on. But if nothing is looking for ‘FOOBARs’ then nothing will happen.

Let’s say you’ve added the Caddy collection. It’s pulled in a bunch of Scenarios you can view with sudo cscli scenarios list. If you look at one of the assicated files you’ll see a filter section where they look for ’evt.Meta.http_path’ and ’evt.Parsed.verb’. They are all different though, so how do you know what data to supply?

Your best bet is to take an existing parser and modify it.


Note - CrowdSec is pretty awesome and after talking in the discord they’ve already accomodated both these scenarios within a relase cycle or two. So these two examples are solved. I’m sure you’ll find new ones, though ;-)

A Web Example

Let’s say that you’ve installed the Caddy collection, but you’ve noticed basic auth login failures don’t trigger the parser. So let’s add a new file and edit it.

sudo cp /etc/crowdsec/parsers/s01-parse/caddy-logs.yaml /etc/crowdsec/parsers/s01-parse/caddy-logs-custom.yaml

You’ll notice two top level sections where the parsing happens; nodes and statics and some grok pattern matching going on.

Nodes allow you try multiple patterns and if any match, the whole section is considered successful. I.e. if the log could have either the standard HTTPDATE or a CUSTOMDATE, as long as it has one it’s good and the matching can move on. Statics just goes down the list extracting data. If any fail the whole event is considered a fail and dropped as unparseable.

All the pasrsed data gets attached to event as ’evt.Parsed.something’ and some of the statics are moving it to evt values the Senarios will be looking for Caddy logs are JSON formatted and so basically already parsed and this example makes use of the JsonExtract method quite a bit.

# We added the caddy logs in the acquis.yaml file with the label 'caddy' and so we use that as our filter here
filter: "evt.Parsed.program startsWith 'caddy'"
onsuccess: next_stage
# debug: true
name: caddy-logs-custom
description: "Parse custom caddy logs"
 CUSTOMDATE: '%{DAY:day}, %{MONTHDAY:monthday} %{MONTH:month} %{YEAR:year} %{TIME:time} %{WORD:tz}'
  - nodes:
    - grok:
        pattern: '%{NOTSPACE} %{NOTSPACE} %{NOTSPACE} \[%{HTTPDATE:timestamp}\]%{DATA}'
        expression: JsonExtract(evt.Line.Raw, "common_log")
          - target: evt.StrTime
            expression: evt.Parsed.timestamp
    - grok:
        pattern: "%{CUSTOMDATE:timestamp}"
        expression: JsonExtract(evt.Line.Raw, "resp_headers.Date[0]")
          - target: evt.StrTime
            expression: + " " + evt.Parsed.month + " " + evt.Parsed.monthday + " " + evt.Parsed.time + ".000000" + " " + evt.Parsed.year
    - grok:
        pattern: '%{IPORHOST:remote_addr}:%{NUMBER}'
        expression: JsonExtract(evt.Line.Raw, "request.remote_addr")
    - grok:
        pattern: '%{IPORHOST:remote_ip}'
        expression: JsonExtract(evt.Line.Raw, "request.remote_ip")
    - grok:
        pattern: '\["%{NOTDQUOTE:http_user_agent}\"]'
        expression: JsonExtract(evt.Line.Raw, "request.headers.User-Agent")
  - meta: log_type
    value: http_access-log
  - meta: service
    value: http
  - meta: source_ip
    expression: evt.Parsed.remote_addr
  - meta: source_ip
    expression: evt.Parsed.remote_ip
  - meta: http_status
    expression: JsonExtract(evt.Line.Raw, "status")
  - meta: http_path
    expression: JsonExtract(evt.Line.Raw, "request.uri")
  - target: evt.Parsed.request #Add for http-logs enricher
    expression: JsonExtract(evt.Line.Raw, "request.uri")
  - parsed: verb
    expression: JsonExtract(evt.Line.Raw, "request.method")
  - meta: http_verb
    expression: JsonExtract(evt.Line.Raw, "request.method")
  - meta: http_user_agent
    expression: evt.Parsed.http_user_agent
  - meta: target_fqdn
    expression: JsonExtract(evt.Line.Raw, "")
  - meta: sub_type
    expression: "JsonExtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'request.headers.Authorization[0]') startsWith 'Basic ' ? 'auth_fail' : ''"

The very last line is where a status 401 is checked. It looks for a 401 and a request for Basic auth. However, this misses events where someone asks for a resource that is protected and the serer responds telling you Basic is needed. I.e. when a bot is poking at URLs on your server ignoring the prompts to login. You can look at the log entries more easily with this command to follow the log and decode it while you recreate failed attempts.

sudo tail -f /var/log/caddy/access.log | jq

To change this, update the expression to also check the response header with an additional ? (or) condition.

    expression: "JsonExtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'request.headers.Authorization[0]') startsWith 'Basic ' ? 'auth_fail' : ''"xtract(evt.Line.Raw, 'status') == '401' && JsonExtract(evt.Line.Raw, 'resp_headers.Www-Authenticate[0]') startsWith 'Basic ' ? 'auth_fail' : ''"

Syslog Example

Let’s say you’re using dropbear and failed logins are not being picked up by the ssh parser

To see what’s going on, you use the crowdsec command line interface. The shell command is cscli and you can ask it about it’s metrics to see how many lines it’s parsed and if any of them are suspicious. Since we just restarted, you may not have any syslog lines yet, so let’s add some and check.

ssh [email protected]
logger "This is an innocuous message"

cscli metrics
INFO[28-06-2022 02:41:33 PM] Acquisition Metrics:
| file:/var/log/messages | 1          | -            | 1              | -                      |

Notice that the line we just read is unparsed and that’s OK. That just means it wasn’t an entry the parser cared about. Let’s see if it responds to an actual failed login.


# Enter some bad passwords and then exit with a Ctrl-C. Remember, localhost attempts are whitelisted so you must be remote.
[email protected]'s password:
[email protected]'s password:

cscli metrics
INFO[28-06-2022 02:49:51 PM] Acquisition Metrics:
| file:/var/log/messages | 7          | -            | 7              | -                      |

Well, no luck. We will need to adjust the parser

sudo cp /etc/crowdsec/parsers/s01-parse/sshd-logs.yaml /etc/crowdsec/parsers/s01-parse/sshd-logs-custom.yaml

Take a look at the logfile and copy an example line over to Use a pattern like

Bad PAM password attempt for '%{DATA:user}' from %{IP:source_ip}:%{INT:port}

Assuming you get the pattern worked out, you can then add a section to the bottom of the custom log file you created.

  - grok:
      name: "SSHD_AUTH_FAIL"
      pattern: "Login attempt for nonexistent user from %{IP:source_ip}:%{INT:port}"
      apply_on: message

Last modified June 8, 2023: Added Crowdsec Pages (72da49f)