Grok

Note - in progress


The two best resources are the grok debugger

http://grokdebug.herokuapp.com/

and the list of patterns available by default.

Alternative Patterns

If you see output with a parse failure like below, you need a custom (or different) pattern.

{
       "message" => "test message",
      "@version" => "1",
    "@timestamp" => "2014-10-31T17:39:28.925Z",
          "host" => "some.app.private",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

The best way to start is to change your match to a simple pattern and work out from there. Try the 'GREEDYDATA' pattern and assign it to a field named 'Test'. This takes the form of:

%{GREEDYDATA:Test}

And it looks like: 

filter {
  grok {
    match => [ "message" => "%{GREEDYDATA:Test}" ]
  }
}

{
       "message" => "test message",
      "@version" => "1",
    "@timestamp" => "2014-10-31T17:39:28.925Z",
          "host" => "some.app.private",
          "Test" => "The rest of your message
}

That should give you some output. You can then start cutting it up with the patterns (also called macros) listed above.
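
For example, if your lines happened to start with an ISO8601 timestamp (an assumption just for illustration), you could peel that off first and keep the remainder in GREEDYDATA. A minimal sketch, with arbitrary field names:

filter {
  grok {
    # TIMESTAMP_ISO8601 is one of the default patterns; 'Timestamp' and 'Rest' are illustrative names
    match => { "message" => "%{TIMESTAMP_ISO8601:Timestamp} %{GREEDYDATA:Rest}" }
  }
}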

Optional Fields
Some log formats simply skip columns when they don't have data. This will cause your parse to fail unless you make some fields optional with a '?', like this:

match => [ "message", "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"]

Making your own patterns from existing patterns

You may need to pull together several existing patterns into a new one. You do that with (?<someName>%{someMacro} %{someOtherMacro}). Here's an example that builds a custom timestamp:

Example:
Sun Oct 26 22:20:55 2014 File does not exist: /var/www/html/favicon.ico

Pattern:
match => { "message" => "(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR})"}

Notice the '?<timestamp>' at the beginning of the parenthetical enclosure. That tells the pattern-matching engine to capture everything the enclosure matches into a field named 'timestamp', similar to how \( \) and \1 work in sed.
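
Putting that to work on the example line above, you might also capture the trailing text (the 'ErrorText' field name is just illustrative):

filter {
  grok {
    # The named group captures the four-part date; GREEDYDATA keeps whatever follows it
    match => { "message" => "(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}) %{GREEDYDATA:ErrorText}" }
  }
}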

Direct patterns

Grok uses Oniguruma regular expressions, so when no predefined pattern fits you can write raw regex directly in a match:

http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
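
For example, borrowing the queue-id example from the grok documentation, a raw named capture works without any predefined pattern:

filter {
  grok {
    # A 10- or 11-character uppercase hex string, captured into a field named 'queue_id'
    match => { "message" => "(?<queue_id>[0-9A-F]{10,11})" }
  }
}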


Date

The date filter uses Joda-Time format strings, documented here:

http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html



Notes:

Multiple Conf files
Logstash uses all the files in the conf.d directory - even if they don't end in .conf.

Default Index
By default, logstash writes to a daily elasticsearch index named:

logstash-%{+YYYY.MM.dd}
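
If you want something else, you can set the index name yourself in the elasticsearch output (the name below is just an example):

output {
  elasticsearch {
    host => "some.server.private"
    protocol => "http"
    # Overrides the default 'logstash-%{+YYYY.MM.dd}'
    index => "myapp-%{+YYYY.MM.dd}"
  }
}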




You can test that anything is working at all at the command line
    sudo /opt/logstash/bin/logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'


And test your config
    /opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash-shib.conf --configtest


http://logstash.net/docs/1.4.2/configuration#fieldreferences
The logstash folks have some great material on how to get started. Really top notch.


Deleting

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-delete-mapping.html

curl -XDELETE http://localhost:9200/_all/shib
curl -XDELETE 'http://localhost:9200/_all/shib/_query?q=path:"/var/log/httpd/ssl_request_log"'

The first removes the 'shib' type mapping (and its documents) from every index; the second deletes only the documents matching the query.




You can also directly reference fields once you have grok'd the message

filter {
  grok {
    match => { "message" => "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"}
  }  
  if [request] == "/status" {
    drop {}
  }
}

There are other macros available as well and you can see the details here:

http://logstash.net/docs/1.4.2/filters/grok



When you're working with a custom log source you'll need to build a custom parse string. If that's the case, and it often will be, jump to that section below. 


You generally want to test your config at the command line (hence the output to stdout) and you can launch it interactively like this

[gattis@someHost ~]$ /opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf

Logstash reads new lines by default. If you don't have anyone actually hitting your webserver, but you do have some log entries in the file itself, you can tell logstash to process the existing entries and not save its place in the file. (If you only got output the first time you ran it, that's the sincedb file remembering where it left off; pointing sincedb_path at /dev/null fixes that.)

  file {
    path => "/var/log/httpd/request.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }

That will spit out every line in the file to your terminal, nicely parsed. Take a look at the attributes. If you see a '_grokparsefailure' tag, you'll need to look at parsing in detail to figure out what's wrong.

Dropping Messages you don't want

Oftentimes you'll have entries that you don't care about, and you'll want to drop those. Best practice is to do coarse actions first, so compare and drop with a general conditional like:

filter {
  if [message] =~ /File does not exist/ {
    drop { }
  }
  grok {
    ...
  }
}

http://logstash.net/docs/1.4.2/configuration#conditionals

Dating Messages
By default, logstash date-stamps each message when it sees it. However, there can be a delay between when an action happens and when it gets logged to a file. To remedy this - and to allow you to suck in old files without the date on every event being the same - you apply a date filter.

Note - you actually have to grok the date out into its own field; you can't just attempt to match on the whole message. The combined apache macro below does this for us.


filter {
  grok {
    match => { "message" => "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"}
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

In the above case, 'timestamp' is a parsed field, and you're using the Joda-Time format language to tell the date filter what its component parts are:
http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html
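
The match can also list several candidate formats when a source is inconsistent; the first one that fits is used. 'ISO8601' here is a built-in keyword rather than a Joda pattern:

date {
  # Tries the apache-style format first, then falls back to ISO8601
  match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z", "ISO8601" ]
}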

Sending the Data

If everything looks good, then you can change the output section to send that message to the destination of your choice. This is often an elasticsearch server, but you may want to send it to a syslog server, or both at once.

input {
  file {
    path => "/var/pdweb/www-default/log/request.log"
    sincedb_path => "/opt/logstash/sincedb"
  }
}

filter {
  grok {
    match => [ "message", "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"]
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }

}

output {
  elasticsearch {
    host => "some.server.private"
    protocol => "http"
  }
  syslog {
    host => "some.syslog.server"
    port => "514"
    severity => "notice"
    facility => "daemon"
  }
}
