Logstash

Logstash is a parser and shipper. It reads from (usually) a file, parses the data into JSON, then connects to something else and sends the data. That something else can be Elasticsearch, a syslog server, or one of several other destinations.
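
The skeleton of a config is three blocks. A minimal sketch looks like the below - the path and pattern here are just placeholders, and real examples follow in the later sections.

input {
  file { path => "/var/log/syslog" }
}
filter {
  grok { match => { "message" => "%{SYSLOGLINE}" } }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}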

Logstash vs. Beats

But for most things these days, Beats is a better choice. Give that a look first.

1 - Installation

Note: Before you install logstash, take a look at Elasticsearch’s Beats. It’s lighter-weight for most tasks.

Quick Install

This is a summary of the current install page. Visit that page and adjust versions as needed.

# Install java
apt install default-jre-headless
apt-get install apt-transport-https
apt install gnupg2
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -

# Check for the current version - 7 is no longer the current version by now
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-7.x.list
apt update
apt-get install logstash
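
Once you have a config in place (covered below), you can enable and start the service. On a systemd distro that's presumably:

systemctl enable logstash
systemctl start logstash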

Logstash has a NetFlow module, but it has been deprecated. One should instead use the Filebeat NetFlow Module.
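
If you go that route, enabling the module on a box that already has Filebeat installed from the same elastic repo looks roughly like:

filebeat modules enable netflow
filebeat setup -e
systemctl restart filebeat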

The rest of this page is circa 2014 - use with a grain of salt.

Installation - Linux Clients

Install Java

If you don’t already have it, install it. You’ll need at least 1.7, and Oracle’s is recommended. However, on older systems do yourself a favor and use OpenJDK, as older Sun and IBM versions do things with cryptography that lead to strange bugs in recent releases of logstash.

# On RedHat flavors, install the OpenJDK and select it for use (in case there are others) with the system alternatives utility
sudo yum install java-1.7.0-openjdk

sudo /usr/sbin/alternatives --config java
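
You can confirm which java is now active with:

java -version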

Install Logstash

This is essentially:

( Look at https://www.elastic.co/downloads/logstash to get the latest version, or add the repo)
wget (some link from the above page)
sudo yum --nogpgcheck localinstall logstash*

# You may want to grab a plugin, like the syslog output, though the elasticsearch output is installed by default
cd /opt/logstash/
sudo bin/plugin install logstash-output-syslog

# If you're ready to configure the service
sudo vim /etc/logstash/conf.d/logstash.conf

sudo service logstash start

https://www.elastic.co/guide/en/logstash/current/index.html

Operating

Input

The most common use of logstash is to tail and parse log files. You do this by specifying a file and a filter, like so:

[gattis@someHost ~]$ vim /etc/logstash/conf.d/logstash.conf


input {
  file {
    path => "/var/log/httpd/request.log"
  }
}
filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}"]
  }
}
output {
  stdout {
    codec => rubydebug
  }
}

Filter

There are many different types of filters, but the main one you’ll be using is grok. It’s all about parsing the message into fields. Without this, you just have a bunch of un-indexed text in your database. It ships with some handy macros such as %{COMBINEDAPACHELOG} that takes this:

10.138.120.138 - schmoej [01/Apr/2016:09:39:04 -0400] "GET /some/url.do?action=start HTTP/1.1" 200 10680 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36" 

And turns it into

agent        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36"
auth         schmoej
bytes                   10680
clientip   10.138.120.138
httpversion 1.1
path           /var/pdweb/www-default/log/request.log
referrer   "-"
request   /some/url.do?action=start
response   200
timestamp   01/Apr/2016:09:39:04 -0400
verb        GET 

See the notes on grok’ing for more details.

Output

We’re outputting to the console so we can see what’s going on with our config. If you get some output, but it’s not parsed fully because of an error in the pattern, you’ll see something like the below with a “_grokparsefailure” tag. That means you have to dig into a custom pattern, as described in grok’ing.

Note: by default, logstash is ’tailing’ your logs, so you’ll only see new entries. If you’ve got no traffic, you’ll have to generate some.

{
       "message" => "test message",
      "@version" => "1",
    "@timestamp" => "2014-10-31T17:39:28.925Z",
          "host" => "some.app.private",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

If it looks good, you’ll want to send it on to your database. Change your output to look like the below, which will put your data in a default index that kibana (the visualizer) can show by default.

output {

  elasticsearch {
    hosts => ["10.17.153.1:9200"]
  }
}

Troubleshooting

If you don’t get any output at all, check that the logstash user can actually read the file in question. Check your log files and try running logstash as yourself with the output going to the console.

cat /var/log/logstash/*

/opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf
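
To check the permissions angle specifically, try reading the file as the logstash user (the package normally creates one):

sudo -u logstash head -n 1 /var/log/httpd/request.log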

2 - Operation

Basic Operation

Generally, you create a config with 3 sections:

  • input
  • filter
  • output

This example uses the grok filter to parse the message.

sudo vi /etc/logstash/conf.d/logstash.conf
input {
  file {
        path => "/var/pdweb/www-default/log/request.log"        
      }
}
filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}"]
  }
}
output {
  stdout { }
}

Then you test it at the command line

# Test the config file itself
/opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf --configtest

# Test the parsing of data
/opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf

You should get some nicely parsed lines. If that’s the case, you can edit your config to add a sincedb and an actual destination.

input {
  file {
        path => "/var/pdweb/www-default/log/request.log"
        sincedb_path => "/opt/logstash/sincedb"
  }
}
filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}"]
  }
}
output {
  elasticsearch {
    host => "some.server.private"
    protocol => "http"
  }
}

If instead you see output with a _grokparsefailure like below, you need to change the filter. Take a look at the common gotchas, then the parse failure section below it.

{
       "message" => "test message",
      "@version" => "1",
    "@timestamp" => "2014-10-31T17:39:28.925Z",
          "host" => "some.app.private",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

Common Gotchas

No New Data

Logstash reads new lines by default. If you don’t have anyone actually hitting your webserver, but you do have some log entries in the file itself, you can tell logstash to process the existing entries and not save its place in the file.

file {
  path => "/var/log/httpd/request.log"
  start_position => "beginning"
  sincedb_path => "/dev/null"
}

Multiple Conf files

Logstash uses all the files in the conf.d directory - even if they don’t end in .conf. Make sure to remove any you don’t want as they can conflict.
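
A quick way to see what will actually be loaded:

ls -l /etc/logstash/conf.d/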

Default Index

Logstash creates Elasticsearch indexes that look like:

logstash-%{+YYYY.MM.dd}
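
You can see what actually got created on the elasticsearch side with the _cat API:

curl 'http://localhost:9200/_cat/indices?v'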

The logstash folks have some great material on how to get started. Really top notch.

http://logstash.net/docs/1.4.2/configuration#fieldreferences

Parse Failures

The Greedy Method

The best way to start is to change your match to a simple pattern and work out from there. Try the ‘GREEDYDATA’ pattern and assign it to a field named ‘Test’. This takes the form of:

%{GREEDYDATA:Test}

And it looks like:

filter {
  grok {
    match => [ "message" => "%{GREEDYDATA:Test}" ]
  }
}


       "message" => "test message",
      "@version" => "1",
    "@timestamp" => "2014-10-31T17:39:28.925Z",
          "host" => "some.app.private",
          "Test" => "The rest of your message

That should give you some output. You can then start cutting it up with the stock patterns (also called macros).

You can also use the online grok debugger and the list of default patterns.

Combining Patterns

There may not be a standard pattern for what you want, but it’s easy to pull together several existing ones. Here’s an example that pulls in a custom timestamp.

Example:
Sun Oct 26 22:20:55 2014 File does not exist: /var/www/html/favicon.ico

Pattern:
match => { "message" => "(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR})"}

Notice the ‘?<timestamp>’ at the beginning of the parenthetical enclosure. That’s a named capture: instead of using a stock %{PATTERN:name} macro, you supply your own regex and tell the engine to store whatever it matches in a field named ’timestamp’.
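
Wrapped in a full filter block, with the rest of the line dumped into an illustrative field, that pattern might look like:

filter {
  grok {
    match => { "message" => "(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}) %{GREEDYDATA:logmessage}" }
  }
}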

Optional Fields

Some log formats simply skip columns when they don’t have data. This will cause your parse to fail unless you make some fields optional with a ‘?’, like this:

match => [ "message", "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"]

Date Formats

http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html

Dropping Events

Oftentimes, you’ll have messages that you don’t care about and you’ll want to drop those. Best practice is to do coarse actions first, so you’ll want to compare and drop with a general conditional like:

filter {
  if [message] =~ /File does not exist/ {
    drop { }
  }
  grok {
    ...
    ...

You can also directly reference fields once you have grok’d the message

filter {
  grok {
    match => { "message" => "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"}
  }  
  if [request] == "/status" {
        drop { }
  }
}

http://logstash.net/docs/1.4.2/configuration#conditionals

Dating Messages

By default, logstash date-stamps messages as it sees them. However, there can be a delay between when an action happens and when it gets logged to a file. To remedy this - and allow you to suck in old files without the date on every event being the same - you add a date filter.

Note - you actually have to grok out the date into its own field; you can’t just attempt to match on the whole message. The combined apache macro below does this for us.

filter {
  grok {
    match => { "message" => "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"}
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

In the above case, ’timestamp’ is a parsed field, and you’re using the date language to tell it what the component parts are.

http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html

Sending to Multiple Servers

In addition to an elasticsearch server, you may want to send the data to a syslog server at the same time.

    input {
      file {
        path => "/var/pdweb/www-default/log/request.log"
        sincedb_path => "/opt/logstash/sincedb"
      }
    }

    filter {
      grok {
        match => [ "message", "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"]
      }
      date {
        match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
      }

    }

    output {
      elasticsearch {
        host => "some.server.private"
        protocol => "http"
      }
      syslog {
        host => "some.syslog.server"
        port => "514"
        severity => "notice"
        facility => "daemon"
      }
    }

Deleting Sent Events

Sometimes you’ll accidentally send a bunch of events to the server and need to delete them and resend corrected versions.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-delete-mapping.html

curl -XDELETE 'http://localhost:9200/_all/SOMEINDEX'
curl -XDELETE 'http://localhost:9200/_all/SOMEINDEX?q=path:"/var/log/httpd/ssl_request_log"'

3 - Index Routing

When using logstash as a broker, you will want to route events to different indexes according to their type. You have two basic ways to do this:

  • Using Mutates with a single output
  • Using multiple Outputs

The latter is significantly better for performance. The less you touch the event, the better it seems. When testing these two different configs in the lab, the multiple output method was about 40% faster when under CPU constraint. (i.e. you can always add more CPU if you want to mutate the events.)

Multiple Outputs

    input {
      ...
      ...
    }
    filter {
      ...
      ...
    }
    output {

      if [type] == "RADIUS" {
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "logstash-radius-%{+YYYY.MM.dd}"
        }
      }

      else if [type] == "RADIUSAccounting" {
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "logstash-radius-accounting-%{+YYYY.MM.dd}"
        }
      }

      else {
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "logstash-test-%{+YYYY.MM.dd}"
        }
      }

    }

Mutates

If your source system includes a field that tells you which index to place the event in, you might be able to skip mutating altogether, but often you must look at the contents to make that determination. Doing so does reduce performance.

input {
  ...
  ...
}
filter {
  ...
  ... 

  # Add a metadata field with the destination index based on the type of event this was
  if [type] == "RADIUS" {
    mutate { add_field => { "[@metadata][index-name]" => "logstash-radius" } } 
  }
  else  if [type] == "RADIUSAccounting" {
    mutate { add_field => { "[@metadata][index-name]" => "logstash-radius-accounting" } } 
  }
  else {
    mutate { add_field => { "[@metadata][index-name]" => "logstash-test" } } 
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][index-name]}-%{+YYYY.MM.dd}"
  }
}

https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html#metadata

4 - Database Connections

You can connect Logstash to a database to poll events almost as easily as tailing a log file.

Installation

The JDBC plug-in ships with logstash so no installation of that is needed. However, you do need the JDBC driver for the DB in question.

Here’s an example for DB2, for which you can get the jar from either the server itself or the DB2 fix-pack associated with the DB Version you’re running. The elasticsearch docs say to just put it in your path. I’ve put it in the logstash folder (based on some old examples) and we’ll see if it survives upgrades.

sudo mkdir /opt/logstash/vendor/jars
sudo cp /home/gattis/db2jcc4.jar /opt/logstash/vendor/jars
sudo chown -R logstash:logstash /opt/logstash/vendor/jars

Configuration

Configuring the input

Edit the config file like so

sudo vim /etc/logstash/conf.d/logstash.conf

    input {
      jdbc {
        jdbc_driver_library => "/opt/logstash/vendor/jars/db2jcc4.jar"
        jdbc_driver_class => "com.ibm.db2.jcc.DB2Driver"
        jdbc_connection_string => "jdbc:db2://db1.tim.private:50000/itimdb"
        jdbc_user => "itimuser"
        jdbc_password => "somePassword"
        statement => "select * from someTable"
      }
    }
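
Without a schedule, that statement runs just once. To poll on an interval and only pick up new rows, the jdbc input takes a cron-style schedule and a tracking column. A sketch, reusing the 'id' primary key from the output example further down:

    input {
      jdbc {
        jdbc_driver_library => "/opt/logstash/vendor/jars/db2jcc4.jar"
        jdbc_driver_class => "com.ibm.db2.jcc.DB2Driver"
        jdbc_connection_string => "jdbc:db2://db1.tim.private:50000/itimdb"
        jdbc_user => "itimuser"
        jdbc_password => "somePassword"
        # Poll every five minutes, remembering the highest id seen so far
        schedule => "*/5 * * * *"
        use_column_value => true
        tracking_column => "id"
        statement => "select * from someTable where id > :sql_last_value"
      }
    }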

Filtering

You don’t need to do any pattern matching, as the input emits the event pre-parsed based on the DB columns. You may, however, want to match a timestamp in the database.

    # A sample value in the 'completed' column is 2016-04-07 00:41:03:291 GMT

    filter {
      date {
        match => [ "completed" , "yyyy-MM-dd HH:mm:ss:SSS zzz" ]
      }
    }

Output

One recommended trick is to link the primary keys between the database and elasticsearch. That way, if you run the query again, you update the existing elasticsearch records rather than create duplicate ones. Simply tell the output plugin to use the existing primary key from the database as the document_id when it sends it to elasticsearch.

    # Database key is the column 'id'

    output {
      elasticsearch {
        hosts => ["10.17.153.1:9200"]
        index => "logstash-db-%{+YYYY}"

        document_id => "${id}"

        type => "isim-process"

      }
    }

Other Notes

If any of your columns are of a non-string type, logstash and elasticsearch will happily store them as such. But be warned that kibana will round them to 16 digits due to a limitation of javascript.

https://github.com/elastic/kibana/issues/4356

Sources

https://www.elastic.co/blog/logstash-jdbc-input-plugin
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html

5 - Multiline Matching

Here’s an example that uses the multiline codec (preferred over the multiline filter, as it’s more appropriate when you might have more than one input)

input {
  file {
    path => "/opt/IBM/tivoli/common/CTGIM/logs/access.log"
    type => "itim-access"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^<Message Id"
      negate => true
      what => previous
    }
  }
}

Getting a match can be difficult, as grok by default does not match across multiple lines. You can mutate to remove all the newlines (there’s a sketch of that at the end of this section), or use a seemingly secret preface, the ‘(?m)’ directive, as shown below.

filter {
  grok {
    match => { "message" => "(?m)(?<timestamp>%{YEAR}.%{MONTHNUM}.%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}%{ISO8601_TIMEZONE})%{DATA}com.ibm.itim.security.%{WORD:catagory}%{DATA}CDATA\[%{DATA:auth}\]%{DATA}CDATA\[%{DATA:clientip}\]"}
  }
}

https://logstash.jira.com/browse/LOGSTASH-509
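
As for the mutate alternative mentioned above, a sketch of that, collapsing the newlines before grok runs, would be:

filter {
  mutate {
    # Replace embedded newlines so a single-line grok pattern can match
    gsub => [ "message", "\n", " " ]
  }
}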