Elastic Stack

This is also referred to as ELK, an acronym that stands for Elasticsearch, Logstash and Kibana

This is a trio of tools that www.elasticsearch.org has packaged into a simple and flexible way to handle, store and visualize data. Logstash collects the logs, parses them and stores them in Elasticsearch. Kibana is a web application that knows how to talk to Elasticsearch and visualizes the data.

Quite simple and powerful

To make use of this trio, start by deploying in this order:

  • Elasticsearch (first, so you have some place to put things)
  • Kibana (so you can see what’s going on in elasticsearch easily)
  • Logstash (to start collecting data)

More recently, you can use the Elastic Beats clients in place of Logstash. These are natively compiled clients that have less capability, but are easier on the infrastructure than Logstash, which is a Java application.

1 - Elasticsearch

1.1 - Installation (Linux)

This is circa 2014 - use with a grain of salt.

This is generally the first step, as you need a place to collect your logs. Elasticsearch itself is a NoSQL database and well suited for pure-web style integrations.

Java is required, and you may wish to deploy Oracle’s java per the Elasticsearch team’s recommendation. You may also want to dedicate a data partition. By default, data is stored in /var/lib/elasticsearch and that can fill up. We will also install the ‘kopf’ plugin that makes it easier to manage your data.

Install Java and Elasticsearch

# (add a java repo)
sudo yum install java

# (add the elasticsearch repo)
sudo yum install elasticsearch

# Change the storage location
sudo mkdir /opt/elasticsearch
sudo chown elasticsearch:elasticsearch /opt/elasticsearch

sudo vim /etc/elasticsearch/elasticsearch.yml

    ...
    path.data: /opt/elasticsearch/data
    ...

# Allow connections on ports 9200, 9300-9400 and set the cluster IP

# By design, Elasticsearch is open so control access with care
sudo iptables --insert INPUT --protocol tcp --source 10.18.0.0/16 --dport 9200 --jump ACCEPT

sudo iptables --insert INPUT --protocol tcp --source 10.18.0.0/16 --dport 9300:9400 --jump ACCEPT

sudo vim /etc/elasticsearch/elasticsearch.yml
    ...
    # Failing to set 'publish_host' can result in the cluster auto-detecting an interface clients or other
    # nodes can't reach. If you only have one interface you can leave commented out. 
    network.publish_host: 10.18.3.1
    ...


# Increase the heap size
sudo vim  /etc/sysconfig/elasticsearch

    # Heap size defaults to 256m min, 1g max
    # Set ES_HEAP_SIZE to 50% of available RAM, but no more than 31g
    ES_HEAP_SIZE=2g

# Install the kopf plugin and access it via your browser

sudo /usr/share/elasticsearch/bin/plugin -install lmenezes/elasticsearch-kopf
sudo service elasticsearch restart

In your browser, navigate to

http://10.18.3.1:9200/_plugin/kopf/

If everything is working correctly you should see a web page with KOPF at the top.
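You can also sanity-check the node from the command line (adjust the IP to your own):

curl http://10.18.3.1:9200/
curl http://10.18.3.1:9200/_cluster/health?pretty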

1.2 - Installation (Windows)

You may need to install on Windows to ensure the ‘maximum amount of serviceability with existing support staff’. I’ve used it on both Windows and Linux and it’s fine either way. Windows just requires a few more steps.

Requirements and Versions

The current version of Elasticsearch at time of writing these notes is 7.6. It requires an OS and Java. The latest of those supported are:

  • Windows Server 2016
  • OpenJDK 13

Installation

The installation instructions are at https://www.elastic.co/guide/en/elastic-stack-get-started/current/get-started-elastic-stack.html

Note: Elasticsearch has both a zip and an MSI. The former comes with a Java distro, but the MSI includes a service installer.

Java

The OpenJDK 13 GA releases at https://jdk.java.net/13/ no longer include installers or the JRE, but you can install via an MSI from https://github.com/ojdkbuild/ojdkbuild

Download the latest java-13-openjdk-jre-13.X MSI and execute it. Use the advanced settings to have it configure JAVA_HOME and other useful variables.

To test the install, open a command prompt and check the version

C:\Users\allen>java --version
openjdk 13.0.2 2020-01-14
OpenJDK Runtime Environment 19.9 (build 13.0.2+8)
OpenJDK 64-Bit Server VM 19.9 (build 13.0.2+8, mixed mode, sharing)

Elasticsearch

Download the MSI installer from https://www.elastic.co/downloads/elasticsearch. It may be tagged as beta, but it installs the GA product well. Importantly, it also installs a windows service for Elasticsearch.

Verify the installation by checking your services for ‘Elasticsearch’, which should be running.

Troubleshooting

Elasticsearch only listening on localhost

By default, this is the case. You must edit the config file.

# In an elevated command prompt
notepad C:\ProgramData\Elastic\Elasticsearch\config\elasticsearch.yml

# add
discovery.type: single-node
network.host: 0.0.0.0

https://stackoverflow.com/questions/59350069/elasticsearch-start-up-error-the-default-discovery-settings-are-unsuitable-for

failure while checking if template exists: 405 Method Not Allowed

You can’t run newer versions of Filebeat with older versions of Elasticsearch. Download the old deb and install it with sudo apt install ./some.deb
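For example, something like the below - the version is a placeholder and must match your Elasticsearch’s major/minor version:

wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.17.0-amd64.deb
sudo apt install ./filebeat-7.17.0-amd64.deb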

https://discuss.elastic.co/t/filebeat-receives-http-405-from-elasticsearch-after-7-x-8-1-upgrade/303821
https://discuss.elastic.co/t/cant-start-filebeat/181050

1.3 - Common Tasks

This is circa 2014 - use with a grain of salt.

Configuration of elasticsearch itself is seldom needed. You will, however, have to maintain the data in your indexes. This is done either with the kopf tool or at the command line.

After you have some data in elasticsearch, you’ll see that your ‘documents’ are organized into ‘indexes’. An index is simply a container for your data that was specified when logstash originally sent it, and the naming is arbitrarily defined by the client.

Deleting Data

The first thing you’re likely to need is to delete some badly-parsed data from your testing.

Delete all indexes with the name test*

curl -XDELETE http://localhost:9200/test*

Delete from all indexes documents of type ‘WindowsEvent’

curl -XDELETE http://localhost:9200/_all/WindowsEvent

Delete from all indexes documents have the attribute ‘path’ equal to ‘/var/log/httpd/ssl_request.log’

curl -XDELETE 'http://localhost:9200/_all/_query?q=path:/var/log/httpd/ssl_request.log'

Delete from the index ’logstash-2014.10.29’ documents of type ‘shib-access’

curl -XDELETE http://localhost:9200/logstash-2014.10.29/shib-access

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html

Curator

All the maintenance by hand has to stop at some point and Curator is a good tool to automate some of it. This is a script that will do some curls for you, so to speak.

Install

wget https://bootstrap.pypa.io/get-pip.py
sudo python get-pip.py
sudo pip install elasticsearch-curator
sudo pip install argparse

Use

curator --help
curator delete --help

And in your crontab

# Note: you must escape % characters with a \ in crontabs
20 0 * * * curator delete indices --time-unit days --older-than 14 --timestring '\%Y.\%m.\%d' --regex '^logstash-bb-.*'
20 0 * * * curator delete indices --time-unit days --older-than 14 --timestring '\%Y.\%m.\%d' --regex '^logstash-adfsv2-.*'
20 0 * * * curator delete indices --time-unit days --older-than 14 --timestring '\%Y.\%m.\%d' --regex '^logstash-20.*'

Sometimes you’ll need to do an inverse match.

0 20 * * * curator delete indices --regex '^((?!logstash).)*$'

A good way to test your regex is by using the show indices method

curator show indices --regex '^((?!logstash).)*$'

Here are some old posts and links, but be aware that the syntax has changed and it’s been several versions since these were written.

http://www.ragingcomputer.com/2014/02/removing-old-records-for-logstash-elasticsearch-kibana
http://www.elasticsearch.org/blog/curator-tending-your-time-series-indices/
http://stackoverflow.com/questions/406230/regular-expression-to-match-line-that-doesnt-contain-a-word

Replication and Yellow Cluster Status

By default, elasticsearch assumes you want two nodes that replicate your data, so the default for new indexes is one replica. You may not want that to start with, however, so you can change the default and change the replica settings on your existing data in bulk with:

http://stackoverflow.com/questions/24553718/updating-the-default-index-number-of-replicas-setting-for-new-indices

Set all existing replica requirements to just one copy

curl -XPUT 'localhost:9200/_settings' -d '
{ 
  "index" : { "number_of_replicas" : 0 } 
}'

Change the default settings for new indexes to have just one copy

curl -XPUT 'localhost:9200/_template/logstash_template' -d ' 
{ 
  "template" : "*", 
  "settings" : {"number_of_replicas" : 0 }
} '

http://stackoverflow.com/questions/24553718/updating-the-default-index-number-of-replicas-setting-for-new-indices

Unassigned Shards

You will occasionally have a hiccup where you run out of disk space or something similar and be left with indexes that have no data in them or have shards unassigned. Generally, you will have to delete them but you can also manually reassign them.

http://stackoverflow.com/questions/19967472/elasticsearch-unassigned-shards-how-to-fix
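If you’d rather try reassigning than deleting, a sketch against the cluster reroute API of that era looks like the below. The index, shard number and node name are placeholders to take from your own cluster, and allow_primary accepts losing whatever was on the old primary:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '
{
  "commands" : [
    { "allocate" : { "index" : "logstash-2014.10.29", "shard" : 0, "node" : "SOME-NODE-NAME", "allow_primary" : true } }
  ]
}'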

Listing Index Info

You can get a decent human readable list of your indexes using the cat api

curl localhost:9200/_cat/indices

If you want to list by size, the docs use this example:

curl localhost:9200/_cat/indices?bytes=b | sort -rnk8 

2 - Kibana

2.1 - Installation (Windows)

Kibana is a Node.js app using the Express Web framework - meaning to us it looks like a web server running on port 5601. If you’re running elasticsearch on the same box, it will connect with the defaults.

https://www.elastic.co/guide/en/kibana/current/windows.html
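If Elasticsearch is on a different host, you point Kibana at it in its config file. A minimal sketch - the file lives under the extracted folder and the IP is just an example from earlier in these notes:

# config/kibana.yml
elasticsearch.hosts: ["http://10.18.3.1:9200"]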

Download and Extract

No MSI or installer is available for Windows, so you must download the .zip from https://www.elastic.co/downloads/kibana. Uncompress it (this will take a while), rename it to ‘Kibana’ and move it to Program Files.

So that you may access it later, edit the config file at {location}/config/kibana.yml with WordPad and set the server.host entry to:

server.host: "0.0.0.0"

Create a Service

Download the service manager NSSM from https://nssm.cc/download and extract. Start an admin powershell, navigate to the extracted location and run the installation command like so:

C:\Users\alleng\Downloads\nssm-2.24\nssm-2.24\win64> .\nssm.exe install Kibana

In the pop-up, set the application path to the below. The startup path will auto-populate.

C:\Program Files\Kibana\kibana-7.6.2-windows-x86_64\bin\kibana.bat

Click ‘Install service’ and it should indicate success. Go to the service manager to find and start it. After a minute (check the process manager for the CPU to drop), you should be able to access it at:

http://localhost:5601/app/kibana#/home

2.2 - Troubleshooting

Rounding Errors

Kibana rounds to 16 significant digits

It turns out that if you have a value of type integer, that’s just the limit of JavaScript’s number precision. While elasticsearch shows you this:

    curl http://localhost:9200/logstash-db-2016/isim-process/8163783564660983218?pretty
    {
      "_index" : "logstash-db-2016",
      "_type" : "isim-process",
      "_id" : "8163783564660983218",
      "_version" : 1,
      "found" : true,
      "_source":{"requester_name":"8163783564660983218","request_num":8163783618037078861,"started":"2016-04-07 15:16:16:139 GMT","completed":"2016-04-07 15:16:16:282 GMT","subject_service":"Service","request_type":"EP","result_summary":"AA","requestee_name":"Mr. Requester","subject":"mrRequest","@version":"1","@timestamp":"2016-04-07T15:16:16.282Z"}
    }

Kibana shows you this

Field          Value
request_num    8163783618037079000

Looking at the JSON will give you the clue - it’s being treated as an integer and not a string.

 "_source": {
    "requester_name": "8163783564660983218",
    "request_num": 8163783618037079000,
    "started": "2016-04-07 15:16:16:139 GMT",
    "completed": "2016-04-07 15:16:16:282 GMT",

Mutate it to string in logstash to get your precision back.
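A minimal sketch of that mutate, assuming the field is named request_num as above (new indexes will map it as a string; existing indexes keep their numeric mapping):

filter {
  mutate {
    # Store the big number as a string so Kibana doesn't round it
    convert => { "request_num" => "string" }
  }
}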

https://github.com/elastic/kibana/issues/4356

3 - Logstash

Logstash is a parser and shipper. It reads from (usually) a file, parses the data into JSON, then connects to something else and sends the data. That something else can be Elasticsearch, a syslog server, or others.

Logstash vs Beats

But for most things these days, Beats is a better choice. Give that a look first.

3.1 - Installation

Note: Before you install logstash, take a look at Elasticsearch’s Beats. It’s lighter-weight for most tasks.

Quick Install

This is a summary of the current install page. Visit and adjust versions as needed.

# Install java
apt install default-jre-headless
apt-get install apt-transport-https
apt install gnupg2
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -

# Check for the current version - 7 is no longer the current version by now
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-7.x.list
apt update
apt-get install logstash

Logstash has a NetFlow module, but it has been deprecated. One should instead use the Filebeat Netflow Module.

The rest of this page is circa 2014 - use with a grain of salt.

Installation - Linux Clients

Install Java

If you don’t already have it, install it. You’ll need at least 1.7, and Oracle’s is recommended. However, on older systems do yourself a favor and use the OpenJDK, as older versions of the Sun and IBM JVMs do things with cryptography that lead to strange bugs in recent releases of logstash.

# On RedHat flavors, install the OpenJDK and select it for use (in case there are others) with the system alternatives utility
sudo yum install java-1.7.0-openjdk

sudo /usr/sbin/alternatives --config java

Install Logstash

This is essentially:

( Look at https://www.elastic.co/downloads/logstash to get the latest version or add the repo)
wget (some link from the above page)
sudo yum --nogpgcheck localinstall logstash*

# You may want to grab a plugin, like the syslog output, though the elasticsearch output installs by default
cd /opt/logstash/
sudo bin/plugin install logstash-output-syslog

# If you're ready to configure the service
sudo vim /etc/logstash/conf.d/logstash.conf

sudo service logstash start

https://www.elastic.co/guide/en/logstash/current/index.html

Operating

Input

The most common use of logstash is to tail and parse log files. You do this by specifying a file and filter like so

[gattis@someHost ~]$ vim /etc/logstash/conf.d/logstash.conf


input {
  file {
    path => "/var/log/httpd/request.log"
  }
}
filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}"]
  }
}
output {
  stdout {
    codec => rubydebug
  }
}

Filter

There are many different types of filters, but the main one you’ll be using is grok. It’s all about parsing the message into fields. Without this, you just have a bunch of un-indexed text in your database. It ships with some handy macros such as %{COMBINEDAPACHELOG} that takes this:

10.138.120.138 - schmoej [01/Apr/2016:09:39:04 -0400] "GET /some/url.do?action=start HTTP/1.1" 200 10680 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36" 

And turns it into

agent        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36"
auth         schmoej
bytes                   10680
clientip   10.138.120.138
httpversion 1.1
path           /var/pdweb/www-default/log/request.log
referrer   "-"
request   /some/url.do?action=start
response   200
timestamp   01/Apr/2016:09:39:04 -0400
verb        GET 

See the section on grok’ing for more details.

Output

We’re outputting to the console so we can see what’s going on with our config. If you get some output, but it’s not parsed fully because of an error in the parsing, you’ll see something like the below with a “_grokparsefailure” tag. That means you have to dig into a custom pattern as described in the grok’ing section.

Note: by default, logstash is ’tailing’ your logs, so you’ll only see new entries. If you’ve got no traffic, you’ll have to generate some.

{
       "message" => "test message",
      "@version" => "1",
    "@timestamp" => "2014-10-31T17:39:28.925Z",
          "host" => "some.app.private",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

If it looks good, you’ll want to send it on to your database. Change your output to look like so which will put your data in a default index that kibana (the visualizer) can show by default.

output {

  elasticsearch {
    hosts => ["10.17.153.1:9200"]
  }
}

Troubleshooting

If you don’t get any output at all, check that the logstash user can actually read the file in question. Check your log files and try running logstash as yourself with the output going to the console.

cat /var/log/logstash/*

/opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf
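A quick way to check the permissions angle, assuming the service runs as a ‘logstash’ user:

sudo -u logstash head /var/log/httpd/request.log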

3.2 - Operation

Basic Operation

Generally, you create a config with 3 sections:

  • input
  • filter
  • output

This example uses the grok filter to parse the message.

sudo vi /etc/logstash/conf.d/logstash.conf
input {
  file {
        path => "/var/pdweb/www-default/log/request.log"        
      }
}
filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}"]
  }
}
output {
  stdout { }
}

Then you test it at the command line

# Test the config file itself
/opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf --configtest

# Test the parsing of data
/opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf

You should get some nicely parsed lines. If that’s the case, you can edit your config to add a sincedb and an actual destination.

input {
  file {
        path => "/var/pdweb/www-default/log/request.log"
        sincedb_path => "/opt/logstash/sincedb"
  }
}
filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}"]
  }
}
output {
  elasticsearch {
    host => "some.server.private"
    protocol => "http"
  }
}

If instead you see output with a _grokparsefailure like below, you need to change the filter. Take a look at the common gotchas, then the parse failure section below it.

{
       "message" => "test message",
      "@version" => "1",
    "@timestamp" => "2014-10-31T17:39:28.925Z",
          "host" => "some.app.private",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

Common Gotchas

No New Data

Logstash reads new lines by default. If you don’t have anyone actually hitting your webserver, but you do have some log entries in the file itself, you can tell logstash to process the existing entries and not save its place in the file.

file {
  path => "/var/log/httpd/request.log"
  start_position => "beginning"
  sincedb_path => "/dev/null"
}

Multiple Conf files

Logstash uses all the files in the conf.d directory - even if they don’t end in .conf. Make sure to remove any you don’t want as they can conflict.
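A quick way to see what will be picked up:

ls -l /etc/logstash/conf.d/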

Default Index

Logstash creates Elasticsearch indexes that look like:

logstash-%{+YYYY.MM.dd}

The logstash folks have some great material on how to get started. Really top notch.

http://logstash.net/docs/1.4.2/configuration#fieldreferences

Parse Failures

The Greedy Method

The best way to start is to change your match to a simple pattern and work out from there. Try the ‘GREEDYDATA’ pattern and assign it to a field named ‘Test’. This takes the form of:

%{GREEDYDATA:Test}

And it looks like:

filter {
  grok {
    match => [ "message" => "%{GREEDYDATA:Test}" ]
  }
}


       "message" => "test message",
      "@version" => "1",
    "@timestamp" => "2014-10-31T17:39:28.925Z",
          "host" => "some.app.private",
          "Test" => "The rest of your message

That should give you some output. You can then start cutting it up with the standard grok patterns (also called macros).

You can also use the online grok debugger and the list of default patterns.

Combining Patterns

There may not be a standard pattern for what you want, but it’s easy to pull together several existing ones. Here’s an example that pulls in a custom timestamp.

Example:
Sun Oct 26 22:20:55 2014 File does not exist: /var/www/html/favicon.ico

Pattern:
match => { "message" => "(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR})"}

Notice the ‘?<timestamp>’ at the beginning of the parenthetical enclosure. That’s a named capture: it assigns everything matched inside the parentheses to a single field called ’timestamp’, which is how you build a field when no single predefined grok pattern covers it.

Optional Fields

Some log formats simply skip columns when they don’t have data. This will cause your parse to fail unless you make some fields optional with a ‘?’, like this:

match => [ "message", "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"]

Date Formats

http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html

Dropping Events

Oftentimes, you’ll have messages that you don’t care about and you’ll want to drop those. Best practice is to do coarse actions first, so you’ll want to compare and drop with a general conditional like:

filter {
  if [message] =~ /File does not exist/ {
    drop { }
  }
  grok {
    ...
    ...

You can also directly reference fields once you have grok’d the message

filter {
  grok {
    match => { "message" => "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"}
  }  
  if [request] == "/status" {
        drop { }
  }
}

http://logstash.net/docs/1.4.2/configuration#conditionals

Dating Messages

By default, logstash date-stamps each message when it sees it. However, there can be a delay between when an action happens and when it gets logged to a file. To remedy this - and to allow you to suck in old files without the date on every event being the same - you add a date filter.

Note - you actually have to grok out the date into its own field; you can’t just attempt to match on the whole message. The combined apache macro below does this for us.

filter {
  grok {
    match => { "message" => "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"}
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

In the above case, ’timestamp’ is a parsed field, and you’re using the date filter’s format language to tell it what the component parts are.

http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html

Sending to Multiple Servers

In addition to an elasticsearch server, you may want to send it to a syslog server at the same time.

    input {
      file {
        path => "/var/pdweb/www-default/log/request.log"
        sincedb_path => "/opt/logstash/sincedb"
      }
    }

    filter {
      grok {
        match => [ "message", "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"]
      }
      date {
        match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
      }

    }

    output {
      elasticsearch {
        host => "some.server.private"
        protocol => "http"
      }
      syslog {
        host => "some.syslog.server"
        port => "514"
        severity => "notice"
        facility => "daemon"
      }
    }

Deleting Sent Events

Sometimes you’ll accidentally send a bunch of events to the server and need to delete them and resend corrected versions.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-delete-mapping.html

curl -XDELETE 'http://localhost:9200/_all/SOMEINDEX'
curl -XDELETE 'http://localhost:9200/_all/SOMEINDEX/_query?q=path:"/var/log/httpd/ssl_request_log"'

3.3 - Index Routing

When using logstash as a broker, you will want to route events to different indexes according to their type. You have two basic ways to do this:

  • Using Mutates with a single output
  • Using multiple Outputs

The latter is significantly better for performance. The less you touch the event, the better it seems. When testing these two different configs in the lab, the multiple output method was about 40% faster when under CPU constraint. (i.e. you can always add more CPU if you want to mutate the events.)

Multiple Outputs

    input {
      ...
      ...
    }
    filter {
      ...
      ...
    }
    output {

      if [type] == "RADIUS" {
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "logstash-radius-%{+YYYY.MM.dd}"
        }
      }

      else if [type] == "RADIUSAccounting" {
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "logstash-radius-accounting-%{+YYYY.MM.dd}"
        }
      }

      else {
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "logstash-test-%{+YYYY.MM.dd}"
        }
      }

    }

Mutates

If your source system includes a field that tells you which index to place it in, you might be able to skip mutating altogether, but often you must look at the contents to make that determination. Doing so does reduce performance.

input {
  ...
  ...
}
filter {
  ...
  ... 

  # Add a metadata field with the destination index based on the type of event this was
  if [type] == "RADIUS" {
    mutate { add_field => { "[@metadata][index-name]" => "logstash-radius" } } 
  }
  else  if [type] == "RADIUSAccounting" {
    mutate { add_field => { "[@metadata][index-name]" => "logstash-radius-accounting" } } 
  }
  else {
    mutate { add_field => { "[@metadata][index-name]" => "logstash-test" } } 
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][index-name]}-%{+YYYY.MM.dd}"
  }
}

https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html#metadata

3.4 - Database Connections

You can connect Logstash to a database to poll events almost as easily as tailing a log file.

Installation

The JDBC plug-in ships with logstash so no installation of that is needed. However, you do need the JDBC driver for the DB in question.

Here’s an example for DB2, for which you can get the jar from either the server itself or the DB2 fix-pack associated with the DB Version you’re running. The elasticsearch docs say to just put it in your path. I’ve put it in the logstash folder (based on some old examples) and we’ll see if it survives upgrades.

sudo mkdir /opt/logstash/vendor/jars
sudo cp /home/gattis/db2jcc4.jar /opt/logstash/vendor/jars
sudo chown -R logstash:logstash /opt/logstash/vendor/jars

Configuration

Configuring the input

Edit the config file like so

sudo vim /etc/logstash/conf.d/logstash.conf

    input {
      jdbc {
        jdbc_driver_library => "/opt/logstash/vendor/jars/db2jcc4.jar"
        jdbc_driver_class => "com.ibm.db2.jcc.DB2Driver"
        jdbc_connection_string => "jdbc:db2://db1.tim.private:50000/itimdb"
        jdbc_user => "itimuser"
        jdbc_password => "somePassword"
        statement => "select * from someTable"
      }
    }

Filtering

You don’t need to do any pattern matching, as the input emits the event pre-parsed based on the DB columns. You may, however, want to match a timestamp in the database.

    # A sample value in the 'completed' column is 2016-04-07 00:41:03:291 GMT

    filter {
      date {
        match => [ "completed" , "yyyy-MM-dd HH:mm:ss:SSS zzz" ]
      }
    }

Output

One recommended trick is to link the primary keys between the database and elasticsearch. That way, if you run the query again you update the existing elasticsearch records rather than create duplicate ones. Simply tell the output plugin to use the existing primary key from the database as the document_id when it sends it to elasticsearch.

    # Database key is the column 'id'

    output {
      elasticsearch {
        hosts => ["10.17.153.1:9200"]
        index => "logstash-db-%{+YYYY}"

        document_id => "${id}"

        type => "isim-process"

      }
    }

Other Notes

If any of your columns are non-string type, logstash and elasticsearch will happily store them as such. But be warned that kibana will round them to 16 digits due to a limitation of javascript.

https://github.com/elastic/kibana/issues/4356

Sources

https://www.elastic.co/blog/logstash-jdbc-input-plugin
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html

3.5 - Multiline Matching

Here’s an example that uses the multiline codec (preferred over the multiline filter, as it’s more appropriate when you might have more than one input)

input {
  file {
    path => "/opt/IBM/tivoli/common/CTGIM/logs/access.log"
    type => "itim-access"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^<Message Id"
      negate => true
      what => previous
    }
  }
}

Getting a match can be difficult, as grok by default does not match against multiple lines. You can mutate to remove all the new lines, or use a seemingly secret preface, the ‘(?m)’ directive as shown below

filter {
  grok {
    match => { "message" => "(?m)(?<timestamp>%{YEAR}.%{MONTHNUM}.%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}%{ISO8601_TIMEZONE})%{DATA}com.ibm.itim.security.%{WORD:catagory}%{DATA}CDATA\[%{DATA:auth}\]%{DATA}CDATA\[%{DATA:clientip}\]"}
  }
}

https://logstash.jira.com/browse/LOGSTASH-509

4 - Beats

Beats are a family of lightweight shippers that you should consider as a first-solution for sending data to Elasticsearch. The two most common ones to use are:

  • Filebeat
  • Winlogbeat

Filebeat is used both for files, and for other general types, like syslog and NetFlow data.

Winlogbeat is used to load Windows events into Elasticsearch and works well with Windows Event Forwarding.
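A minimal Winlogbeat config looks something like the sketch below - the event log channels and the output host are assumptions to adjust for your environment:

# winlogbeat.yml
winlogbeat.event_logs:
  - name: Application
  - name: System
  - name: Security
output.elasticsearch:
  hosts: ["localhost:9200"]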

4.1 - Linux Installation

On Linux

A summary from the general docs. View and adjust versions as needed.

If you haven’t already added the repo:

apt-get install apt-transport-https
apt install gnupg2
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-7.x.list
apt update

apt install filebeat
systemctl enable filebeat

Filebeat uses a default config file at /etc/filebeat/filebeat.yml. If you don’t want to edit that, you can use the ‘modules’ to configure it for you, as shown below. The module setup also loads dashboard elements into Kibana, so you must already have Kibana installed to make use of it.
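For example, to have the system module pick up the standard syslog and auth logs (this assumes Elasticsearch and Kibana are reachable at their default locations for the setup step):

filebeat modules enable system
filebeat setup
systemctl restart filebeat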

Here’s a simple test

mv /etc/filebeat/filebeat.yml /etc/filebeat/filebeat.yml.orig
vi /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log
output.file:
  path: "/tmp/filebeat"
  filename: filebeat
  #rotate_every_kb: 10000
  #number_of_files: 7
  #permissions: 0600
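With that in place, check the config, run Filebeat in the foreground for a bit, and look for output accumulating in /tmp/filebeat:

filebeat test config -e
filebeat -e
ls /tmp/filebeat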

4.2 - Windows Installation

Installation

Download the .zip version (the MSI doesn’t include the service install script) from the URL below. Extract it, rename the folder to Filebeat, and move it to the C:\Program Files directory.

https://www.elastic.co/downloads/beats/filebeat

Start an admin powershell, change to that directory and run the service install command. (Keep the shell up for later when done)

PowerShell.exe -ExecutionPolicy UnRestricted -File .\install-service-filebeat.ps1

Basic Configuration

Edit the filebeat config file.

write.exe filebeat.yml

You need to configure the input and output sections. The output is already set to elasticsearch localhost so you only have to change the input from the unix to the windows style.

  paths:
    #- /var/log/*.log
    - c:\programdata\elasticsearch\logs\*

Test as per normal

  .\filebeat.exe test config -e

Filebeat specific dashboards must be added to Kibana. Do that with the setup argument:

  .\filebeat.exe setup --dashboards

To start Filebeat in the foreground (to see any interesting messages)

  .\filebeat.exe -e

If you’re happy with the results, you can stop the application then start the service

  Ctrl-C
  Start-Service filebeat

Adapted from the guide at

https://www.elastic.co/guide/en/beats/filebeat/7.6/filebeat-getting-started.html

4.3 - NetFlow Forwarding

The NetFlow protocol is now implemented in Filebeat. Assuming you’ve installed Filebeat and configured Elasticsearch and Kibana, you can use this input module to auto-configure the inputs, indexes and dashboards.

./filebeat modules enable netflow
filebeat setup -e

If you are just testing and don’t want to add the full stack, you can set up the netflow input, which the module is a wrapper for.

filebeat.inputs:
- type: netflow
  max_message_size: 10KiB
  host: "0.0.0.0:2055"
  protocols: [ v5, v9, ipfix ]
  expiration_timeout: 30m
  queue_size: 8192
output.file:
  path: "/tmp/filebeat"
  filename: filebeat

Then test the config:

filebeat test config -e

Consider dropping all the fields you don’t care about, as there are a lot of them. Use the include_fields processor (it goes under a processors: section) to limit what you take in:

  - include_fields:
      fields: ["destination.port", "destination.ip", "source.port", "source.mac", "source.ip"]

4.4 - Palo Example

# This filebeat config accepts TRAFFIC and SYSTEM syslog messages from a Palo Alto, 
# tags and parses them 

# This is an arbitrary port. The normal port for syslog is UDP 514
filebeat.inputs:
  - type: syslog
    protocol.udp:
      host: ":9000"

processors:
    # The message field will have "TRAFFIC" for  netflow logs and we can 
    # extract the details with a CSV decoder and array extractor
  - if:
      contains:
        message: ",TRAFFIC,"
    then:
      - add_tags:
          tags: "netflow"
      - decode_csv_fields:
          fields:
            message: csv
      - extract_array:
          field: csv
          overwrite_keys: true
          omit_empty: true
          fail_on_error: false
          mappings:
            source.ip: 7
            destination.ip: 8
            source.nat.ip: 9
            network.application: 14
            source.port: 24
            destination.port: 25
            source.nat.port: 26
      - drop_fields:
          fields: ["csv", "message"] 
    else:
        # The message field will have "SYSTEM,dhcp" for dhcp logs and we can 
        # do a similar process to above
      - if:
          contains:
            message: ",SYSTEM,dhcp"
        then:
        - add_tags:
            tags: "dhcp"
        - decode_csv_fields:
            fields:
              message: csv
        - extract_array:
            field: csv
            overwrite_keys: true
            omit_empty: true
            fail_on_error: false
            mappings:
              message: 14
        # The DHCP info can be further pulled apart using space as a delimiter
        - decode_csv_fields:
            fields:
              message: csv2
            separator: " "
        - extract_array:
            field: csv2
            overwrite_keys: true
            omit_empty: true
            fail_on_error: false
            mappings:
              source.ip: 4
              source.mac: 7
              hostname: 10
        - drop_fields:
            fields: ["csv","csv2"] # Can drop message too like above when we have watched a few        
  - drop_fields:
      fields: ["agent.ephemeral_id", "agent.hostname", "agent.id", "agent.type", "agent.version", "ecs.version","host.name","event.severity","input.type","hostname","log.source.address","syslog.facility", "syslog.facility_label", "syslog.priority", "syslog.priority_label","syslog.severity_label"]
      ignore_missing: true
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
setup.template.settings:
  index.number_of_shards: 1
output.elasticsearch:
  hosts: ["localhost:9200"]

4.5 - RADIUS Forwarding

Here’s an example of sending FreeRADIUS logs to Elasticsearch.

cat /etc/filebeat/filebeat.yml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/freeradius/radius.log
    include_lines: ['\) Login OK','incorrect']
    tags: ["radius"]
processors:
  - drop_event:
      when:
        contains:
          message: "previously"
  - if:
      contains:
        message: "Login OK"
    then: 
      - dissect:
          tokenizer: "%{key1} [%{source.user.id}/%{key3}cli %{source.mac})"
          target_prefix: ""
      - drop_fields:
          fields: ["key1","key3"]
      - script:
          lang: javascript
          source: >
            function process(event) {
                var mac = event.Get("source.mac");
                if(mac != null) {
                        mac = mac.toLowerCase();
                         mac = mac.replace(/-/g,":");
                         event.Put("source.mac", mac);
                }
              }            
    else:
      - dissect:
          tokenizer: "%{key1} [%{source.user.id}/<via %{key3}"
          target_prefix: ""
      - drop_fields: 
          fields: ["key1","key3"]
output.elasticsearch:
  hosts: ["http://logcollector.yourorg.local:9200"]
  allow_older_versions: true
setup.ilm.enabled: false

4.6 - Syslog Forwarding

You may have an older system or appliance that can transmit syslog data. You can use filebeat to accept that data and store it in Elasticsearch.

Add Syslog Input

Install filebeat and test reception by writing to a file in /tmp.

vi  /etc/filebeat/filebeat.yml

filebeat.inputs:
- type: syslog
  protocol.udp:
    host: ":9000"
output.file:
  path: "/tmp"
  filename: filebeat


sudo systemctl restart filebeat

pfSense Example

The instructions follow Netgate’s remote logging example.

Status -> System Logs -> Settings

Enable and configure. Internet rumor has it that it’s UDP only, so the config above reflects that. Interpreting the output requires parsing the message section, which is detailed in the filter log format docs; a processor sketch follows the sample lines below.

'5,,,1000000103,bge1.1099,match,block,in,4,0x0,,64,0,0,DF,17,udp,338,10.99.147.15,255.255.255.255,2048,30003,318'

'5,,,1000000103,bge2,match,block,in,4,0x0,,84,1,0,DF,17,udp,77,157.240.18.15,205.133.125.165,443,61343,57'

'222,,,1000029965,bge2,match,pass,out,4,0x0,,128,27169,0,DF,6,tcp,52,205.133.125.142,205.133.125.106,5225,445,0,S,1248570004,,8192,,mss;nop;wscale;nop;nop;sackOK'

'222,,,1000029965,bge2,match,pass,out,4,0x0,,128,11613,0,DF,6,tcp,52,205.133.125.142,211.24.111.75,15305,445,0,S,2205942835,,8192,,mss;nop;wscale;nop;nop;sackOK'
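Here’s a sketch of pulling a few fields out of those messages with the same decode_csv_fields and extract_array processors used in the Palo example above. The column positions are assumptions based on the filter log format docs and the samples above, so verify them against your own output before trusting the mappings:

processors:
  - decode_csv_fields:
      fields:
        message: csv
  - extract_array:
      field: csv
      overwrite_keys: true
      omit_empty: true
      fail_on_error: false
      mappings:
        observer.ingress.interface.name: 4
        event.action: 6
        network.transport: 16
        source.ip: 18
        destination.ip: 19
        source.port: 20
        destination.port: 21
  - drop_fields:
      fields: ["csv"]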