1 - Installation
Note: Before you install logstash, take a look at Elasticsearch’s Beats. It’s lighter-weight for most tasks.
Quick Install
This is a summary of the current install page. Visit that page and adjust versions as needed.
# Install java
apt install default-jre-headless
apt-get install apt-transport-https
apt install gnupg2
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
# Check for the current version - 7.x may no longer be current by now
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-7.x.list
apt update
apt-get install logstash
Logstash has a NetFlow module, but it has been deprecated. One should instead use the Filebeat Netflow Module.
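If you go the Filebeat route instead, the module is quick to turn on; a minimal sketch, assuming the Elastic apt repository added above:
# Filebeat comes from the same Elastic repository
apt install filebeat
filebeat modules enable netflow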
The rest of this page is circa 2014 - use with a grain of salt.
Installation - Linux Clients
Install Java
If you don’t already have it, install it. You’ll need at least 1.7, and Oracle’s JVM is recommended. However, on older systems do yourself a favor and use the OpenJDK, as older versions of the Sun and IBM JVMs do things with cryptography that lead to strange bugs in recent releases of logstash.
# On RedHat flavors, install the OpenJDK and select it for use (in case there are others) with the system alternatives utility
sudo yum install java-1.7.0-openjdk
sudo /usr/sbin/alternatives --config java
Install Logstash
This is essentially:
(Look at https://www.elastic.co/downloads/logstash to get the latest version, or add the repo)
wget (some link from the above page)
sudo yum --nogpgcheck localinstall logstash*
# You may want to grab a plugin, like the syslog output, though the elasticsearch output is installed by default
cd /opt/logstash/
sudo bin/plugin install logstash-output-syslog
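# (Optional) confirm the plugin is there - on this older layout the command is bin/plugin; newer versions use bin/logstash-plugin
sudo bin/plugin list | grep syslog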
# If you're ready to configure the service
sudo vim /etc/logstash/conf.d/logstash.conf
sudo service logstash start
https://www.elastic.co/guide/en/logstash/current/index.html
Operating
The most common use of logstash is to tail and parse log files. You do this by specifying a file and a filter, like so:
[gattis@someHost ~]$ vim /etc/logstash/conf.d/logstash.conf
input {
file {
path => "/var/log/httpd/request.log"
}
}
filter {
grok {
match => [ "message", "%{COMBINEDAPACHELOG}"]
}
}
output {
stdout {
codec => rubydebug
}
}
Filter
There are many different types of filters, but the main one you’ll be using is grok. It’s all about parsing the message into fields. Without this, you just have a bunch of un-indexed text in your database. It ships with some handy macros such as %{COMBINEDAPACHELOG} that takes this:
10.138.120.138 - schmoej [01/Apr/2016:09:39:04 -0400] "GET /some/url.do?action=start HTTP/1.1" 200 10680 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36"
And turns it into
agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36"
auth schmoej
bytes 10680
clientip 10.138.120.138
httpversion 1.1
path /var/pdweb/www-default/log/request.log
referrer "-"
request /some/url.do?action=start
response 200
timestamp 01/Apr/2016:09:39:04 -0400
verb GET
See the grok’ing section for more details.
Output
We’re outputting to the console so we can see what’s going on with our config. If you get some output but it’s not parsed fully because of an error in the parsing, you’ll see something like the below with a “_grokparsefailure” tag. That means you have to dig into a custom pattern, as described in grok’ing.
Note: by default, logstash is ’tailing’ your logs, so you’ll only see new entries. If you’ve got no traffic, you’ll have to generate some.
{
"message" => "test message",
"@version" => "1",
"@timestamp" => "2014-10-31T17:39:28.925Z",
"host" => "some.app.private",
"tags" => [
[0] "_grokparsefailure"
]
}
If it looks good, you’ll want to send it on to your database. Change your output to look like the following, which will put your data in a default index that kibana (the visualizer) can display out of the box.
output {
elasticsearch {
hosts => ["10.17.153.1:9200"]
}
}
Troubleshooting
If you don’t get any output at all, check that the logstash user can actually read the file in question. Check your log files and try running logstash as yourself with the output going to the console.
cat /var/log/logstash/*
/opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf
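If permissions look like the culprit, a quick check (assuming the service runs as the logstash user, and using the example path from above) is:
# Can the logstash user actually open the file?
sudo -u logstash head -n 1 /var/log/httpd/request.log
ls -l /var/log/httpd/request.log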
2 - Operation
Basic Operation
Generally, you create a config with three sections: input, filter, and output. This example uses the grok filter to parse the message.
sudo vi /etc/logstash/conf.d/logstash.conf
input {
file {
path => "/var/pdweb/www-default/log/request.log"
}
}
filter {
grok {
match => [ "message", "%{COMBINEDAPACHELOG}"]
}
}
output {
stdout { }
}
Then you test it at the command line
# Test the config file itself
/opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf --configtest
# Test the parsing of data
/opt/logstash/bin/logstash -f /etc/logstash/conf.d/logstash.conf
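You can also test parsing against lines you paste in by giving an inline config with a stdin input. A quick sketch using the same grok pattern:
# Paste a log line and press enter; ctrl-d to quit
/opt/logstash/bin/logstash -e 'input { stdin { } } filter { grok { match => [ "message", "%{COMBINEDAPACHELOG}" ] } } output { stdout { codec => rubydebug } }'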
You should get some nicely parsed lines. If that’s the case, you can edit your config to add a sincedb and an actual destination.
input {
file {
path => "/var/pdweb/www-default/log/request.log"
sincedb_path => "/opt/logstash/sincedb"
}
}
filter {
grok {
match => [ "message", "%{COMBINEDAPACHELOG}"]
}
}
output {
elasticsearch {
host => "some.server.private"
protocol => "http"
}
}
If instead you see output with a _grokparsefailure like below, you need to change the filter. Take a look at the common gotchas, then the parse failure section below it.
{
"message" => "test message",
"@version" => "1",
"@timestamp" => "2014-10-31T17:39:28.925Z",
"host" => "some.app.private",
"tags" => [
[0] "_grokparsefailure"
]
}
Common Gotchas
No New Data
Logstash reads new lines by default. If you don’t have anyone actually hitting your webserver, but you do have some log entries in the file itself, you can tell logstash to process the existing entries and not save its place in the file.
file {
path => "/var/log/httpd/request.log"
start_position => "beginning"
sincedb_path => "/dev/null"
}
Multiple Conf files
Logstash uses all the files in the conf.d directory - even if they don’t end in .conf. Make sure to remove any you don’t want as they can conflict.
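A quick way to see exactly what will get loaded (and spot leftovers like .bak or .rpmsave files):
ls -la /etc/logstash/conf.d/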
Default Index
Logstash creates Elasticsearch indexes that look like:
logstash-%{+YYYY.MM.dd}
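If you want a different index, you can override the name in the output. A minimal sketch reusing the output from above (logstash-myapp is just an illustrative name):
output {
elasticsearch {
host => "some.server.private"
protocol => "http"
index => "logstash-myapp-%{+YYYY.MM.dd}"
}
}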
The logstash folks have some great material on how to get started. Really top notch.
http://logstash.net/docs/1.4.2/configuration#fieldreferences
Parse Failures
The Greedy Method
The best way to start is to change your match to a simple pattern and work out from there. Try the ‘GREEDYDATA’ pattern and assign it to a field named ‘Test’. This takes the form of:
filter {
grok {
match => [ "message", "%{GREEDYDATA:Test}" ]
}
}
And it looks like:
"message" => "test message",
"@version" => "1",
"@timestamp" => "2014-10-31T17:39:28.925Z",
"host" => "some.app.private",
"Test" => "The rest of your message"
That should give you some output. You can then start cutting it up with the patterns (also called macros) from the list of default patterns. The online grok debugger is also helpful.
Combining Patterns
There may not be a standard pattern for what you want, but it’s easy to pull together several existing ones. Here’s an example that pulls in a custom timestamp.
Example:
Sun Oct 26 22:20:55 2014 File does not exist: /var/www/html/favicon.ico
Pattern:
match => { "message" => "(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR})"}
Notice the ‘?<timestamp>’ at the beginning of the parenthetical enclosure. That’s a named capture: whatever matches inside the parentheses gets stored in a field called timestamp, rather like a ( ) and \1 in sed, but named instead of numbered.
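Putting that pattern into a complete filter might look like the sketch below; logmessage is just an illustrative name for the remainder of the line.
filter {
grok {
match => { "message" => "(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}) %{GREEDYDATA:logmessage}" }
}
}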
Optional Fields
Some log formats simply skip columns when they don’t have data. This will cause your parse to fail unless you make some fields optional with a ‘?’, like this:
match => [ "message", "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"]
Dropping Events
Oftentimes, you’ll have messages that you don’t care about and you’ll want to drop those. Best practice is to do coarse actions first, so you’ll want to compare and drop with a general conditional like:
filter {
if [message] =~ /File does not exist/ {
drop { }
}
grok {
...
...
}
}
You can also directly reference fields once you have grok’d the message:
filter {
grok {
match => { "message" => "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"}
}
if [request] == "/status" {
drop { }
}
}
http://logstash.net/docs/1.4.2/configuration#conditionals
Dating Messages
By default, logstash date-stamps messages when it sees them. However, there can be a delay between when an action happens and when it gets logged to a file. To remedy this - and allow you to suck in old files without the date on every event being the same - you add a date filter.
Note - you actually have to grok out the date into its own variable, you can’t just attempt to match on the whole message. The combined apache macro below does this for us.
filter {
grok {
match => { "message" => "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"}
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
In the above case, ‘timestamp’ is a parsed field, and you’re using the date format syntax to tell the filter what its component parts are.
http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html
Sending to Multiple Servers
In addition to an elasticsearch server, you may want to send events to a syslog server at the same time.
input {
file {
path => "/var/pdweb/www-default/log/request.log"
sincedb_path => "/opt/logstash/sincedb"
}
}
filter {
grok {
match => [ "message", "%{HOSTNAME:VHost}? %{COMBINEDAPACHELOG} %{IP:XForwardedFor}?"]
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch {
host => "some.server.private"
protocol => "http"
}
syslog {
host => "some.syslog.server"
port => "514"
severity => "notice"
facility => "daemon"
}
}
Deleting Sent Events
Sometimes you’ll accidentally send a bunch of events to the server and need to delete them and resend corrected versions.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-delete-mapping.html
curl -XDELETE 'http://localhost:9200/_all/SOMEINDEX'
curl -XDELETE 'http://localhost:9200/_all/SOMEINDEX?q=path:"/var/log/httpd/ssl_request_log"'
3 - Index Routing
When using logstash as a broker, you will want to route events to different indexes according to their type. You have two basic ways to do this:
- Using Mutates with a single output
- Using multiple Outputs
The latter is significantly better for performance. The less you touch the event, the better it seems. When testing these two different configs in the lab, the multiple output method was about 40% faster when under CPU constraint. (i.e. you can always add more CPU if you want to mutate the events.)
Multiple Outputs
input {
...
...
}
filter {
...
...
}
output {
if [type] == "RADIUS" {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-radius-%{+YYYY.MM.dd}"
}
}
else if [type] == "RADIUSAccounting" {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-radius-accounting-%{+YYYY.MM.dd}"
}
}
else {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-test-%{+YYYY.MM.dd}"
}
}
}
Mutates
If your source system includes a field that tells you what index to place it in, you might be able to skip mutating altogether (see the sketch at the end of this section), but often you must look at the contents to make that determination. Doing so does reduce performance.
input {
...
...
}
filter {
...
...
# Add a metadata field with the destination index based on the type of event this was
if [type] == "RADIUS" {
mutate { add_field => { "[@metadata][index-name]" => "logstash-radius" } }
}
else if [type] == "RADIUSAccounting" {
mutate { add_field => { "[@metadata][index-name]" => "logstash-radius-accounting" } }
}
else {
mutate { add_field => { "[@metadata][index-name]" => "logstash-test" } }
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "%{[@metadata][index-name]}-%{+YYYY.MM.dd}"
}
}
https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html#metadata
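As mentioned above, if the event already carries a usable field you can skip the mutate and reference that field directly in the output. A sketch, assuming a hypothetical field named target_index is present on every event:
output {
elasticsearch {
hosts => ["localhost:9200"]
# 'target_index' is a hypothetical field assumed to exist on every event
index => "%{target_index}-%{+YYYY.MM.dd}"
}
}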
4 - Database Connections
You can connect Logstash to a database to poll events almost as easily as tailing a log file.
Installation
The JDBC plug-in ships with logstash so no installation of that is needed. However, you do need the JDBC driver for the DB in question.
Here’s an example for DB2, for which you can get the jar from either the server itself or the DB2 fix-pack associated with the DB Version you’re running. The elasticsearch docs say to just put it in your path. I’ve put it in the logstash folder (based on some old examples) and we’ll see if it survives upgrades.
sudo mkdir /opt/logstash/vendor/jars
sudo cp /home/gattis/db2jcc4.jar /opt/logstash/vendor/jars
sudo chown -R logstash:logstash /opt/logstash/vendor/jars
Configuration
Configuring the input
Edit the config file like so
sudo vim /etc/logstash/conf.d/logstash.conf
input {
jdbc {
jdbc_driver_library => "/opt/logstash/vendor/jars/db2jcc4.jar"
jdbc_driver_class => "com.ibm.db2.jcc.DB2Driver"
jdbc_connection_string => "jdbc:db2://db1.tim.private:50000/itimdb"
jdbc_user => "itimuser"
jdbc_password => "somePassword"
statement => "select * from someTable"
}
}
Filtering
You don’t need to do any pattern matching, as the input emits the event pre-parsed based on the DB columns. You may, however, want to match a timestamp in the database.
# A sample value in the 'completed' column is 2016-04-07 00:41:03:291 GMT
filter {
date {
match => [ "completed" , "yyyy-MM-dd HH:mm:ss:SSS zzz" ]
}
}
Output
One recommended trick is to link the primary key from the database to the document id in elasticsearch. That way, if you run the query again you update the existing elasticsearch records rather than create duplicate ones. Simply tell the output plugin to use the existing primary key from the database for the document_id when it sends it to elasticsearch.
# Database key is the column 'id'
output {
elasticsearch {
hosts => ["10.17.153.1:9200"]
index => "logstash-db-%{+YYYY}"
document_id => "%{id}"
type => "isim-process"
}
}
Other Notes
If any of your columns are non-string type, logstash and elasticsearch will happily store them as such. But be warned that kibana will round them to 16 digits due to a limitation of javascript.
https://github.com/elastic/kibana/issues/4356
Sources
https://www.elastic.co/blog/logstash-jdbc-input-plugin
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
5 - Multiline Matching
Here’s an example that uses the multiline codec (preferred over the multiline filter, as it’s more appropriate when you might have more than one input).
input {
file {
path => "/opt/IBM/tivoli/common/CTGIM/logs/access.log"
type => "itim-access"
start_position => "beginning"
sincedb_path => "/dev/null"
codec => multiline {
pattern => "^<Message Id"
negate => true
what => "previous"
}
}
}
Getting a match can be difficult, as grok by default does not match against multiple lines. You can mutate to remove all the newlines, or use a seemingly secret preface, the ‘(?m)’ directive, as shown below:
filter {
grok {
match => { "message" => "(?m)(?<timestamp>%{YEAR}.%{MONTHNUM}.%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}%{ISO8601_TIMEZONE})%{DATA}com.ibm.itim.security.%{WORD:catagory}%{DATA}CDATA\[%{DATA:auth}\]%{DATA}CDATA\[%{DATA:clientip}\]"}
}
}
https://logstash.jira.com/browse/LOGSTASH-509