Elasticsearch

Configuration of Elasticsearch itself is seldom needed, but you will have to maintain the data in your indexes. This is done either with the kopf tool or at the command line.

After you have some data in Elasticsearch, you'll see that your 'documents' are organized into 'indexes'. An index is simply a container for your data; it was specified when logstash originally sent the data, and the naming is defined by the client.
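As a quick sanity check, you can pull back a single document to see which index it landed in and what its fields look like (the logstash-* index pattern here is an assumption; substitute whatever your client names its indexes):

```shell
# Fetch one document from any index matching logstash-*
# (the index pattern is an assumption; adjust to your naming)
curl 'http://localhost:9200/logstash-*/_search?size=1&pretty'
```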

Deleting Data

The first thing you're likely to need is to delete some badly-parsed data from your testing.

Delete all indexes with the name test*

curl -XDELETE 'http://localhost:9200/test*'

Delete from all indexes documents of type 'WindowsEvent'

curl -XDELETE http://localhost:9200/_all/WindowsEvent

Delete from all indexes documents that have the attribute 'path' equal to '/var/log/httpd/ssl_request.log'

curl -XDELETE 'http://localhost:9200/_all/_query?q=path:/var/log/httpd/ssl_request.log'

Delete from the index 'logstash-2014.10.29' documents of type 'shib-access'

curl -XDELETE http://localhost:9200/logstash-2014.10.29/shib-access


http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
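The delete-by-query endpoint also accepts a JSON body instead of a URL query string, which avoids escaping problems with values that contain slashes or colons. A sketch, reusing the index and type from the examples above:

```shell
# Delete documents of type 'shib-access' from one index using a JSON
# query body rather than the ?q= URL parameter
curl -XDELETE 'http://localhost:9200/logstash-2014.10.29/_query' -d '{
  "query" : { "term" : { "type" : "shib-access" } }
}'
```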

Curator

All the maintenance by hand has to stop at some point, and Curator is a good tool to automate some of it. It is essentially a script that runs the curls for you.

Install
  • wget https://bootstrap.pypa.io/get-pip.py
  • sudo python get-pip.py
  • sudo pip install elasticsearch-curator
  • sudo pip install argparse

Use
curator --help
curator delete --help

And in your crontab

# Note: you must escape % characters with a \ in crontabs
20 0 * * * curator delete indices --time-unit days --older-than 14 --timestring '\%Y.\%m.\%d' --regex '^logstash-bb-.*'
20 0 * * * curator delete indices --time-unit days --older-than 14 --timestring '\%Y.\%m.\%d' --regex '^logstash-adfsv2-.*'
20 0 * * * curator delete indices --time-unit days --older-than 14 --timestring '\%Y.\%m.\%d' --regex '^logstash-20.*'

Sometimes you'll need to do an inverse match.

0 20 * * * curator delete indices --regex '^((?!logstash).)*$'

A good way to test your regex is by using the show indices method

curator show indices --regex '^((?!logstash).)*$'



Here are some older posts and links, but be aware the syntax has changed and it's been several versions since these were written:
http://www.elasticsearch.org/blog/curator-tending-your-time-series-indices/
http://stackoverflow.com/questions/406230/regular-expression-to-match-line-that-doesnt-contain-a-word


Replication and Yellow Cluster Status

By default, Elasticsearch assumes you want two nodes with replicated data, so new indexes get 1 replica. You may not want that to start with, however, so you can change the default and change the replica settings on your existing data in bulk:

http://stackoverflow.com/questions/24553718/updating-the-default-index-number-of-replicas-setting-for-new-indices

Set all existing indexes to keep no replicas (a single copy of the data)
curl -XPUT 'localhost:9200/_settings' -d '{
  "index" : { "number_of_replicas" : 0 }
}'

Change the default settings so new indexes have no replicas (a single copy)
curl -XPUT 'localhost:9200/_template/logstash_template' -d '{
  "template" : "*",
  "settings" : { "number_of_replicas" : 0 }
}'


Unassigned Shards

You will occasionally have a hiccup, such as running out of disk space, and be left with indexes that have no data in them or shards that are unassigned. Generally you will have to delete them, but you can also manually reassign them.

http://stackoverflow.com/questions/19967472/elasticsearch-unassigned-shards-how-to-fix
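A sketch of diagnosing and manually reassigning a shard. The index name, shard number, and node name below are placeholders; substitute values from your own cluster's output:

```shell
# Check overall cluster status (green/yellow/red)
curl 'http://localhost:9200/_cluster/health?pretty'

# List all shards and filter for the unassigned ones
curl 'http://localhost:9200/_cat/shards' | grep UNASSIGNED

# Manually allocate one shard to a node; the index, shard number, and
# node name are placeholders taken from the commands above
curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '{
  "commands" : [ {
    "allocate" : {
      "index" : "logstash-2014.10.29",
      "shard" : 0,
      "node" : "SOME-NODE-NAME",
      "allow_primary" : true
    }
  } ]
}'
```

Note that allow_primary will bring the shard up empty if no copy of its data survives, so only use it when you've accepted the data loss.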

Listing Index Info

You can get a decent, human-readable list of your indexes using the cat API

curl localhost:9200/_cat/indices

To list them by size, the documentation uses this example

curl 'localhost:9200/_cat/indices?bytes=b' | sort -rnk8
