Setting up Logstash: timestamp considerations

When Logstash creates a JSON document to store in ElasticSearch, it stamps the document with the date and time at which the event was processed, using one of the default formats supported by ElasticSearch automatic field type recognition.
You should therefore understand that, by default, the timestamp is not extracted from your message, even if the message contains a valid one.

When events are processed live, as with tweets, syslog messages, or any other logs you send to Logstash, the timestamp added by Logstash is usually very close to the timestamp of the message itself. Without network congestion or other delays, the two typically fall within the same second.
But if you store the messages in a message queue before they are ingested into ElasticSearch (to replay them later, for example), or if you reprocess all the lines of an existing log file, the gap between the actual timestamp of the message and the processing time can be large.

This is where the date filter comes in.
This filter lets you extract the date and time from a field of your message and force Logstash to use it as the timestamp. This is really useful when you are reprocessing a log file from its beginning, so that the real log timestamp is used rather than the processing time.
filter {
  date {
    match => [ "date", "yyyy/MM/dd HH:mm:ss", "dd/MMM/yyyy:HH:mm:ss Z", "EEE MMM dd HH:mm:ss.SSSSSS yyyy" ]
  }
}
The above example searches the field called date for a date and time matching one of the three formats listed. The following timestamps will be matched:
  1. 2017/05/25 17:08:19
  2. 25/May/2017:17:08:19 +0200        (typical timestamp used in Apache access logs)
  3. Thu May 25 17:08:19.765440 2017
The value of the matched field is then automatically used to populate the @timestamp field that Logstash creates when storing JSON documents into ElasticSearch.
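
As a minimal sketch of what this looks like when spelled out, the filter below sets the target option explicitly, although @timestamp is already its default. It also assumes that you no longer need the original date field once parsing has succeeded; drop the remove_field line if you do. Events whose date field matches none of the listed formats keep their processing-time @timestamp and are tagged _dateparsefailure by default.

    filter {
      date {
        match => [ "date", "yyyy/MM/dd HH:mm:ss" ]
        # Default target, shown only for clarity: the parsed value overwrites @timestamp
        target => "@timestamp"
        # Assumption: the source field is no longer needed; only applied when parsing succeeds
        remove_field => [ "date" ]
      }
    }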

If you want to process old Apache log files, or simply make sure that the document indexed by ElasticSearch carries the Apache timestamp rather than the processing timestamp, you can add the following to the filter { } section of the Apache Logstash pipeline configuration:

    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
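
For context, here is a sketch of where this filter could sit in a complete pipeline. It assumes that the access log lines arrive on standard input and are parsed with the stock COMBINEDAPACHELOG grok pattern, which stores the bracketed Apache date in a field named timestamp. The input and output sections are placeholders for testing, not part of the original configuration.

    # Assumption: log lines are piped in on stdin for a quick test
    input { stdin { } }

    filter {
      # Split the raw line into fields, including "timestamp"
      grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
      # Replace the processing time in @timestamp with the Apache request time
      date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
    }

    # Print each resulting event so @timestamp can be inspected
    output { stdout { codec => rubydebug } }

Feeding an old access log line to this pipeline should print an event whose @timestamp reflects the request time rather than the current time. A _dateparsefailure tag on the event indicates that the format did not match the content of the field.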