Setting up Logstash for Syslog

Configuration to get Syslog messages

This is the configuration of this Logstash instance. We use the syslog input plugin to listen for syslog messages from all our hosts.
We first start Logstash on the server "logstash-runner", then configure Rsyslog on the hosts we want to monitor.

input {
  syslog {
    port => "10514"
    add_field => {
      "source" => "syslog"
    }
  }
}
filter {
  if [source] == "syslog" {
    grok {
      match => { message => "%{SYSLOGTIMESTAMP} dpkg: %{WORD:action} %{DATA:package}:%{DATA:arch} %{DATA:version_old} %{GREEDYDATA:version_new}" }
      overwrite => [ "message" ]
      add_field => { "program" => "dpkg" }
    }
    dns {
      hostsfile => [ "/etc/hosts" ]
      reverse => [ "host" ]
      action => "replace"
    }
    mutate {
      gsub => [ "message", "\n", " " ]
    }
    if [program] == "dpkg" {
      mutate {
        replace => {
          "facility_label" => "user-level"
          "facility" => 1
          "severity_label" => "informational"
          "severity" => 6
        }
      }
    } else {
      grok {
        match => { message => "%{GREEDYDATA} %{IPV4:connectip} %{GREEDYDATA}" }
      }
      geoip {
        source => "connectip"
      }
    }
  }
}
output {
  if [source] == "syslog" {
    elasticsearch {
      hosts => [ "es1:9200", "es2:9200" ]
      index => "syslog-%{+YYYY.MM.dd}"
    }
    mongodb {
      collection => "logs"
      database => "syslog"
      uri => "mongodb://mongodb:27017"
    }
    if [program] == "dpkg" {
      file {
        codec => line { format => "%{@timestamp};%{host};%{action};%{package};%{arch};%{version_old};%{version_new};%{message}" }
        path => "/opt/dpkg/%{+YYYY-MM-dd}.dpkg"
      }
    }
  }
}


Here we use the syslog listener as input. Because it listens on port 10514, which is above the privileged range (ports below 1024), we don't need to run Logstash as root; the default user created when installing it is enough.

The Grok section is used to detect patterns in the message. The first one detects a message sent via syslog about package management. See the Rsyslog configuration below, where we define the format of this message.
The patterns used here are pre-defined Grok patterns that ship with Logstash. See the full list of pre-defined patterns here: https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns
After the pattern name (in capital letters), we add a colon (:) and the name of the field in which the text captured by this pattern will be stored.
If no field name is given after the pattern, then the text it matches is not saved. The pattern is only there to make sure that we identify the message correctly.

Thus the first Grok section will match if the following conditions are met:
  • The line starts with a valid syslog timestamp.
  • The tag "dpkg:" follows the timestamp.
  • A single word (a sequence of alphanumeric characters) comes next; it is captured and saved in a field called "action".
  • An expression like name:x86_64 follows, which we split at the colon into the "package" and "arch" fields. DATA is used instead of WORD here because a package name can contain more than alphanumeric characters.
  • The next 2 blocks of characters are matched and saved as "version_old" and "version_new". If the message is about the installation of a new package, "version_old" contains only the placeholder "<none>" that dpkg writes when no version was installed before.
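The matching rules listed above can be tried outside Logstash with an ordinary regular expression. The following Python sketch is only an approximation: the timestamp expression is a simplified stand-in for SYSLOGTIMESTAMP, the function name parse_dpkg is hypothetical, and the sample line in the usage comment is made up for illustration.

```python
import re

# Simplified stand-in for the SYSLOGTIMESTAMP Grok pattern (assumption:
# English month abbreviation, day padded with a space, HH:MM:SS time).
SYSLOGTIMESTAMP = r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}"

# WORD -> \b\w+\b, DATA -> .*? (lazy), GREEDYDATA -> .* (greedy)
DPKG_LINE = re.compile(
    SYSLOGTIMESTAMP
    + r" dpkg: (?P<action>\b\w+\b)"
    + r" (?P<package>.*?):(?P<arch>.*?)"
    + r" (?P<version_old>.*?) (?P<version_new>.*)"
)


def parse_dpkg(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = DPKG_LINE.search(line)  # Grok patterns are not anchored either
    return match.groupdict() if match else None


# Usage with a made-up line:
# parse_dpkg("Mar  4 10:12:44 dpkg: upgrade libfoo:amd64 1.2-3 1.2-4")
```

A line that does not carry the "dpkg:" tag right after the timestamp yields None, which corresponds to Grok not adding the "program" field.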
The second block of the filter does a reverse DNS lookup on the address present in the field "host" and replaces the value with the result of the lookup. The local /etc/hosts file is used as the source for the lookup.

The third block replaces any newline left in the message string with a space. We do this because we want to save all messages about package management to a file, and the file output already adds a line break after each event it writes.

Then we test whether the message is about "dpkg" (the field "program" is set by the first Grok block only when its pattern matched). If this is the case, we overwrite the facility and severity of the syslog message with user-level and informational. If the message is not about "dpkg", we search for an IPv4 address in the message. If one is found, we do a geographical IP lookup using the GeoIP database delivered with Logstash.
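To see which address the second Grok pattern would pick, the Python sketch below mimics "%{GREEDYDATA} %{IPV4:connectip} %{GREEDYDATA}" with a regular expression. The IPv4 expression is a simplified approximation of the Grok IPV4 pattern, and find_connect_ip is a hypothetical helper name. Two consequences of the pattern become visible: the address must have a space on both sides, and because the leading part is greedy, the last such address in the message is the one captured.

```python
import re

# Approximation of the Grok IPV4 pattern: four octets in the range 0-255.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})"
IPV4 = rf"(?<![0-9]){OCTET}(?:\.{OCTET}){{3}}(?![0-9])"

# GREEDYDATA is ".*"; the literal spaces around the address come from the
# spaces written in the Grok expression itself.
CONNECT_IP = re.compile(rf".* (?P<connectip>{IPV4}) .*")


def find_connect_ip(message):
    """Return the address the pattern would capture, or None if it does not match."""
    match = CONNECT_IP.search(message)
    return match.group("connectip") if match else None


# Usage with made-up messages:
# find_connect_ip("Failed password for root from 203.0.113.7 port 22 ssh2")
# find_connect_ip("Connection closed by 203.0.113.7")  # no trailing space, no match
```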

Finally, the results are written into an Elasticsearch index and into a MongoDB database. If the message is about package management, it is also appended to a local file, one per day, that collects the events of all hosts.
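The file written for package events is a plain semicolon-separated list, so it is easy to read back. The Python sketch below assumes the field order of the format string in the file output above; the function name parse_dpkg_record is hypothetical. Only the first seven semicolons are treated as separators, so a semicolon inside the message itself does not break the record.

```python
# Field order taken from the "format" string of the file output.
FIELDS = ("timestamp", "host", "action", "package", "arch",
          "version_old", "version_new", "message")


def parse_dpkg_record(line):
    """Split one line of a .dpkg file into a dict, or return None if it is malformed."""
    parts = line.rstrip("\n").split(";", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


# Usage (the path is the one configured in the file output, the date is made up):
# with open("/opt/dpkg/2024-03-04.dpkg") as handle:
#     records = [parse_dpkg_record(line) for line in handle]
```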

To send syslog messages and package management activities, we use the following Rsyslog configuration on each system we want to monitor:
/etc/rsyslog.d/01-remote.conf:
*.* @@logstash-runner:10514
The above Rsyslog configuration line sends all messages (every severity of every facility) to a remote syslog server at the address logstash-runner on TCP port 10514. The Logstash syslog input listens on both TCP and UDP on the configured port. The double @ tells Rsyslog to use TCP; with a single @, it would send the messages over UDP instead.
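Before touching the Rsyslog configuration of a host, the listener can be checked with a hand-made message. The Python sketch below builds a line in the classic BSD syslog layout (priority, timestamp, host, tag, text) and sends it over TCP. The function names, the host name "web1" and the message text are assumptions made for illustration; the target and port are the ones used in this article.

```python
import socket
from datetime import datetime


def format_rfc3164(pri, when, host, tag, text):
    """Build one BSD-style syslog line; single-digit days are padded with a space."""
    stamp = when.strftime("%b ") + f"{when.day:2d} " + when.strftime("%H:%M:%S")
    return f"<{pri}>{stamp} {host} {tag}: {text}\n"


def send_tcp(line, target="logstash-runner", port=10514):
    """Send one line to the syslog listener over TCP, as '@@' does in Rsyslog."""
    with socket.create_connection((target, port), timeout=5) as conn:
        conn.sendall(line.encode("utf-8"))


# Usage (priority 14 = facility user-level, severity informational):
# send_tcp(format_rfc3164(14, datetime.now(), "web1", "test", "hello from python"))
```

If the event shows up in Elasticsearch with the field "source" set to "syslog", the input side works and any remaining problem is on the Rsyslog side.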

/etc/rsyslog.d/90-dpkg.conf:
module(load="imfile")

template(name="DpkgMsg" type="string" string="%timereported:::date-rfc3164% %syslogtag% %msg:21:$%\n")

ruleset(name="dpkg"){
  if $msg contains ' install ' or $msg contains ' remove ' or $msg contains ' upgrade ' then {
    action(type="omfwd" target="logstash-runner" port="10514" protocol="tcp" template="DpkgMsg")
  }
  stop
}

input(type="imfile" file="/var/log/dpkg.log" tag="dpkg:" severity="info" facility="local6" ruleset="dpkg")


This configuration loads the imfile Rsyslog module, which reads /var/log/dpkg.log, and creates a template to format the message we want to send.
The test in the ruleset "dpkg" is there to reduce the noise. We are only interested in the lines about an actual package action (install, upgrade or remove), and we don't send any information about package configuration or de-configuration, download, extraction, ...
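The combined effect of the ruleset and the "DpkgMsg" template can be imitated in a few lines of Python. This is only a sketch of the behaviour described above, not of how Rsyslog works internally: forward_line is a hypothetical name, and the timestamp is passed in explicitly because Rsyslog fills in %timereported% itself. The slice [20:] corresponds to %msg:21:$%, which keeps the text from character 21 to the end and so drops the date and time that dpkg writes at the start of each line.

```python
from datetime import datetime

# Same substrings as in the "if $msg contains ..." test of the ruleset.
KEYWORDS = (" install ", " remove ", " upgrade ")


def forward_line(dpkg_line, when, tag="dpkg:"):
    """Return the line as the template would format it, or None if it is filtered out."""
    if not any(keyword in dpkg_line for keyword in KEYWORDS):
        return None
    stamp = when.strftime("%b ") + f"{when.day:2d} " + when.strftime("%H:%M:%S")
    # Drop the first 20 characters ("YYYY-MM-DD HH:MM:SS ") of the dpkg.log line.
    return f"{stamp} {tag} {dpkg_line[20:]}\n"


# Usage with a made-up dpkg.log line:
# forward_line("2024-03-04 10:12:44 upgrade libfoo:amd64 1.2-3 1.2-4", datetime.now())
```

The formatted result has exactly the shape that the first Grok pattern of the Logstash filter expects: timestamp, the tag "dpkg:", then action, package:arch and the two versions.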