
NiFi for Apache - the flow

Presentation

In the previous guide, you installed, configured and enabled the MiNiFi agent on each of your web servers. Now it is time to build a flow on your central NiFi server to do something with the information that will be sent to it.

 

Building up a flow on the NiFi server

We are now back in the workspace of our NiFi server.
If you have followed this guide line by line, you should have only one input port, called “RemoteMiNiFi”, on it.

NiFi and ElasticSearch

Custom mapping for the index you will update with NiFi flows

Unlike the Logstash “elasticsearch” output, you cannot associate a customized mapping with the NiFi processor. Therefore, if ElasticSearch's dynamic mapping does not assign the type you really want to one of your fields, you will have to use a default mapping template (see this chapter in the ElasticSearch section of the site).
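As a minimal sketch, such a template can be pushed once through the REST API before the NiFi flow starts writing; the index pattern, field name and type below are only placeholder assumptions for an ElasticSearch 5.x cluster:

# Hypothetical template: force "clientip" to the "ip" type for any index matching "logstash-*".
curl -XPUT 'http://localhost:9200/_template/nifi_custom' -H 'Content-Type: application/json' -d '
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "clientip": { "type": "ip" }
      }
    }
  }
}'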
If you do that, remember that:

Setting up Logstash, for Apache access logs

Configuration to get Apache Access logs

In this case, we will run Logstash on each server where an Apache web server is running. In our Apache setup, we have enabled the Apache combined access log for each of our Apache virtual hosts.
Here, we have one virtual machine running Apache 2.4.18 with a different access log per virtual host, so an extract of the configuration looks like:

<VirtualHost *:80>
    ServerName site1
    DocumentRoot /var/www/site1
    <...>
    CustomLog /var/log/apache2/site1.log combined
</VirtualHost>
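With that in place, a minimal Logstash pipeline to pick up those files could look like the sketch below; the path wildcard, the "source" value and the use of the standard COMBINEDAPACHELOG grok pattern are assumptions made for illustration:

input {
  file {
    # Read every per-vhost access log written by Apache (path is an assumption).
    path => "/var/log/apache2/*.log"
    start_position => "beginning"
    add_field => { "source" => "apache-access" }
  }
}
filter {
  if [source] == "apache-access" {
    # Parse the Apache combined log format into separate fields.
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    # Use the request timestamp instead of the ingestion time.
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  }
}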

Setting up Logstash, for Syslog

Configuration to get Syslog messages

This is the configuration of this Logstash instance. We will use the syslog input plugin to listen for syslog messages from all our hosts.
We will start Logstash on the server "logstash-runner", then we will configure Rsyslog.

input {
  syslog {
    port => "10514"
    add_field => {
      "source" => "syslog"
    }
  }
}
filter {
  if [source] == "syslog" {
    grok {
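On each host, the Rsyslog side of this setup (which this excerpt only mentions) could then be as simple as a drop-in forwarding rule; the file name and the TCP transport are assumptions:

# /etc/rsyslog.d/90-logstash.conf (hypothetical file name)
# Forward every message over TCP (@@) to the Logstash syslog input on port 10514.
*.* @@logstash-runner:10514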

Setting up Logstash, for Tweets

Configuration to get Tweets

input {
  twitter {
    consumer_key => "<your twitter API consumer key>"
    consumer_secret => "<your twitter API consumer secret>"
    oauth_token => "<your twitter API application token>"
    oauth_token_secret => "<your twitter API application token secret>"
    keywords => [ "thevoicebe" ]
    add_field => { "source" => "tweets" }
    full_tweet => false
  }
}
filter {
  if [source] == "tweets" {
    mutate {

Setting up Logstash - the basics

Installing Logstash on the server (done with Ubuntu 16.04 LTS).
    server1(admin) ~$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
    server1(admin) ~$ sudo apt-get install apt-transport-https
    server1(admin) ~$ echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-5.x.list
    server1(admin) ~$ sudo apt-get update && sudo apt-get install logstash
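The .deb package does not start Logstash by itself; as a sketch (paths and unit name as shipped by the Elastic 5.x package), you can check a pipeline configuration and then enable the service:

    server1(admin) ~$ sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/
    server1(admin) ~$ sudo systemctl enable logstash
    server1(admin) ~$ sudo systemctl start logstash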


ElasticSearch field mapping customisation

Changing the field mapping ElasticSearch is using

With ElasticSearch, you don't need to explicitly define everything (field names, field types, indices, ...). It will try to do it automatically.
When you upload data through the REST API to an index that does not exist yet, a new index with the provided name is created, and default mappings (the types to use for each field) and settings are applied.
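A quick way to see that behaviour (the index, type and field names below are just placeholder assumptions) is to index a document into an index that does not exist yet and then ask ElasticSearch which mapping it generated:

# Index a document into "testindex"; the index and its mapping are created on the fly.
curl -XPOST 'http://localhost:9200/testindex/logs' -H 'Content-Type: application/json' -d '
{ "message": "hello", "bytes": 1234, "@timestamp": "2017-01-01T00:00:00Z" }'

# Look at the types ElasticSearch guessed for each field.
curl -XGET 'http://localhost:9200/testindex/_mapping?pretty'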

Detailed presentation of ElasticSearch

Features

An open-source, real-time search and analytics engine with a dedicated ecosystem of tools to feed it, manage it and use it.

Fully-featured search

  • Relevance-ranked text search
  • Scalable search
  • High-performance geo, temporal, range and key lookup
  • Highlighting
  • Support for complex / nested document types
  • Spelling suggestions
  • Powerful query DSL
  • “Standing” queries
  • Real-time results
  • Extensible via plugins

Powerful faceting/analysis