NiFi for Apache - the flow using records and registry


In a previous guide, we set up MiNiFi on web servers to export Apache access log events to a central NiFi server. Then we saw an example of a flow built on this NiFi server to handle those events. That flow used standard NiFi processors, manipulating each event as a string. Now we will start a new flow that achieves the same purpose but uses a record-oriented approach.
We will then discover how easy record-oriented flow files are to use and how they can speed up the deployment of a flow.
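To make the difference between "an event as a string" and "an event as a record" concrete, here is a minimal sketch in Python, outside of NiFi. It is only an illustration: in NiFi itself this parsing would be done by a record reader, not by custom code. The field names (host, status, bytes, and so on) and the sample log line are assumptions chosen for this example.

```python
import re

# Pattern for the Apache "combined" log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# The group names are illustrative, not NiFi field names.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Turn one access log line into a record (a dict), or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Typed fields are what make a record easier to handle than a raw string.
    record["status"] = int(record["status"])
    # Apache writes "-" when no bytes were sent.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record


# Made-up sample line, for illustration only.
sample = (
    '192.0.2.10 - - [10/Oct/2018:13:55:36 +0200] '
    '"GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"'
)
record = parse_line(sample)
```

Once the event is a record, a later step can read record["status"] or record["host"] directly instead of running another regular expression over the whole line, which is the convenience the record-oriented processors aim to provide.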

Pieces needed from before

NiFi for Apache - the flow


In the previous guide, you installed, configured and enabled the MiNiFi agent on each of your web servers. Now it is time to build a flow on your central NiFi server to do something with the information that will be sent to it.


Building up a flow on the NiFi server

We are now back to the workspace of our NiFi server.
If you have followed this guide line by line, you should only have one input port called “RemoteMiNiFi” on it.

NiFi for Apache - using MiNiFi


In this guide, we will use MiNiFi, the lightweight version of NiFi, running on an Apache web server and watching for new events written to the Apache access logs.
MiNiFi has no web interface and only a limited set of processors. It does not take many resources on the host it runs on.
It can be used as a “forward-only” agent for any central NiFi server you have previously set up.

Configuring MiNiFi

Setting up Logstash, for Apache access logs

Configuration to get Apache Access logs

In this case, we will run Logstash on each server where an Apache web server is running. In our Apache setup, we've enabled the Apache Combined Access Log for each of our Apache virtual hosts.
Here, we have one virtual machine running Apache 2.4.18 with a separate access log per virtual host. An extract of the configuration looks like:

<VirtualHost site1>
    DocumentRoot /var/www/site1
    CustomLog /var/log/apache2/site1.log combined