Creating an HDP cluster

Setting up an HDP cluster with Ambari

How to get a fully functional cluster running the Hortonworks Data Platform

 

Presentation

The Apache Ambari project implements a web GUI that helps in provisioning, managing and monitoring an Apache Hadoop cluster. Over time, it has added support for many open source projects that are part of the Hadoop ecosystem.
The Ambari server will enable you to:
  1. Create a new cluster
  2. Provision services on selected nodes of the cluster
  3. Manage multiple versions of the service configurations
  4. Start and stop services from a single location
  5. Have a dashboard showing the health of the cluster in general and of each service in particular
  6. Provide you with metrics and statistics about your services (throughput, uptime, speed, volume, …)
  7. Deliver alarms for system alerts (e.g., a node goes down, remaining disk space is low, etc.)
  8. Smoothly upgrade the software of your whole cluster
Ambari can be used to deploy and manage services and features like:
  • Zookeeper
  • HDFS
  • YARN
  • MapReduce
  • Tez
  • Pig
  • Sqoop
  • Oozie
  • Kafka
  • Storm
  • Flume
  • NiFi
  • Hive
  • HBase
  • Kerberos authentication
  • Ranger
  • Solr
These applications are part of the Hortonworks Data Platform, Hortonworks Data Flow and Hortonworks Data Platform Search.
The Ambari server is installed on one node of your cluster and retrieves the status of each node by communicating with an Ambari agent. This agent is installed on each node during the creation of the cluster.

Preparing the scene

  1. We want to do the installation on Ubuntu, and we are going to install Ambari 2.5.1, which supports Ubuntu 14 LTS and Ubuntu 16 LTS (since June 2017). Other supported distributions are SUSE, Debian, Red Hat and CentOS. Refer to the http://ambari.apache.org website for all versions. We will use Ubuntu 14 LTS for the rest of this tutorial, because we want to install HDF on top of HDP later on and the HDP stack currently supports only Ubuntu 14 LTS, not yet Ubuntu 16 LTS.
  2. From the node where the Ambari server will run, passwordless SSH to the nodes of the cluster must be possible. You should create a dedicated user for this: from a security perspective, it lets you better track and audit the activities originating from the Ambari server. Technically speaking, nothing prevents you from enabling this for root or your personal account; see SSH, keys and agent for more information. Also, if the account used to SSH into the nodes is not root, be sure that this user is allowed to use the "sudo" command on each node without a password; see Switching user (sudo) for information about sudo configuration.
  3. To keep the logs shown on different nodes consistent, it is recommended that all nodes are kept in time synchronization with a tool like NTP:
admin@hdf1:~$ sudo apt-get install ntp
admin@hdf1:~$ /etc/init.d/ntp status
* NTP server is running
If the system tells you that ntp is not running, launch it (sudo /etc/init.d/ntp start).
  4. All hosts must be configured for both forward and reverse DNS, or have an /etc/hosts file listing all nodes with their IP address and name (FQDN).
Example of /etc/hosts for our tutorial:
127.0.0.1 localhost
10.1.1.125 hdf1.mylab.org hdf1
10.1.1.126 hdf2.mylab.org hdf2
  5. SELinux should be disabled. In /etc/selinux/config, be sure that you have the line:
SELINUX=disabled
After changing this file, you need to reboot your system.
If this file is not found on your system, then the SELinux binaries and libraries are not installed and SELinux is therefore not active.
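
Some of the prerequisites above can be checked from a script before going further. The following is only a sketch in POSIX shell: the function names are ours (they are not part of Ambari), and the SSH check assumes that the dedicated user and its key are already in place.

```shell
#!/bin/sh
# Pre-flight helpers for the prerequisites above.
# All function names are illustrative; none of them comes from Ambari.

# Succeed if FQDN is listed as a name on a non-comment line of a hosts file.
# usage: hosts_file_has /etc/hosts hdf1.mylab.org
hosts_file_has() (
    file=$1 fqdn=$2
    awk -v name="$fqdn" '
        $1 ~ /^#/ { next }                 # skip comment lines
        {
            for (i = 2; i <= NF; i++) {    # field 1 is the IP address
                if ($i ~ /^#/) break       # stop at a trailing comment
                if ($i == name) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
)

# Print the SELINUX= value of a config file, or "not-installed" when the
# file does not exist (the case described in the text above).
# usage: selinux_mode [/etc/selinux/config]
selinux_mode() (
    file=${1:-/etc/selinux/config}
    [ -f "$file" ] || { echo "not-installed"; exit 0; }
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$file" | head -n 1
)

# Succeed if USER can log in to HOST with a key only and run sudo without
# a password. BatchMode=yes makes ssh fail instead of prompting, and
# sudo -n makes sudo fail instead of asking for a password.
# usage: check_ssh_sudo admin hdf2.mylab.org
check_ssh_sudo() (
    user=$1 host=$2
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" 'sudo -n true'
)
```

On the Ambari server of this tutorial, one would for instance run hosts_file_has /etc/hosts hdf2.mylab.org, selinux_mode and check_ssh_sudo admin hdf2.mylab.org, and look into any check that fails before starting the installation.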
 

Installing Ambari server

On the machine that will become your Ambari server, we do the following:
admin@hdf1:~$ sudo wget -O /etc/apt/sources.list.d/ambari.list http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.5.1.0/ambari.list
admin@hdf1:~$ sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
Executing: /tmp/tmp.fXvu8GGGS1/gpg.1.sh --recv-keys
--keyserver
keyserver.ubuntu.com
B9733A7A07513CAD
gpg: requesting key 07513CAD from hkp server keyserver.ubuntu.com
gpg: key 07513CAD: public key "Jenkins (HDP Builds) <jenkin@hortonworks.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
admin@hdf1:~$ sudo apt-get update
(…)
Get:17 http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.5.1.0 Ambari/main amd64 Packages [1,388 B]
(…)
admin@hdf1:~$ sudo apt-get install ambari-server
(…)
admin@hdf1:~$ sudo ambari-server setup
Using python  /usr/bin/python
Setup ambari-server
Checking SELinux...
WARNING: Could not run /usr/sbin/sestatus: OK
Customize user account for ambari-server daemon [y/n] (n)? n
Adjusting ambari-server permissions and ownership...
Checking firewall status...
Checking JDK...
[1] Oracle JDK 1.8 + Java Cryptography Extension (JCE) Policy Files 8
[2] Oracle JDK 1.7 + Java Cryptography Extension (JCE) Policy Files 7
[3] Custom JDK
==============================================================================
Enter choice (1): 1
To download the Oracle JDK and the Java Cryptography Extension (JCE) Policy Files you must accept the license terms found at http://www.oracle.com/technetwork/java/javase/terms/license/index.html and not accepting will cancel the Ambari Server setup and you must install the JDK and JCE files manually.
Do you accept the Oracle Binary Code License Agreement [y/n] (y)? y
Downloading JDK from http://public-repo-1.hortonworks.com/ARTIFACTS/jdk-8u112-linux-x64.tar.gz to /var/lib/ambari-server/resources/jdk-8u112-linux-x64.tar.gz
jdk-8u112-linux-x64.tar.gz... 100% (174.7 MB of 174.7 MB)
Successfully downloaded JDK distribution to /var/lib/ambari-server/resources/jdk-8u112-linux-x64.tar.gz
Installing JDK to /usr/jdk64/
Successfully installed JDK to /usr/jdk64/
Downloading JCE Policy archive from http://public-repo-1.hortonworks.com/ARTIFACTS/jce_policy-8.zip to /var/lib/ambari-server/resources/jce_policy-8.zip
Successfully downloaded JCE Policy archive to /var/lib/ambari-server/resources/jce_policy-8.zip
Installing JCE policy...
Completing setup...
Configuring database...
Enter advanced database configuration [y/n] (n)? n
Configuring database...
Default properties detected. Using built-in database.
Configuring ambari database...
Checking PostgreSQL...
Configuring local database...
Configuring PostgreSQL...
Restarting PostgreSQL
Creating schema and user...
done.
Creating tables...
done.
Extracting system views...
.......ambari-admin-2.5.1.0.159.jar
....
Adjusting ambari-server permissions and ownership...
Ambari Server 'setup' completed successfully.
admin@hdf1:~$ sudo ambari-server start
(…)
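
Once started, the server can take a little while before its web interface answers. As a rough sketch (the function name wait_for_http is ours, and it assumes curl is installed), we can poll the web port until something responds:

```shell
#!/bin/sh
# Poll URL until it answers or until roughly TIMEOUT seconds have elapsed.
# The elapsed time is approximate: only the sleeps are counted.
# usage: wait_for_http URL [TIMEOUT]
wait_for_http() (
    url=$1 timeout=${2:-120}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # Any HTTP response counts as "the server is up".
        if curl -s -o /dev/null --max-time 5 "$url"; then
            exit 0
        fi
        sleep 2
        waited=$((waited + 2))
    done
    echo "no answer from $url after ${timeout}s" >&2
    exit 1
)
```

On the server itself, sudo ambari-server status reports whether the daemon is running, and wait_for_http http://localhost:8080 180 then returns as soon as the web interface is reachable.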

 

Creating a cluster

The rest of the configuration is done in a web browser. Open one and point it to the URL http://ambari-server:8080, where ambari-server stands for the name of your Ambari server.
The initial login information is:
  • Username: admin
  • Password: admin
If this is the first time you log in to Ambari Web, you will see a view similar to this:
[Screenshot: 1.Landing_page_initial.PNG]
Click “Launch Install Wizard” to create the cluster and perform its initial configuration.
The process, which is driven by nice and (almost) self-explanatory web pages, starts by asking you for a name for the cluster:
[Screenshot: 2.Naming_cluster.PNG]
After you click “Next”, you are prompted to select the version of HDP you want to deploy.
Below the tabbed version list, you will find the repository entries that tell your server where to pick up the packages needed to install this version of HDP. This means that your server must have working Internet access to get these packages. If that is not the case, you can use a local repository somewhere on your own network. To build such a repository, refer to the official Hortonworks documentation (http://docs.hortonworks.com), as we are not going to cover it here.
[Screenshot: 3.Select_version.PNG]
The most recent version of the HDP stack is already selected, so you’ll just have to click “Next”.
The next page is the one where you will assign the initial hosts of your cluster. The page is divided into 2 main parts:
  • The upper part, where you enter the names of the hosts:
    • Use the fully-qualified domain name to designate each host
    • One host per line
    • The name you use must be identical to the hostname defined on the host itself
    • You can use pattern expressions like node[0-4].domain.org if you have 5 hosts named node0, node1, …
  • The lower part, where you define the SSH parameters used to connect to these hosts from the Ambari server:
    • The public counterpart of the private key you load on this page must be listed in the .ssh/authorized_keys file on each host you want to join to the cluster. This key doesn’t need to be the SSH private key stored on your Ambari server; it can be any key, as long as its public counterpart is authorized on each host you want to register.
    • If you don’t use root as “SSH User Account”, you must be sure that the user you specify can run sudo without being prompted for a password
    • Adapt the port if you have changed it in the SSH daemon configuration
Refer to our tutorials about passwordless SSH access and passwordless sudo to help you.
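
To double-check which hosts a pattern expression such as node[0-4].domain.org stands for, the range can be expanded by hand. The function below is only an illustration in POSIX shell, not Ambari's own implementation: the name expand_host_pattern is ours, and it assumes a single [first-last] range of plain integers without leading zeros.

```shell
#!/bin/sh
# Expand one [first-last] numeric range inside a host name pattern and
# print one host name per line. A name without a range is printed as-is.
# Assumption: plain integers only (no zero padding, one range per name).
# usage: expand_host_pattern 'node[0-4].domain.org'
expand_host_pattern() (
    pattern=$1
    case $pattern in
        *\[*-*\]*) ;;                      # contains a [a-b] range
        *) echo "$pattern"; exit 0 ;;      # nothing to expand
    esac
    prefix=${pattern%%\[*}                 # text before "["
    rest=${pattern#*\[}                    # text after "["
    range=${rest%%\]*}                     # "first-last"
    suffix=${rest#*\]}                     # text after "]"
    first=${range%%-*}
    last=${range#*-}
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "${prefix}${i}${suffix}"
        i=$((i + 1))
    done
)
```

Running expand_host_pattern 'node[0-4].domain.org' prints the five names node0.domain.org to node4.domain.org, one per line, which is a convenient input for a loop that tests the SSH access to each host.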

Manual registration consists of installing the ambari-agent package on each node yourself and pointing it to the Ambari server. We won’t cover this here; click the “manual registration” link on the page for more information.
[Screenshot: 4.Register_hosts.PNG]
Click “Register and Confirm” and wait for the process to complete.
Once the registration has finished successfully, the “Confirm Hosts” page is displayed:
[Screenshot: 5.Registering_done.PNG]
The yellow area below the host list indicates that there are warnings that need to be checked. They are not blocking, as the “Next” button is already active.

For the sake of the rest of the process, check the warnings by clicking the link “Click here to see the warnings”. You will get the list of all checks performed by the installer, with their status. Look for the ones marked with a yellow triangle and click them to see the details of the problem. At the top of the list, you will find information on how to perform some automatic cleanup by running a Python script:
sudo python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users

If you need to solve the Transparent Huge Pages issue, have a look at the Ambari usage and tweaking page.
[Screenshot: 6.Issues.PNG]
Once you have done what is required to fix an issue, click “Rerun checks” to validate that it is no longer reported. If everything looks fine, close this page and click “Next” on the “Confirm Hosts” page.
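
For the Transparent Huge Pages warning in particular, the current setting of a host can be read from the kernel's sysfs file before and after the fix. This is a small sketch (the function name thp_mode is ours); it relies on the kernel marking the active value with square brackets, as in "always madvise [never]".

```shell
#!/bin/sh
# Print the active Transparent Huge Pages mode, i.e. the word that the
# kernel shows between square brackets in the sysfs file.
# usage: thp_mode [FILE]
thp_mode() (
    file=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$file"
)
```

If thp_mode prints anything other than never, the setting can be changed on the running system with echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled. This change does not survive a reboot, so it has to be made persistent separately.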

Once the hosts are registered, you come to the page where you choose the services you want to install. It is up to you to determine what you need on your cluster. Some services depend on others: if you leave out a service that one of your selections requires, you will be informed that it will also be installed to satisfy the dependency.
Click “Next” when you are satisfied with the list of services.

The next page is used to assign the services to your hosts. For each service, choose the host on which you want to run it. The wizard proposes a default distribution of the selected services across your hosts. You can change the host by selecting a different one in the drop-down list next to the service name.
Some services can run on more than one host. For these, you will find a green + icon at the left of the service – host pair line, which lets you add the service to other hosts too.
Services without a green + icon must be unique in the cluster. These are mainly management services and meta-data or catalogue servers.
Click “Next” when the list looks OK to you.

The next page is used to assign slave roles, like DataNode, NFS Gateway, …, to some of your hosts. You can also select the hosts on which to install the client tools.
Click “Next” once the list is OK for you.

The next page is used to customize the service configuration, like changing listening ports, directories, logging, memory usage and so on. If some parameters need your attention on this page, you will see a red triangle on the section where you have to provide information. Until you give a value to these parameters, you won’t be able to proceed with the installation. Usually, the parameters you have to fill in concern passwords.
When all missing mandatory parameters are filled in, you can click the “Next” button.

The last page that opens presents all the information entered so far. Click the “Deploy” button and Ambari will start installing all the services you’ve selected.
[Screenshot: 7. Installing.png]
After a while (how long depends on the number of services and hosts to configure), each progress bar will turn green. If this is not the case, click the message link next to each host name to see a list of services with log information.
When everything has succeeded, you just have to click “Next”.

You get a summary of the installation you have just done. After reviewing it, click “Complete” and you are brought to the main dashboard page of your newly installed cluster:
[Screenshot: 8.Dashboard.PNG]
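
The state shown on the dashboard can also be read from the command line, because the Ambari server exposes a REST API under /api/v1 on the same port as the web interface. The helpers below are a sketch only: the function names are ours, mycluster in the example is a placeholder for the name you gave to your cluster, and curl is assumed to be installed.

```shell
#!/bin/sh
# Build the REST URL that lists the services of a cluster with their state.
# usage: ambari_services_url BASE_URL CLUSTER_NAME
ambari_services_url() {
    # ${1%/} drops a trailing slash from the base URL, if there is one.
    printf '%s/api/v1/clusters/%s/services?fields=ServiceInfo/state\n' "${1%/}" "$2"
}

# Query that URL and print the raw JSON answer of the server.
# usage: ambari_service_states BASE_URL CLUSTER_NAME USER:PASSWORD
ambari_service_states() {
    curl -s -u "$3" -H 'X-Requested-By: ambari' "$(ambari_services_url "$1" "$2")"
}
```

For example, ambari_service_states http://ambari-server:8080 mycluster admin:admin prints a JSON document with one entry per installed service, which is handy for scripted health checks.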