In this post we’ll see how to install Hadoop on a single node cluster in pseudo-distributed mode.
The steps shown here were performed on Ubuntu 16.04 and the Hadoop version used is 2.9.0.
Modes in Hadoop
Before starting the Hadoop installation, let's have a look at the modes supported for running Hadoop.
- Local (Standalone) Mode– This is the default mode for Hadoop, where Hadoop runs in a non-distributed fashion as a single Java process. This mode is useful for debugging.
- Pseudo-Distributed Mode– Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process but all on the same node.
- Fully-Distributed Mode - In fully-distributed mode Hadoop runs on clusters ranging from a few nodes to extremely large clusters with thousands of nodes.
Required Software
For Hadoop installation the following software is required-
- Java must be installed. To check the compatible version of Java for the Hadoop version you are installing, refer to https://wiki.apache.org/hadoop/HadoopJavaVersions.
- ssh must be installed and sshd must be running, since ssh is used to manage the Hadoop daemons that run as separate Java processes. A quick check for both requirements is shown after this list.
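A quick way to verify both requirements on Ubuntu is shown below; this is just a convenience check, and it assumes the OpenSSH service is named ssh as it is on Ubuntu 16.04 -
java -version
ssh -V
sudo service ssh status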
Steps for Hadoop installation
Steps to install Hadoop on a single node cluster in pseudo-distributed mode are as follows-
1- Check for Java installation– As already stated, Java is required for running Hadoop, so ensure that Java is installed and that its version is compatible with your Hadoop version.
- Refer to How to Install Java on Ubuntu if you still need to install it.
2- Downloading the Hadoop tarball and unpacking it– You can download a stable release of Hadoop from this location - http://hadoop.apache.org/releases.html
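For example, you can fetch the 2.9.0 tarball from a terminal with wget; the exact URL below is one common mirror location and may differ from the mirror the releases page suggests for you -
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.9.0/hadoop-2.9.0.tar.gz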
The downloaded tarball will be of the form hadoop-x.y.z.tar.gz, so you need to unpack it. For that do the following -
2.1 Create a new directory– Create a new directory and move the Hadoop tarball there.
sudo mkdir /usr/local/hadoop
2.2 Move the Hadoop tarball and untar it– Move the Hadoop tarball from the Downloads directory to /usr/local/hadoop and unpack it.
Change directory to Downloads and run the following command from there.
sudo cp hadoop-2.9.0.tar.gz /usr/local/hadoop
Make sure to use the correct Hadoop version in your command.
Now you have the Hadoop tarball in the newly created directory /usr/local/hadoop.
To unpack it run the following command after changing directory to /usr/local/hadoop.
tar zxvf hadoop-2.9.0.tar.gz
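As a quick sanity check after unpacking, the extracted directory should contain the bin, sbin, etc and share folders -
ls /usr/local/hadoop/hadoop-2.9.0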
3- Installing and setting up passphraseless ssh - Hadoop uses SSH to remotely log in to nodes. Even in the case of a single node cluster the daemons run as separate Java processes, so we still need to install and configure SSH. For a single node the host will be localhost.
To install ssh run the following command –
sudo apt-get install ssh
Every time you try to connect to localhost you will be asked for a passphrase; to avoid that, generate a key with an empty passphrase so that you are not prompted each time.
Command to generate a key with an empty passphrase-
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
You also need to add the generated key to the list of authorized keys; run the following command to do that.
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
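Depending on how the ~/.ssh directory was created, you may also need to tighten the permissions on the authorized_keys file, otherwise sshd can refuse to use it -
chmod 0600 ~/.ssh/authorized_keys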
Try connecting now using the command ssh localhost; you should not be asked for a passphrase if everything is configured correctly.
4- Setting environment variables and paths- In order to run Hadoop, it needs to know the location of the Java installation on your system. For that you have to set the JAVA_HOME variable. You can set it in the ~/.bashrc file or in the etc/hadoop/hadoop-env.sh file which resides in your Hadoop installation. If you are adding it to the hadoop-env.sh file, edit that file and add the following line at the end.
export JAVA_HOME=/usr/local/java/jdk1.8.0_151
Here /usr/local/java/jdk1.8.0_151 should be replaced with the path to your own Java installation.
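If you are not sure where Java is installed, one way to find out on Ubuntu (assuming java is already on your PATH) is -
readlink -f $(which java)
The JAVA_HOME value is that result with the trailing /bin/java (or /jre/bin/java) part removed.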
You can also add a HADOOP_HOME environment variable pointing to your Hadoop installation, and add its bin and sbin directories to the PATH too. That will let you run Hadoop commands from anywhere. To add them to /etc/environment run the following commands.
Edit /etc/environment file
sudo gedit /etc/environment
Add HADOOP_HOME variable at the end of the file.
HADOOP_HOME="/usr/local/hadoop/hadoop-2.9.0"
Add the following to the existing PATH variable -
:/usr/local/hadoop/hadoop-2.9.0/bin:/usr/local/hadoop/hadoop-2.9.0/sbin
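As an illustration, after both edits the relevant lines of /etc/environment might look like this; the existing PATH entries shown here are just Ubuntu defaults and yours may differ -
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/hadoop/hadoop-2.9.0/bin:/usr/local/hadoop/hadoop-2.9.0/sbin"
HADOOP_HOME="/usr/local/hadoop/hadoop-2.9.0"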
To reload the environment file run the following command-
source /etc/environment
Run the hadoop version command to ensure everything is configured properly. If there has been no problem so far, running the command should give you the Hadoop version information.
$ hadoop version
Hadoop 2.9.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 0D8ebc8394f483xy25feac05fu478f6d612e6c50
Compiled by jjohn on 2017-11-15T22:16Z
Compiled with protoc 2.5.0
From source with checksum 0b71a9c67a5227390741f8d5931y175
This command was run using /usr/local/hadoop/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0.jar
5- Setting configuration files - You need to change XML files placed inside the etc/hadoop directory within your Hadoop installation folder. The XML files that need to be changed, and the required changes, are listed here.
etc/hadoop/core-site.xml
You can override the default settings used to start Hadoop by changing this file.
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
The directory you choose for the hadoop.tmp.dir parameter has to be created by you, and your user needs read and write permission on it. If you don't set the hadoop.tmp.dir property, the Hadoop framework will create a tmp directory at its default location.
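For the /usr/tmp value used above, the directory could be created and handed over to your login user like this (adjust the path and the owner to your own setup) -
sudo mkdir -p /usr/tmp
sudo chown $USER /usr/tmp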
etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
For YARN settings to run MapReduce jobs-
etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
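Note that in Hadoop 2.x tarballs this file is usually shipped only as mapred-site.xml.template. If that is the case in your installation, copy it first (from the Hadoop installation directory) and then add the property shown above -
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml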
etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
6- Format HDFS file system– You also need to format the HDFS filesystem once. Run the following command to do that.
hdfs namenode -format
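If formatting succeeds, the output should include a line similar to the following; the exact path depends on your hadoop.tmp.dir setting -
Storage directory /usr/tmp/dfs/name has been successfully formatted.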
7- Starting daemons– Start the HDFS and YARN daemons by executing the following shell scripts -
start-dfs.sh
start-yarn.sh
You will find these shell scripts in the sbin directory within your Hadoop installation.
Use the jps command to verify that all the daemons are running.
$ jps
6294 NodeManager
6168 ResourceManager
6648 Jps
5997 SecondaryNameNode
5758 DataNode
5631 NameNode
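If any of these daemons is missing, check its log file under the logs directory of your Hadoop installation, for example -
ls /usr/local/hadoop/hadoop-2.9.0/logs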
To stop the daemons use the following shell scripts.
stop-dfs.sh
stop-yarn.sh
8- Browse the web interfaces– You can also check the web interfaces for the NameNode and the YARN ResourceManager after the daemons are started.
NameNode– http://localhost:50070/
ResourceManager– http://localhost:8088/
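As a final smoke test, you can create a directory in HDFS and list it (a minimal check, assuming the daemons started in the previous step are still running) -
hdfs dfs -mkdir -p /user
hdfs dfs -ls /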
That's all for this topic Installing Hadoop on a Single Node Cluster in Pseudo-Distributed Mode. If you have any doubt or any suggestions to make please drop a comment. Thanks!