
Tuesday, April 2, 2013

Run your Hadoop cluster in 10 steps

Running Hadoop on Ubuntu Linux (Single-Node Cluster)



In this document you will see how to set up, in 10 steps, a pseudo-distributed, single-node Hadoop cluster (any stable 1.0.x version), backed by the Hadoop Distributed File System (HDFS), running on Ubuntu Linux 12.04 LTS.
Prerequisites
Software to download before you start these procedures. You can download all of the recommended software before you get started with the steps listed below:
·         Ubuntu 12.04 LTS (or the .iso file for installation)
·         Installing Ubuntu 12.04 LTS is much the same as installing any other version of Ubuntu, with a slightly changed GUI.
·         Oracle Java – Java 1.6 (aka Java 6) is recommended; download the binary executable (.bin) file for your Ubuntu 12.04 LTS
·         Hadoop 1.0.4 (direct link), the October 2012 stable release
Single-node Hadoop cluster: step-by-step instructions
Ubuntu installation
1.      Once your OS (Ubuntu 12.04 LTS) is installed, you need to install a few services required to run your Hadoop cluster.
If you would like to use a dedicated user for your Hadoop setup in Ubuntu 12.04 LTS, click here. (This is not required; in fact you can skip it. Otherwise you may need to do more Linux user management to give the newly created user access and administrator rights.) A minimal example is sketched below.
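For reference, a dedicated user and group can be created like this (a minimal sketch; the hadoop group and hduser user names are common examples, not anything Hadoop requires):
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser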
SSH Installation
2.      As an initial setup, you have to install SSH on Ubuntu 12.04 LTS.
i.     Installation of the OpenSSH client and server applications is simple.
ii.     To install the OpenSSH client applications, use this command at a terminal prompt:
$ sudo apt-get install openssh-client
iii.     To install the OpenSSH server application, use this command at a terminal prompt:
$ sudo apt-get install openssh-server
OR
iv.     To install the SSH in Ubuntu you can use this command at a terminal prompt:
$ sudo apt-get install ssh
v.     The openssh-server package can also be selected to install during the Server Edition installation process.
vi.     For more reference on SSH, please click here.
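To confirm the SSH server is up after installation, you can check the service status (on Ubuntu 12.04 the service is simply named ssh):
$ sudo service ssh status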
Now you have SSH installed on Ubuntu 12.04 LTS.

SSH Key Generation 
3.      This step is to generate SSH keys.
SSH keys allow authentication between two hosts without the need for a password.
SSH key authentication uses two keys: a private key and a public key.
i.     To generate the keys, from a terminal prompt enter:
user@ubuntu:~$ ssh-keygen -t rsa
(Since we are generating keys without a passphrase, just press Enter at each prompt. Below is the sequence of text that appears in the terminal.)

Generating public/private rsa key pair.
Enter file in which to save the key (/home/user/.ssh/id_rsa):
Created directory '/home/user/.ssh'.
Your identification has been saved in /home/user/.ssh/id_rsa.
Your public key has been saved in /home/user/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 user@ubuntu
The key's randomart image is:
[...snipp...]
Now you have created the SSH keys.
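If you prefer to do this non-interactively, ssh-keygen can create the key with an empty passphrase in a single command (-P sets the passphrase and -f the output file; both are standard OpenSSH options):
user@ubuntu:~$ ssh-keygen -t rsa -P "" -f $HOME/.ssh/id_rsa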

Enable SSH access with the RSA key pair
4.      This step enables SSH access to your local machine with the newly created RSA key pair (which has an empty passphrase), by appending the public key to the authorized keys file.
user@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Now the RSA key pair is authorized for SSH access.
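If ssh localhost still prompts for a password later on, the key files' permissions may be too open; OpenSSH requires them to be restricted, which these commands ensure (assuming the default key location):
$ chmod 700 $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys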
To test that SSH works, from a terminal prompt enter:
$ ssh localhost
(You will see a greeting similar to this in the terminal:)
Welcome to Ubuntu 12.04.2 LTS (GNU/Linux 3.5.0-23-generic x86_64)

* Documentation:  https://help.ubuntu.com/

Last login: Fri Mar  8 11:01:03 2013 from localhost
Now SSH is running successfully with the keys you generated.


Java
Java Installation
5.      This instruction is to install Java (JDK).
i.     If you do not already have Java (JDK) installed on Ubuntu 12.04 LTS,
type these commands
(either jdk-6u34-linux-i586.bin or jdk-6u43-linux-x64.bin, depending on whether your OS is 32-bit or 64-bit):
$ sudo apt-get update
$ sudo apt-get install openjdk-6-jre
$ wget http://download.oracle.com/otn-pub/java/jdk/6u34-b04/jdk-6u34-linux-i586.bin
$ chmod +x jdk-6u34-linux-i586.bin
$ ./jdk-6u34-linux-i586.bin
OR
ii.     Or you can download the .bin Linux installation file from the Oracle JDK download page
iii.     and run the commands below in a terminal:
$ chmod +x jdk-<version>.bin
$ ./jdk-<version>.bin
Now you have successfully installed Java.
Setting the Environment Variable (set JAVA_HOME)
6.      Once you have installed Java, you have to set JAVA_HOME on your Ubuntu 12.04 LTS.
i.     To set JAVA_HOME in Ubuntu 12.04, you have to update the Java home directory in /etc/bash.bashrc (setting it in your $HOME/.bashrc file works as well, for just your own user)
ii.     Edit the file to set the Java home directory:
$ vi /etc/bash.bashrc
(You need superuser permissions to edit this file: type $ sudo -s in the terminal, then enter your superuser/admin password. After you log in, try the above command to edit the file in the vi editor.)
iii.     Set the Java home: you can see an export JAVA_HOME line in the file, commented out with #. Remove the # (uncomment it) and key in your Java installation path:
export JAVA_HOME=/home/user/jdk1.6.0_34
(Check the folder in which you installed your Java and give that path; note there must be no space around the = sign.)
iv.     Exit the terminal and open a new terminal.
v.     To check that the variable is set, type this command in a terminal:
$ echo $JAVA_HOME
(The above command will display the Java directory path you set. You can also type $ java -version, or simply $ java, to see Java commands execute in the terminal.)
Now you have set the environment variable, i.e. JAVA_HOME.
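If the java command is still not found after this, the JDK's bin directory is probably not on your PATH; you can add it next to JAVA_HOME (a minimal sketch, using the same example path as above):
export JAVA_HOME=/home/user/jdk1.6.0_34
export PATH=$PATH:$JAVA_HOME/bin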



Hadoop



Hadoop Installation
The step-by-step instructions below will help you set up a single-node Hadoop cluster. Before we move on, you can read about HDFS in the Hadoop Distributed File System (HDFS) Architecture Guide.
7.      Download a recent stable release from one of the Apache Download Mirrors.
i.     Download hadoop-<version>.tar.gz to your desired directory.
ii.     From the terminal, run this command in the directory where you downloaded the hadoop-<version>.tar.gz file:
$ tar -xvf hadoop-<version>.tar.gz
iii.     The above command will extract the Hadoop files and folders.
iv.     Once you have extracted all the files, you have to edit a few configuration files inside the <Hadoop Home> directory.
v.     Now edit <Hadoop Home>/conf/hadoop-env.sh to set the Java home, as you did before in the environment variable setup:
export JAVA_HOME=/home/user/jdk1.6.0_34
vi.     Then update <Hadoop Home>/conf/core-site.xml with the XML below to set the Hadoop file system property:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:50000</value>
  </property>
</configuration>
vii.     Now update <Hadoop Home>/conf/mapred-site.xml with the XML below:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:50001</value>
  </property>
</configuration>
viii.     Now update <Hadoop Home>/conf/hdfs-site.xml with the XML below:
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/<user>/hadoop-dir/datadir</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/<user>/hadoop-dir/namedir</value>
  </property>
</configuration>
These paths assume you created the directories under your home profile, so use your user name after /home; otherwise you can check a folder's full path by executing the pwd command in a terminal inside that folder.
Create the data and name directories under your home directory; here I created a directory hadoop-dir and, inside it, two directories: one for the name node and the other for the data node, as shown below.
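A quick way to create both directories at once (using the same example paths as in hdfs-site.xml above; replace <user> with your user name):
$ mkdir -p /home/<user>/hadoop-dir/datadir /home/<user>/hadoop-dir/namedir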
To ensure the data and name directories can be accessed by Hadoop, change the directory permissions from inside hadoop-dir with:
$ chmod 755 datadir
as well as
$ chmod 755 namedir
i.e., if you run $ ls -l, your data and name directories should be in mode "drwxr-xr-x", which means the owner has all three permissions while group and others have only read and execute permissions.

8.      HDFS Format
Before starting your cluster, you need to format HDFS by running the command below from <Hadoop-Home-Dir>/bin:
$  ./hadoop namenode -format
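After the format completes, you can peek into the name directory you configured to confirm it was initialized (assuming the example path from hdfs-site.xml above; Hadoop 1.x creates a current subdirectory there, among other files):
$ ls /home/<user>/hadoop-dir/namedir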

9.      Copy a file to HDFS
To copy a local file to HDFS, execute this command from <Hadoop-Home-Dir>/bin in the terminal (note that the Hadoop daemons must already be running for this to work; see the start-all.sh command in step 10):
$ ./hadoop dfs -copyFromLocal <localsrc> URI
E.g.: if I have a sample.txt file in the path /home/<user>/Example,
then I execute this command from <Hadoop-Home-Dir>/bin:
$ ./hadoop dfs -copyFromLocal /home/<user>/Example/sample.txt /
This command will copy the local file into the HDFS root directory (/).
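To verify that the file landed in HDFS, you can list the HDFS root directory with the standard Hadoop 1.x shell (again from <Hadoop-Home-Dir>/bin):
$ ./hadoop dfs -ls /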

10.  Browse HDFS through the web interface
Start the Hadoop cluster by executing this command from <Hadoop-Home-Dir>/bin:
$ ./start-all.sh
This will start a NameNode, DataNode, JobTracker and TaskTracker on your machine.
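You can confirm that all the daemons came up using the jps tool that ships with the JDK; it lists running Java processes, and you should see entries such as NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker:
$ jps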
To stop the cluster, run
$ ./stop-all.sh
to stop all the daemons running on your machine.

To understand more, see Getting Started With Hadoop.
Browse the web interfaces for the NameNode and the JobTracker; by default they are available at:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
