Install Apache Hadoop/HBase on Ubuntu 20.04

This tutorial explains the steps to install Hadoop and HBase on an Ubuntu 20.04 (Focal Fossa) Linux server. HBase is an open-source, distributed, non-relational database written in Java that runs on top of the Hadoop Distributed File System (HDFS). HBase allows you to run huge clusters that host very large tables with billions of rows and millions of columns on commodity hardware.

This installation guide is not meant for highly available production setups; it is aimed at lab environments where you can develop. Our HBase installation will be done on a single-node Hadoop cluster. The server is an Ubuntu 20.04 virtual machine with the following specifications:

  • 16 GB RAM
  • 8 vCPUs
  • 20 GB boot disk
  • 100 GB raw disk for data storage

If your resources do not match this lab setup, you can work with what you have and simply check whether the services start.
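To see what resources your machine actually has before you begin, a few standard commands are enough:

free -h     # total and available memory
nproc       # number of vCPUs
lsblk       # attached disks and their sizes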

For CentOS 7, please refer to How to install Apache Hadoop/HBase on CentOS 7.

Install Hadoop on Ubuntu 20.04

The first part covers the installation of a single-node Hadoop cluster on an Ubuntu 20.04 LTS server. Installing the Ubuntu 20.04 server itself is beyond the scope of this guide; consult your virtualization environment's documentation for that.

Step 1: Update the system

Update and selectively upgrade all packages installed on the Ubuntu system:

sudo apt update
sudo apt -y upgrade
sudo reboot

Step 2: Install Java on Ubuntu 20.04

If Java is not already installed on Ubuntu 20.04, install it.

sudo apt update
sudo apt install default-jdk default-jre

After installing Java on Ubuntu 20.04, confirm the version with the java command.

$ java -version
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-3ubuntu1)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-3ubuntu1, mixed mode, sharing)

Set the JAVA_HOME variable in /etc/profile.d/hadoop_java.sh:

cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH=\$PATH:\$JAVA_HOME/bin
EOF

Update your $PATH and settings:

source /etc/profile.d/hadoop_java.sh

Then test:

$ echo $JAVA_HOME
/usr/lib/jvm/java-11-openjdk-amd64

Reference:

How to set the default Java version on Ubuntu/Debian
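If more than one Java release ends up installed, you can pick the system default interactively with the standard Debian/Ubuntu tooling (not specific to Hadoop):

sudo update-alternatives --config java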

Step 3: Create a user account for Hadoop

Let's create a separate user for Hadoop in order to maintain isolation between the Hadoop file system and the Unix file system.

sudo adduser hadoop
sudo usermod -aG sudo hadoop

After adding the user, generate an SSH key pair for it.

$ sudo su - hadoop
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:mA1b0nzdKcwv/LPktvlA5R9LyNe9UWt+z1z0AjzySt4 hadoop@hbase
The key's randomart image is:
+---[RSA 2048]----+
|                 |
|       o   + . . |
|      o + . = o o|
|       O . o.o.o=|
|      + S . *ooB=|
|           o *=.B|
|          . . *+=|
|         o o o.O+|
|          o E.=o=|
+----[SHA256]-----+

Add this user's key to the list of authorized SSH keys.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

Verify that the added key can be used for ssh.

$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:42Mx+I3isUOWTzFsuA0ikhNN+cJhxUYzttlZ879y+QI.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 20.04 LTS (GNU/Linux 5.4.0-28-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage


The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
$ exit

Step 4: Download and install Hadoop

Check the latest version of Hadoop before downloading the one specified here. At the time of writing, this is version 3.2.1.

Save the latest version to a variable.

RELEASE="3.2.1"

Then download the Hadoop archive file to the local system.

wget https://www-eu.apache.org/dist/hadoop/common/hadoop-$RELEASE/hadoop-$RELEASE.tar.gz
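Optionally, verify the integrity of the archive before extracting it. Apache publishes a checksum file alongside each release tarball (the exact checksum file format can vary between releases):

wget https://www-eu.apache.org/dist/hadoop/common/hadoop-$RELEASE/hadoop-$RELEASE.tar.gz.sha512
sha512sum -c hadoop-$RELEASE.tar.gz.sha512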

Extract the file.

tar -xzvf hadoop-$RELEASE.tar.gz

Move the resulting directory to /usr/local/hadoop.

sudo mv hadoop-$RELEASE/ /usr/local/hadoop
sudo mkdir /usr/local/hadoop/logs
sudo chown -R hadoop:hadoop /usr/local/hadoop

Set the HADOOP_HOME variable and add the directories containing the Hadoop binaries to your $PATH. A minimal set of entries (rewriting the same /etc/profile.d/hadoop_java.sh file) looks like this:

cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=\$PATH:\$JAVA_HOME/bin:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin
EOF

Source the file.

source /etc/profile.d/hadoop_java.sh

Confirm your Hadoop version.

$ hadoop version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.2.1.jar

Step 5: Configure Hadoop

All Hadoop configuration files are located in the /usr/local/hadoop/etc/hadoop/ directory.

Many configuration files need to be modified to complete the installation of Hadoop on Ubuntu 20.04.

First, set JAVA_HOME in the hadoop-env.sh shell script:

$ sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# Set JAVA_HOME - Line 54
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
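If you prefer a non-interactive edit, appending the same export with tee works as well (the last definition in the script wins):

echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/' | sudo tee -a /usr/local/hadoop/etc/hadoop/hadoop-env.sh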

Then configure:

1. core-site.xml

The core-site.xml file contains the Hadoop cluster information used at startup. These properties include:

  • Port number used by the Hadoop instance
  • Memory allocated for the file system
  • Memory limits for data storage
  • The size of the read/write buffer.

Open core-site.xml:

sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following properties between the <configuration> and </configuration> tags.

<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
      <description>The default file system URI</description>
   </property>
</configuration>
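After the file is saved, you can sanity-check that Hadoop resolves the value (fs.default.name is the deprecated alias of fs.defaultFS, so either key reports the same URI):

$ hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000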
   

2. hdfs-site.xml

This file needs to be configured for each host to be used in the cluster. The file contains the following information:

  • The namenode and datanode paths on the local file system.
  • The data replication value

In this setup, I want to store the Hadoop infrastructure on a secondary disk – /dev/sdb.

$ lsblk
 NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
 sda      8:0    0 76.3G  0 disk 
 └─sda1   8:1    0 76.3G  0 part /
 sdb      8:16   0   100G  0 disk 
 sr0     11:0    1 1024M  0 rom  

I will partition this disk and mount it to the /hadoop directory.

sudo parted -s -- /dev/sdb mklabel gpt
sudo parted -s -a optimal -- /dev/sdb mkpart primary 0% 100%
sudo parted -s -- /dev/sdb align-check optimal 1
sudo mkfs.xfs /dev/sdb1
sudo mkdir /hadoop
echo "/dev/sdb1 /hadoop xfs defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a 

Confirm:

$ df -hT | grep /dev/sdb1
/dev/sdb1      xfs       100G   84M  100G   1% /hadoop

Create the namenode and datanode directories.

sudo mkdir -p /hadoop/hdfs/{namenode,datanode}

Set ownership to hadoop users and groups.

sudo chown -R hadoop:hadoop /hadoop

Now open the file:

sudo vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Then add the following properties between the <configuration> and </configuration> tags.

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>

   <property>
      <name>dfs.name.dir</name>
      <value>file:///hadoop/hdfs/namenode</value>
   </property>

   <property>
      <name>dfs.data.dir</name>
      <value>file:///hadoop/hdfs/datanode</value>
   </property>
</configuration>
   

3. mapred-site.xml

Set the MapReduce framework to be used here.

sudo vim /usr/local/hadoop/etc/hadoop/mapred-site.xml

The settings are as follows.

<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>
   

4. yarn-site.xml

The settings in this file configure Hadoop YARN. It defines the resource management and job scheduling logic.

sudo vim /usr/local/hadoop/etc/hadoop/yarn-site.xml

Add:

<configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
</configuration>
   


Step 6: Verify the Hadoop configuration

Initialize Hadoop infrastructure storage.

sudo su - hadoop
hdfs namenode -format


Test HDFS configuration.

$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [hbase]
hbase: Warning: Permanently added 'hbase' (ECDSA) to the list of known hosts.
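A quick way to confirm the HDFS daemons are up is the jps tool that ships with the JDK; it should list NameNode, DataNode, and SecondaryNameNode processes (PIDs will differ on your system):

$ jps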

Finally verify the YARN configuration:

$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
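To verify that the NodeManager has registered with the ResourceManager, you can query YARN directly (the reported node address and container counts will vary):

$ yarn node -list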

Hadoop 3.x provides Web UI ports:

  • Name node – The default HTTP port is 9870.
  • Resource manager – The default HTTP port is 8088.
  • MapReduce JobHistory server – The default HTTP port is 19888.

You can check the ports used by Hadoop with the following command:

$ ss -tunelp
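To narrow the output down to the web UI ports listed above, filter it, for example:

$ ss -tlnp | grep -E ':9870|:8088'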


Access the Hadoop web dashboard at http://ServerIP:9870.


View the Hadoop cluster overview at http://ServerIP:8088.


Test to see if you can create a directory.

$ hadoop fs -mkdir /test
$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2020-05-29 15:41 /test
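You can also round-trip a small file to exercise both writes and reads (the file name here is an arbitrary example):

echo "hello hadoop" > /tmp/hello.txt
hadoop fs -put /tmp/hello.txt /test/
hadoop fs -cat /test/hello.txt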

Stop the Hadoop services

Use the following commands:

$ stop-dfs.sh
$ stop-yarn.sh

Install HBase on Ubuntu 20.04

You can choose to install HBase in standalone mode or pseudo distributed mode. The setup process is similar to our Hadoop installation.

Step 1: Download and install HBase

Check the newly released or stable version before you download. For production use, I recommend the stable version.

VER="2.2.4"
wget http://apache.mirror.gtcomm.net/hbase/stable/hbase-$VER-bin.tar.gz

Extract the downloaded HBase archive.

tar xvf hbase-$VER-bin.tar.gz
sudo mv hbase-$VER/ /usr/local/HBase/

Update the $PATH value. The entries below append HBase to the same profile script (note the -a flag, which keeps the existing Hadoop entries):

cat <<EOF | sudo tee -a /etc/profile.d/hadoop_java.sh
export HBASE_HOME=/usr/local/HBase
export PATH=\$PATH:\$HBASE_HOME/bin
EOF

Update your shell environment values.

$ source /etc/profile.d/hadoop_java.sh
$ echo $HBASE_HOME
/usr/local/HBase

Set JAVA_HOME in the hbase-env.sh shell script:

$ sudo vim /usr/local/HBase/conf/hbase-env.sh
# Set JAVA_HOME - Line 27
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/

Step 2: Configure HBase

We will configure HBase the same way we configured Hadoop. All HBase configuration files are located in the /usr/local/HBase/conf/ directory.

hbase-site.xml

Set the data directory to an appropriate location in this file.

Option 1: Install HBase in standalone mode (not recommended)

In standalone mode, all daemons (HMaster, HRegionServer, and ZooKeeper) run in a single JVM process/instance.

Create HBase root directory.

sudo mkdir -p /hadoop/HBase/HFiles
sudo mkdir -p /hadoop/zookeeper
sudo chown -R hadoop:hadoop /hadoop/

Open the file for editing.

sudo vim /usr/local/HBase/conf/hbase-site.xml

Now add the following between the <configuration> and </configuration> tags, as shown below.

<configuration>
   <property>
      <name>hbase.rootdir</name>
      <value>file:/hadoop/HBase/HFiles</value>
   </property>

   <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/hadoop/zookeeper</value>
   </property>
</configuration>
   

By default, unless you configure the hbase.rootdir property, your data is stored in /tmp/.

Now start HBase with the start-hbase.sh script in the HBase bin directory.

$ sudo su - hadoop
$ start-hbase.sh
running master, logging to /usr/local/HBase/logs/hbase-hadoop-master-hbase.out

Option 2: Install HBase in pseudo-distributed mode (recommended)

The hbase.rootdir value we set earlier starts HBase in standalone mode. In pseudo-distributed mode, HBase still runs entirely on a single host, but each HBase daemon (HMaster, HRegionServer, and ZooKeeper) runs as a separate process.

To install HBase in pseudo-distributed mode, set the properties as follows. Note that hbase.rootdir must point at the HDFS URI configured in core-site.xml (hdfs://localhost:9000 in this guide):

<configuration>
   <property>
      <name>hbase.rootdir</name>
      <value>hdfs://localhost:9000/hbase</value>
   </property>

   <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/hadoop/zookeeper</value>
   </property>

   <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
   </property>
</configuration>
   

In this setup, the data is stored in HDFS.

Make sure to create the ZooKeeper directory.

sudo mkdir -p /hadoop/zookeeper
sudo chown -R hadoop:hadoop /hadoop/

Now start HBase with the start-hbase.sh script in the HBase bin directory.

$ sudo su - hadoop
$ start-hbase.sh 
localhost: running zookeeper, logging to /usr/local/HBase/bin/../logs/hbase-hadoop-zookeeper-hbase.out
running master, logging to /usr/local/HBase/logs/hbase-hadoop-master-hbase.out
: running regionserver, logging to /usr/local/HBase/logs/hbase-hadoop-regionserver-hbase.out
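Because each daemon now runs as its own process, jps should list HMaster, HRegionServer, and HQuorumPeer (the ZooKeeper process HBase manages) alongside the Hadoop daemons:

$ jps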

Check the HBase directory in HDFS:

$ hadoop fs -ls /hbase
Found 9 items
drwxr-xr-x   - hadoop supergroup          0 2019-04-07 09:19 /hbase/.tmp
drwxr-xr-x   - hadoop supergroup          0 2019-04-07 09:19 /hbase/MasterProcWALs
drwxr-xr-x   - hadoop supergroup          0 2019-04-07 09:19 /hbase/WALs
drwxr-xr-x   - hadoop supergroup          0 2019-04-07 09:17 /hbase/corrupt
drwxr-xr-x   - hadoop supergroup          0 2019-04-07 09:16 /hbase/data
drwxr-xr-x   - hadoop supergroup          0 2019-04-07 09:16 /hbase/hbase
-rw-r--r--   1 hadoop supergroup         42 2019-04-07 09:16 /hbase/hbase.id
-rw-r--r--   1 hadoop supergroup          7 2019-04-07 09:16 /hbase/hbase.version
drwxr-xr-x   - hadoop supergroup          0 2019-04-07 09:17 /hbase/oldWALs

Step 3: Manage HMaster and HRegionServer

The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, making 10 HMasters in total, counting the primary.

HRegionServer manages the data in its StoreFiles according to the instructions of HMaster. Typically, each node in the cluster runs an HRegionServer. Running multiple HRegionServers on the same system is very useful for testing in pseudo-distributed mode.

Backup masters and additional RegionServers can be started and stopped with the local-master-backup.sh and local-regionservers.sh scripts, respectively.

$ local-master-backup.sh start 2    # Start a backup HMaster
$ local-regionservers.sh start 3    # Start an additional RegionServer

  • Each HMaster uses two ports (16000 and 16010 by default). The port offset is added to these defaults, so with an offset of 2, the backup HMaster will use ports 16002 and 16012.

The following command uses ports 16002/16012, 16003/16013, and 16005/16015 to start three backup servers.

$ local-master-backup.sh start 2 3 5
  • Each RegionServer requires two ports, the default ports are 16020 and 16030

The following command starts four additional RegionServers, which run on sequential ports starting from 16022/16032 (base port 16020/16030 plus 2).

$ local-regionservers.sh start 2 3 4 5

To stop a server, replace the start parameter with stop in each command, followed by the offset of the server to stop. Example:

$ local-regionservers.sh stop 5

Start HBase Shell

Hadoop and HBase should already be running before you use the HBase shell. Here is the correct order for starting the services.

$ start-all.sh
$ start-hbase.sh

Then use the HBase shell.

hadoop@hbase:~$ hbase shell
2019-04-07 10:44:43,821 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/HBase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.9, rd625b212e46d01cb17db9ac2e9e927fdb201afa1, Wed Dec  5 11:54:10 PST 2018
hbase(main):001:0>
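Once inside the shell, a quick smoke test confirms the cluster accepts writes and reads (the table and column family names are arbitrary examples):

create 'test_table', 'cf'
put 'test_table', 'row1', 'cf:greeting', 'Hello HBase'
scan 'test_table'
disable 'test_table'
drop 'test_table'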

Stop HBase.

stop-hbase.sh

You have successfully installed Hadoop and HBase on Ubuntu 20.04.
