Introduction to Apache Hadoop

Master the essentials of Apache Hadoop, a foundational framework of the Big Data ecosystem. This course is designed for developers and administrators who want to manage and process large datasets efficiently using distributed storage and computing.

Essential Skills Gained

Implement Hadoop Distributed File System (HDFS) for secure storage.

Execute MapReduce operations for processing large data files.

Administer and configure Hadoop ecosystem components effectively.

Gain hands-on experience with Apache Pig, Hive, and HBase.

Format

  • Instructor-led
  • 5 days with lectures and hands-on labs.

Audience

  • Big Data Developers
  • System Administrators
  • Data Engineers
  • IT Professionals

Description

Apache Hadoop is an open-source software framework, written in Java, for distributed storage and distributed processing of very large data sets (Big Data) on clusters built from commodity hardware. Every module in Hadoop is designed around the fundamental assumption that hardware failures (of individual machines, or entire racks of machines) are commonplace and should be handled automatically in software by the framework. What is covered in this course:

  • Hadoop Distributed File System (HDFS)
  • HDFS Operations
  • HDFS Management
  • MapReduce
  • MapReduce Types & Formats
  • Counters
  • Hadoop Administration
  • Apache Pig Installation & Configuration
  • Hands on Pig
  • Apache Hive Installation & Configuration
  • Hands on Hive
  • Apache HBase
  • Hands on HBase
  • Apache ZooKeeper
  • Apache Sqoop
  • Apache Flume
  • Cloudera
  • Hortonworks

Upcoming Course Dates

No upcoming dates. Please check back later.

Course Outline

Hadoop Installation

  1. Download and install the JDK & JRE

  2. Download and Install Apache Hadoop

  3. Add Hadoop to the PATH in the shell profile

  4. Configure SSH

  5. Configure Common HDFS and MapReduce Settings

  6. Format NameNode and Launch Hadoop Daemons
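The "common settings" step above typically touches core-site.xml and hdfs-site.xml. A minimal single-node sketch, with the hostname and port as placeholders (property names are for Hadoop 2.x):

```xml
<!-- core-site.xml: URI of the default filesystem -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: replication factor of 1 for a single-node setup -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

With these in place, the filesystem is initialized once with `hdfs namenode -format` and the daemons are launched with `start-dfs.sh`.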

HDFS Operations

  1. Copy a File from local to HDFS

  2. List Files and Directories in HDFS

  3. Copy a File from HDFS to Local

  4. Cat and Remove a File or Directory from HDFS

  5. Using Administrative Tools

  6. Access NameNode via Web User Interface
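The operations listed above map directly onto the `hdfs dfs` command family; a quick reference, with paths as placeholders:

```shell
hdfs dfs -put localfile.txt /user/student/    # copy local -> HDFS
hdfs dfs -ls /user/student                    # list files and directories
hdfs dfs -get /user/student/localfile.txt .   # copy HDFS -> local
hdfs dfs -cat /user/student/localfile.txt     # print file contents
hdfs dfs -rm -r /user/student/olddir          # remove a file or directory
hdfs dfsadmin -report                         # administrative cluster report
```

The NameNode web interface (on port 50070 by default in Hadoop 2.x) presents the same filesystem and cluster information graphically.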

Managing HDFS

  1. Sourcing data from various Locations

  2. Using Hadoop Archives

  3. Parallel Copying with distcp

  4. HDFS Upgrade Process

  5. Configuring Rack Awareness
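Hadoop Archives and distcp from the list above are both driven from the command line; for example (paths and cluster addresses are placeholders):

```shell
# Pack many small files into a single Hadoop Archive (HAR)
hadoop archive -archiveName logs.har -p /user/student/logs /user/student/archives

# Copy data between clusters in parallel with distcp
hadoop distcp hdfs://cluster1:9000/data hdfs://cluster2:9000/backup
```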

MapReduce

  1. Installing Eclipse

  2. Creating a Mapper Class

  3. Creating a Reducer Class

  4. Creating a Driver Class

  5. Packaging a JAR and Running MapReduce

  6. Accessing Job Tracker via Web User Interface
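The three classes built above divide the work cleanly: the Mapper emits key/value pairs, the Reducer aggregates values per key, and the Driver wires the job together. A plain-Python sketch of those roles (illustration only; the real classes extend Hadoop's Java API):

```python
from collections import defaultdict

# Illustration of the Mapper/Reducer/Driver division of labor in MapReduce;
# real Hadoop code extends the Java Mapper and Reducer classes instead.

class WordCountMapper:
    def map(self, line):
        # Emit a (word, 1) pair for each word in one input record.
        return [(word, 1) for word in line.split()]

class WordCountReducer:
    def reduce(self, word, counts):
        # Sum all values collected for one key.
        return (word, sum(counts))

class Driver:
    # The driver runs the job: map, shuffle (group by key), then reduce.
    def run(self, lines):
        mapper, reducer = WordCountMapper(), WordCountReducer()
        grouped = defaultdict(list)
        for line in lines:
            for word, one in mapper.map(line):
                grouped[word].append(one)
        return dict(reducer.reduce(w, c) for w, c in grouped.items())

print(Driver().run(["big data big", "data"]))  # {'big': 2, 'data': 2}
```

The shuffle step here is just a dictionary; on a cluster it is the sorted, partitioned transfer of map output to the reducers.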

MapReduce Types & Formats

  1. Running a Default MapReduce Job

  2. Default Mapper

  3. Default Partitioner

  4. Default Reducer

  5. Running a Streaming MapReduce Job
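A streaming MapReduce job lets any executable act as mapper or reducer, reading records on stdin and writing tab-separated key/value lines on stdout. A minimal max-temperature pair in Python (the record format "year,temp" is an illustrative assumption; streaming guarantees the reducer sees its input sorted by key):

```python
from itertools import groupby

def mapper(lines):
    # Emit "year\ttemp" records, as a streaming mapper writes to stdout.
    for line in lines:
        year, temp = line.split(",")
        yield f"{year}\t{temp}"

def reducer(records):
    # Streaming delivers reducer input sorted by key; group and aggregate.
    keyed = [record.split("\t") for record in records]
    for year, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{year}\t{max(int(t) for _, t in group)}"

if __name__ == "__main__":
    # Real jobs read sys.stdin; here the phases are chained in-process.
    data = ["1950,0", "1950,22", "1949,111"]
    for line in reducer(sorted(mapper(data))):
        print(line)
```

Such scripts are submitted with the hadoop-streaming JAR, passing the two files via `-mapper` and `-reducer` along with `-input` and `-output` paths.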

Counters

  1. Understanding Counters

  2. Writing User Defined Counters
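User-defined counters let tasks report application-level statistics (for example, malformed records seen) back to the framework, which aggregates them across the whole job. In a streaming job, a task increments a counter by writing a specially formatted line to stderr; a sketch, with the three-field record format as an illustrative assumption:

```python
import sys

def increment_counter(group, counter, amount=1, stream=sys.stderr):
    # Hadoop Streaming picks up counter updates from stderr lines of this form.
    stream.write(f"reporter:counter:{group},{counter},{amount}\n")

def mapper_line(line):
    # Count malformed records while mapping (expects "a,b,c" records).
    fields = line.split(",")
    if len(fields) != 3:
        increment_counter("Quality", "MALFORMED_RECORDS")
        return None
    return fields

print(mapper_line("a,b"))  # malformed: counter bumped, returns None
```

In Java MapReduce the equivalent is a call on the task context, with the framework summing each counter over all tasks.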

Hadoop Administration

  1. Finding Logs

  2. Directory structures of HDFS Components

  3. Commissioning & Decommissioning slave nodes

  4. Optimizing configuration settings

  5. Using Teragen to generate data sets

  6. Using Terasort to Benchmark Hadoop Cluster

Apache Pig Installation & Configuration

  1. Downloading Apache Pig

  2. Installing Apache Pig

  3. Configuring Apache Pig

  4. Starting Pig in Local Mode

  5. Starting Pig in MapReduce Mode

  6. Running a Pig Script

Hands on Pig

  1. Loading & Storing

  2. Filtering & Transforming

  3. Grouping & Sorting

  4. Combining & Splitting

  5. Writing User Defined Functions

  6. Using Diagnostic Operations
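The Pig operations above combine naturally into a single Pig Latin script. A sketch, with file names and schema as placeholders:

```pig
-- Load, filter, group, aggregate, sort, and store a sample dataset
records = LOAD '/user/student/sales.csv' USING PigStorage(',')
          AS (region:chararray, amount:int);
large   = FILTER records BY amount > 100;             -- filtering
by_reg  = GROUP large BY region;                      -- grouping
totals  = FOREACH by_reg GENERATE group, SUM(large.amount);
ordered = ORDER totals BY $1 DESC;                    -- sorting
STORE ordered INTO '/user/student/totals';            -- storing
```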

Apache Hive Installation & Configuration

  1. Downloading Apache Hive

  2. Installing Apache Hive

  3. Configuring Apache Hive

  4. Creating a Table in Hive

  5. Loading data into the Table

  6. Running HiveQL Statements
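The Hive steps above correspond to a short HiveQL session; the table, schema, and file path below are placeholders:

```sql
-- Create a managed table, load local data into it, and query it
CREATE TABLE sales (region STRING, amount INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INPATH '/tmp/sales.csv' INTO TABLE sales;

SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region;
```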

Hands on Hive

  1. Creating Tables (Managed & External)

  2. Using Partitions

  3. Creating Views

  4. Creating Indexes

  5. Writing a Hive UDF

Apache HBase

  1. Downloading Apache HBase

  2. Installing Apache HBase

  3. Configuring Apache HBase

  4. Creating a Table in Apache HBase

Hands on HBase

  1. Installing HBase in Fully Distributed Mode

  2. Creating a Table in HBase using HBase Shell

  3. Loading Data in HBase using Pig

  4. Running Hive Queries on HBase Tables

  5. Using REST Server
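Creating and populating a table from the HBase shell looks like the following; the table and column-family names are placeholders:

```
create 'users', 'info'                    # table with one column family
put 'users', 'row1', 'info:name', 'Ada'   # insert a single cell
get 'users', 'row1'                       # read one row
scan 'users'                              # read all rows
```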

Apache ZooKeeper

  1. Downloading Apache ZooKeeper

  2. Installing Apache ZooKeeper

  3. Configuring Apache ZooKeeper

  4. Using the ZooKeeper CLI to perform basic operations
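From the ZooKeeper CLI (zkCli.sh), the basic znode operations are as follows; the znode name and data are placeholders:

```
create /courses "hadoop"      # create a znode with initial data
get /courses                  # read its data
set /courses "hadoop-admin"   # update the data
ls /                          # list children of the root znode
delete /courses               # remove the znode
```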

Apache Sqoop

  1. Downloading Apache Sqoop

  2. Installing Apache Sqoop

  3. Configuring Apache Sqoop

  4. Downloading the MySQL Connector for Sqoop

  5. Importing Data from RDBMS to HDFS and Hive

  6. Exporting Data from HDFS to RDBMS
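Import and export are each a single Sqoop command; the connection string, credentials, and table names below are placeholders:

```shell
# Import a MySQL table into HDFS
sqoop import --connect jdbc:mysql://dbhost/shop --username student -P \
      --table orders --target-dir /user/student/orders

# Export HDFS data back into an RDBMS table
sqoop export --connect jdbc:mysql://dbhost/shop --username student -P \
      --table order_totals --export-dir /user/student/totals
```

Adding `--hive-import` to the import command loads the data straight into a Hive table instead.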

Apache Flume

  1. Downloading Apache Flume

  2. Installing Apache Flume

  3. Configuring Apache Flume

  4. Setting up Twitter Developer Accounts for API Keys

  5. Setting the .conf file to stream data to HDFS

  6. Streaming Twitter data to HDFS
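A Flume agent is defined entirely in its .conf file, which names a source, a channel, and a sink. A minimal HDFS-sink sketch (agent name and paths are placeholders; the Twitter source additionally needs the API keys from the earlier step, so a simple netcat source stands in here):

```
# Agent 'a1': one source, one in-memory channel, one HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat          # test source; swap for the Twitter source
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/student/flume/events

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```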

Cloudera

  1. Download & Install VMWare Player on Windows

  2. Download Cloudera CDH VM

  3. Load Cloudera CDH using VMWare Player

  4. Using Cloudera Manager

  5. Using Cloudera HUE

  6. Exploring Cloudera CDH VM

Hortonworks

  1. Download the HDP 2.1 Sandbox

  2. Load HDP 2.1 Sandbox using VMWare Player

  3. Getting Started with HDP 2.1 Sandbox

  4. Using Apache Ambari

Your Team Has Unique Training Needs

Your team deserves training as unique as they are. Let us tailor the course to your needs at no extra cost.