Introduction to Apache Hadoop

Master the essentials of Apache Hadoop, a foundational framework of the Big Data ecosystem. This course is designed for developers and administrators who want to manage and process large datasets efficiently using distributed storage and computing.

Essential Skills Gained

Implement Hadoop Distributed File System (HDFS) for secure storage.

Execute MapReduce operations for processing large data files.

Administer and configure Hadoop ecosystem components effectively.

Gain hands-on experience with Apache Pig, Hive, and HBase.

Format

  • Instructor-led
  • 5 days with lectures and hands-on labs.

Audience

  • Big Data Developers
  • System Administrators
  • Data Engineers
  • IT Professionals

Description

Apache Hadoop is an open-source software framework, written in Java, for distributed storage and distributed processing of very large data sets (Big Data) on clusters built from commodity hardware. Every module in Hadoop is designed around the fundamental assumption that hardware failures (of individual machines, or entire racks of machines) are commonplace and should be handled automatically in software by the framework. What is covered in this course:

  • Hadoop Distributed File System (HDFS)
  • HDFS Operations
  • HDFS Management
  • MapReduce
  • MapReduce Types & Formats
  • Counters
  • Hadoop Administration
  • Apache Pig Installation & Configuration
  • Hands on Pig
  • Apache Hive Installation & Configuration
  • Hands on Hive
  • Apache HBase
  • Hands on HBase
  • Apache ZooKeeper
  • Apache Sqoop
  • Apache Flume
  • Cloudera
  • Hortonworks

Upcoming Course Dates

No upcoming dates. Please check back later.

Course Outline

Hadoop Installation

  1. Download and install the JDK & JRE

  2. Download and Install Apache Hadoop

  3. Add Hadoop to the PATH in the shell profile

  4. Configure SSH

  5. Configure Common HDFS and MapReduce Settings

  6. Format NameNode and Launch Hadoop Daemons
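The "common settings" step above typically touches core-site.xml and hdfs-site.xml. A minimal single-node sketch, with the hostname and port as placeholders (property names are for Hadoop 2.x):

```xml
<!-- core-site.xml: URI of the default filesystem -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: replication factor of 1 for a single-node setup -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

With these in place, the filesystem is initialized once with `hdfs namenode -format` and the daemons are launched with `start-dfs.sh`.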

HDFS Operations

  1. Copy a File from local to HDFS

  2. List Files and Directories in HDFS

  3. Copy a File from HDFS to Local

  4. Cat and Remove a File or Directory from HDFS

  5. Using Administrative Tools

  6. Access NameNode via Web User Interface
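The operations listed above map directly onto the `hdfs dfs` command family; a quick reference, with paths as placeholders:

```shell
hdfs dfs -put localfile.txt /user/student/    # copy local -> HDFS
hdfs dfs -ls /user/student                    # list files and directories
hdfs dfs -get /user/student/localfile.txt .   # copy HDFS -> local
hdfs dfs -cat /user/student/localfile.txt     # print file contents
hdfs dfs -rm -r /user/student/olddir          # remove a file or directory
hdfs dfsadmin -report                         # administrative cluster report
```

The NameNode web interface (on port 50070 by default in Hadoop 2.x) presents the same filesystem and cluster information graphically.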

Managing HDFS

  1. Sourcing data from various Locations

  2. Using Hadoop Archives

  3. Parallel Copying with distcp

  4. HDFS Upgrade Process

  5. Configuring Rack Awareness
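Hadoop Archives and distcp from the list above are both driven from the command line; for example (paths and cluster addresses are placeholders):

```shell
# Pack many small files into a single Hadoop Archive (HAR)
hadoop archive -archiveName logs.har -p /user/student/logs /user/student/archives

# Copy data between clusters in parallel with distcp
hadoop distcp hdfs://cluster1:9000/data hdfs://cluster2:9000/backup
```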

MapReduce

  1. Installing Eclipse

  2. Creating a Mapper Class

  3. Creating a Reducer Class

  4. Creating a Driver Class

  5. Packaging a JAR and Running MapReduce

  6. Accessing Job Tracker via Web User Interface
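The three classes built above divide the work cleanly: the Mapper emits key/value pairs, the Reducer aggregates values per key, and the Driver wires the job together. A plain-Python sketch of those roles (illustration only; the real classes extend Hadoop's Java API):

```python
from collections import defaultdict

# Illustration of the Mapper/Reducer/Driver division of labor in MapReduce;
# real Hadoop code extends the Java Mapper and Reducer classes instead.

class WordCountMapper:
    def map(self, line):
        # Emit a (word, 1) pair for each word in one input record.
        return [(word, 1) for word in line.split()]

class WordCountReducer:
    def reduce(self, word, counts):
        # Sum all values collected for one key.
        return (word, sum(counts))

class Driver:
    # The driver runs the job: map, shuffle (group by key), then reduce.
    def run(self, lines):
        mapper, reducer = WordCountMapper(), WordCountReducer()
        grouped = defaultdict(list)
        for line in lines:
            for word, one in mapper.map(line):
                grouped[word].append(one)
        return dict(reducer.reduce(w, c) for w, c in grouped.items())

print(Driver().run(["big data big", "data"]))  # {'big': 2, 'data': 2}
```

The shuffle step here is just a dictionary; on a cluster it is the sorted, partitioned transfer of map output to the reducers.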

MapReduce Types & Formats

  1. Running a Default MapReduce Job

  2. Default Mapper

  3. Default Partitioner

  4. Default Reducer

  5. Running a Streaming MapReduce Job
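A streaming MapReduce job lets any executable act as mapper or reducer, reading records on stdin and writing tab-separated key/value lines on stdout. A minimal max-temperature pair in Python (the record format "year,temp" is an illustrative assumption; streaming guarantees the reducer sees its input sorted by key):

```python
from itertools import groupby

def mapper(lines):
    # Emit "year\ttemp" records, as a streaming mapper writes to stdout.
    for line in lines:
        year, temp = line.split(",")
        yield f"{year}\t{temp}"

def reducer(records):
    # Streaming delivers reducer input sorted by key; group and aggregate.
    keyed = [record.split("\t") for record in records]
    for year, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{year}\t{max(int(t) for _, t in group)}"

if __name__ == "__main__":
    # Real jobs read sys.stdin; here the phases are chained in-process.
    data = ["1950,0", "1950,22", "1949,111"]
    for line in reducer(sorted(mapper(data))):
        print(line)
```

Such scripts are submitted with the hadoop-streaming JAR, passing the two files via `-mapper` and `-reducer` along with `-input` and `-output` paths.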

Counters

  1. Understanding Counters

  2. Writing User Defined Counters
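User-defined counters let tasks report application-level statistics (for example, malformed records seen) back to the framework, which aggregates them across the whole job. In a streaming job, a task increments a counter by writing a specially formatted line to stderr; a sketch, with the three-field record format as an illustrative assumption:

```python
import sys

def increment_counter(group, counter, amount=1, stream=sys.stderr):
    # Hadoop Streaming picks up counter updates from stderr lines of this form.
    stream.write(f"reporter:counter:{group},{counter},{amount}\n")

def mapper_line(line):
    # Count malformed records while mapping (expects "a,b,c" records).
    fields = line.split(",")
    if len(fields) != 3:
        increment_counter("Quality", "MALFORMED_RECORDS")
        return None
    return fields

print(mapper_line("a,b"))  # malformed: counter bumped, returns None
```

In Java MapReduce the equivalent is a call on the task context, with the framework summing each counter over all tasks.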

Hadoop Administration

  1. Finding Logs

  2. Directory structures of HDFS Components

  3. Commissioning & Decommissioning slave nodes

  4. Optimizing configuration settings

  5. Using Teragen to generate data sets

  6. Using Terasort to Benchmark Hadoop Cluster

Apache Pig Installation & Configuration

  1. Downloading Apache Pig

  2. Installing Apache Pig

  3. Configuring Apache Pig

  4. Starting Pig in Local Mode

  5. Starting Pig in MapReduce Mode

  6. Running a Pig Script

Hands on Pig

  1. Loading & Storing

  2. Filtering & Transforming

  3. Grouping & Sorting

  4. Combining & Splitting

  5. Writing User Defined Functions

  6. Using Diagnostic Operations
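The Pig operations above combine naturally into a single Pig Latin script. A sketch, with file names and schema as placeholders:

```pig
-- Load, filter, group, aggregate, sort, and store a sample dataset
records = LOAD '/user/student/sales.csv' USING PigStorage(',')
          AS (region:chararray, amount:int);
large   = FILTER records BY amount > 100;             -- filtering
by_reg  = GROUP large BY region;                      -- grouping
totals  = FOREACH by_reg GENERATE group, SUM(large.amount);
ordered = ORDER totals BY $1 DESC;                    -- sorting
STORE ordered INTO '/user/student/totals';            -- storing
```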

Apache Hive Installation & Configuration

  1. Downloading Apache Hive

  2. Installing Apache Hive

  3. Configuring Apache Hive

  4. Creating a Table in Hive

  5. Loading data into the Table

  6. Running HiveQL Statements
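The Hive steps above correspond to a short HiveQL session; the table, schema, and file path below are placeholders:

```sql
-- Create a managed table, load local data into it, and query it
CREATE TABLE sales (region STRING, amount INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INPATH '/tmp/sales.csv' INTO TABLE sales;

SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region;
```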

Hands on Hive

  1. Creating Tables (Managed & External)

  2. Using Partitions

  3. Creating Views

  4. Creating Indexes

  5. Writing a Hive UDF

Apache HBase

  1. Downloading Apache HBase

  2. Installing Apache HBase

  3. Configuring Apache HBase

  4. Creating a Table in Apache HBase

Hands on HBase

  1. Installing HBase in Fully Distributed Mode

  2. Creating a Table in HBase using HBase Shell

  3. Loading Data in HBase using Pig

  4. Running Hive Queries on HBase Tables

  5. Using REST Server
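Creating and populating a table from the HBase shell looks like the following; the table and column-family names are placeholders:

```
create 'users', 'info'                    # table with one column family
put 'users', 'row1', 'info:name', 'Ada'   # insert a single cell
get 'users', 'row1'                       # read one row
scan 'users'                              # read all rows
```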

Apache ZooKeeper

  1. Downloading Apache ZooKeeper

  2. Installing Apache ZooKeeper

  3. Configuring Apache ZooKeeper

  4. Using the ZooKeeper CLI to perform basic operations
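From the ZooKeeper CLI (zkCli.sh), the basic znode operations are as follows; the znode name and data are placeholders:

```
create /courses "hadoop"      # create a znode with initial data
get /courses                  # read its data
set /courses "hadoop-admin"   # update the data
ls /                          # list children of the root znode
delete /courses               # remove the znode
```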

Apache Sqoop

  1. Downloading Apache Sqoop

  2. Installing Apache Sqoop

  3. Configuring Apache Sqoop

  4. Downloading the MySQL Connector for Sqoop

  5. Importing Data from RDBMS to HDFS and Hive

  6. Exporting Data from HDFS to RDBMS
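Import and export are each a single Sqoop command; the connection string, credentials, and table names below are placeholders:

```shell
# Import a MySQL table into HDFS
sqoop import --connect jdbc:mysql://dbhost/shop --username student -P \
      --table orders --target-dir /user/student/orders

# Export HDFS data back into an RDBMS table
sqoop export --connect jdbc:mysql://dbhost/shop --username student -P \
      --table order_totals --export-dir /user/student/totals
```

Adding `--hive-import` to the import command loads the data straight into a Hive table instead.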

Apache Flume

  1. Downloading Apache Flume

  2. Installing Apache Flume

  3. Configuring Apache Flume

  4. Setting up Twitter Developer Accounts for API Keys

  5. Setting the .conf file to stream data to HDFS

  6. Streaming Twitter data to HDFS
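A Flume agent is defined entirely in its .conf file, which names a source, a channel, and a sink. A minimal HDFS-sink sketch (agent name and paths are placeholders; the Twitter source additionally needs the API keys from the earlier step, so a simple netcat source stands in here):

```
# Agent 'a1': one source, one in-memory channel, one HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat          # test source; swap for the Twitter source
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /user/student/flume/events

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```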

Cloudera

  1. Download & Install VMWare Player on Windows

  2. Download Cloudera CDH VM

  3. Load Cloudera CDH using VMWare Player

  4. Using Cloudera Manager

  5. Using Cloudera HUE

  6. Exploring Cloudera CDH VM

Hortonworks

  1. Download the HDP 2.1 Sandbox

  2. Load HDP 2.1 Sandbox using VMWare Player

  3. Getting Started with HDP 2.1 Sandbox

  4. Using Apache Ambari

Your Team Has Unique Training Needs

Your team deserves training as unique as they are. Let us tailor the course to your needs at no extra cost.