HADOOP

By sornait

What is Big Data?

  • Need for a different technique for Data Storage
  • Need for a different paradigm for Data Analysis
  • The 3 V’s of Big Data
  • Different distributions of Hadoop

The Case for Apache Hadoop

  • A Brief History of Hadoop
  • Core Hadoop Components
  • Fundamental Concepts
  • Hadoop Ecosystem Overview

The Hadoop Distributed File System

  • HDFS Features
  • HDFS Design Assumptions
  • Overview of HDFS Architecture
  • Writing and Reading Files
  • Hands-On Exercise

MapReduce

  • What Is MapReduce?
  • Features of MapReduce
  • Basic MapReduce Concepts
  • Architectural Overview
  • What is a Combiner?
  • What is a Partitioner?
  • Hands-On Exercise

An Overview of the Hadoop Ecosystem

  • What is the Hadoop Ecosystem?
  • Integration Tools
  • Analysis Tools
  • Data Storage and Retrieval Tools

Planning your Hadoop Cluster

  • General Planning Considerations
  • Choosing the Right Hardware
  • Network Considerations
  • Configuring Nodes

Hadoop Installation

  • Deployment Types
  • Installing Hadoop
  • Basic Configuration Parameters
  • Hands-On Exercise on a Pseudo-Distributed Cluster
  • Hands-On Exercise on a Multi-Node Cluster

Advanced Configuration

  • Advanced Parameters
  • core-site.xml parameters
  • mapred-site.xml parameters
  • hdfs-site.xml parameters
  • Configuring Rack Awareness

Hadoop Security

  • Why Hadoop Security Is Important
  • Hadoop’s Security System Concepts
  • What Kerberos Is and How It Works
  • Integrating a Secure Cluster with Other Systems

Managing and Scheduling Jobs

  • Managing Running Jobs
  • Hands-On Exercise
  • The FIFO Scheduler
  • The Fair Scheduler
  • The Capacity Scheduler
  • Configuring the Fair Scheduler
  • Evaluating the Different Schedulers
  • Hands-On Exercise

Cluster Maintenance

  • Checking HDFS Status
  • Hands-On Exercise
  • Copying Data Between Clusters
  • Adding and Removing Cluster Nodes
  • Rebalancing the Cluster
  • Hands-On Exercise
  • NameNode Metadata Backup
  • Cluster Upgrading

Cluster Monitoring and Troubleshooting

  • General System Monitoring
  • Managing Hadoop’s Log Files
  • Using the NameNode and JobTracker Web UIs
  • Hands-On Exercise
  • Cluster Monitoring with Ganglia
  • Common Troubleshooting Issues
  • Benchmarking Your Cluster

Installing and Managing Other Hadoop Projects

  • Hive
  • Pig
  • HBase
  • Oozie