Course Outline

1: HDFS (17%)

  • Explain the roles of HDFS Daemons.
  • Describe the typical operation of an Apache Hadoop cluster, covering both data storage and processing capabilities.
  • Identify current computing system characteristics that drive the need for frameworks like Apache Hadoop.
  • Classify the primary objectives of HDFS design.
  • Determine the appropriate use case for HDFS Federation based on specific scenarios.
  • Identify the components and daemons within an HDFS High Availability Quorum cluster.
  • Analyze the role of HDFS security mechanisms, specifically Kerberos.
  • Select the optimal data serialization method for given scenarios.
  • Describe the data flow paths for file reading and writing operations.
  • Identify the commands used to manage files via the Hadoop File System Shell.
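To give a flavor of the File System Shell objective above, a few representative commands are sketched below. This is illustrative only, not part of the official outline: the paths and file names are invented, and every command assumes a running HDFS cluster with the `hadoop` client on the PATH.

```shell
# Create a directory in HDFS and copy a local file into it
hadoop fs -mkdir -p /user/alice/logs
hadoop fs -put access.log /user/alice/logs/

# List the directory and print the file's contents
hadoop fs -ls /user/alice/logs
hadoop fs -cat /user/alice/logs/access.log

# Change the file's replication factor, then clean up
hadoop fs -setrep -w 2 /user/alice/logs/access.log
hadoop fs -rm -r /user/alice/logs
```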

2: YARN and MapReduce version 2 (MRv2) (17%)

  • Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects cluster configuration.
  • Deploy MapReduce v2 (MRv2) with YARN, including the configuration of all YARN daemons.
  • Understand the fundamental design strategy behind MapReduce v2 (MRv2).
  • Determine how YARN manages resource allocation.
  • Identify the workflow of MapReduce jobs executing on YARN.
  • Determine the necessary file modifications required to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) on YARN.
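The migration objective above centers on a handful of XML properties. The fragment below is a minimal sketch of the kind of change involved; the hostname is a made-up placeholder, and a real migration touches more settings than this:

```xml
<!-- mapred-site.xml: run MapReduce jobs on YARN instead of MRv1 -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- yarn-site.xml: minimal daemon settings (hostname is illustrative) -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm-host.example.com</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
```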

3: Hadoop Cluster Planning (16%)

  • Discuss key considerations for selecting hardware and operating systems to host an Apache Hadoop cluster.
  • Analyze options available when choosing an operating system.
  • Understand kernel tuning and disk swapping processes.
  • Identify suitable hardware configurations based on specific scenarios and workload patterns.
  • Determine the essential ecosystem components required for a cluster to meet Service Level Agreements (SLA) in a given scenario.
  • Perform cluster sizing: based on a scenario and execution frequency, identify workload specifics, including CPU, memory, storage, and disk I/O requirements.
  • Address disk sizing and configuration, including JBOD versus RAID, SANs, virtualization, and specific disk sizing needs within a cluster.
  • Network Topologies: comprehend network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario.
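The cluster-sizing objective above is largely arithmetic, so a back-of-the-envelope calculation may help. The figures below (2 TB/day ingest, 3x replication, 25% headroom for intermediate data, 12 x 4 TB disks per worker) are invented for illustration, not recommendations from the outline:

```shell
daily_ingest_tb=2     # raw data landed per day (assumed)
replication=3         # HDFS default replication factor
days=365              # one year of retention

# Raw storage: ingest * replication * retention period
raw_tb=$(( daily_ingest_tb * replication * days ))

# Add 25% headroom for shuffle/intermediate data (integer math)
with_temp_tb=$(( raw_tb * 125 / 100 ))
echo "Year-one storage requirement: ${with_temp_tb} TB"

# Worker nodes needed at 12 x 4 TB JBOD disks per node, rounded up
per_node_tb=48
nodes=$(( (with_temp_tb + per_node_tb - 1) / per_node_tb ))
echo "Worker nodes: ${nodes}"
```

Under these assumptions the cluster needs roughly 2.7 PB of raw disk, or about 58 workers, before accounting for CPU, memory, or growth.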

4: Hadoop Cluster Installation and Administration (25%)

  • Identify how a cluster handles disk and machine failures in specific scenarios.
  • Analyze logging configurations and the format of logging configuration files.
  • Understand the basics of Hadoop metrics and cluster health monitoring.
  • Identify the function and purpose of available cluster monitoring tools.
  • Install all ecosystem components in CDH 5, including but not limited to: Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig.
  • Identify the function and purpose of tools available for managing the Apache Hadoop file system.

5: Resource Management (10%)

  • Understand the overarching design goals of each Hadoop scheduler.
  • Determine how the FIFO Scheduler allocates cluster resources in a given scenario.
  • Determine how the Fair Scheduler allocates cluster resources under YARN in a given scenario.
  • Determine how the Capacity Scheduler allocates cluster resources in a given scenario.
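As a concrete anchor for the scheduler objectives above, here is a minimal Fair Scheduler allocation file. The queue names, weight ratio, and minimum resources are invented for illustration; they say nothing about what a given scenario actually requires:

```xml
<!-- fair-scheduler.xml: two queues sharing the cluster 2:1 (illustrative) -->
<allocations>
  <queue name="production">
    <weight>2.0</weight>
    <minResources>10000 mb,10 vcores</minResources>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
  </queue>
</allocations>
```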

6: Monitoring and Logging (15%)

  • Understand the functions and features of Hadoop’s metric collection capabilities.
  • Analyze the NameNode and JobTracker Web UIs.
  • Learn how to monitor cluster Daemons.
  • Identify and monitor CPU usage on master nodes.
  • Describe methods for monitoring swap and memory allocation across all nodes.
  • Identify procedures for viewing and managing Hadoop’s log files.
  • Interpret log file contents.
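A few commands of the kind these log and monitoring objectives cover are sketched below. The log path and application ID are placeholders (paths vary by distribution), and the Hadoop commands assume a running cluster:

```shell
# Follow the NameNode daemon log on the host that runs it (path varies)
tail -f /var/log/hadoop-hdfs/*namenode*.log

# Fetch aggregated container logs for a finished YARN application
# (the application ID below is a made-up example)
yarn logs -applicationId application_1400000000000_0001

# Quick node health checks: running Java daemons, memory, and swap
jps
free -m
vmstat 5 3
```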

Requirements

  • Fundamental skills in Linux system administration
  • Basic programming proficiency

Duration: 35 Hours
