Course Outline
1: HDFS (17%)
- Explain the roles of HDFS daemons.
- Describe the typical operation of an Apache Hadoop cluster, covering both data storage and processing capabilities.
- Identify current computing system characteristics that drive the need for frameworks like Apache Hadoop.
- Classify the primary objectives of HDFS design.
- Determine the appropriate use case for HDFS Federation based on specific scenarios.
- Identify the components and daemons within an HDFS High Availability Quorum cluster.
- Analyze the role of HDFS security mechanisms, specifically Kerberos.
- Select the optimal data serialization method for given scenarios.
- Describe the data flow paths for file reading and writing operations.
- Identify the commands used to manage files via the Hadoop File System Shell.
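The File System Shell objective above maps onto a small set of `hdfs dfs` subcommands. A brief sketch (the paths are illustrative, and a running HDFS cluster is required to execute these):

```shell
# Common Hadoop File System Shell operations (paths are illustrative;
# these require a running HDFS cluster).
hdfs dfs -mkdir -p /user/alice/input                 # create a directory
hdfs dfs -put logs.txt /user/alice/input             # copy a local file into HDFS
hdfs dfs -ls /user/alice/input                       # list directory contents
hdfs dfs -cat /user/alice/input/logs.txt             # print a file's contents
hdfs dfs -get /user/alice/input/logs.txt ./copy.txt  # copy a file back to local disk
hdfs dfs -rm -r /user/alice/input                    # remove a directory recursively
```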
2: YARN and MapReduce version 2 (MRv2) (17%)
- Understand the impact of upgrading a cluster from Hadoop 1 to Hadoop 2 on cluster configuration.
- Learn how to deploy MapReduce v2 (MRv2) with YARN, including the configuration of all YARN daemons.
- Grasp the fundamental design strategy behind MapReduce v2 (MRv2).
- Determine how YARN manages resource allocation.
- Identify the workflow of MapReduce jobs executing on YARN.
- Determine the necessary file modifications required to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) on YARN.
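Deploying MRv2 on YARN centers on a handful of properties in `yarn-site.xml` (plus setting `mapreduce.framework.name` to `yarn` in `mapred-site.xml`). A minimal sketch; the hostname and memory figures below are placeholder assumptions, not course-mandated values:

```xml
<!-- yarn-site.xml: minimal sketch for running MRv2 on YARN.
     Hostname and memory values are placeholders, not prescribed settings. -->
<configuration>
  <property>
    <!-- Where clients and NodeManagers find the ResourceManager. -->
    <name>yarn.resourcemanager.hostname</name>
    <value>rm-host.example.com</value>
  </property>
  <property>
    <!-- Run the MapReduce shuffle handler as a NodeManager auxiliary service. -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <!-- Memory (MB) a NodeManager may hand out to containers. -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
</configuration>
```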
3: Hadoop Cluster Planning (16%)
- Discuss key considerations for selecting hardware and operating systems to host an Apache Hadoop cluster.
- Analyze options available when choosing an operating system.
- Understand kernel tuning and disk swapping processes.
- Identify suitable hardware configurations based on specific scenarios and workload patterns.
- Determine the essential ecosystem components required for a cluster to meet Service Level Agreements (SLA) in a given scenario.
- Perform cluster sizing: based on a scenario and execution frequency, identify workload specifics, including CPU, memory, storage, and disk I/O requirements.
- Address disk sizing and configuration, including JBOD versus RAID, SANs, virtualization, and specific disk sizing needs within a cluster.
- Network topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario.
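The cluster-sizing objective above is, at its core, arithmetic over ingest rate, replication factor, retention, and headroom. A minimal sketch; every figure here is an invented assumption for illustration, not course data:

```shell
# Hypothetical sizing exercise (all figures are assumptions):
# 2 TB/day ingest, HDFS replication factor 3, 90-day retention,
# plus 25% headroom for temporary and intermediate data.
DAILY_INGEST_TB=2
REPLICATION=3
RETENTION_DAYS=90

# Raw footprint = ingest * replication * retention (integer math).
RAW_TB=$(( DAILY_INGEST_TB * REPLICATION * RETENTION_DAYS ))
# Provisioned capacity adds 25% headroom.
WITH_OVERHEAD_TB=$(( RAW_TB * 125 / 100 ))

echo "Raw data footprint: ${RAW_TB} TB"
echo "Provisioned capacity (25% headroom): ${WITH_OVERHEAD_TB} TB"
```

With these assumptions the cluster needs roughly 540 TB of raw replicated storage, provisioned at about 675 TB; the same arithmetic extends to CPU, memory, and disk I/O once per-job requirements are known.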
4: Hadoop Cluster Installation and Administration (25%)
- Identify how a cluster handles disk and machine failures in specific scenarios.
- Analyze logging configurations and the format of logging configuration files.
- Understand the basics of Hadoop metrics and cluster health monitoring.
- Identify the function and purpose of available cluster monitoring tools.
- Install all ecosystem components in CDH 5, including but not limited to: Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig.
- Identify the function and purpose of tools available for managing the Apache Hadoop file system.
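The file-system management objective above typically centers on the `hdfs dfsadmin`, `hdfs fsck`, and `hdfs balancer` tools. A few representative invocations (shown for orientation only; all require a live cluster):

```shell
# Standard tools for managing and inspecting HDFS (require a running cluster).
hdfs dfsadmin -report          # capacity summary, live/dead DataNodes
hdfs dfsadmin -safemode get    # check whether the NameNode is in safe mode
hdfs fsck / -files -blocks     # file system health and block report
hdfs balancer -threshold 10    # redistribute blocks across DataNodes
```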
5: Resource Management (10%)
- Understand the overarching design goals of each Hadoop scheduler.
- Determine how the FIFO Scheduler allocates cluster resources in a given scenario.
- Determine how the Fair Scheduler allocates cluster resources under YARN in a given scenario.
- Determine how the Capacity Scheduler allocates cluster resources in a given scenario.
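Fair Scheduler allocation is driven by an allocation file (commonly `fair-scheduler.xml`). A minimal sketch; the queue names, weights, and minimum resources below are illustrative assumptions, not values from the course:

```xml
<!-- fair-scheduler.xml allocation file: queue names and figures are
     illustrative assumptions. -->
<allocations>
  <queue name="production">
    <!-- Weight 3.0 vs 1.0: when both queues are busy, production
         receives roughly 75% of cluster resources. -->
    <weight>3.0</weight>
    <minResources>4096 mb,4 vcores</minResources>
  </queue>
  <queue name="research">
    <weight>1.0</weight>
  </queue>
  <!-- An idle queue's share is redistributed to active queues, which is
       the key behavioral difference from a strict Capacity Scheduler setup. -->
</allocations>
```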
6: Monitoring and Logging (15%)
- Understand the functions and features of Hadoop’s metric collection capabilities.
- Analyze the NameNode and JobTracker Web UIs.
- Learn how to monitor cluster daemons.
- Identify and monitor CPU usage on master nodes.
- Describe methods for monitoring swap and memory allocation across all nodes.
- Identify procedures for viewing and managing Hadoop’s log files.
- Interpret log file contents.
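A first step in interpreting Hadoop log files is triaging entries by severity, since the daemons share a common log4j line format. A minimal sketch; the sample log lines below are invented for illustration, not taken from a real cluster:

```shell
# Minimal log-triage sketch (sample lines are invented, not from a real cluster).
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2024-01-15 10:02:11,432 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG
2024-01-15 10:05:42,001 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write
2024-01-15 10:06:03,118 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Disk failure on /data/3
EOF

# Count entries per severity level, the usual first pass over a daemon log.
ERRORS=$(grep -c ' ERROR ' "$LOG")
WARNINGS=$(grep -c ' WARN ' "$LOG")
echo "errors=$ERRORS warnings=$WARNINGS"
rm -f "$LOG"
```

On a real cluster the same pattern is applied to the daemon logs under the Hadoop log directory (e.g. the NameNode and DataNode `.log` files) rather than a temporary file.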
Requirements
- Fundamental skills in Linux system administration
- Basic programming proficiency
35 Hours
Testimonials (3)
I genuinely enjoyed the many hands-on sessions.
Jacek Pieczatka
Course - Administrator Training for Apache Hadoop
I genuinely appreciated the trainer's strong competence.
Grzegorz Gorski
Course - Administrator Training for Apache Hadoop
What I liked most was the trainer giving real-life examples.