Course Outline

Introduction

  • Introduction to Cloud Computing and Big Data solutions.
  • Overview of Apache Hadoop features and architecture.

Setting up Hadoop

  • Planning a Hadoop cluster (on-premises, cloud, etc.).
  • Selecting the operating system and Hadoop distribution.
  • Provisioning resources (hardware, network, etc.).
  • Downloading and installing the software.
  • Sizing the cluster for flexibility.

Working with HDFS

  • Understanding the Hadoop Distributed File System (HDFS).
  • Overview of the HDFS command reference.
  • Accessing HDFS.
  • Performing basic file operations on HDFS.
  • Using S3 as a complement to HDFS.

Overview of MapReduce

  • Understanding data flow in the MapReduce framework.
  • Map, Shuffle, Sort, and Reduce.
  • Demo: Computing Top Salaries.
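
The Map, Shuffle, Sort, and Reduce stages listed above can be illustrated with a small in-memory sketch. This is not the course demo and it does not use the Hadoop API; it only mimics the data flow in plain Python. The records, the choice of department as the key, and the function names (map_phase, shuffle_and_sort, reduce_phase, top_salaries) are all assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input records: (employee, department, salary).
RECORDS = [
    ("alice", "eng", 120),
    ("bob", "eng", 135),
    ("carol", "ops", 90),
    ("dave", "ops", 110),
    ("erin", "eng", 128),
]


def map_phase(record):
    """Map: emit one (key, value) pair per input record."""
    name, dept, salary = record
    yield dept, (salary, name)


def shuffle_and_sort(pairs):
    """Shuffle and sort: order pairs by key, then group the values per key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_phase(key, values, n=2):
    """Reduce: keep the n highest (salary, name) values for one key."""
    return key, sorted(values, reverse=True)[:n]


def top_salaries(records, n=2):
    """Run the three stages in sequence on an in-memory list."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values, n) for key, values in shuffle_and_sort(pairs))


if __name__ == "__main__":
    for dept, winners in top_salaries(RECORDS).items():
        print(dept, winners)
```

In a real cluster the map and reduce functions would run on different nodes, and the framework would perform the shuffle and sort between them; here a single sorted() call stands in for that step.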

Working with YARN

  • Understanding resource management in Hadoop.
  • Working with ResourceManager, NodeManager, and ApplicationMaster.
  • Scheduling jobs under YARN.
  • Scheduling for large numbers of nodes and clusters.
  • Demo: Job scheduling.

Integrating Hadoop with Spark

  • Setting up storage for Spark (HDFS, Amazon S3, NoSQL, etc.).
  • Understanding Resilient Distributed Datasets (RDDs).
  • Creating an RDD.
  • Implementing RDD transformations.
  • Demo: Implementing a Text Search Program for Movie Titles.

Managing a Hadoop Cluster

  • Monitoring Hadoop.
  • Securing a Hadoop cluster.
  • Adding and removing nodes.
  • Running a performance benchmark.
  • Tuning a Hadoop cluster to optimize performance.
  • Backup, recovery, and business continuity planning.
  • Ensuring high availability (HA).

Upgrading and Migrating a Hadoop Cluster

  • Assessing workload requirements.
  • Upgrading Hadoop.
  • Moving from on-premises to cloud and vice versa.
  • Recovering from failures.

Troubleshooting

Summary and Conclusion

Requirements

  • Experience in system administration.
  • Familiarity with the Linux command line.
  • Understanding of big data concepts.

Audience

  • System administrators.
  • Database Administrators (DBAs).

Duration

  • 35 hours.
