Course Outline

Big Data Overview:

  • Defining Big Data
  • Reasons for the growing popularity of Big Data
  • Real-world Big Data Case Studies
  • Key Characteristics of Big Data
  • Solutions for processing Big Data

Hadoop and Its Components:

  • Introduction to Hadoop and its core components
  • Hadoop Architecture and the types of data it can handle and process
  • A brief history of Hadoop, key companies adopting it, and their motivations
  • Detailed explanation of the Hadoop Framework and its components
  • Understanding the Hadoop Distributed File System (HDFS) and its read and write operations
  • Steps to set up a Hadoop Cluster in various modes: Standalone, Pseudo-distributed, and Multi-node

(This section covers setting up a Hadoop cluster with VirtualBox, KVM, or VMware; configuring the necessary network settings; starting the Hadoop daemons; and testing cluster functionality.)

  • Explanation of the MapReduce framework and its operational mechanics
  • Executing MapReduce jobs on a Hadoop cluster
  • Concepts of replication, mirroring, and rack awareness within Hadoop clusters

Hadoop Cluster Planning:

  • Strategies for planning your Hadoop cluster
  • Evaluating hardware and software requirements for cluster planning
  • Analyzing workloads to prevent failures and optimize cluster performance

Introduction to MapR and Its Value:

  • Overview of MapR and its architecture
  • Exploring and working with MapR Control System, MapR Volumes, snapshots, and mirrors
  • Planning a cluster specifically for the MapR environment
  • Comparing MapR with other distributions and Apache Hadoop
  • Installation and deployment of MapR clusters

Cluster Setup and Administration:

  • Managing services, nodes, snapshots, mirrored volumes, and remote clusters
  • Understanding and managing cluster nodes
  • Gaining insight into Hadoop components and installing them alongside MapR services
  • Accessing data on the cluster, including via NFS, while managing services and nodes
  • Managing data with volumes
  • Managing users and groups
  • Assigning roles to nodes
  • Commissioning and decommissioning nodes
  • Cluster administration and performance monitoring
  • Configuring and analyzing metrics
  • MapR security administration
  • Understanding and using M7 native storage for MapR tables
  • Configuring and tuning the cluster for optimum performance

Cluster Upgrades and Integration:

  • Upgrading MapR software versions and understanding upgrade types
  • Configuring the MapR cluster to access an HDFS cluster
  • Setting up a MapR cluster on Amazon Elastic MapReduce

All of the topics above include demonstrations and practice sessions that give learners hands-on experience with the technology.

Requirements

  • Foundational knowledge of the Linux File System
  • Basic understanding of Java
  • Familiarity with Apache Hadoop (recommended)

Duration

28 hours
