
Course Outline

Introduction:

  • The Role of Apache Spark in the Hadoop Ecosystem
  • Brief Overview of Python and Scala

Core Concepts (Theory):

  • Architecture
  • Resilient Distributed Datasets (RDDs)
  • Transformations and Actions (see the sketch after this list)
  • Stages, Tasks, and Dependencies
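
To make the distinction between transformations and actions concrete, here is a minimal PySpark sketch (data and names are illustrative, not part of the official course materials): transformations only build the lineage, and nothing executes until an action is called.

    from pyspark.sql import SparkSession

    # On Databricks a SparkSession called `spark` already exists; this line is
    # only needed when running locally.
    spark = SparkSession.builder.appName("rdd-basics").getOrCreate()
    sc = spark.sparkContext

    numbers = sc.parallelize(range(1, 11))        # create an RDD from a local range
    evens = numbers.filter(lambda n: n % 2 == 0)  # transformation: lazy, nothing runs yet
    squares = evens.map(lambda n: n * n)          # another lazy transformation
    print(squares.collect())                      # action: triggers a job -> [4, 16, 36, 64, 100]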

Hands-on Workshop: Mastering the Basics in Databricks:

  • Exercises with the RDD API
  • Fundamental action and transformation functions
  • PairRDDs
  • Join Operations
  • Caching Strategies
  • Exercises with the DataFrame API (illustrated after this list)
  • Spark SQL
  • DataFrame Operations: select, filter, group, sort
  • User-Defined Functions (UDFs)
  • Exploring the Dataset API
  • Streaming
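
As a flavour of the DataFrame exercises, the following is a minimal sketch of select/filter/group/sort, a user-defined function, and a Spark SQL query, assuming PySpark on Databricks; the table, columns, and values are invented for illustration.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("dataframe-basics").getOrCreate()

    # Illustrative data; in the workshop this would come from a Databricks table or file.
    sales = spark.createDataFrame(
        [("EU", "books", 120.0), ("EU", "games", 80.0), ("US", "books", 200.0)],
        ["region", "category", "amount"],
    )

    # select / filter / group / sort
    summary = (
        sales.select("region", "category", "amount")
             .filter(F.col("amount") > 100)
             .groupBy("region")
             .agg(F.sum("amount").alias("total"))
             .orderBy(F.desc("total"))
    )
    summary.show()

    # A simple UDF; built-in functions are preferred whenever they exist.
    shout = F.udf(lambda s: s.upper(), StringType())
    sales.withColumn("category_upper", shout("category")).show()

    # The same DataFrame can also be queried with Spark SQL.
    sales.createOrReplaceTempView("sales")
    spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region").show()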

Hands-on Workshop: Deployment in AWS Environment:

  • Introduction to AWS Glue
  • Comparing AWS EMR and AWS Glue
  • Practical Job Examples in Both Environments (Glue skeleton after this list)
  • Pros and Cons Analysis
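
For the deployment module, the skeleton below shows roughly what a Glue job script looks like; the same DataFrame code would run on EMR without the Glue wrapper. The bucket, paths, and column name are placeholders, not course assets.

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # Standard Glue boilerplate: Glue wraps a SparkContext in a GlueContext
    # and exposes the underlying SparkSession for regular DataFrame code.
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    sc = SparkContext()
    glue_context = GlueContext(sc)
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # From here on, plain Spark code works as in the Databricks exercises;
    # the S3 paths and column below are placeholders.
    df = spark.read.json("s3://example-bucket/input/")
    df.groupBy("some_column").count().write.mode("overwrite").parquet("s3://example-bucket/output/")

    job.commit()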

Additional Topics:

  • Introduction to Apache Airflow for Orchestration
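
To show what orchestration looks like, here is a minimal Airflow DAG sketch, assuming Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed; the DAG id, application path, and connection id are invented for illustration.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    # A minimal DAG that submits a Spark application once a day.
    with DAG(
        dag_id="daily_spark_job",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        run_etl = SparkSubmitOperator(
            task_id="run_etl",
            application="/jobs/etl_job.py",  # placeholder path to the Spark script
            conn_id="spark_default",
        )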

Requirements

  • Programming skills (preferably in Python or Scala)
  • Basic knowledge of SQL

Duration: 21 Hours
