Course Outline
Introduction:
- The Role of Apache Spark in the Hadoop Ecosystem
- Brief Overview of Python and Scala
Core Concepts (Theory):
- Architecture
- Resilient Distributed Datasets (RDDs)
- Transformations and Actions
- Stages, Tasks, and Dependencies
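The transformation/action split listed above is the heart of Spark's lazy-evaluation model: transformations such as map and filter only record lineage, and nothing executes until an action such as count or collect forces it. A minimal PySpark sketch, assuming a local SparkContext (the data and names are illustrative):

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "transformations-vs-actions")

numbers = sc.parallelize(range(1, 11))        # create an RDD from a local range

squares = numbers.map(lambda x: x * x)        # transformation: lazy, nothing runs yet
evens = squares.filter(lambda x: x % 2 == 0)  # transformation: still lazy

print(evens.count())    # action: triggers execution of the whole lineage -> 5
print(evens.collect())  # action: [4, 16, 36, 64, 100]

sc.stop()
```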
Hands-on Workshop: Mastering the Basics in Databricks:
- Exercises with the RDD API
  - Fundamental action and transformation functions
  - PairRDDs
  - Join Operations
  - Caching Strategies
- Exercises with the DataFrame API
  - Spark SQL
  - DataFrame Operations: select, filter, group, sort
  - User-Defined Functions (UDFs)
- Exploring the Dataset API
- Streaming
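The DataFrame topics above (select, filter, group, sort, Spark SQL, UDFs) can be exercised end to end in a few lines of PySpark, whether in a Databricks notebook or locally. A minimal sketch, assuming a local SparkSession and a toy in-memory dataset (column and view names are illustrative):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("dataframe-basics").getOrCreate()

df = spark.createDataFrame(
    [("Alice", "sales", 3000), ("Bob", "sales", 4000), ("Carol", "eng", 5000)],
    ["name", "dept", "salary"],
)

# select / filter / group / sort
(df.select("dept", "salary")
   .filter(F.col("salary") > 3000)
   .groupBy("dept")
   .agg(F.avg("salary").alias("avg_salary"))
   .orderBy("avg_salary", ascending=False)
   .show())

# the same query expressed in Spark SQL
df.createOrReplaceTempView("employees")
spark.sql(
    "SELECT dept, AVG(salary) AS avg_salary FROM employees "
    "WHERE salary > 3000 GROUP BY dept ORDER BY avg_salary DESC"
).show()

# a user-defined function (UDF) applied as a new column
initials = F.udf(lambda name: name[0].upper(), StringType())
df.withColumn("initial", initials(F.col("name"))).show()

spark.stop()
```

In a Databricks notebook the SparkSession is already provided as spark, so the builder line can be dropped.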
Hands-on Workshop: Deployment in AWS Environment:
- Introduction to AWS Glue
- Comparing AWS EMR and AWS Glue
- Practical Job Examples in Both Environments
- Pros and Cons Analysis
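As a point of reference for the job examples above, an AWS Glue ETL script is ordinary PySpark wrapped in Glue's job boilerplate. A minimal sketch, assuming a Glue job that reads a Data Catalog table and writes Parquet to S3 (the database, table, and bucket names are illustrative):

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session

job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# read a table registered in the Glue Data Catalog (illustrative names)
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="example_table"
)

# once converted, plain Spark DataFrame operations apply
df = dyf.toDF().filter("salary > 3000")

# write the result back to S3 as Parquet (illustrative path)
df.write.mode("overwrite").parquet("s3://example-bucket/output/")

job.commit()
```

The same DataFrame logic runs on EMR by dropping the Glue wrapper, building a SparkSession, and submitting the script with spark-submit, which is essentially the trade-off the EMR-versus-Glue comparison explores.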
Additional Topics:
- Introduction to Apache Airflow for Orchestration
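For the orchestration topic above, a minimal Airflow DAG is enough to schedule a Spark job daily. A sketch using the stock BashOperator to call spark-submit (the DAG id, schedule, and script path are illustrative; Airflow also ships a dedicated Spark provider):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_spark_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_spark_job = BashOperator(
        task_id="run_spark_job",
        bash_command="spark-submit --master yarn /opt/jobs/etl_job.py",  # illustrative path
    )
```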
Requirements
Programming skills (preferably in Python or Scala)
Basic knowledge of SQL
21 Hours
Testimonials (3)
Having hands-on sessions / assignments
Poornima Chenthamarakshan - Intelligent Medical Objects
Course - Apache Spark in the Cloud
1. Right balance between high-level concepts and technical details. 2. Andras is very knowledgeable about his teaching. 3. Exercise
Steven Wu - Intelligent Medical Objects
Course - Apache Spark in the Cloud
Get to learn Spark Streaming, Databricks and AWS Redshift