Course Outline

Module 1: Informatica Data Engineering Management Overview

  • Core concepts of Data Engineering
  • Features of Data Engineering Management
  • Benefits of Data Engineering Management
  • Architecture of Data Engineering Management
  • Developer responsibilities in Data Engineering Management
  • New features in Data Engineering Integration 10.4

Module 2: Ingestion and Extraction in Hadoop

  • Integrating DEI with a Hadoop cluster
  • Understanding Hadoop file systems
  • Data Ingestion to HDFS and Hive using SQOOP
  • Mass Ingestion to HDFS and Hive – Initial load
  • Mass Ingestion to HDFS and Hive – Incremental load
  • Lab: Configure SQOOP to process data between an Oracle database and HDFS
  • Lab: Configure SQOOP to process data between an Oracle database and Hive
  • Lab: Create Mapping Specifications using Mass Ingestion Service

Module 3: Native and Hadoop Engine Strategy

  • Data Engineering Integration engine strategy
  • Hive Engine architecture
  • MapReduce
  • Tez
  • Spark architecture
  • Blaze architecture
  • Lab: Execute a mapping in Spark mode
  • Lab: Connect to a Deployed Application

Module 4: Data Engineering Development Process

  • Advanced Transformations in Data Engineering Integration – Python and Update Strategy
  • Hive ACID Use Case
  • Stateful Computing and Windowing
  • Lab: Create a Reusable Python Transformation
  • Lab: Create an Active Python Transformation
  • Lab: Perform Hive Upserts
  • Lab: Use Windowing Function LEAD
  • Lab: Use Windowing Function LAG
  • Lab: Create a Macro Transformation

Module 5: Complex File Processing

  • Data Engineering file formats – Avro, Parquet, JSON
  • Complex file data types – Structs, Arrays, Maps
  • Complex Configuration, Operators and Functions
  • Lab: Convert Flat File data object to an Avro file
  • Lab: Utilize complex data types – Arrays, Structs, and Maps in a mapping

Module 6: Hierarchical Data Processing

  • Hierarchical Data Processing
  • Flatten Hierarchical Data
  • Dynamic Flattening with Schema Changes
  • Hierarchical Data Processing with Schema Changes
  • Complex Configuration, Operators and Functions
  • Dynamic Ports
  • Dynamic Input Rules
  • Lab: Flatten a complex port in a Mapping
  • Lab: Build dynamic mappings using dynamic ports
  • Lab: Build dynamic mappings using input rules
  • Lab: Perform Dynamic Flattening of complex ports
  • Lab: Parse Hierarchical Data on the Spark Engine

Module 7: Mapping Optimization and Performance Tuning

  • Validation Environments
  • Execution Environment
  • Mapping Optimization
  • Mapping Recommendations and Insight
  • Scheduling, Queuing, and Node Labeling
  • Mapping Audits
  • Lab: Implement Recommendation
  • Lab: Implement Insight
  • Lab: Implement Mapping Audits

Module 8: Monitoring Logs and Troubleshooting in Hadoop

  • Hadoop Environment Logs
  • Spark Engine Monitoring
  • Blaze Engine Monitoring
  • REST Operations Hub
  • Log Aggregator
  • Troubleshooting
  • Lab: Monitor Mappings using REST Operations Hub
  • Lab: View and analyze logs using Log Aggregator

Module 9: Intelligent Structure Model

  • Intelligent Structure Discovery Overview
  • Intelligent Structure Model
  • Lab: Use an Intelligent Structure Model in a Mapping

Module 10: Databricks Overview

  • Databricks overview
  • Steps to configure Databricks
  • Databricks clusters
  • Notebooks, Jobs, and Data
  • Delta Lake

Module 11: Databricks Integration

  • Databricks Integration
  • Components of the Informatica and Databricks environments
  • Run-time process on the Databricks Spark Engine
  • Databricks Integration Task Flow
  • Prerequisites for Databricks integration
  • Cluster Workflows
  • Demo: Set up Databricks connection
  • Demo: Run a mapping with Databricks Spark engine

Requirements

Developer Tool for Big Data Developers

Duration

21 Hours
