Course Outline

Quick Overview

  • Data sources
  • Data stewardship
  • Recommender systems
  • Target marketing

Data Types

  • Structured vs unstructured data
  • Static vs streamed data
  • Attitudinal, behavioural, and demographic data
  • Data-driven vs user-driven analytics
  • Data validity
  • Volume, velocity, and variety of data

Models

  • Building models
  • Statistical models
  • Machine learning

Data Classification

  • Clustering
  • k-Groups, k-means, and nearest neighbours
  • Bio-inspired models (ant colonies, birds flocking)

Predictive Models

  • Decision trees
  • Support vector machines
  • Naive Bayes classification
  • Neural networks
  • Markov models
  • Regression
  • Ensemble methods

ROI

  • Benefit-to-cost ratio
  • Software costs
  • Development costs
  • Potential benefits

Building Models

  • Data preparation (MapReduce)
  • Data cleansing
  • Selecting appropriate methods
  • Model development
  • Model testing
  • Model evaluation
  • Model deployment and integration

Overview of Open Source and Commercial Software

  • Selection of R-project packages
  • Python libraries
  • Hadoop and Mahout
  • Selected Apache projects related to Big Data and Analytics
  • Selected commercial solutions
  • Integration with existing software and data sources

Requirements

A solid understanding of traditional data management and analysis methods, such as SQL, data warehouses, business intelligence, and OLAP, is required. Familiarity with basic statistics and probability theory (such as mean, variance, probability, and conditional probability) is also necessary.

Duration: 21 hours
