Course Outline
Introduction to Machine Learning
- Types of machine learning – supervised vs unsupervised.
- Transition from statistical learning to machine learning.
- The data mining workflow: business understanding, data preparation, modeling, deployment.
- Selecting the appropriate algorithm for the task.
- Overfitting and the bias-variance tradeoff.
Python and ML Libraries Overview
- Reasons for using programming languages in ML.
- Choosing between R and Python.
- Python crash course and Jupyter Notebooks.
- Python libraries: pandas, NumPy, scikit-learn, matplotlib, seaborn.
Testing and Evaluating ML Algorithms
- Generalization, overfitting, and model validation.
- Evaluation strategies: holdout, cross-validation, bootstrapping.
- Metrics for regression: ME, MSE, RMSE, MAPE.
- Metrics for classification: accuracy, confusion matrix, and handling unbalanced classes.
- Model performance visualization: profit curve, ROC curve, lift curve.
- Model selection and grid search for tuning.
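As an illustration of the regression metrics listed above, the following is a minimal sketch in plain Python. The function name `regression_metrics` and the sample values are hypothetical, and the sign convention for ME (actual minus predicted) is an assumption; some texts use the opposite sign.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return ME, MSE, RMSE and MAPE for paired actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumed convention: error = actual - predicted
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    me = sum(errors) / n                      # mean error (signed, shows bias)
    mse = sum(e * e for e in errors) / n      # mean squared error
    rmse = math.sqrt(mse)                     # same units as the target
    # MAPE is undefined when any actual value is zero
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"ME": me, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Hypothetical actual and predicted values
metrics = regression_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

Note that ME can be zero even when individual errors are large, because positive and negative errors cancel; that is one reason MSE and RMSE are usually reported alongside it.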
Data Preparation
- Data import and storage in Python.
- Exploratory analysis and summary statistics.
- Handling missing values and outliers.
- Standardization, normalization, and transformation.
- Qualitative data recoding and data wrangling with pandas.
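To make the difference between standardization and normalization concrete, here is a small sketch using only the Python standard library. The helper names `standardize` and `min_max_normalize` are hypothetical; in the course itself these steps would more likely be done with pandas or scikit-learn.

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / std for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values]


column = [2.0, 4.0, 6.0]  # hypothetical feature values
z_scores = standardize(column)
scaled = min_max_normalize(column)
```

Min-max scaling is sensitive to outliers because a single extreme value sets the range, which is one reason outlier handling usually comes before scaling.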
Classification Algorithms
- Binary vs multiclass classification.
- Logistic regression and discriminant functions.
- Naïve Bayes, k-nearest neighbors.
- Decision trees and tree ensembles: CART, bagging, random forests, boosting, XGBoost.
- Support Vector Machines and kernels.
- Ensemble learning techniques.
Regression and Numerical Prediction
- Least squares and variable selection.
- Regularization methods: L1 (lasso) and L2 (ridge).
- Polynomial regression and nonlinear models.
- Regression trees and splines.
Neural Networks
- Introduction to neural networks and deep learning.
- Activation functions, layers, and backpropagation.
- Multilayer perceptrons (MLP).
- Using TensorFlow or PyTorch for basic neural network modeling.
- Neural networks for classification and regression.
Sales Forecasting and Predictive Analytics
- Time series vs regression-based forecasting.
- Handling seasonal and trend-based data.
- Building a sales forecasting model using ML techniques.
- Evaluating forecast accuracy and uncertainty.
- Business interpretation and communication of results.
Unsupervised Learning
- Clustering techniques: k-means, k-medoids, hierarchical clustering, self-organizing maps (SOMs).
- Dimensionality reduction: PCA, factor analysis, SVD.
- Multidimensional scaling.
Text Mining
- Text preprocessing and tokenization.
- Bag-of-words, stemming, and lemmatization.
- Sentiment analysis and word frequency.
- Visualizing text data with word clouds.
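The tokenization and bag-of-words steps listed above can be sketched with the standard library alone. The regular-expression tokenizer and the function names are simplifying assumptions; a real pipeline would typically add stop-word removal and stemming or lemmatization.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return a sorted vocabulary and one term-count row per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical two-document corpus
docs = ["The cat sat.", "The cat saw the dog."]
vocabulary, matrix = bag_of_words(docs)
```

Each row of `matrix` is a word-frequency vector for one document, which is the representation that simple sentiment classifiers and word clouds build on.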
Recommendation Systems
- User-based and item-based collaborative filtering.
- Designing and evaluating recommendation engines.
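Item-based collaborative filtering rests on a similarity measure between item rating vectors. The sketch below uses cosine similarity on a tiny, hypothetical rating matrix; treating unrated items as 0 is a simplifying assumption, and production systems usually handle missing ratings more carefully.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical ratings: rows are users, columns are items A, B, C (0 = unrated)
ratings = [
    [5, 4, 0],
    [4, 5, 1],
    [1, 0, 5],
]

# Transpose so that each vector holds all users' ratings for one item
item_a, item_b, item_c = (list(col) for col in zip(*ratings))

sim_ab = cosine_similarity(item_a, item_b)
sim_ac = cosine_similarity(item_a, item_c)
```

Here items A and B are rated similarly by the same users, so `sim_ab` is larger than `sim_ac`; an item-based recommender would suggest B to a user who liked A before it suggested C.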
Association Pattern Mining
- Frequent itemsets and Apriori algorithm.
- Market basket analysis and lift ratio.
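The lift ratio of a rule A -> B compares how often A and B occur together with how often they would co-occur if they were independent: lift = support(A and B) / (support(A) * support(B)). A minimal sketch, with a hypothetical function name and made-up baskets:

```python
def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent over a list of item sets."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    count_a = sum(a <= t for t in transactions)          # baskets containing A
    count_c = sum(c <= t for t in transactions)          # baskets containing B
    count_ac = sum((a | c) <= t for t in transactions)   # baskets containing both
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    # Equivalent to support(A and B) / (support(A) * support(B))
    return (count_ac * n) / (count_a * count_c)


# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
bread_milk_lift = lift(baskets, {"bread"}, {"milk"})
```

A lift above 1 suggests the items appear together more often than chance would predict, a lift near 1 suggests independence, and a lift below 1 (as in this made-up data) suggests they co-occur less often than expected.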
Outlier Detection
- Extreme value analysis.
- Distance-based and density-based methods.
- Outlier detection in high-dimensional data.
Machine Learning Case Study
- Understanding the business problem.
- Data preprocessing and feature engineering.
- Model selection and parameter tuning.
- Evaluation and presentation of findings.
- Deployment.
Summary and Next Steps
Requirements
- Foundational knowledge of machine learning principles, including supervised and unsupervised learning.
- Familiarity with Python programming (variables, loops, functions).
- Some experience with data handling using libraries like pandas or NumPy is beneficial but not mandatory.
- No prior experience with advanced modeling or neural networks is expected.
Audience
- Data scientists.
- Business analysts.
- Software engineers and technical professionals working with data.
Testimonials
The ML ecosystem: not only MLflow but also Optuna, hyperops, Docker, and docker-compose.
Guillaume GAUTIER - OLEA MEDICAL
Course - MLflow
I enjoyed participating in the Kubeflow training, which was held remotely. It allowed me to consolidate my knowledge of AWS services, K8s, and the DevOps tools around Kubeflow, which are the necessary foundation for tackling the subject properly. I want to thank Malawski Marcin for his patience and professionalism in both the training and his advice on best practices. Malawski approaches the subject from different angles and with different deployment tools: Ansible, EKS, kubectl, and Terraform. I am now convinced that I am going into the right field.