EXO: End-to-End Local AI Cluster Deployment Training Course
EXO is an open-source framework designed to interconnect Apple Silicon devices into a distributed AI cluster, allowing for the local inference of frontier models that exceed the memory capacity of a single device.
This instructor-led, live training (available online or onsite) targets system administrators and DevOps engineers looking to deploy, configure, and manage EXO clusters for private LLM inference across multiple Apple Silicon or Linux nodes.
Upon completion of this training, participants will be able to:
- Install and configure EXO on both macOS and Linux nodes.
- Activate automatic device discovery and establish multi-node clusters.
- Enable and verify RDMA over Thunderbolt 5 to achieve ultra-low-latency communication between devices.
- Deploy frontier models (including DeepSeek, Qwen, and Llama) across clustered devices.
- Monitor cluster health and troubleshoot common deployment challenges.
Course Format
- Interactive lectures and discussions.
- Extensive exercises and hands-on practice.
- Hands-on implementation within a live-lab environment.
Customization Options for the Course
- To request customized training for this course, please contact us to arrange details.
Course Outline
Introduction to EXO and Local AI Clustering
- Overview of the EXO framework and the exo-explore ecosystem.
- Comparing centralized cloud inference with distributed local inference.
- Architecture: libp2p device discovery, MLX backend, dashboard, and API layers.
- Hardware requirements: Apple Silicon (M3 Ultra, M4 Pro/Max), Thunderbolt 5, and shared storage.
Installing EXO on macOS
- Setting up Xcode, the Metal Toolchain, and other macOS prerequisites.
- Installing uv, Node.js, and the Rust nightly toolchain.
- Installing the pinned macmon fork for Apple Silicon monitoring.
- Cloning the repository and building the dashboard using npm.
- Running EXO from source and verifying the localhost:52415 dashboard.
Installing EXO on Linux
- Installing dependencies via apt or Homebrew on Linux.
- Configuring uv, Node.js 18+, and Rust nightly.
- Building the dashboard and running EXO in CPU-only mode.
- Directory layout: XDG Base Directory paths for config, data, cache, and logs.
Automatic Device Discovery and Cluster Formation
- Understanding libp2p-based auto-discovery across local networks.
- Configuring custom namespaces using EXO_LIBP2P_NAMESPACE for cluster isolation.
- Verifying node membership in the dashboard cluster view.
- Handling discovery failures and network segmentation issues.
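The namespace-based isolation above can be sketched as a small model. This is a hypothetical illustration, not EXO's actual implementation: the variable name EXO_LIBP2P_NAMESPACE comes from the outline, but the matching behavior shown (nodes with the same namespace discover each other, others are ignored) and the "default" fallback are assumptions for teaching purposes.

```python
# Hypothetical sketch of namespace-based cluster isolation (assumption:
# nodes only discover peers advertising an identical namespace).
def discovery_namespace(env: dict) -> str:
    """Namespace a node would advertise; unset nodes share a default."""
    return env.get("EXO_LIBP2P_NAMESPACE", "default")

def same_cluster(env_a: dict, env_b: dict) -> bool:
    """Two nodes only form a cluster when their namespaces match."""
    return discovery_namespace(env_a) == discovery_namespace(env_b)

# A namespaced node ignores a node with no namespace set:
print(same_cluster({"EXO_LIBP2P_NAMESPACE": "team-a"}, {}))  # prints False
print(same_cluster({"EXO_LIBP2P_NAMESPACE": "team-a"},
                   {"EXO_LIBP2P_NAMESPACE": "team-a"}))      # prints True
```

In the lab, this corresponds to exporting EXO_LIBP2P_NAMESPACE to the same value on every node that should join one cluster, and different values (or leaving it unset) to keep clusters apart on a shared LAN.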
Enabling RDMA over Thunderbolt 5
- Understanding RDMA architecture and the claimed 99 percent latency reduction.
- Enabling RDMA in macOS Recovery mode using rdma_ctl.
- Cable requirements and port topology constraints on Mac Studio.
- Ensuring macOS versions match across all cluster nodes.
- Troubleshooting RDMA discovery and DHCP configuration.
Deploying Frontier Models
- Using the dashboard to load and shard DeepSeek v3.1, Qwen3-235B, and Llama family models.
- Previewing instance placements via the /instance/previews API endpoint.
- Creating model instances with pipeline or tensor-parallel sharding.
- Configuring custom model cards from the HuggingFace hub.
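Placement previews can be scripted against a running node. The sketch below is a minimal illustration: the /instance/previews path and the default dashboard port 52415 come from this outline, but the query-parameter name ("model") and the response shape are assumptions, so treat this as a template to adapt against a live cluster, not a definitive client.

```python
# Hypothetical sketch: query a running EXO node for instance placement
# previews before creating a model instance. Endpoint path and port are
# from the course outline; the "model" query parameter is an assumption.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DASHBOARD = "http://localhost:52415"

def preview_url(model_id: str) -> str:
    """Build the previews URL for a given model identifier."""
    return f"{DASHBOARD}/instance/previews?{urlencode({'model': model_id})}"

def fetch_previews(model_id: str) -> dict:
    """Fetch placement previews (requires a live EXO node)."""
    with urlopen(preview_url(model_id), timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(preview_url("deepseek-v3.1"))
```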
Monitoring and Troubleshooting
- Reading EXO logs and understanding distributed tracing.
- Interpreting cluster health in the dashboard cluster view.
- Diagnosing worker node failures and reconnection behavior.
- Using EXO_TRACING_ENABLED for performance bottleneck analysis.
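Tracing is toggled through the environment of the launched process. A minimal sketch, assuming "1" is the enabling value for EXO_TRACING_ENABLED (the variable name comes from the outline; the value and the `exo` launch command are assumptions):

```python
# Hypothetical sketch: build the environment for launching an EXO node
# with tracing toggled, for bottleneck analysis. Assumes "1" enables it.
import os

def tracing_env(enabled: bool) -> dict:
    """Return a copy of the current environment with tracing toggled."""
    env = dict(os.environ)
    if enabled:
        env["EXO_TRACING_ENABLED"] = "1"
    else:
        env.pop("EXO_TRACING_ENABLED", None)
    return env

# To launch a node with tracing (requires a live cluster):
# subprocess.run(["exo"], env=tracing_env(True))
print(tracing_env(True).get("EXO_TRACING_ENABLED"))  # prints 1
```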
Cluster Maintenance and Updates
- Updating EXO binaries and rebuilding the dashboard.
- Migrating model caches and managing pre-downloaded models over NFS.
- Gracefully removing nodes and rebalancing workloads.
Requirements
- A solid understanding of networking fundamentals (IP addresses, subnetting, firewalls).
- Practical experience with command-line administration on macOS or Linux.
- Familiarity with Python package management (pip/uv) and Node.js tooling.
Audience
- System administrators.
- DevOps engineers.
- AI infrastructure architects responsible for on-premise LLM deployment.
Open Training Courses require 5+ participants.
Related Courses
Advanced LangGraph: Optimization, Debugging, and Monitoring Complex Graphs
35 Hours
LangGraph is a framework designed for building stateful, multi-actor LLM applications through composable graphs that maintain persistent state and provide precise control over execution.
This instructor-led live training, available online or onsite, targets advanced-level AI platform engineers, DevOps for AI professionals, and ML architects who aim to optimize, debug, monitor, and operate production-grade LangGraph systems.
Upon completing this training, participants will be able to:
- Design and optimize complex LangGraph topologies to enhance speed, reduce costs, and ensure scalability.
- Engineer system reliability through retries, timeouts, idempotency, and checkpoint-based recovery mechanisms.
- Debug and trace graph executions, inspect state, and systematically reproduce production issues.
- Instrument graphs with logs, metrics, and traces, deploy them to production, and monitor SLAs and associated costs.
Course Format
- Interactive lectures and discussions.
- Extensive exercises and hands-on practice.
- Hands-on implementation within a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us to arrange details.
Building Coding Agents with Devstral: From Agent Design to Tooling
14 Hours
Devstral is an open-source framework designed for building and running coding agents that can interact with codebases, developer tools, and APIs to enhance engineering productivity.
This instructor-led, live training (online or onsite) is aimed at intermediate-level to advanced-level ML engineers, developer-tooling teams, and SREs who wish to design, implement, and optimize coding agents using Devstral.
By the end of this training, participants will be able to:
- Set up and configure Devstral for coding agent development.
- Design agentic workflows for codebase exploration and modification.
- Integrate coding agents with developer tools and APIs.
- Implement best practices for secure and efficient agent deployment.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us to arrange details.
Open-Source Model Ops: Self-Hosting, Fine-Tuning and Governance with Devstral & Mistral Models
14 Hours
Devstral and Mistral represent open-source AI technologies engineered for flexible deployment, fine-tuning capabilities, and scalable integration.
This instructor-led live training, available online or onsite, targets intermediate to advanced-level machine learning engineers, platform teams, and research engineers seeking to self-host, fine-tune, and govern Mistral and Devstral models within production environments.
Upon completion of this training, participants will be equipped to:
- Establish and configure self-hosted environments for Mistral and Devstral models.
- Utilize fine-tuning techniques to optimize performance for specific domains.
- Implement versioning, monitoring, and lifecycle governance protocols.
- Ensure security, compliance, and responsible usage of open-source models.
Course Format
- Interactive lectures and discussions.
- Hands-on exercises focused on self-hosting and fine-tuning.
- Live-lab implementation of governance and monitoring pipelines.
Customization Options
- To arrange a customized training session for this course, please contact us directly.
Fiji: Image Processing for Biotechnology and Toxicology
14 Hours
This instructor-led, live training in Slovakia (online or onsite) is aimed at beginner-level to intermediate-level researchers and laboratory professionals who wish to process and analyze images related to histological tissues, blood cells, algae, and other biological samples.
By the end of this training, participants will be able to:
- Navigate the Fiji interface and utilize ImageJ’s core functions.
- Preprocess and enhance scientific images for better analysis.
- Analyze images quantitatively, including cell counting and area measurement.
- Automate repetitive tasks using macros and plugins.
- Customize workflows for specific image analysis needs in biological research.
LangGraph Applications in Finance
35 Hours
LangGraph serves as a framework designed for constructing stateful, multi-actor LLM applications through composable graphs that maintain persistent state and provide precise control over execution.
This instructor-led, live training session, available both online and onsite, targets intermediate to advanced professionals seeking to design, implement, and operate finance solutions based on LangGraph, ensuring proper governance, observability, and compliance.
Upon completion of this training, participants will be equipped to:
- Design finance-specific LangGraph workflows that align with regulatory and audit requirements.
- Integrate financial data standards and ontologies into graph states and tooling.
- Implement reliability, safety mechanisms, and human-in-the-loop controls for critical processes.
- Deploy, monitor, and optimize LangGraph systems to enhance performance, manage costs, and meet SLAs.
Course Format
- Interactive lectures and discussions.
- Extensive exercises and practical application.
- Hands-on implementation within a live laboratory environment.
Customization Options
- For customized training tailored to your specific needs, please contact us to arrange the details.
LangGraph Foundations: Graph-Based LLM Prompting and Chaining
14 Hours
LangGraph serves as a framework designed for constructing LLM applications structured as graphs, supporting features such as planning, branching, tool integration, memory management, and controlled execution.
This instructor-led, live training (available online or onsite) is tailored for developers at the beginner level, prompt engineers, and data practitioners who aim to design and build reliable, multi-step LLM workflows using LangGraph.
Upon completion of this training, participants will be able to:
- Explain the core concepts of LangGraph (nodes, edges, and state) and identify appropriate use cases for each.
- Construct prompt chains that support branching, tool invocation, and memory retention.
- Integrate retrieval mechanisms and external APIs into graph-based workflows.
- Test, debug, and evaluate LangGraph applications to ensure reliability and safety.
Format of the Course
- Interactive lectures and facilitated discussions.
- Guided labs and code walkthroughs within a sandbox environment.
- Scenario-based exercises focusing on design, testing, and evaluation.
Course Customization Options
- To request customized training for this course, please contact us to make arrangements.
LangGraph in Healthcare: Workflow Orchestration for Regulated Environments
35 Hours
LangGraph facilitates stateful, multi-actor workflows driven by Large Language Models (LLMs), offering precise control over execution paths and state persistence. In the healthcare sector, these capabilities are essential for ensuring compliance, enhancing interoperability, and developing decision-support systems that align with medical workflows.
This instructor-led, live training (available online or onsite) targets intermediate to advanced professionals seeking to design, implement, and manage LangGraph-based healthcare solutions while addressing regulatory, ethical, and operational challenges.
Upon completing this training, participants will be able to:
- Design healthcare-specific LangGraph workflows with compliance and auditability in mind.
- Integrate LangGraph applications with medical ontologies and standards (FHIR, SNOMED CT, ICD).
- Apply best practices for reliability, traceability, and explainability in sensitive environments.
- Deploy, monitor, and validate LangGraph applications in healthcare production settings.
Course Format
- Interactive lectures and discussions.
- Hands-on exercises using real-world case studies.
- Implementation practice in a live-lab environment.
Customization Options
- To request a customized training session for this course, please contact us to arrange details.
LangGraph for Legal Applications
35 Hours
LangGraph is a framework designed for constructing stateful, multi-actor LLM applications as composable graphs, featuring persistent state and precise execution control.
This instructor-led training, available online or on-site, targets intermediate to advanced professionals seeking to design, implement, and manage LangGraph-based legal solutions with the necessary compliance, traceability, and governance controls.
Upon completion of this training, participants will be able to:
- Design legal-specific LangGraph workflows that ensure auditability and compliance.
- Integrate legal ontologies and document standards into graph state and processing.
- Implement guardrails, human-in-the-loop approvals, and traceable decision paths.
- Deploy, monitor, and maintain LangGraph services in production environments with observability and cost controls.
Course Format
- Interactive lectures and discussions.
- Numerous exercises and practical sessions.
- Hands-on implementation within a live lab environment.
Customization Options
- To request customized training for this course, please contact us to arrange details.
Building Dynamic Workflows with LangGraph and LLM Agents
14 Hours
LangGraph serves as a framework designed for assembling graph-structured LLM workflows that facilitate branching, tool utilization, memory management, and controlled execution.
This instructor-led, live training (available online or onsite) targets intermediate-level engineers and product teams seeking to integrate LangGraph’s graph logic with LLM agent loops to create dynamic, context-aware applications, such as customer support agents, decision trees, and information retrieval systems.
Upon completing this training, participants will be capable of:
- Designing graph-based workflows that coordinate LLM agents, tools, and memory.
- Implementing conditional routing, retries, and fallback mechanisms for robust execution.
- Integrating retrieval processes, APIs, and structured outputs into agent loops.
- Evaluating, monitoring, and hardening agent behavior to ensure reliability and safety.
Format of the Course
- Interactive lecture and facilitated discussion.
- Guided labs and code walkthroughs conducted in a sandbox environment.
- Scenario-based design exercises and peer reviews.
Course Customization Options
- To request customized training for this course, please contact us to arrange details.
LangGraph for Marketing Automation
14 Hours
LangGraph is a graph-based orchestration framework that facilitates conditional, multi-step workflows involving LLMs and tools, making it ideal for automating and personalizing content pipelines.
This instructor-led, live training (available online or onsite) is designed for intermediate-level marketers, content strategists, and automation developers who want to implement dynamic, branching email campaigns and content generation pipelines using LangGraph.
By the end of this training, participants will be able to:
- Design graph-structured content and email workflows with conditional logic.
- Integrate LLMs, APIs, and data sources for automated personalization.
- Manage state, memory, and context across multi-step campaigns.
- Evaluate, monitor, and optimize workflow performance and delivery outcomes.
Format of the Course
- Interactive lectures and group discussions.
- Hands-on labs implementing email workflows and content pipelines.
- Scenario-based exercises on personalization, segmentation, and branching logic.
Course Customization Options
- To request customized training for this course, please contact us to arrange details.
Le Chat Enterprise: Private ChatOps, Integrations & Admin Controls
14 Hours
Le Chat Enterprise offers a secure, customizable, and governed conversational AI solution designed for organizations. It supports RBAC, SSO, connectors, and enterprise app integrations.
This instructor-led training (online or onsite) targets intermediate-level product managers, IT leads, solution engineers, and security/compliance teams interested in deploying, configuring, and governing Le Chat Enterprise.
By the end of this training, participants will be able to:
- Set up and configure Le Chat Enterprise for secure deployments.
- Enable RBAC, SSO, and compliance-driven controls.
- Integrate Le Chat with enterprise applications and data stores.
- Design and implement governance and admin playbooks for ChatOps.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us to arrange details.
Cost-Effective LLM Architectures: Mistral at Scale (Performance / Cost Engineering)
14 Hours
Mistral represents a high-performance family of large language models, specifically optimized for cost-effective large-scale production deployment.
This instructor-led training session, available online or on-site, targets advanced infrastructure engineers, cloud architects, and MLOps leads who aim to design, deploy, and optimize Mistral-based architectures to achieve maximum throughput with minimal costs.
Upon completing this training, participants will be equipped to:
- Deploy scalable patterns for Mistral Medium 3.
- Utilize batching, quantization, and efficient serving strategies.
- Optimize inference expenses while preserving performance levels.
- Design production-ready serving topologies for enterprise workloads.
Course Format
- Interactive lectures and discussions.
- Extensive exercises and hands-on practice.
- Hands-on implementation within a live laboratory environment.
Customization Options
- To request customized training for this course, please contact us to make arrangements.
Productizing Conversational Assistants with Mistral Connectors & Integrations
14 Hours
Mistral AI provides an open-source AI platform that empowers teams to construct and incorporate conversational assistants into both enterprise operations and customer-facing workflows.
This instructor-led training session, available both online and on-site, is designed for beginner to intermediate-level product managers, full-stack developers, and integration engineers who aim to design, integrate, and bring to market conversational assistants utilizing Mistral connectors and integrations.
Upon completion of this training, participants will be able to:
- Connect Mistral conversational models with enterprise and SaaS connectors.
- Implement retrieval-augmented generation (RAG) to ensure accurate, grounded responses.
- Design user experience patterns for both internal and external chat assistants.
- Deploy assistants into product workflows to address real-world use cases.
Course Format
- Interactive lectures and discussions.
- Practical hands-on integration exercises.
- Live laboratory development of conversational assistants.
Customization Options
- To request customized training for this course, please reach out to us to arrange details.
Enterprise-Grade Deployments with Mistral Medium 3
14 Hours
Mistral Medium 3 is a high-performance, multimodal large language model designed for production-grade deployment across enterprise environments.
This instructor-led, live training (online or onsite) is aimed at intermediate-level to advanced-level AI/ML engineers, platform architects, and MLOps teams who wish to deploy, optimize, and secure Mistral Medium 3 for enterprise use cases.
By the end of this training, participants will be able to:
- Deploy Mistral Medium 3 using API and self-hosted options.
- Optimize inference performance and costs.
- Implement multimodal use cases with Mistral Medium 3.
- Apply security and compliance best practices for enterprise environments.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us to arrange details.
Mistral for Responsible AI: Privacy, Data Residency & Enterprise Controls
14 Hours
Mistral AI offers an open-source, enterprise-grade platform designed to support secure, compliant, and responsible artificial intelligence deployments.
This instructor-led training, available both online and onsite, targets intermediate professionals including compliance leads, security architects, and legal or operations stakeholders. The course focuses on applying responsible AI practices within Mistral by utilizing privacy safeguards, data residency protocols, and enterprise control mechanisms.
Upon completion, participants will be equipped to:
- Deploy privacy-preserving techniques using Mistral.
- Execute data residency strategies to ensure regulatory compliance.
- Configure enterprise-level controls such as Role-Based Access Control (RBAC), Single Sign-On (SSO), and audit logging.
- Assess vendor and deployment options to align with compliance requirements.
Course Format
- Interactive lectures and group discussions.
- Case studies and exercises centered on compliance.
- Practical implementation of enterprise AI controls.
Customization Options
- For tailored training requests, please contact us to arrange details.