Core Concepts

This page defines the key concepts and entities in AIchor. Understanding these before exploring the rest of the documentation makes everything else easier to follow.

Organisations

An organisation is the top-level tenant in AIchor. All resources (projects, experiments, engines, VCS providers, storage, and users) belong to exactly one organisation. Organisations are fully isolated from one another.

Each organisation gets a dedicated URL:

https://<organisation-name>.aichor.ai

AIchor Architecture Overview

AIchor is split into two layers:

Control Plane: the AIchor platform itself, comprising the web interface, backend, infrastructure manager, and monitoring stack. It is managed by InstaDeep.
Engines: the compute infrastructure where the experiment workflow (cloning, building, and running) executes.

This separation means a single AIchor instance can manage workloads across multiple engines and organisations while keeping environments fully isolated.

Engines

An engine is a compute environment registered with AIchor. It is the infrastructure where experiments run. Engines can be:

Type	Examples
Kubernetes cluster	GKE, EKS, AKS, or on-premise clusters
AWS ParallelCluster	HPC-style cluster managed by AWS

Engines are managed by the infrastructure administrator. They can be created through AIchor or imported from an existing cloud account or on-premises infrastructure. Multiple engines can coexist in one organisation, and each project can be assigned to multiple engines simultaneously.

See the Engines section for setup instructions.

Projects

A project links a VCS repository (GitHub, GitLab, Bitbucket, or Azure DevOps) to one or more engines. It is the unit of access control: team members are invited at the project level.
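The relationships described so far (an organisation owns engines and projects, a project targets engines, and members are invited per project) can be pictured with a small data model. This is only an illustrative sketch: the class names, fields, and the `add_project` check are hypothetical and do not reflect AIchor's actual API or internal schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class EngineType(Enum):
    # The two engine types listed in the Engines table.
    KUBERNETES = "Kubernetes cluster"
    AWS_PARALLELCLUSTER = "AWS ParallelCluster"


@dataclass
class Engine:
    name: str
    engine_type: EngineType


@dataclass
class Project:
    name: str
    repository_url: str
    engines: List[Engine] = field(default_factory=list)
    members: Set[str] = field(default_factory=set)

    def invite(self, user: str) -> None:
        # Access is granted at the project level.
        self.members.add(user)

    def can_access(self, user: str) -> bool:
        return user in self.members


@dataclass
class Organisation:
    name: str
    engines: List[Engine] = field(default_factory=list)
    projects: List[Project] = field(default_factory=list)

    def add_project(self, name: str, repository_url: str,
                    engines: List[Engine]) -> Project:
        # Assumption: a project may only target engines registered
        # in the same organisation, since organisations are isolated.
        for engine in engines:
            if engine not in self.engines:
                raise ValueError(
                    f"engine {engine.name!r} is not registered in {self.name!r}"
                )
        project = Project(name, repository_url, list(engines))
        self.projects.append(project)
        return project
```

In this sketch, a user who has not been invited to a project fails `can_access` even if they belong to the same organisation, which mirrors the idea that the project is the unit of access control.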

When a project is created, AIchor provisions the infrastructure needed to trigger experiments on an engine and execute their workflow.

Additionally, each project has dedicated storage for datasets and experiment data:

Object storage buckets: backed by a cloud provider (e.g. AWS S3) and available for all engine types.
Persistent Volume Claims (PVCs): available for Kubernetes-based engines and useful when faster data access than cloud buckets is needed.

See the Projects section for setup instructions.

Experiments

An experiment is a single workload run within a project. Experiments can be triggered in two ways:

A git commit with an EXP or exp prefix pushed to the linked repository
A submission via the AIchor CLI, using aichor experiments submit local to submit from a local repository or aichor experiments submit commit-sha to submit a specific commit
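As a rough illustration of the commit-based trigger, the check below tests whether a commit message begins with one of the two prefixes. It assumes the prefix refers to the start of the commit message and that matching is a plain string comparison; AIchor's exact rule (for example, whether a separator is required after the prefix) is not specified here, and `triggers_experiment` is a hypothetical helper, not part of the AIchor CLI.

```python
def triggers_experiment(commit_message: str) -> bool:
    """Return True if the message starts with the EXP or exp prefix.

    Assumption: any message beginning with these exact characters
    matches; mixed-case variants such as "Exp" do not.
    """
    return commit_message.startswith(("EXP", "exp"))


if __name__ == "__main__":
    for message in ["EXP: tune learning rate", "exp baseline run", "fix typo"]:
        print(message, "->", triggers_experiment(message))
```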

Each experiment moves through a series of steps: cloning the repository, building the Docker image and pushing it to an artifact registry, and finally running the workload on the desired engine.
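The sequence of steps can be sketched as an ordered pipeline. The step names below are taken from the description above, but the `Step` enum, the `run_steps` function, and the stop-at-first-failure behaviour are assumptions made for illustration, not AIchor's implementation.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Step(Enum):
    # Enum members iterate in definition order, which gives the step order.
    CLONE = "clone"  # clone the repository
    BUILD = "build"  # build the Docker image and push it to a registry
    RUN = "run"      # run the workload on the engine


def run_steps(actions: Dict[Step, Callable[[], bool]]) -> List[Tuple[Step, bool]]:
    """Run each step in order and record whether it succeeded.

    Assumption: a failed step prevents later steps from running.
    """
    results: List[Tuple[Step, bool]] = []
    for step in Step:
        succeeded = actions[step]()
        results.append((step, succeeded))
        if not succeeded:
            break
    return results
```

For example, if the hypothetical build action returns `False`, the result contains only the clone and build entries and the run step is never attempted.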

Users can follow an experiment's progress through these steps in the UI, and each step shows its logs. During the running step, users can also access the container running the experiment for debugging and check its compute resource utilization. Depending on the execution runtime, an experiment can also expose a TensorBoard or Ray dashboard.

See the Experiments section for more details.

The AIchor team maintains a GitHub repository of demo projects ready to run on AIchor.

The Manifest File

Each experiment is configured by a manifest.yaml file in the repository. It is the single source of configuration for a workload: it defines the Docker image to build, the execution runtime to use, resource requests, any storage volumes to attach, and optional settings such as timeouts and Kubernetes node annotations.

AIchor reads this file every time an experiment is triggered, so changing the manifest is how resource requirements, commands, or ML framework settings are updated between runs.

See the Manifest File Reference for the full specification and examples.

Supported Execution Runtimes

AIchor natively supports several execution runtimes. The runtime is specified per experiment in manifest.yaml via the operator field, and AIchor takes care of scheduling, process coordination, and environment variable injection automatically.

Supported runtimes are Ray, PyTorch, JAX, and XGBoost.

On Kubernetes engines, AIchor also supports JobSet — a Kubernetes-native API for framework-agnostic distributed workloads. A JobSet groups multiple Kubernetes Jobs into one managed unit. Use it when your workload doesn't map to Ray, PyTorch, JAX, or XGBoost — for example, custom MPI pipelines or multi-host batch jobs.
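The runtime rules above can be summarised as a small lookup. This is a sketch under stated assumptions: the lowercase strings are hypothetical stand-ins for the values accepted by the operator field (the Manifest File Reference has the real ones), and the sketch assumes the four framework runtimes are available on every engine type, which this page does not confirm.

```python
# Hypothetical operator values; check the Manifest File Reference for the real ones.
FRAMEWORK_RUNTIMES = {"ray", "pytorch", "jax", "xgboost"}
KUBERNETES_ONLY_RUNTIMES = {"jobset"}


def is_runtime_supported(operator: str, engine_is_kubernetes: bool) -> bool:
    """Return True if the runtime can be used on the given kind of engine.

    Assumption: framework runtimes work on all engines, while JobSet
    is only available on Kubernetes engines.
    """
    name = operator.strip().lower()
    if name in FRAMEWORK_RUNTIMES:
        return True
    return name in KUBERNETES_ONLY_RUNTIMES and engine_is_kubernetes
```

A check like this would reject a JobSet workload aimed at a non-Kubernetes engine before anything is built.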

See Kubernetes Execution Runtimes for the full list of supported runtimes and configuration details.

User Roles

Role	Responsibilities
User	Triggers and monitors experiments, manages experiment variables, accesses project storage
Organization Admin	Full control over the organisation
Infrastructure Admin	Manages compute engines and cloud providers
FinOps Admin	Reviews usage and cost across the organisation

See Account Types for how accounts are created and Roles & Permissions for a full breakdown.
