Core Concepts
This page defines the key concepts and entities in AIchor. Understanding these before exploring the rest of the documentation makes everything else easier to follow.
Organisations
An organisation is the top-level tenant in AIchor. All resources (projects, experiments, engines, VCS providers, storage, and users) belong to exactly one organisation. Organisations are fully isolated from one another.
Each organisation gets a dedicated URL:
https://<organisation-name>.aichor.ai
AIchor architecture overview
AIchor is split into two layers:
- Control Plane: the AIchor platform itself, comprising the web interface, backend, infrastructure manager, and monitoring stack. Managed by InstaDeep.
- Engines: the compute infrastructure where the experiment workflow (i.e., cloning, building and running) executes.
This separation means a single AIchor instance can manage workloads across multiple engines and organisations while keeping environments fully isolated.
Engines
An engine is a compute environment registered with AIchor. It is the infrastructure where experiments run. Engines can be:
| Type | Examples |
|---|---|
| Kubernetes cluster | GKE, EKS, AKS, or on-premise clusters |
| AWS ParallelCluster | HPC-style cluster managed by AWS |
Engines are managed by the infrastructure administrator. They can be created through AIchor or imported from an existing cloud account or on-premise infrastructure. Multiple engines can coexist in one organisation, and each project can be assigned to multiple engines simultaneously.
See the Engines section for setup instructions.
Projects
A project links a VCS repository (GitHub, GitLab, Bitbucket, or Azure DevOps) to an engine. It is the unit of access control: team members are invited at the project level.
When a project is created, AIchor provisions the necessary infrastructure so that experiments can be triggered and their workflow executed on an engine.
Additionally, each project has dedicated storage for datasets and experiment data:
- Object storage buckets: backed by a cloud provider (e.g. AWS S3) and available for all engine types.
- Persistent Volume Claims (PVCs): available for Kubernetes-based engines and useful when faster data access than cloud buckets is needed.
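The availability rule above can be expressed as a small lookup. This is only an illustrative sketch: `EngineType`, `available_storage`, and the string labels are hypothetical names chosen for this example and are not part of AIchor.

```python
from enum import Enum


class EngineType(Enum):
    # Hypothetical labels for the two engine types listed in the Engines table.
    KUBERNETES = "kubernetes"
    AWS_PARALLELCLUSTER = "aws_parallelcluster"


def available_storage(engine_type: EngineType) -> list[str]:
    """Return the project storage options usable on a given engine type."""
    # Object storage buckets are available for all engine types.
    options = ["object_storage_bucket"]
    # PVCs are only available on Kubernetes-based engines.
    if engine_type is EngineType.KUBERNETES:
        options.append("persistent_volume_claim")
    return options


print(available_storage(EngineType.KUBERNETES))
print(available_storage(EngineType.AWS_PARALLELCLUSTER))
```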
See the Projects section for setup instructions.
Experiments
An experiment is a single workload run within a project. Experiments can be triggered in two ways:
- A git commit with an `EXP` or `exp` prefix pushed to the linked repository
- A submission via the AIchor CLI, using `aichor experiments submit local` to submit from a local repository or `aichor experiments submit commit-sha` to submit a specific commit
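As a rough illustration of the commit-based trigger, the check below tests whether a commit message carries the prefix. It is a sketch under stated assumptions, not AIchor's actual matching logic: it assumes the prefix must sit at the very start of the message and that only the two spellings named above count. The function name `triggers_experiment` is hypothetical.

```python
def triggers_experiment(commit_message: str) -> bool:
    """Return True if the message starts with the EXP or exp prefix.

    Assumption: only an exact "EXP" or "exp" at the start of the message
    counts. This simple check also matches words such as "explore", so the
    real rule may well require a delimiter after the prefix.
    """
    return commit_message.startswith(("EXP", "exp"))


for message in ["EXP: tune learning rate", "exp baseline run", "Fix typo in README"]:
    print(message, "->", triggers_experiment(message))
```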
Each experiment moves through a series of steps: cloning the repository, building the Docker image and pushing it to an artifact registry, and finally running the workload on the desired engine.
The user can follow the progress of the experiment through these steps in the UI, and each step shows its logs. For the running step, the user can also access the container running the experiment for debugging purposes and check its compute resource utilization. Experiments can also show a TensorBoard or Ray dashboard, depending on the execution runtime being used.
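The step sequence described above can be modelled as an ordered enumeration. This is a minimal sketch for orientation only: the `Step` names and the `next_step` helper are hypothetical and do not reflect identifiers used by AIchor.

```python
from enum import Enum
from typing import Optional


class Step(Enum):
    # Declared in the order the workflow executes them.
    CLONE = "clone"  # clone the repository
    BUILD = "build"  # build the Docker image and push it to an artifact registry
    RUN = "run"      # run the workload on the chosen engine


def next_step(current: Step) -> Optional[Step]:
    """Return the step that follows `current`, or None after the last step."""
    order = list(Step)  # Enum iteration preserves definition order
    index = order.index(current)
    return order[index + 1] if index + 1 < len(order) else None


print(next_step(Step.CLONE))
print(next_step(Step.RUN))
```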
See the Experiments section for more details.
The AIchor team maintains a GitHub repository of demo projects ready to run on AIchor.
The Manifest File
Each experiment is configured by a manifest.yaml file in the repository. It is the single source of configuration for a workload: it defines the Docker image to build, the execution runtime to use, resource requests, any storage volumes to attach, and optional settings such as timeouts and Kubernetes node annotations.
AIchor reads this file every time an experiment is triggered, so changing the manifest is how resource requirements, commands, or ML framework settings are updated between runs.
See the Manifest File Reference for the full specification and examples.
Supported Execution Runtimes
AIchor natively supports several execution runtimes. The runtime is specified per experiment in the manifest.yaml via the operator field, and AIchor takes care of scheduling, process coordination, and environment variable injection automatically.
Supported runtimes are Ray, PyTorch, JAX, and XGBoost.
On Kubernetes engines, AIchor also supports JobSet — a Kubernetes-native API for framework-agnostic distributed workloads. A JobSet groups multiple Kubernetes Jobs into one managed unit. Use it when your workload doesn't map to Ray, PyTorch, JAX, or XGBoost — for example, custom MPI pipelines or multi-host batch jobs.
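The guidance above on when to fall back to JobSet can be sketched as a small decision helper. The function `choose_runtime` and the lowercase labels are hypothetical; they are not necessarily the values accepted by the operator field, and the sketch does not model which runtimes a given engine type supports beyond the JobSet rule.

```python
# Hypothetical labels for the natively supported runtimes named above.
NATIVE_RUNTIMES = {"ray", "pytorch", "jax", "xgboost"}


def choose_runtime(framework: str, engine_is_kubernetes: bool) -> str:
    """Pick a runtime label for a workload, falling back to JobSet if possible."""
    name = framework.strip().lower()
    if name in NATIVE_RUNTIMES:
        return name
    # JobSet is only described for Kubernetes engines.
    if engine_is_kubernetes:
        return "jobset"
    raise ValueError(f"no matching runtime for {framework!r} on a non-Kubernetes engine")


print(choose_runtime("PyTorch", engine_is_kubernetes=False))
print(choose_runtime("custom-mpi", engine_is_kubernetes=True))
```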
See Kubernetes Execution Runtimes for the full list of supported runtimes and configuration details.
User Roles
| Role | Responsibilities |
|---|---|
| User | Triggers and monitors experiments, manages experiment variables, accesses project storage |
| Organization Admin | Full control over the organisation |
| Infrastructure Admin | Manages compute engines and cloud providers |
| FinOps Admin | Reviews usage and cost across the organisation |
See Account Types for how accounts are created and Roles & Permissions for a full breakdown.