Cloud engine management

Before creating projects to run workloads on AIchor, administrators need to specify the engine on which the jobs will be executed.

If the organisation does not have a predefined provider, the administrator can run jobs on an engine created and managed by AIchor. To do so, select "Basic", then specify a name for the engine and the maximum resources to set as its compute limit.

Otherwise, depending on the organisation set-up, one or more engines can be created or imported on different cloud providers.

Note: The case of importing on-premise is detailed in the section "Import On Premise engine": https://docs.aichor.ai/docs/user-manual/import-on-prem-cluster

To start the creation/import process, the administrator selects "Advanced" and "In the Cloud".

Next, depending on the target engine, the administrator selects one of the supported cloud providers.

GCP

If the cloud provider selected is GCP, the administrator can either create a GKE engine through AIchor or import an existing one.

Note: All the regions available on GCP are supported by AIchor.

If the administrator would like AIchor to create a GKE cluster, they select "Create engine" and provide the information requested by the form. The fields are described in the form fields table at the end of this page.

The deployment process takes up to 15 minutes. Once it has finished, the engine should be displayed as ready on the engine management page.

If the administrator would like to import an existing GKE cluster created beforehand, they select "Import Existing Engine" and provide the information requested by the form. The fields are described in the form fields table at the end of this page.

In the case of importing an engine, it is the administrator's responsibility to upgrade/patch the cluster.
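According to the form fields table, the API hostname and the CA certificate needed for an import are both found in ~/.kube/config. The sketch below shows one possible way to pull those two values out of a kubeconfig that has already been parsed into a dictionary. The function name `cluster_connection_info` and the sample values are hypothetical and are not part of AIchor.

```python
import base64
from typing import Optional


def cluster_connection_info(kubeconfig: dict, context_name: Optional[str] = None) -> dict:
    """Return the API server URL and CA certificate (PEM) for one kubeconfig context."""
    # Use the kubeconfig's current context unless another one is requested.
    context_name = context_name or kubeconfig["current-context"]
    context = next(c["context"] for c in kubeconfig["contexts"] if c["name"] == context_name)
    cluster = next(c["cluster"] for c in kubeconfig["clusters"] if c["name"] == context["cluster"])

    ca_pem = None
    if "certificate-authority-data" in cluster:
        # Inline CA data is stored as base64-encoded PEM text.
        ca_pem = base64.b64decode(cluster["certificate-authority-data"]).decode("ascii")

    return {"server": cluster["server"], "ca_pem": ca_pem}


# Placeholder kubeconfig content for illustration only (not a real cluster).
sample_kubeconfig = {
    "current-context": "demo",
    "contexts": [{"name": "demo", "context": {"cluster": "demo-cluster", "user": "demo-user"}}],
    "clusters": [
        {
            "name": "demo-cluster",
            "cluster": {
                "server": "https://203.0.113.10",
                "certificate-authority-data": base64.b64encode(
                    b"-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n"
                ).decode("ascii"),
            },
        }
    ],
}

info = cluster_connection_info(sample_kubeconfig)
print(info["server"])
```

In practice the dictionary could be obtained by parsing the output of `kubectl config view --raw --minify -o json` with `json.loads`. If the kubeconfig references the CA through a file path rather than inline data, `ca_pem` is `None` in this sketch and the file has to be read separately.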

AWS

If the cloud provider selected is AWS, the administrator can either create an EKS engine through AIchor or import an existing one.

Note: All the regions available on AWS are supported by AIchor.

If the administrator would like AIchor to create an EKS cluster, they select "Create engine" and provide the information requested by the form. The fields are described in the form fields table at the end of this page.

The deployment process takes up to 15 minutes. Once it has finished, the engine should be displayed as ready on the engine management page.

Note: The EKS engine is created with the default AWS configuration: in a single region and multi-AZ.

If the administrator would like to import an existing EKS cluster created beforehand, they select "Import Existing Engine" and provide the information requested by the form. The fields are described in the form fields table at the end of this page.

In the case of importing an engine, it is the administrator's responsibility to upgrade/patch the cluster.

If the administrator would like to import an existing AWS ParallelCluster cluster created beforehand, they select the ParallelCluster engine type, then "Import Existing Engine", and provide the information requested by the form. The fields are described in the form fields table at the end of this page.

Note: AIchor does not require an account with root privileges to perform engine creation or to execute jobs.
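The form fields table gives arn:aws:iam::account-id:role/role-name as the shape of the Assume Role ARN. As a minimal sketch, the value could be sanity-checked before it is pasted into the form. It assumes the standard IAM conventions of a 12-digit account ID and a role name of at most 64 characters. The helper `parse_role_arn` and the example role name are hypothetical. AIchor's own validation rules are not documented here.

```python
import re

# arn:<partition>:iam::<12-digit account id>:role/<optional path/><role name>
ROLE_ARN_RE = re.compile(
    r"^arn:(?P<partition>aws|aws-cn|aws-us-gov):iam::(?P<account_id>\d{12})"
    r":role/(?P<resource>[\w+=,.@/-]+)$",
    re.ASCII,
)


def parse_role_arn(arn: str) -> dict:
    """Split an IAM role ARN into its parts, or raise ValueError if malformed."""
    match = ROLE_ARN_RE.match(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    # The role name is the last segment; anything before it is an optional path.
    role_name = match["resource"].rsplit("/", 1)[-1]
    if not 1 <= len(role_name) <= 64:
        raise ValueError(f"role name must be 1 to 64 characters: {role_name!r}")
    return {
        "partition": match["partition"],
        "account_id": match["account_id"],
        "role_name": role_name,
    }


# Hypothetical account ID and role name, for illustration only.
parsed = parse_role_arn("arn:aws:iam::123456789012:role/aichor-engine-role")
print(parsed["account_id"], parsed["role_name"])
```

The placeholder string from the table itself (with a literal `account-id`) is rejected by this check, which helps catch a template that was pasted without being filled in.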

MS Azure

If the cloud provider selected is Azure, the administrator can create an AKS engine through AIchor.

Note: All the regions available on Azure are supported by AIchor.

If the administrator would like AIchor to create an AKS cluster, they select "Create engine" and provide the information requested by the form. The fields are described in the form fields table at the end of this page.

The deployment process takes up to 15 minutes. Once it has finished, the engine should be displayed as ready on the engine management page.

Form fields

| Field name | Description | Example value |
| --- | --- | --- |
| Engine name | Open text for the administrator | |
| Environment | Tag to be passed to IaaC - Required for specific organisations upon InstaDeep recommendation - Can be ignored | |
| Region | Drop-down list (region for the cloud provider) | London |
| Project ID | Project ID (GCP) | |
| Service account | Service account created by the GCP administrator to access the cloud provider account | |
| CIDR Range | Value will be passed for VPC creation | |
| GKE/EKS Cluster name | GKE cluster name on GCP (EKS on AWS) | |
| API Hostname | kube-api endpoint - found in the ~/.kube/config | e2b53b98-975d-4be7-9f5b-8dfd6b765d32 |
| External IP | Used for GCP for webhook setup | |
| Storage class | Used for importing a k8s cluster | |
| Service token | To be deprecated - No static token for cloud providers | |
| Certificates: Server name | TBC - can be removed | |
| Certificate file | TBC - can be removed | |
| Certificate key | TBC - can be removed | |
| CA File | Certificate in the ~/.kube/config | |
| Assume Role ARN | Role ARN created on provider side (AWS) with specific permissions (*) | arn:aws:iam::account-id:role/role-name |
| Base host | Specific for organisations upon InstaDeep recommendation - Can be ignored | |
| Parallel Cluster Name | Name of the pCluster on AWS | |
| Head Node IP | Head node IP address (API Hostname equivalent for ParallelCluster) | |
| Slurm version | Needed for the Slurm API - used when a ParallelCluster is imported | v0.0.39 |
| EFS Mount dir | Mounted directory used by ParallelCluster | /mnt/shared/ |
| Slurm user | User used for Slurm tasks | slurm |
| Azure subscription | Target MS Azure subscription (TBC upon WIM tests) | |
| Azure Service Principal | Equivalent of service account in GCP (TBC upon WIM tests) | |
| Azure Resource Group Name | Equivalent of Projects on GCP (TBC upon WIM tests) | |
| Load balancer DNS | NLB used for AWS EKS import | |

(*) Permissions need to be listed.
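The CIDR Range field is passed on for VPC creation, so a malformed value would only surface during deployment. A minimal sketch of a local pre-check with Python's standard `ipaddress` module is given below. The helper name `describe_cidr` and the example range are assumptions for illustration. Any size or range limits imposed by AIchor or the cloud provider are not covered.

```python
import ipaddress


def describe_cidr(cidr: str) -> dict:
    """Parse an IPv4 CIDR block and report basic facts about it.

    Raises ValueError if the string is not a valid network, for example
    when host bits are set (10.0.0.1/16 instead of 10.0.0.0/16).
    """
    network = ipaddress.ip_network(cidr, strict=True)
    if network.version != 4:
        raise ValueError(f"expected an IPv4 range, got: {cidr!r}")
    return {
        "network": str(network),
        "prefix_length": network.prefixlen,
        "num_addresses": network.num_addresses,
        # True when the whole block lies in a private range such as 10.0.0.0/8.
        "is_private": network.is_private,
    }


# Example range chosen for illustration only.
summary = describe_cidr("10.0.0.0/16")
print(summary)
```

Checking the value locally like this catches typos early. Whether a given range is acceptable for the target provider still depends on that provider's own VPC rules.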