Technical requirements

This section describes the accounts, permissions, network settings, and parameters required to deploy AIchor on cloud engines. It is intended for AIchor administrators: meeting these requirements is essential for system reliability, scalability, and expected behaviour. Requirements vary by cloud provider and by whether the administrator creates a new engine or imports an existing one.

Sensitive data storage

Sensitive data used for training is stored in designated storage buckets to ensure data privacy and compliance with security standards. These buckets are configured with strict access controls and server-side encryption to protect the data from unauthorized access.

Sensitive environment variables, which hold the configuration details users specify to run AIchor experiments, are stored in a dedicated database. This database is protected by encryption and role-based access control so that only authorized processes or users can retrieve or modify these variables.

AWS

Running workloads on a target AWS environment requires an existing AWS account.

Some resources created by AIchor or by the client administrator are public, although accessing them requires restricted permissions:

  • S3 buckets: created by AIchor and accessible only to the users and experiments that have access to the corresponding AIchor projects
  • ECR: one registry created by AIchor for each project
  • Public VPC: created either by AIchor (create EKS case) or by the administrator (import EKS case)

The resources below need to be created either by AIchor or the administrator on the client account if the engine is deployed on the client AWS account:

| Resource | Creation | Import |
| --- | --- | --- |
| EKS (engine) | Created by AIchor | Created by administrator |
| VPC | Created by AIchor | Created by administrator |
| Subnet | Created by AIchor | Created by administrator |
| Internet Gateway | Created by AIchor | Created by administrator |
| Route | Created by AIchor | Created by administrator |
| Route table | Created by AIchor | Created by administrator |
| Node group | Created by AIchor | Created by administrator |
| IAM/OIDC/Instance profile (*) | Created by AIchor | Created by administrator |
| Queue/SQS | Created by AIchor | Created by administrator |
| Karpenter | Created by AIchor | Created by administrator |
| EFS | Created by AIchor | Created by administrator |
| ParallelCluster (engine) | Created by AIchor | Created by administrator |
| State machine | Created by AIchor | Created by administrator |
| Lambda functions | Created by AIchor | Created by administrator |
| Step functions | Created by AIchor | Created by administrator |

(*) For AIchor to import EKS engines, the permissions listed in Annex 1 are required.

Import EKS engine

To import an existing EKS cluster, the following conditions must be met:

  • An IAM role (provided to AIchor by its ARN) with policies allowing the actions listed in Annex 1. These policies allow AIchor to perform all expected tasks on the target AWS account, such as:
    • Create and manage the required IAM roles for AIchor
    • Create and manage storage (S3) buckets on the target account
    • Create and manage docker registry (ECR) on the target account
    • Manage EKS clusters on the target account

Note: These policies are being optimized, and a more tailored list of permissions will be published.

  • Whitelist the following hostnames on the target engine to allow traffic between AIchor and the target engine:
    • instadeep-infra.eu.auth0.com (authentication)
    • ichorai.eu.auth0.com (authentication)
    • *.aichor.ai (AIchor)
  • Provide the StorageClass name; this is an input on the form used to import an EKS cluster.
  • Provide the public IP address of the NLB used to access the EKS engine.
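
The hostname requirement above can be checked from inside the target engine with a small script. This is a minimal sketch, assuming `curl` is available where it runs and that the hosts are reached over HTTPS; the function names `probe_host` and `check_hosts` are illustrative, and the wildcard entry `*.aichor.ai` has to be replaced by the concrete AIchor hostname you use before it can be probed.

```shell
# Hosts taken from the list above. The wildcard *.aichor.ai cannot be probed
# directly; add the concrete AIchor hostname you use (assumption).
HOSTS="instadeep-infra.eu.auth0.com ichorai.eu.auth0.com"

# Succeeds when an HTTPS request to the host completes within 5 seconds,
# whatever HTTP status code is returned.
probe_host() {
  curl --silent --output /dev/null --max-time 5 "https://$1"
}

# Print one line per host: "reachable: <host>" or "blocked: <host>".
check_hosts() {
  for host in ${HOSTS}; do
    if probe_host "${host}"; then
      echo "reachable: ${host}"
    else
      echo "blocked: ${host}"
    fi
  done
}
```

Calling `check_hosts` from a node or pod of the engine prints the status of each host; a `blocked` line suggests that the corresponding hostname is not yet whitelisted.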

Create EKS engine

To create a new EKS cluster, the following condition must be met:

  • An IAM role (provided to AIchor by its ARN) with the policies specified in Annex 1. These policies allow AIchor to perform all expected tasks on the target AWS account, such as:
    • Create and manage the required IAM roles for AIchor
    • Create and manage storage (S3) buckets on the target account
    • Create and manage docker registry (ECR) on the target account
    • Create and manage EKS clusters on the target account

In the creation scenario, AIchor assumes this role on the target account and creates the required resources in the specified region (an input parameter on the engine creation form).
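
One way to prepare a role that AIchor can assume is sketched below. This is an illustration under stated assumptions, not AIchor's documented onboarding procedure: the role name `aichor-engine-role`, the file name `trust-policy.json`, and the `<AICHOR_AWS_PRINCIPAL_ARN>` placeholder (the AWS principal AIchor uses to assume the role) are hypothetical, and the real principal has to be obtained from AIchor.

```shell
# Hypothetical names and placeholders; none of them are AIchor defaults.
ROLE_NAME="aichor-engine-role"
AICHOR_PRINCIPAL_ARN="<AICHOR_AWS_PRINCIPAL_ARN>"

# Trust policy: allows the AIchor principal to assume the role through STS.
cat > trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "${AICHOR_PRINCIPAL_ARN}" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Creates the role on the target account and prints its ARN.
# Requires the AWS CLI and credentials for the target account.
create_aichor_role() {
  aws iam create-role \
    --role-name "${ROLE_NAME}" \
    --assume-role-policy-document file://trust-policy.json \
    --query 'Role.Arn' --output text
}
```

Running `create_aichor_role` prints the ARN of the new role. The permissions from Annex 1 still have to be attached to the role separately.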

Annexes

Annex 1

"iam:CreateRole",
"iam:CreatePolicy",
"iam:CreatePolicyVersion",
"iam:AttachRolePolicy",
"iam:PutRolePolicy",
"iam:UpdateAssumeRolePolicy",
"iam:UpdateRole",
"iam:GetRole",
"iam:GetPolicy",
"iam:GetPolicyVersion",
"iam:ListAttachedRolePolicies",
"iam:ListRolePolicies",
"iam:ListInstanceProfilesForRole",
"iam:ListPolicyVersions",
"iam:GetRolePolicy",
"iam:TagRole",
"iam:ListRoleTags",
"iam:UntagRole",
"iam:TagPolicy",
"iam:UntagPolicy",
"iam:DeleteRole",
"iam:DeletePolicy",
"iam:DeletePolicyVersion",
"iam:DetachRolePolicy",
"iam:DeleteRolePolicy"

"s3:DeleteAccessPoint",
"s3:DeleteAccessPointForObjectLambda",
"s3:GetStorageLensGroup",
"s3:PutLifecycleConfiguration",
"s3:PutObjectTagging",
"s3:DeleteObject",
"s3:PutAccessPointPolicyForObjectLambda",
"s3:GetBucketWebsite",
"s3:DeleteStorageLensConfigurationTagging",
"s3:GetObjectAttributes",
"s3:DeleteObjectVersionTagging",
"s3:InitiateReplication",
"s3:GetObjectLegalHold",
"s3:GetBucketNotification",
"s3:DeleteBucketPolicy",
"s3:GetReplicationConfiguration",
"s3:DescribeMultiRegionAccessPointOperation",
"s3:PutObject",
"s3:PutBucketNotification",
"s3:PutObjectVersionAcl",
"s3:PutBucketObjectLockConfiguration",
"s3:PutAccessPointPolicy",
"s3:GetStorageLensDashboard",
"s3:GetLifecycleConfiguration",
"s3:UntagResource",
"s3:GetBucketTagging",
"s3:GetInventoryConfiguration",
"s3:GetAccessPointPolicyForObjectLambda",
"s3:ReplicateTags",
"s3:ListBucket",
"s3:AbortMultipartUpload",
"s3:PutBucketTagging",
"s3:DeleteBucket",
"s3:PutBucketVersioning",
"s3:ListBucketMultipartUploads",
"s3:PutIntelligentTieringConfiguration",
"s3:PutMetricsConfiguration",
"s3:PutStorageLensConfigurationTagging",
"s3:PutObjectVersionTagging",
"s3:GetBucketVersioning",
"s3:GetAccessPointConfigurationForObjectLambda",
"s3:PutInventoryConfiguration",
"s3:ObjectOwnerOverrideToBucketOwner",
"s3:GetStorageLensConfiguration",
"s3:DeleteStorageLensConfiguration",
"s3:PutBucketWebsite",
"s3:PutBucketRequestPayment",
"s3:PutObjectRetention",
"s3:CreateAccessPointForObjectLambda",
"s3:GetBucketCORS",
"s3:DeleteAccessPointPolicy",
"s3:GetObjectVersion",
"s3:PutAnalyticsConfiguration",
"s3:PutAccessPointConfigurationForObjectLambda",
"s3:GetObjectVersionTagging",
"s3:CreateBucket",
"s3:GetStorageLensConfigurationTagging",
"s3:ReplicateObject",
"s3:GetObjectAcl",
"s3:GetBucketObjectLockConfiguration",
"s3:DeleteBucketWebsite",
"s3:GetIntelligentTieringConfiguration",
"s3:DeleteAccessPointPolicyForObjectLambda",
"s3:GetObjectVersionAcl",
"s3:PutBucketAcl",
"s3:DeleteObjectTagging",
"s3:GetBucketPolicyStatus",
"s3:GetObjectRetention",
"s3:TagResource",
"s3:PutObjectLegalHold",
"s3:PutBucketCORS",
"s3:ListMultipartUploadParts",
"s3:GetObject",
"s3:PutBucketLogging",
"s3:GetAnalyticsConfiguration",
"s3:GetObjectVersionForReplication",
"s3:GetAccessPointForObjectLambda",
"s3:CreateAccessPoint",
"s3:PutAccelerateConfiguration",
"s3:DeleteObjectVersion",
"s3:GetBucketLogging",
"s3:ListBucketVersions",
"s3:RestoreObject",
"s3:GetAccelerateConfiguration",
"s3:GetObjectVersionAttributes",
"s3:GetBucketPolicy",
"s3:ListTagsForResource",
"s3:PutEncryptionConfiguration",
"s3:GetEncryptionConfiguration",
"s3:GetObjectVersionTorrent",
"s3:GetBucketRequestPayment",
"s3:GetAccessPointPolicyStatus",
"s3:DeleteStorageLensGroup",
"s3:GetObjectTagging",
"s3:GetBucketOwnershipControls",
"s3:GetMetricsConfiguration",
"s3:PutObjectAcl",
"s3:GetBucketPublicAccessBlock",
"s3:PutBucketPublicAccessBlock",
"s3:GetAccessPointPolicyStatusForObjectLambda",
"s3:UpdateStorageLensGroup",
"s3:PutBucketOwnershipControls",
"s3:GetBucketAcl",
"s3:BypassGovernanceRetention",
"s3:GetObjectTorrent",
"s3:PutBucketPolicy",
"s3:GetBucketLocation",
"s3:GetAccessPointPolicy",
"s3:ReplicateDelete"

"s3:ListStorageLensConfigurations",
"s3:ListAccessPointsForObjectLambda",
"s3:GetAccessPoint",
"s3:PutAccountPublicAccessBlock",
"s3:GetAccountPublicAccessBlock",
"s3:ListAllMyBuckets",
"s3:ListAccessPoints",
"s3:PutAccessPointPublicAccessBlock",
"s3:CreateStorageLensGroup",
"s3:PutStorageLensConfiguration",
"s3:ListMultiRegionAccessPoints",
"s3:ListStorageLensGroups"

"ecr:CreateRepository",
"ecr:ListTagsForResource",
"ecr:TagResource",
"ecr:UntagResource",
"ecr:SetRepositoryPolicy",
"ecr:PutLifecyclePolicy",
"ecr:PutImageTagMutability",
"ecr:GetRepositoryPolicy",
"ecr:GetLifecyclePolicy",
"ecr:DescribeRepositories",
"ecr:DeleteRepositoryPolicy",
"ecr:DeleteRepository",
"ecr:DeleteLifecyclePolicy"

"eks:DescribeCluster"

"ssm:GetParameter",
"ssm:GetParameters",
"ssm:ListTagsForResource"
"ssm:DescribeParameters"

"kms:Decrypt"

GCP

Create GCP engine

To create a Kubernetes cluster on GCP, AIchor needs a dedicated GCP service account with specific permissions.

First, create the dedicated service account for AIchor:

gcloud iam service-accounts create aichor-sa-create \
--description="Service account for AIchor cluster creation" \
--display-name="aichor-sa-create"

The permissions granted to this service account allow AIchor to perform all expected tasks on the target GCP account, such as:

  • Create and manage the required IAM roles for AIchor
  • Create and manage storage buckets on the target account
  • Create and manage docker registry on the target account
  • Create and Manage Kubernetes clusters on the target account

This service account must have the following roles:

  • Kubernetes Engine Admin (allows AIchor to create Kubernetes resources on the data plane)
  • Service Account Token Creator (impersonate service accounts, create OAuth2 access tokens, sign blobs or JWTs, etc.)
  • Service Account User (run operations as the service account)

Grant these roles with:

gcloud projects add-iam-policy-binding <PROJECT_ID> \
--member="serviceAccount:aichor-sa-create@<PROJECT_ID>.iam.gserviceaccount.com" \
--role="roles/container.admin"

gcloud projects add-iam-policy-binding <PROJECT_ID> \
--member="serviceAccount:aichor-sa-create@<PROJECT_ID>.iam.gserviceaccount.com" \
--role="roles/iam.serviceAccountTokenCreator"

gcloud projects add-iam-policy-binding <PROJECT_ID> \
--member="serviceAccount:aichor-sa-create@<PROJECT_ID>.iam.gserviceaccount.com" \
--role="roles/iam.serviceAccountUser"
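
After granting the roles, it can be useful to confirm that the bindings are in place. The following is a minimal sketch, assuming the `gcloud` CLI is authenticated against the target project; the function names `list_sa_roles` and `check_sa_roles` are illustrative.

```shell
# Placeholder value; replace <PROJECT_ID> with the target project.
PROJECT_ID="<PROJECT_ID>"
SA_EMAIL="aichor-sa-create@${PROJECT_ID}.iam.gserviceaccount.com"
EXPECTED_ROLES="roles/container.admin roles/iam.serviceAccountTokenCreator roles/iam.serviceAccountUser"

# Print every project-level role bound to the service account, one per line.
list_sa_roles() {
  gcloud projects get-iam-policy "${PROJECT_ID}" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:${SA_EMAIL}" \
    --format="value(bindings.role)"
}

# Compare the granted roles with the three roles listed above.
check_sa_roles() {
  granted="$(list_sa_roles)"
  for role in ${EXPECTED_ROLES}; do
    if printf '%s\n' "${granted}" | grep -qxF "${role}"; then
      echo "ok: ${role}"
    else
      echo "missing: ${role}"
    fi
  done
}
```

Calling `check_sa_roles` prints one `ok:` or `missing:` line per expected role.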

AIchor also requires a set of specific permissions. For better security, create a custom role that grants only the minimum required permissions. First, prepare a JSON file (aichor_cluster_creator.json) with the following content:

{
"title": "AichorClusterCreator",
"description": "Custom role for AIchor cluster creation with required permissions",
"stage": "GA",
"includedPermissions": [
"artifactregistry.repositories.create",
"artifactregistry.repositories.delete",
"artifactregistry.repositories.get",
"artifactregistry.repositories.getIamPolicy",
"artifactregistry.repositories.setIamPolicy",
"compute.addresses.create",
"compute.addresses.delete",
"compute.addresses.get",
"compute.instanceGroupManagers.get",
"compute.networks.create",
"compute.networks.delete",
"compute.networks.get",
"compute.networks.update",
"compute.networks.updatePolicy",
"compute.routers.create",
"compute.routers.delete",
"compute.routers.get",
"compute.routers.update",
"compute.subnetworks.create",
"compute.subnetworks.delete",
"compute.subnetworks.get",
"iam.serviceAccountKeys.create",
"iam.serviceAccountKeys.get",
"iam.serviceAccounts.create",
"iam.serviceAccounts.delete",
"iam.serviceAccounts.get",
"iam.serviceAccounts.getIamPolicy",
"iam.serviceAccounts.setIamPolicy",
"resourcemanager.projects.getIamPolicy",
"resourcemanager.projects.setIamPolicy",
"storage.objects.create",
"storage.objects.list",
"storage.objects.delete",
"storage.buckets.create",
"storage.buckets.delete",
"storage.buckets.get",
"storage.buckets.getIamPolicy",
"storage.buckets.setIamPolicy",
"storage.buckets.update",
"storage.hmacKeys.create",
"storage.hmacKeys.delete",
"storage.hmacKeys.get",
"storage.hmacKeys.update",
"storage.anywhereCaches.get",
"storage.anywhereCaches.list",
"compute.addresses.setLabels"
]}

Then create the custom role from your JSON file:

gcloud iam roles create aichorClusterCreator \
--project=<PROJECT_ID> \
--file=aichor_cluster_creator.json

Then bind the custom role to your service account:

gcloud projects add-iam-policy-binding <PROJECT_ID> \
--member="serviceAccount:aichor-sa-create@<PROJECT_ID>.iam.gserviceaccount.com" \
--role="projects/<PROJECT_ID>/roles/aichorClusterCreator"

Finally, grant the AIchor service account permission to impersonate your GCP service account. The <AICHOR_FEDERATION_SERVICE_ACCOUNT> and <AICHOR_PROD_PROJECT_ID> values are provided upon request.

gcloud iam service-accounts add-iam-policy-binding \
aichor-sa-create@<PROJECT_ID>.iam.gserviceaccount.com \
--member='serviceAccount:<AICHOR_FEDERATION_SERVICE_ACCOUNT>@<AICHOR_PROD_PROJECT_ID>.iam.gserviceaccount.com' \
--role='roles/iam.serviceAccountTokenCreator'
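
To confirm that the impersonation binding was recorded, the IAM policy of the service account can be inspected. This is a sketch under assumptions: the function names are illustrative, the placeholders must be replaced with the values provided by AIchor, and the check only looks for the AIchor member in the policy output without verifying which role it is bound to.

```shell
# Placeholders; replace with your project and the values provided by AIchor.
SA_EMAIL="aichor-sa-create@<PROJECT_ID>.iam.gserviceaccount.com"
AICHOR_MEMBER="serviceAccount:<AICHOR_FEDERATION_SERVICE_ACCOUNT>@<AICHOR_PROD_PROJECT_ID>.iam.gserviceaccount.com"

# Print the IAM policy attached to the service account itself, as JSON.
get_sa_policy() {
  gcloud iam service-accounts get-iam-policy "${SA_EMAIL}" --format=json
}

# Succeeds when the AIchor member appears anywhere in that policy.
has_impersonation_binding() {
  get_sa_policy | grep -qF "${AICHOR_MEMBER}"
}
```

For example, `has_impersonation_binding && echo "binding found"` reports whether the member is present; review the JSON printed by `get_sa_policy` to check the associated role.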

Import GCP engine

To import an existing GCP cluster, AIchor needs a dedicated GCP service account with specific permissions.

First, create the dedicated service account for AIchor:

gcloud iam service-accounts create aichor-sa-import \
--description="Service account for AIchor cluster import" \
--display-name="aichor-sa-import"

The permissions granted to this service account allow AIchor to perform all expected tasks on the target GCP account, such as:

  • Create and manage the required IAM roles for AIchor
  • Create and manage storage buckets on the target account
  • Create and manage docker registry on the target account
  • Manage Kubernetes clusters on the target account

This service account must have the following roles:

  • Kubernetes Engine Admin (allows AIchor to create Kubernetes resources on the data plane)
  • Service Account Token Creator (impersonate service accounts, create OAuth2 access tokens, sign blobs or JWTs, etc.)

Grant these roles with:

gcloud projects add-iam-policy-binding <PROJECT_ID> \
--member="serviceAccount:aichor-sa-import@<PROJECT_ID>.iam.gserviceaccount.com" \
--role="roles/container.admin"

gcloud projects add-iam-policy-binding <PROJECT_ID> \
--member="serviceAccount:aichor-sa-import@<PROJECT_ID>.iam.gserviceaccount.com" \
--role="roles/iam.serviceAccountTokenCreator"

AIchor also requires a set of specific permissions. For better security, create a custom role that grants only the minimum required permissions. First, prepare a JSON file (aichor_cluster_importer.json) with the following content:

{
"title": "AichorClusterImporter",
"description": "Custom role for AIchor cluster import with required permissions",
"stage": "GA",
"includedPermissions": [
"artifactregistry.repositories.create",
"artifactregistry.repositories.delete",
"artifactregistry.repositories.get",
"artifactregistry.repositories.getIamPolicy",
"artifactregistry.repositories.setIamPolicy",
"iam.serviceAccountKeys.create",
"iam.serviceAccountKeys.delete",
"iam.serviceAccountKeys.get",
"iam.serviceAccounts.create",
"iam.serviceAccounts.delete",
"iam.serviceAccounts.get",
"iam.serviceAccounts.getAccessToken",
"iam.serviceAccounts.getIamPolicy",
"iam.serviceAccounts.setIamPolicy",
"resourcemanager.projects.getIamPolicy",
"resourcemanager.projects.setIamPolicy",
"storage.objects.create",
"storage.objects.list",
"storage.objects.delete",
"storage.buckets.create",
"storage.buckets.delete",
"storage.buckets.get",
"storage.buckets.getIamPolicy",
"storage.buckets.setIamPolicy",
"storage.buckets.update",
"storage.hmacKeys.create",
"storage.hmacKeys.delete",
"storage.hmacKeys.get",
"storage.hmacKeys.update",
"storage.anywhereCaches.get",
"storage.anywhereCaches.list",
"compute.addresses.setLabels"
]}

Then create the custom role from your JSON file:

gcloud iam roles create aichorClusterImporter \
--project=<PROJECT_ID> \
--file=aichor_cluster_importer.json

Then bind the custom role to your service account:

gcloud projects add-iam-policy-binding <PROJECT_ID> \
--member="serviceAccount:aichor-sa-import@<PROJECT_ID>.iam.gserviceaccount.com" \
--role="projects/<PROJECT_ID>/roles/aichorClusterImporter"

Finally, grant the AIchor service account permission to impersonate your GCP service account. The <AICHOR_FEDERATION_SERVICE_ACCOUNT> and <AICHOR_PROD_PROJECT_ID> values are provided upon request.

gcloud iam service-accounts add-iam-policy-binding \
aichor-sa-import@<PROJECT_ID>.iam.gserviceaccount.com \
--member='serviceAccount:<AICHOR_FEDERATION_SERVICE_ACCOUNT>@<AICHOR_PROD_PROJECT_ID>.iam.gserviceaccount.com' \
--role='roles/iam.serviceAccountTokenCreator'
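
Before importing, an administrator may want to check that the import service account can actually see the existing cluster. The sketch below lists clusters while impersonating that account. It assumes the caller is itself allowed to impersonate the service account (for example through the Service Account Token Creator role on it), and the function name is illustrative.

```shell
# Placeholder value; replace <PROJECT_ID> with the target project.
PROJECT_ID="<PROJECT_ID>"
SA_EMAIL="aichor-sa-import@${PROJECT_ID}.iam.gserviceaccount.com"

# List the clusters visible to the import service account (name and location).
list_clusters_as_aichor_sa() {
  gcloud container clusters list \
    --project="${PROJECT_ID}" \
    --impersonate-service-account="${SA_EMAIL}" \
    --format="value(name,location)"
}
```

If the cluster to import does not appear in the output of `list_clusters_as_aichor_sa`, the role bindings above are worth reviewing.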

Azure

Create AKS engine

To create a Kubernetes cluster on Azure, the following condition must be met:

An Azure app registration with specific permissions. These permissions allow AIchor to perform all expected tasks on the target Azure subscription, such as:

  • Create and manage the required IAM roles for AIchor
  • Create and manage storage (Blobs) on the target subscription
  • Create and manage docker registry on the target subscription
  • Create and manage Kubernetes clusters on the target subscription

AIchor also needs the permissions listed below. You can create a custom role if you do not want to grant administrator access:

- actions:
    - 'Microsoft.ContainerService/managedClusters/*'
    - 'Microsoft.ManagedIdentity/userAssignedIdentities/*'
    - 'Microsoft.Storage/storageAccounts/*'
    - 'Microsoft.ContainerRegistry/registries/*'
    - 'Microsoft.Resources/*/read'
    - 'Microsoft.Authorization/roleAssignments/read'
    - 'Microsoft.Authorization/roleAssignments/write'
    - 'Microsoft.Authorization/roleDefinitions/read'
    - 'Microsoft.Resources/subscriptions/read'
- dataActions:
    - 'Microsoft.ContainerService/managedClusters/*/read'
    - 'Microsoft.ContainerService/managedClusters/*/write'
    - 'Microsoft.ContainerService/managedClusters/*/delete'
    - 'Microsoft.ContainerService/managedClusters/*/action'
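
The permissions above could be turned into a custom role with the Azure CLI along the following lines. This is a sketch under assumptions: the role name `AIchor Engine Manager`, the app name `aichor-app`, the file name `aichor-role.json`, and the subscription-wide scope are hypothetical and not prescribed by AIchor.

```shell
# Placeholder and hypothetical names.
SUBSCRIPTION_ID="<SUBSCRIPTION_ID>"
ROLE_NAME="AIchor Engine Manager"
APP_NAME="aichor-app"

# Role definition carrying the actions and data actions listed above.
cat > aichor-role.json <<EOF
{
  "Name": "${ROLE_NAME}",
  "Description": "Custom role for AIchor (illustrative)",
  "Actions": [
    "Microsoft.ContainerService/managedClusters/*",
    "Microsoft.ManagedIdentity/userAssignedIdentities/*",
    "Microsoft.Storage/storageAccounts/*",
    "Microsoft.ContainerRegistry/registries/*",
    "Microsoft.Resources/*/read",
    "Microsoft.Authorization/roleAssignments/read",
    "Microsoft.Authorization/roleAssignments/write",
    "Microsoft.Authorization/roleDefinitions/read",
    "Microsoft.Resources/subscriptions/read"
  ],
  "DataActions": [
    "Microsoft.ContainerService/managedClusters/*/read",
    "Microsoft.ContainerService/managedClusters/*/write",
    "Microsoft.ContainerService/managedClusters/*/delete",
    "Microsoft.ContainerService/managedClusters/*/action"
  ],
  "AssignableScopes": ["/subscriptions/${SUBSCRIPTION_ID}"]
}
EOF

# Registers the custom role in the subscription (requires a logged-in Azure CLI).
create_aichor_azure_role() {
  az role definition create --role-definition @aichor-role.json
}

# Creates an app registration with a service principal and assigns it the role.
create_aichor_app() {
  az ad sp create-for-rbac \
    --name "${APP_NAME}" \
    --role "${ROLE_NAME}" \
    --scopes "/subscriptions/${SUBSCRIPTION_ID}"
}
```

Running `create_aichor_azure_role` followed by `create_aichor_app` prints the credentials of the new app registration, which should be handled as a secret.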