
Developer User Interface

The administrator invites the developer user to the AIchor Platform. After receiving the invitation by email, the developer sets up credentials (the username is the email address, plus a password) that grant access to the AIchor platform user interface. The developer only sees the projects the administrator has granted access to, and the experiments created from commits to the corresponding GitLab/GitHub repository.

Email invite

The user receives an email invitation to complete account creation on the AIchor Platform.

Password Set-up

After accepting the email invitation, the developer is redirected to the login page to create a password for the AIchor account and then grant access to the profile and email. Note that the username is the developer's email address.

Pushing Experiments from GitHub/GitLab

The developer will push experiments from a VCS provider. The workflow to push an experiment from GitHub/GitLab to AIchor should include the following steps:

  • Edit the init script (main.py) and, when using Ray, initialize it with the following snippet:

    import os

    import ray

    if "REDIS_PASSWORD" in os.environ:
        # REDIS_PASSWORD is set: connect to an existing Ray cluster.
        ray.init(
            address=os.environ.get("RAY_SERVER", "auto"),
            _redis_password=os.environ.get("REDIS_PASSWORD", ""),
        )
    else:
        # According to the Ray docs, local_mode=True forces serial execution, which is
        # meant for debugging. Unfortunately, it also accepts requests for resources such
        # as GPUs and then silently ignores them, without any error or warning.
        ray.init()

  • Create a Dockerfile;
  • The “Manifest.yaml” file should include the following information:
    • operator: the name of the operator;
      • docker
        • image: name of the Docker image;
        • dockerfile: location of the Dockerfile;
        • context
      • spec:
        • command: script or command to be executed;
        • rayVersion
        • Job:
          • cpus
          • gpus
          • ramRatio
          • gpuType
          • ports, if any
        • Head:
          • cpus
          • gpus
          • ramRatio
          • gpuType
        • Workers pool:
          • name: the pool's name
          • count: number of replicas
          • cpus
          • ramRatio
      • command: name of the init Python file;
      • head/job/workerPools: specifications for the Ray cluster.
  • After a git commit is made and pushed from the terminal, an event is triggered that sends the experiment to the AIchor platform.
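The Ray initialization step above can also be written so that the choice between cluster and local mode is testable without a running cluster. The following is only a sketch: the helper name ray_init_kwargs is hypothetical and not part of AIchor or Ray. It reuses the same two environment variables as the snippet above and returns the keyword arguments that would be passed to ray.init().

```python
import os
from typing import Mapping, Optional


def ray_init_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return keyword arguments for ray.init(), chosen from the environment.

    Hypothetical helper: it mirrors the branch in the snippet above and does
    not import Ray itself, so it can be unit-tested anywhere.
    """
    env = os.environ if env is None else env
    if "REDIS_PASSWORD" in env:
        # Cluster case: connect to an existing Ray cluster.
        return {
            "address": env.get("RAY_SERVER", "auto"),
            "_redis_password": env.get("REDIS_PASSWORD", ""),
        }
    # Local case: no arguments, Ray starts a local instance with its defaults.
    return {}


if __name__ == "__main__":
    # In main.py this would be used as:
    #     import ray
    #     ray.init(**ray_init_kwargs())
    print(ray_init_kwargs({"REDIS_PASSWORD": "secret", "RAY_SERVER": "10.0.0.1:6379"}))
    print(ray_init_kwargs({}))
```

Keeping the decision in a plain function means the same main.py can be run on a laptop and on the platform without edits.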

Manifest example:


kind: AIchorManifest
apiVersion: 0.2.1

builder:
  image: melqart
  dockerfile: ./Dockerfile
  context: .

spec:
  operator: ray
  image: melqart
  command: "python train.py"
  rayVersion: "v2.2"

  tensorboard: # optional, disabled by default
    enabled: true

  storage: # optional
    sharedVolume: # optional
      mountPoint: "/mnt/shared"
      sizeGB: 16
    attachExistingPVCs: # optional, array
      - name: "my-awesome-pvc"
        # mountPoint: "/mnt/my-60tib-dataset"

  # Ray types are: Head, Job, Workers
  # They are all required
  # At least one worker must be set
  types:
    Head:
      ports: [] # optional
      resources:
        cpus: 10
        ramRatio: 2
        # machineName: "dgx" # optional
        shmSizeGB: 48 # optional
        accelerators: # optional
          gpu:
            count: 2
            product: Tesla-V100-SXM3-32GB
            type: gpu

    Job:
      ports: [] # optional
      resources:
        cpus: 10
        ramRatio: 2
        # machineName: "node007" # optional
        shmSizeGB: 0 # optional

    Workers:
      - name: "cpu-workers"
        count: 2
        ports: [] # optional
        resources:
          cpus: 1
          ramRatio: 2
          # machineName: "node007" # optional
          shmSizeGB: 0 # optional
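
The comments in the manifest state that Head, Job and Workers are all required and that at least one worker must be set. A minimal sketch of checking these two rules before pushing is shown below. It assumes the manifest has already been parsed into a Python dictionary (for example with a YAML library) and that "at least one worker" means at least one entry in the Workers list. The function name check_ray_types is hypothetical and not part of AIchor, and the platform's own validation may differ.

```python
from typing import Any, Dict, List

# Type names taken from the comments in the manifest example.
REQUIRED_RAY_TYPES = ("Head", "Job", "Workers")


def check_ray_types(manifest: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means both rules hold."""
    problems: List[str] = []
    spec = manifest.get("spec") or {}
    types = spec.get("types") or {}

    # Rule 1: Head, Job and Workers must all be present.
    for name in REQUIRED_RAY_TYPES:
        if name not in types:
            problems.append(f"missing required type: {name}")

    # Rule 2 (assumed reading): Workers must contain at least one pool.
    if "Workers" in types and not types["Workers"]:
        problems.append("at least one worker pool must be set")

    return problems


if __name__ == "__main__":
    manifest = {
        "kind": "AIchorManifest",
        "spec": {
            "operator": "ray",
            "types": {
                "Head": {"resources": {"cpus": 10, "ramRatio": 2}},
                "Job": {"resources": {"cpus": 10, "ramRatio": 2}},
                "Workers": [
                    {"name": "cpu-workers", "count": 2,
                     "resources": {"cpus": 1, "ramRatio": 2}},
                ],
            },
        },
    }
    print(check_ray_types(manifest))
```

Returning a list of messages rather than raising on the first problem lets all issues be reported in one pass.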

Experiments Dashboard

Upon successfully logging into the platform, the developer is presented with a drop-down menu listing all accessible projects. The menu is used to select a project or to switch between projects.

After a project is selected from the list, all of its experiments are displayed, with the following information for each:

  • User avatar;

  • Id of the experiment;

  • Name of the branch;

  • Commit id;

  • A visual representation made up of five widgets indicating the status and degree of progress of the experiment:

    • Green tick: Success;
    • Red cross: Error;
    • Yellow pause: Paused;
    • Blue moon: In progress.
  • Resources allocated to the experiment in terms of number of CPUs, GPUs and memory size;

  • Status of the experiment:

    • Processing;
    • Cancelled;
    • Succeeded;
    • Failed.
  • Last update;


The most recent feature added to the platform is “Object Storage”. Users can add new buckets and download data from buckets, whether for input datasets or experiment results. By default, an input bucket and an output bucket are created for each experiment. When an experiment is triggered, environment variables are injected into it, including AICHOR_OUTPUT_DATASET and AICHOR_INPUT_DATASET. The values of these variables contain the paths where the buckets are mounted.
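
As an illustration, experiment code could read the two variables as sketched below. This assumes the values are ordinary filesystem paths, which is how the paragraph above describes them. The helper names bucket_paths and write_result are hypothetical and not part of AIchor.

```python
import os
from pathlib import Path
from typing import Mapping, Tuple

INPUT_VAR = "AICHOR_INPUT_DATASET"
OUTPUT_VAR = "AICHOR_OUTPUT_DATASET"


def bucket_paths(env: Mapping[str, str] = os.environ) -> Tuple[Path, Path]:
    """Return (input_dir, output_dir) read from the injected variables."""
    missing = [name for name in (INPUT_VAR, OUTPUT_VAR) if name not in env]
    if missing:
        # Typically means the code is running outside an experiment.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Path(env[INPUT_VAR]), Path(env[OUTPUT_VAR])


def write_result(name: str, text: str, env: Mapping[str, str] = os.environ) -> Path:
    """Write a small text file into the output bucket and return its path.

    Assumes the output path is a writable mounted directory.
    """
    _, output_dir = bucket_paths(env)
    target = output_dir / name
    target.write_text(text, encoding="utf-8")
    return target
```

Failing early with a clear message when the variables are absent makes it obvious that a script is being run outside the platform.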


Users can download keys to access buckets' contents from the UI.


The path to a location inside a bucket can be copied and shared from the UI.

In the experiment view, users can monitor resource metrics in real time via the AIchor interface.


Logs Viewer

By clicking on an experiment in the dashboard view, the developer can see the logs associated with that experiment. The logs are displayed in a scrollable view, giving easy access to different parts of the output.


Note 1: Logs are displayed in UTC (GMT) time.
Note 2: Log retention on AIchor is 30 days. It is the user's responsibility to save the logs for use beyond this retention period.
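
When reading saved logs, it can help to convert the UTC timestamps to a local time zone and to work out when a given entry falls outside the retention period. The sketch below assumes timestamps in ISO 8601 form without an offset (the actual log timestamp format is not specified here) and assumes the 30 days are counted from the timestamp of the entry. All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

LOG_RETENTION = timedelta(days=30)  # retention period stated in Note 2


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp without offset and mark it as UTC."""
    return datetime.fromisoformat(stamp).replace(tzinfo=timezone.utc)


def to_local(moment: datetime, tz: Optional[timezone] = None) -> datetime:
    """Convert to the given zone, or to the system local zone if tz is None."""
    return moment.astimezone(tz)


def retention_deadline(moment: datetime) -> datetime:
    """Return the moment after which the entry is past the retention period."""
    return moment + LOG_RETENTION


if __name__ == "__main__":
    entry_time = parse_utc("2024-03-01T12:00:00")
    print("local time:", to_local(entry_time))
    print("save before:", retention_deadline(entry_time))
```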

Pods Status

Next to the Logs tab there is a tab dedicated to pod status. This tab is useful for checking whether the experiment's pods are actually running.
