Quick Start (Application Developers)

As an App Developer, you can manage datasets, train jobs & inference jobs on SINGA-Auto. This guide walks through a full train-inference flow:

  1. Authenticating on SINGA-Auto
  2. Uploading datasets
  3. Creating a model training job
  4. Creating a model serving job after the model training job completes

This guide assumes that you have access to a running instance of SINGA-Auto Admin at <singa_auto_host>:<admin_port> and SINGA-Auto Web Admin at <singa_auto_host>:<web_admin_port>, and that models have already been added to SINGA-Auto under the IMAGE_CLASSIFICATION task.

To learn more about what else you can do on SINGA-Auto, explore the methods of singa_auto.client.Client.

Installing the client

  1. Install Python 3.6 such that the python and pip commands point to that installation (see Installing Python)

  2. Clone the project at https://github.com/nusdbsystem/singa-auto (e.g. with Git)

  3. Within the project’s root folder, install SINGA-Auto’s client-side Python dependencies by running:

    pip install -r ./singa_auto/requirements.txt
    

Initializing the client

Example:

from singa_auto.client import Client
client = Client(admin_host='localhost', admin_port=3000)
client.login(email='superadmin@singaauto', password='singa_auto')

See also

singa_auto.client.Client.login()
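
To keep credentials out of scripts, one option is to read the connection and login settings from environment variables. The sketch below is a minimal example of that idea. The function name client_settings_from_env and the SINGA_AUTO_* variable names are assumptions made for illustration; they are not defined by SINGA-Auto.

```python
import os


def client_settings_from_env(environ=None):
    """Collect client settings from environment variables.

    The SINGA_AUTO_* names are hypothetical; pick whatever fits your setup.
    """
    env = os.environ if environ is None else environ
    return {
        'admin_host': env.get('SINGA_AUTO_ADMIN_HOST', 'localhost'),
        'admin_port': int(env.get('SINGA_AUTO_ADMIN_PORT', '3000')),
        'email': env['SINGA_AUTO_EMAIL'],        # required: KeyError if unset
        'password': env['SINGA_AUTO_PASSWORD'],  # required: KeyError if unset
    }


# Demo with an explicit mapping instead of the real environment
settings = client_settings_from_env({
    'SINGA_AUTO_EMAIL': 'superadmin@singaauto',
    'SINGA_AUTO_PASSWORD': 'singa_auto',
})

# With SINGA-Auto's client installed, the settings would be used like this:
# client = Client(admin_host=settings['admin_host'], admin_port=settings['admin_port'])
# client.login(email=settings['email'], password=settings['password'])
```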

Listing available models by task

Example:

client.get_available_models(task='IMAGE_CLASSIFICATION')
# If "task" is left unspecified, the method retrieves information on all uploaded models
client.get_available_models()

Output:

[{'access_right': 'PRIVATE',
 'datetime_created': 'Mon, 17 Dec 2018 07:06:03 GMT',
 'dependencies': {'tensorflow': '1.12.0'},
 'id': '45df3f34-53d7-4fb8-a7c2-55391ea10030',
 'name': 'TfFeedForward',
 'task': 'IMAGE_CLASSIFICATION',
 'user_id': 'fb5671f1-c673-40e7-b53a-9208eb1ccc50'},
 {'access_right': 'PRIVATE',
 'datetime_created': 'Mon, 17 Dec 2018 07:06:03 GMT',
 'dependencies': {'scikit-learn': '0.20.0'},
 'id': 'd0ea96ce-478b-4167-8a84-eb36ae631235',
 'name': 'SkDt',
 'task': 'IMAGE_CLASSIFICATION',
 'user_id': 'fb5671f1-c673-40e7-b53a-9208eb1ccc50'}]

See also

singa_auto.client.Client.get_available_models()
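
Train jobs refer to models by ID, so it can be handy to look up IDs by model name from the list returned above. The helper below is a small sketch that works on plain dictionaries shaped like the output shown in this section; model_ids_by_name is a hypothetical name, not part of the client.

```python
def model_ids_by_name(models, names):
    """Return the IDs of the models called `names`, in the order requested."""
    id_of = {model['name']: model['id'] for model in models}
    missing = [name for name in names if name not in id_of]
    if missing:
        raise KeyError('No such model(s): {}'.format(', '.join(missing)))
    return [id_of[name] for name in names]


# Trimmed copy of the output shown above
models = [
    {'id': '45df3f34-53d7-4fb8-a7c2-55391ea10030',
     'name': 'TfFeedForward',
     'task': 'IMAGE_CLASSIFICATION'},
    {'id': 'd0ea96ce-478b-4167-8a84-eb36ae631235',
     'name': 'SkDt',
     'task': 'IMAGE_CLASSIFICATION'},
]
selected_ids = model_ids_by_name(models, ['SkDt', 'TfFeedForward'])
print(selected_ids)
```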

Creating datasets

You’ll first need to convert your dataset into a format specified by one of the tasks (see tasks), and split it into two files: one for training & one for validation. After doing so, you’ll create 2 corresponding datasets on SINGA-Auto by uploading them from your filesystem.

Example (pre-processing step):

# Run this in shell
python examples/datasets/image_files/load_fashion_mnist.py

Example:

client.create_dataset(
    name='fashion_mnist_train',
    task='IMAGE_CLASSIFICATION',
    dataset_path='data/fashion_mnist_train.zip'
)

client.create_dataset(
    name='fashion_mnist_val',
    task='IMAGE_CLASSIFICATION',
    dataset_path='data/fashion_mnist_val.zip'
)

Output:

{'id': 'ecf87d2f-6893-4e4b-8ed9-1d9454af9763',
'name': 'fashion_mnist_train',
'size_bytes': 36702897,
'task': 'IMAGE_CLASSIFICATION'}

{'id': '7e9a2f8a-c61d-4365-ae4a-601e90892b88',
'name': 'fashion_mnist_val',
'size_bytes': 6116386,
'task': 'IMAGE_CLASSIFICATION'}

See also

singa_auto.client.Client.create_dataset()

Note

The code that preprocesses the original Fashion MNIST dataset is available at ./examples/datasets/image_files/load_mnist_format.py.
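
Before uploading, it may be worth confirming that each dataset file exists and is a readable zip archive, since the example datasets above are .zip files. The following is a minimal local check under that assumption; check_dataset_archive is a hypothetical helper and does not validate the task-specific layout inside the archive.

```python
import os
import tempfile
import zipfile


def check_dataset_archive(dataset_path):
    """Return the archive size in bytes, or raise if it is missing or not a zip."""
    if not os.path.isfile(dataset_path):
        raise FileNotFoundError('Dataset file not found: {}'.format(dataset_path))
    if not zipfile.is_zipfile(dataset_path):
        raise ValueError('Not a zip archive: {}'.format(dataset_path))
    return os.path.getsize(dataset_path)


# Demo on a throwaway archive so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo_dataset.zip')
    with zipfile.ZipFile(demo_path, 'w') as archive:
        archive.writestr('placeholder.txt', 'demo content')
    demo_size = check_dataset_archive(demo_path)

# In practice you would call it on the real files before client.create_dataset():
# check_dataset_archive('data/fashion_mnist_train.zip')
```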

Creating a train job

To create a model training job, you’ll specify the train & validation datasets by their IDs, together with your application’s name and its associated task.

After creating a train job, you can monitor it on SINGA-Auto Web Admin (see Using SINGA-Auto’s Web Admin).

Refer to the parameters of singa_auto.client.Client.create_train_job() for configuring how your train job runs on SINGA-Auto, such as enabling GPU usage & specifying which models to use.

Example:

client.create_train_job(
    app='fashion_mnist_app',
    task='IMAGE_CLASSIFICATION',
    train_dataset_id='ecf87d2f-6893-4e4b-8ed9-1d9454af9763',
    val_dataset_id='7e9a2f8a-c61d-4365-ae4a-601e90892b88',
    budget={'MODEL_TRIAL_COUNT': 5},
    model_ids='["652db9f7-d23d-4b79-945b-a56446ceff33"]'
)
# Omitting GPU_COUNT is the same as setting GPU_COUNT to 0, which means training runs on CPU only
# MODEL_TRIAL_COUNT is the number of trials; it must be at least 1 for a valid train job
# TIME_HOURS is the training time limit in hours
# train_args={} can be left empty or omitted if not in use
client.create_train_job(
    app='fashion_mnist_app',
    task='IMAGE_CLASSIFICATION',
    train_dataset_id='ecf87d2f-6893-4e4b-8ed9-1d9454af9763',
    val_dataset_id='7e9a2f8a-c61d-4365-ae4a-601e90892b88',
    budget={'TIME_HOURS': 0.01,
            'GPU_COUNT': 0,
            'MODEL_TRIAL_COUNT': 1},
    model_ids='["652db9f7-d23d-4b79-945b-a56446ceff33"]',
    train_args={}
)

Output:

{'app': 'fashion_mnist_app',
'app_version': 1,
'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}

Using distributed training (refer to https://pytorch.org/docs/stable/distributed.html):

Example:

Output:

{'app': 'DistMinist',
'app_version': 1,
'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}

See also

singa_auto.client.Client.create_train_job()
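
The budget rules mentioned in the comments above (GPU_COUNT defaults to 0, MODEL_TRIAL_COUNT must be at least 1, TIME_HOURS is a time limit in hours) can be checked locally before a job is submitted. The sketch below is one way to do that; make_budget is a hypothetical helper, and requiring TIME_HOURS to be positive is an assumption rather than a documented rule.

```python
def make_budget(model_trial_count=None, time_hours=None, gpu_count=0):
    """Build a budget dict for create_train_job(), validating each entry."""
    if gpu_count < 0:
        raise ValueError('GPU_COUNT cannot be negative')
    budget = {'GPU_COUNT': gpu_count}  # 0 means CPU-only training
    if model_trial_count is not None:
        if model_trial_count < 1:
            raise ValueError('MODEL_TRIAL_COUNT must be at least 1')
        budget['MODEL_TRIAL_COUNT'] = model_trial_count
    if time_hours is not None:
        if time_hours <= 0:  # assumption: a zero or negative limit is never useful
            raise ValueError('TIME_HOURS must be positive')
        budget['TIME_HOURS'] = time_hours
    return budget


budget = make_budget(model_trial_count=1, time_hours=0.01)
print(budget)
```

The resulting dictionary can then be passed as the budget argument of create_train_job().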

Listing train jobs

Example:

client.get_train_jobs_of_app(app='fashion_mnist_app')

Output:

[{'app': 'fashion_mnist_app',
'app_version': 1,
'budget': {'MODEL_TRIAL_COUNT': 5},
'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT',
'datetime_stopped': None,
'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8',
'status': 'RUNNING',
'task': 'IMAGE_CLASSIFICATION',
'val_dataset_id': '7e9a2f8a-c61d-4365-ae4a-601e90892b88',
'train_dataset_id': 'ecf87d2f-6893-4e4b-8ed9-1d9454af9763'}]

See also

singa_auto.client.Client.get_train_jobs_of_app()

Retrieving the latest train job’s details

Example:

client.get_train_job(app='fashion_mnist_app')

Output:

{'app': 'fashion_mnist_app',
'app_version': 1,
'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT',
'datetime_stopped': 'Mon, 17 Dec 2018 07:11:11 GMT',
'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8',
'status': 'STOPPED',
'task': 'IMAGE_CLASSIFICATION',
'val_dataset_id': '7e9a2f8a-c61d-4365-ae4a-601e90892b88',
'train_dataset_id': 'ecf87d2f-6893-4e4b-8ed9-1d9454af9763',
'workers': [{'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT',
            'datetime_stopped': 'Mon, 17 Dec 2018 07:11:14 GMT',
            'model_name': 'SkDt',
            'replicas': 2,
            'service_id': '2ada1ff3-84e9-4eca-bac9-241cd8c765ef',
            'status': 'STOPPED'},
            {'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT',
            'datetime_stopped': 'Mon, 17 Dec 2018 07:11:42 GMT',
            'model_name': 'TfFeedForward',
            'replicas': 2,
            'service_id': '81ff23a7-ddd0-4a62-9d86-a3cc985ca6fe',
            'status': 'STOPPED'}]}

See also

singa_auto.client.Client.get_train_job()
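
Since later steps need the train job to have stopped, a script can poll get_train_job() until the status changes. The sketch below assumes only what this guide shows: the method accepts app and returns a dictionary with a 'status' key that eventually becomes 'STOPPED'. The helper name wait_for_train_job, its parameters, and the stub client are hypothetical, and any other terminal statuses your deployment reports would need to be added to done_statuses.

```python
import time


def wait_for_train_job(client, app, done_statuses=('STOPPED',),
                       poll_seconds=10.0, timeout_seconds=3600.0,
                       sleep=time.sleep):
    """Poll client.get_train_job(app=...) until its status is in done_statuses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = client.get_train_job(app=app)
        if job['status'] in done_statuses:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                'Train job of {!r} is still {}'.format(app, job['status']))
        sleep(poll_seconds)


class _StubClient:
    """Stand-in for a logged-in Client so the sketch runs without a server."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    def get_train_job(self, app):
        return {'app': app, 'status': self._statuses.pop(0)}


job = wait_for_train_job(_StubClient(['RUNNING', 'RUNNING', 'STOPPED']),
                         'fashion_mnist_app',
                         poll_seconds=0, sleep=lambda seconds: None)
print(job['status'])
```

With a real client, you would pass the logged-in client object in place of the stub and keep the default sleep.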

Listing best trials of the latest train job

Example:

client.get_best_trials_of_train_job(app='fashion_mnist_app')

Output:

[{'datetime_started': 'Mon, 17 Dec 2018 07:09:17 GMT',
'datetime_stopped': 'Mon, 17 Dec 2018 07:11:38 GMT',
'id': '1b7dc65a-87ae-4d42-9a01-67602115a4a4',
'knobs': {'batch_size': 32,
            'epochs': 3,
            'hidden_layer_count': 2,
            'hidden_layer_units': 36,
            'image_size': 32,
            'learning_rate': 0.014650971133579896},
'model_name': 'TfFeedForward',
'score': 0.8269},
{'datetime_started': 'Mon, 17 Dec 2018 07:08:38 GMT',
'datetime_stopped': 'Mon, 17 Dec 2018 07:11:11 GMT',
'id': '0c1f9184-7b46-4aaf-a581-be62bf3f49bf',
'knobs': {'criterion': 'entropy', 'max_depth': 4},
'model_name': 'SkDt',
'score': 0.6686}]

See also

singa_auto.client.Client.get_best_trials_of_train_job()
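
The list of best trials can mix several models, so a common follow-up is to pick the top-scoring trial for each model. The helper below is a short sketch operating on dictionaries shaped like the output above; best_trial_per_model is a hypothetical name.

```python
def best_trial_per_model(trials):
    """Map each model name to its highest-scoring trial."""
    best = {}
    for trial in trials:
        name = trial['model_name']
        if name not in best or trial['score'] > best[name]['score']:
            best[name] = trial
    return best


# Trimmed copy of the output shown above
trials = [
    {'id': '1b7dc65a-87ae-4d42-9a01-67602115a4a4',
     'model_name': 'TfFeedForward',
     'score': 0.8269},
    {'id': '0c1f9184-7b46-4aaf-a581-be62bf3f49bf',
     'model_name': 'SkDt',
     'score': 0.6686},
]
best = best_trial_per_model(trials)
overall = max(trials, key=lambda trial: trial['score'])
print(overall['model_name'], overall['score'])
```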

Creating an inference job with the latest train job

Your app’s users will make queries to the /predict endpoint of predictor_host over HTTP.

To create a model serving job, you’ll have to wait for your train job to stop. Then, you’ll submit the app name associated with the train job (with a status of STOPPED). The inference job is created from the best trials of that train job.

Example:

client.create_inference_job(app='fashion_mnist_app')
# Or with more details specified, such as the number of GPUs ('GPU_COUNT')
client.create_inference_job(app='fashion_mnist_app', app_version=1, budget={'GPU_COUNT': 1})

Output:

{'app': 'fashion_mnist_app',
'app_version': 1,
'id': '0477d03c-d312-48c5-8612-f9b37b368949',
'predictor_host': '127.0.0.1:30001',
'train_job_id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}

See also

singa_auto.client.Client.create_inference_job()
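
Once the inference job is running, queries go to the /predict endpoint of predictor_host over HTTP. The sketch below only builds such a request with the standard library and does not send it. The request body format is not specified in this guide: the POST method, the JSON encoding, and the 'queries' key are assumptions that should be checked against your predictor before use, and build_predict_request is a hypothetical helper.

```python
import json
import urllib.request


def build_predict_request(predictor_host, queries):
    """Build (but do not send) an HTTP request for the /predict endpoint."""
    url = 'http://{}/predict'.format(predictor_host)
    # ASSUMPTION: the predictor accepts a JSON body with a 'queries' list
    body = json.dumps({'queries': queries}).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


request = build_predict_request('127.0.0.1:30001', [[[0, 0], [0, 0]]])
print(request.full_url)

# To send it against a running predictor:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```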

Listing inference jobs

Example:

client.get_inference_jobs_of_app(app='fashion_mnist_app')

Output:

{'app': 'fashion_mnist_app',
  'app_version': 1,
  'datetime_started': 'Mon, 17 Dec 2018 07:15:12 GMT',
  'datetime_stopped': None,
  'id': '0477d03c-d312-48c5-8612-f9b37b368949',
  'predictor_host': '127.0.0.1:30000',
  'status': 'RUNNING',
  'train_job_id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}

See also

singa_auto.client.Client.get_inference_jobs_of_app()

Retrieving details of running inference job

See also

singa_auto.client.Client.get_running_inference_job()

Example:

client.get_running_inference_job(app='fashion_mnist_app')

Output:

{'app': 'fashion_mnist_app',
'app_version': 1,
'datetime_started': 'Mon, 17 Dec 2018 07:25:36 GMT',
'datetime_stopped': None,
'id': '09e5040e-2134-411b-855f-793927c80b4b',
'predictor_host': '127.0.0.1:30000',
'status': 'RUNNING',
'train_job_id': 'ec4db479-b9b2-4289-8086-52794ffc71c8',
'workers': [{'datetime_started': 'Mon, 17 Dec 2018 07:25:36 GMT',
            'datetime_stopped': None,
            'replicas': 2,
            'service_id': '661035bb-3966-46e8-828c-e200960a76c0',
            'status': 'RUNNING',
            'trial': {'id': '1b7dc65a-87ae-4d42-9a01-67602115a4a4',
                        'knobs': {'batch_size': 32,
                                'epochs': 3,
                                'hidden_layer_count': 2,
                                'hidden_layer_units': 36,
                                'image_size': 32,
                                'learning_rate': 0.014650971133579896},
                        'model_name': 'TfFeedForward',
                        'score': 0.8269}},
            {'datetime_started': 'Mon, 17 Dec 2018 07:25:36 GMT',
            'datetime_stopped': None,
            'replicas': 2,
            'service_id': '6a769007-b18f-4271-b3db-8b60ed5fb545',
            'status': 'RUNNING',
            'trial': {'id': '0c1f9184-7b46-4aaf-a581-be62bf3f49bf',
                        'knobs': {'criterion': 'entropy', 'max_depth': 4},
                        'model_name': 'SkDt',
                        'score': 0.6686}}]}

Stopping a running inference job

Example:

client.stop_inference_job(app='fashion_mnist_app')

See also

singa_auto.client.Client.stop_inference_job()

Downloading the trained model for a trial

After running a train job, you might want to download the trained model instance of a trial of the train job, instead of creating an inference job to make predictions. Subsequently, you’ll be able to make batch predictions locally with the trained model instance.

To do this, you must have the trial’s model class file already in your local filesystem, the dependencies of the model must have been installed separately, and the model class must have been imported and passed into singa_auto.client.Client.load_trial_model().

To download the model class file, use the method singa_auto.client.Client.download_model_file().

Example:

In shell,

# Install the dependencies of the `TfFeedForward` model
pip install tensorflow==1.12.0

In Python,

# Find the best trial for model `TfFeedForward`
trials = [x for x in client.get_best_trials_of_train_job(app='fashion_mnist_app')
    if x.get('model_name') == 'TfFeedForward' and x.get('status') == 'COMPLETED']
trial = trials[0]
trial_id = trial.get('id')

# Import the model class
from examples.models.image_classification.TfFeedForward import TfFeedForward

# Load an instance of the model with trial's parameters
model_inst = client.load_trial_model(trial_id, TfFeedForward)

# Make predictions with trained model instance associated with best trial
queries = [[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 0, 0, 7, 0, 37, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 27, 84, 11, 0, 0, 0, 0, 0, 0, 119, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 88, 143, 110, 0, 0, 0, 0, 22, 93, 106, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 53, 129, 120, 147, 175, 157, 166, 135, 154, 168, 140, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 11, 137, 130, 128, 160, 176, 159, 167, 178, 149, 151, 144, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 2, 1, 0, 3, 0, 0, 115, 114, 106, 137, 168, 153, 156, 165, 167, 143, 157, 158, 11, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 89, 139, 90, 94, 153, 149, 131, 151, 169, 172, 143, 159, 169, 48, 0],
[0, 0, 0, 0, 0, 0, 2, 4, 1, 0, 0, 0, 98, 136, 110, 109, 110, 162, 135, 144, 149, 159, 167, 144, 158, 169, 119, 0],
[0, 0, 2, 2, 1, 2, 0, 0, 0, 0, 26, 108, 117, 99, 111, 117, 136, 156, 134, 154, 154, 156, 160, 141, 147, 156, 178, 0],
[3, 0, 0, 0, 0, 0, 0, 21, 53, 92, 117, 111, 103, 115, 129, 134, 143, 154, 165, 170, 154, 151, 154, 143, 138, 150, 165, 43],
[0, 0, 23, 54, 65, 76, 85, 118, 128, 123, 111, 113, 118, 127, 125, 139, 133, 136, 160, 140, 155, 161, 144, 155, 172, 161, 189, 62],
[0, 68, 94, 90, 111, 114, 111, 114, 115, 127, 135, 136, 143, 126, 127, 151, 154, 143, 148, 125, 162, 162, 144, 138, 153, 162, 196, 58],
[70, 169, 129, 104, 98, 100, 94, 97, 98, 102, 108, 106, 119, 120, 129, 149, 156, 167, 190, 190, 196, 198, 198, 187, 197, 189, 184, 36],
[16, 126, 171, 188, 188, 184, 171, 153, 135, 120, 126, 127, 146, 185, 195, 209, 208, 255, 209, 177, 245, 252, 251, 251, 247, 220, 206, 49],
[0, 0, 0, 12, 67, 106, 164, 185, 199, 210, 211, 210, 208, 190, 150, 82, 8, 0, 0, 0, 178, 208, 188, 175, 162, 158, 151, 11],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]]
print(model_inst.predict(queries))

See also

singa_auto.client.Client.load_trial_model()
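
The queries literal above is a batch containing one 28x28 grayscale image, given as nested lists of pixel values. When building such batches by hand, a quick shape check can catch mistakes before they reach predict(). The sketch below assumes Fashion MNIST style inputs (28x28, values 0 to 255); check_image_batch is a hypothetical helper, and other models or datasets may expect different shapes.

```python
def check_image_batch(queries, height=28, width=28):
    """Validate a batch of grayscale images given as nested lists.

    Returns the number of images, or raises ValueError on a bad shape or value.
    """
    for index, image in enumerate(queries):
        if len(image) != height or any(len(row) != width for row in image):
            raise ValueError(
                'Image {} is not {}x{}'.format(index, height, width))
        for row in image:
            for pixel in row:
                if not 0 <= pixel <= 255:
                    raise ValueError(
                        'Image {} has a pixel outside 0..255'.format(index))
    return len(queries)


blank_image = [[0] * 28 for _ in range(28)]
image_count = check_image_batch([blank_image])
print(image_count)
```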