Quick Start (Application Developers)¶
As an App Developer, you can manage datasets, train jobs & inference jobs on SINGA-Auto. This guide walks through a full train-inference flow:
- Authenticating on SINGA-Auto
- Uploading datasets
- Creating a model training job
- Creating a model serving job after the model training job completes
This guide assumes that you have access to a running instance of SINGA-Auto Admin at <singa_auto_host>:<admin_port> and SINGA-Auto Web Admin at <singa_auto_host>:<web_admin_port>, and that models have already been added to SINGA-Auto under the IMAGE_CLASSIFICATION task.
To learn more about what else you can do on SINGA-Auto, explore the methods of singa_auto.client.Client.
Installing the client¶
- Install Python 3.6 such that the python and pip commands point to the correct installation of Python (see Installing Python)
- Clone the project at https://github.com/nusdbsystem/singa-auto (e.g. with Git)
- Within the project’s root folder, install SINGA-Auto’s client-side Python dependencies by running:

pip install -r ./singa_auto/requirements.txt
Initializing the client¶
Example:
from singa_auto.client import Client

client = Client(admin_host='localhost', admin_port=3000)
client.login(email='superadmin@singaauto', password='singa_auto')
See also
singa_auto.client.Client.login()
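Hard-coding the admin host and port can be inconvenient across deployments. As a minimal sketch, you could resolve them from environment variables instead; the variable names SINGA_AUTO_HOST and SINGA_AUTO_ADMIN_PORT below are arbitrary choices, not part of the SINGA-Auto API:

```python
import os

def resolve_admin_address(default_host='localhost', default_port=3000):
    # SINGA_AUTO_HOST / SINGA_AUTO_ADMIN_PORT are hypothetical variable names;
    # use whatever convention your deployment prefers
    host = os.environ.get('SINGA_AUTO_HOST', default_host)
    port = int(os.environ.get('SINGA_AUTO_ADMIN_PORT', default_port))
    return host, port

host, port = resolve_admin_address()
# client = Client(admin_host=host, admin_port=port)
```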
Listing available models by task¶
Example:
client.get_available_models(task='IMAGE_CLASSIFICATION')

# If the "task" argument is left unspecified, the method retrieves all uploaded models
client.get_available_models()

Output:
[{'access_right': 'PRIVATE',
  'datetime_created': 'Mon, 17 Dec 2018 07:06:03 GMT',
  'dependencies': {'tensorflow': '1.12.0'},
  'id': '45df3f34-53d7-4fb8-a7c2-55391ea10030',
  'name': 'TfFeedForward',
  'task': 'IMAGE_CLASSIFICATION',
  'user_id': 'fb5671f1-c673-40e7-b53a-9208eb1ccc50'},
 {'access_right': 'PRIVATE',
  'datetime_created': 'Mon, 17 Dec 2018 07:06:03 GMT',
  'dependencies': {'scikit-learn': '0.20.0'},
  'id': 'd0ea96ce-478b-4167-8a84-eb36ae631235',
  'name': 'SkDt',
  'task': 'IMAGE_CLASSIFICATION',
  'user_id': 'fb5671f1-c673-40e7-b53a-9208eb1ccc50'}]
See also
singa_auto.client.Client.get_available_models()
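Since get_available_models() returns a plain list of dicts, you can filter it client-side, e.g. to collect the names of models for a given task. A sketch over the shape of the sample output above:

```python
def model_names_for_task(models, task):
    # `models` is a list of dicts shaped like the output of get_available_models()
    return [m['name'] for m in models if m['task'] == task]

# Sample entries, trimmed from the output shown above
models = [
    {'name': 'TfFeedForward', 'task': 'IMAGE_CLASSIFICATION',
     'dependencies': {'tensorflow': '1.12.0'}},
    {'name': 'SkDt', 'task': 'IMAGE_CLASSIFICATION',
     'dependencies': {'scikit-learn': '0.20.0'}},
]
print(model_names_for_task(models, 'IMAGE_CLASSIFICATION'))  # ['TfFeedForward', 'SkDt']
```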
Creating datasets¶
You’ll first need to convert your dataset into a format specified by one of the tasks (see tasks), and split them into two files: one for training & one for validation. After doing so, you’ll create 2 corresponding datasets on SINGA-Auto by uploading them from your filesystem.
Example (pre-processing step):
# Run this in shell
python examples/datasets/image_files/load_fashion_mnist.py
Example:
client.create_dataset(
    name='fashion_mnist_train',
    task='IMAGE_CLASSIFICATION',
    dataset_path='data/fashion_mnist_train.zip'
)
client.create_dataset(
    name='fashion_mnist_val',
    task='IMAGE_CLASSIFICATION',
    dataset_path='data/fashion_mnist_val.zip'
)

Output:
{'id': 'ecf87d2f-6893-4e4b-8ed9-1d9454af9763',
 'name': 'fashion_mnist_train',
 'size_bytes': 36702897,
 'task': 'IMAGE_CLASSIFICATION'}
{'id': '7e9a2f8a-c61d-4365-ae4a-601e90892b88',
 'name': 'fashion_mnist_val',
 'size_bytes': 6116386,
 'task': 'IMAGE_CLASSIFICATION'}
See also
singa_auto.client.Client.create_dataset()
Note
The code that preprocesses the original Fashion MNIST dataset is available at ./examples/datasets/image_files/load_mnist_format.py.
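If your own dataset is not yet split into train and validation files, a generic random split can be sketched as follows; the 80/20 ratio is an arbitrary choice for illustration, not a SINGA-Auto requirement:

```python
import random

def train_val_split(items, val_fraction=0.2, seed=42):
    # Shuffle a copy so the caller's sequence is untouched, then slice
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_fraction)
    return items[n_val:], items[:n_val]

# e.g. split 10 sample identifiers into 8 train / 2 validation
train_items, val_items = train_val_split(range(10))
```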
Creating a train job¶
To create a model training job, you’ll specify the train & validation datasets by their IDs, together with your application’s name and its associated task.
After creating a train job, you can monitor it on SINGA-Auto Web Admin (see Using SINGA-Auto’s Web Admin).
Refer to the parameters of singa_auto.client.Client.create_train_job()
for configuring how your train job runs on SINGA-Auto, such as enabling GPU usage & specifying which models to use.
Example:
client.create_train_job(
    app='fashion_mnist_app',
    task='IMAGE_CLASSIFICATION',
    train_dataset_id='ecf87d2f-6893-4e4b-8ed9-1d9454af9763',
    val_dataset_id='7e9a2f8a-c61d-4365-ae4a-601e90892b88',
    budget={'MODEL_TRIAL_COUNT': 5},
    model_ids='["652db9f7-d23d-4b79-945b-a56446ceff33"]'
)

# Omitting GPU_COUNT is equivalent to setting it to 0, i.e. training runs on CPU only
# MODEL_TRIAL_COUNT is the number of trials; the minimum for a valid training is 1
# TIME_HOURS is the training time limit in hours
# train_args={} can be left empty or unspecified if not in use
client.create_train_job(
    app='fashion_mnist_app',
    task='IMAGE_CLASSIFICATION',
    train_dataset_id='ecf87d2f-6893-4e4b-8ed9-1d9454af9763',
    val_dataset_id='7e9a2f8a-c61d-4365-ae4a-601e90892b88',
    budget={'TIME_HOURS': 0.01, 'GPU_COUNT': 0, 'MODEL_TRIAL_COUNT': 1},
    model_ids='["652db9f7-d23d-4b79-945b-a56446ceff33"]',
    train_args={}
)

Output:
{'app': 'fashion_mnist_app', 'app_version': 1, 'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}
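Based on the budget semantics in the comments above (GPU_COUNT defaults to 0, MODEL_TRIAL_COUNT must be at least 1), you might sanity-check a budget dict client-side before submitting. This is only a sketch, not part of the SINGA-Auto client:

```python
def check_budget(budget):
    # Budget keys as used in the example above; anything else is rejected
    allowed = {'MODEL_TRIAL_COUNT', 'GPU_COUNT', 'TIME_HOURS'}
    unknown = set(budget) - allowed
    if unknown:
        raise ValueError(f'Unknown budget keys: {unknown}')
    if budget.get('MODEL_TRIAL_COUNT', 1) < 1:
        raise ValueError('MODEL_TRIAL_COUNT must be at least 1')
    if budget.get('GPU_COUNT', 0) < 0:
        raise ValueError('GPU_COUNT cannot be negative')
    return budget

check_budget({'TIME_HOURS': 0.01, 'GPU_COUNT': 0, 'MODEL_TRIAL_COUNT': 1})  # passes
```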
- Using distributed training: refer to https://pytorch.org/docs/stable/distributed.html
Example:
Output:
{'app': 'DistMinist', 'app_version': 1, 'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}
See also
singa_auto.client.Client.create_train_job()
Listing train jobs¶
Example:
client.get_train_jobs_of_app(app='fashion_mnist_app')

Output:
[{'app': 'fashion_mnist_app',
  'app_version': 1,
  'budget': {'MODEL_TRIAL_COUNT': 5},
  'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT',
  'datetime_stopped': None,
  'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8',
  'status': 'RUNNING',
  'task': 'IMAGE_CLASSIFICATION',
  'val_dataset_id': '7e9a2f8a-c61d-4365-ae4a-601e90892b88',
  'train_dataset_id': 'ecf87d2f-6893-4e4b-8ed9-1d9454af9763'}]
See also
singa_auto.client.Client.get_train_jobs_of_app()
Retrieving the latest train job’s details¶
Example:
client.get_train_job(app='fashion_mnist_app')

Output:
{'app': 'fashion_mnist_app',
 'app_version': 1,
 'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT',
 'datetime_stopped': 'Mon, 17 Dec 2018 07:11:11 GMT',
 'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8',
 'status': 'STOPPED',
 'task': 'IMAGE_CLASSIFICATION',
 'val_dataset_id': '7e9a2f8a-c61d-4365-ae4a-601e90892b88',
 'train_dataset_id': 'ecf87d2f-6893-4e4b-8ed9-1d9454af9763',
 'workers': [{'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT',
              'datetime_stopped': 'Mon, 17 Dec 2018 07:11:14 GMT',
              'model_name': 'SkDt',
              'replicas': 2,
              'service_id': '2ada1ff3-84e9-4eca-bac9-241cd8c765ef',
              'status': 'STOPPED'},
             {'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT',
              'datetime_stopped': 'Mon, 17 Dec 2018 07:11:42 GMT',
              'model_name': 'TfFeedForward',
              'replicas': 2,
              'service_id': '81ff23a7-ddd0-4a62-9d86-a3cc985ca6fe',
              'status': 'STOPPED'}]}
See also
singa_auto.client.Client.get_train_job()
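Since creating an inference job (below) requires the train job to have stopped, it can help to poll the train job's details until its 'status' is STOPPED. A sketch assuming the dict shape shown above; the fetch callable stands in for something like lambda: client.get_train_job(app='fashion_mnist_app'):

```python
import time

def wait_until_stopped(fetch, poll_secs=10, timeout_secs=3600):
    # `fetch` should return a train-job dict with a 'status' key,
    # e.g. lambda: client.get_train_job(app='fashion_mnist_app')
    deadline = time.time() + timeout_secs
    while time.time() < deadline:
        job = fetch()
        if job['status'] == 'STOPPED':
            return job
        time.sleep(poll_secs)
    raise TimeoutError('Train job did not stop within the timeout')
```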
Listing best trials of the latest train job¶
Example:
client.get_best_trials_of_train_job(app='fashion_mnist_app')

Output:
[{'datetime_started': 'Mon, 17 Dec 2018 07:09:17 GMT',
  'datetime_stopped': 'Mon, 17 Dec 2018 07:11:38 GMT',
  'id': '1b7dc65a-87ae-4d42-9a01-67602115a4a4',
  'knobs': {'batch_size': 32, 'epochs': 3, 'hidden_layer_count': 2, 'hidden_layer_units': 36, 'image_size': 32, 'learning_rate': 0.014650971133579896},
  'model_name': 'TfFeedForward',
  'score': 0.8269},
 {'datetime_started': 'Mon, 17 Dec 2018 07:08:38 GMT',
  'datetime_stopped': 'Mon, 17 Dec 2018 07:11:11 GMT',
  'id': '0c1f9184-7b46-4aaf-a581-be62bf3f49bf',
  'knobs': {'criterion': 'entropy', 'max_depth': 4},
  'model_name': 'SkDt',
  'score': 0.6686}]
See also
singa_auto.client.Client.get_best_trials_of_train_job()
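To pick a single winning trial out of the returned list, you can take the entry with the highest 'score'. A sketch over the shape of the sample output above:

```python
def best_trial(trials):
    # Highest validation score wins; `trials` is shaped like the
    # output of get_best_trials_of_train_job()
    return max(trials, key=lambda t: t['score'])

# Trimmed sample entries from the output shown above
trials = [
    {'id': '1b7dc65a-87ae-4d42-9a01-67602115a4a4',
     'model_name': 'TfFeedForward', 'score': 0.8269},
    {'id': '0c1f9184-7b46-4aaf-a581-be62bf3f49bf',
     'model_name': 'SkDt', 'score': 0.6686},
]
print(best_trial(trials)['model_name'])  # TfFeedForward
```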
Creating an inference job with the latest train job¶
To create a model serving job, you’ll first have to wait for your train job to stop. Then, submit the app name associated with the train job (which should have a status of STOPPED). The inference job will be created from the best trials of that train job. Your app’s users will then make queries to the /predict endpoint of predictor_host over HTTP.
Example:
client.create_inference_job(app='fashion_mnist_app')

# Or with more details specified, such as the number of GPUs ('GPU_COUNT')
client.create_inference_job(
    app='fashion_mnist_app',
    app_version=1,
    budget={'GPU_COUNT': 1}
)

Output:
{'app': 'fashion_mnist_app',
 'app_version': 1,
 'id': '0477d03c-d312-48c5-8612-f9b37b368949',
 'predictor_host': '127.0.0.1:30001',
 'train_job_id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}
See also
singa_auto.client.Client.create_inference_job()
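Queries go to the /predict endpoint of predictor_host over HTTP. A sketch of building and sending such a request with the standard library; note that the exact JSON payload schema (the 'query' key below) is an assumption, so check it against the SINGA-Auto version you run:

```python
import json
from urllib import request

def build_predict_request(predictor_host, query):
    # The payload key 'query' is an assumed schema, not confirmed by this guide
    body = json.dumps({'query': query}).encode('utf-8')
    return request.Request(f'http://{predictor_host}/predict', data=body,
                           headers={'Content-Type': 'application/json'})

def predict(predictor_host, query):
    # Send the request and decode the JSON response
    with request.urlopen(build_predict_request(predictor_host, query)) as resp:
        return json.loads(resp.read())

# e.g. predict('127.0.0.1:30001', image_as_nested_lists)
```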
Listing inference jobs¶
Example:
client.get_inference_jobs_of_app(app='fashion_mnist_app')

Output:
{'app': 'fashion_mnist_app',
 'app_version': 1,
 'datetime_started': 'Mon, 17 Dec 2018 07:15:12 GMT',
 'datetime_stopped': None,
 'id': '0477d03c-d312-48c5-8612-f9b37b368949',
 'predictor_host': '127.0.0.1:30000',
 'status': 'RUNNING',
 'train_job_id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}
See also
singa_auto.client.Client.get_inference_jobs_of_app()
Retrieving details of the running inference job¶
See also
singa_auto.client.Client.get_running_inference_job()
Example:
client.get_running_inference_job(app='fashion_mnist_app')

Output:
{'app': 'fashion_mnist_app',
 'app_version': 1,
 'datetime_started': 'Mon, 17 Dec 2018 07:25:36 GMT',
 'datetime_stopped': None,
 'id': '09e5040e-2134-411b-855f-793927c80b4b',
 'predictor_host': '127.0.0.1:30000',
 'status': 'RUNNING',
 'train_job_id': 'ec4db479-b9b2-4289-8086-52794ffc71c8',
 'workers': [{'datetime_started': 'Mon, 17 Dec 2018 07:25:36 GMT',
              'datetime_stopped': None,
              'replicas': 2,
              'service_id': '661035bb-3966-46e8-828c-e200960a76c0',
              'status': 'RUNNING',
              'trial': {'id': '1b7dc65a-87ae-4d42-9a01-67602115a4a4',
                        'knobs': {'batch_size': 32, 'epochs': 3, 'hidden_layer_count': 2, 'hidden_layer_units': 36, 'image_size': 32, 'learning_rate': 0.014650971133579896},
                        'model_name': 'TfFeedForward',
                        'score': 0.8269}},
             {'datetime_started': 'Mon, 17 Dec 2018 07:25:36 GMT',
              'datetime_stopped': None,
              'replicas': 2,
              'service_id': '6a769007-b18f-4271-b3db-8b60ed5fb545',
              'status': 'RUNNING',
              'trial': {'id': '0c1f9184-7b46-4aaf-a581-be62bf3f49bf',
                        'knobs': {'criterion': 'entropy', 'max_depth': 4},
                        'model_name': 'SkDt',
                        'score': 0.6686}}]}
Stopping a running inference job¶
Example:
client.stop_inference_job(app='fashion_mnist_app')
See also
singa_auto.client.Client.stop_inference_job()
Downloading the trained model for a trial¶
After running a train job, you might want to download the trained model instance of a trial of the train job, instead of creating an inference job to make predictions. Subsequently, you’ll be able to make batch predictions locally with the trained model instance.
To do this, you must have the trial’s model class file already in your local filesystem, the dependencies of the model must have been installed separately, and the model class must have been imported and passed into this method.
To download the model class file, use the method singa_auto.client.Client.download_model_file().
Example:
In shell,
# Install the dependencies of the `TfFeedForward` model
pip install tensorflow==1.12.0

In Python,
# Find the best trial for model `TfFeedForward`
trials = [x for x in client.get_best_trials_of_train_job(app='fashion_mnist_app')
          if x.get('model_name') == 'TfFeedForward' and x.get('status') == 'COMPLETED']
trial = trials[0]
trial_id = trial.get('id')

# Import the model class
from examples.models.image_classification.TfFeedForward import TfFeedForward

# Load an instance of the model with the trial's parameters
model_inst = client.load_trial_model(trial_id, TfFeedForward)

# Make predictions with the trained model instance associated with the best trial
queries = [[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 0, 0, 7, 0, 37, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 27, 84, 11, 0, 0, 0, 0, 0, 0, 119, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 88, 143, 110, 0, 0, 0, 0, 22, 93, 106, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 53, 129, 120, 147, 175, 157, 166, 135, 154, 168, 140, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 11, 137, 130, 128, 160, 176, 159, 167, 178, 149, 151, 144, 0, 0],
            [0, 0, 0, 0, 0, 0, 1, 0, 2, 1, 0, 3, 0, 0, 115, 114, 106, 137, 168, 153, 156, 165, 167, 143, 157, 158, 11, 0],
            [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 89, 139, 90, 94, 153, 149, 131, 151, 169, 172, 143, 159, 169, 48, 0],
            [0, 0, 0, 0, 0, 0, 2, 4, 1, 0, 0, 0, 98, 136, 110, 109, 110, 162, 135, 144, 149, 159, 167, 144, 158, 169, 119, 0],
            [0, 0, 2, 2, 1, 2, 0, 0, 0, 0, 26, 108, 117, 99, 111, 117, 136, 156, 134, 154, 154, 156, 160, 141, 147, 156, 178, 0],
            [3, 0, 0, 0, 0, 0, 0, 21, 53, 92, 117, 111, 103, 115, 129, 134, 143, 154, 165, 170, 154, 151, 154, 143, 138, 150, 165, 43],
            [0, 0, 23, 54, 65, 76, 85, 118, 128, 123, 111, 113, 118, 127, 125, 139, 133, 136, 160, 140, 155, 161, 144, 155, 172, 161, 189, 62],
            [0, 68, 94, 90, 111, 114, 111, 114, 115, 127, 135, 136, 143, 126, 127, 151, 154, 143, 148, 125, 162, 162, 144, 138, 153, 162, 196, 58],
            [70, 169, 129, 104, 98, 100, 94, 97, 98, 102, 108, 106, 119, 120, 129, 149, 156, 167, 190, 190, 196, 198, 198, 187, 197, 189, 184, 36],
            [16, 126, 171, 188, 188, 184, 171, 153, 135, 120, 126, 127, 146, 185, 195, 209, 208, 255, 209, 177, 245, 252, 251, 251, 247, 220, 206, 49],
            [0, 0, 0, 12, 67, 106, 164, 185, 199, 210, 211, 210, 208, 190, 150, 82, 8, 0, 0, 0, 178, 208, 188, 175, 162, 158, 151, 11],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]]
print(model_inst.predict(queries))
See also
singa_auto.client.Client.load_trial_model()