Welcome to SINGA-Auto’s Documentation!¶
What is SINGA-Auto?¶
SINGA-Auto is a distributed system that trains machine learning (ML) models and deploys trained models, built with ease of use in mind. To do so, it leverages automated machine learning (AutoML).
Application Developers and Application Users, even without any ML expertise, can:
- Create a model training job for supported tasks, with their own datasets
- Deploy an ensemble of trained models for inference
- Integrate model predictions in their apps over HTTP
Model Developers can:
- Contribute to SINGA-Auto’s pool of model templates
Check out Quick Setup to deploy/develop SINGA-Auto on your machine, and/or Quick Start to use a deployed instance of SINGA-Auto.
Index¶
User Guide¶
Quick Start¶
Table of Contents
- Quick Start
- Installing the client
- Initializing the client
- Creating models
- Listing available models by task
- Creating datasets
- Creating a train job
- Listing train jobs
- Creating an inference job with the latest train job
- Listing inference jobs
- Making predictions
- Prediction for QuestionAnswering
- Prediction for SpeechRecognition
- Stopping a running inference job
This guide assumes you have deployed your own empty instance of SINGA-Auto and you want to try a full train-inference flow as the Super Admin:
- Authenticating on SINGA-Auto
- Submitting models
- Uploading datasets
- Creating a model training job
- Creating a model serving job after the model training job completes
- Making predictions
Follow the sequence of examples below to submit the Fashion MNIST dataset for training and inference. Alternatively, refer to and run the scripted version of this quickstart at ./examples/scripts/quickstart.py.
To learn more about what else you can do on SINGA-Auto, explore the methods of singa_auto.client.Client.
Note
If you haven’t set up SINGA-Auto on your local machine, refer to Quick Setup before continuing.
Note
- For Model Developers just looking to contribute models, refer to Quick Start (Model Developers)
- For Application Developers just looking to train and deploy models, refer to Quick Start (Application Developers)
- For Application Users just looking to make predictions, refer to Quick Start (Application Users)
Installing the client¶
- Install Python 3.6 such that the python and pip commands point to the correct installation of Python (see Installing Python).
- Clone the project at https://github.com/nusdbsystem/singa-auto (e.g. with Git).
- Within the project’s root folder, install SINGA-Auto’s client-side Python dependencies by running:
pip install -r ./singa_auto/requirements.txt
Initializing the client¶
Example:
from singa_auto.client import Client

# 'localhost' can be replaced by '127.0.0.1' or another server address
client = Client(admin_host='localhost', admin_port=3000)
client.login(email='superadmin@singaauto', password='singa_auto')
See also
singa_auto.client.Client.login()
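The example above hard-codes the Super Admin credentials. The sketch below shows one way to keep them out of source code by reading the arguments for Client(...) and client.login(...) from environment variables. The variable names SINGA_AUTO_HOST, SINGA_AUTO_ADMIN_PORT, SINGA_AUTO_EMAIL and SINGA_AUTO_PASSWORD are hypothetical. SINGA-Auto does not read them itself.

```python
import os

def client_settings_from_env(env=None):
    """Collect Client(...) and login(...) arguments from environment variables.

    The variable names used here are hypothetical; choose your own.
    """
    env = os.environ if env is None else env
    return {
        "admin_host": env.get("SINGA_AUTO_HOST", "localhost"),
        "admin_port": int(env.get("SINGA_AUTO_ADMIN_PORT", "3000")),
        "email": env.get("SINGA_AUTO_EMAIL", "superadmin@singaauto"),
        # No default for the password: fail loudly (KeyError) if it is unset.
        "password": env["SINGA_AUTO_PASSWORD"],
    }

# Usage with the SINGA-Auto client installed (not executed here):
# from singa_auto.client import Client
# s = client_settings_from_env()
# client = Client(admin_host=s["admin_host"], admin_port=s["admin_port"])
# client.login(email=s["email"], password=s["password"])
```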
Creating models¶
To create a model, you’ll need to submit a model class that conforms to the specification by singa_auto.model.BaseModel, written in a single Python file.
The model’s implementation should conform to a specific task (see tasks).
Refer to the parameters of singa_auto.client.Client.create_model() for configuring how your model runs on SINGA-Auto, and refer to Model Development Guide to understand more about how to write & test models for SINGA-Auto.
Example:
client.create_model(
    name='TfFeedForward',
    task='IMAGE_CLASSIFICATION',
    model_file_path='examples/models/image_classification/TfFeedForward.py',
    model_class='TfFeedForward',
    dependencies={'tensorflow': '1.12.0'}
)

client.create_model(
    name='SkDt',
    task='IMAGE_CLASSIFICATION',
    model_file_path='examples/models/image_classification/SkDt.py',
    model_class='SkDt',
    dependencies={'scikit-learn': '0.20.0'}
)
See also
singa_auto.client.Client.create_model()
Listing available models by task¶
Example:
client.get_available_models(task='IMAGE_CLASSIFICATION')

# If "task" is left unspecified, the method retrieves information about all uploaded models
client.get_available_models()
Output:
[{'access_right': 'PRIVATE', 'datetime_created': 'Mon, 17 Dec 2018 07:06:03 GMT', 'dependencies': {'tensorflow': '1.12.0'}, 'id': '45df3f34-53d7-4fb8-a7c2-55391ea10030', 'name': 'TfFeedForward', 'task': 'IMAGE_CLASSIFICATION', 'user_id': 'fb5671f1-c673-40e7-b53a-9208eb1ccc50'}, {'access_right': 'PRIVATE', 'datetime_created': 'Mon, 17 Dec 2018 07:06:03 GMT', 'dependencies': {'scikit-learn': '0.20.0'}, 'id': 'd0ea96ce-478b-4167-8a84-eb36ae631235', 'name': 'SkDt', 'task': 'IMAGE_CLASSIFICATION', 'user_id': 'fb5671f1-c673-40e7-b53a-9208eb1ccc50'}]
See also
singa_auto.client.Client.get_available_models()
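create_train_job() takes the IDs of the models to train. The sketch below looks up those IDs by model name. It assumes the list of dicts returned by get_available_models(), as shown above, and the helper model_ids_for is hypothetical. It produces a JSON-array string, which mirrors the model_ids='[...]' form used in the train job example in this guide. This guide does not say whether your client version also accepts a plain Python list.

```python
import json

# Two entries from the get_available_models() output above (fields trimmed).
models = [
    {"id": "45df3f34-53d7-4fb8-a7c2-55391ea10030", "name": "TfFeedForward", "task": "IMAGE_CLASSIFICATION"},
    {"id": "d0ea96ce-478b-4167-8a84-eb36ae631235", "name": "SkDt", "task": "IMAGE_CLASSIFICATION"},
]

def model_ids_for(models, names):
    """Return the IDs of the models named in `names`, in the order given."""
    by_name = {m["name"]: m["id"] for m in models}
    missing = [n for n in names if n not in by_name]
    if missing:
        raise KeyError("models not found: {}".format(missing))
    return [by_name[n] for n in names]

ids = model_ids_for(models, ["SkDt"])
# Serialize to the same JSON-array string format as the model_ids example.
model_ids_arg = json.dumps(ids)
print(model_ids_arg)  # ["d0ea96ce-478b-4167-8a84-eb36ae631235"]
```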
Creating datasets¶
You’ll first need to convert your dataset into a format specified by one of the tasks (see tasks), and split it into two files: one for training & one for validation. After doing so, you’ll create 2 corresponding datasets on SINGA-Auto by uploading them from your filesystem.
Example (pre-processing step):
# Run this in shell
python examples/datasets/image_files/load_fashion_mnist.py
Example:
client.create_dataset(
    name='fashion_mnist_train',
    task='IMAGE_CLASSIFICATION',
    dataset_path='data/fashion_mnist_train.zip'
)
client.create_dataset(
    name='fashion_mnist_val',
    task='IMAGE_CLASSIFICATION',
    dataset_path='data/fashion_mnist_val.zip'
)
Output:
{'id': 'ecf87d2f-6893-4e4b-8ed9-1d9454af9763', 'name': 'fashion_mnist_train', 'size_bytes': 36702897, 'task': 'IMAGE_CLASSIFICATION'}
{'id': '7e9a2f8a-c61d-4365-ae4a-601e90892b88', 'name': 'fashion_mnist_val', 'size_bytes': 6116386, 'task': 'IMAGE_CLASSIFICATION'}
See also
singa_auto.client.Client.create_dataset()
Note
The code that preprocesses the original Fashion MNIST dataset is available at ./examples/datasets/image_files/load_mnist_format.py.
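For Fashion MNIST, the preprocessing script already produces separate training and validation files. For your own data, the split itself might look like the sketch below. The helper split_train_val and the 80/20 ratio are illustrative assumptions. Packaging each half into the format required by the task (see tasks) is a separate step that this sketch does not cover.

```python
import random

def split_train_val(items, val_fraction=0.2, seed=0):
    """Shuffle `items` reproducibly and split them into (train, val) lists."""
    if not 0 < val_fraction < 1:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical (image_path, class) pairs standing in for your own data.
samples = [("images/{}.png".format(i), i % 10) for i in range(100)]
train, val = split_train_val(samples)
print(len(train), len(val))  # 80 20
```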
Creating a train job¶
To create a model training job, you’ll specify the train & validation datasets by their IDs, together with your application’s name and its associated task.
After creating a train job, you can monitor it on SINGA-Auto Web Admin (see Using SINGA-Auto’s Web Admin).
Refer to the parameters of singa_auto.client.Client.create_train_job() for configuring how your train job runs on SINGA-Auto, such as enabling GPU usage & specifying which models to use.
Example:
client.create_train_job(
    app='fashion_mnist_app',
    task='IMAGE_CLASSIFICATION',
    train_dataset_id='ecf87d2f-6893-4e4b-8ed9-1d9454af9763',
    val_dataset_id='7e9a2f8a-c61d-4365-ae4a-601e90892b88',
    budget={'MODEL_TRIAL_COUNT': 5},
    model_ids='["652db9f7-d23d-4b79-945b-a56446ceff33"]'
)

# Omitting GPU_COUNT is the same as setting GPU_COUNT to 0, which means training runs on CPU only
# MODEL_TRIAL_COUNT is the number of trials; the minimum MODEL_TRIAL_COUNT for a valid training is 1
# TIME_HOURS sets the training time limit, in hours
# train_args={} can be left empty or unspecified if not in use
client.create_train_job(
    app='fashion_mnist_app',
    task='IMAGE_CLASSIFICATION',
    train_dataset_id='ecf87d2f-6893-4e4b-8ed9-1d9454af9763',
    val_dataset_id='7e9a2f8a-c61d-4365-ae4a-601e90892b88',
    budget={'TIME_HOURS': 0.01, 'GPU_COUNT': 0, 'MODEL_TRIAL_COUNT': 1},
    model_ids='["652db9f7-d23d-4b79-945b-a56446ceff33"]',
    train_args={}
)
Output:
{'app': 'fashion_mnist_app', 'app_version': 1, 'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}
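The budget dict combines the keys used above. The sketch below shows a hypothetical make_budget() helper that encodes the rules stated in the example's comments: MODEL_TRIAL_COUNT must be at least 1, and GPU_COUNT defaults to 0 for CPU-only training. Rejecting a non-positive TIME_HOURS or a negative GPU_COUNT is an added assumption, not a documented server rule.

```python
def make_budget(model_trial_count=1, time_hours=None, gpu_count=0):
    """Build a budget dict for create_train_job() (hypothetical helper)."""
    if model_trial_count < 1:
        raise ValueError("MODEL_TRIAL_COUNT must be at least 1")
    if gpu_count < 0:
        raise ValueError("GPU_COUNT cannot be negative")
    # GPU_COUNT 0 means training is hosted on CPU only.
    budget = {"MODEL_TRIAL_COUNT": model_trial_count, "GPU_COUNT": gpu_count}
    if time_hours is not None:
        if time_hours <= 0:
            raise ValueError("TIME_HOURS must be positive")
        budget["TIME_HOURS"] = time_hours
    return budget

print(make_budget(model_trial_count=5))
# {'MODEL_TRIAL_COUNT': 5, 'GPU_COUNT': 0}
```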
- Using distributed training:
- refer to https://pytorch.org/docs/stable/distributed.html
Example:
Output:
{'app': 'DistMinist', 'app_version': 1, 'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}
See also
singa_auto.client.Client.create_train_job()
Listing train jobs¶
Example:
client.get_train_jobs_of_app(app='fashion_mnist_app')
Output:
[{'app': 'fashion_mnist_app', 'app_version': 1, 'budget': {'MODEL_TRIAL_COUNT': 5}, 'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT', 'datetime_stopped': None, 'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8', 'status': 'RUNNING', 'task': 'IMAGE_CLASSIFICATION', 'val_dataset_id': '7e9a2f8a-c61d-4365-ae4a-601e90892b88', 'train_dataset_id': 'ecf87d2f-6893-4e4b-8ed9-1d9454af9763'}]
See also
singa_auto.client.Client.get_train_jobs_of_app()
Creating an inference job with the latest train job¶
To create a model serving job, you’ll have to wait for your train job to stop. Then, you’ll submit the app name associated with the train job (with a status of STOPPED). The inference job will be created from the best trials of that train job.
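Because the inference job can only be created after the train job is STOPPED, a script usually polls the train job first. This sketch assumes that client.get_train_job(app=...) returns a dict with a 'status' field, as in the train job details shown in Quick Start (Application Developers). The helper wait_until_stopped, its polling interval and its timeout are hypothetical. It checks only for the STOPPED status.

```python
import time

def wait_until_stopped(get_train_job, app, poll_seconds=30.0, timeout_seconds=3600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_train_job(app=app) until the job's 'status' is 'STOPPED', then return it.

    Raises TimeoutError if the job is still not STOPPED after timeout_seconds.
    `sleep` and `clock` are injectable so the function can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        job = get_train_job(app=app)
        if job.get("status") == "STOPPED":
            return job
        if clock() >= deadline:
            raise TimeoutError("train job for app {!r} is still {}".format(app, job.get("status")))
        sleep(poll_seconds)

# Usage with a logged-in client (not executed here):
# wait_until_stopped(client.get_train_job, app='fashion_mnist_app')
# client.create_inference_job(app='fashion_mnist_app')
```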
Example:
client.create_inference_job(app='fashion_mnist_app')

# Or with more details specified, such as the number of GPUs ('GPU_COUNT')
client.create_inference_job(app='fashion_mnist_app', app_version=1, budget={'GPU_COUNT': 1})
Output:
{'app': 'fashion_mnist_app', 'app_version': 1, 'id': '0477d03c-d312-48c5-8612-f9b37b368949', 'predictor_host': '127.0.0.1:30001', 'train_job_id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}
See also
singa_auto.client.Client.create_inference_job()
Listing inference jobs¶
Example:
client.get_inference_jobs_of_app(app='fashion_mnist_app')
Output:
{'app': 'fashion_mnist_app', 'app_version': 1, 'datetime_started': 'Mon, 17 Dec 2018 07:15:12 GMT', 'datetime_stopped': None, 'id': '0477d03c-d312-48c5-8612-f9b37b368949', 'predictor_host': '127.0.0.1:30000', 'status': 'RUNNING', 'train_job_id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}
See also
singa_auto.client.Client.get_inference_jobs_of_app()
Making predictions¶
Send a POST /predict to predictor_host with a body of the following format in JSON:
{ "query": <query> }
…where the format of <query> depends on the associated task (see tasks).
The body of the response will be of the following format in JSON:
{ "prediction": <prediction> }
…where the format of <prediction> depends on the associated task.
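To see exactly what goes over the wire, the sketch below uses only the Python standard library to build, without sending, the same POST /predict request. It assumes a plain http:// predictor_host, as in the example that follows. The helper build_predict_request is hypothetical. It also accepts the queries form used for batch predictions in Quick Start (Application Users).

```python
import json
import urllib.request

def build_predict_request(predictor_host, query=None, queries=None):
    """Build (but do not send) a POST /predict request with a JSON body.

    Pass `query` for a single prediction, or `queries` for a batch.
    """
    if (query is None) == (queries is None):
        raise ValueError("pass exactly one of query or queries")
    body = {"query": query} if queries is None else {"queries": queries}
    return urllib.request.Request(
        url="http://{}/predict".format(predictor_host),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_predict_request("127.0.0.1:30000", query=[[0, 1], [2, 3]])
print(req.get_method(), req.full_url)  # POST http://127.0.0.1:30000/predict

# To actually send it (needs a running predictor):
# with urllib.request.urlopen(req) as resp:
#     prediction = json.loads(resp.read().decode("utf-8"))["prediction"]
```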
Example:
If predictor_host is 127.0.0.1:30000, run the following in Python:
predictor_host = '127.0.0.1:30000'
query_path = 'examples/data/image_classification/fashion_mnist_test_1.png'

# Load query image as 3D list of pixels
from singa_auto.model import utils
[query] = utils.dataset.load_images([query_path]).tolist()

# Make request to predictor
import requests
res = requests.post('http://{}/predict'.format(predictor_host), json={'query': query})
print(res.json())
Output:
{'prediction': [0.9364003576825639, 1.016065009906697e-08, 0.0027604885399341583, 0.00014587241457775235, 6.018594376655528e-06, 1.042887332047826e-09, 0.060679372351310566, 2.024707311532037e-11, 7.901770004536957e-06, 1.5299328026685544e-08], 'predictions': []}
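For image classification, the prediction in the output above is a list of class probabilities. The sketch below turns that list into a label. It assumes the indices follow the standard Fashion-MNIST label order (0 = T-shirt/top through 9 = Ankle boot), which depends on how your training dataset numbered its classes. top_class is a hypothetical helper.

```python
# Fashion-MNIST class names in label order 0..9 (assumed to match the model's output order).
FASHION_MNIST_LABELS = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
                        "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def top_class(probabilities, labels=FASHION_MNIST_LABELS):
    """Return (index, label, probability) of the most probable class."""
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return index, labels[index], probabilities[index]

# The 'prediction' list from the output above.
prediction = [0.9364003576825639, 1.016065009906697e-08, 0.0027604885399341583,
              0.00014587241457775235, 6.018594376655528e-06, 1.042887332047826e-09,
              0.060679372351310566, 2.024707311532037e-11, 7.901770004536957e-06,
              1.5299328026685544e-08]
index, label, probability = top_class(prediction)
print(index, label)  # 0 T-shirt/top
```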
Prediction for QuestionAnswering¶
- Send the query questions in the following format:
data = {"questions": ["How long individuals are contagious?"]}
res = requests.post('http://{}/predict'.format(predictor_host), json=data)
- To print the prediction result, use res.text:
print(res.text)
Prediction for SpeechRecognition¶
- Pass the query data as follows:
import json
import requests

data = ['data/ldc93s1/ldc93s1/LDC93S1.wav']
data = json.dumps(data)
res = requests.post('http://{}/predict'.format(predictor_host), json=data[0])
- To print the prediction result, use res.text:
print(res.text)
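When adapting the SpeechRecognition snippet, keep two things in mind. The json= argument of requests serializes the object it receives as JSON. And json.dumps returns a str, so indexing that result selects a single character. The sketch below only shows the JSON body that each choice of argument would produce. This guide does not specify which payload the deployed speech model expects, so check it against that model's task format (see tasks).

```python
import json

paths = ["data/ldc93s1/ldc93s1/LDC93S1.wav"]

# Bodies that requests would send for json=<object>, for three choices of <object>:
body_list = json.dumps(paths)                # json=paths
body_path = json.dumps(paths[0])             # json=paths[0]
body_double = json.dumps(json.dumps(paths))  # json=json.dumps(paths): JSON text inside a JSON string

# Indexing the string returned by json.dumps gives one character, not the first path:
print(json.dumps(paths)[0])  # [
```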
If the SINGA-Auto instance is deployed with Kubernetes, all inference jobs are served at the default Ingress port 3005, at <host>:3005/<app>, where <host> is the host name of the SINGA-Auto instance and <app> is the application name provided when submitting the train job.
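The sketch below assembles the Ingress base URL described above. ingress_url is a hypothetical helper, and the http scheme is an assumption. Whether /predict is appended under this path depends on your Ingress configuration, which this guide does not specify.

```python
def ingress_url(host, app, port=3005, scheme="http"):
    """Return the Ingress base URL '<scheme>://<host>:<port>/<app>' for an app's inference job."""
    if not app or "/" in app:
        raise ValueError("app must be a non-empty name without '/'")
    return "{}://{}:{}/{}".format(scheme, host, port, app)

print(ingress_url("127.0.0.1", "fashion_mnist_app"))
# http://127.0.0.1:3005/fashion_mnist_app
```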
Stopping a running inference job¶
Example:
client.stop_inference_job(app='fashion_mnist_app')
See also
singa_auto.client.Client.stop_inference_job()
Quick Start (Model Developers)¶
As a Model Developer, you can manage models, datasets, train jobs & inference jobs on SINGA-Auto. This guide only highlights the key methods available to manage models.
To learn about how to manage datasets, train jobs & inference jobs, go to Quick Start (Application Developers).
This guide assumes that you have access to a running instance of SINGA-Auto Admin at <singa_auto_host>:<admin_port> and SINGA-Auto Web Admin at <singa_auto_host>:<web_admin_port>.
To learn more about what else you can do on SINGA-Auto, explore the methods of singa_auto.client.Client.
Installing the client¶
- Install Python 3.6 such that the python and pip commands point to the correct installation of Python (see Installing Python).
- Clone the project at https://github.com/nusdbsystem/singa-auto (e.g. with Git).
- Within the project’s root folder, install SINGA-Auto’s client-side Python dependencies by running:
pip install -r ./singa_auto/requirements.txt
Initializing the client¶
Example:
from singa_auto.client import Client

client = Client(admin_host='localhost', admin_port=3000)
client.login(email='superadmin@singaauto', password='singa_auto')
See also
singa_auto.client.Client.login()
Creating models¶
To create a model, you’ll need to submit a model class that conforms to the specification by singa_auto.model.BaseModel, written in a single Python file.
The model’s implementation should conform to a specific task (see tasks).
Refer to the parameters of singa_auto.client.Client.create_model() for configuring how your model runs on SINGA-Auto, and refer to Model Development Guide to understand more about how to write & test models for SINGA-Auto.
Example:
client.create_model(
    name='TfFeedForward',
    task='IMAGE_CLASSIFICATION',
    model_file_path='examples/models/image_classification/TfFeedForward.py',
    model_class='TfFeedForward',
    dependencies={'tensorflow': '1.12.0'}
)

client.create_model(
    name='SkDt',
    task='IMAGE_CLASSIFICATION',
    model_file_path='examples/models/image_classification/SkDt.py',
    model_class='SkDt',
    dependencies={'scikit-learn': '0.20.0'}
)
See also
singa_auto.client.Client.create_model()
Listing available models by task¶
Example:
client.get_available_models(task='IMAGE_CLASSIFICATION')

# If "task" is left unspecified, the method retrieves information about all uploaded models
client.get_available_models()
Output:
[{'access_right': 'PRIVATE', 'datetime_created': 'Mon, 17 Dec 2018 07:06:03 GMT', 'dependencies': {'tensorflow': '1.12.0'}, 'id': '45df3f34-53d7-4fb8-a7c2-55391ea10030', 'name': 'TfFeedForward', 'task': 'IMAGE_CLASSIFICATION', 'user_id': 'fb5671f1-c673-40e7-b53a-9208eb1ccc50'}, {'access_right': 'PRIVATE', 'datetime_created': 'Mon, 17 Dec 2018 07:06:03 GMT', 'dependencies': {'scikit-learn': '0.20.0'}, 'id': 'd0ea96ce-478b-4167-8a84-eb36ae631235', 'name': 'SkDt', 'task': 'IMAGE_CLASSIFICATION', 'user_id': 'fb5671f1-c673-40e7-b53a-9208eb1ccc50'}]
See also
singa_auto.client.Client.get_available_models()
Deleting a model¶
Example:
# Pass the ID of a model (the 'id' field from get_available_models()), not a user ID
client.delete_model('45df3f34-53d7-4fb8-a7c2-55391ea10030')
See also
singa_auto.client.Client.delete_model()
Quick Start (Application Developers)¶
As an App Developer, you can manage datasets, train jobs & inference jobs on SINGA-Auto. This guide walks through a full train-inference flow:
- Authenticating on SINGA-Auto
- Uploading datasets
- Creating a model training job
- Creating a model serving job after the model training job completes
This guide assumes that you have access to a running instance of SINGA-Auto Admin at <singa_auto_host>:<admin_port> and SINGA-Auto Web Admin at <singa_auto_host>:<web_admin_port>, and that models have been added to SINGA-Auto under the task IMAGE_CLASSIFICATION.
To learn more about what else you can do on SINGA-Auto, explore the methods of singa_auto.client.Client.
Installing the client¶
- Install Python 3.6 such that the python and pip commands point to the correct installation of Python (see Installing Python).
- Clone the project at https://github.com/nusdbsystem/singa-auto (e.g. with Git).
- Within the project’s root folder, install SINGA-Auto’s client-side Python dependencies by running:
pip install -r ./singa_auto/requirements.txt
Initializing the client¶
Example:
from singa_auto.client import Client

client = Client(admin_host='localhost', admin_port=3000)
client.login(email='superadmin@singaauto', password='singa_auto')
See also
singa_auto.client.Client.login()
Listing available models by task¶
Example:
client.get_available_models(task='IMAGE_CLASSIFICATION')

# If "task" is left unspecified, the method retrieves information about all uploaded models
client.get_available_models()
Output:
[{'access_right': 'PRIVATE', 'datetime_created': 'Mon, 17 Dec 2018 07:06:03 GMT', 'dependencies': {'tensorflow': '1.12.0'}, 'id': '45df3f34-53d7-4fb8-a7c2-55391ea10030', 'name': 'TfFeedForward', 'task': 'IMAGE_CLASSIFICATION', 'user_id': 'fb5671f1-c673-40e7-b53a-9208eb1ccc50'}, {'access_right': 'PRIVATE', 'datetime_created': 'Mon, 17 Dec 2018 07:06:03 GMT', 'dependencies': {'scikit-learn': '0.20.0'}, 'id': 'd0ea96ce-478b-4167-8a84-eb36ae631235', 'name': 'SkDt', 'task': 'IMAGE_CLASSIFICATION', 'user_id': 'fb5671f1-c673-40e7-b53a-9208eb1ccc50'}]
See also
singa_auto.client.Client.get_available_models()
Creating datasets¶
You’ll first need to convert your dataset into a format specified by one of the tasks (see tasks), and split it into two files: one for training & one for validation. After doing so, you’ll create 2 corresponding datasets on SINGA-Auto by uploading them from your filesystem.
Example (pre-processing step):
# Run this in shell
python examples/datasets/image_files/load_fashion_mnist.py
Example:
client.create_dataset(
    name='fashion_mnist_train',
    task='IMAGE_CLASSIFICATION',
    dataset_path='data/fashion_mnist_train.zip'
)
client.create_dataset(
    name='fashion_mnist_val',
    task='IMAGE_CLASSIFICATION',
    dataset_path='data/fashion_mnist_val.zip'
)
Output:
{'id': 'ecf87d2f-6893-4e4b-8ed9-1d9454af9763', 'name': 'fashion_mnist_train', 'size_bytes': 36702897, 'task': 'IMAGE_CLASSIFICATION'}
{'id': '7e9a2f8a-c61d-4365-ae4a-601e90892b88', 'name': 'fashion_mnist_val', 'size_bytes': 6116386, 'task': 'IMAGE_CLASSIFICATION'}
See also
singa_auto.client.Client.create_dataset()
Note
The code that preprocesses the original Fashion MNIST dataset is available at ./examples/datasets/image_files/load_mnist_format.py.
Creating a train job¶
To create a model training job, you’ll specify the train & validation datasets by their IDs, together with your application’s name and its associated task.
After creating a train job, you can monitor it on SINGA-Auto Web Admin (see Using SINGA-Auto’s Web Admin).
Refer to the parameters of singa_auto.client.Client.create_train_job() for configuring how your train job runs on SINGA-Auto, such as enabling GPU usage & specifying which models to use.
Example:
client.create_train_job(
    app='fashion_mnist_app',
    task='IMAGE_CLASSIFICATION',
    train_dataset_id='ecf87d2f-6893-4e4b-8ed9-1d9454af9763',
    val_dataset_id='7e9a2f8a-c61d-4365-ae4a-601e90892b88',
    budget={'MODEL_TRIAL_COUNT': 5},
    model_ids='["652db9f7-d23d-4b79-945b-a56446ceff33"]'
)

# Omitting GPU_COUNT is the same as setting GPU_COUNT to 0, which means training runs on CPU only
# MODEL_TRIAL_COUNT is the number of trials; the minimum MODEL_TRIAL_COUNT for a valid training is 1
# TIME_HOURS sets the training time limit, in hours
# train_args={} can be left empty or unspecified if not in use
client.create_train_job(
    app='fashion_mnist_app',
    task='IMAGE_CLASSIFICATION',
    train_dataset_id='ecf87d2f-6893-4e4b-8ed9-1d9454af9763',
    val_dataset_id='7e9a2f8a-c61d-4365-ae4a-601e90892b88',
    budget={'TIME_HOURS': 0.01, 'GPU_COUNT': 0, 'MODEL_TRIAL_COUNT': 1},
    model_ids='["652db9f7-d23d-4b79-945b-a56446ceff33"]',
    train_args={}
)
Output:
{'app': 'fashion_mnist_app', 'app_version': 1, 'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}
- Using distributed training:
- refer to https://pytorch.org/docs/stable/distributed.html
Example:
Output:
{'app': 'DistMinist', 'app_version': 1, 'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}
See also
singa_auto.client.Client.create_train_job()
Listing train jobs¶
Example:
client.get_train_jobs_of_app(app='fashion_mnist_app')
Output:
[{'app': 'fashion_mnist_app', 'app_version': 1, 'budget': {'MODEL_TRIAL_COUNT': 5}, 'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT', 'datetime_stopped': None, 'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8', 'status': 'RUNNING', 'task': 'IMAGE_CLASSIFICATION', 'val_dataset_id': '7e9a2f8a-c61d-4365-ae4a-601e90892b88', 'train_dataset_id': 'ecf87d2f-6893-4e4b-8ed9-1d9454af9763'}]
See also
singa_auto.client.Client.get_train_jobs_of_app()
Retrieving the latest train job’s details¶
Example:
client.get_train_job(app='fashion_mnist_app')
Output:
{'app': 'fashion_mnist_app', 'app_version': 1, 'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT', 'datetime_stopped': 'Mon, 17 Dec 2018 07:11:11 GMT', 'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8', 'status': 'STOPPED', 'task': 'IMAGE_CLASSIFICATION', 'val_dataset_id': '7e9a2f8a-c61d-4365-ae4a-601e90892b88', 'train_dataset_id': 'ecf87d2f-6893-4e4b-8ed9-1d9454af9763', 'workers': [{'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT', 'datetime_stopped': 'Mon, 17 Dec 2018 07:11:14 GMT', 'model_name': 'SkDt', 'replicas': 2, 'service_id': '2ada1ff3-84e9-4eca-bac9-241cd8c765ef', 'status': 'STOPPED'}, {'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT', 'datetime_stopped': 'Mon, 17 Dec 2018 07:11:42 GMT', 'model_name': 'TfFeedForward', 'replicas': 2, 'service_id': '81ff23a7-ddd0-4a62-9d86-a3cc985ca6fe', 'status': 'STOPPED'}]}
See also
singa_auto.client.Client.get_train_job()
Listing best trials of the latest train job¶
Example:
client.get_best_trials_of_train_job(app='fashion_mnist_app')
Output:
[{'datetime_started': 'Mon, 17 Dec 2018 07:09:17 GMT', 'datetime_stopped': 'Mon, 17 Dec 2018 07:11:38 GMT', 'id': '1b7dc65a-87ae-4d42-9a01-67602115a4a4', 'knobs': {'batch_size': 32, 'epochs': 3, 'hidden_layer_count': 2, 'hidden_layer_units': 36, 'image_size': 32, 'learning_rate': 0.014650971133579896}, 'model_name': 'TfFeedForward', 'score': 0.8269}, {'datetime_started': 'Mon, 17 Dec 2018 07:08:38 GMT', 'datetime_stopped': 'Mon, 17 Dec 2018 07:11:11 GMT', 'id': '0c1f9184-7b46-4aaf-a581-be62bf3f49bf', 'knobs': {'criterion': 'entropy', 'max_depth': 4}, 'model_name': 'SkDt', 'score': 0.6686}]
See also
singa_auto.client.Client.get_best_trials_of_train_job()
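To decide which trial to serve or download, it can help to group the best trials by model. This sketch assumes that each trial dict carries model_name, id and score, as in the output above. best_trial_per_model is a hypothetical helper. It does not assume the list already contains only one trial per model.

```python
def best_trial_per_model(trials):
    """Map each model_name to its highest-scoring trial."""
    best = {}
    for trial in trials:
        name = trial["model_name"]
        if name not in best or trial["score"] > best[name]["score"]:
            best[name] = trial
    return best

# Trimmed from the get_best_trials_of_train_job() output above.
trials = [
    {"id": "1b7dc65a-87ae-4d42-9a01-67602115a4a4", "model_name": "TfFeedForward", "score": 0.8269},
    {"id": "0c1f9184-7b46-4aaf-a581-be62bf3f49bf", "model_name": "SkDt", "score": 0.6686},
]
for name, trial in sorted(best_trial_per_model(trials).items()):
    print(name, trial["id"], trial["score"])
# SkDt 0c1f9184-7b46-4aaf-a581-be62bf3f49bf 0.6686
# TfFeedForward 1b7dc65a-87ae-4d42-9a01-67602115a4a4 0.8269
```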
Creating an inference job with the latest train job¶
Your app’s users will make queries to the /predict endpoint of predictor_host over HTTP.
To create a model serving job, you’ll have to wait for your train job to stop. Then, you’ll submit the app name associated with the train job (with a status of STOPPED). The inference job will be created from the best trials of that train job.
Example:
client.create_inference_job(app='fashion_mnist_app')

# Or with more details specified, such as the number of GPUs ('GPU_COUNT')
client.create_inference_job(app='fashion_mnist_app', app_version=1, budget={'GPU_COUNT': 1})
Output:
{'app': 'fashion_mnist_app', 'app_version': 1, 'id': '0477d03c-d312-48c5-8612-f9b37b368949', 'predictor_host': '127.0.0.1:30001', 'train_job_id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}
See also
singa_auto.client.Client.create_inference_job()
Listing inference jobs¶
Example:
client.get_inference_jobs_of_app(app='fashion_mnist_app')
Output:
{'app': 'fashion_mnist_app', 'app_version': 1, 'datetime_started': 'Mon, 17 Dec 2018 07:15:12 GMT', 'datetime_stopped': None, 'id': '0477d03c-d312-48c5-8612-f9b37b368949', 'predictor_host': '127.0.0.1:30000', 'status': 'RUNNING', 'train_job_id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}
See also
singa_auto.client.Client.get_inference_jobs_of_app()
Retrieving details of running inference job¶
See also
singa_auto.client.Client.get_running_inference_job()
Example:
client.get_running_inference_job(app='fashion_mnist_app')
Output:
{'app': 'fashion_mnist_app', 'app_version': 1, 'datetime_started': 'Mon, 17 Dec 2018 07:25:36 GMT', 'datetime_stopped': None, 'id': '09e5040e-2134-411b-855f-793927c80b4b', 'predictor_host': '127.0.0.1:30000', 'status': 'RUNNING', 'train_job_id': 'ec4db479-b9b2-4289-8086-52794ffc71c8', 'workers': [{'datetime_started': 'Mon, 17 Dec 2018 07:25:36 GMT', 'datetime_stopped': None, 'replicas': 2, 'service_id': '661035bb-3966-46e8-828c-e200960a76c0', 'status': 'RUNNING', 'trial': {'id': '1b7dc65a-87ae-4d42-9a01-67602115a4a4', 'knobs': {'batch_size': 32, 'epochs': 3, 'hidden_layer_count': 2, 'hidden_layer_units': 36, 'image_size': 32, 'learning_rate': 0.014650971133579896}, 'model_name': 'TfFeedForward', 'score': 0.8269}}, {'datetime_started': 'Mon, 17 Dec 2018 07:25:36 GMT', 'datetime_stopped': None, 'replicas': 2, 'service_id': '6a769007-b18f-4271-b3db-8b60ed5fb545', 'status': 'RUNNING', 'trial': {'id': '0c1f9184-7b46-4aaf-a581-be62bf3f49bf', 'knobs': {'criterion': 'entropy', 'max_depth': 4}, 'model_name': 'SkDt', 'score': 0.6686}}]}
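The sketch below condenses the workers of a running inference job into one row per served trial. It assumes the field layout shown in the output above. serving_summary is a hypothetical helper.

```python
def serving_summary(inference_job):
    """List (model_name, score, replicas, status) for each worker of an inference job."""
    rows = []
    for worker in inference_job.get("workers", []):
        trial = worker.get("trial", {})
        rows.append((trial.get("model_name"), trial.get("score"),
                     worker.get("replicas"), worker.get("status")))
    return rows

# Trimmed from the get_running_inference_job() output above.
job = {
    "predictor_host": "127.0.0.1:30000",
    "status": "RUNNING",
    "workers": [
        {"replicas": 2, "status": "RUNNING",
         "trial": {"model_name": "TfFeedForward", "score": 0.8269}},
        {"replicas": 2, "status": "RUNNING",
         "trial": {"model_name": "SkDt", "score": 0.6686}},
    ],
}
for row in serving_summary(job):
    print(row)
# ('TfFeedForward', 0.8269, 2, 'RUNNING')
# ('SkDt', 0.6686, 2, 'RUNNING')
```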
Stopping a running inference job¶
Example:
client.stop_inference_job(app='fashion_mnist_app')
See also
singa_auto.client.Client.stop_inference_job()
Downloading the trained model for a trial¶
After running a train job, you might want to download the trained model instance of a trial of the train job, instead of creating an inference job to make predictions. Subsequently, you’ll be able to make batch predictions locally with the trained model instance.
To do this, you must have the trial’s model class file in your local filesystem, the model’s dependencies must be installed separately, and the model class must be imported and passed into singa_auto.client.Client.load_trial_model().
To download the model class file, use the method singa_auto.client.Client.download_model_file().
Example:
In shell,
# Install the dependencies of the `TfFeedForward` model
pip install tensorflow==1.12.0
In Python,
# Find the best trial for model `TfFeedForward`
trials = [x for x in client.get_best_trials_of_train_job(app='fashion_mnist_app')
          if x.get('model_name') == 'TfFeedForward' and x.get('status') == 'COMPLETED']
trial = trials[0]
trial_id = trial.get('id')

# Import the model class
from examples.models.image_classification.TfFeedForward import TfFeedForward

# Load an instance of the model with trial's parameters
model_inst = client.load_trial_model(trial_id, TfFeedForward)

# Make predictions with trained model instance associated with best trial
# The query is a single 28x28 grayscale image; rows 0-6 and 22-27 are blank
queries = [
    [[0] * 28 for _ in range(7)] + [
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 0, 0, 7, 0, 37, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 27, 84, 11, 0, 0, 0, 0, 0, 0, 119, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 88, 143, 110, 0, 0, 0, 0, 22, 93, 106, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 53, 129, 120, 147, 175, 157, 166, 135, 154, 168, 140, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 11, 137, 130, 128, 160, 176, 159, 167, 178, 149, 151, 144, 0, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 2, 1, 0, 3, 0, 0, 115, 114, 106, 137, 168, 153, 156, 165, 167, 143, 157, 158, 11, 0],
        [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 89, 139, 90, 94, 153, 149, 131, 151, 169, 172, 143, 159, 169, 48, 0],
        [0, 0, 0, 0, 0, 0, 2, 4, 1, 0, 0, 0, 98, 136, 110, 109, 110, 162, 135, 144, 149, 159, 167, 144, 158, 169, 119, 0],
        [0, 0, 2, 2, 1, 2, 0, 0, 0, 0, 26, 108, 117, 99, 111, 117, 136, 156, 134, 154, 154, 156, 160, 141, 147, 156, 178, 0],
        [3, 0, 0, 0, 0, 0, 0, 21, 53, 92, 117, 111, 103, 115, 129, 134, 143, 154, 165, 170, 154, 151, 154, 143, 138, 150, 165, 43],
        [0, 0, 23, 54, 65, 76, 85, 118, 128, 123, 111, 113, 118, 127, 125, 139, 133, 136, 160, 140, 155, 161, 144, 155, 172, 161, 189, 62],
        [0, 68, 94, 90, 111, 114, 111, 114, 115, 127, 135, 136, 143, 126, 127, 151, 154, 143, 148, 125, 162, 162, 144, 138, 153, 162, 196, 58],
        [70, 169, 129, 104, 98, 100, 94, 97, 98, 102, 108, 106, 119, 120, 129, 149, 156, 167, 190, 190, 196, 198, 198, 187, 197, 189, 184, 36],
        [16, 126, 171, 188, 188, 184, 171, 153, 135, 120, 126, 127, 146, 185, 195, 209, 208, 255, 209, 177, 245, 252, 251, 251, 247, 220, 206, 49],
        [0, 0, 0, 12, 67, 106, 164, 185, 199, 210, 211, 210, 208, 190, 150, 82, 8, 0, 0, 0, 178, 208, 188, 175, 162, 158, 151, 11],
    ] + [[0] * 28 for _ in range(6)]
]
print(model_inst.predict(queries))
See also
singa_auto.client.Client.load_trial_model()
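With a loaded model_inst, large query sets can be predicted in chunks. This sketch assumes that model_inst.predict accepts a list of queries and returns one prediction per query, as print(model_inst.predict(queries)) above suggests. predict_in_batches and fake_predict are hypothetical. fake_predict stands in for a real trained model so that the sketch runs on its own.

```python
def predict_in_batches(predict, queries, batch_size=64):
    """Call predict() on consecutive slices of `queries` and concatenate the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    predictions = []
    for start in range(0, len(queries), batch_size):
        predictions.extend(predict(queries[start:start + batch_size]))
    return predictions

# Stand-in for model_inst.predict: returns the pixel sum of each image.
def fake_predict(batch):
    return [sum(map(sum, image)) for image in batch]

images = [[[i, i], [i, i]] for i in range(5)]  # five tiny 2x2 "images"
print(predict_in_batches(fake_predict, images, batch_size=2))  # [0, 4, 8, 12, 16]
```

With a real model, the call would be predict_in_batches(model_inst.predict, queries, batch_size=64). Choose the batch size to fit your available memory.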
Quick Start (Application Users)¶
As an App User, you can make predictions on models deployed on SINGA-Auto.
Making a single prediction¶
Your app developer should have created an inference job and shared predictor_host, the host to which you send queries and from which you receive predictions over HTTP.
Send a POST /predict to predictor_host with a body of the following format in JSON:
{ "query": <query> }
…where the format of <query> depends on the associated task (see tasks).
The body of the response will be of the following format in JSON:
{ "prediction": <prediction> }
…where the format of <prediction> depends on the associated task.
Example:
If predictor_host is 127.0.0.1:30000, run the following in Python:
predictor_host = '127.0.0.1:30000'
query_path = 'examples/data/image_classification/fashion_mnist_test_1.png'

# Load query image as 3D list of pixels
from singa_auto.model import utils
[query] = utils.dataset.load_images([query_path]).tolist()

# Make request to predictor
import requests
res = requests.post('http://{}/predict'.format(predictor_host), json={'query': query})
print(res.json())
Output:
{'prediction': [0.9364003576825639, 1.016065009906697e-08, 0.0027604885399341583, 0.00014587241457775235, 6.018594376655528e-06, 1.042887332047826e-09, 0.060679372351310566, 2.024707311532037e-11, 7.901770004536957e-06, 1.5299328026685544e-08], 'predictions': []}
Prediction for QuestionAnswering¶
- Send the query questions in the following format:
data = {"questions": ["How long individuals are contagious?"]}
res = requests.post('http://{}/predict'.format(predictor_host), json=data)
- To print the prediction result, use res.text:
print(res.text)
Prediction for SpeechRecognition¶
- Pass the query data as follows:
import json
import requests

data = ['data/ldc93s1/ldc93s1/LDC93S1.wav']
data = json.dumps(data)
res = requests.post('http://{}/predict'.format(predictor_host), json=data[0])
- To print the prediction result, use res.text:
print(res.text)
If the SINGA-Auto instance is deployed with Kubernetes, all inference jobs are served at the default Ingress port 3005, at <host>:3005/<app>, where <host> is the host name of the SINGA-Auto instance and <app> is the application name provided when submitting the train job.
Making batch predictions¶
Similar to making a single prediction, but use the queries attribute instead of query in your request and pass an array of queries instead.
Example:
If predictor_host is 127.0.0.1:30000, run the following in Python:
predictor_host = '127.0.0.1:30000'
query_paths = ['examples/data/image_classification/fashion_mnist_test_1.png',
               'examples/data/image_classification/fashion_mnist_test_2.png']

# Load query images as 3D lists of pixels
from singa_auto.model import utils
queries = utils.dataset.load_images(query_paths).tolist()

# Make request to predictor
import requests
res = requests.post('http://{}/predict'.format(predictor_host), json={'queries': queries})
print(res.json())
Output:
{'prediction': None, 'predictions': [[0.9364002384732744, 1.0160608354681244e-08, 0.0027604878414422274, 0.0001458720798837021, 6.018587100697914e-06, 1.0428869989809186e-09, 0.06067946175827773, 2.0247028012509993e-11, 7.901745448180009e-06, 1.5299294275905595e-08], [0.866741563402005, 5.757699909736402e-05, 0.0006144539802335203, 0.03480150588134776, 3.4249271266162395e-05, 1.3578344004727683e-09, 0.09774905198545598, 6.071191726436664e-12, 1.5324986861742218e-06, 1.583319586551113e-10]]}
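The outputs in this guide show two response shapes. A single query returns its result in prediction. A batch sets prediction to None and fills predictions instead. The sketch below is a client-side helper that handles both shapes, assuming other tasks use the same envelope. extract_predictions is hypothetical, and the short two-value lists are stand-ins, not real model output.

```python
def extract_predictions(response_json):
    """Return a list of predictions from a /predict response body.

    Single-query responses carry the result in 'prediction'; batch responses set
    'prediction' to None and use the 'predictions' list instead.
    """
    if response_json.get("prediction") is not None:
        return [response_json["prediction"]]
    return list(response_json.get("predictions") or [])

single = {"prediction": [0.94, 0.06], "predictions": []}
batch = {"prediction": None, "predictions": [[0.94, 0.06], [0.87, 0.13]]}
print(len(extract_predictions(single)), len(extract_predictions(batch)))  # 1 2
```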
Quick Start (Admins)¶
As an Admin, you can manage users, datasets, models, train jobs & inference jobs on SINGA-Auto. This guide only highlights the key methods available to manage users.
To learn about how to manage models, go to Quick Start (Model Developers).
To learn about how to manage train & inference jobs, go to Quick Start (Application Developers).
This guide assumes that you have access to a running instance of SINGA-Auto Admin at <singa_auto_host>:<admin_port>, e.g., 127.0.0.1:3000, and SINGA-Auto Web Admin at <singa_auto_host>:<web_admin_port>, e.g., 127.0.0.1:3001.
Installation¶
- Install Python 3.6 such that python and pip point to the correct installation of Python (see Installing Python).
- Clone the project at https://github.com/nusdbsystem/singa-auto (e.g. with Git).
- Within the project’s root folder, install SINGA-Auto’s client-side Python dependencies by running:
pip install -r ./singa_auto/requirements.txt
Initializing the client¶
Example:
from singa_auto.client import Client
client = Client(admin_host='localhost', admin_port=3000) # 'localhost' can be replaced by '127.0.0.1' or other server address
client.login(email='superadmin@singaauto', password='singa_auto')
See also
singa_auto.client.Client.login()
Creating users¶
Examples:
client.create_user(email='admin@singaauto', password='singa_auto', user_type='ADMIN')
client.create_user(email='model_developer@singaauto', password='singa_auto', user_type='MODEL_DEVELOPER')
client.create_user(email='app_developer@singaauto', password='singa_auto', user_type='APP_DEVELOPER')
See also
singa_auto.client.Client.create_user()
Listing all users¶
Example:
client.get_users()
Output:
[{'email': 'superadmin@singaauto', 'id': 'c815fa08-ce06-467d-941b-afc27684d092', 'user_type': 'SUPERADMIN'},
 {'email': 'admin@singaauto', 'id': 'cb2c0d61-acd3-4b65-a5a7-d78aa5648283', 'user_type': 'ADMIN'},
 {'email': 'model_developer@singaauto', 'id': 'bfe58183-9c69-4fbd-a7b3-3fdc267b3290', 'user_type': 'MODEL_DEVELOPER'},
 {'email': 'app_developer@singaauto', 'id': '958a7d65-aa1d-437f-858e-8837bb3ecf32', 'user_type': 'APP_DEVELOPER'}]
See also
singa_auto.client.Client.get_users()
Banning a user¶
Example:
client.ban_user('app_developer@singaauto')
See also
singa_auto.client.Client.ban_user()
Using SINGA-Auto’s Web Admin¶
SINGA-Auto Web Admin is accessible at <singa_auto_host>:<web_admin_port> (e.g. 127.0.0.1:3001 by default).
Log in with the same credentials as for SINGA-Auto Admin.
You’re currently able to view your own train jobs & datasets, and additionally create train jobs & datasets for the IMAGE_CLASSIFICATION task.
Supported Tasks¶
Each task has an associated Dataset Format, Query Format and Prediction Format.
A task’s Dataset Format specifies the format of the dataset files. Datasets are prepared by Application Developers when they create Train Jobs and received by Model Developers when they define singa_auto.model.BaseModel.train and singa_auto.model.BaseModel.evaluate.
A task’s Query Format specifies the format of queries when they are passed to models. Queries are generated by Application Users when they send queries to Inference Jobs and received by Model Developers when they define singa_auto.model.BaseModel.predict.
A task’s Prediction Format specifies the format of predictions made by models. Predictions are generated by Model Developers when they define singa_auto.model.BaseModel.predict and received by Application Users as predictions to their queries sent to Inference Jobs.
IMAGE_SEGMENTATION¶
Dataset Format¶
dataset-type: SEGMENTATION_IMAGES
note
We use the same annotation format as the Pascal VOC segmentation dataset.
- An image and its corresponding mask should have the same width and height, while the number of channels can differ. For example, an image can have three channels representing RGB values, but its mask should have only one grayscale channel.
- In the mask image, each pixel’s grayscale value represents its label, while a specific value can mark meaningless pixels such as paddings or borders (the same meaning as ignore_label in some loss functions).
Query Format¶
An image file in the following common formats: .jpg, .jpeg, .png, .gif, .bmp, or .tiff.
Prediction Format¶
A W x H single-channel mask image file with each pixel’s grayscale value representing its label.
IMAGE_CLASSIFICATION¶
Dataset Format¶
dataset-type: IMAGE_FILES
- There is only 1 tag column of class, corresponding to the class of the image as an integer from 0 to k - 1, where k is the total no. of classes.
- The train & validation datasets’ images should have the same dimensions W x H and the same total no. of classes.
An example:
path,class
image-0-of-class-0.png,0
image-1-of-class-0.png,0
...
image-0-of-class-1.png,1
...
image-99-of-class-9.png,9
note
You can refer to and run ./examples/datasets/image_files/load_folder_format.py (https://github.com/nusdbsystem/singa-auto/tree/master/examples/datasets/load_folder_format.py) for converting directories of images to SINGA-Auto's IMAGE_CLASSIFICATION format.
Query Format¶
An image file in the following common formats: .jpg, .jpeg, .png, .gif, .bmp, or .tiff.
Prediction Format¶
A jsonified string representing the classification result. There are no strict requirements for the format of the output string, which is entirely determined by the model itself, such as directly outputting the label or class name of the classification result, one-hot encoding, or the probability corresponding to each class.
OBJECT_DETECTION¶
Dataset Format¶
dataset-type: DETECTION_DATASET
It is recommended to follow the YOLO dataset format.
- For the folder hierarchy, two folders, ‘images’ and ‘labels’, should be prepared. The ‘images’ folder holds PIL-loadable images, and the corresponding txt label files should be placed in the ‘labels’ folder, with the same basenames as the images.
- The label file format is as follows, where object-id is the index of the object's class, and the following four numbers are normalized to the range 0 to 1 by dividing by the width and height of the image: center_x center_y are the central coordinates of the bounding box, and width height are its side lengths. An empty label file (a negative sample) is allowed, meaning there are no objects to detect in the image.
object-id center_x center_y width height
...
- In addition, train.txt and valid.txt can be provided to list the images used for training/validation, containing only the paths of the image files. A class.names file contains the category names, and their line numbers are the object-id values.
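The normalization described above can be sketched as a small helper that turns a pixel-space box into one label line. The corner-based input (x_min, y_min, x_max, y_max) and the name to_yolo_line are assumptions for illustration; only the output layout (object-id center_x center_y width height, normalized by image width and height) follows the format described above.

```python
def to_yolo_line(object_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO label line.

    Hypothetical helper: x values are divided by the image width and
    y values by the image height, so every number lies in [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2 / img_w
    center_y = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return "{} {:.6f} {:.6f} {:.6f} {:.6f}".format(
        object_id, center_x, center_y, width, height)


if __name__ == "__main__":
    # A 100x50 box in the top-left corner of a 400x200 image, class index 0
    print(to_yolo_line(0, (0, 0, 100, 50), 400, 200))
```

Each line returned this way would go into the label .txt file that shares its basename with the image; an image with no objects simply gets an empty file.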
Query Format¶
An image file in the following common formats: .jpg, .jpeg, .png, .gif, .bmp, or .tiff.
Prediction Format¶
A jsonified dict (string) indicating the bounding boxes and their corresponding classes. The keys and value formats are strictly required as follows:
{'explanations':
{'box_info': [{'coord': (224, 275, 281, 357),
'class_name': 'person'},
{'coord': (64, 263, 150, 368),
'class_name': 'person'}]
}
}
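A model's predict step could assemble this structure as sketched below. Using json.dumps to produce the "jsonified" string is an assumption of this sketch, and the function name is hypothetical; the coord values are passed through unchanged because the format above does not specify whether they are corners or another box encoding.

```python
import json


def make_detection_prediction(boxes):
    """Build an OBJECT_DETECTION prediction string.

    `boxes` is a list of (coord, class_name) pairs; coord is copied as-is.
    """
    box_info = [{"coord": tuple(coord), "class_name": name} for coord, name in boxes]
    return json.dumps({"explanations": {"box_info": box_info}})


if __name__ == "__main__":
    print(make_detection_prediction([((224, 275, 281, 357), "person"),
                                     ((64, 263, 150, 368), "person")]))
```

Note that JSON has no tuple type, so a client decoding this string with json.loads receives each coord as a list.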
GENERAL_TASK¶
Dataset Format¶
dataset-type: GENERAL_FILES
- As its name suggests, the general task can include a task (or model) from any domain, such as image processing, NLP, speech, or video.
- There are no requirements on the form of the dataset, as long as it can be read into memory as a file. However, the model developer must know in advance how to handle the read-in file.
Query Format¶
The query must be a file. It can be in any file format, as long as it matches the input the model expects.
Prediction Format¶
As with the query, the prediction is returned as an output file, as defined by the model.
POS_TAGGING¶
Dataset Format¶
dataset-type:CORPUS
- Sentences are delimited by \n tokens.
- There is only 1 tag column of tag, corresponding to the POS tag of the token as an integer from 0 to k-1.
An example:
token tag
Two 3
leading 2
...
line-item 1
veto 5
. 4
\n 0
Professors 6
Philip 6
...
previous 1
presidents 8
. 4
\n 0
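A sketch of reading such a corpus.tsv back into sentences is shown below. It assumes the sentence delimiter is stored in the file as the literal two-character string \n (a backslash followed by n), since a real newline cannot appear inside a TSV field; read_sentences is a hypothetical name.

```python
import csv
import io

# Assumption: sentence boundaries are stored as the literal text "\n"
# (backslash + "n"), not as an actual line break.
SENTENCE_DELIMITER = "\\n"


def read_sentences(tsv_text):
    """Group the (token, tag) rows of a corpus.tsv into sentences."""
    # QUOTE_NONE keeps tokens such as '"' from being treated as CSV quoting
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    sentences, current = [], []
    for row in reader:
        if row["token"] == SENTENCE_DELIMITER:
            if current:
                sentences.append(current)
            current = []
        else:
            current.append((row["token"], int(row["tag"])))
    if current:  # last sentence may lack a trailing delimiter
        sentences.append(current)
    return sentences
```

The token lists of each sentence are what a POS_TAGGING query carries, and the tag lists are what a prediction should return for it.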
Query Format¶
An array of strings representing a sentence as a list of tokens in that sentence.
Prediction Format¶
An array of integers representing the predicted tag for each token of the sentence, in sequence.
QUESTION_ANSWERING¶
COVID19 Task Dataset Format¶
dataset-type:QUESTION_ANSWERING_COVID19
The dataset can be used to fine-tune the SQuAD-pretrained BERT model.
- The dataset is a zip archive of folders containing JSON files. JSON files at all folder levels are read together automatically.
Dataset structure example:
/DATASET_NAME.zip
│
├──FOLDER_NAME_1 # first level folder
│ └──FOLDER_NAME_2 # second level folder, not necessarily to be included
│ └──FOLDER_NAME_3 # third level folder, not necessarily to be included
│ ├── 003d2e515e1aaf06f0052769953e8.json # JSON file name is a random combination of either alphabets/numbers or both
│ ├── 00a407540a8bdd.json
│ ...
│
├──FOLDER_NAME_4 # first level folder
│ ├── 0015023cc06b5362d332b3.json
│ ├── 001b4a31684c8fc6e2cfbb70304354978317c429.json
│ ...
...
│
└──metadata.csv # if additional information is provided for above JSON files, user can add a metadata.csv
- Each JSON file includes body_text, a list of paragraphs in the full body that can be used for question answering. body_text can contain multiple entries; only the “text” field of each entry is read.
- For JSON files extracted from papers, there is one JSON file per paper. If additional information about the papers is given in metadata.csv, each JSON file is linked to its metadata.csv entry via their sha values.
- For datasets that provide their own information paragraph, the body_text > text entry is a string in the format <question> + <\n> + <information paragraph>. In this case, neither a sha value nor a metadata.csv file is needed.
Sample of JSON file:
# JSON file 1 # for example, a JSON file extracted from one paper
{
"sha": <str>, # 40-character sha1 of the PDF, this field is only required for JSON extracted from papers. it will be read into model in forms of string
"body_text": [ # list of paragraphs in full body, this is must-have
{
"text": <str>, # text body for first entry, which is for one paragraph of this paper. this is must-have. it will be read as string into model
}
... # other 'text' blocks, i.e. paragraph blocks the same as above; all ‘text’ strings will then be handled and processed into a pandas DataFrame
],
}
# ---------------------------------------------------------------------------------------------------------------------- #
# JSON file 2 # for example, a JSON file extracted from SQuAD2.0
{
"body_text": [ # list of paragraphs in full body, this is must-have
{
"text": 'What are the treatments for Age-related Macular Degeneration ?\n If You Have Advanced AMD Once dry AMD reaches the advanced stage, no form of treatment can prevent vision loss...',
# text body for first entry, this is must-have
},
... # other 'text' blocks, i.e. paragraphs blocks look the same as above
],
}
metadata.csv is not strictly required. Users can provide additional information with it, e.g. authors, title, journal and publish_time, mapped to each JSON file by its sha value. cord_uid holds unique values that serve as the entry identity. For time-sensitive entries, a publish_time value in Date format is advised; for other values, General format is recommended.
Sample of a metadata.csv entry:
Column Name | Column Value
cord_uid | zjufx4fo
sha | b2897e1277f56641193a6db73825f707eed3e4c9
source_x | PMC
title | Sequence requirements for RNA strand transfer during nidovirus …
doi | 10.1093/emboj/20.24.7220
pmcid | PMC125340
pubmed_id | 11742998
license | unk
abstract | Nidovirus subgenomic mRNAs contain a leader sequence derived …
publish_time | 2001-12-17
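To make the linking between JSON files and metadata.csv concrete, here is a sketch of how such an archive could be read. The function name and the returned record layout are hypothetical and not the model's actual loader; it only follows the rules above: JSON files at any folder depth, only body_text "text" fields, and an optional root-level metadata.csv joined on sha.

```python
import csv
import io
import json
import zipfile


def load_covid_paragraphs(zip_file):
    """Collect body_text paragraphs from every JSON file in the archive.

    zip_file may be a path or a binary file object. Each record holds the
    paragraph text, the file's sha (or None) and the publish_time found in
    metadata.csv for that sha (or None).
    """
    records = []
    with zipfile.ZipFile(zip_file) as zf:
        names = zf.namelist()
        publish_times = {}
        if "metadata.csv" in names:  # optional file at the archive root
            text = zf.read("metadata.csv").decode("utf-8")
            for row in csv.DictReader(io.StringIO(text)):
                if row.get("sha"):
                    publish_times[row["sha"]] = row.get("publish_time")
        for name in names:
            if not name.endswith(".json"):
                continue  # skip folders and non-JSON files
            doc = json.loads(zf.read(name).decode("utf-8"))
            sha = doc.get("sha")
            for entry in doc.get("body_text", []):
                records.append({
                    "text": entry["text"],  # only the "text" field is used
                    "sha": sha,
                    "publish_time": publish_times.get(sha) if sha else None,
                })
    return records
```

Files without a sha (such as the SQuAD-style question-plus-paragraph entries) come back with publish_time set to None, matching the rule that timeliness is only checked when metadata is available.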
Query Format¶
note
- The pretrained model should be fine-tuned with a dataset first to adapt to particular question domains when necessary.
- Otherwise, the input should contain relevant information after the question (a context paragraph, candidate answers, or both), whether or not it addresses the question.
- Optionally, when relevant information is provided as an additional paragraph in the query, the question always comes first, followed by the paragraph. A “\n” separator is placed between the question and its paragraph.
The query is in JSON format. The questions field holds a list of questions, which may contain a single question. The model reads only the questions field.
{
'questions': ["Is individual's age considered a potential risk factor of COVID19? \n People of all ages can be infected by the new coronavirus (2019-nCoV). Older people, and people with pre-existing medical conditions (such as asthma, diabetes, heart disease) appear to be more vulnerable to becoming severely ill with the virus. WHO advises people of all ages to take steps to protect themselves from the virus, for example by following good hand hygiene and good respiratory hygiene.",
# query string can include optional context which follows the question with `\n` syntax
'Is COVID-19 associated with cardiomyopathy and cardiac arrest?'], # will be read as a list of string by model, and each question will be extracted as string to process the question answering stage recursively
... # questions in string format
... # other fields; fields other than 'questions' won't be read into the model
}
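A query body in this shape can be assembled on the client side as sketched below; build_qa_query is a hypothetical helper, and joining a question and its optional context with a newline follows the note above.

```python
def build_qa_query(items):
    """Build a QUESTION_ANSWERING query body.

    Each item is either a question string or a (question, context) pair;
    a context paragraph is appended after its question with a newline.
    """
    questions = []
    for item in items:
        if isinstance(item, tuple):
            question, context = item
            questions.append(question + "\n" + context)
        else:
            questions.append(item)
    return {"questions": questions}


if __name__ == "__main__":
    query = build_qa_query([
        ("Is age a risk factor?", "Older people appear to be more vulnerable."),
        "Is COVID-19 associated with cardiomyopathy and cardiac arrest?",
    ])
    print(query)
```

The resulting dict can then be sent exactly like the quickstart example, e.g. requests.post('http://{}/predict'.format(predictor_host), json=query).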
Prediction Format¶
The output is in JSON format.
['Given a higher mortality rate for older cases, in one study, li et al showed that more than 50% of early patients with covid-19 in wuhan were more than 60 years old',
'cardiac involvement has been reported in patients with covid-19, which may be reflected by ecg changes.',
...
] # output field is a list of string
MedQuAD Task Dataset Format¶
dataset-type:QUESTION_ANSWERING_MEDQUAD
Dataset structure example:
/MedQuAD.zip
│
├──FOLDER_NAME_1 # first level folder
│ └──FOLDER_NAME_2 # second level folder, not necessarily to be included
│ └──FOLDER_NAME_3 # third level folder, not necessarily to be included
│ ├── 003d2e515e1aaf0052769953e8.xml # xml file name is a random combination of either alphabets/numbers or both
│ ├── 00a40758bdd.xml
│ ...
│
├──FOLDER_NAME_4 # first level folder
│ ├── 0015023cc06b5332b3.xml
│ ├── 001b4a31684c8fc6e2cfbb70304c429.xml
│ ...
...
note
- In the following .xml sample, the model only takes the Question and Answer fields into question answering processing.
- Each xml file contains multiple <QAPair> elements. Each <QAPair> contains one question and its answer.
Sample .xml file:
<?xml version="1.0" encoding="UTF-8"?>
<Document>
...
<QAPairs>
<QAPair pid="1"> # pair #1
<Question qid="000001-1"> A question here ... </Question> # question #1, will be read as string by model
<Answer> An answer here ... </Answer> # answer of question #1, will be read as string by model
</QAPair>
... # multiple subsequent <QAPair> blocks; each Question and its Answer will be combined into one string by the model, and the QAPair strings are then processed into a pandas DataFrame
</QAPairs>
</Document>
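As a sketch of which fields matter, the snippet below pulls the question/answer pairs out of one MedQuAD-style file with the standard library's ElementTree. The function name is hypothetical, and it returns the pairs separately because the exact way the model joins a question and its answer into one string is not specified here. Note that the # annotations in the sample above are explanatory only and are not valid XML.

```python
import xml.etree.ElementTree as ET


def read_qa_pairs(xml_text):
    """Return (question, answer) strings from <Document>/<QAPairs>/<QAPair>."""
    root = ET.fromstring(xml_text)  # root element is <Document>
    pairs = []
    for qa in root.findall("QAPairs/QAPair"):
        question = (qa.findtext("Question") or "").strip()
        answer = (qa.findtext("Answer") or "").strip()
        pairs.append((question, answer))
    return pairs


if __name__ == "__main__":
    sample = ("<Document><QAPairs><QAPair pid='1'>"
              "<Question qid='000001-1'>A question here</Question>"
              "<Answer>An answer here</Answer></QAPair></QAPairs></Document>")
    print(read_qa_pairs(sample))
```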
Query Format¶
note
- The pretrained model should be fine-tuned with a dataset first to adapt to particular question domains when necessary.
- Otherwise, the input should contain relevant information after the question (a context paragraph, candidate answers, or both), whether or not it addresses the question.
- Optionally, when relevant information is provided as an additional paragraph in the query, the question always comes first, followed by the paragraph. A “\n” separator is placed between the question and its paragraph.
The query is in JSON format. The questions field holds a list of questions, which may contain a single question. The model reads only the questions field.
{
'questions': ['Who is at risk for Adult Acute Lymphoblastic Leukemia?',
'What are the treatments for Adult Acute Lymphoblastic Leukemia ?'], # will be read as a list of string by model, and each question will be extracted as string to process the question answering stage recursively
... # questions in format of string
... # other fields; fields other than 'questions' won't be read into the model
}
Prediction Format¶
The output is in JSON format.
{'answers':['Past treatment with chemotherapy or radiation therapy. Having certain genetic disorders.', # output 'answers' field is a list of string
'Chemotherapy. Radiation therapy. Chemotherapy with stem cell transplant. Targeted therapy.',
...
]}
SPEECH_RECOGNITION¶
Speech recognition for the English language.
Dataset Type¶
dataset-type:AUDIO_FILES
The audios.csv should be of a .CSV format with 3 columns of wav_filename, wav_filesize and transcript.
For each row,
- wav_filename should be a file path to a .wav audio file within the archive, relative to the root of the directory. Each audio file’s sample rate must be 16kHz.
- wav_filesize should be an integer representing the size of the .wav audio file, in bytes.
- transcript should be a string of the true transcript for the audio file. Transcripts should contain only the following characters: a b c d e f g h i j k l m n o p q r s t u v w x y z '
An example of audios.csv follows:
wav_filename,wav_filesize,transcript
6930-81414-0000.wav,412684,audio transcript one
6930-81414-0001.wav,559564,audio transcript two
...
672-122797-0005.wav,104364,audio transcript one thousand
...
1995-1837-0001.wav,279404,audio transcript three thousand
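A sketch of producing audios.csv with only the standard library follows: wave reads each file's sample rate, os.path.getsize supplies wav_filesize, and transcripts are checked against the allowed characters. Treating the space character as allowed is an assumption (the example transcripts contain spaces), and both function names are hypothetical.

```python
import csv
import os
import wave

# Letters and apostrophe from the list above; space is assumed to be allowed too.
ALLOWED_CHARS = set("abcdefghijklmnopqrstuvwxyz' ")


def audio_row(root, rel_path, transcript):
    """Build one audios.csv row after checking sample rate and transcript."""
    full_path = os.path.join(root, rel_path)
    with wave.open(full_path, "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError("{} is not sampled at 16kHz".format(rel_path))
    if not set(transcript) <= ALLOWED_CHARS:
        raise ValueError("unsupported characters in {!r}".format(transcript))
    return [rel_path, os.path.getsize(full_path), transcript]


def write_audios_csv(root, samples):
    """Write audios.csv into `root` for (relative .wav path, transcript) pairs."""
    with open(os.path.join(root, "audios.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for rel_path, transcript in samples:
            writer.writerow(audio_row(root, rel_path, transcript))
```

The folder holding audios.csv and the .wav files would then be zipped so that audios.csv sits at the archive root.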
Query Format¶
A Base64-encoded string of the bytes of the audio as a 16kHz .wav file
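Producing a query in this format only needs the standard base64 module. The helper names below are hypothetical; the decode function shows that the original bytes round-trip.

```python
import base64


def encode_wav_query(wav_bytes):
    """Base64-encode raw .wav bytes into a str that can be sent in JSON."""
    return base64.b64encode(wav_bytes).decode("ascii")


def decode_wav_query(query):
    """Recover the original .wav bytes from a Base64 query string."""
    return base64.b64decode(query)


if __name__ == "__main__":
    print(encode_wav_query(b"RIFF"))  # the first bytes of any .wav file
```

For a real file, read it in binary mode first, e.g. with open(path, 'rb') as f: query = encode_wav_query(f.read()).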
Prediction Format¶
A string, representing the predicted transcript for the audio.
TABULAR_CLASSIFICATION¶
Dataset Type¶
dataset-type:TABULAR
The following optional train arguments are supported:
Train Argument | Description
features | List of feature columns’ names as a list of strings (defaults to the first N-1 columns in the CSV file)
target | Target column name as a string (defaults to the last column in the CSV file)
The train & validation datasets should have the same columns. An example of the dataset follows:
age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
48,0,2,130,275,0,1,139,0,0.2,2,0,2,1
58,0,0,170,225,1,0,146,1,2.8,1,2,1,0
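The column defaults and the query shape can be sketched as follows. The helper names are hypothetical, and parsing integer-looking cells as int (and all others as float) is an assumption chosen to match the example queries below.

```python
import csv
import io


def _number(text):
    """Parse a CSV cell as int when it looks like one, else as float."""
    return int(text) if text.lstrip("-").isdigit() else float(text)


def split_columns(header, features=None, target=None):
    """Apply the documented defaults for the optional train arguments.

    features defaults to the first N-1 columns; target to the last column.
    """
    if features is None:
        features = header[:-1]
    if target is None:
        target = header[-1]
    return list(features), target


def rows_to_queries(csv_text, features=None):
    """Turn CSV rows into size-(N-1) feature-value dict queries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    features, _ = split_columns(reader.fieldnames, features)
    return [{name: _number(row[name]) for name in features} for row in reader]
```

The same helpers apply to TABULAR_REGRESSION, which uses the same dataset type and query shape.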
Query Format¶
A size-N-1 dictionary representing feature-value pairs.
E.g.
queries=[
{'age': 48,'sex': 1,'cp': 2,'trestbps': 130,'chol': 225,'fbs': 1,'restecg': 1,'thalach': 172,'exang': 1,'oldpeak': 1.7,'slope': 2,'ca': 0,'thal': 3},
{'age': 48,'sex': 0,'cp': 2,'trestbps': 130,'chol': 275,'fbs': 0,'restecg': 1,'thalach': 139,'exang': 0,'oldpeak': 0.2,'slope': 2,'ca': 0,'thal': 2},
]
Prediction Format¶
A size-k list of floats, representing the probabilities of each class from 0 to k-1 for the target column.
TABULAR_REGRESSION¶
Dataset Type¶
dataset-type:TABULAR
The following optional train arguments are supported:
Train Argument | Description
features | List of feature columns’ names as a list of strings (defaults to the first N-1 columns in the CSV file)
target | Target column name as a string (defaults to the last column in the CSV file)
The train & validation datasets should have the same columns.
An example of the dataset follows:
density,bodyfat,age,weight,height,neck,chest,abdomen,hip,thigh,knee,ankle,biceps,forearm,wrist
1.0708,12.3,23,154.25,67.75,36.2,93.1,85.2,94.5,59,37.3,21.9,32,27.4,17.1
1.0853,6.1,22,173.25,72.25,38.5,93.6,83,98.7,58.7,37.3,23.4,30.5,28.9,18.2
1.0414,25.3,22,154,66.25,34,95.8,87.9,99.2,59.6,38.9,24,28.8,25.2,16.6
...
Query Format¶
A size-N-1 dictionary representing feature-value pairs.
Prediction Format¶
A float, representing the value of the target column.
Dataset Types¶
note
Refer to ./examples/datasets/ for examples of pre-processing common dataset formats to conform to SINGA-Auto’s own dataset formats.
CORPUS¶
The dataset file must be of the .zip archive format with a corpus.tsv at the root of the directory.
The corpus.tsv should be of a .TSV format with columns of token and N other variable column names (tag columns).
For each row,
- token should be a string, a token (e.g. a word) in the corpus. Tokens should appear in the same order as in the text of the corpus. To delimit sentences, token can take the value \n.
- The other N columns describe the corresponding token as part of the text of the corpus, depending on the task.
SEGMENTATION_IMAGES¶
Inside the uploaded .zip file, the training and validation sets should be wrapped separately and named strictly as train and val.
For the train folder (and likewise the val folder), the images and annotated masks should also be wrapped separately and named strictly as image and mask. The mask folder should contain only .png files, and each mask's file name should match its corresponding image's (e.g. for an image named 0001.jpg, the corresponding mask should be named 0001.png).
A JSON file named params.json must also be included in the .zip file to indicate essential training parameters such as num_classes, for example:
{ "num_classes": 21 }
An example of the uploaded .zip file structure:
+ dataset.zip
+ train
+ image
+ 0001.jpg
+ 0002.jpg
+ ...
+ mask
+ 0001.png
+ 0002.png
+ ...
+ val
+ image
+ 0003.jpg
+ ...
+ mask
+ 0003.png
+ ...
+ params.json
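Before uploading, the layout rules above can be checked with a short script like the one below. It is a hypothetical pre-upload check, not part of SINGA-Auto, and it assumes train/, val/ and params.json sit directly at the archive root as in the example structure.

```python
import json
import posixpath
import zipfile


def check_segmentation_zip(zip_file):
    """Return a list of layout problems in a SEGMENTATION_IMAGES archive."""
    problems = []
    with zipfile.ZipFile(zip_file) as zf:
        names = set(zf.namelist())
        if "params.json" not in names:
            problems.append("missing params.json")
        elif "num_classes" not in json.loads(zf.read("params.json").decode("utf-8")):
            problems.append("params.json has no num_classes")
        for split in ("train", "val"):
            files = sorted(n for n in names
                           if n.startswith(split + "/") and not n.endswith("/"))
            images = [n for n in files if n.startswith(split + "/image/")]
            masks = [n for n in files if n.startswith(split + "/mask/")]
            if not images:
                problems.append("no images under {}/image/".format(split))
            for mask in masks:
                if not mask.lower().endswith(".png"):
                    problems.append("mask is not a .png file: {}".format(mask))
            for image in images:
                # Each image needs a .png mask with the same base name
                stem = posixpath.splitext(posixpath.basename(image))[0]
                expected = "{}/mask/{}.png".format(split, stem)
                if expected not in names:
                    problems.append("missing mask {}".format(expected))
    return problems
```

An empty list means the archive matches the expected structure; it does not check pixel values or image sizes.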
IMAGE_FILES¶
The dataset file must be of the .zip archive format with an images.csv at the root of the directory.
The images.csv should be of a .CSV format with columns of path and N other variable column names (tag columns).
For each row,
- path should be a file path to a .png, .jpg or .jpeg image file within the archive, relative to the root of the directory.
- The other N columns describe the corresponding image, depending on the task.
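A minimal sketch of packing labelled images into this format is shown below, using IMAGE_CLASSIFICATION's single class column as the tag column. Storing images under an images/ folder with numbered names is a hypothetical layout choice; the only requirement above is that each path is relative to the archive root.

```python
import csv
import io
import os
import zipfile


def build_image_files_zip(zip_path, samples):
    """Pack (local image path, class) samples into an IMAGE_FILES archive."""
    csv_buf = io.StringIO()
    writer = csv.writer(csv_buf, lineterminator="\n")
    writer.writerow(["path", "class"])
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for i, (local_path, label) in enumerate(samples):
            ext = os.path.splitext(local_path)[1].lower()
            if ext not in (".png", ".jpg", ".jpeg"):
                raise ValueError("unsupported image type: " + local_path)
            arcname = "images/{}{}".format(i, ext)  # path relative to archive root
            zf.write(local_path, arcname)
            writer.writerow([arcname, label])
        zf.writestr("images.csv", csv_buf.getvalue())  # images.csv at the root
```

For directories of images, the load_folder_format.py script mentioned in the IMAGE_CLASSIFICATION section is the documented route; this sketch only shows the resulting structure.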
DETECTION_DATASET¶
It is recommended to follow the YOLO dataset format.
- For the folder hierarchy, two folders, ‘images’ and ‘labels’, should be prepared. The ‘images’ folder holds PIL-loadable images, and the corresponding txt label files should be placed in the ‘labels’ folder, with the same basenames as the images.
- The label file format is as follows, where object-id is the index of the object's class, and the following four numbers are normalized to the range 0 to 1 by dividing by the width and height of the image: center_x center_y are the central coordinates of the bounding box, and width height are its side lengths. An empty label file (a negative sample) is allowed, meaning there are no objects to detect in the image.
object-id center_x center_y width height
...
- In addition, train.txt and valid.txt can be provided to list the images used for training/validation, containing only the paths of the image files. A class.names file contains the category names, and their line numbers are the object-id values.
GENERAL_FILES¶
- As its name suggests, the general task can include a task (or model) from any domain, such as image processing, NLP, speech, or video.
- There are no requirements on the form of the dataset, as long as it can be read into memory as a file. However, the model developer must know in advance how to handle the read-in file.
QUESTION_ANSWERING_COVID19¶
The dataset file must be of the .zip archive format, containing JSON files. JSON files at all folder levels are read together automatically.
Each JSON file is extracted from one paper. The JSON structure contains the field body_text, which is a list of {“text”: <str>} blocks. Each text block is one paragraph of the corresponding paper.
A metadata.csv file at the root of the archive directory is optional. It provides the model with a publish_time column, each entry in Date format, e.g. 2001-12-17. In this case, each metadata entry must have a sha column in General format, and each JSON file must have a “sha”: <str> field, so that the two are linked by their sha values. When neither metadata.csv nor a publish_time Date value is provided, the model does not check the timeliness of the corresponding JSON body_text field.
QUESTION_ANSWERING_MEDQUAD¶
The dataset file must be of the .zip archive format, containing xml files. Xml files at all folder levels are read together automatically.
The model only takes the <Document> <QAPairs> … </QAPairs> </Document> field, which contains multiple <QAPair> … </QAPair> elements. Each QAPair has one <Question> … </Question> and <Answer> … </Answer> combination.
TABULAR¶
The dataset file must be a tabular dataset of the .csv format with N columns.
AUDIO_FILES¶
The dataset file must be of the .zip archive format with an audios.csv at the root of the directory.
The audios.csv should be of a .CSV format with 3 columns of wav_filename, wav_filesize and transcript.
For each row,
- wav_filename should be a file path to a .wav audio file within the archive, relative to the root of the directory. Each audio file’s sample rate must be 16kHz.
- wav_filesize should be an integer representing the size of the .wav audio file, in bytes.
- transcript should be a string of the true transcript for the audio file. Transcripts should contain only the following characters: a b c d e f g h i j k l m n o p q r s t u v w x y z '
An example of audios.csv follows:
wav_filename,wav_filesize,transcript
6930-81414-0000.wav,412684,audio transcript one
6930-81414-0001.wav,559564,audio transcript two
...
672-122797-0005.wav,104364,audio transcript one thousand
...
1995-1837-0001.wav,279404,audio transcript three thousand
Installing Python¶
Usage of SINGA-Auto requires Python 3.6. Specifically, you’ll need the command python to point to a Python 3.6 program, and pip to point to PIP for that Python 3.6 installation.
To achieve this, we recommend using Conda with a Python 3.6 environment as per the instructions below:
- Install the latest version of miniconda.
- Run the following command on shell:
conda create --name singa_auto python=3.6
- Every time you need to use python or pip for SINGA-Auto, run the following command on shell:
conda activate singa_auto
Otherwise, you can refer to these links below on installing Python natively:
Installing Kubernetes¶
Usage of SINGA-Auto in Kubernetes mode requires Kubernetes 1.15+.
To achieve this, we recommend the following steps:
- Install kubelet, kubeadm and kubectl:
apt-get install -y kubelet kubeadm kubectl --allow-unauthenticated
- Turn off swap:
swapoff -a
- Configure the CRI and change Docker's cgroup driver to systemd; refer to Kubernetes Container runtimes.
- Edit /etc/default/kubelet:
Environment= KUBELET_EXTRA_ARGS=--cgroup-driver=systemd
- Reset kubeadm (possibly not necessary):
kubeadm reset
- Initialize the k8s service, using your own host IP and the node name you want:
kubeadm init --kubernetes-version=1.15.1 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=YOURHOSTIP --node-name=YOURNODENAME --ignore-preflight-errors=ImagePull
- Add the Kubernetes config for the current user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
- If there is just a single node, let the master node also act as a worker node:
kubectl taint nodes --all node-role.kubernetes.io/master-
- Install flannel from GitHub:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
- Configure the role binding:
kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=default:default
- Set the NodePort range: open the API server manifest with
sudo vim /etc/kubernetes/manifests/kube-apiserver.yaml
and add "- --service-node-port-range=1-65535" under the spec.containers.command node.
Otherwise, you can refer to these links below on installing Kubernetes:
Model Development Guide¶
SINGA-Auto leverages a dynamic pool of model templates contributed by Model Developers.
As a Model Developer, you’ll define a Python class that conforms to SINGA-Auto’s base model specification, and submit it to SINGA-Auto with the singa_auto.client.Client.create_model() method.
Implementing the Base Model Interface¶
As an overview, your model template needs to provide the following logic for deployment on SINGA-Auto:
- Definition of the space of your model’s hyperparameters (knob configuration)
- Initialization of the model with a concrete set of hyperparameters (knobs)
- Training of the model given a (train) dataset on the local file system
- Evaluation of the model given a (validation) dataset on the local file system
- Dumping of the model’s parameters for serialization, after training
- Loading of the model with trained parameters
- Making batch predictions with the model, after being trained
Full details of SINGA-Auto’s base model interface are documented at singa_auto.model.BaseModel.
Your model implementation has to follow a specific task’s specification (see tasks).
To aid your implementation, you can refer to Sample Models.
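To show how the responsibilities listed above fit together in one class, here is a plain-Python toy model that always predicts the most frequent training label. It deliberately does not subclass singa_auto.model.BaseModel so it runs standalone, and its method names and signatures are assumptions modeled on the list above (for example, real models receive a dataset path rather than a list of labels); check singa_auto.model.BaseModel for the actual interface.

```python
import json
from collections import Counter


class MajorityClassModel:
    """Toy classifier predicting the most frequent training label.

    Hypothetical skeleton: method names mirror the responsibilities listed
    above and are not guaranteed to match singa_auto.model.BaseModel.
    """

    @staticmethod
    def get_knob_config():
        # Knob configuration: this toy model has no real hyperparameters
        return {}

    def __init__(self, **knobs):
        self._knobs = knobs
        self._label = None

    def train(self, labels):
        # A label list stands in for the train dataset to keep this runnable
        self._label = Counter(labels).most_common(1)[0][0]

    def evaluate(self, labels):
        # Accuracy on a validation label list
        return sum(1 for y in labels if y == self._label) / len(labels)

    def predict(self, queries):
        # Batch prediction: one prediction per query
        return [self._label for _ in queries]

    def dump_parameters(self):
        # Serialize trained state into a dict of strings
        return {"label": json.dumps(self._label)}

    def load_parameters(self, params):
        self._label = json.loads(params["label"])
```

The dump/load pair is what lets a trained trial be stored after training and restored later for inference, which is why it serializes to plain strings here.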
Testing¶
After implementing your model, you’ll use singa_auto.model.dev.test_model_class() to test your model.
Refer to its documentation for more details on how to use it, or refer to the sample models’ usage of the method.
Logging¶
utils.logger in the singa_auto.model module provides a set of methods to log messages & metrics while your model is training.
These messages & metrics would be displayed on SINGA-Auto Web Admin for monitoring & debugging purposes.
Refer to singa_auto.model.LoggerUtils for more details.
See also
Dataset Loading¶
utils.dataset in the singa_auto.model module provides a simple set of in-built dataset loading methods.
Refer to singa_auto.model.DatasetUtils for more details.
Defining Hyperparameter Search Space¶
Refer to How Model Tuning Works for the specifics of how you can tune your models on SINGA-Auto.
Sample Models¶
To illustrate how to write models for SINGA-Auto, we have written the following:
- Sample pre-processing logic to convert common dataset formats to SINGA-Auto’s own dataset formats in ./examples/datasets/
- Sample models in ./examples/models/
Example: Testing Models for IMAGE_CLASSIFICATION¶
- Download & pre-process the original Fashion MNIST dataset to the dataset format specified by IMAGE_CLASSIFICATION:
python examples/datasets/image_files/load_fashion_mnist.py
- Install the Python dependencies for the sample models:
pip install scikit-learn==0.20.0
pip install tensorflow==1.12.0
- Test the sample models in ./examples/models/image_classification:
python examples/models/image_classification/SkDt.py
python examples/models/image_classification/TfFeedForward.py
Example: Testing Models for POS_TAGGING¶
- Download & pre-process the subsample of the Penn Treebank dataset to the dataset format specified by POS_TAGGING:
python examples/datasets/corpus/load_sample_ptb.py
- Install the Python dependencies for the sample models:
pip install torch==0.4.1
- Test the sample models in ./examples/models/pos_tagging:
python examples/models/pos_tagging/BigramHmm.py
python examples/models/pos_tagging/PyBiLstm.py
Configuring the Model’s Environment¶
Your model will be run in Python 3.6 with the following Python libraries pre-installed:
requests==2.20.0
numpy==1.14.5
Pillow==7.1.0
Additionally, you’ll specify a list of Python dependencies to be installed for your model, prior to model training and inference. This is configurable with the dependencies option during model creation. These dependencies will be lazily installed on top of the worker’s Docker image before your model’s code is executed.
If the model is to be run on GPU, SINGA-Auto maps dependencies to their GPU-supported versions, where available. For example, { 'tensorflow': '1.12.0' } will be installed as { 'tensorflow-gpu': '1.12.0' }.
SINGA-Auto can also parse specific dependency names to install certain non-PyPI packages. For example, { 'singa': '1.1.1' } will be installed as singa-cpu=1.1.1 or singa-gpu=1.1.1 using conda.
Refer to the list of officially supported dependencies below. Dependencies that are not listed are installed as PyPI packages of the specified name and version.
Dependency | Installation Command |
---|---|
tensorflow | pip install tensorflow==${ver} or pip install tensorflow-gpu==${ver} |
singa | conda install -c nusdbsystem singa-cpu=${ver} or conda install -c nusdbsystem singa-gpu=${ver} |
Keras | pip install Keras==${ver} |
scikit-learn | pip install scikit-learn==${ver} |
torch | pip install torch==${ver} |
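The table above can be read as a small mapping from each dependency to its install command. The sketch below restates it in Python; `install_command` and `install_commands` are hypothetical helpers for illustration, not SINGA-Auto's actual installer code, which may build the commands differently.

```python
def install_command(name: str, version: str, gpu: bool) -> str:
    """Return the command the table above lists for one dependency.

    Hypothetical helper for illustration; SINGA-Auto's real installer may differ.
    """
    if name == "tensorflow":
        # GPU workers get the GPU build of TensorFlow.
        package = "tensorflow-gpu" if gpu else "tensorflow"
        return f"pip install {package}=={version}"
    if name == "singa":
        # SINGA is installed from the nusdbsystem conda channel, not PyPI.
        package = "singa-gpu" if gpu else "singa-cpu"
        return f"conda install -c nusdbsystem {package}={version}"
    # Keras, scikit-learn, torch and any unlisted dependency are installed
    # as PyPI packages of the given name and version.
    return f"pip install {name}=={version}"


def install_commands(dependencies: dict, gpu: bool) -> list:
    """Map a dependencies option such as {'tensorflow': '1.12.0'} to commands."""
    return [install_command(n, v, gpu) for n, v in dependencies.items()]


print(install_commands({"tensorflow": "1.12.0", "singa": "1.1.1"}, gpu=True))
```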
Alternatively, you can build a custom Docker image that extends rafikiai/rafiki_worker, installing the required dependencies for your model. This is configurable with the docker_image option during model creation.
See also
singa_auto.client.Client.create_model()
Your model should be GPU-sensitive based on the environment variable CUDA_AVAILABLE_DEVICES (see here). If CUDA_AVAILABLE_DEVICES is set to -1, your model should simply run on CPU. You can assume that your model has exclusive access to the GPUs listed in CUDA_AVAILABLE_DEVICES.
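A model might decide between CPU and GPU along these lines. This sketch reads CUDA_AVAILABLE_DEVICES exactly as described above; the helper names `visible_gpus` and `choose_device` are hypothetical, treating an unset or empty variable as "CPU only" is an assumption, and how the resulting device list is handed to your ML framework depends on that framework.

```python
import os


def visible_gpus(environ=None) -> list:
    """Return the GPU ids listed in CUDA_AVAILABLE_DEVICES (empty list means CPU).

    Hypothetical helper: an unset variable, an empty value or "-1" is treated
    as "no GPU", following the convention described above.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("CUDA_AVAILABLE_DEVICES", "").strip()
    if value in ("", "-1"):
        return []
    return [d.strip() for d in value.split(",") if d.strip()]


def choose_device(environ=None) -> str:
    gpus = visible_gpus(environ)
    # The model may assume exclusive access to every GPU listed.
    return "gpu:" + ",".join(gpus) if gpus else "cpu"


print(choose_device({"CUDA_AVAILABLE_DEVICES": "-1"}))   # cpu
print(choose_device({"CUDA_AVAILABLE_DEVICES": "0,1"}))  # gpu:0,1
```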
How Model Tuning Works¶
Traditionally, getting the best-performing model on a dataset involves tedious manual hyperparameter tuning. On SINGA-Auto, model hyperparameter tuning is automated by conducting multiple trials in a train job.
Over the trials, the model is initialized with different hyperparameters (knobs), trained and evaluated. A hyperparameter tuning advisor on SINGA-Auto ingests the validation scores from these trials to suggest better hyperparameters for future trials, to maximise the performance of a model on the dataset. At the very end of the train job, SINGA-Auto can deploy the best-scoring trials for predictions.
Defining Hyperparameter Search Space¶
You’ll define a search space of hyperparameters (knob configuration) in a declarative manner with the static method singa_auto.model.BaseModel.get_knob_config().
The method should return a mapping of hyperparameter names (knob names) to hyperparameter specifications (knob specifications).
A hyperparameter specification is an instance of a class that extends singa_auto.model.BaseKnob, limited to one of the following:
- singa_auto.model.FixedKnob
- singa_auto.model.CategoricalKnob
- singa_auto.model.FloatKnob
- singa_auto.model.IntegerKnob
- singa_auto.model.PolicyKnob
- singa_auto.model.ArchKnob
Refer to their documentation for more details on each type of knob specification, and refer to Sample Models to see examples of how knob configurations are declared.
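To illustrate the declarative shape of a knob configuration without depending on an installed SINGA-Auto, the sketch below uses minimal stand-in classes named after the knob types listed above. Their constructor arguments (`value`, `values`, `value_min`, `value_max`, `is_exp`, `policy`) are assumptions for illustration, not SINGA-Auto's documented signatures; in a real model you would import the knob classes from singa_auto.model and check their documentation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


# Stand-ins for the knob classes listed above (constructor arguments assumed).
@dataclass
class FixedKnob:
    value: Any


@dataclass
class CategoricalKnob:
    values: List[Any]


@dataclass
class IntegerKnob:
    value_min: int
    value_max: int


@dataclass
class FloatKnob:
    value_min: float
    value_max: float
    is_exp: bool = False  # assumed flag: search on a log scale


@dataclass
class PolicyKnob:
    policy: str


class MyModel:  # would extend singa_auto.model.BaseModel in a real model
    @staticmethod
    def get_knob_config() -> Dict[str, Any]:
        # Knob names map to knob specifications; the advisor realises a
        # concrete value for each knob on every trial.
        return {
            "max_epochs": FixedKnob(10),
            "hidden_layer_count": IntegerKnob(1, 2),
            "learning_rate": FloatKnob(1e-5, 1e-1, is_exp=True),
            "batch_size": CategoricalKnob([16, 32, 64, 128]),
            "early_stop": PolicyKnob("EARLY_STOP"),
        }
```

Because this configuration only mixes fixed, numeric, categorical and policy knobs without SHARE_PARAMS, it would fall under the plain Bayesian Optimization rule in Model Tuning Schemes.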
Model Policies¶
singa_auto.model.PolicyKnob is a special type of knob specification that allows SINGA-Auto to configure the behaviour of a model on a per-trial basis.
In a modern hyperparameter tuning scheme, a model tends to switch between different “modes”, which we call policies. For example, when you tune your model manually, you might want the model to do early stopping for the first e.g. 100 trials, then conduct a final trial for a full e.g. 300 epochs. The concept of model policies in SINGA-Auto enables SINGA-Auto’s tuning advisor to externally configure your model to switch between these “modes”.
Your model communicates to SINGA-Auto which policies it supports by adding PolicyKnob(policy_name) to your model’s knob configuration.
During training, SINGA-Auto then configures the activation of the model’s policies on a per-trial basis by realising the values of PolicyKnob to either True (activated) or False (not activated).
For example, suppose SINGA-Auto’s tuning scheme for your model requires it to do early stopping for all trials except the final trial. If your model has { 'early_stop': PolicyKnob('EARLY_STOP'), ... }, SINGA-Auto will pass early_stop=False as part of its knobs for just the final trial, and pass early_stop=True for all other trials. Your model would then decide whether to stop early based on the value of the knob early_stop.
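A model honouring the EARLY_STOP policy might branch on the realised knob value in its training loop. The class below is a hypothetical, framework-free sketch: it assumes knobs arrive as keyword arguments to the constructor, and the epoch counts and the `epochs_run` attribute are invented for illustration.

```python
class EarlyStoppingModel:
    """Hypothetical model showing how a realised PolicyKnob value is used."""

    FULL_EPOCHS = 300   # illustrative: a full training run
    EARLY_EPOCHS = 10   # illustrative: a shortened run for non-final trials

    def __init__(self, **knobs):
        # Assumed: every realised knob, including policies, is passed as a
        # keyword argument; policy knobs arrive as True or False.
        self.early_stop = knobs.get("early_stop", False)
        self.epochs_run = 0

    def train(self):
        epochs = self.EARLY_EPOCHS if self.early_stop else self.FULL_EPOCHS
        for _ in range(epochs):
            self.epochs_run += 1  # stand-in for one real training epoch


# Non-final trials: early_stop=True; final trial: early_stop=False.
trial = EarlyStoppingModel(early_stop=True)
trial.train()
print(trial.epochs_run)  # 10
```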
Below is the list of officially recognized model policies:
Policy | Description |
---|---|
SHARE_PARAMS |
Whether model should load the shared parameters passed in train() |
EARLY_STOP |
Whether model should stop training early in train() , e.g. with use of early stopping or reduced no. of epochs |
SKIP_TRAIN |
Whether model should skip training its parameters |
QUICK_EVAL |
Whether model should stop evaluation early in evaluate() , e.g. by evaluating on only a subset of its validation dataset |
DOWNSCALE |
Whether a smaller version of the model should be constructed e.g. with fewer layers |
Model Tuning Schemes¶
At a model level, SINGA-Auto automatically selects the appropriate tuning scheme (advisor) based on the composition of the model’s knob configuration and the incoming train job’s budget.
Specifically, it employs the following rules, in the given order, to select the type of advisor to use:
Rule | Tuning Scheme |
---|---|
Only PolicyKnob, FixedKnob | Only conduct a single trial |
Only PolicyKnob, FixedKnob, FloatKnob, IntegerKnob, CategoricalKnob, with policy SHARE_PARAMS | Hyperparameter tuning with Bayesian Optimization & cross-trial parameter sharing. Share globally best-scoring parameters across workers in an epsilon-greedy manner. Optionally employ early stopping (EARLY_STOP policy) for all trials. More details at Hyperparameter Tuning with Bayesian Optimization & Parameter Sharing. |
Only PolicyKnob, FixedKnob, FloatKnob, IntegerKnob, CategoricalKnob | Hyperparameter tuning with Bayesian Optimization. Optionally employ early stopping (EARLY_STOP policy) before the last 1h, and perform standard trials during the last 1h. |
Only PolicyKnob, FixedKnob, ArchKnob, with policies SHARE_PARAMS, EARLY_STOP, SKIP_TRAIN, QUICK_EVAL, DOWNSCALE, and TIME_HOURS budget >= 12h | Architecture tuning with cell-based ENAS. It conducts ENAS architecture search before the last 12h, then performs the final training of the best architectures found in the last 12h. More details at Architecture Tuning with ENAS. |
All others | Hyperparameter tuning with uniformly random knobs |
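The rule table above can be read as a first-match function. The following is an illustrative restatement of those rules, not SINGA-Auto's advisor-selection code; knob types and offered policies are passed as plain strings, and the returned labels are made up for this sketch.

```python
BAYES_KNOBS = {"PolicyKnob", "FixedKnob", "FloatKnob", "IntegerKnob", "CategoricalKnob"}
ENAS_KNOBS = {"PolicyKnob", "FixedKnob", "ArchKnob"}
ENAS_POLICIES = {"SHARE_PARAMS", "EARLY_STOP", "SKIP_TRAIN", "QUICK_EVAL", "DOWNSCALE"}


def select_tuning_scheme(knob_types, policies, time_hours):
    """Apply the rules in the table above, in order; the first match wins.

    knob_types: names of the knob classes used in the knob configuration.
    policies: policy names offered via PolicyKnob.
    time_hours: the train job's TIME_HOURS budget.
    """
    knob_types, policies = set(knob_types), set(policies)
    if knob_types <= {"PolicyKnob", "FixedKnob"}:
        return "single_trial"
    if knob_types <= BAYES_KNOBS and "SHARE_PARAMS" in policies:
        return "bayes_opt_with_param_sharing"
    if knob_types <= BAYES_KNOBS:
        return "bayes_opt"
    if knob_types <= ENAS_KNOBS and ENAS_POLICIES <= policies and time_hours >= 12:
        return "enas"
    return "random_search"
```

Note that the order matters: a configuration with ArchKnob but a budget under 12h does not qualify for ENAS and falls through to uniformly random knobs.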
The following subsections briefly explain how to leverage the various model tuning schemes on SINGA-Auto.
Hyperparameter Tuning with Bayesian Optimization¶
To tune the hyperparameters of your model, where the hyperparameters are simply floats, integers or categorical values, use singa_auto.model.FixedKnob, singa_auto.model.CategoricalKnob, singa_auto.model.FloatKnob & singa_auto.model.IntegerKnob.
Hyperparameter Tuning with Bayesian Optimization & Early Stopping¶
To additionally employ early stopping during hyperparameter tuning to speed up the tuning process, declare an extra singa_auto.model.PolicyKnob of the EARLY_STOP policy (see Model Policies).
Refer to the sample model ./examples/models/image_classification/TfFeedForward.py.
Hyperparameter Tuning with Bayesian Optimization & Parameter Sharing¶
To additionally have best-scoring model parameters shared between trials to speed up the tuning process (as outlined in “SINGA-Auto: Machine Learning as an Analytics Service System”), declare an extra singa_auto.model.PolicyKnob of the SHARE_PARAMS policy (see Model Policies).
Refer to the sample model ./examples/models/image_classification/PyDenseNetBc.py and its corresponding usage script ./examples/scripts/image_classification/train_densenet.py to better understand how to do parameter sharing.
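The epsilon-greedy sharing of globally best-scoring parameters mentioned in Model Tuning Schemes can be sketched as follows. This is an illustration under stated assumptions, not SINGA-Auto's implementation: the `ParamStore` class, the `epsilon` value, and the choice that "explore" means starting a trial from fresh parameters are all hypothetical.

```python
import random


class ParamStore:
    """Hypothetical store that keeps the globally best-scoring parameters."""

    def __init__(self):
        self.best_score = float("-inf")
        self.best_params = None

    def report(self, score, params):
        # Keep only the best-scoring parameters seen across all workers.
        if score > self.best_score:
            self.best_score, self.best_params = score, params

    def params_for_next_trial(self, epsilon, rng=random):
        # With probability epsilon (or when nothing is stored yet), explore:
        # return None so the trial starts from fresh parameters.
        # Otherwise exploit: share the globally best parameters.
        if self.best_params is None or rng.random() < epsilon:
            return None
        return self.best_params


store = ParamStore()
store.report(0.71, {"w": [0.1, 0.2]})
store.report(0.65, {"w": [0.3, 0.4]})  # lower score, not kept
print(store.params_for_next_trial(epsilon=0.0))  # {'w': [0.1, 0.2]}
```

On the model side, the SHARE_PARAMS policy then tells the model whether to load the shared parameters passed to train().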
Architecture Tuning with ENAS¶
To tune the architecture for your model with the modern architecture search algorithm “Efficient Neural Architecture Search via Parameter Sharing” (ENAS), declare a singa_auto.model.ArchKnob and offer the policies SHARE_PARAMS, EARLY_STOP, SKIP_TRAIN, QUICK_EVAL and DOWNSCALE (see Model Policies).
Specifically, you’ll need your model to support parameter sharing, stopping training early, skipping the training step, evaluating on a subset of the validation dataset, and downscaling the model e.g. to use fewer layers. These policies are critical to the speed & performance of ENAS. See Deep Dive on ENAS to understand more about SINGA-Auto’s implementation of ENAS.
Refer to the sample model ./examples/models/image_classification/TfEnas.py and its corresponding usage script ./examples/scripts/image_classification/run_enas.py to better understand how to do architecture tuning.
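How a model might honour the SKIP_TRAIN, QUICK_EVAL and DOWNSCALE policies is sketched below. Everything here is hypothetical and framework-free: the knob names (`skip_train`, `quick_eval`, `downscale`), the layer counts and the 10% evaluation subset are illustrative choices, not values taken from SINGA-Auto or TfEnas.

```python
class EnasFriendlyModel:
    """Hypothetical model reacting to SKIP_TRAIN, QUICK_EVAL and DOWNSCALE."""

    def __init__(self, **knobs):
        self.skip_train = knobs.get("skip_train", False)
        self.quick_eval = knobs.get("quick_eval", False)
        # DOWNSCALE: build a smaller network, e.g. with fewer layers.
        self.num_layers = 6 if knobs.get("downscale", False) else 15
        self.train_steps = 0
        self.eval_count = 0

    def _predict(self, x):
        return 0  # stand-in for a forward pass through the network

    def train(self, dataset):
        if self.skip_train:
            return  # SKIP_TRAIN: keep the (possibly shared) parameters as they are
        for _ in dataset:
            self.train_steps += 1  # stand-in for one optimisation step

    def evaluate(self, dataset):
        # QUICK_EVAL: score on a subset of the validation dataset only.
        if self.quick_eval:
            dataset = dataset[: max(1, len(dataset) // 10)]
        self.eval_count = len(dataset)
        correct = sum(1 for x, y in dataset if self._predict(x) == y)
        return correct / len(dataset)
```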
Deep Dive on ENAS¶
The ENAS paper outlines a new methodology for automatic neural network construction, speeding up the original Neural Architecture Search (NAS) methodology by 1000x without affecting its ability to search for a competitive architecture. The authors made the crucial observation that 2 different architectures would share a common subgraph, and the model parameters in that subgraph could be reused across trials without having to re-train these parameters from scratch every trial.
The following is an overview of how ENAS works. As explained in the ENAS paper, during an ENAS search for the best CNN architecture (ENAS Search), there is an alternation between 2 phases: training of the ENAS CNN’s shared parameters (CNN Train Phase), and training of the ENAS controller (Controller Train Phase). While CNN parameters are carried over the phases, the CNN’s shared parameters are not trained during Controller Train Phases. After ENAS Search is done, there is a final training of the best CNN architecture found (ENAS Train), this time initializing its CNN parameters from scratch.
On SINGA-Auto, we’ve replicated the Cell-Based ENAS controller for image classification as one of SINGA-Auto’s tuning schemes, together with a SINGA-Auto model TfEnas, closely following the authors’ code. In this specific setup for ENAS,
ENAS Search is done with the construction of a single supergraph of all possible architectures,
while ENAS Train is done with the construction of a fixed graph of the best architecture (with slight architectural differences from ENAS Search).
Each CNN Train Phase involves training the CNN for 1 epoch, while within each Controller Train Phase, the controller is trained for 30 steps.
In each controller step, 10 architectures are sampled from the controller, evaluated on the ENAS CNN by dynamically changing its architecture,
and losses based on validation accuracies are back-propagated in the controller to update the controller’s parameters.
Each validation accuracy is computed on only a batch of the validation dataset.
The alternation between CNN Train Phase and Controller Train Phase happens for X cycles during ENAS Search. Close to the end of training, during ENAS Train, the architecture samples with the highest validation accuracies, this time computed on the full validation dataset, are trained from scratch to arrive at the final best models.
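The ENAS Search alternation described above can be summarised as a schedule. The sketch below only orchestrates the phases, using the figures given in this section (1 epoch per CNN Train Phase, 30 controller steps per Controller Train Phase, 10 sampled architectures per step); `num_cycles` stands in for the unspecified X, the callables are hypothetical placeholders for real training code, and the final ENAS Train step is omitted.

```python
def enas_search(num_cycles, train_cnn_epoch, sample_architecture,
                evaluate_on_batch, update_controller,
                controller_steps=30, archs_per_step=10):
    """Alternate CNN Train Phases and Controller Train Phases (ENAS Search).

    All callables are hypothetical placeholders supplied by the caller.
    """
    for _ in range(num_cycles):
        # CNN Train Phase: train the shared CNN parameters for 1 epoch.
        train_cnn_epoch()
        # Controller Train Phase: shared CNN parameters are not updated here.
        for _ in range(controller_steps):
            samples = [sample_architecture() for _ in range(archs_per_step)]
            # Each validation accuracy is computed on a single validation batch.
            rewards = [evaluate_on_batch(arch) for arch in samples]
            # Losses based on these accuracies update the controller.
            update_controller(samples, rewards)
```

With 3 cycles, for instance, this samples 3 × 30 × 10 = 900 architectures and performs 90 controller updates in total.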
We’ve generalized the ENAS controller, its architecture encoding scheme and its overall tuning scheme on SINGA-Auto, such that SINGA-Auto models can leverage architecture tuning with a flexible architecture encoding, and SINGA-Auto’s application developers can train with these models in an end-to-end manner.
We’ve also devised a simple, yet effective strategy to run ENAS in a distributed setting. When given multiple GPUs, SINGA-Auto performs ENAS locally at each worker in a train job, with these workers sharing a central ENAS controller.
Developer Guide¶
Setup & Configuration¶
Quick Setup¶
We assume development or deployment in a MacOS or Linux environment.
For Users:
Note
If you’re not a user in the docker group, you’ll instead need sudo access and prefix every bash command with sudo -E.
- Install Kubernetes 1.15+ (see Installing Kubernetes) if using Kubernetes.
- Install Python 3.6 such that the python and pip commands point to the correct installation of Python 3.6 (see Installing Python).
- Install SINGA-Auto: pip install singa-auto==0.3.4
- Start the service with sago, stop it with sastop, and clean it up with saclean.
For Developers:
Note
If you’re not a user in the docker group, you’ll instead need sudo access and prefix every bash command with sudo -E.
- Install Kubernetes 1.15+ (see Installing Kubernetes) if using Kubernetes.
- Install Python 3.6 such that the python and pip commands point to the correct installation of Python 3.6 (see Installing Python).
- Clone the project at https://github.com/nusdbsystem/singa-auto (e.g. with Git).
In the file web/src/HTTPconfig.js, parameters specify the backend server and port that the Web UI interacts with. Developers have to modify the following values to match their server setting:
const adminHost = '127.0.0.1' // SINGA-Auto server address, as a string
const adminPort = '3000' // SINGA-Auto server port, as a string
const LocalGateways = {...
  // NOTE: must append '/' at the end!
  singa_auto: "http://127.0.0.1:3000/", // http://<ServerAddress>:<Port>/, as a string
}
HTTPconfig.adminHost = `127.0.0.1` // SINGA-Auto server address, as a string
HTTPconfig.adminPort = `3000` // SINGA-Auto server port, as a string
Using 127.0.0.1 as the SINGA-Auto server address means SINGA-Auto will be deployed on your local machine.
If using Docker, set up SINGA-Auto’s complete stack with the setup script:
bash scripts/docker_swarm/start.sh
If using Kubernetes, set up SINGA-Auto’s complete stack with the setup script:
bash scripts/kubernetes/start.sh
SINGA-Auto Admin and SINGA-Auto Web Admin will be available at 127.0.0.1:3000 and 127.0.0.1:3001 respectively, or at the server specified as ‘IP_ADRESS’ in scripts/docker_swarm/.env.sh or scripts/kubernetes/.env.sh.
If using docker, to destroy SINGA-Auto’s complete stack:
bash scripts/docker_swarm/stop.sh
If using kubernetes, to destroy SINGA-Auto’s complete stack:
bash scripts/kubernetes/stop.sh
Updating docker images¶
bash scripts/kubernetes/build_images.sh
or
bash scripts/docker_swarm/build_images.sh
bash scripts/push_images.sh
Scaling SINGA-Auto¶
SINGA-Auto’s default setup runs on a single machine and only runs its workloads on CPUs.
SINGA-Auto’s model training workers run in Docker containers that extend the Docker image nvidia/cuda:9.0-runtime-ubuntu16.04, and are capable of leveraging CUDA-capable GPUs.
Scaling SINGA-Auto horizontally and enabling GPU usage involves setting up Network File System (NFS) at a common path across all nodes, and installing & configuring the default Docker runtime to nvidia for each GPU-bearing node. All these nodes then need to be joined into a single Docker Swarm (if using Docker Swarm) or a single Kubernetes cluster (if using Kubernetes).
See also
To run SINGA-Auto on multiple machines with GPUs on Docker Swarm, do the following:
1. If SINGA-Auto is running, stop SINGA-Auto with
bash scripts/docker_swarm/stop.sh
2. Have all nodes leave any Docker Swarm they are in.
3. Set up NFS such that the master node is an NFS host, other nodes are NFS clients, and the master node shares an ancestor directory containing SINGA-Auto’s project directory. Here are instructions for Ubuntu.
4. All nodes should be in a common network. On the master node, change DOCKER_SWARM_ADVERTISE_ADDR in the project’s .env.sh to the IP address of the master node in the network that your nodes are in.
5. For each node (including the master node), ensure the firewall rules allow TCP & UDP traffic on ports 2377, 7946 and 4789.
6. For each node that has GPUs:
6.1. Install NVIDIA drivers for CUDA 9.0 or above
6.3. Set the default-runtime of Docker to nvidia (e.g. instructions here)
7. On the master node, start SINGA-Auto with
bash scripts/docker_swarm/start.sh
8. For each worker node, have the node join the master node’s Docker Swarm.
9. On the master node, for each node (including the master node), configure it with the script:
bash scripts/docker_swarm/setup_node.sh
To run SINGA-Auto on multiple machines with GPUs on Kubernetes, do the following:
1. If SINGA-Auto is running, stop SINGA-Auto with
bash scripts/kubernetes/stop.sh
2. Join all the nodes you need into a Kubernetes cluster (see kubeadm join).
3. Set up NFS such that the master node is an NFS host, other nodes are NFS clients, and the master node shares an ancestor directory containing SINGA-Auto’s project directory. Here are instructions for Ubuntu.
4. Change KUBERNETES_ADVERTISE_ADDR in the project’s scripts/kubernetes/.env.sh to the IP address of the master node in the network that your nodes are in.
5. For each node that has GPUs:
5.1. Install NVIDIA drivers for CUDA 9.0 or above
5.3. Set the default-runtime of Docker to nvidia (e.g. instructions here)
5.4. Install the NVIDIA device plugin by running the following command on the master node:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.10/nvidia-device-plugin.yml
6. On the master node, start SINGA-Auto with
bash scripts/kubernetes/start.sh
Exposing SINGA-Auto Publicly¶
SINGA-Auto Admin and SINGA-Auto Web Admin run on the master node.
If using Docker Swarm, change SINGA_AUTO_ADDR in .env.sh to the IP address of the master node in the network you intend to expose SINGA-Auto in.
If using Kubernetes, change SINGA_AUTO_ADDR in scripts/kubernetes/.env.sh to the IP address of the master node in the network you intend to expose SINGA-Auto in.
Example:
export SINGA_AUTO_ADDR=172.28.176.35
Re-deploy SINGA-Auto with step 4, changing the SINGA-Auto server address in web/src/HTTPconfig.js to match. SINGA-Auto Admin and SINGA-Auto Web Admin will be available at that IP address, over ports 3000 and 3001 (by default), assuming incoming connections to these ports are allowed.
Before you expose SINGA-Auto to the public, it is highly recommended to change the master passwords for the superadmin, server and database (located in `.env.sh` as `POSTGRES_PASSWORD`, `APP_SECRET` & `SUPERADMIN_PASSWORD`).
Reading SINGA-Auto’s logs¶
By default, you can read logs of SINGA-Auto Admin & any of SINGA-Auto’s workers
in ./logs
directory at the root of the project’s directory of the master node.
Troubleshooting¶
Q: There seem to be connectivity issues amongst containers across nodes!
Development¶
Before running any individual scripts, make sure to run the shell configuration script:
source scripts/docker_swarm/.env.sh
In ‘.env.sh’, the default server is set by ‘IP_ADRESS=127.0.0.1’, which means that SINGA-Auto will use the local machine as the server. HOST_WORKDIR_PATH defaults to the current directory, and ‘SINGA_AUTO_VERSION’ is set to ‘dev’ for development mode; otherwise, a specific version should be given.
Refer to SINGA-Auto’s Architecture and Folder Structure for a developer’s overview of SINGA-Auto.
Testing Latest Code Changes¶
To test the latest code changes, e.g. in the dev branch, you’ll need to do the following:
- Build SINGA-Auto’s images on each participating node (the quickstart instructions pull pre-built SINGA-Auto’s images from Docker Hub):
bash scripts/docker_swarm/build_images.sh
- Purge all of SINGA-Auto’s data (since there might be database schema changes):
bash scripts/clean.sh
Making a Release to master¶
In general, before making a release to master from dev, ensure that the code at dev is stable & well-tested:
- Consider running all of SINGA-Auto’s tests (see Running SINGA-Auto’s Tests). Remember to re-build the Docker images to ensure the latest code changes are reflected (see Testing Latest Code Changes)
- Consider running all of SINGA-Auto’s example models in ./examples/models/
- Consider running all of SINGA-Auto’s example usage scripts in ./examples/scripts/
- Consider running all of SINGA-Auto’s example dataset-preparation scripts in ./examples/datasets/
- Consider visiting SINGA-Auto Web Admin and manually testing it
- Consider building SINGA-Auto’s documentation site and checking if the documentation matches the codebase (see Building SINGA-Auto’s Documentation)
After merging dev into master, do the following:
Build & push SINGA-Auto’s new Docker images to SINGA-Auto’s own Docker Hub account:
bash scripts/docker_swarm/build_images.sh
bash scripts/push_images.sh
Get Docker Hub credentials from @nginyc.
Build & deploy SINGA-Auto’s new documentation to SINGA-Auto’s microsite powered by Github Pages. Run the following:
bash scripts/docker_swarm/build_docs.sh latest
Finally, commit all resultant generated documentation changes and push them to the dev branch. The latest documentation should be reflected at https://singa-auto.readthedocs.io/en/latest/.
Refer to the documentation on Github Pages (https://guides.github.com/features/pages/) to understand more about how this works.
Draft a new SINGA-Auto Github release. Make sure to include the list of changes relative to the previous release.
Subsequently, you’ll need to increase SINGA_AUTO_VERSION in .env.sh to reflect a new release.
Managing SINGA-Auto’s DB¶
By default, you can connect to the PostgreSQL DB using a PostgreSQL client (e.g. Postico) with these credentials:
SINGA_AUTO_ADDR=127.0.0.1
POSTGRES_EXT_PORT=5433
POSTGRES_USER=singa_auto
POSTGRES_DB=singa_auto
POSTGRES_PASSWORD=singa_auto
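If your client (e.g. psql) accepts a connection URI rather than separate fields, the default credentials above combine into the libpq `postgresql://user:password@host:port/dbname` format. This sketch only builds the string; `postgres_uri` is a hypothetical helper, and reading overrides from environment variables of the same names is an assumption for convenience.

```python
import os
from urllib.parse import quote


def postgres_uri(env=None) -> str:
    """Build a libpq-style connection URI from SINGA-Auto's default settings."""
    env = os.environ if env is None else env
    user = env.get("POSTGRES_USER", "singa_auto")
    password = env.get("POSTGRES_PASSWORD", "singa_auto")
    host = env.get("SINGA_AUTO_ADDR", "127.0.0.1")
    port = env.get("POSTGRES_EXT_PORT", "5433")
    db = env.get("POSTGRES_DB", "singa_auto")
    # Percent-encode the credentials in case they contain characters such as '@'.
    return f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{db}"


print(postgres_uri({}))  # postgresql://singa_auto:singa_auto@127.0.0.1:5433/singa_auto
```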
Connecting to SINGA-Auto’s Redis¶
You can connect to Redis DB with rebrow:
bash scripts/docker_swarm/test/start_rebrow.sh
…with these credentials by default:
SINGA_AUTO_ADDR=127.0.0.1
REDIS_EXT_PORT=6380
Pushing Images to Docker Hub¶
To push the SINGA-Auto’s latest images to Docker Hub (e.g. to reflect the latest code changes):
bash scripts/push_images.sh
Building SINGA-Auto’s Documentation¶
SINGA-Auto uses Sphinx documentation and hosts the documentation with Github Pages on the dev branch. Build & view SINGA-Auto’s Sphinx documentation on your machine with the following commands:
bash scripts/docker_swarm/build_docs.sh latest
open docs/index.html
Running SINGA-Auto’s Tests¶
SINGA-Auto uses pytest.
First, start SINGA-Auto.
Then, run all integration tests with:
pip install -r singa_auto/requirements.txt
pip install -r singa_auto/advisor/requirements.txt
bash scripts/docker_swarm/test/test.sh
Troubleshooting¶
While building SINGA-Auto’s images locally, if you encounter errors like “No space left on device”, you might be running out of space allocated for Docker. Try one of the following:
# Remove all unused containers, networks and images (not just dangling ones)
docker system prune --all
# Delete all containers
docker rm $(docker ps -a -q)
# Delete all images
docker rmi $(docker images -q)
From macOS Mojave onwards, due to macOS’s new privacy protection feature, you might need to explicitly give Docker Full Disk Access, restart Docker, or even do a factory reset of Docker.
Using SINGA-Auto Admin’s HTTP interface¶
To make calls to the HTTP endpoints of SINGA-Auto Admin, you’ll first need to authenticate with email & password against the POST /tokens endpoint to obtain an authentication token, and subsequently add the Authorization header to every other call:
Authorization: Bearer {{token}}
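The flow above can be sketched with Python's standard library. This sketch assumes that POST /tokens accepts a JSON body with `email` and `password` fields and returns JSON containing a `token` field; verify SINGA-Auto Admin's actual request and response format (or use singa_auto.client.Client) before relying on it. The helper names are hypothetical, and the admin URL is the default address from Quick Setup.

```python
import json
import urllib.request

ADMIN_URL = "http://127.0.0.1:3000"  # default SINGA-Auto Admin address


def token_request(email: str, password: str, base_url: str = ADMIN_URL):
    """Build (but do not send) the POST /tokens request.

    Assumption: the endpoint accepts a JSON body with email and password.
    """
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/tokens", data=body, method="POST",
        headers={"Content-Type": "application/json"})


def auth_headers(token: str) -> dict:
    """Headers to attach to every call after authentication."""
    return {"Authorization": f"Bearer {token}"}


def fetch_token(email: str, password: str, base_url: str = ADMIN_URL) -> str:
    """Send the request; assumes the JSON response has a 'token' field."""
    with urllib.request.urlopen(token_request(email, password, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))["token"]
```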
Users of SINGA-Auto¶
There are 4 types of users on SINGA-Auto:
Application Developers create, manage, monitor and stop model training and serving jobs on SINGA-Auto. They are the primary users of SINGA-Auto - they upload their datasets onto SINGA-Auto and create model training jobs that train on these datasets. After model training, they trigger the deployment of these trained ML models as a web service that Application Users interact with. While their model training and serving jobs are running, they administer these jobs and monitor their progress.
Application Users send queries to trained models exposed as a web service on SINGA-Auto, receiving predictions back. Not to be confused with Application Developers, these users may be developers that are looking to conveniently integrate ML predictions into their mobile, web or desktop applications. These application users have consumer-provider relationships with the aforementioned ML application developers, having delegated the work of training and deploying ML models to them.
Model Developers create, update and delete model templates to form SINGA-Auto’s dynamic repository of ML model templates. These users are key external contributors to SINGA-Auto, and represent the main source of up-to-date ML expertise on SINGA-Auto, playing a crucial role in consistently expanding and diversifying SINGA-Auto’s underlying set of ML model templates for a variety of ML tasks. Coupled with SINGA-Auto’s modern ML model tuning framework, these contributions heavily dictate the ML performance that SINGA-Auto provides to Application Developers.
SINGA-Auto Admins create, update and remove users on SINGA-Auto. They regulate access of the other types of users to a running instance of SINGA-Auto.
SINGA-Auto’s Architecture¶
SINGA-Auto’s system architecture consists of 3 static components, 2 central databases, 4 types of dynamic components, and 1 client-side SDK, which can be illustrated with a 3-layer architecture diagram.
Static Stack of SINGA-Auto¶
SINGA-Auto’s static stack consists of the following:
SINGA-Auto Admin (Python/Flask) is the centrepiece of SINGA-Auto. It is a multi-threaded HTTP server which presents a unified REST API over HTTP that fully administers the SINGA-Auto instance. When users send requests to SINGA-Auto Admin, it handles these requests by accordingly modifying SINGA-Auto’s Metadata Store or deploying/stopping the dynamic components of SINGA-Auto’s stack (i.e. workers for model training & serving).
SINGA-Auto Metadata Store (PostgreSQL) is SINGA-Auto’s centralized, persistent database for user metadata, job metadata, worker metadata and model templates.
SINGA-Auto Redis (Redis) is SINGA-Auto’s temporary in-memory store for the implementation of fast asynchronous cross-worker communication, in a way that decouples senders from receivers. It synchronizes the back-and-forth of queries & predictions between multiple SINGA-Auto Inference Workers and a single SINGA-Auto Predictor for an Inference Job.
SINGA-Auto Web Admin (NodeJS/ExpressJS) is an HTTP server that serves SINGA-Auto’s web front-end to users, allowing Application Developers to survey their jobs on a friendly web GUI.
SINGA-Auto Client (Python) is SINGA-Auto’s client-side Python SDK to simplify communication with Admin.
Dynamic Stack of SINGA-Auto¶
On the other hand, SINGA-Auto’s dynamic stack consists of a dynamic pool of workers. Internally within SINGA-Auto’s architecture, Admin adopts master-slave relationships with these workers, managing the deployment and termination of these workers in real-time depending on Train Job and Inference Job requests, as well as the stream of events it receives from its workers. When a worker is deployed, it is configured with the identifier for an associated job, and once it starts running, it would first initialize itself by pulling the job’s metadata from Metadata Store before starting on its task.
The types of workers are as follows:
SINGA-Auto Advisor Workers (Python) propose knobs & training configuration for Train Workers. For each model, there is a single Advisor Worker centrally orchestrating tuning of the model together with multiple Train Workers.
SINGA-Auto Train Workers (Python) train models for Train Jobs by conducting Trials.
SINGA-Auto Predictors (Python/Flask) are multi-threaded HTTP servers that receive queries from Application Users and respond with predictions as part of an Inference Job. They do this through producer-consumer relationships with multiple SINGA-Auto Inference Workers. If necessary, they perform model ensembling on predictions received from different workers.
SINGA-Auto Inference Workers (Python) serve models for Inference Jobs. In a single Inference Job, there could be multiple Inference Workers concurrently making predictions for a single batch of queries.
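The predictor and inference-worker interaction described above can be approximated in a single process. In this sketch, `queue.Queue` objects stand in for SINGA-Auto Redis, threads stand in for Inference Workers, and majority voting stands in for ensembling; SINGA-Auto's actual ensembling method is not described here, so treat the voting (and the helper names) as assumptions.

```python
import queue
import threading
from collections import Counter


def inference_worker(model, queries_in, predictions_out):
    """Consume query batches and produce predictions (stand-in for a worker)."""
    while True:
        batch = queries_in.get()
        if batch is None:  # shutdown signal
            break
        predictions_out.put([model(q) for q in batch])


def predict(batch, models):
    """Fan a batch out to one worker per model, then ensemble by majority vote."""
    out = queue.Queue()
    inboxes = [queue.Queue() for _ in models]
    threads = [threading.Thread(target=inference_worker, args=(m, inbox, out))
               for m, inbox in zip(models, inboxes)]
    for t in threads:
        t.start()
    for inbox in inboxes:
        inbox.put(batch)  # producer side: publish the queries
        inbox.put(None)
    results = [out.get() for _ in models]  # consumer side: collect predictions
    for t in threads:
        t.join()
    # Majority vote per query across the workers' predictions.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*results)]
```

Each worker returns its predictions in query order, so zipping the per-worker lists groups the votes for each query regardless of which worker finishes first.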
Container Orchestration Strategy¶
All of SINGA-Auto’s components’ environment and configuration have been fully specified as replicable, portable Docker images, publicly available as Dockerfiles and on SINGA-Auto’s own Docker Hub account.
When an instance of SINGA-Auto is deployed on the master node, a Docker Swarm is initialized and all of SINGA-Auto’s components run within a single Docker routing-mesh overlay network. Subsequently, SINGA-Auto can be horizontally scaled by adding more worker nodes to the Docker Swarm. Dynamically-deployed workers run as Docker Swarm Services and are placed in a resource-aware manner.
Distributed File System Strategy¶
All components depend on a shared file system across multiple nodes, powered by Network File System (NFS). Each component written in Python continually writes logs to this shared file system.
Folder Structure¶
singa_auto/
SINGA-Auto’s Python package
admin/
SINGA-Auto’s static Admin component
advisor/
SINGA-Auto’s advisors
client/
SINGA-Auto’s client-side SDK
See also
singa_auto.client
worker/
SINGA-Auto’s train, inference & advisor workers
predictor/
SINGA-Auto’s predictor
meta_store/
Abstract data access layer for SINGA-Auto’s main metadata store (backed by PostgreSQL)
param_store/
Abstract data access layer for SINGA-Auto’s store of model parameters (backed by filesystem)
data_store/
Abstract data access layer for SINGA-Auto’s store of datasets (backed by filesystem)
cache/
Abstract data access layer for SINGA-Auto’s temporary store of model parameters, train job metadata and queries & predictions in train & inference jobs (backed by Redis)
container/
Abstract access layer for dynamic deployment of workers
utils/
Collection of SINGA-Auto-internal utility methods (e.g. for logging, authentication)
model/
Definition of abstract singa_auto.model.BaseModel that all SINGA-Auto models should extend, programming abstractions used in model development, as well as a collection of utility methods for model developers in the implementation of their own models
constants.py
SINGA-Auto’s programming abstractions & constants (e.g. valid values for user types, job statuses)
web/
SINGA-Auto’s Web Admin component
dockerfiles/
Stores Dockerfiles for customized components of SINGA-Auto
examples/
Sample usage code for SINGA-Auto, such as standard models, dataset downloading and processing code, sample image/question data, and quick test code
docs/
Source documentation for SINGA-Auto (e.g. Sphinx documentation files)
test/
Test code for SINGA-Auto
scripts/
Shell & python scripts for initializing, starting and stopping various components of SINGA-Auto’s stack
docker_swarm/
Containing server environment settings and scripts for running Docker
kubernetes/
Containing server environment settings and scripts for running Kubernetes
.base_env.sh
Stores configuration variables for SINGA-Auto
log_minitor/
Dockerfile and configurations for elasticsearch and logstash
singa_auto_scheduler/
Dockerfiles and configurations for scheduler and monitor
Acknowledgements¶
The research is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its National Cybersecurity R&D Programme (Grant No. NRF2016NCR-NCR002-020), National Natural Science Foundation of China (No. 61832001), National Key Research and Development Program of China (No. 2017YFB1201001), China Thousand Talents Program for Young Professionals (3070011 181811).