Quick Start (Application Users)

As an App User, you can make predictions on models deployed on SINGA-Auto.

Making a single prediction

Your app developer should have created an inference job and shared predictor_host, the host to which you can send queries and receive predictions over HTTP.

Send a POST /predict request to predictor_host with a JSON body of the following format:

{
    "query": <query>
}

…where the format of <query> depends on the associated task (see tasks).

The body of the response will be JSON of the following format:

{
    "prediction": <prediction>
}

…where the format of <prediction> depends on the associated task.

Example:

If predictor_host is 127.0.0.1:30000, run the following in Python:

predictor_host = '127.0.0.1:30000'
query_path = 'examples/data/image_classification/fashion_mnist_test_1.png'

# Load query image as 3D list of pixels
from singa_auto.model import utils
[query] = utils.dataset.load_images([query_path]).tolist()

# Make request to predictor
import requests
import json
res = requests.post('http://{}/predict'.format(predictor_host), json={ 'query': query })
print(res.json())

Output:

{'prediction': [0.9364003576825639, 1.016065009906697e-08, 0.0027604885399341583, 0.00014587241457775235, 6.018594376655528e-06, 1.042887332047826e-09, 0.060679372351310566, 2.024707311532037e-11, 7.901770004536957e-06, 1.5299328026685544e-08],
'predictions': []}
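
Here <prediction> is a list of 10 class probabilities. As a rough sketch of how you might interpret it, the snippet below (reusing res from the example above) maps the highest-probability index to a label name — the Fashion-MNIST class names are an assumption for illustration, not part of the predictor's response:

# Hypothetical Fashion-MNIST label names, for illustration only
LABELS = ['t-shirt/top', 'trouser', 'pullover', 'dress', 'coat',
          'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']

probs = res.json()['prediction']
best = max(range(len(probs)), key=lambda i: probs[i])
print('Predicted: {} (probability {:.4f})'.format(LABELS[best], probs[best]))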

Prediction for QuestionAnswering

Send the query question as JSON in the following format:

import requests
data = {"questions": ["How long individuals are contagious?"]}
res = requests.post('http://{}/predict'.format(predictor_host), json=data)

To print out the prediction result, use res.text:

print(res.text)
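
Since questions is a list, a single request can plausibly carry several questions at once — a sketch, assuming the predictor accepts multiple entries (not confirmed here):

import requests

# Multiple questions in one request -- assumes the predictor handles a list of entries
data = {"questions": ["How long individuals are contagious?",
                      "What are the common symptoms?"]}
res = requests.post('http://{}/predict'.format(predictor_host), json=data)
print(res.text)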

Prediction for SpeechRecognition

The query data, the path to an audio file, is passed as follows:

import requests

data = ['data/ldc93s1/ldc93s1/LDC93S1.wav']
# requests serializes the json= payload itself, so no json.dumps is needed
res = requests.post('http://{}/predict'.format(predictor_host), json=data[0])

To print out the prediction result, use res.text:

print(res.text)

If the SINGA-Auto instance is deployed with Kubernetes, all inference jobs are served at the default Ingress port 3005 in the format <host>:3005/<app>, where <host> is the host name of the SINGA-Auto instance and <app> is the name of the application provided when the train job was submitted.
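
As a sketch of what a query under this scheme might look like — singa-auto.example.com and fashion_mnist_app below are hypothetical placeholders for your own host and application name:

import requests

# Hypothetical host and app name -- substitute your own
host = 'singa-auto.example.com'
app = 'fashion_mnist_app'

# 'query' loaded as in the image classification example above
res = requests.post('http://{}:3005/{}'.format(host, app), json={ 'query': query })
print(res.json())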

Making batch predictions

Similar to making a single prediction, but use the queries attribute instead of query in your request and pass an array of queries.

Example:

If predictor_host is 127.0.0.1:30000, run the following in Python:

predictor_host = '127.0.0.1:30000'
query_paths = ['examples/data/image_classification/fashion_mnist_test_1.png',
               'examples/data/image_classification/fashion_mnist_test_2.png']

# Load query images as 3D lists of pixels
from singa_auto.model import utils
queries = utils.dataset.load_images(query_paths).tolist()

# Make request to predictor
import requests
res = requests.post('http://{}/predict'.format(predictor_host), json={ 'queries': queries })
print(res.json())

Output:

{'prediction': None,
 'predictions': [[0.9364002384732744, 1.0160608354681244e-08, 0.0027604878414422274, 0.0001458720798837021, 6.018587100697914e-06, 1.0428869989809186e-09, 0.06067946175827773, 2.0247028012509993e-11, 7.901745448180009e-06, 1.5299294275905595e-08],
                 [0.866741563402005, 5.757699909736402e-05, 0.0006144539802335203, 0.03480150588134776, 3.4249271266162395e-05, 1.3578344004727683e-09, 0.09774905198545598, 6.071191726436664e-12, 1.5324986861742218e-06, 1.583319586551113e-10]]}