Using first principles to design and implement an API to wrap ML functionality.
Intuition
So far our workflows have involved directly running functions from our Python scripts and more recently, using the CLI application to quickly execute commands. But not all of our users will want to work at the code level or even download the package as we would need to for the CLI app. Instead, many users will simply want to use the functionality of our model and inspect the relevant details around it. To address this, we can develop an application programming interface (API) that provides the appropriate level of abstraction that enables our users to interact with the underlying data in our application.
The interactions in our situation involve the client (users, other applications, etc.) sending a request to the server (our application) and receiving a response in return.

Request
We'll first take a look at the different components of a request:
URI
A uniform resource identifier (URI) is an identifier for a specific resource.
https://localhost:5000/users/{userId}/models/{modelId}/?filter=completed#details
Parts of the URI | Description |
---|---|
scheme | defines which protocol to use. |
domain | address of the website. |
port | communication endpoint. If not defined, it's usually 80 for HTTP and 443 for HTTPS. |
path | location of the resource of interest. |
query string | parameters sent to endpoint to identify specific resources. |
anchor | specific location inside an HTML page. |
Parts of the path | Description |
---|---|
/users | collection resource of all users |
/users/{userId} | single resource for a specific user userId |
/models | sub-collection resource models for the specific user userId |
/models/{modelId} | single resource for the userId's models sub-collection |
userId and modelId | path parameters |
filter | query parameter |
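To make the parts above concrete, here is a small sketch that splits a URI into its components with Python's standard `urllib.parse` module. The IDs `42` and `7` are made-up values standing in for `{userId}` and `{modelId}`.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical concrete version of the example URI (IDs 42 and 7 are made up).
uri = "https://localhost:5000/users/42/models/7/?filter=completed#details"
parts = urlsplit(uri)

print(parts.scheme)           # scheme
print(parts.hostname)         # domain
print(parts.port)             # port
print(parts.path)             # path
print(parse_qs(parts.query))  # query string as a dict of lists
print(parts.fragment)         # anchor

# Path parameters are the variable segments that follow each collection name.
segments = [s for s in parts.path.split("/") if s]
path_params = dict(zip(segments[0::2], segments[1::2]))
print(path_params)
```

Pairing every collection name with the segment that follows it only works for paths shaped like this example; a real framework matches the path against a declared template instead.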
Method
The method is the operation to execute on the specific resource defined by the URI. There are many possible methods to choose from, but here are the four most popular, which are often referred to as CRUD because they allow you to Create, Read, Update and Delete.
- GET: get a resource
- POST: create or update a resource
- PUT/PATCH: create or update a resource
- DELETE: delete a resource
Note
You could use either the POST or PUT request method to create and modify resources, but the main difference is that PUT is idempotent, which means you can call the method repeatedly and it'll produce the same state every time. Calling POST multiple times, on the other hand, can result in creating multiple instances and so changes the overall state each time.
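The difference between the two methods can be illustrated with a minimal in-memory sketch. The `models` dictionary, the ID counter and the `post_model` / `put_model` helpers are all hypothetical stand-ins for a server handling `POST /models` and `PUT /models/{modelId}`; they are not part of any real API.

```python
import itertools

models = {}                    # in-memory stand-in for a database of resources
_next_id = itertools.count(1)  # hypothetical server-side ID generator


def post_model(data: dict) -> int:
    """POST /models: the server picks a new ID, so every call adds a resource."""
    model_id = next(_next_id)
    models[model_id] = dict(data)
    return model_id


def put_model(model_id: int, data: dict) -> None:
    """PUT /models/{modelId}: create or replace the resource at a known ID."""
    models[model_id] = dict(data)


put_model(100, {"name": "cnn"})
put_model(100, {"name": "cnn"})  # repeating the call leaves the state unchanged
state_after_puts = dict(models)

post_model({"name": "cnn"})
post_model({"name": "cnn"})      # repeating the call creates a second resource

print(len(state_after_puts))  # 1
print(len(models))            # 3
```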
We can use cURL to execute our API calls with the following options:
# cURL options
$ curl --help
...
-X, --request HTTP method (ie. GET)
-H, --header headers to be sent to the request (ex. authentication)
-d, --data data to POST, PUT/PATCH, DELETE (usually JSON)
...
For example, if we want to GET all users:
Headers
Headers contain information about a certain event and are usually found in both the client's request and the server's response. They can cover anything from the type of format that will be sent and received to authentication and caching info, etc.
Body
The body contains information that may be necessary for the request to be processed. It's usually a JSON object sent during POST, PUT/PATCH and DELETE request methods.
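As a rough illustration of how the method, headers and body fit together, the sketch below assembles a request object with Python's standard library without sending it. The `/models` endpoint and the payload fields are assumptions made up for this example.

```python
import json
from urllib.request import Request

# Hypothetical endpoint and payload, used only for illustration.
url = "http://localhost:5000/models"
payload = {"name": "cnn", "hidden_dim": 128}

request = Request(
    url=url,
    method="POST",                                 # method
    headers={"Content-Type": "application/json"},  # headers
    data=json.dumps(payload).encode("utf-8"),      # body (JSON encoded as bytes)
)

print(request.get_method())      # POST
print(request.full_url)          # http://localhost:5000/models
print(request.header_items())    # headers attached to the request
print(json.loads(request.data))  # body decoded back into a dict
```

Actually sending it would take `urllib.request.urlopen(request)` and a server listening at that address.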
Response
The response we receive from our server is the result of the request we sent. The response also includes headers and a body which should include the proper HTTP status code as well as explicit messages, data, etc.
There are many HTTP status codes to choose from depending on the situation but here are the most common options:
Code | Description |
---|---|
200 OK | method operation was successful. |
201 CREATED | POST or PUT method successfully created a resource. |
202 ACCEPTED | the request was accepted for processing (but processing may not be done). |
400 BAD REQUEST | server cannot process the request because of a client side error. |
401 UNAUTHORIZED | you're missing required authentication. |
403 FORBIDDEN | you're not allowed to do this operation. |
404 NOT FOUND | the resource you're looking for was not found. |
500 INTERNAL SERVER ERROR | there was a failure somewhere in the system process. |
501 NOT IMPLEMENTED | this operation on the resource doesn't exist yet. |
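Python's standard library already ships these codes in `http.HTTPStatus`, so a response body can be assembled without hard-coding numbers. The sketch below assumes a response shape with `message`, `status-code` and `data` keys; `build_response` is a hypothetical helper, not part of any library.

```python
from http import HTTPStatus


def build_response(status: HTTPStatus, data: dict) -> dict:
    """Build a response body with a message, a status code and the data."""
    return {
        "message": status.phrase,
        "status-code": int(status),
        "data": data,
    }


ok = build_response(HTTPStatus.OK, {})
missing = build_response(HTTPStatus.NOT_FOUND, {})
print(ok)       # {'message': 'OK', 'status-code': 200, 'data': {}}
print(missing)  # {'message': 'Not Found', 'status-code': 404, 'data': {}}
```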
Best practices
When designing our API, there are some best practices to follow:
- URI paths, messages, etc. should be as explicit as possible. Avoid using cryptic resource names, etc.
- Nouns not verbs when naming. The request method already takes care of the verb (GET /users not GET /get_users).
- Plural nouns (GET /users/{userId} not GET /user/{userId}).
- Dashes in URIs for resources and path parameters but use underscores for query parameters (GET /admin-users/?find_desc=super).
- Return appropriate HTTP status codes and informative messages to the user.
Application
We're going to organize our API under the app directory because in the future we may have additional packages like tagifai, so we don't want our app to be attached to any one package. Our API will be defined in the following scripts:
- api.py: the main script that will include our API initialization and endpoints.
- schemas.py: definitions for the different objects we'll use in our resource endpoints.
We'll step through the components in these scripts to show how we'll design our API.
FastAPI
We're going to use FastAPI as our framework to build our API service. There are plenty of other framework options out there such as Flask, Django and even non-Python based options like Node, Angular, etc. FastAPI is a relative newcomer that combines many of the advantages across these frameworks and is maturing quickly and becoming more widely adopted. Its notable advantages include:
- highly performant
- data validation via pydantic
- autogenerated documentation
- dependency injection
- security via OAuth2
Note
Your choice of framework also depends on your team's existing systems and processes. However, with the wide adoption of microservices, we can wrap our specific application in any framework we choose and expose the appropriate resources so all other systems can easily communicate with it.
To show how intuitive and powerful FastAPI is, we could laboriously go through the documentation but instead we'll walk through everything as we cover the components of our own application.
Initialization
The first step is to initialize our API in our app/api.py file by defining metadata like the title, description and version.
Our first endpoint is going to be a simple one where we want to show that everything is working as intended. The path for the endpoint will just be / (when a user visits our base URI) and it'll be a GET request. This simple endpoint definition is often used as a health check to ensure that our application is indeed up and running properly.
We let our application know that the endpoint is at / through the path operation decorator and we simply return a JSON response with the 200 OK HTTP status code. Let's go ahead and start our application and see what this response looks like!
Note
In our actual api.py script, you'll notice that even our index function looks different. Don't worry, we're slowly adding components to our endpoints and justifying them along the way.
Launching
We can launch our application with the following command (also saved as a Makefile target as make app):
We're using Uvicorn, a fast ASGI server (it can run asynchronous code in a single process) to launch our application. Notice that we only reload on changes to specific directories, as this is to avoid reloading on files that won't impact our application such as log files, etc.
Note
If we want to manage multiple uvicorn workers to enable parallelism in our application, we can use Gunicorn in conjunction with Uvicorn. This will usually be done in a production environment where we'll be dealing with meaningful traffic. I've included a config/gunicorn.py script with the customizable configuration and we can launch all the workers with the following command (or make app-prod):
Requests
Now that we have our application running, we can submit our GET request using several different methods:
- Visit the endpoint on your browser at http://localhost:5000/
- cURL

    curl -X GET http://localhost:5000/

- Access endpoints via code. Here we show how to do it with the requests library in Python but it can be done with most popular languages. You can even use an online tool to convert your cURL commands into code!

    import json
    import requests

    response = requests.get("http://localhost:5000/")
    print(json.loads(response.text))

- Directly in the API's autogenerated documentation (which we'll see later).
- Using external tools like Postman, which is great for managed tests that you can save and share with others.
For all of these, we'll see the exact same response from our API:
{ "message": "OK", "status-code": 200, "data": {} }
Decorators
We're going to use decorators to wrap some of our endpoints so we can customize our function's inputs and outputs. In our GET request's response above, there was not a whole lot of information about the actual request, so we should append details such as URL, timestamp, etc. But we don't want to do this individually for each endpoint so let's use a decorator to append the request information for every response.
We're passing in a Request instance so we can access information like the request method and URL. Therefore, our endpoint functions also need to have this Request object as an input argument. Once we receive the results from our endpoint function f, we can append the extra details and return a more informative response. To use this decorator, we just have to wrap our functions accordingly.
{ "message": "OK", "method": "GET", "status-code": 200, "timestamp": "2021-02-08T13:19:11.343801", "url": "http://localhost:5000/", "data": {} }
There are also some built-in decorators we should be aware of. We've already seen the path operation decorator (ex. @app.get("/")) which defines the path for the endpoint as well as other attributes. There is also the events decorator (@app.on_event()) which we can use to run code when our application starts up and shuts down. For example, we use the startup event to load all the previous best runs and identify the best run (and model) to use for inference. The advantage of doing this as an event is that our service won't start until this is ready, so no requests will be prematurely processed and error out.
Documentation
When we define an endpoint, FastAPI automatically generates some documentation, adhering to OpenAPI standards, based on the function's inputs, typing, outputs, etc. We can access the Swagger UI for our documentation by going to the /docs endpoint in any browser.

Click on an endpoint > Try it out > Execute to see what the server's response will look like. Since this was a GET request without any inputs, our request body was empty but for other methods we'll need to provide some information (we'll illustrate this when we do a POST request).

You'll notice that our endpoints are organized under sections in the UI. This is because we used tags when defining our endpoints in the script.
Note
You can also use the /redoc endpoint to view the ReDoc documentation or Postman to execute and manage tests that you can save and share with others.
Resources
When designing the resources for our API service, we need to think about the following questions:
- Who are the users? This will define what resources need to be exposed.
Our users include anyone who wants to receive the relevant tags for a given input. They may not necessarily be technical or aware of how machine learning works.
- What functionality do we want to enable our users with?
Though there are many different operations we could enable for our users (optimize, train, delete, update models, etc.), we're going to scope our service by only enabling prediction at this time. However, our Python scripts and the CLI application are available if a developer wants to be able to do more (ie. train a model).
- What are the objects (or entities) that we'll need to build and expose resources around?
We want to be able to explore and use parameters and performance metrics from our trained models.
Query parameters
We use the query parameter filter to indicate the subset of performance we care about. We'd include this parameter in our GET request like so:
And this will only produce the subset of the performance we indicated through the query parameter.
{
"message": "OK",
"method": "GET",
"status-code": 200,
"timestamp": "2021-03-21T13:12:01.297630",
"url": "http://localhost:5000/performance?filter=overall",
"data": {
"overall": {
"precision": 0.843033473244977,
"recall": 0.597872340425532,
"f1": 0.6821603372348584,
"num_samples": 217
}
}
}
Path parameters
Our next endpoint will be to GET the parameters for our trained model. This time, we're using a path parameter param, which is a required field in the URI.
We can perform our GET request like so, where the param is part of the request URI's path as opposed to being part of its query string.
And we'd receive a response like this:
{
"message": "OK",
"method": "GET",
"status-code": 200,
"timestamp": "2021-03-21T13:13:46.696429",
"url": "http://localhost:5000/params/hidden_dim",
"data": {
"hidden_dim": 443
}
}
Schemas
Users can list all runs, find out more information about a specific run and now we want to enable them to get predictions on some input text from any of these runs.
We receive a payload from the request's body which contains information as to what to predict on. This PredictPayload is defined in our app/schemas.py script:
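As a rough sketch of what such schemas could look like with pydantic, the example below defines a `Text` model and a `PredictPayload` model holding a list of them. It uses pydantic's `Field` to express the minimum length and, for simplicity, makes `text` a required field; both choices are assumptions for this example and may differ from the project's actual definitions.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Text(BaseModel):
    # Required here for simplicity; must contain at least 1 character.
    text: str = Field(..., min_length=1)


class PredictPayload(BaseModel):
    texts: List[Text]


payload = PredictPayload(texts=[{"text": "Transfer learning with transformers."}])
print(payload.texts[0].text)

try:
    PredictPayload(texts=[{"text": ""}])  # empty string violates min_length
except ValidationError as error:
    print(error)
```

Nested dictionaries are converted into `Text` instances automatically, and the second call fails validation before any endpoint code would run.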
We're defining the PredictPayload object as a list of Text objects called texts. Each Text object is a string that defaults to None and must have a minimum length of 1 character.
Note
We could've just defined our PredictPayload like so:
Validation
Built-in
We're using pydantic's BaseModel object here because it offers built-in validation for all of our schemas. In our case, if a Text instance is shorter than 1 character, then our service will return the appropriate error message and code.
# 422 Error: Unprocessable Entity { "detail": [ { "loc": [ "body", "texts", 0, "text" ], "msg": "ensure this value has at least 1 characters", "type": "value_error.any_str.min_length", "ctx": { "limit_value": 1 } } ] }
Custom
We can also add custom validation on a specific entity by using the @validator decorator, like we do to ensure that the list of texts is not empty.
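A minimal sketch of such a custom check is shown below. It uses the `@validator` decorator named above (newer pydantic releases also provide `field_validator` for the same purpose and may warn that `validator` is deprecated). The method name and the plain `List[str]` field are simplifications assumed for this example.

```python
from typing import List

from pydantic import BaseModel, ValidationError, validator


class PredictPayload(BaseModel):
    texts: List[str]  # simplified to plain strings for this sketch

    @validator("texts")
    def list_must_not_be_empty(cls, value):
        # Runs after the built-in type validation of the `texts` field.
        if not len(value):
            raise ValueError("List of texts to classify cannot be empty.")
        return value


try:
    PredictPayload(texts=[])
except ValidationError as error:
    print(error.errors()[0]["msg"])
```

Raising `ValueError` inside the validator is enough; pydantic collects it into a `ValidationError`, which the API framework can then turn into an error response.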
{ "detail": [ { "loc": [ "body", "texts" ], "msg": "List of texts to classify cannot be empty.", "type": "value_error" } ] }
Extras
Lastly, we have a schema_extra object under the Config class to depict what an example PredictPayload should look like. When we do this, it automatically appears in our endpoint's documentation when we want to "Try it out".

Projects
To make our API a standalone product, we'll need to create and manage a database for our users and resources. These users will have credentials which they will use for authentication and use their privileges to be able to communicate with our service. And of course, we can display a rendered frontend to make all of this seamless with HTML forms, buttons, etc. This is exactly how the old MWML platform was built and we leveraged FastAPI to deliver high performance for 500K+ daily service requests.
If you are building a product, then I highly recommend forking this generation template to get started. It includes the backbone architecture you need for your product:
- Databases (models, migrations, etc.)
- Authentication via JWT
- Asynchronous task queue with Celery
- Customizable frontend via Vue JS
- Docker integration
- so much more!
However, for the majority of ML developers, thanks to the wide adoption of microservices, we don't need to do all of this. A well designed API service that can seamlessly communicate with all other services (framework agnostic) will fit into any process and add value to the overall product. Our main focus should be to ensure that our service is working as it should and constantly improving, which is exactly what the next cluster of lessons will focus on (testing and monitoring).
Note
We've only covered the foundations of using FastAPI but there's so much more we can do. Be sure to check out their advanced documentation to see everything we can leverage.