
📖 Deployment¶

🎯 Goal of This Notebook
🌐 REST API via Flask
⚡ FastAPI with Pydantic
🧪 Local Testing with cURL & Postman
📦 Dockerizing the API
🎛️ Streamlit Demo App
🧱 Basic Security & Versioning


🎯 Goal of This Notebook¶

Training a model isn’t the end — it needs to be delivered as a working system. This notebook focuses on serving your model as an API, so it can be used by real users or apps.

You’ll learn to:

  • Serve predictions using Flask and FastAPI
  • Run inference from scripts, apps, or Postman
  • Containerize your API using Docker
  • Add a basic UI with Streamlit
  • Understand versioning and input handling best practices

These steps form the foundation of productionizing ML, whether it's on AWS SageMaker, GCP Cloud Run, or your own EC2 instance.

Back to the top


🌐 REST API via Flask¶

🛠️ Creating a Basic /predict Endpoint¶

Serve Your Model Through HTTP¶

A REST API allows external apps (frontends, schedulers, other services) to send requests and get predictions in response.

Using Flask, you can expose a /predict endpoint that:

  • Accepts POST requests with JSON input
  • Passes the input to your trained model
  • Returns predictions as JSON

This is the most common format used in lightweight deployments (e.g., on EC2, Cloud Run, or Heroku).

In [1]:
!pip install --upgrade jupyter_core jupyter_client ipykernel
In [2]:
# !conda create -n flask-env python=3.11 flask pandas scikit-learn xgboost joblib
# !conda activate flask-env
In [3]:
from flask import Flask, request, jsonify
import joblib
import pandas as pd

# Initialize app and load model
app = Flask(__name__)
model = joblib.load("models/xgb_model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    input_json = request.get_json()            # expects a list of records, e.g. [{"value": 3.5}]
    input_df = pd.DataFrame(input_json)
    preds = model.predict(input_df).tolist()
    return jsonify({"predictions": preds})

# Uncomment to run directly if using this as a script
# app.run(debug=True)

📦 Loading a Saved Model into Memory¶

Don’t Load Per Request¶

Model loading should happen once when the server starts — not inside the request handler. This ensures:

  • Fast response times (model already in memory)
  • Lower CPU/memory usage
  • Cleaner separation between setup and inference

In cloud environments, model files are often stored in mounted volumes or prebuilt into the Docker image.
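The load-once pattern can be made explicit with a cached loader. A minimal stdlib sketch, using a hypothetical `_load_from_disk` stub in place of the real `joblib.load` call:

```python
from functools import lru_cache

LOAD_CALLS = 0  # counts disk loads, to show caching works

def _load_from_disk(path: str):
    """Stand-in for joblib.load(path); pretend this is expensive."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return {"path": path}  # stand-in for the model object

@lru_cache(maxsize=None)
def get_model(path: str = "models/xgb_model.joblib"):
    # Cached: the expensive load runs only on the first call.
    return _load_from_disk(path)

# Every request handler calls get_model(); the file is read once.
m1 = get_model()
m2 = get_model()
```

Module-level `model = joblib.load(...)` (as in the Flask cell above) achieves the same thing; a cached accessor is simply easier to reuse across routes and to mock in tests.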

Back to the top


⚡ FastAPI with Pydantic¶

In [4]:
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

# Define input schema
class InputData(BaseModel):
    value: float

# Define response schema
class Prediction(BaseModel):
    prediction: int

app = FastAPI()

# Load model once at startup
model = joblib.load("models/xgb_model.joblib")

@app.post("/predict", response_model=Prediction)
def predict(data: InputData):
    input_array = np.array([[data.value]])
    pred = model.predict(input_array)
    return {"prediction": int(pred[0])}

⚙️ Input Schema Validation¶

FastAPI uses Pydantic to validate and parse incoming request data. You define schemas using Python classes and FastAPI automatically enforces structure, types, and constraints.

This eliminates the need for manual input parsing and makes your APIs safer and more readable.
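A minimal sketch of constraint validation with Pydantic's `Field` (the field name `value` matches the schema above; the `[0, 100]` range is illustrative):

```python
from pydantic import BaseModel, Field, ValidationError

class InputData(BaseModel):
    # Constraints are enforced automatically on every request.
    value: float = Field(ge=0, le=100, description="Feature value in [0, 100]")

ok = InputData(value=3.5)       # parses and validates
try:
    InputData(value=-1)         # out of range -> ValidationError
except ValidationError as e:
    err = e.errors()[0]["type"]  # machine-readable error code
```

In FastAPI, a `ValidationError` like this is what produces the 422 response automatically, with the error details serialized into the response body.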

🔁 Response Format + Status Codes¶

FastAPI also allows you to define response models, which ensures consistent API output. It automatically handles:

  • HTTP status codes (e.g. 200, 422)
  • JSON formatting
  • Type coercion and validation for outputs

You can even add metadata (like descriptions and examples) to both requests and responses.

Back to the top


🧪 Local Testing with cURL & Postman¶

🔍 Example curl Calls¶

You can test the API locally using curl commands from the terminal or Postman GUI. This helps simulate real-world client requests and validate your endpoint behavior.

Use the correct headers (e.g., Content-Type: application/json) and send JSON payloads matching the expected schema.

🧑‍💻 Debugging Input / Output Behavior¶

  • If the server crashes, check the request body format.
  • Use print() or logging inside your route handler to inspect received data.
  • Status code 422 usually means schema validation failed.
  • Postman’s history, saved collections, and visual UI make debugging easier during development.
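The checks behind a 400 vs. a 422 can be mimicked in plain Python, which is handy when debugging by hand. A stdlib-only sketch (the `value` field and error bodies are illustrative):

```python
import json

def parse_body(raw: bytes):
    """Mimic the checks a /predict handler performs on the request body."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return 400, {"error": "body is not valid JSON"}          # malformed body
    if not isinstance(payload, dict) or "value" not in payload:
        return 422, {"error": "missing required field 'value'"}  # schema failure
    if not isinstance(payload["value"], (int, float)):
        return 422, {"error": "'value' must be a number"}        # type failure
    return 200, {"received": payload["value"]}
```

Running `parse_body` on a captured request body tells you immediately whether a failure is a JSON problem (400) or a schema problem (422).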
In [5]:
# curl -X POST http://127.0.0.1:8000/predict \
#      -H "Content-Type: application/json" \
#      -d '{"value": 3.5}'

Back to the top


📦 Dockerizing the API¶

🐳 Dockerfile to Wrap API¶

To containerize your API, write a Dockerfile that installs dependencies and exposes the correct port.

Common base images:

  • python:3.11-slim
  • tiangolo/uvicorn-gunicorn-fastapi:python3.11

Expose the port (e.g., 8000) and set the CMD to run your app via Uvicorn.

In [6]:
# Dockerfile
# FROM python:3.11-slim

# WORKDIR /app

# COPY requirements.txt .
# RUN pip install --no-cache-dir -r requirements.txt

# COPY . .

# CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

🚀 Running + Testing API Container¶

Once the image is built, you can run it locally and test using cURL or Postman.

Steps:

  • docker build -t fastapi-app .
  • docker run -p 8000:8000 fastapi-app
  • Test with a POST request to localhost:8000/predict
In [7]:
# docker build -t fastapi-app .
# docker run -p 8000:8000 fastapi-app

Back to the top


🎛️ Streamlit Demo App¶

🖼️ Interactive Input → Model Prediction¶

This Streamlit app takes simple user input (a numeric value), converts it into the appropriate model input format, performs inference using the locally loaded model, and displays the prediction instantly. Useful for internal demos and lightweight GUI testing.

In [8]:
import streamlit as st
import pandas as pd
import joblib

# Load model once
model = joblib.load("models/xgb_model.joblib")

st.title("XGBoost Demo: Predict from Input")

# Get user input
num = st.number_input("Enter a number:", min_value=0.0, max_value=100.0, step=0.5)

# Convert to DataFrame
input_df = pd.DataFrame([[num]])

# Predict
if st.button("Predict"):
    pred = model.predict(input_df)[0]
    st.success(f"Prediction: {pred}")

🔗 Connect to REST API¶

Instead of embedding the model locally, Streamlit can call a remote REST endpoint. This separates UI from business logic and works well in multi-tiered production systems. The input is sent via a POST request and the prediction is parsed from the API response.

In [9]:
import streamlit as st
import requests

st.title("Remote Prediction via REST API")

num = st.number_input("Enter a number:", min_value=0.0, max_value=100.0, step=0.5)

if st.button("Predict"):
    # Assumes an endpoint that accepts {"input": [x]} and returns {"prediction": ...};
    # adjust the URL and JSON keys to match your API's actual contract.
    url = "http://127.0.0.1:5000/predict"
    response = requests.post(url, json={"input": [num]})
    
    if response.status_code == 200:
        pred = response.json().get("prediction", "N/A")
        st.success(f"Prediction: {pred}")
    else:
        st.error(f"Request failed. Code: {response.status_code}")

Back to the top


🧱 Basic Security & Versioning¶

🔐 Input Sanitization¶

APIs must never assume that incoming data is safe. Basic validation ensures that inputs are correctly typed, within acceptable ranges, and non-malicious. Use pydantic (FastAPI) or manual checks (Flask) to prevent injection attacks or malformed inputs.

In [10]:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, conlist

app = FastAPI()

class InputData(BaseModel):
    input: conlist(float, min_length=1, max_length=1)

@app.post("/predict")
def predict(data: InputData):
    x = data.input[0]
    if not (0 <= x <= 100):  # example constraint
        # Raise rather than return: the client gets a proper 422, not a 200 with an error body
        raise HTTPException(status_code=422, detail="Input out of bounds")

    return {"prediction": 1}
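For the Flask side, the same checks can be done by hand before the data ever reaches the model. A stdlib sketch (the `input` field name and `[0, 100]` bounds are illustrative):

```python
def sanitize_input(payload):
    """Validate a {"input": [x]} payload; return (error_message, value)."""
    if not isinstance(payload, dict) or "input" not in payload:
        return "missing 'input' field", None
    vals = payload["input"]
    if not isinstance(vals, list) or len(vals) != 1:
        return "'input' must be a list of length 1", None
    x = vals[0]
    # Exclude bool explicitly: isinstance(True, int) is True in Python.
    if isinstance(x, bool) or not isinstance(x, (int, float)):
        return "'input' value must be numeric", None
    if not (0 <= x <= 100):
        return "input out of bounds", None
    return None, float(x)

err, val = sanitize_input({"input": [3.5]})
```

In a Flask handler, a non-`None` error message would be returned as `jsonify({"error": err}), 422` before any model call.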

🧾 Versioning Models via URL or Param¶

To support multiple model versions simultaneously, expose the version via URL path or query parameter. This allows backward compatibility and smooth rollouts of new models.

In [11]:
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)
models = {
    "v1": joblib.load("models/xgb_model_v1.joblib"),
    "v2": joblib.load("models/xgb_model_v2.joblib")
}

@app.route("/predict", methods=["POST"])
def predict():
    version = request.args.get("version", "v1")
    model = models.get(version)
    
    if model is None:
        return jsonify({"error": "Invalid model version"}), 400

    data = request.get_json()["input"]
    pred = model.predict([data]).tolist()[0]  # tolist() converts NumPy types for JSON serialization
    return jsonify({"version": version, "prediction": pred})
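The URL-path variant (e.g. `POST /v2/predict`) keeps the version explicit in the route itself; with Flask that is typically a route like `@app.route("/<version>/predict")`. A stdlib sketch of the resolution logic (the route pattern, registry values, and default are illustrative stand-ins):

```python
import re

# Stand-ins; in a real app these would be joblib-loaded model objects.
MODELS = {"v1": "model-object-v1", "v2": "model-object-v2"}

def model_for_path(path, default="v1"):
    """Resolve a model from a versioned path like /v2/predict."""
    m = re.match(r"^/(v\d+)/predict$", path)
    version = m.group(1) if m else default  # unversioned path -> default model
    if version not in MODELS:
        raise KeyError(f"unknown model version: {version}")
    return version, MODELS[version]
```

Path-based versioning makes the version visible in access logs and lets load balancers route versions to different backends; the query-parameter form above is simpler when all versions live in one process.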

Back to the top