Capstone: Full-Stack ML Application
Build a complete spam detection API: train a Scikit-learn model, serve it with FastAPI, containerize it with Docker, and set up CI/CD.
This capstone combines skills from across the curriculum into one production-style application: a spam detection API that classifies email messages.
Architecture
Client → FastAPI → Scikit-learn Model (joblib)
                 → Logging / Health checks
                 → Docker container
                 → GitHub Actions CI
What You’ll Build
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"message": "Win a free iPhone! Click here now!"}'
# Example response (the exact confidence depends on the trained model):
# {"label": "spam", "confidence": 0.97}
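The same request can also be sent from Python. The following is a minimal sketch using only the standard library. The helper names `build_predict_request` and `classify` are illustrative and not part of the project, and the default base URL assumes the API is running locally on port 8000.

```python
import json
import urllib.request


def build_predict_request(
    message: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /predict endpoint."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(message: str, base_url: str = "http://localhost:8000") -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_predict_request(message, base_url)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `classify("Win a free iPhone! Click here now!")` should return a dictionary with `label` and `confidence` keys.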
Project Structure
spam-detector/
├── app/
│   ├── __init__.py
│   ├── main.py
│   ├── model.py
│   └── schemas.py
├── ml/
│   ├── train.py
│   └── model.pkl          # generated
├── tests/
│   └── test_api.py
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── .github/workflows/ci.yml
└── README.md
Step 1: Train the Model
# ml/train.py
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Save next to this script so the path does not depend on the working directory
MODEL_PATH = Path(__file__).parent / "model.pkl"

# Sample training data (use a real dataset for production)
texts = [
    "Hey, are we still meeting for lunch tomorrow?",
    "Please review the attached report by Friday.",
    "Win a FREE iPhone! Click here NOW!!!",
    "Your account has been compromised. Verify immediately.",
    "Can you send me the meeting notes?",
    "Congratulations! You've won $1,000,000!!!",
    "The project deadline is next Monday.",
    "URGENT: Claim your prize before it expires!",
    "Let's schedule a call to discuss the proposal.",
    "Buy cheap medications online with no prescription!",
]
labels = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]  # 0=ham, 1=spam

# TF-IDF over unigrams and bigrams, followed by logistic regression
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# 3-fold cross-validation; with only 10 samples this is a smoke test, not a benchmark
scores = cross_val_score(pipeline, texts, labels, cv=3)
print(f"CV accuracy: {scores.mean():.2f}")

# Fit on all data and persist the whole pipeline (vectorizer + classifier)
pipeline.fit(texts, labels)
joblib.dump(pipeline, MODEL_PATH)
print(f"Model saved to {MODEL_PATH}")
Run: python ml/train.py
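To make `ngram_range=(1, 2)` concrete, here is a simplified, standard-library sketch of the features the vectorizer counts: single words plus adjacent word pairs. It approximates the default `TfidfVectorizer` tokenization (lowercasing, tokens of two or more word characters) and leaves out the TF-IDF weighting itself. `word_ngrams` is a hypothetical helper, not a scikit-learn function.

```python
import re

# Approximates TfidfVectorizer's default token pattern: runs of 2+ word characters
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def word_ngrams(text: str, min_n: int = 1, max_n: int = 2) -> list[str]:
    """Return word n-grams for every n in [min_n, max_n], shortest first."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    ngrams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[start:start + n]))
    return ngrams


print(word_ngrams("Win a FREE iPhone!"))
# ['win', 'free', 'iphone', 'win free', 'free iphone']
```

Note that the single-character token "a" is dropped before pairs are formed, which is why "win free" appears as a bigram.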
Step 2: FastAPI Application
# app/schemas.py
from pydantic import BaseModel, Field


class MessageRequest(BaseModel):
    message: str = Field(min_length=1, max_length=10000)


class PredictionResponse(BaseModel):
    label: str
    confidence: float
# app/model.py
import joblib
from pathlib import Path

MODEL_PATH = Path(__file__).parent.parent / "ml" / "model.pkl"

_model = None


def get_model():
    global _model
    if _model is None:
        _model = joblib.load(MODEL_PATH)
    return _model


def predict(message: str) -> tuple[str, float]:
    model = get_model()
    proba = model.predict_proba([message])[0]
    label_idx = proba.argmax()
    label = "spam" if label_idx == 1 else "ham"
    return label, float(proba[label_idx])
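The `predict` function assumes that column 1 of `predict_proba` corresponds to the spam class. Scikit-learn orders those columns by the fitted estimator's `classes_` attribute, so a slightly more defensive variant could look the class up explicitly. The sketch below uses plain lists so it runs without a trained model; `LABEL_NAMES` and `label_from_proba` are hypothetical names.

```python
# Maps the integer labels used in ml/train.py to response strings
LABEL_NAMES = {0: "ham", 1: "spam"}


def label_from_proba(proba: list[float], classes: list[int]) -> tuple[str, float]:
    """Pick the most probable class and return its name and probability.

    `proba` is one row of predict_proba output; `classes` is the matching
    classes_ array, which defines the column order.
    """
    best_idx = max(range(len(proba)), key=lambda i: proba[i])
    return LABEL_NAMES[int(classes[best_idx])], float(proba[best_idx])


print(label_from_proba([0.03, 0.97], [0, 1]))  # ('spam', 0.97)
print(label_from_proba([0.97, 0.03], [1, 0]))  # ('spam', 0.97)
```

With the real pipeline, the arguments would be `model.predict_proba([message])[0]` and `model.classes_`.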
# app/main.py
import logging
import os

from fastapi import FastAPI, HTTPException

from app.model import predict
from app.schemas import MessageRequest, PredictionResponse

# Honor the LOG_LEVEL variable set in docker-compose.yml; default to INFO
logging.basicConfig(level=os.getenv("LOG_LEVEL", "INFO").upper())
logger = logging.getLogger(__name__)

app = FastAPI(title="Spam Detector API", version="1.0.0")


@app.get("/health")
def health():
    return {"status": "ok"}


@app.post("/predict", response_model=PredictionResponse)
def classify_message(request: MessageRequest):
    try:
        label, confidence = predict(request.message)
    except Exception:
        # Log the traceback server-side; return a generic error to the client
        logger.exception("Prediction failed")
        raise HTTPException(status_code=500, detail="Prediction failed")
    logger.info("Classified as %s (%.2f)", label, confidence)
    return PredictionResponse(label=label, confidence=round(confidence, 4))
Step 3: Tests
# tests/test_api.py
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


def test_health():
    assert client.get("/health").json() == {"status": "ok"}


def test_spam_detection():
    response = client.post("/predict", json={
        "message": "Win a free iPhone! Click here now!"
    })
    assert response.status_code == 200
    data = response.json()
    assert data["label"] == "spam"
    assert data["confidence"] > 0.5


def test_ham_detection():
    response = client.post("/predict", json={
        "message": "Can we reschedule our meeting to Thursday?"
    })
    assert response.status_code == 200
    assert response.json()["label"] == "ham"
Run (after training the model in Step 1, since the tests load ml/model.pkl): pytest tests/ -v
Step 4: Docker
# Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN python ml/train.py
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
# docker-compose.yml
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - LOG_LEVEL=INFO
docker compose up --build
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"message": "Free money! Act now!"}'
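After `docker compose up`, the container may need a moment before it accepts requests. A small polling helper against `/health` can be handy in scripts. This is only a sketch: `fetch_health`, `wait_until_healthy`, and the `fetch` parameter are illustrative names rather than part of the project. The `fetch` callable is injectable so the retry logic can be exercised without a running server.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_health(url: str) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=2) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_healthy(
    url: str = "http://localhost:8000/health",
    attempts: int = 10,
    delay: float = 1.0,
    fetch: Callable[[str], dict] = fetch_health,
) -> bool:
    """Poll the endpoint until it returns {"status": "ok"} or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(url) == {"status": "ok"}:
                return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # server not ready yet; retry after a pause
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` before the `curl` command above avoids a connection error while the server is still starting.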
Step 5: CI/CD with GitHub Actions
# .github/workflows/ci.yml
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python ml/train.py
      - run: pytest tests/ -v
      - run: pip install flake8 && flake8 app/ ml/
Skills Combined
| Stage | Chapters Used |
|---|---|
| ML training | Scikit-learn |
| API | FastAPI |
| Validation | Type Hints / Pydantic |
| Testing | pytest |
| Logging | Logging |
| Docker | DevOps |
| Security | Security |
Bonus Extensions
- Real dataset: use the SMS Spam Collection from Kaggle
- Model versioning: save models with timestamps, add a `/model/info` endpoint
- Rate limiting: add slowapi middleware
- Auth: protect `/predict` with API keys (FastAPI Auth)
- Monitoring: add a Prometheus metrics endpoint
- Frontend: a simple HTML form that calls the API
- Deploy: push to Railway, Render, or AWS ECS
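As a starting point for the model versioning extension, one option is to embed a UTC timestamp in the artifact filename and expose it through a small metadata dictionary that a `/model/info` endpoint could return. This is only a sketch: the filename scheme and the helpers `versioned_model_path` and `model_info` are assumptions, not part of the project above.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def versioned_model_path(directory: Path, now: Optional[datetime] = None) -> Path:
    """Return a path such as ml/model-20240101T120000Z.pkl (UTC timestamp)."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return directory / f"model-{stamp}.pkl"


def model_info(path: Path) -> dict:
    """Build the metadata a /model/info endpoint might return."""
    version = path.stem.removeprefix("model-")
    return {"filename": path.name, "version": version}


fixed = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
path = versioned_model_path(Path("ml"), fixed)
print(path.name)         # model-20240101T120000Z.pkl
print(model_info(path))  # {'filename': 'model-20240101T120000Z.pkl', 'version': '20240101T120000Z'}
```

The training script would pass the returned path to `joblib.dump`, and the API would need a way to locate the newest file, for example by sorting filenames, since this timestamp format sorts chronologically.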
This capstone demonstrates the full lifecycle (train → serve → test → containerize → automate), which mirrors the workflow of a production ML engineer.