
Engineering · May 1, 2026 · 10 min read

Uncle Bob Was Right: Clean Architecture Belongs in Data Science

We will admit it up front: we are unapologetic fans of Robert C. Martin's Clean Architecture. Not because the diagrams are elegant — though they are — but because every principle in that book was forged in the same fire data science teams are walking into today: software that works on day one and is unmaintainable by day ninety.

The data science community has, for understandable reasons, treated architecture as someone else's problem. Notebooks are for thinking; "engineering" happens later, downstream, by other people. That division of labor is exactly what produces the ML estates we get called in to rescue. Clean Architecture, applied with a light touch, is the cheapest insurance policy a data team can buy.

The one rule that matters

Strip Clean Architecture down to its essential claim and you get the Dependency Rule: source code dependencies point inward, toward policy. Your business rules don't know about your database. Your database doesn't know about your web framework. The things that change for different reasons are kept apart, so they can change at different rates.

Translate that into the vocabulary of an ML system and the layers almost name themselves:

  • Entities / domain. The mathematical core: the features, the loss, the model abstraction, the decision policy. The thing a regulator would ask you to explain.
  • Use cases. "Score a loan application." "Retrain on yesterday's data." "Backtest against the 2023 cohort." Pure orchestration of the domain — no Spark, no S3, no FastAPI.
  • Interface adapters. Feature store clients, model registry wrappers, request/response shapes, schema translators.
  • Frameworks & drivers. Airflow, Ray, SageMaker, Snowflake, Kafka, whatever serving stack survived the last re-platforming. The replaceable outer ring.

Notice what falls out: the model itself — the part the business actually paid for — sits at the center, with zero dependencies on the platform du jour. When the platform changes (and it always does), the model doesn't move.
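The Dependency Rule is easy to state and easy to erode, so it helps to check it mechanically. The sketch below is one minimal way to do that, not a feature of any particular tool. It assumes a hypothetical layout with top-level packages named domain, use_cases, adapters, and drivers, and it flags any import that points from an inner layer to an outer one.

```python
import ast

# Hypothetical layer names, innermost first. Rename to match your own packages.
LAYERS = ["domain", "use_cases", "adapters", "drivers"]


def layer_of(module_name: str):
    """Return the layer index for a dotted module name, or None if it is not ours."""
    top = module_name.split(".")[0]
    return LAYERS.index(top) if top in LAYERS else None


def outward_imports(module_name: str, source: str) -> list[str]:
    """List imports in `source` that point from an inner layer to an outer one."""
    own = layer_of(module_name)
    if own is None:
        return []
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            targets = [node.module]
        else:
            continue
        for target in targets:
            other = layer_of(target)
            if other is not None and other > own:
                violations.append(f"{module_name} -> {target}")
    return violations


ok = "from domain.scoring import RiskModel\n"
bad = "import adapters.snowflake_source\n"
print(outward_imports("use_cases.score_loan", ok))  # []
print(outward_imports("domain.scoring", bad))  # ['domain.scoring -> adapters.snowflake_source']
```

Run over every file in a repo as part of the test suite, a check like this turns the rule into a failing build rather than a code-review opinion. It only looks at the project's own layers; banning framework imports such as Spark from the domain package would need an extra deny-list.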

Why data science needs it more, not less

A typical web service changes shape over years. A typical ML pipeline changes shape over weeks: new features land, the training window shifts, a champion model gets challenged, the serving SLA tightens, a regulator asks for a counterfactual. Every one of those changes is a force pushing on the codebase. Without architectural seams, each force deforms the whole thing.

The symptoms are familiar to anyone who has inherited a two-year-old ML repo:

  • The training script imports the serving config, so you can't run either in isolation.
  • Feature engineering is duplicated across the notebook, the batch job, and the online scorer — and the three have quietly drifted.
  • Swapping XGBoost for a transformer requires touching twelve files across four repos because the model class is wired into the serving handler.
  • Nobody can write a unit test for the decision logic, because instantiating it requires a live Spark session and a populated feature store.

Each of these is, in Uncle Bob's vocabulary, a Dependency Rule violation. Each is also a line item in next year's modernization budget.
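The second symptom, drifting feature logic, usually has the cheapest fix: define each feature once, as a pure function in the domain layer, and have every caller import it. The sketch below illustrates the idea with a debt-to-income feature. The RawApplication fields and the function names are hypothetical, chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawApplication:
    # Hypothetical raw fields; a real schema will differ.
    monthly_debt: float
    monthly_income: float


def debt_to_income(app: RawApplication) -> float:
    """Single definition of the DTI feature, shared by every caller."""
    if app.monthly_income <= 0:
        raise ValueError("monthly_income must be positive")
    return app.monthly_debt / app.monthly_income


def batch_features(apps: list[RawApplication]) -> list[float]:
    """Batch path: maps the shared function over a collection."""
    return [debt_to_income(a) for a in apps]


def online_feature(app: RawApplication) -> float:
    """Online path: calls the same function for a single request."""
    return debt_to_income(app)


apps = [RawApplication(1500.0, 5000.0), RawApplication(900.0, 3000.0)]
# Both paths agree by construction, because there is only one implementation.
assert batch_features(apps) == [online_feature(a) for a in apps]
print(batch_features(apps))  # [0.3, 0.3]
```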

What it looks like in practice

Clean Architecture in an ML codebase doesn't require a fifty-file scaffold. It requires three habits.

1. A pure domain layer for the model

The model is a callable that takes a typed feature record and returns a typed prediction. It does not know where the features came from. It does not know whether it is being called from a batch job, a REST endpoint, or a backtest. Concretely:

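# domain/scoring.py: no I/O, no framework imports
from dataclasses import dataclass


@dataclass(frozen=True)
class LoanFeatures:
    fico: int
    dti: float
    tenure_months: int


@dataclass(frozen=True)
class RiskScore:
    pd_12m: float  # probability of default over 12 months
    band: str  # "low" | "medium" | "high"


def _band(p: float) -> str:
    # Illustrative cut-offs only; the real thresholds are a policy decision.
    return "low" if p < 0.05 else "medium" if p < 0.20 else "high"


class RiskModel:
    def __init__(self, estimator):
        # Any object with a scikit-learn-style predict_proba works here.
        self._estimator = estimator

    def score(self, f: LoanFeatures) -> RiskScore:
        row = [[f.fico, f.dti, f.tenure_months]]
        # [0][1] = first row, positive class; works for arrays and nested lists.
        p = float(self._estimator.predict_proba(row)[0][1])
        return RiskScore(pd_12m=p, band=_band(p))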

This file is testable on a laptop in milliseconds, with a stub standing in for the estimator. It is also the first file an auditor needs to read to understand how a predicted probability becomes a decision.
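To make the testability claim concrete, here is a minimal sketch of such a test. The domain types are restated compactly so the block runs on its own. The StubEstimator class, the band cut-offs, and the use of [0][1] indexing (so a plain nested list can stand in for an array) are assumptions of this sketch, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoanFeatures:
    fico: int
    dti: float
    tenure_months: int


@dataclass(frozen=True)
class RiskScore:
    pd_12m: float
    band: str


def _band(p: float) -> str:
    # Hypothetical cut-offs, chosen only for illustration.
    return "low" if p < 0.05 else "medium" if p < 0.20 else "high"


class RiskModel:
    def __init__(self, estimator):
        self._estimator = estimator

    def score(self, f: LoanFeatures) -> RiskScore:
        p = float(self._estimator.predict_proba([[f.fico, f.dti, f.tenure_months]])[0][1])
        return RiskScore(pd_12m=p, band=_band(p))


class StubEstimator:
    """Test double: returns a fixed probability and remembers its input."""

    def __init__(self, p: float):
        self.p = p
        self.seen = None

    def predict_proba(self, rows):
        self.seen = rows
        return [[1.0 - self.p, self.p]]


def test_score_maps_probability_to_band():
    stub = StubEstimator(0.12)
    result = RiskModel(stub).score(LoanFeatures(fico=700, dti=0.3, tenure_months=24))
    assert result == RiskScore(pd_12m=0.12, band="medium")
    # The model passes features in a fixed column order.
    assert stub.seen == [[700, 0.3, 24]]


test_score_maps_probability_to_band()
```

No Spark session, no feature store, no trained artifact: the test exercises only the decision logic, which is the part that most needs to be pinned down.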

2. Use cases that orchestrate, ports that abstract

The "score a loan" use case depends on a port — an interface — for fetching features and recording outcomes, not on a concrete feature store SDK. The Snowflake adapter and the offline Parquet adapter both implement that port. Swapping platforms means writing one adapter, not rewriting the use case.

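# use_cases/score_loan.py
from typing import Protocol

from domain.scoring import LoanFeatures, RiskModel, RiskScore


class FeatureSource(Protocol):
    """Port: where features come from (feature store, Parquet file, test fake)."""
    def fetch(self, applicant_id: str) -> LoanFeatures: ...


class DecisionLog(Protocol):
    """Port: where decisions are recorded."""
    def record(self, applicant_id: str, score: RiskScore) -> None: ...


def score_loan(applicant_id: str, features: FeatureSource,
               model: RiskModel, log: DecisionLog) -> RiskScore:
    # Fetch, score, record: orchestration only, no platform code.
    s = model.score(features.fetch(applicant_id))
    log.record(applicant_id, s)
    return s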
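Because the ports are Protocols, any object with matching methods satisfies them; no inheritance is required. The sketch below restates the use case so it runs on its own and wires it to two in-memory adapters. InMemoryFeatureSource, ListDecisionLog, and ThresholdModel are hypothetical names invented for this example, and ThresholdModel is a stand-in for RiskModel so that no trained estimator is needed.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class LoanFeatures:
    fico: int
    dti: float
    tenure_months: int


@dataclass(frozen=True)
class RiskScore:
    pd_12m: float
    band: str


class FeatureSource(Protocol):
    def fetch(self, applicant_id: str) -> LoanFeatures: ...


class DecisionLog(Protocol):
    def record(self, applicant_id: str, score: RiskScore) -> None: ...


def score_loan(applicant_id: str, features: FeatureSource, model, log: DecisionLog) -> RiskScore:
    s = model.score(features.fetch(applicant_id))
    log.record(applicant_id, s)
    return s


class InMemoryFeatureSource:
    """Hypothetical adapter backed by a dict; satisfies FeatureSource structurally."""

    def __init__(self, rows: dict[str, LoanFeatures]):
        self._rows = rows

    def fetch(self, applicant_id: str) -> LoanFeatures:
        return self._rows[applicant_id]


class ListDecisionLog:
    """Hypothetical adapter that appends decisions to a list."""

    def __init__(self):
        self.entries: list[tuple[str, RiskScore]] = []

    def record(self, applicant_id: str, score: RiskScore) -> None:
        self.entries.append((applicant_id, score))


class ThresholdModel:
    """Stand-in for RiskModel with a made-up rule, for illustration only."""

    def score(self, f: LoanFeatures) -> RiskScore:
        p = 0.25 if f.fico < 650 else 0.03
        return RiskScore(pd_12m=p, band="high" if p >= 0.20 else "low")


source = InMemoryFeatureSource({"a-1": LoanFeatures(720, 0.28, 36)})
log = ListDecisionLog()
result = score_loan("a-1", source, ThresholdModel(), log)
print(result)  # RiskScore(pd_12m=0.03, band='low')
```

A Snowflake-backed or Parquet-backed adapter would slot in the same way: a class with a fetch method, passed to the same unchanged score_loan.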

3. The notebook is an adapter, not the system

This is the cultural shift. A notebook is a perfectly good driver for the use-case layer — a great place to explore, visualize, and challenge the model. It is a terrible place for the model itself to live. When the domain and use cases sit in importable modules, the notebook becomes a thin shell that imports them, exactly like the production scorer does. Experimentation and production converge on the same code, which is the only honest definition of reproducibility.
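As a rough illustration of what "thin shell" means, the sketch below separates code that would live in importable modules from the lines that would sit in a notebook cell. Everything here is hypothetical: band_for is a stand-in decision rule, load_cohort is an invented offline adapter that parses CSV text, and the sample rows are made up.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LoanFeatures:
    fico: int
    dti: float
    tenure_months: int


# --- importable module code (would live outside the notebook) ---

def band_for(f: LoanFeatures) -> str:
    """Stand-in decision rule; a real project would call its domain model here."""
    return "high" if f.fico < 650 or f.dti > 0.45 else "low"


def backtest(cohort: list[LoanFeatures]) -> Counter:
    """Use case: tally predicted bands across a cohort."""
    return Counter(band_for(f) for f in cohort)


def load_cohort(csv_text: str) -> list[LoanFeatures]:
    """Hypothetical offline adapter: parses CSV text into feature records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        LoanFeatures(int(r["fico"]), float(r["dti"]), int(r["tenure_months"]))
        for r in reader
    ]


# --- notebook cell: a thin driver with no business logic of its own ---

SAMPLE = "fico,dti,tenure_months\n720,0.28,36\n610,0.50,4\n690,0.47,12\n"
counts = backtest(load_cohort(SAMPLE))
print(dict(counts))  # {'low': 1, 'high': 2}
```

The notebook cell is three lines. Everything it calls can be imported by a batch job or a serving handler without modification, which is what lets experimentation and production converge.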

The objections, answered

"This is over-engineering for a model that might get killed in three months." The three habits above add maybe a day of structure to a project. If the model gets killed, you have lost a day. If it survives without that structure, you pay for a year of refactoring you didn't budget for. The asymmetry is not close.

"Data scientists won't write code this way." They will, when the scaffolding is given to them and the domain layer is obviously theirs to own. What they resist — correctly — is being handed a Java-shaped framework and told to fit their work into it. Clean Architecture is a set of boundaries, not a framework. The boundaries can be as lightweight as a folder structure and a Protocol class.

"Our platform already does this for us." No platform we have ever seen does this for you. Platforms give you the outer ring — the drivers — and assume you have already drawn the inner rings. That assumption is where most ML estates quietly fail.

Why this shows up in our work

When we build with Crosswalk, the Dependency Rule is not a stylistic preference; it is the generation target. Experimentation code gets refactored into a domain layer, use cases, and adapters, because that is the only shape that survives the next platform migration, the next regulator, and the next data scientist who inherits the repo. Uncle Bob wrote the manual years ago, and the dependency inversion idea at its core dates back to the 1990s. We're just applying it to the discipline that needs it most.
