
ml

Python · R · Julia

ML pipelines leak. Yours probably does too.

Seven verbs. Four constraints.
The types make leakage structurally impossible.

Install: pip install mlw (Python) · library(ml) (R) · 2,300+ tests
Epagogy · Independent Research
The solution

scikit-learn: 17 lines · ml: 5 lines

Same task. Same data. Both produce evidence. Only one enforces a structural boundary between validation and test.

ml output (evaluate + assess):
── evaluate(m, s.valid) ──────────
Metric       Score
Accuracy     0.8534
F1           0.8210
AUC          0.9102

── assess(m, test=s.test) ────────
Metric       Score
Accuracy     0.8497
F1           0.8178
AUC          0.9064
⚠ Final verdict. Test set used once.
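The contract behind that verdict can be sketched in plain Python. This is an illustration of the evaluate/assess distinction, not mlw's implementation; the class name and toy accuracy metric are invented for the example.

```python
# Sketch of the evaluate/assess contract: evaluate() is repeatable,
# assess() consumes the test set exactly once per model.
class ModelHandle:
    def __init__(self):
        self._assessed = False

    def evaluate(self, valid_labels):
        # Metrics: cheap, repeatable, the iterate zone.
        return {"accuracy": sum(valid_labels) / len(valid_labels)}

    def assess(self, test_labels):
        # Evidence: a second call has no derivation, so it raises.
        if self._assessed:
            raise RuntimeError("assess() already called; test set is spent")
        self._assessed = True
        return {"accuracy": sum(test_labels) / len(test_labels)}

m = ModelHandle()
m.evaluate([1, 1, 0, 1])   # iterate as often as you like
m.evaluate([1, 0, 1, 1])
m.assess([1, 1, 1, 0])     # final verdict; a repeat call raises
```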
The grammar

Seven primitives. One DAG.

Invalid compositions have no derivation. They are rejected at call time, not caught after the fact.

split() · data
DataFrame → Partition
Partition into train, valid, test. Establishes the assessment boundary.

prepare() · data
DataFrame → PreparedData
Normalize, encode, impute. Per fold, after split. Never on the full dataset.

fit() · iterate
DataFrame × target → Model
Train a model. Handles preparation internally. Seed required.

predict() · iterate
Model × DataFrame → Predictions
Generate predictions. No partition constraint; works on any data.

evaluate() · iterate
Model × DataFrame → Metrics
Measure on validation. Repeatable. The iterate zone.

explain() · iterate
Model → Explanation
Feature importance, partial dependence. Diagnostic, outside the validity chain.

assess() · commit
Model × DataFrame → Evidence
Terminal measurement. Once per model. Returns Evidence, not Metrics.
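How a type can carry the boundary: a minimal pure-Python sketch, assuming nothing about mlw's internals. The wrapper classes and the 80/20 cut are invented for illustration; the point is that a composition with no derivation fails at call time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainSplit:        # only split() produces this
    rows: list

@dataclass(frozen=True)
class TestSplit:         # terminal: only assess() may consume it
    rows: list

@dataclass(frozen=True)
class Model:
    n_rows: int

def split(rows):
    # Establish the boundary: train and test carry different types.
    cut = int(len(rows) * 0.8)
    return TrainSplit(rows[:cut]), TestSplit(rows[cut:])

def fit(data):
    # fit() has no derivation from TestSplit: rejected at call time.
    if not isinstance(data, TrainSplit):
        raise TypeError(f"fit() accepts a TrainSplit, not {type(data).__name__}")
    return Model(n_rows=len(data.rows))

train, test = split(list(range(10)))
model = fit(train)       # valid composition
# fit(test)              # TypeError: invalid composition, no derivation
```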
Constraints

Every framework prevents some leakage. None prevents all of it.

Four rules. Not forty. That's the whole game. For now.

C1 · Assess once per model (d=0.93)
Repeated test-set peeking inflates results. A second assess() call raises.

C2 · Prepare after split, per fold (d≈0)
Effect is negligible, but the constraint is costless and principled.

C3 · Type-safe transitions (d=0.53–1.11)
Fitting on test data, evaluating without a model: invalid compositions have no derivation.

C4 · No label access before split (d=0.93)
Feature selection using test labels inflates. The guard on fit() rejects untagged data.
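C2 is easy to see with numbers. A toy illustration (the values are invented): computing a normalization statistic on the full dataset lets a test-set outlier shift the training features.

```python
# Why prepare() runs after split(): statistics computed on the full
# dataset let test rows influence the training features.
from statistics import mean

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # the outlier lands in the test split
train, test = data[:4], data[4:]

leaky_center = mean(data)    # prepare() before split: test row leaks in
clean_center = mean(train)   # prepare() after split: train rows only

print(leaky_center)  # 22.0
print(clean_center)  # 2.5
```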
Algorithms

Eleven families. Native Rust.

Core algorithms compiled into Python and R via PyO3 and extendr. Same kernel, both languages.

engine="auto" picks the fastest backend. "ml" pins Rust explicitly.
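One plausible shape for that dispatch, sketched as an assumption: only the engine values "auto" and "ml" come from the text; the availability check and the fallback backend name are invented here.

```python
# Hypothetical sketch of engine resolution, not mlw's actual code.
def resolve_engine(engine="auto", rust_available=True):
    if engine == "ml":
        # Pinning Rust fails loudly if the native kernel is missing.
        if not rust_available:
            raise RuntimeError("Rust backend requested but not built")
        return "rust"
    if engine == "auto":
        # Prefer the fastest available backend.
        return "rust" if rust_available else "python"
    raise ValueError(f"unknown engine: {engine!r}")
```

With the defaults this resolves to the Rust kernel; pinning engine="ml" without it raises instead of silently falling back.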

Algorithm            Engine
Random Forest        Rust
Decision Tree        Rust
Extra Trees          Rust
Gradient Boosting    Rust
Ridge / Linear       Rust
Logistic             Rust
Elastic Net          Rust
KNN                  Rust
Naive Bayes          Rust
SVM                  Rust
XGBoost              optional
Get started

Install in 30 seconds.

Python 3.10+, R 4.1+, or Julia 1.9+. Native Rust backend included.

python
# core
pip install mlw

# + XGBoost (recommended)
pip install "mlw[xgboost]"

# everything
pip install "mlw[all]"
r
# requires: remotes + Rust toolchain (rustup.rs)
install.packages("remotes")
remotes::install_github("epagogy/ml", subdir = "ml/r")

# CRAN submission in progress
julia
# requires Julia 1.9+
] add https://github.com/epagogy/ml

# General registry submission in progress
View on GitHub · PyPI · R package

When to stop using ml: when your framework of choice enforces all four constraints natively. I look forward to that day.