
ml

Python · R · Julia

ML pipelines leak. Yours probably does too.

Seven verbs. Four constraints.
The types make leakage structurally impossible.

Install: pip install mlw (Python) · library(ml) (R) · 2,300+ tests
Epagogy · Independent Research
The solution

scikit-learn: 17 lines · ml: 5 lines

Same task. Same data. Both produce evidence. Only one enforces a structural boundary between validation and test.

ml output (evaluate + assess):
── evaluate(m, s.valid) ──────────
Metric       Score
Accuracy     0.8534
F1           0.8210
AUC          0.9102

── assess(m, test=s.test) ────────
Metric       Score
Accuracy     0.8497
F1           0.8178
AUC          0.9064
⚠ Final verdict. Test set used once.
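The contract behind that verdict can be sketched in plain Python. This is an illustration of the evaluate/assess distinction, not mlw's implementation; the class name and toy accuracy metric are invented for the example.

```python
# Sketch of the evaluate/assess contract: evaluate() is repeatable,
# assess() consumes the test set exactly once per model.
class ModelHandle:
    def __init__(self):
        self._assessed = False

    def evaluate(self, valid_labels):
        # Metrics: cheap, repeatable, the iterate zone.
        return {"accuracy": sum(valid_labels) / len(valid_labels)}

    def assess(self, test_labels):
        # Evidence: a second call has no derivation, so it raises.
        if self._assessed:
            raise RuntimeError("assess() already called; test set is spent")
        self._assessed = True
        return {"accuracy": sum(test_labels) / len(test_labels)}

m = ModelHandle()
m.evaluate([1, 1, 0, 1])   # iterate as often as you like
m.evaluate([1, 0, 1, 1])
m.assess([1, 1, 1, 0])     # final verdict; a repeat call raises
```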
The grammar

Seven primitives. One DAG.

Invalid compositions have no derivation. They are rejected at call time, not caught after the fact.

split() · data
DataFrame → Partition
Partition into train, valid, test. Establishes the assessment boundary.

prepare() · data
DataFrame → PreparedData
Normalize, encode, impute. Per fold, after split. Never on the full dataset.

fit() · iterate
DataFrame × target → Model
Train a model. Handles preparation internally. Seed required.

predict() · iterate
Model × DataFrame → Predictions
Generate predictions. No partition constraint; works on any data.

evaluate() · iterate
Model × DataFrame → Metrics
Measure on validation. Repeatable. The iterate zone.

explain() · iterate
Model → Explanation
Feature importance, partial dependence. Diagnostic, outside the validity chain.

assess() · commit
Model × DataFrame → Evidence
Terminal measurement. Once per model. Returns Evidence, not Metrics.
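How a type can carry the boundary: a minimal pure-Python sketch, assuming nothing about mlw's internals. The wrapper classes and the 80/20 cut are invented for illustration; the point is that a composition with no derivation fails at call time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainSplit:        # only split() produces this
    rows: list

@dataclass(frozen=True)
class TestSplit:         # terminal: only assess() may consume it
    rows: list

@dataclass(frozen=True)
class Model:
    n_rows: int

def split(rows):
    # Establish the boundary: train and test carry different types.
    cut = int(len(rows) * 0.8)
    return TrainSplit(rows[:cut]), TestSplit(rows[cut:])

def fit(data):
    # fit() has no derivation from TestSplit: rejected at call time.
    if not isinstance(data, TrainSplit):
        raise TypeError(f"fit() accepts a TrainSplit, not {type(data).__name__}")
    return Model(n_rows=len(data.rows))

train, test = split(list(range(10)))
model = fit(train)       # valid composition
# fit(test)              # TypeError: invalid composition, no derivation
```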
Constraints

Every framework prevents some leakage. None prevents all of it.

Four rules. Not forty. That's the whole game. For now.

C1 · Assess once per model (d=0.93)
Repeated test-set peeking inflates results. A second assess() call raises.

C2 · Prepare after split, per fold (d≈0)
Effect is negligible, but the constraint is costless and principled.

C3 · Type-safe transitions (d=0.53–1.11)
Fitting on test data, evaluating without a model: invalid compositions have no derivation.

C4 · No label access before split (d=0.93)
Feature selection using test labels inflates. The guard on fit() rejects untagged data.
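C2 is easy to see with numbers. A toy illustration (the values are invented): computing a normalization statistic on the full dataset lets a test-set outlier shift the training features.

```python
# Why prepare() runs after split(): statistics computed on the full
# dataset let test rows influence the training features.
from statistics import mean

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # the outlier lands in the test split
train, test = data[:4], data[4:]

leaky_center = mean(data)    # prepare() before split: test row leaks in
clean_center = mean(train)   # prepare() after split: train rows only

print(leaky_center)  # 22.0
print(clean_center)  # 2.5
```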
Algorithms

Eleven families. Native Rust.

Core algorithms compiled into Python and R via PyO3 and extendr. Same kernel, both languages.

engine="auto" picks the fastest backend. "ml" pins Rust explicitly.
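One plausible shape for that dispatch, sketched as an assumption: only the engine values "auto" and "ml" come from the text; the availability check and the fallback backend name are invented here.

```python
# Hypothetical sketch of engine resolution, not mlw's actual code.
def resolve_engine(engine="auto", rust_available=True):
    if engine == "ml":
        # Pinning Rust fails loudly if the native kernel is missing.
        if not rust_available:
            raise RuntimeError("Rust backend requested but not built")
        return "rust"
    if engine == "auto":
        # Prefer the fastest available backend.
        return "rust" if rust_available else "python"
    raise ValueError(f"unknown engine: {engine!r}")
```

With the defaults this resolves to the Rust kernel; pinning engine="ml" without it raises instead of silently falling back.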

Algorithm            Engine
Random Forest        Rust
Decision Tree        Rust
Extra Trees          Rust
Gradient Boosting    Rust
Ridge / Linear       Rust
Logistic             Rust
Elastic Net          Rust
KNN                  Rust
Naive Bayes          Rust
SVM                  Rust
XGBoost              optional
Get started

Install in 30 seconds.

Python 3.10+, R 4.1+, or Julia 1.9+. Native Rust backend included.

python
# core
pip install mlw

# + XGBoost (recommended)
pip install "mlw[xgboost]"

# everything
pip install "mlw[all]"
r
# requires: remotes + Rust toolchain (rustup.rs)
install.packages("remotes")
remotes::install_github("epagogy/ml", subdir = "ml/r")

# CRAN submission in progress
julia
# requires Julia 1.9+
] add https://github.com/epagogy/ml

# General registry submission in progress
View on GitHub · PyPI · R package

When to stop using ml: when your framework of choice enforces all four constraints natively. I look forward to that day.