ml
Python · R · Julia

ML pipelines leak. Yours probably does too.
Seven verbs. Four constraints. The types make leakage structurally impossible.

Install: pip install mlw (Python) · library(ml) (R) · 2,300+ tests
Epagogy · Independent research
The solution

Same task. Same data. Both produce evidence. Only one enforces a structural boundary between validation and test. scikit-learn needs 17 lines; ml needs 5.

Output of evaluate + assess:
── evaluate(m, s.valid) ──────────
Metric      Score
Accuracy    0.8534
F1          0.8210
AUC         0.9102

── assess(m, test=s.test) ────────
Metric      Score
Accuracy    0.8497
F1          0.8178
AUC         0.9064
⚠ Final verdict. Test set used once.

The grammar
Seven primitives. One DAG.
Invalid compositions have no derivation. They are rejected at call time, not caught after the fact.
| Verb | Signature | Zone | Description |
|---|---|---|---|
| split() | DataFrame → Partition | data | Partition into train, valid, test. Establishes the assessment boundary. |
| prepare() | DataFrame → PreparedData | data | Normalize, encode, impute. Per fold, after split. Never on the full dataset. |
| fit() | DataFrame × target → Model | iterate | Train a model. Handles preparation internally. Seed required. |
| predict() | Model × DataFrame → Predictions | iterate | Generate predictions. No partition constraint; works on any data. |
| evaluate() | Model × DataFrame → Metrics | iterate | Measure on validation. Repeatable. |
| explain() | Model → Explanation | iterate | Feature importance, partial dependence. Diagnostic, outside the validity chain. |
| assess() | Model × DataFrame → Evidence | commit | Terminal measurement. Once per model. Returns Evidence, not Metrics. |
Constraints
Every framework prevents some leakage. None prevented all of it.
Four rules. Not forty. That's the whole game. For now.
C1
Assess once per model · d = 0.93
Repeated test-set peeking inflates results. A second assess() call raises.

C2
Prepare after split, per fold · d ≈ 0
The measured effect is negligible, but the constraint is costless and principled.
C3
Type-safe transitions · d = 0.53–1.11
Fitting on test data, or evaluating without a model: such invalid compositions have no derivation.
C4
No label access before split · d = 0.93
Feature selection using test labels inflates results. The guard on fit() rejects untagged data.

Algorithms
Eleven families. Native Rust.
Core algorithms compiled into Python and R via PyO3 and extendr. Same kernel, both languages.
`engine="auto"` picks the fastest available backend; `engine="ml"` pins the Rust kernel explicitly.
| Algorithm | Engine | Classification | Regression |
|---|---|---|---|
| Random Forest | Rust | ✓ | ✓ |
| Decision Tree | Rust | ✓ | ✓ |
| Extra Trees | Rust | ✓ | ✓ |
| Gradient Boosting | Rust | ✓ | ✓ |
| Ridge / Linear | Rust | — | ✓ |
| Logistic | Rust | ✓ | — |
| Elastic Net | Rust | — | ✓ |
| KNN | Rust | ✓ | ✓ |
| Naive Bayes | Rust | ✓ | — |
| SVM | Rust | ✓ | ✓ |
| XGBoost | optional | ✓ | ✓ |
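One way the engine resolution described above could behave is sketched below. This is a hypothetical helper, not the library's actual selection logic; only the backend names ("rust", "xgboost") and the "auto"/"ml" behavior are taken from the text and table above.

```python
def resolve_engine(engine, installed):
    """Map a requested engine string to a concrete backend name.

    `installed` is the set of optional backends available at runtime;
    the native Rust kernel is assumed to always be present.
    """
    if engine == "ml":                       # pin the native Rust kernel
        return "rust"
    if engine == "auto":                     # prefer the fastest available
        for candidate in ("rust", "xgboost"):
            if candidate == "rust" or candidate in installed:
                return candidate
        raise RuntimeError("no backend available")
    if engine == "rust" or engine in installed:
        return engine
    raise ValueError(f"unknown or missing engine: {engine!r}")

print(resolve_engine("auto", {"xgboost"}))   # → rust (native kernel wins)
print(resolve_engine("ml", set()))           # → rust (explicit pin)
```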
Get started
Install in 30 seconds.
Python 3.10+, R 4.1+, or Julia 1.9+. Native Rust backend included.
Python:

```shell
# core
pip install mlw
# + XGBoost (recommended)
pip install "mlw[xgboost]"
# everything
pip install "mlw[all]"
```

R:

```r
# requires: remotes + Rust toolchain (rustup.rs)
install.packages("remotes")
remotes::install_github("epagogy/ml", subdir = "ml/r")
# CRAN submission in progress
```

Julia:

```julia
# requires Julia 1.9+
] add https://github.com/epagogy/ml
# General registry submission in progress
```
When to stop using ml: when your framework of choice enforces all four constraints natively. I look forward to that day.