Predictive Analytics for Elite Military Selection
Applying sabermetrics-inspired machine learning to optimize recruitment for one of the U.S. military's most selective special operations units - achieving 90%+ prediction accuracy and delivering executive-ready intelligence to leadership at every level.
Major League Hacking x DoD Fellowship
A competitive fellowship backed by the U.S. Department of Defense. The project began as a team effort, but when teammates dropped out, the entire scope was absorbed and delivered solo - data analysis, model development, production deployment, and executive dashboards - end-to-end, without a gap.
Project Overview
U.S. Department of Defense - 75th Ranger Regiment
The 75th Ranger Regiment is an elite special operations force with some of the most rigorous selection standards in the U.S. military. Their Ranger Assessment and Selection Program (RASP) evaluates candidates across mental, physical, and combat skill dimensions. With significant resources invested in each recruit's training pipeline, improving selection accuracy has a direct impact on mission readiness and operational effectiveness - making this exactly the kind of problem that benefits from applied machine learning.
The Challenge
Pipeline Attrition Cost
Significant resources were invested in candidates who ultimately didn't complete the program. Without early predictive signals, training investment was spread uniformly regardless of each recruit's actual likelihood of success.
Hidden Success Factors
Traditional evaluation methods captured obvious surface metrics, but the true determinants of who graduates - psychological resilience, injury patterns, training environment response - weren't quantified or well understood.
Intelligence Gap for Leadership
Raw recruit data existed but wasn't translated into anything decision-makers could act on. The gap between data and decisions needed bridging - for both technical analysts and non-technical military leadership.
Scale and Reproducibility
With thousands of recruits flowing through the pipeline annually, any solution needed to work at scale, be reproducible across cohorts, and remain interpretable enough that findings could actually change practice.
Approach
The project drew inspiration from sabermetrics - the statistical revolution that transformed baseball by looking past surface-level stats to find the true predictors of performance. The same logic applied here: not "who passes RASP?" but "what factors actually determine who graduates?"
The Sabermetrics Reframe
Just as Billy Beane's Oakland A's exposed that batting averages missed what actually wins games, the question here was what conventional selection metrics were missing. Reframing the problem - focusing on the true underlying predictors rather than surface-level evaluation scores - shaped every modeling decision.
Multi-Architecture Evaluation
No single algorithm captures the full picture. H2O AutoML was used to systematically compare neural networks, gradient boosting, random forests, and ensemble methods across 3,879 recruit profiles - selecting and combining the strongest performers for both accuracy and interpretability.
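A minimal sketch of what that comparison could look like with H2O's Python API, assuming the recruit profiles live in a flat file with a binary graduation label. File and column names here are illustrative, not the project's actual (and presumably restricted) schema:

```python
# Hypothetical H2O AutoML comparison across model families.
# "recruit_profiles.csv" and the "graduated" column are assumptions.
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Load recruit profiles; H2O infers column types from the data.
recruits = h2o.import_file("recruit_profiles.csv")
recruits["graduated"] = recruits["graduated"].asfactor()  # classification target

train, test = recruits.split_frame(ratios=[0.8], seed=42)

# Let AutoML search GBMs, random forests, deep learning, GLMs, and
# stacked ensembles, keeping the leaderboard for comparison.
aml = H2OAutoML(max_models=20, seed=42, sort_metric="AUC")
aml.train(y="graduated", training_frame=train)

print(aml.leaderboard.head())  # strongest architectures, ranked
```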
Intelligence That Leaders Use
90%+ accurate models are worthless if decision-makers can't act on them. Every model output was translated into plain-language findings and Google Data Studio dashboards designed for non-technical military leadership - not data scientists.
What Was Built
Predictive Models
Neural network and ensemble models trained on 3,879 recruit profiles, achieving 90%+ accuracy in predicting graduate success. Multiple architectures were evaluated and combined for optimal performance.
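An accuracy claim like this is only meaningful against held-out data. A hedged sketch of checking the leading model on the test split from the training sketch above:

```python
# Evaluate the AutoML leader on the held-out cohort (continuing the
# variables from the previous sketch).
perf = aml.leader.model_performance(test_data=test)
print(perf.auc())               # ranking quality across thresholds
print(perf.accuracy())          # [threshold, accuracy] at the best threshold
print(perf.confusion_matrix())  # where the model's errors fall
```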
High-Potential Identification
Algorithms designed to flag high-potential recruits early in the RASP 1 pipeline, enabling targeted investment of training resources where they deliver the greatest return on readiness.
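A sketch of that ranking step, continuing the variables above; the cohort file is hypothetical, and the `p1` probability column follows H2O's standard binomial predict() output:

```python
# Illustrative only: rank an incoming cohort by predicted graduation
# probability so training investment can be prioritized early.
cohort = h2o.import_file("new_cohort.csv")  # assumed file name
preds = aml.leader.predict(cohort)

# H2O returns per-class probability columns; "p1" is P(graduated = 1).
ranked = cohort.cbind(preds["p1"]).sort(by="p1", ascending=[False])
print(ranked.head(10))  # top candidates for early targeted investment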
Executive Dashboards
Google Data Studio dashboards with plain-language executive summaries, translating complex model outputs into clear, actionable visualizations that non-technical leadership could interpret and act on immediately.
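The document doesn't specify the hand-off mechanism, but one plausible bridge to Google Data Studio (now Looker Studio) is a flattened export - a CSV, Google Sheet, or BigQuery table - that the dashboard uses as its data source:

```python
# Hypothetical hand-off: flatten model outputs to a CSV a Data Studio
# data source can ingest. File name and columns are illustrative.
report = ranked.as_data_frame()  # H2OFrame -> pandas DataFrame
report.to_csv("rasp_predictions_dashboard.csv", index=False)
```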
Technologies Used
H2O AutoML · Neural Networks · Gradient Boosting · Random Forests · Ensemble Methods · Google Data Studio
Value Delivered
Before this work, selection decisions relied on conventional metrics and experienced judgment, without quantified insight into which factors actually predicted program completion. Training resources were invested uniformly across all recruits, with no early signal of likely success.
After it, machine learning models surfaced the true predictors of graduation and ranked recruits by success likelihood - delivered through dashboards that non-technical leadership could read and act on without a statistics background.
Injury Prevention Signals
Analysis revealed injury patterns as a significant predictor of attrition - surfacing targeted opportunities to adjust physical training protocols before attrition occurs rather than after.
Mental Resilience Quantified
Psychological assessment scores emerged as key predictors of graduation - validating and quantifying what experienced instructors long suspected: mental resilience matters more than raw physical performance metrics.
Training Environment Design
The data showed that different recruit profiles respond differently to competitive vs. collaborative training environments - informing more nuanced training design and resource allocation decisions.
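Predictor rankings like the three findings above typically come out of variable-importance analysis. A hedged sketch, assuming a recent H2O version: stacked-ensemble leaders don't expose variable importance directly, so this pulls it from the strongest single tree-based model on the leaderboard.

```python
# Surface which features (e.g., injury history, psychological scores,
# training environment) drive predictions, using the best GBM found.
gbm = aml.get_best_model(algorithm="gbm")
importance = gbm.varimp(use_pandas=True)
print(importance.head(10))  # top features by relative importance
```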