Max Kuhn - The Post-Modeling Model to Fix the Model
I recently watched Max Kuhn’s presentation on The Post-Modeling Model to Fix the Model. Kuhn emphasizes the limitations of traditional modeling approaches in machine learning, especially when it comes to interpretability and model evaluation.
Kuhn highlights two core issues with typical modeling practices. The first is the “overfitting” problem, where a model performs well on training data but poorly on unseen data. The second is “model selection” bias, where models are chosen on performance metrics that do not translate into real-world applicability.
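To make the overfitting problem concrete, here is a small R sketch of my own (not from the talk) in which a deliberately over-grown decision tree memorizes simulated noise: it looks nearly perfect on the training data yet performs no better than chance on held-out data.

```r
library(rpart)

set.seed(123)
n <- 500
dat <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  y  = factor(rbinom(n, 1, 0.5))  # outcome is pure noise, unrelated to x1/x2
)
train <- dat[1:350, ]
test  <- dat[351:500, ]

# Grow a deliberately over-complex tree that can memorize the training set
fit <- rpart(y ~ ., data = train,
             control = rpart.control(cp = 0, minsplit = 2))

mean(predict(fit, train, type = "class") == train$y)  # close to 1: looks excellent
mean(predict(fit, test,  type = "class") == test$y)   # near 0.5: chance on unseen data
```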
Quantitative Perspective on Model Evaluation
Kuhn introduces a metric called the “cross-validated AUC,” a robust way to gauge model performance that provides a more nuanced picture than simple accuracy. In practice, accuracy can be misleading; for example, in an imbalanced dataset where 95% of observations belong to one class, a naive model that always predicts the majority class achieves 95% accuracy while failing to recognize the minority class entirely. AUC, by contrast, accounts for both the true positive rate and the false positive rate, providing a more comprehensive assessment across different threshold settings.
He provides an example where two models both score an AUC of 0.85. At a typical operating threshold, however, one model correctly classifies 70% of the positives with a 30% false positive rate, whereas the other catches only 50% of the positives but with a 10% false positive rate. Relying on AUC alone could lead to choosing a suboptimal model for the application, without considering the underlying distribution of the data or the cost of each type of error.
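As a rough illustration of both points (my own simulated sketch using the yardstick package from tidymodels, not code from the talk), the snippet below scores a naive majority-class predictor on a 95/5 imbalanced outcome. Accuracy looks strong, while AUC and sensitivity reveal that the minority class is never detected; checking sensitivity and specificity at the deployed threshold is also what distinguishes two models that happen to share an AUC.

```r
library(yardstick)
library(tibble)

set.seed(123)
n <- 1000

# 95/5 imbalanced outcome; "yes" (the event) is the first factor level
truth <- factor(sample(c("yes", "no"), n, replace = TRUE, prob = c(0.05, 0.95)),
                levels = c("yes", "no"))

# A naive model that always predicts the majority class
naive <- tibble(
  truth       = truth,
  .pred_yes   = rep(0, n),  # predicted probability of the event is always 0
  .pred_class = factor(rep("no", n), levels = c("yes", "no"))
)

accuracy(naive, truth, .pred_class)  # ~0.95, despite missing every positive case
roc_auc(naive, truth, .pred_yes)     # 0.5: no ability to rank positives above negatives
sens(naive, truth, .pred_class)      # 0: the minority class is never detected
spec(naive, truth, .pred_class)      # 1: trivially perfect on the majority class
```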
The Shift to Post-Modeling Practices
Kuhn advocates a post-modeling approach which involves:
- Model Validation vs. Model Verification: He stresses that model validation should focus on confirming that the model meets the requirements of the business case rather than just statistical performance. For instance, what is the cost of a false negative in a credit scoring model? Answering that requires a precise understanding of the business context rather than raw statistics alone.
- Ensuring Reproducibility: Many models lack transparency. He questions how often models are rerun with the same seed values and data splits, rather than evaluated against the full data landscape. Are you using `set.seed(123)` routinely? Reproducing results in complex models helps avoid pseudoscientific claims of performance (a minimal sketch of a seeded, reproducible setup follows this list).
- Incorporating Human Context: The effectiveness of a model is contingent on how humans use it. Model drift can occur over time as input data changes, yet many practitioners neglect routine checks against real-world outcomes. For example, a model predicting customer churn may become obsolete if a company changes its customer engagement strategies without periodic recalibration.
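As referenced above, here is a minimal sketch of a seeded, reproducible resampling setup using rsample (part of tidymodels). The simulated credit data set and its default column are placeholders I invented for illustration, not anything shown in the talk.

```r
library(rsample)

set.seed(123)  # fixing the seed makes the split and folds below reproducible

# Simulated stand-in for a credit scoring data set (purely illustrative)
n <- 2000
credit <- data.frame(
  income  = rlnorm(n, meanlog = 10),
  age     = sample(21:75, n, replace = TRUE),
  default = factor(sample(c("yes", "no"), n, replace = TRUE, prob = c(0.1, 0.9)))
)

split <- initial_split(credit, prop = 0.8, strata = default)
train <- training(split)
test  <- testing(split)

# Resampling folds created once and reused for every performance estimate
folds <- vfold_cv(train, v = 10, strata = default)
```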
Future Directions with Robust Frameworks
Kuhn then presents robust frameworks for R such as “recipes” and “tidymodels.” These frameworks help create repeatable modeling processes that encapsulate every stage, from pre-processing to final evaluation, in a succinct way.
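To give a sense of what such an encapsulated pipeline can look like, here is a brief sketch using recipes, parsnip, workflows, and tune (all loaded via tidymodels). It reuses the simulated credit data and folds from the previous snippet and is only an outline of the idea, not code from the presentation.

```r
library(tidymodels)

# Pre-processing steps are declared once, inside a recipe
rec <- recipe(default ~ income + age, data = train) %>%
  step_normalize(all_numeric_predictors())

# A simple logistic regression specification
spec <- logistic_reg() %>%
  set_engine("glm")

# The workflow bundles pre-processing and the model into one object
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(spec)

# Evaluate with the same cross-validation folds defined earlier
res <- fit_resamples(
  wf,
  resamples = folds,
  metrics   = metric_set(roc_auc, accuracy)
)

collect_metrics(res)  # cross-validated AUC and accuracy for the whole pipeline
```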
He concludes with the notion that practitioners should embrace a cycle of constant feedback, involving stakeholders throughout the modeling phase. This encourages adjustments based on real-world applicability and opens avenues for further data collection to close any existing gaps.
Kuhn's talk prompts a reexamination of our conventional methodologies in data science, urging models that are actionable and integrated with human decision-making rather than pure statistical artifacts. Understanding the subtleties behind metrics and their implications can lead to far better models with enduring utility.