Curated Data Science by Rahul

Leveraging Plumber in Excel for Predictive Modeling in Energy

Ryan Klein, a data scientist at Continental Resources, delivered a practical talk on integrating R’s Plumber package with Excel to enhance data-driven decision-making in the energy sector. The implications of such a solution extend well beyond the realm of Excel, yet the choice of interface is significant for user adoption.

Problem Statement

Continental Resources operates approximately 6,000 wells, each requiring an upfront average investment of around $8 million. This staggering figure underscores the need for predictive models capable of determining the viability of these investments based on varying parameters. Klein highlighted essential aspects that influence well performance, such as geological factors and operating parameters. The challenge arises from the necessity of capturing and adjusting up to 50 operational inputs for accurate predictions.

Current Practices

Initially, engineers used a Shiny R tool that required significant manual input, deriving parameters from existing wells. Despite comprising a rich set of functionalities, users reverted to Excel for final output manipulation and financial analyses. The conversation revealed a critical insight: building complex analytics within a tool that users don’t prefer hampers ROI and can even lead to misinformed decisions.

Transition to Plumber

Klein’s solution pivoted on the need to meet users where they work—Excel. He proposed a system whereby R’s Plumber could interact directly with Excel, enabling users to utilize existing skills while harnessing deeper analytical capabilities within R. This integration allows for real-time modifications to input parameters, generating outputs that include predictions and an outlier score.

Technical Implementation

  1. Model Development: Klein shared the application of Random Forest regression on a 2012-2013 housing dataset in Taiwan as an example. While not relevant to energy, it illustrated the model training and saving process necessary for Plumber API deployment.
  2. Outlier Analysis: An Isolation Forest contributed to outlier score calculation. Klein calculated the trustworthiness of the predictions by normalizing scores between 0 and 1, whereby values closer to 1 indicated extreme outlier conditions—crucial for ensuring the integrity of decisions made on predicted outcomes.

API with Plumber

Klein described the creation of a Plumber API that achieved the following:

Excel Integration

The interface between R and Excel features variable-driven cells, making it user-friendly. Solution users can input parameters like well characteristics and immediately see changes in the predictions. Klein reported that the system could handle extensive runs—running 100,000 iterations, albeit potentially taking hours, which remains feasible.

Significant Findings and Pitfalls

  1. Character Limit: Klein noted Power Query’s character limit (approximately 2,000 characters) necessitated batch processing for API calls, implemented through additional VBA scripts.
  2. Port Management: Each API run assigns a dynamic port. Establishing a static port connection is vital to maintain stability.

Conclusion

Klein’s integration of the Plumber package into Excel demonstrates a valuable merging of technical prowess with practical user behavior. Data-centric applications often falter due to interface misalignment with user preferences. This solution provides the analytical strength of R while preserving Excel as the primary operational tool for engineers. By enabling quick simulation of various scenarios, the model amplifies collective insights while reducing the risk associated with financial commitments in oil and gas ventures.

For those interested in the finer details, you can watch the full talk here.