Curated Data Science by Rahul

How to Interrogate Data Like a Journalist

The recent talk by Molly Huie and Andrew Wallender at a Bloomberg conference offered actionable insights into data journalism, where data analysis meets effective storytelling. The full video of the talk is available online. Below is a breakdown of their methodology and the points that stand out.

Gathering Data: Internal vs. External

Molly’s team primarily works with internal data sources, such as the Bloomberg terminal and federal dockets, which provide a rich foundation. This internal data generally includes:

In contrast, Wallender discussed the challenges of external data, emphasizing the need for vigilance when vetting data from third-party sources such as government or NGO reports. Journalists often cannot collect or construct their own datasets, so they must rely on whatever data is available, which may be incomplete or inaccurate.

Vetting Data: The Importance of Validation

Wallender emphasized three critical steps for vetting data:

  1. Understand the Format: Go beyond checking numerical ranges. Knowing whether figures are expressed in hundreds, thousands, millions, or billions is crucial.
  2. Conduct Spot Checks: Randomly sample the data to catch anomalies or outliers. Small datasets, such as those from surveys, may warrant manual inspection to ensure accuracy.
  3. Make No Assumptions: Wallender recounted an instance in which HR survey respondents inflated budget numbers, creating discrepancies in the analysis. Always cross-verify self-reported data.
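The three checks above translate naturally into small helper functions. The Python sketch below is a minimal illustration under stated assumptions, not the method Wallender described: the function names, the 100x scale threshold, and the 10% tolerance are all hypothetical choices made for this example.

```python
import random
import statistics


def spot_check(rows, k, seed=0):
    """Draw a reproducible random sample of rows for manual review."""
    rng = random.Random(seed)  # fixed seed so a colleague can re-draw the same sample
    return rng.sample(rows, min(k, len(rows)))


def flag_scale_outliers(values, factor=100):
    """Return indices of values more than `factor` times larger or smaller than the median.

    A value roughly 1,000x off from its peers often means it was reported in a
    different unit (e.g., thousands instead of dollars). Zeros are skipped.
    """
    nonzero = [abs(v) for v in values if v != 0]
    if not nonzero:
        return []
    med = statistics.median(nonzero)
    return [
        i for i, v in enumerate(values)
        if v != 0 and (abs(v) / med > factor or med / abs(v) > factor)
    ]


def cross_verify(self_reported, reference, tolerance=0.10):
    """Flag self-reported figures that disagree with an independent reference.

    Returns {key: reason} for keys whose reported value differs from the
    reference by more than `tolerance` (a fraction), or that have no reference.
    """
    flagged = {}
    for key, reported in self_reported.items():
        if key not in reference:
            flagged[key] = "no reference value"
        elif abs(reported - reference[key]) > tolerance * abs(reference[key]):
            flagged[key] = f"reported {reported}, reference {reference[key]}"
    return flagged
```

For example, with hypothetical budgets [120_000, 95_000, 110_000, 130, 105_000], flag_scale_outliers returns [3]: the median is 105,000, and 130 is more than 100 times smaller, which suggests that entry was reported in thousands. The thresholds are deliberately crude; they surface rows for a human to check rather than deciding what is wrong.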

Writing About Data: Clarity is Key

With data vetted, Molly and Andrew turned to the writing process. They offered guidelines for effectively communicating complex information to a non-technical audience:

They presented a case study on how California distributes sales tax revenue, explaining that local sales taxes are based on where a transaction takes place rather than on the buyer’s location. The example shows the value of pairing dense analytical points with relatable, real-world context: making the information approachable directly improves comprehension.
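The allocation rule in the California example can be expressed in a few lines of Python. This is a simplified sketch: the city names, tax rates, and transactions are hypothetical, and real sales tax allocation involves more rules than this models. It credits each sale's local tax to the city where the transaction took place, with a buyer-location counterfactual for comparison.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical local rates for two fictional cities; not real California rates.
RATES = {"Springfield": Decimal("0.010"), "Shelbyville": Decimal("0.020")}

# Hypothetical transactions: where the buyer is located vs. where the sale happened.
SALES = [
    {"amount": Decimal("100.00"), "buyer_city": "Springfield", "sale_city": "Shelbyville"},
    {"amount": Decimal("50.00"), "buyer_city": "Springfield", "sale_city": "Shelbyville"},
    {"amount": Decimal("80.00"), "buyer_city": "Shelbyville", "sale_city": "Springfield"},
]


def local_revenue(transactions, rates, by="sale_city"):
    """Total local sales tax credited to each city.

    by="sale_city" follows the transaction-location rule from the talk;
    by="buyer_city" is a counterfactual that credits the buyer's city instead.
    Each sale is taxed at the rate of whichever city is credited.
    """
    revenue = defaultdict(Decimal)
    for t in transactions:
        city = t[by]
        # Round each sale's tax to cents before summing.
        tax = (t["amount"] * rates[city]).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        revenue[city] += tax
    return dict(revenue)


if __name__ == "__main__":
    print("By sale location: ", local_revenue(SALES, RATES))
    print("By buyer location:", local_revenue(SALES, RATES, by="buyer_city"))
```

Under the transaction-location rule, Shelbyville collects 3.00 from two purchases made by Springfield buyers; under the counterfactual, Springfield would collect 1.50 from those same purchases. Spelling out a gap like this is one way to make a dense allocation rule concrete for readers.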

Visualizing Data: Enhancing Understanding

Visual aids can condense complex data into digestible formats. Key takeaways include:

Wallender highlighted the mantra from their graphic designer: “Don’t be wrong; don’t be confusing.” Effective visuals turn complex narratives into clear stories.

The Wish List for Data Producers

Wallender shared a wish list of what journalists who analyze or report on data need from data producers:

  1. Data Dictionary: A detailed explanation of data fields enhances transparency.
  2. Point of Contact: Having someone accessible for clarifying data collection methods can save time and avoid misinterpretation.
  3. Transparent Updates: Clearly state when data was last updated to allow for timely analysis.
  4. Downloadable Access: Data locked behind visualization dashboards can hinder usability. Open access encourages broader analysis and collaboration.
  5. Tidy Spreadsheets: Cluttered data structures complicate analysis. Several clean datasets, each with distinct fields, are preferable to a single, unwieldy spreadsheet.
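Two items on the wish list, a data dictionary and a clear last-updated date, are what make automated checks possible on the journalist's side. The sketch below assumes a hypothetical CSV, a hypothetical dictionary format (column name mapped to expected type and description), and an arbitrary one-year staleness threshold; it validates a file against its documented columns and types and flags data that may be out of date.

```python
import csv
import io
from datetime import date

# Hypothetical data dictionary: column name -> (expected type, description).
DATA_DICTIONARY = {
    "county": (str, "County name"),
    "year": (int, "Calendar year covered by the row"),
    "spending_thousands": (float, "Spending, in thousands of US dollars"),
}


def validate(csv_text, dictionary):
    """Check CSV text against a data dictionary and return a list of problems."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    problems = [f"missing column: {c}" for c in sorted(set(dictionary) - header)]
    problems += [f"undocumented column: {c}" for c in sorted(header - set(dictionary))]
    for row_no, row in enumerate(reader, start=1):  # counts data rows, not the header
        for field, (expected_type, _description) in dictionary.items():
            if field not in header:
                continue  # already reported as a missing column
            try:
                expected_type(row[field])
            except (TypeError, ValueError):
                problems.append(
                    f"data row {row_no}: {field}={row[field]!r} "
                    f"is not {expected_type.__name__}"
                )
    return problems


def is_stale(last_updated, today=None, max_age_days=365):
    """Return True if the published last-updated date is older than max_age_days."""
    today = today or date.today()
    return (today - last_updated).days > max_age_days
```

Note that the unit ("in thousands") lives in the dictionary's description, which ties back to the first vetting step: without documentation, nothing in the file itself says what scale the numbers are on.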

In summary, the session emphasized the intricate relationship between data collection and storytelling. By adhering to a structured approach—from data gathering to writing and visualization—data journalists can effectively bridge the gap between technical data and public understanding.