Data Warehouse Advisory Group: 3-19-26

Data Warehouse Advisory Group: 3-19-26


Meeting Details

Meeting Date:

Mar 19, 2026

Purpose:

Data Warehouse Advisory Group

Participants:

Mark Cohen, Steve Klein, Ben Moore, Erik Cooper, Layheng Ting , Pam Mery, Christopher Blackmore, Denice Inciong, Jason Makabali, Jeanae Releford, Kai Yun Pekarsky, Tim Flanagan, Vinod Verma, Virginia/Ginny Moran, Matt Hurley, Jacob Kevari, Eric Houck, Amber Hroch, Jack Thompson, Elaine Kuo, Gayle Pitman, Gene Tjoa

Agenda

Item

Item

1

Redshift Query Best Practices Guide

  • Frequent issue: users submit queries that fail, timeout, or run extremely slowly.

  • Root causes:

    • Overly large, complex single queries

    • Inefficient constructs (e.g., excessive DISTINCT, subqueries, IN clauses)

    • Poor structuring leading to memory limits in Redshift

  • Key optimization strategies discussed:

    • Break queries into stages (use temp tables / CTEs)

    • Prefer CTEs over subqueries for readability/debugging

    • Use EXISTS instead of IN where appropriate

    • Apply filters early to reduce data volume

    • Avoid heavy operations inside aggregates (e.g., CASE in aggregates)

    • Use EXPLAIN to evaluate execution plans

Key Requests

  • Develop a formal best practices guide for writing efficient Redshift queries

    • Include:

      • Query structuring techniques (e.g., staging, temp tables, CTEs)

      • Performance optimization tips (e.g., avoiding costly operations like DISTINCT, using EXISTS)

      • Guidance on using Redshift tools (e.g., EXPLAIN)

      • Clear rationale for each recommendation

      • Real-world examples of inefficient vs. optimized queries

    • Provide guidance tailored to columnar databases vs. traditional relational databases

Action Items

  • Draft a Redshift query best practices document

  • Incorporate feedback and examples from team members

  • Distribute draft prior to next meeting

  • Use next meeting to review and refine the document collaboratively

2

Data Warehouse Documentation Gaps

  • Current documentation includes:

    • Data dictionaries, user guides, and access instructions

  • Identified gaps / improvement areas:

    • Data profiling insights:

      • Column population levels (null/sparse data)

      • Sample values

      • Historical population trends

    • Anomaly detection & alerts for data issues

    • Use-case playbooks (e.g., retention, success metrics)

    • Pre-built/sample queries for common analyses

    • Central query repository (e.g., GitHub-style sharing)

    • Training resources:

      • Beginner onboarding (how to access/query)

      • Structured learning pathways

      • Workshops / office hours

  • Key challenge:

    • Research use cases are highly variable, making standardization difficult

  • Opportunity:

    • Improve adoption and awareness of the data warehouse across users

Key Requests

  • Expand documentation beyond data dictionaries to include:

    • Data profiling information:

      • Column completeness (null/sparsity)

      • Sample values

      • Historical data availability trends

    • Common use-case playbooks:

      • Metrics such as retention, success, transfer, etc.

    • Pre-built/sample queries for common analyses

    • A shared repository (e.g., GitHub-style) for queries and scripts

    • Documentation of data nuances and known anomalies

  • Improve training and onboarding resources:

    • Beginner guides for accessing and querying the warehouse

    • Structured learning pathways

    • Increased outreach to raise awareness and adoption

    • Potential recurring support sessions (office hours, webinars)

Action Items

  • Begin data profiling initiative to enhance documentation with:

    • Column usage insights

    • Data quality indicators

  • Collect and prioritize high-value use cases for documentation (e.g., retention metrics)

  • Explore feasibility of:

    • Central query repository (crowdsourced or curated)

    • Pre-built query library

  • Gather additional feedback from users on documentation gaps

  • Evaluate and expand training opportunities:

    • Introductory sessions

    • Regular office hours or workshops

  • Continue improving and maintaining the documentation site

3

NORC Data Enclave / Secure Data Access Model

  • Discussion: adopting a secure data enclave (“clean room”) model for sensitive data

  • Key features:

    • Role-based, highly controlled access

    • Restrictions on downloading/copying data

    • Monitoring and auditing of user activity

  • Benefits:

    • Enhances data security and compliance

    • Reduces risk and user anxiety

    • Enables safer data sharing with external partners

  • Trade-offs:

    • Potential usability constraints

    • Implementation complexity and cost

  • Related approaches discussed:

    • Virtual machines with restricted access

    • Controlled environments with limited export capabilities

Key Requests

  • Investigate a secure data enclave model for sensitive data access

  • Evaluate capabilities such as:

    • Role-based access controls

    • Restrictions on downloading/exporting data

    • Activity monitoring and auditing

  • Assess how this model could:

    • Enable safer data sharing with external partners

    • Reduce risk and improve compliance

    • Support institutions with strict security requirements

Action Items

  • Conduct further research on data enclave / clean room technologies

  • Review existing implementations (e.g., AWS clean rooms, government/education models)

  • Evaluate:

    • Technical feasibility

    • Costs (infrastructure, licensing, staffing)

    • Impact on usability and workflows

  • Identify potential use cases and partners who would benefit

  • Continue internal discussion and gather stakeholder feedback

4

Overall Action Summary

  • Draft and review Redshift best practices guide

  • Enhance documentation with data profiling, use cases, and query examples

  • Explore training and adoption strategies

  • Research and assess secure data enclave solutions

Issues/Questions Resolved

Issue/Question

Resolution/Answer

Issue/Question

Resolution/Answer

1

 

 

Issues/Questions Needing Resolution

Issue/Question

Resolution/Answer

Issue/Question

Resolution/Answer

1

 

 

 

 

Action Items/Next Steps

Item

Notes

Item

Notes