Skip to content

UnloadReports Microservice

Overview

The UnloadReports microservice is responsible for executing SQL queries against Snowflake data warehouse to generate reports. It processes report requests from various sources, validates them, executes the queries, and publishes the results to downstream systems for further processing and delivery.

Business Purpose

This service serves as the execution engine for the Publisher platform's reporting system. It: - Executes SQL queries against Snowflake data warehouse - Validates report requests and filters out stale requests - Provides asynchronous report generation capabilities - Publishes report execution results for downstream processing - Supports both UI-driven and scheduled report generation

Architecture

Service Type

  • Deployment: Kubernetes containerized microservice
  • Trigger: HTTP-triggered Azure Function with post-response processing
  • Runtime: Node.js
  • Database: Snowflake Data Warehouse

Key Components

  1. Handler: Main request processor and message validator
  2. Post-Response Processor: Asynchronous SQL execution engine
  3. Helper: Event Hub publishing utilities
  4. Configuration: Environment-specific database and connection settings

Data Flow

graph TD
    A[HTTP Request] --> B[Message Validation]
    B --> C{Valid Query?}
    C -->|Yes| D[Age Validation]
    C -->|No| E[Mark Invalid]
    D --> F{Recent Enough?}
    F -->|Yes| G[Queue for Processing]
    F -->|No| H[Mark as Old]
    G --> I[HTTP Response]
    I --> J[Post-Response Processing]
    J --> K[Snowflake Connection]
    K --> L[SQL Execution]
    L --> M[Result Processing]
    M --> N[Event Hub Publishing]

    E --> O[Publish Invalid Event]
    H --> P[Publish Old Event]

Dependencies

External Services

  • Snowflake Data Warehouse: SQL query execution
  • Azure Event Hub: Report result publishing
  • Application Insights: Logging and monitoring

NPM Dependencies

  • snowflake-sdk: Snowflake database connectivity
  • async: Asynchronous flow control
  • idgen: Unique identifier generation

Configuration

Environment-Specific Configs

  • config.js: Development configuration
  • config.int.js: Integration environment
  • config.prod.js: Production environment

Key Configuration Parameters

  • Database Settings:
  • database: DATA_LAKE_TEST
  • schema: PUBLISHER
  • role: PUBLISHERAPPTEST
  • uiWarehouse: PUBLISHER_WH
  • Report Validation:
  • dropOldReportMinutes: Age thresholds for different report types
  • Event Hub: Consumer group unloadreports for report hub

API Endpoints

POST /

Processes report execution requests.

Request Body: Array of report request objects

[
  {
    "query": "SELECT * FROM table WHERE condition",
    "reportSource": {
      "serviceName": "UIReportReader",
      "serviceVersion": "1.0.0",
      "sourceType": "UI Report"
    },
    "reportConfig": {
      "publisherkey": "publisher_id",
      "reportId": "report_id",
      "reportName": "report_name"
    },
    "request_timestamp": "2024-01-01T00:00:00Z"
  }
]

Response:

{
  "failedEvents": []
}

Message Processing Logic

Validation Steps

  1. Message Parsing: Convert string messages to JSON objects
  2. Query Validation: Ensure query field is present
  3. Age Validation: Check if report request is within acceptable time window
  4. Processing Queue: Valid messages are queued for post-response processing

Age-Based Filtering

  • UI Reports: Dropped if older than 10 minutes
  • Other Report Types: Configurable thresholds per source type
  • Purpose: Prevents execution of stale or outdated report requests

Record Tracking

  • Each message receives a unique 32-character record ID
  • Processing timestamps are tracked throughout the lifecycle
  • Service version information is embedded for audit purposes

Post-Response Processing

Asynchronous Execution

  • SQL queries are executed after HTTP response is sent
  • Dedicated Snowflake connection for UI warehouse operations
  • Parallel processing of multiple report requests

Snowflake Integration

  • Connection pooling and management
  • Error handling and retry logic
  • Query execution with completion callbacks

Result Processing

  • Success/failure status tracking
  • Execution timing measurements
  • Error message capture and logging

Event Publishing

Event Hub Integration

  • Results published to generatedreport Event Hub
  • Comprehensive event metadata included
  • Support for downstream report delivery systems

Event Structure

{
  "query": "executed_sql_query",
  "reportSource": { /* source information */ },
  "reportConfig": { /* report configuration */ },
  "unload": {
    "serviceVersion": "1.0.0",
    "processStartTime": "2024-01-01T00:00:00Z",
    "queryStartTime": "2024-01-01T00:00:01Z",
    "queryEndTime": "2024-01-01T00:00:05Z",
    "recordid": "unique_32_char_id",
    "result": {
      "status": "success|error|invalid|old",
      "error": "error_message_if_failed"
    }
  }
}

Error Handling

Validation Errors

  • Invalid Messages: Missing query field
  • Old Messages: Exceeded age threshold
  • Parse Errors: Malformed JSON input

Execution Errors

  • Connection Failures: Snowflake connectivity issues
  • SQL Errors: Query syntax or execution problems
  • Timeout Errors: Long-running query handling

Error Recovery

  • Failed events are published with error details
  • Retry logic handled by upstream Event Hub configuration
  • Self-healing integration for persistent failures

Performance Characteristics

Scalability

  • Asynchronous processing prevents blocking
  • Parallel query execution capabilities
  • Configurable warehouse resources in Snowflake

Latency

  • Immediate HTTP response (validation only)
  • Query execution time depends on complexity and data volume
  • Post-response processing adds minimal overhead

Security Considerations

Database Access

  • Role-based access control in Snowflake
  • Secure credential management
  • Connection encryption and authentication

Query Validation

  • No SQL injection prevention (trusts upstream sources)
  • Query complexity and resource usage monitoring
  • Audit trail for all executed queries

Development

Local Setup

  1. Install dependencies: npm install
  2. Configure Snowflake credentials
  3. Set up Event Hub connections
  4. Configure Application Insights

Testing

  • Test file: test.js
  • Manual testing capabilities
  • Snowflake connection testing utilities

Build Process

  • Webpack configuration for bundling
  • Terser plugin for code minification
  • Copy plugin for static assets

Deployment

Kubernetes Configuration

  • Containerized deployment
  • Health check endpoints: /live and /ready
  • Post-response processing support
  • Environment-specific function.json configurations

Environment Variables

  • Snowflake connection parameters
  • Event Hub connection strings
  • Application Insights instrumentation key
  • Database role and warehouse configurations

Monitoring and Logging

Application Insights

  • Query execution metrics
  • Error tracking and alerting
  • Performance monitoring

Custom Logging

  • Record-level processing tracking
  • Query execution timing
  • Validation and filtering statistics

Key Metrics

  • Query success/failure rates
  • Processing latency
  • Message validation statistics
  • Snowflake connection health

This service integrates with: - UIReportReader: Primary source of UI report requests - ReportGenerator: Scheduled report processing - ReportSender: Downstream report delivery - SelfHealer: Error recovery and retry handling

Troubleshooting

Common Issues

  1. Snowflake Connection Failures: Check credentials and network connectivity
  2. High Query Failures: Monitor SQL syntax and data availability
  3. Old Message Drops: Verify upstream processing latency
  4. Event Hub Publishing Issues: Check connection strings and permissions

Monitoring Points

  • Query execution success rates
  • Message processing throughput
  • Snowflake warehouse utilization
  • Event Hub publishing success rates
  • Age-based message filtering statistics