UnloadReports Microservice
Overview
The UnloadReports microservice is responsible for executing SQL queries against Snowflake data warehouse to generate reports. It processes report requests from various sources, validates them, executes the queries, and publishes the results to downstream systems for further processing and delivery.
Business Purpose
This service serves as the execution engine for the Publisher platform's reporting system. It: - Executes SQL queries against Snowflake data warehouse - Validates report requests and filters out stale requests - Provides asynchronous report generation capabilities - Publishes report execution results for downstream processing - Supports both UI-driven and scheduled report generation
Architecture
Service Type
- Deployment: Kubernetes containerized microservice
- Trigger: HTTP-triggered Azure Function with post-response processing
- Runtime: Node.js
- Database: Snowflake Data Warehouse
Key Components
- Handler: Main request processor and message validator
- Post-Response Processor: Asynchronous SQL execution engine
- Helper: Event Hub publishing utilities
- Configuration: Environment-specific database and connection settings
Data Flow
graph TD
A[HTTP Request] --> B[Message Validation]
B --> C{Valid Query?}
C -->|Yes| D[Age Validation]
C -->|No| E[Mark Invalid]
D --> F{Recent Enough?}
F -->|Yes| G[Queue for Processing]
F -->|No| H[Mark as Old]
G --> I[HTTP Response]
I --> J[Post-Response Processing]
J --> K[Snowflake Connection]
K --> L[SQL Execution]
L --> M[Result Processing]
M --> N[Event Hub Publishing]
E --> O[Publish Invalid Event]
H --> P[Publish Old Event]
Dependencies
External Services
- Snowflake Data Warehouse: SQL query execution
- Azure Event Hub: Report result publishing
- Application Insights: Logging and monitoring
NPM Dependencies
snowflake-sdk: Snowflake database connectivityasync: Asynchronous flow controlidgen: Unique identifier generation
Configuration
Environment-Specific Configs
config.js: Development configurationconfig.int.js: Integration environmentconfig.prod.js: Production environment
Key Configuration Parameters
- Database Settings:
database: DATA_LAKE_TESTschema: PUBLISHERrole: PUBLISHERAPPTESTuiWarehouse: PUBLISHER_WH- Report Validation:
dropOldReportMinutes: Age thresholds for different report types- Event Hub: Consumer group
unloadreportsforreporthub
API Endpoints
POST /
Processes report execution requests.
Request Body: Array of report request objects
[
{
"query": "SELECT * FROM table WHERE condition",
"reportSource": {
"serviceName": "UIReportReader",
"serviceVersion": "1.0.0",
"sourceType": "UI Report"
},
"reportConfig": {
"publisherkey": "publisher_id",
"reportId": "report_id",
"reportName": "report_name"
},
"request_timestamp": "2024-01-01T00:00:00Z"
}
]
Response:
{
"failedEvents": []
}
Message Processing Logic
Validation Steps
- Message Parsing: Convert string messages to JSON objects
- Query Validation: Ensure query field is present
- Age Validation: Check if report request is within acceptable time window
- Processing Queue: Valid messages are queued for post-response processing
Age-Based Filtering
- UI Reports: Dropped if older than 10 minutes
- Other Report Types: Configurable thresholds per source type
- Purpose: Prevents execution of stale or outdated report requests
Record Tracking
- Each message receives a unique 32-character record ID
- Processing timestamps are tracked throughout the lifecycle
- Service version information is embedded for audit purposes
Post-Response Processing
Asynchronous Execution
- SQL queries are executed after HTTP response is sent
- Dedicated Snowflake connection for UI warehouse operations
- Parallel processing of multiple report requests
Snowflake Integration
- Connection pooling and management
- Error handling and retry logic
- Query execution with completion callbacks
Result Processing
- Success/failure status tracking
- Execution timing measurements
- Error message capture and logging
Event Publishing
Event Hub Integration
- Results published to
generatedreportEvent Hub - Comprehensive event metadata included
- Support for downstream report delivery systems
Event Structure
{
"query": "executed_sql_query",
"reportSource": { /* source information */ },
"reportConfig": { /* report configuration */ },
"unload": {
"serviceVersion": "1.0.0",
"processStartTime": "2024-01-01T00:00:00Z",
"queryStartTime": "2024-01-01T00:00:01Z",
"queryEndTime": "2024-01-01T00:00:05Z",
"recordid": "unique_32_char_id",
"result": {
"status": "success|error|invalid|old",
"error": "error_message_if_failed"
}
}
}
Error Handling
Validation Errors
- Invalid Messages: Missing query field
- Old Messages: Exceeded age threshold
- Parse Errors: Malformed JSON input
Execution Errors
- Connection Failures: Snowflake connectivity issues
- SQL Errors: Query syntax or execution problems
- Timeout Errors: Long-running query handling
Error Recovery
- Failed events are published with error details
- Retry logic handled by upstream Event Hub configuration
- Self-healing integration for persistent failures
Performance Characteristics
Scalability
- Asynchronous processing prevents blocking
- Parallel query execution capabilities
- Configurable warehouse resources in Snowflake
Latency
- Immediate HTTP response (validation only)
- Query execution time depends on complexity and data volume
- Post-response processing adds minimal overhead
Security Considerations
Database Access
- Role-based access control in Snowflake
- Secure credential management
- Connection encryption and authentication
Query Validation
- No SQL injection prevention (trusts upstream sources)
- Query complexity and resource usage monitoring
- Audit trail for all executed queries
Development
Local Setup
- Install dependencies:
npm install - Configure Snowflake credentials
- Set up Event Hub connections
- Configure Application Insights
Testing
- Test file:
test.js - Manual testing capabilities
- Snowflake connection testing utilities
Build Process
- Webpack configuration for bundling
- Terser plugin for code minification
- Copy plugin for static assets
Deployment
Kubernetes Configuration
- Containerized deployment
- Health check endpoints:
/liveand/ready - Post-response processing support
- Environment-specific function.json configurations
Environment Variables
- Snowflake connection parameters
- Event Hub connection strings
- Application Insights instrumentation key
- Database role and warehouse configurations
Monitoring and Logging
Application Insights
- Query execution metrics
- Error tracking and alerting
- Performance monitoring
Custom Logging
- Record-level processing tracking
- Query execution timing
- Validation and filtering statistics
Key Metrics
- Query success/failure rates
- Processing latency
- Message validation statistics
- Snowflake connection health
Related Services
This service integrates with: - UIReportReader: Primary source of UI report requests - ReportGenerator: Scheduled report processing - ReportSender: Downstream report delivery - SelfHealer: Error recovery and retry handling
Troubleshooting
Common Issues
- Snowflake Connection Failures: Check credentials and network connectivity
- High Query Failures: Monitor SQL syntax and data availability
- Old Message Drops: Verify upstream processing latency
- Event Hub Publishing Issues: Check connection strings and permissions
Monitoring Points
- Query execution success rates
- Message processing throughput
- Snowflake warehouse utilization
- Event Hub publishing success rates
- Age-based message filtering statistics