Report Generator Microservice
A Kubernetes-deployed microservice that generates and schedules data reports from a Snowflake data warehouse. It runs as a containerized Node.js application and is part of a larger publisher/data lake ecosystem, handling both scheduled and on-demand report generation.
Overview
The Report Generator acts as a report orchestrator that bridges scheduled business requirements with data warehouse capabilities, handling the complexity of timing, authentication, and data export logistics.
Core Functionality
Report Types
- Scheduled Reports: Run automatically via a cron job (every hour at minute 5) triggered by an Event Dispatcher
- Manual Reports: Can be triggered manually with specific run dates
- UI Reports: Interactive reports requested through a web interface
Data Pipeline Architecture
graph TD
A[Event Dispatcher<br/>Cron Scheduler] -->|Scheduled Trigger| B[Report Generator<br/>Kubernetes Pod]
C[UI/Manual Request] -->|HTTP Request| B
D[Cosmos DB<br/>Report Configs] -->|Fetch Reports| B
B -->|Generate SQL| E[Snowflake<br/>Data Warehouse]
B -->|Publish Events| F[Event Hub]
F -->|Process Events| G[Downstream Services]
G -->|Export Data| H[Azure Blob Storage<br/>CSV Files]
style B fill:#e1f5fe
style D fill:#f3e5f5
style F fill:#e8f5e8
style H fill:#fff3e0
Key Components
Handler.js
Main orchestrator that:
- Determines request type (scheduled, manual, or UI)
- Retrieves eligible reports from Cosmos DB
- Generates SQL queries for Snowflake
- Publishes events to Event Hub for processing
GetReportParams.js
Report management that:
- Fetches report configurations from Cosmos DB
- Filters reports based on cron schedules and eligibility
- Handles different report execution scenarios
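To make the cron-based filtering concrete, here is a minimal, self-contained sketch. It is not the code in GetReportParams.js: the service lists cron-parser as a dependency, while this sketch swaps in a small hand-written matcher so it runs without packages. The report fields `cron` and `enabled`, the use of UTC, and the function names are all assumptions for illustration.

```javascript
// Simplified cron matcher (illustrative only; the real service depends on cron-parser).
// Supports "*", "*/n", "a-b", single numbers, and comma lists. Names like "MON" and
// day-of-week 7 are not handled, and day-of-month/day-of-week are combined with AND.
function fieldMatches(field, value, min = 0) {
  return field.split(",").some((part) => {
    if (part === "*") return true;
    const step = /^\*\/(\d+)$/.exec(part);
    if (step) return (value - min) % Number(step[1]) === 0;
    const range = /^(\d+)-(\d+)$/.exec(part);
    if (range) return value >= Number(range[1]) && value <= Number(range[2]);
    return /^\d+$/.test(part) && Number(part) === value;
  });
}

// True when a five-field cron expression matches the given date (evaluated in UTC).
function cronMatches(expression, date) {
  const [minute, hour, dayOfMonth, month, dayOfWeek] = expression.trim().split(/\s+/);
  return (
    fieldMatches(minute, date.getUTCMinutes()) &&
    fieldMatches(hour, date.getUTCHours()) &&
    fieldMatches(dayOfMonth, date.getUTCDate(), 1) &&
    fieldMatches(month, date.getUTCMonth() + 1, 1) &&
    fieldMatches(dayOfWeek, date.getUTCDay())
  );
}

// Hypothetical report shape: { id, cron, enabled }.
function eligibleReports(reports, runDate) {
  return reports.filter((r) => r.enabled !== false && cronMatches(r.cron, runDate));
}
```

With an hourly trigger at minute 5, a report whose expression is `5 6 * * *` would be selected only on the 06:05 run under these assumptions.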
GetSql.js
SQL generation that:
- Creates Snowflake COPY INTO statements for data export
- Handles date/time token replacement
- Generates Azure Blob Storage paths for output files
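The sketch below shows one way date tokens and a COPY INTO unload statement could fit together. It is an illustration, not the contents of GetSql.js: the `{YYYY}`/`{MM}`/`{DD}` token syntax is borrowed from the output path layout in this README, and the function names, parameter names, and chosen copy options are assumptions.

```javascript
// Replace {YYYY}, {MM}, {DD} tokens with parts of the run date (UTC).
// Other placeholders such as {prefix} are left untouched.
function replaceDateTokens(template, runDate) {
  const pad = (n) => String(n).padStart(2, "0");
  const values = {
    YYYY: String(runDate.getUTCFullYear()),
    MM: pad(runDate.getUTCMonth() + 1),
    DD: pad(runDate.getUTCDate()),
  };
  return template.replace(/\{(YYYY|MM|DD)\}/g, (_, token) => values[token]);
}

// Build a Snowflake COPY INTO <location> statement that unloads a query to CSV.
// `query`, `location`, and `sasToken` are hypothetical inputs for this sketch.
function buildCopyInto({ query, location, sasToken, runDate }) {
  const quote = (s) => s.replace(/'/g, "''"); // escape single quotes in string literals
  const target = replaceDateTokens(location, runDate);
  const sql = replaceDateTokens(query, runDate);
  return [
    `COPY INTO '${quote(target)}'`,
    `FROM (${sql})`,
    `CREDENTIALS = (AZURE_SAS_TOKEN = '${quote(sasToken)}')`,
    `FILE_FORMAT = (TYPE = CSV COMPRESSION = NONE)`,
    `HEADER = TRUE SINGLE = TRUE OVERWRITE = TRUE`,
  ].join("\n");
}
```

Keeping token replacement separate from statement assembly means the same helper can be applied to both the query text and the output location.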
Cosmos.js
Database abstraction layer for Azure Cosmos DB operations
auth.js
Authentication and authorization handling:
- JWT token-based authentication
- Role-based access control (admin/user roles)
- Publisher-specific authorization
API Endpoints
Main Endpoint
- POST / - Main report generation endpoint
  - Handles scheduled, manual, and UI report requests
  - Requires authentication (except for cron jobs)
Health Check Endpoints
- GET /live - Kubernetes liveness probe
- GET /ready - Kubernetes readiness probe
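As a rough illustration of the two probe routes, the following sketch uses only Node's built-in http module. The `state.ready` flag, the response bodies, and the function names are assumptions; the actual service may wire its probes differently.

```javascript
const http = require("node:http");

// Hypothetical readiness flag; a real service would set it once its
// Cosmos DB and Event Hub clients are initialised.
const state = { ready: false };

// Handles GET /live and GET /ready. Returns true if the request was handled.
function handleHealth(req, res) {
  if (req.method !== "GET") return false;
  if (req.url === "/live") {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("OK");
    return true;
  }
  if (req.url === "/ready") {
    // 503 tells Kubernetes to keep the pod out of the Service endpoints.
    res.writeHead(state.ready ? 200 : 503, { "Content-Type": "text/plain" });
    res.end(state.ready ? "READY" : "NOT READY");
    return true;
  }
  return false;
}

function createServer() {
  return http.createServer((req, res) => {
    if (!handleHealth(req, res)) {
      res.writeHead(404);
      res.end();
    }
  });
}

// Usage (port 3000 matches the containerPort in the deployment manifest):
// createServer().listen(3000);
```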
Request Types
1. EventHub Requests (Scheduled)
{
"headers": {
"x-eventhub": "reportgenerator"
}
}
2. UI Requests (Interactive)
{
"body": {
"publisherKey": "publisher123",
"deliver": false
}
}
3. Manual Requests
{
"headers": {
"ismanualrun": "true"
},
"body": {
"runDate": "2023-12-01",
"deliver": true
}
}
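The three request shapes above can be told apart by their headers. The following is a minimal sketch of that classification; the function name, the returned labels, and the order of precedence between the two headers are assumptions rather than the behaviour of Handler.js.

```javascript
// Classify an incoming request as "scheduled", "manual", or "ui"
// based on the headers shown in the examples above.
function classifyRequest(req) {
  // Normalise header names to lower case (Node's http module already does this
  // for incoming requests, but plain test objects may not).
  const headers = {};
  for (const [name, value] of Object.entries(req.headers || {})) {
    headers[name.toLowerCase()] = String(value);
  }
  if (headers["x-eventhub"] === "reportgenerator") return "scheduled";
  if (headers["ismanualrun"] === "true") return "manual";
  return "ui"; // assumption: anything else is treated as a UI request
}
```

For example, `classifyRequest({ headers: { ismanualrun: "true" }, body: { runDate: "2023-12-01" } })` yields `"manual"` under these assumptions.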
Configuration
Environment Configurations
- config/config.js - Development configuration
- config/config.int.js - Integration configuration
- config/config.prod.js - Production configuration
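One plausible way to pick among these files is a small lookup keyed by environment name. This is only a sketch: the environment labels `dev`, `int`, and `prod`, the default, and the function name are assumptions, and the project may select its configuration at build time instead.

```javascript
const path = require("node:path");

// Map of environment name to config file, taken from the list above.
const CONFIG_FILES = {
  dev: "config.js",
  int: "config.int.js",
  prod: "config.prod.js",
};

// Resolve the config file path for an environment (hypothetical helper).
function configPathFor(env = "dev") {
  const file = CONFIG_FILES[env];
  if (!file) throw new Error(`Unknown environment: ${env}`);
  return path.posix.join("config", file);
}

// Usage: const config = require("./" + configPathFor("int"));
```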
Key Configuration Elements
- Cosmos DB: Report storage and retrieval
- Snowflake: Data warehouse connection and SAS tokens
- Azure Blob Storage: Output file storage
- Event Dispatcher: Cron scheduling configuration
- Application Insights: Monitoring and logging
Data Flow
- Report Configuration: Reports are configured and stored in Cosmos DB
- Trigger: Service triggered by cron, manual request, or UI
- Report Retrieval: Eligible reports fetched based on schedule/criteria
- SQL Generation: Snowflake COPY INTO statements generated with proper paths
- Event Publishing: Events published to Event Hub for downstream processing
- File Output: CSV files exported to Azure Blob Storage with structured naming
Detailed Process Flow
flowchart TD
Start([Request Received]) --> Auth{Authentication<br/>Required?}
Auth -->|Yes| AuthCheck[Verify JWT Token<br/>& Permissions]
Auth -->|No - Cron| ReqType
AuthCheck --> AuthFail{Auth<br/>Success?}
AuthFail -->|No| Error1[Return 401<br/>Unauthorized]
AuthFail -->|Yes| ReqType{Request<br/>Type?}
ReqType -->|UI Request| UI[Get Specific Report<br/>deliver = optional]
ReqType -->|Manual Run| Manual[Get Manual Run Reports<br/>deliver = default true]
ReqType -->|Scheduled| Scheduled[Get Eligible Reports<br/>deliver = true]
UI --> ProcessReports[Process Each Report]
Manual --> ProcessReports
Scheduled --> ProcessReports
ProcessReports --> GenSQL[Generate SQL<br/>with Date Tokens]
GenSQL --> CreateEvent[Create Event Hub<br/>Message]
CreateEvent --> PubEvent[Publish to<br/>Event Hub]
PubEvent --> MoreReports{More<br/>Reports?}
MoreReports -->|Yes| ProcessReports
MoreReports -->|No| BuildResponse[Build Response<br/>Body]
BuildResponse --> Success[Return 200<br/>with Blob URLs]
Error1 --> End([End])
Success --> End
style Start fill:#e8f5e8
style End fill:#ffebee
style AuthCheck fill:#e3f2fd
style ProcessReports fill:#f3e5f5
style PubEvent fill:#fff3e0
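The loop in the flowchart can be sketched as a small async function with its collaborators injected. Everything here is illustrative: `buildSql` and `publish` stand in for the SQL generator and the Event Hub client, the report and event shapes are hypothetical, and the UI default for `deliver` is an assumption because the flowchart only says it is optional.

```javascript
// Resolve the deliver flag per request type, following the flowchart:
// scheduled is always true, manual defaults to true, UI is optional.
function resolveDeliver(requestType, body = {}) {
  if (requestType === "scheduled") return true;
  if (requestType === "manual") return body.deliver ?? true;
  return body.deliver ?? false; // assumption: UI requests default to no delivery
}

// Process each report in turn: generate SQL, publish an event, collect the blob URL.
async function processReports(reports, deliver, { buildSql, publish }) {
  const blobUrls = [];
  for (const report of reports) {
    const { sql, blobUrl } = buildSql(report);
    await publish({ reportId: report.id, sql, deliver });
    blobUrls.push(blobUrl);
  }
  return { statusCode: 200, body: { blobUrls } };
}
```

Injecting `buildSql` and `publish` keeps the loop testable without a Snowflake or Event Hub connection.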
Output File Structure
azure://{resourceGroup}.blob.core.windows.net/
{container}/{prefix}/{publisherkey}/{YYYY}/{MM}/{DD}/{filename}
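A path following this template could be assembled as below. The placeholder names mirror the template above; the function name, the use of UTC for the date parts, and the slash trimming are assumptions made for this sketch.

```javascript
// Build an output location of the form
// azure://{resourceGroup}.blob.core.windows.net/{container}/{prefix}/{publisherkey}/{YYYY}/{MM}/{DD}/{filename}
function buildBlobPath({ resourceGroup, container, prefix, publisherKey, filename, runDate }) {
  const pad = (n) => String(n).padStart(2, "0");
  const segments = [
    container,
    prefix,
    publisherKey,
    String(runDate.getUTCFullYear()),
    pad(runDate.getUTCMonth() + 1),
    pad(runDate.getUTCDate()),
    filename,
  ].map((s) => String(s).replace(/^\/+|\/+$/g, "")); // trim stray leading/trailing slashes
  return `azure://${resourceGroup}.blob.core.windows.net/${segments.join("/")}`;
}
```

Zero-padding the month and day keeps the folders sorted chronologically when listed by name.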
Authentication & Security
- JWT Authentication: Required for UI and manual requests
- Role-Based Access: Admin and user roles supported
- Publisher Authorization: Users restricted to their publisher organization
- Bypass for Cron: Scheduled jobs bypass authentication via header check
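The rules above can be expressed as a single decision function over already-verified JWT claims. This is a sketch, not the logic in auth.js: the claim names `role` and `publisherKey`, the 403 status for a publisher mismatch, and the return shape are assumptions. Token verification itself (for example with `jwt.verify` from jsonwebtoken) is left out so the example stays dependency-free.

```javascript
// Decide whether a request may proceed.
// `headers` are lower-cased request headers, `claims` is the decoded JWT payload
// (or null when no valid token was supplied), `publisherKey` is the requested publisher.
function authorize({ headers = {}, claims = null, publisherKey }) {
  // Scheduled jobs bypass authentication via the Event Hub header.
  if (headers["x-eventhub"] === "reportgenerator") {
    return { allowed: true, reason: "cron-bypass" };
  }
  if (!claims) return { allowed: false, status: 401 };
  // Admins may act on any publisher (assumption).
  if (claims.role === "admin") return { allowed: true, reason: "admin" };
  // Users are restricted to their own publisher organization.
  if (claims.role === "user" && claims.publisherKey === publisherKey) {
    return { allowed: true, reason: "publisher-match" };
  }
  return { allowed: false, status: 403 }; // assumption: mismatch is reported as 403
}
```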
Dependencies
Core Dependencies
- @azure/cosmos - Cosmos DB client
- jsonwebtoken - JWT authentication
- cron-parser - Cron expression parsing
- dateformat - Date formatting utilities
- moment-timezone - Timezone handling
Development Dependencies
- webpack - Module bundling
- terser-webpack-plugin - Code minification
Deployment
The service is containerized and deployed as a Kubernetes pod with:
- Node.js 22 runtime
- Docker containerization
- Load balancer exposure
- CORS enabled
- Event Dispatcher integration
Kubernetes Resource Configuration
- CPU request: 300m
- Memory request: 512Mi
- Memory limit: 512Mi
- Replicas: 1 (dev), 3 (int), 2 (prod)
Monitoring
- Application Insights integration for logging and telemetry
- Kubernetes health check endpoints for pod orchestration
- Comprehensive error handling with structured logging
- Request tracing with correlation IDs
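To illustrate structured logging with correlation IDs, the sketch below reuses an incoming ID when present and otherwise generates one. The header name `x-correlation-id`, the log field names, and both function names are assumptions; the service's Application Insights integration is not shown.

```javascript
const { randomUUID } = require("node:crypto");

// Reuse the caller's correlation ID if supplied, else create a new one.
// The header name is hypothetical.
function correlationIdFrom(headers = {}) {
  return headers["x-correlation-id"] || randomUUID();
}

// Produce one JSON log line so downstream tooling can filter by correlationId.
function logEntry(level, message, correlationId, extra = {}) {
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    message,
    correlationId,
    ...extra,
  });
}

// Usage:
// const id = correlationIdFrom(req.headers);
// console.log(logEntry("info", "report published", id, { reportId: "r1" }));
```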
Version
Current version: 0.0.116
Architecture Pattern
This follows an event-driven microservice architecture in which the Report Generator acts as an orchestrator that:
- Manages report scheduling and configuration
- Generates appropriate SQL for data extraction
- Publishes events for downstream processing
- Handles authentication and authorization
- Provides monitoring and health check capabilities
The service decouples report definition from execution, so reports can be added or rescheduled through configuration in Cosmos DB while downstream services handle the export.
Getting Started
Prerequisites
- Node.js 22
- Docker
- Kubernetes cluster
- Access to Azure Cosmos DB
- Access to Snowflake data warehouse
- Azure Event Hub connection
Local Development
# Install dependencies
npm install
# Run tests
npm test
# Build for production
npm run build
Container Build
# Build Docker image
docker build -t reportgenerator:latest .
# Run container locally
docker run -p 3000:3000 reportgenerator:latest
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reportgenerator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reportgenerator
  template:
    metadata:
      labels:
        app: reportgenerator
    spec:
      containers:
        - name: reportgenerator
          image: reportgenerator:latest
          ports:
            - containerPort: 3000
          resources:
            requests:
              cpu: 300m
              memory: 512Mi
            limits:
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /live
              port: 3000
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000