Report Generator Microservice

A Kubernetes-deployed microservice that generates and schedules data reports from a Snowflake data warehouse. This service runs as a containerized Node.js application in a Kubernetes pod and is part of a larger publisher/data lake ecosystem, handling both scheduled and on-demand report generation.

Overview

The Report Generator orchestrates report runs: it decides which configured reports are due, authenticates the caller, generates the Snowflake export SQL, and publishes events that downstream services use to write the results to Azure Blob Storage.

Core Functionality

Report Types

  • Scheduled Reports: Run automatically every hour at minute 5, triggered by a cron job in the Event Dispatcher
  • Manual Reports: Can be triggered manually with specific run dates
  • UI Reports: Interactive reports requested through a web interface

Data Pipeline Architecture

graph TD
    A[Event Dispatcher<br/>Cron Scheduler] -->|Scheduled Trigger| B[Report Generator<br/>Kubernetes Pod]
    C[UI/Manual Request] -->|HTTP Request| B
    D[Cosmos DB<br/>Report Configs] -->|Fetch Reports| B
    B -->|Generate SQL| E[Snowflake<br/>Data Warehouse]
    B -->|Publish Events| F[Event Hub]
    F -->|Process Events| G[Downstream Services]
    G -->|Export Data| H[Azure Blob Storage<br/>CSV Files]

    style B fill:#e1f5fe
    style D fill:#f3e5f5
    style F fill:#e8f5e8
    style H fill:#fff3e0

Key Components

Handler.js

Main orchestrator that:

  • Determines request type (scheduled, manual, or UI)
  • Retrieves eligible reports from Cosmos DB
  • Generates SQL queries for Snowflake
  • Publishes events to Event Hub for processing

GetReportParams.js

Report management that:

  • Fetches report configurations from Cosmos DB
  • Filters reports based on cron schedules and eligibility
  • Handles different report execution scenarios
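The schedule filtering step can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: it assumes each report configuration carries a hypothetical `cron` field holding a five-field expression, and it uses a deliberately simplified matcher that understands only `*` and single integers. The service itself lists `cron-parser` as a dependency, which covers the full cron syntax.

```javascript
// Simplified matcher: each of the five cron fields is either "*" or one integer.
// Ranges, lists, and step values are deliberately not handled in this sketch.
function matchesCron(expression, date) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${fields.length}`);
  }
  // Values in cron order: minute, hour, day of month, month, day of week (UTC).
  const actual = [
    date.getUTCMinutes(),
    date.getUTCHours(),
    date.getUTCDate(),
    date.getUTCMonth() + 1,
    date.getUTCDay(),
  ];
  return fields.every((field, i) => field === "*" || Number(field) === actual[i]);
}

// Keep only the reports whose (hypothetical) `cron` field matches the run time.
function filterEligibleReports(reports, runTime) {
  return reports.filter((report) => matchesCron(report.cron, runTime));
}
```

With this sketch, a report whose `cron` is `5 * * * *` is eligible on every hourly trigger, while `5 6 * * *` is eligible only on the 06:05 UTC run.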

GetSql.js

SQL generation that:

  • Creates Snowflake COPY INTO statements for data export
  • Handles date/time token replacement
  • Generates Azure Blob Storage paths for output files
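As a rough sketch of what this step might look like, the snippet below replaces date tokens and wraps a query in a Snowflake `COPY INTO <location>` statement. The token names `{YYYY}`, `{MM}`, and `{DD}` come from the output path format in this README; the function names, the input field names (`query`, `locationTemplate`, `sasToken`), and the chosen copy options are assumptions for illustration.

```javascript
// Replace {YYYY}, {MM}, and {DD} tokens with parts of the run date (UTC).
function replaceDateTokens(template, runDate) {
  const pad = (n) => String(n).padStart(2, "0");
  const values = {
    YYYY: String(runDate.getUTCFullYear()),
    MM: pad(runDate.getUTCMonth() + 1),
    DD: pad(runDate.getUTCDate()),
  };
  return template.replace(/\{(YYYY|MM|DD)\}/g, (_, token) => values[token]);
}

// Build a COPY INTO <location> statement that unloads a query result as CSV.
// `query`, `locationTemplate`, and `sasToken` are hypothetical inputs.
function buildCopyInto({ query, locationTemplate, sasToken }, runDate) {
  const location = replaceDateTokens(locationTemplate, runDate);
  const sql = replaceDateTokens(query, runDate);
  return [
    `COPY INTO '${location}'`,
    `FROM (${sql})`,
    `CREDENTIALS = (AZURE_SAS_TOKEN = '${sasToken}')`,
    "FILE_FORMAT = (TYPE = CSV COMPRESSION = NONE)",
    "HEADER = TRUE SINGLE = TRUE OVERWRITE = TRUE;",
  ].join("\n");
}
```

The statement is assembled by string interpolation, so this approach is only reasonable when the query and location come from trusted report configuration rather than from request input.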

Cosmos.js

Database abstraction layer for Azure Cosmos DB operations

auth.js

Authentication and authorization handling:

  • JWT token-based authentication
  • Role-based access control (admin/user roles)
  • Publisher-specific authorization
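The role and publisher checks could look something like the sketch below, applied to a token payload that has already been verified. The claim names `role` and `publisherKey` are assumptions, as is the rule that admins may act for any publisher; the README only states that users are restricted to their own publisher organization.

```javascript
// Decide whether a verified token payload may request reports for a publisher.
// Claim names `role` and `publisherKey` are assumptions for this sketch.
function isAuthorized(claims, requestedPublisherKey) {
  if (!claims || typeof claims.role !== "string") return false;
  // Assumption: admins may act on behalf of any publisher.
  if (claims.role === "admin") return true;
  // Users are restricted to their own publisher organization.
  if (claims.role === "user") {
    return (
      typeof claims.publisherKey === "string" &&
      claims.publisherKey === requestedPublisherKey
    );
  }
  // Unknown roles are rejected.
  return false;
}
```

Signature verification is left out here; with the `jsonwebtoken` dependency it would typically happen first via `jwt.verify`, and only the resulting payload would be passed to a check like this one.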

API Endpoints

Main Endpoint

  • POST / - Main report generation endpoint
  • Handles scheduled, manual, and UI report requests
  • Requires authentication (except for cron jobs)

Health Check Endpoints

  • GET /live - Kubernetes liveness probe
  • GET /ready - Kubernetes readiness probe
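A minimal sketch of how the two probe paths might be served is shown below. The `isReady` callback is hypothetical (for example, a check that Cosmos DB is reachable), and the response bodies are placeholders; only the `/live` and `/ready` paths come from this README.

```javascript
// Request handler for the two probe paths. `isReady` is a hypothetical callback
// that reports whether the service's dependencies are reachable.
function createProbeHandler(isReady) {
  return function handleProbe(req, res) {
    if (req.method === "GET" && req.url === "/live") {
      // Liveness: the process is up and able to answer requests.
      res.writeHead(200, { "Content-Type": "text/plain" });
      res.end("OK");
    } else if (req.method === "GET" && req.url === "/ready") {
      // Readiness: answer 503 until dependencies are available.
      const ready = isReady();
      res.writeHead(ready ? 200 : 503, { "Content-Type": "text/plain" });
      res.end(ready ? "READY" : "NOT READY");
    } else {
      res.writeHead(404, { "Content-Type": "text/plain" });
      res.end("Not Found");
    }
  };
}
```

The handler has the `(req, res)` shape that Node's built-in `http.createServer` expects, so it could be mounted on port 3000 to match the probe configuration in the deployment manifest.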

Request Types

1. EventHub Requests (Scheduled)

{
  "headers": {
    "x-eventhub": "reportgenerator"
  }
}

2. UI Requests (Interactive)

{
  "body": {
    "publisherKey": "publisher123",
    "deliver": false
  }
}

3. Manual Requests

{
  "headers": {
    "ismanualrun": "true"
  },
  "body": {
    "runDate": "2023-12-01",
    "deliver": true
  }
}
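The three request shapes above suggest a classification step along these lines. The header and body field names are taken from the examples; the order of the checks and the `"unknown"` fallback are assumptions, since the README does not say which wins when several markers are present.

```javascript
// Classify an incoming request as scheduled, manual, or UI.
// Header names are compared in lowercase, as Node's http module provides them.
function classifyRequest(request) {
  const headers = request.headers || {};
  const body = request.body || {};
  if (headers["x-eventhub"] === "reportgenerator") return "scheduled";
  if (headers["ismanualrun"] === "true") return "manual";
  if (typeof body.publisherKey === "string") return "ui";
  // Assumption: anything else is rejected by the caller.
  return "unknown";
}
```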

Configuration

Environment Configurations

  • config/config.js - Development configuration
  • config/config.int.js - Integration configuration
  • config/config.prod.js - Production configuration
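One way the per-environment files might be selected is sketched below. The file paths come from the list above; the environment names `dev`, `int`, and `prod`, and the choice to reject anything else, are assumptions, as is whatever environment variable would supply the value.

```javascript
// Map an environment name to its configuration file.
// The names "dev", "int", and "prod" are assumed; an unset value means development.
function configFileFor(environment) {
  switch (environment) {
    case undefined:
    case "dev":
      return "config/config.js";
    case "int":
      return "config/config.int.js";
    case "prod":
      return "config/config.prod.js";
    default:
      // Fail loudly rather than silently falling back to another environment.
      throw new Error(`Unknown environment: ${environment}`);
  }
}
```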

Key Configuration Elements

  • Cosmos DB: Report storage and retrieval
  • Snowflake: Data warehouse connection and SAS tokens
  • Azure Blob Storage: Output file storage
  • Event Dispatcher: Cron scheduling configuration
  • Application Insights: Monitoring and logging

Data Flow

  1. Report Configuration: Reports are configured and stored in Cosmos DB
  2. Trigger: Service triggered by cron, manual request, or UI
  3. Report Retrieval: Eligible reports fetched based on schedule/criteria
  4. SQL Generation: Snowflake COPY INTO statements generated with proper paths
  5. Event Publishing: Events published to Event Hub for downstream processing
  6. File Output: CSV files exported to Azure Blob Storage with structured naming

Detailed Process Flow

flowchart TD
    Start([Request Received]) --> Auth{Authentication<br/>Required?}
    Auth -->|Yes| AuthCheck[Verify JWT Token<br/>& Permissions]
    Auth -->|No - Cron| ReqType
    AuthCheck --> AuthFail{Auth<br/>Success?}
    AuthFail -->|No| Error1[Return 401<br/>Unauthorized]
    AuthFail -->|Yes| ReqType{Request<br/>Type?}

    ReqType -->|UI Request| UI[Get Specific Report<br/>deliver = optional]
    ReqType -->|Manual Run| Manual[Get Manual Run Reports<br/>deliver = default true]
    ReqType -->|Scheduled| Scheduled[Get Eligible Reports<br/>deliver = true]

    UI --> ProcessReports[Process Each Report]
    Manual --> ProcessReports
    Scheduled --> ProcessReports

    ProcessReports --> GenSQL[Generate SQL<br/>with Date Tokens]
    GenSQL --> CreateEvent[Create Event Hub<br/>Message]
    CreateEvent --> PubEvent[Publish to<br/>Event Hub]
    PubEvent --> MoreReports{More<br/>Reports?}

    MoreReports -->|Yes| ProcessReports
    MoreReports -->|No| BuildResponse[Build Response<br/>Body]
    BuildResponse --> Success[Return 200<br/>with Blob URLs]

    Error1 --> End([End])
    Success --> End

    style Start fill:#e8f5e8
    style End fill:#ffebee
    style AuthCheck fill:#e3f2fd
    style ProcessReports fill:#f3e5f5
    style PubEvent fill:#fff3e0
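The per-report part of this flow can be sketched as a pure function that resolves the `deliver` flag and builds one event message per report, leaving the actual Event Hub publish call out. The `deliver` rules follow the flowchart (scheduled is always true, manual defaults to true, UI is optional); the event field names, the `generateSql` dependency, and the UI default of `false` are assumptions.

```javascript
// Resolve the `deliver` flag per request type, following the flowchart:
// scheduled -> always true, manual -> defaults to true, UI -> optional.
function resolveDeliver(requestType, body) {
  const deliver = (body || {}).deliver;
  switch (requestType) {
    case "scheduled":
      return true;
    case "manual":
      return deliver !== false;
    case "ui":
      // Assumption: an omitted flag means "do not deliver" for UI requests.
      return deliver === true;
    default:
      throw new Error(`Unknown request type: ${requestType}`);
  }
}

// Build one event message per report plus the response returned to the caller.
// `generateSql` is a hypothetical dependency returning { sql, blobUrl }.
function buildEventsAndResponse(reports, requestType, body, generateSql) {
  const deliver = resolveDeliver(requestType, body);
  const events = reports.map((report) => {
    const { sql, blobUrl } = generateSql(report);
    return { reportId: report.id, sql, blobUrl, deliver };
  });
  return {
    events,
    response: {
      statusCode: 200,
      body: { blobUrls: events.map((event) => event.blobUrl) },
    },
  };
}
```

Keeping message construction separate from publishing makes the loop easy to test without an Event Hub connection; the caller would publish each entry in `events` and then return `response`.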

Output File Structure

azure://{storageAccount}.blob.core.windows.net/
{container}/{prefix}/{publisherkey}/{YYYY}/{MM}/{DD}/{filename}
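The path layout above could be assembled as in the following sketch. The host segment is passed in as `account`, an assumed parameter name, and the optional `timeZone` argument is also an assumption: the README lists `moment-timezone` for timezone handling but does not say which zone the date folders follow, so this sketch defaults to UTC and uses the built-in `Intl.DateTimeFormat` instead.

```javascript
// Extract zero-padded year, month, and day for `date` as seen in `timeZone`.
function datePartsInZone(date, timeZone) {
  const formatter = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  });
  const parts = {};
  for (const { type, value } of formatter.formatToParts(date)) {
    parts[type] = value;
  }
  return { year: parts.year, month: parts.month, day: parts.day };
}

// Assemble the output location following the layout shown above.
// `account` and `timeZone` are assumed parameters for this sketch.
function buildBlobPath(target, runDate, timeZone = "UTC") {
  const { account, container, prefix, publisherKey, filename } = target;
  const { year, month, day } = datePartsInZone(runDate, timeZone);
  return (
    `azure://${account}.blob.core.windows.net/` +
    `${container}/${prefix}/${publisherKey}/${year}/${month}/${day}/${filename}`
  );
}
```

The time zone matters near midnight: a run at 03:00 UTC on 1 December lands in the `2023/12/01` folder under UTC but in `2023/11/30` under `America/New_York`.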

Authentication & Security

  • JWT Authentication: Required for UI and manual requests
  • Role-Based Access: Admin and user roles supported
  • Publisher Authorization: Users restricted to their publisher organization
  • Bypass for Cron: Scheduled jobs bypass authentication via header check

Dependencies

Core Dependencies

  • @azure/cosmos - Cosmos DB client
  • jsonwebtoken - JWT authentication
  • cron-parser - Cron expression parsing
  • dateformat - Date formatting utilities
  • moment-timezone - Timezone handling

Development Dependencies

  • webpack - Module bundling
  • terser-webpack-plugin - Code minification

Deployment

The service is containerized and deployed as a Kubernetes pod with:

  • Node.js 22 runtime
  • Docker containerization
  • Load balancer exposure
  • CORS enabled
  • Event Dispatcher integration

Kubernetes Resource Configuration

  • CPU request: 300m
  • Memory request: 512Mi
  • Memory limit: 512Mi
  • Replicas: 1 (dev), 3 (int), 2 (prod)

Monitoring

  • Application Insights integration for logging and telemetry
  • Kubernetes health check endpoints for pod orchestration
  • Comprehensive error handling with structured logging
  • Request tracing with correlation IDs
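Structured logging with a correlation ID might be sketched as below. The field names and the `sink` parameter are assumptions made so the example stays self-contained; forwarding entries to Application Insights is not shown.

```javascript
// Create a logger that stamps every entry with the same correlation ID.
// `sink` receives one JSON string per entry; it defaults to console.log.
function createLogger(correlationId, sink = console.log) {
  const write = (level, message, fields = {}) => {
    sink(
      JSON.stringify({
        timestamp: new Date().toISOString(),
        level,
        correlationId,
        message,
        ...fields,
      })
    );
  };
  return {
    info: (message, fields) => write("info", message, fields),
    error: (message, fields) => write("error", message, fields),
  };
}
```

Creating one logger per request and passing it through the handler keeps every log line for that request searchable by the same ID.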

Version

Current version: 0.0.116

Architecture Pattern

This follows a microservice event-driven architecture where the Report Generator acts as an orchestrator that:

  • Manages report scheduling and configuration
  • Generates appropriate SQL for data extraction
  • Publishes events for downstream processing
  • Handles authentication and authorization
  • Provides monitoring and health check capabilities

The service decouples report definition from execution: report configurations live in Cosmos DB, while the export itself runs downstream of the Event Hub, so each side can change and scale independently.

Getting Started

Prerequisites

  • Node.js 22
  • Docker
  • Kubernetes cluster
  • Access to Azure Cosmos DB
  • Access to Snowflake data warehouse
  • Azure Event Hub connection

Local Development

# Install dependencies
npm install

# Run tests
npm test

# Build for production
npm run build

Container Build

# Build Docker image
docker build -t reportgenerator:latest .

# Run container locally
docker run -p 3000:3000 reportgenerator:latest

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: reportgenerator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reportgenerator
  template:
    metadata:
      labels:
        app: reportgenerator
    spec:
      containers:
      - name: reportgenerator
        image: reportgenerator:latest
        ports:
        - containerPort: 3000
        resources:
          requests:
            cpu: 300m
            memory: 512Mi
          limits:
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /live
            port: 3000
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000