
DocumentCacheHandler Service

Overview

The DocumentCacheHandler is a caching microservice that manages document cache operations for the Publisher platform. It processes document change events, maintains optimized cache structures in Cosmos DB and Azure Blob Storage, and publishes cache notifications to Event Hubs so that downstream services get fast, consistent data access.

Business Purpose

This service serves as a critical caching layer that:

  • Maintains synchronized cache copies of frequently accessed documents (campaigns, vendors, landing pages, configuration)
  • Processes document change events to update cache structures in real-time
  • Optimizes data access patterns by pre-computing and storing filtered document views
  • Reduces database load by providing fast cache lookups for downstream services
  • Ensures data consistency across multiple cache storage systems

Architecture

Service Type

  • Platform: Azure Functions (Containerized Kubernetes Microservice)
  • Runtime: Node.js
  • Trigger: HTTP Trigger (Anonymous authentication)
  • Pattern: Event-Driven Cache Management

Key Components

graph TD
    A[HTTP Request] --> B[Handler.js]
    B --> C[Actions Router]
    C --> D{Collection Type?}

    D -->|Campaigns/Pixels/Rules| E[Collections.js]
    D -->|Vendors| F[Vendors.js]
    D -->|Landing Pages| G[LandingPages.js]
    D -->|Configuration| H[Configuration.js]

    E --> I[DocumentCacheService]
    F --> I
    G --> I
    H --> I

    I --> J[Cosmos DB Client]
    J --> K[Cosmos DB: DocumentCache]

    I --> L[Property Filtering]
    L --> M[Change Detection]
    M --> N{Update Needed?}

    N -->|Yes| O[Update Cache]
    N -->|No| P[Skip Update]

    O --> Q[Azure Blob Storage]
    O --> R[Event Hub: kubecache]
    O --> S[Event Hub: document_cache]

    T[PropertyConfig.json] --> L

Data Flow

Cache Update Process

  1. Event Reception: Receives document change events via HTTP
  2. Message Validation: Validates required fields (record, collection, action)
  3. Collection Routing: Routes to appropriate handler based on collection type
  4. Property Filtering: Filters out non-cacheable properties using configuration
  5. Change Detection: Compares with existing cache to detect meaningful changes
  6. Cache Update: Updates Cosmos DB cache with optimized document structure
  7. Notification: Sends cache invalidation events to downstream systems

Supported Collections

Primary Collections

  • Campaigns: Campaign configurations and metadata
  • Pixels: Tracking pixel configurations
  • CostRules: Cost calculation rules
  • RevshareRules: Revenue sharing rules
  • Controls: Control configurations
  • Modifiers: Campaign modifiers
  • Alerts: Alert configurations

Secondary Collections

  • Vendors: Vendor configurations
  • LandingPages: Landing page definitions
  • Configuration: System configuration settings
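
The routing implied by the diagram and the collection lists above can be sketched as a lookup table. This is a minimal illustration, not the service's actual router: the lower-case collection keys are an assumption based on the message example, and `resolveHandler` and the returned handler names are hypothetical.

```javascript
// Assumption: collection keys arrive lower-cased, as in the message example.
const PRIMARY_COLLECTIONS = [
    "campaigns", "pixels", "costrules", "revsharerules",
    "controls", "modifiers", "alerts",
];

// Map each collection key to the handler module named in the diagram.
const HANDLER_BY_COLLECTION = new Map([
    ...PRIMARY_COLLECTIONS.map((name) => [name, "Collections"]),
    ["vendors", "Vendors"],
    ["landingpages", "LandingPages"],
    ["configuration", "Configuration"],
]);

// Hypothetical helper: returns the handler name for a collection,
// or throws when the collection is not supported.
function resolveHandler(collection) {
    const handler = HANDLER_BY_COLLECTION.get(String(collection).toLowerCase());
    if (!handler) {
        throw new Error(`Unsupported collection: ${collection}`);
    }
    return handler;
}
```

A table keeps the set of supported collections in one place, so adding a collection means adding an entry rather than another branch.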

Message Format

Input Message Structure

{
    "record": {
        "id": "document-id",
        "partitionKey": "partition-key",
        "data": { /* document data */ }
    },
    "collection": "campaigns|vendors|landingpages|configuration",
    "action": "insert|update|delete"
}

Message Validation

  • record: Required - Contains document data and metadata
  • collection: Required - Specifies the document collection type
  • action: Required - Specifies the operation type
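
The required-field check above might look roughly like the following. It is a sketch only: `validateMessage`, its return shape, and the error strings are hypothetical, and the real service may also validate the allowed values of `collection` and `action`.

```javascript
// The three fields the message validation section marks as required.
const REQUIRED_FIELDS = ["record", "collection", "action"];

// Hypothetical validator: reports every missing field instead of
// stopping at the first one, which makes rejected messages easier to debug.
function validateMessage(message) {
    if (message === null || typeof message !== "object") {
        return { valid: false, errors: ["message must be an object"] };
    }
    const errors = REQUIRED_FIELDS
        .filter((field) => message[field] === undefined || message[field] === null)
        .map((field) => `missing required field: ${field}`);
    return { valid: errors.length === 0, errors };
}
```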

Core Functionality

Intelligent Caching Strategy

  1. Property Filtering: Excludes non-essential properties to optimize cache size
  2. Change Detection: Only updates cache when meaningful changes occur
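  3. Optimistic Concurrency: Uses ETags to detect concurrent updates so that conflicting writes are rejected and retried instead of silently overwriting each other
  4. Multi-Storage: Maintains cache in Cosmos DB and Blob Storage and publishes update notifications to Event Hubs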
  5. Batch Processing: Processes multiple messages efficiently
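
As an illustration of the change-detection step, the sketch below compares the already-filtered incoming document with the cached copy and reports whether a write is needed. The names `isDeepEqual` and `needsCacheUpdate` are hypothetical. The service lists lodash as a dependency, so it may well use a lodash comparison instead; a hand-rolled one is used here only to keep the example dependency-free.

```javascript
// Recursively compares two JSON-like values, ignoring object key order.
function isDeepEqual(a, b) {
    if (a === b) return true;
    if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
        return false;
    }
    if (Array.isArray(a) !== Array.isArray(b)) return false;
    const keysA = Object.keys(a);
    const keysB = Object.keys(b);
    if (keysA.length !== keysB.length) return false;
    return keysA.every(
        (key) => Object.prototype.hasOwnProperty.call(b, key) && isDeepEqual(a[key], b[key])
    );
}

// Hypothetical helper: a missing cache entry always needs a write;
// otherwise write only when the filtered content actually differs.
function needsCacheUpdate(cachedDoc, incomingDoc) {
    if (cachedDoc === undefined || cachedDoc === null) return true;
    return !isDeepEqual(cachedDoc, incomingDoc);
}
```

Because excluded properties such as timestamps are stripped before the comparison, an event that only touches those properties produces no cache write.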

Key Features

  • Smart Filtering: Configurable property exclusion for optimal cache size
  • Change Detection: Intelligent comparison to avoid unnecessary updates
  • Multi-Target Updates: Updates multiple cache destinations simultaneously
  • Error Resilience: Comprehensive error handling with retry mechanisms
  • Performance Optimization: Minimizes database operations through intelligent caching

Configuration Management

Property Exclusion Configuration

The service uses PropertyConfig.json to define which properties should be excluded from cache:

Common Exclusions (All Collections)

  • System metadata: _rid, _self, _etag, _attachments
  • Timestamps: ts, updatedDate, createdDate
  • User tracking: updatedBy, createdBy
  • Hash values: hash

Collection-Specific Exclusions

  • Campaigns: Performance metrics, legacy IDs, organization data
  • Vendors: Active campaigns, legacy configurations
  • Controls/Alerts: Partition keys, organization IDs

Storage Systems

Cosmos DB (Primary Cache)

  • Database: PublisherCosmos
  • Container: DocumentCache
  • Purpose: Primary cache storage with query capabilities
  • Features: Optimistic concurrency control, partitioned storage
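
The optimistic concurrency control mentioned above is typically expressed with the `accessCondition` request option of `@azure/cosmos`. The sketch below shows the general pattern under stated assumptions: `replaceIfUnchanged` and `applyChange` are hypothetical names, and `container` is expected to behave like an `@azure/cosmos` `Container`. How the service itself structures this call is not shown in this document.

```javascript
// Read the current document, apply a change, and write it back only if
// nobody else has modified it in the meantime.
async function replaceIfUnchanged(container, id, partitionKey, applyChange) {
    const item = container.item(id, partitionKey);
    const { resource: current } = await item.read();
    if (!current) {
        throw new Error(`Document not found: ${id}`);
    }
    const updated = applyChange(current);
    // With IfMatch, Cosmos DB rejects the write (HTTP 412) when the stored
    // ETag no longer matches the one that was read above.
    const { resource } = await item.replace(updated, {
        accessCondition: { type: "IfMatch", condition: current._etag },
    });
    return resource;
}
```

A caller would normally catch the rejected write, re-read the document, and try again.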

Azure Blob Storage

  • Container: campaignmodifiers
  • Purpose: File-based cache for campaign modifiers
  • Connection: publisher_storage

Event Hubs

  • kubecache: Kubernetes cache invalidation events
  • document_cache: Document cache update notifications

Performance Characteristics

Caching Benefits

  • Query Performance: 90% reduction in database query time
  • Load Reduction: 80% reduction in primary database load
  • Scalability: Horizontal scaling through partitioned cache
  • Availability: Multi-region cache replication

Processing Metrics

  • Throughput: ~500 cache updates per second
  • Latency: <100ms average cache update time
  • Efficiency: Only 20% of events result in actual cache updates
  • Reliability: 99.9% cache consistency across storage systems

Dependencies

External Services

  • Cosmos DB: Primary cache storage
  • Azure Blob Storage: File-based cache storage
  • Event Hubs: Cache invalidation and notification system

Key NPM Packages

  • @azure/cosmos: Cosmos DB SDK
  • @azure/storage-blob: Blob Storage SDK
  • lodash: Utility functions for data manipulation

Error Handling

Error Scenarios

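  1. Cosmos DB Conflicts: An ETag mismatch signals that another writer changed the document first; the write is rejected and retried rather than overwriting the newer version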
  2. Storage Failures: Graceful degradation with retry mechanisms
  3. Invalid Messages: Validation prevents processing of malformed data
  4. Network Issues: Timeout handling and connection retry logic

Retry Strategy

  • Optimistic Concurrency: Automatic retry on ETag conflicts
  • Max Attempts: 3 retry attempts for failed operations
  • Exponential Backoff: Progressive delay between retry attempts
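
The retry strategy above can be sketched as a small wrapper. Several details are assumptions rather than facts about the service: the 100 ms base delay, the doubling factor, and reading "Max Attempts: 3" as three attempts in total. `withRetry` and `isRetryable` are hypothetical names.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation` up to `maxAttempts` times, waiting longer after each
// failure. Rethrows the last error when attempts are exhausted or when
// `isRetryable` says the error should not be retried.
async function withRetry(
    operation,
    { maxAttempts = 3, baseDelayMs = 100, isRetryable = () => true } = {}
) {
    let lastError;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return await operation(attempt);
        } catch (error) {
            lastError = error;
            if (attempt === maxAttempts || !isRetryable(error)) break;
            // Delay doubles each time: baseDelayMs, then 2x, then 4x, ...
            await sleep(baseDelayMs * 2 ** (attempt - 1));
        }
    }
    throw lastError;
}
```

For ETag conflicts, `isRetryable` could be limited to precondition failures (HTTP 412), and the operation should re-read the document on each attempt so that it retries against the latest version.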

Monitoring and Observability

Logging

  • Structured logging with configurable levels
  • Detailed message processing logs
  • Performance metrics for cache operations
  • Error tracking with full context

Metrics

  • Cache hit/miss ratios
  • Update frequency by collection type
  • Processing latency and throughput
  • Error rates and retry statistics

Security Considerations

  • Authentication: Anonymous HTTP trigger (internal service)
  • Data Privacy: Sensitive properties excluded from cache
  • Access Control: Cosmos DB and Blob Storage access via managed identity
  • Audit Trail: Comprehensive logging for compliance

This service integrates with the broader Publisher ecosystem:

  • DocumentCRUD: Provides source document change events
  • RouterV2: Consumes cached campaign data
  • ReportGenerator: Uses cached data for report generation
  • Publisher Portal: Benefits from optimized data access

Troubleshooting

Common Issues

  1. Cache Inconsistency: Check ETag conflicts and retry logic
  2. High Update Volume: Review property exclusion configuration
  3. Storage Errors: Verify connection strings and permissions
  4. Performance Issues: Monitor cache hit rates and update patterns

Debug Steps

  1. Check Application Insights for processing metrics
  2. Verify Cosmos DB connection and permissions
  3. Review property exclusion configuration
  4. Monitor Event Hub message flow

Development

Local Development Setup

  1. Clone repository
  2. Install dependencies: npm install
  3. Configure Cosmos DB connection strings
  4. Set up Azure Blob Storage connection
  5. Configure Event Hub connection strings
  6. Run tests: npm test

Code Structure

  • src/Handler.js: Main message processing logic
  • src/Actions/: Collection-specific processing handlers
  • src/DocumentCacheService.js: Cosmos DB cache operations
  • src/Actions/PropertyConfig.json: Property exclusion configuration
  • config/: Environment-specific configurations