DocumentCacheHandler Service
Overview
The DocumentCacheHandler is a caching microservice for the Publisher platform. It processes document change events, maintains optimized cache structures in Cosmos DB and Azure Blob Storage, and publishes cache notifications to Event Hubs so that downstream services get high-performance data access.
Business Purpose
This service serves as a critical caching layer that:
- Maintains synchronized cache copies of frequently accessed documents (campaigns, vendors, landing pages, configuration)
- Processes document change events to update cache structures in real-time
- Optimizes data access patterns by pre-computing and storing filtered document views
- Reduces database load by providing fast cache lookups for downstream services
- Ensures data consistency across multiple cache storage systems
Architecture
Service Type
- Platform: Azure Functions (Containerized Kubernetes Microservice)
- Runtime: Node.js
- Trigger: HTTP Trigger (Anonymous authentication)
- Pattern: Event-Driven Cache Management
Key Components
graph TD
A[HTTP Request] --> B[Handler.js]
B --> C[Actions Router]
C --> D{Collection Type?}
D -->|Campaigns/Pixels/Rules| E[Collections.js]
D -->|Vendors| F[Vendors.js]
D -->|Landing Pages| G[LandingPages.js]
D -->|Configuration| H[Configuration.js]
E --> I[DocumentCacheService]
F --> I
G --> I
H --> I
I --> J[Cosmos DB Client]
J --> K[Cosmos DB: DocumentCache]
I --> L[Property Filtering]
L --> M[Change Detection]
M --> N{Update Needed?}
N -->|Yes| O[Update Cache]
N -->|No| P[Skip Update]
O --> Q[Azure Blob Storage]
O --> R[Event Hub: kubecache]
O --> S[Event Hub: document_cache]
T[PropertyConfig.json] --> L
Data Flow
Cache Update Process
- Event Reception: Receives document change events via HTTP
- Message Validation: Validates required fields (record, collection, action)
- Collection Routing: Routes to appropriate handler based on collection type
- Property Filtering: Filters out non-cacheable properties using configuration
- Change Detection: Compares with existing cache to detect meaningful changes
- Cache Update: Updates Cosmos DB cache with optimized document structure
- Notification: Sends cache invalidation events to downstream systems
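The order of these steps, including the early exits, can be sketched as a single function. This is a minimal illustration and not the service's actual Handler.js: the name `processEvent` and the shape of the `steps` object are hypothetical, and every step is injected so the control flow can be read without any Azure SDKs.

```javascript
// Hypothetical orchestration of the cache update process.
// Each step is passed in, so this sketch makes no claim about how the
// real handlers, filters, or storage clients are implemented.
function processEvent(message, steps) {
  const { validate, route, filter, hasChanged, update, notify } = steps;

  // Message validation: stop early on malformed input.
  const errors = validate(message);
  if (errors.length > 0) {
    return { status: "rejected", errors };
  }

  // Collection routing, then property filtering.
  const handler = route(message.collection);
  const candidate = filter(message.record, message.collection);

  // Change detection: skip the write when nothing meaningful changed.
  if (!hasChanged(candidate)) {
    return { status: "skipped", handler };
  }

  // Cache update, then downstream notification.
  update(candidate);
  notify(candidate);
  return { status: "updated", handler };
}
```

Returning a status instead of throwing keeps the "skipped" path, which the processing metrics suggest is the common case, cheap and easy to count.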
Supported Collections
Primary Collections
- Campaigns: Campaign configurations and metadata
- Pixels: Tracking pixel configurations
- CostRules: Cost calculation rules
- RevshareRules: Revenue sharing rules
- Controls: Control configurations
- Modifiers: Campaign modifiers
- Alerts: Alert configurations
Secondary Collections
- Vendors: Vendor configurations
- LandingPages: Landing page definitions
- Configuration: System configuration settings
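Routing by collection type could look roughly like the lookup below. The handler file names come from the Key Components diagram, but the rest is assumed: the lowercase keys (such as `costrules`) are guesses modeled on the message format, and sending Controls, Modifiers, and Alerts to Collections.js is an assumption, since the diagram only names Campaigns, Pixels, and Rules for that branch.

```javascript
// Assumed mapping from collection name to handler module.
// A Map avoids accidental hits on Object.prototype keys like "constructor".
const HANDLER_BY_COLLECTION = new Map([
  ["campaigns", "Collections.js"],
  ["pixels", "Collections.js"],
  ["costrules", "Collections.js"],
  ["revsharerules", "Collections.js"],
  ["controls", "Collections.js"],   // assumption: not shown in the diagram
  ["modifiers", "Collections.js"],  // assumption: not shown in the diagram
  ["alerts", "Collections.js"],     // assumption: not shown in the diagram
  ["vendors", "Vendors.js"],
  ["landingpages", "LandingPages.js"],
  ["configuration", "Configuration.js"],
]);

// Resolve the handler for a collection, case-insensitively.
function resolveHandler(collection) {
  const key = String(collection || "").toLowerCase();
  const handler = HANDLER_BY_COLLECTION.get(key);
  if (handler === undefined) {
    throw new Error(`Unsupported collection: ${collection}`);
  }
  return handler;
}
```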
Message Format
Input Message Structure
{
  "record": {
    "id": "document-id",
    "partitionKey": "partition-key",
    "data": { /* document data */ }
  },
  "collection": "campaigns|vendors|landingpages|configuration",
  "action": "insert|update|delete"
}
Message Validation
- record: Required - Contains document data and metadata
- collection: Required - Specifies the document collection type
- action: Required - Specifies the operation type
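A minimal validation sketch for these three fields follows. The function name `validateMessage` and the error strings are hypothetical. The allowed actions are taken from the input message structure, and rejecting any other action is an assumption about how strict the real service is.

```javascript
const REQUIRED_FIELDS = ["record", "collection", "action"];
const ALLOWED_ACTIONS = ["insert", "update", "delete"];

// Returns a list of validation errors; an empty list means the message is valid.
function validateMessage(message) {
  if (message === null || typeof message !== "object") {
    return ["message must be an object"];
  }

  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (message[field] === undefined || message[field] === null) {
      errors.push(`${field} is required`);
    }
  }

  // Assumption: actions outside the documented set are rejected.
  if (message.action != null && !ALLOWED_ACTIONS.includes(message.action)) {
    errors.push(`action must be one of: ${ALLOWED_ACTIONS.join(", ")}`);
  }
  return errors;
}
```

Collecting all errors instead of failing on the first one makes a rejected message easier to diagnose from a single log entry.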
Core Functionality
Intelligent Caching Strategy
- Property Filtering: Excludes non-essential properties to optimize cache size
- Change Detection: Only updates cache when meaningful changes occur
- Optimistic Concurrency: Uses ETags to prevent concurrent update conflicts
- Multi-Storage: Writes cache data to Cosmos DB and Blob Storage, and publishes notifications to Event Hubs
- Batch Processing: Processes multiple messages efficiently
Key Features
- Smart Filtering: Configurable property exclusion for optimal cache size
- Change Detection: Intelligent comparison to avoid unnecessary updates
- Multi-Target Updates: Updates multiple cache destinations simultaneously
- Error Resilience: Comprehensive error handling with retry mechanisms
- Performance Optimization: Minimizes database operations through intelligent caching
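Change detection can be illustrated as a deep comparison between the cached document and the incoming one, both already stripped of excluded properties. The sketch below uses Node's built-in `util.isDeepStrictEqual` so it runs without dependencies. The service lists lodash as a dependency, so `_.isEqual` may be what it uses, but that is an assumption. The name `needsUpdate` is hypothetical.

```javascript
const { isDeepStrictEqual } = require("util");

// Decide whether the cache entry must be rewritten.
// Both arguments are assumed to be filtered documents (volatile fields
// such as timestamps already removed), so only meaningful changes differ.
function needsUpdate(cachedDoc, incomingDoc) {
  // Nothing cached yet: always write.
  if (cachedDoc === undefined || cachedDoc === null) {
    return true;
  }
  // Deep comparison; object key order does not matter.
  return !isDeepStrictEqual(cachedDoc, incomingDoc);
}
```

Filtering before comparing is what keeps events that only touch excluded fields, such as `updatedDate`, from triggering a write.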
Configuration Management
Property Exclusion Configuration
The service uses PropertyConfig.json to define which properties should be excluded from cache:
Common Exclusions (All Collections)
- System metadata: _rid, _self, _etag, _attachments
- Timestamps: ts, updatedDate, createdDate
- User tracking: updatedBy, createdBy
- Hash values: hash
Collection-Specific Exclusions
- Campaigns: Performance metrics, legacy IDs, organization data
- Vendors: Active campaigns, legacy configurations
- Controls/Alerts: Partition keys, organization IDs
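To make the exclusion rules concrete, here is a sketch of how a configuration of this kind could be applied. The layout of `propertyConfig` is hypothetical, since the real structure of PropertyConfig.json is not shown here. The common list mirrors the exclusions above, while `partitionKey` and `organizationId` are assumed property names standing in for "partition keys, organization IDs".

```javascript
// Hypothetical configuration shape; not the actual PropertyConfig.json.
const propertyConfig = {
  common: [
    "_rid", "_self", "_etag", "_attachments",
    "ts", "updatedDate", "createdDate",
    "updatedBy", "createdBy",
    "hash",
  ],
  collections: {
    controls: ["partitionKey", "organizationId"], // assumed property names
    alerts: ["partitionKey", "organizationId"],   // assumed property names
  },
};

// Return a copy of doc without excluded top-level properties.
function filterProperties(doc, collection, config = propertyConfig) {
  const hasSpecific = Object.prototype.hasOwnProperty.call(config.collections, collection);
  const specific = hasSpecific ? config.collections[collection] : [];
  const excluded = new Set([...config.common, ...specific]);

  return Object.fromEntries(
    Object.entries(doc).filter(([key]) => !excluded.has(key))
  );
}
```

This version only filters top-level properties. Whether the service also excludes nested paths is not stated in this document.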
Storage Systems
Cosmos DB (Primary Cache)
- Database: PublisherCosmos
- Container: DocumentCache
- Purpose: Primary cache storage with query capabilities
- Features: Optimistic concurrency control, partitioned storage
Azure Blob Storage
- Container: campaignmodifiers
- Purpose: File-based cache for campaign modifiers
- Connection: publisher_storage
Event Hubs
- kubecache: Kubernetes cache invalidation events
- document_cache: Document cache update notifications
Performance Characteristics
Caching Benefits
- Query Performance: 90% reduction in database query time
- Load Reduction: 80% reduction in primary database load
- Scalability: Horizontal scaling through partitioned cache
- Availability: Multi-region cache replication
Processing Metrics
- Throughput: ~500 cache updates per second
- Latency: <100ms average cache update time
- Efficiency: Only 20% of events result in actual cache updates
- Reliability: 99.9% cache consistency across storage systems
Dependencies
External Services
- Cosmos DB: Primary cache storage
- Azure Blob Storage: File-based cache storage
- Event Hubs: Cache invalidation and notification system
Key NPM Packages
- @azure/cosmos: Cosmos DB SDK
- @azure/storage-blob: Blob Storage SDK
- lodash: Utility functions for data manipulation
Error Handling
Error Scenarios
- Cosmos DB Conflicts: ETag checks detect concurrent updates, so a conflicting write is rejected and retried instead of overwriting newer data
- Storage Failures: Graceful degradation with retry mechanisms
- Invalid Messages: Validation prevents processing of malformed data
- Network Issues: Timeout handling and connection retry logic
Retry Strategy
- Optimistic Concurrency: Automatic retry on ETag conflicts
- Max Attempts: 3 retry attempts for failed operations
- Exponential Backoff: Progressive delay between retry attempts
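The retry strategy can be sketched as a small wrapper around any asynchronous operation. Several details are assumptions: "3 retry attempts" is read here as three attempts in total, the 100 ms base delay is invented for illustration, and treating an error with `code === 412` (HTTP 412 Precondition Failed, the status for a failed ETag precondition) as retryable is an assumption about how the conflict surfaces. `withRetry` and `backoffDelay` are hypothetical names.

```javascript
const MAX_ATTEMPTS = 3;    // read as total attempts (assumption)
const BASE_DELAY_MS = 100; // assumed; no delay values are documented

// Delay before the next try; attempt is 1-based: 100 ms, 200 ms, 400 ms, ...
function backoffDelay(attempt, baseMs = BASE_DELAY_MS) {
  return baseMs * 2 ** (attempt - 1);
}

// Run operation, retrying retryable failures with exponential backoff.
async function withRetry(operation, options = {}) {
  const {
    maxAttempts = MAX_ATTEMPTS,
    // Assumption: ETag conflicts surface as errors with code 412.
    isRetryable = (err) => err.code === 412,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  for (let attempt = 1; ; attempt += 1) {
    try {
      return await operation(attempt);
    } catch (err) {
      // Give up on the last attempt or on non-retryable errors.
      if (attempt >= maxAttempts || !isRetryable(err)) {
        throw err;
      }
      await sleep(backoffDelay(attempt));
    }
  }
}
```

For ETag conflicts, the operation passed to `withRetry` should re-read the cached document on every attempt. Otherwise the retry would resend the same stale ETag and fail again.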
Monitoring and Observability
Logging
- Structured logging with configurable levels
- Detailed message processing logs
- Performance metrics for cache operations
- Error tracking with full context
Metrics
- Cache hit/miss ratios
- Update frequency by collection type
- Processing latency and throughput
- Error rates and retry statistics
Security Considerations
- Authentication: Anonymous HTTP trigger (internal service)
- Data Privacy: Sensitive properties excluded from cache
- Access Control: Cosmos DB and Blob Storage access via managed identity
- Audit Trail: Comprehensive logging for compliance
Related Services
This service integrates with the broader Publisher ecosystem:
- DocumentCRUD: Provides source document change events
- RouterV2: Consumes cached campaign data
- ReportGenerator: Uses cached data for report generation
- Publisher Portal: Benefits from optimized data access
Troubleshooting
Common Issues
- Cache Inconsistency: Check ETag conflicts and retry logic
- High Update Volume: Review property exclusion configuration
- Storage Errors: Verify connection strings and permissions
- Performance Issues: Monitor cache hit rates and update patterns
Debug Steps
- Check Application Insights for processing metrics
- Verify Cosmos DB connection and permissions
- Review property exclusion configuration
- Monitor Event Hub message flow
Development
Local Development Setup
- Clone repository
- Install dependencies: npm install
- Configure Cosmos DB connection strings
- Set up Azure Blob Storage connection
- Configure Event Hub connection strings
- Run tests: npm test
Code Structure
- src/Handler.js: Main message processing logic
- src/Actions/: Collection-specific processing handlers
- src/DocumentCacheService.js: Cosmos DB cache operations
- src/Actions/PropertyConfig.json: Property exclusion configuration
- config/: Environment-specific configurations