SnapshotFile Service
Overview
The SnapshotFile is a specialized Azure Function microservice that generates snapshot reports from Snowflake data warehouse and exports them to Azure Blob Storage. This service acts as a data extraction and export utility that creates periodic snapshots of publisher revenue and performance data for analysis, backup, and external consumption. It integrates with Snowflake for data querying and Azure Blob Storage for file storage.
Business Purpose
This service serves as a data export and snapshot utility that: - Generates periodic snapshots of publisher revenue and performance data - Exports data from Snowflake data warehouse to Azure Blob Storage - Creates vendor-specific data extracts for analysis and reporting - Supports data archival and backup processes - Provides data exports for external systems and partners - Maintains historical data snapshots for compliance and auditing - Enables offline data analysis and processing
Architecture
Service Type
- Platform: Azure Functions (Containerized Kubernetes Microservice)
- Runtime: Node.js
- Trigger: HTTP Trigger (Anonymous authentication)
- Pattern: Scheduled Data Export with Snowflake Integration
Key Components
graph TD
A[HTTP Trigger] --> B[SnapshotFile Handler]
B --> C[Date Range Calculation]
C --> D[Get Vendors]
D --> E[Snowflake Query]
E --> F[Vendor Processing Loop]
F --> G[Generate Snapshot Report]
G --> H[Build SQL Query]
H --> I[Create COPY INTO Statement]
I --> J[Event Hub: report]
K[Snowflake Data Warehouse] --> E
L[Azure Blob Storage] --> I
M[SAS Token] --> I
N[Vendor List] --> F
O[SQL Templates] --> H
P[Configuration] --> C
Q[ReportGenerator] --> J
R[Azure Blob Storage] --> S[Exported Files]
Core Functionality
Data Export Process
- Date Range Calculation: Calculates snapshot date range based on configuration
- Vendor Retrieval: Fetches list of active vendors from Snowflake
- Report Generation: Creates vendor-specific snapshot reports
- SQL Query Building: Constructs Snowflake COPY INTO statements
- Event Publishing: Publishes report generation events to Event Hub
Snapshot Configuration
- Default Range: Configurable number of days back from current date
- Date Format: MM/DD/YYYY format for Snowflake compatibility
- Vendor Filtering: Supports individual vendor or all vendors
- File Naming: Structured file naming with date and vendor information
Key Features
- Automated Snapshots: Scheduled data export from Snowflake to Azure Blob Storage
- Vendor-Specific Exports: Individual data exports per vendor
- Flexible Date Ranges: Configurable snapshot periods
- Snowflake Integration: Direct integration with Snowflake data warehouse
- Azure Blob Storage: Secure file storage with SAS token authentication
- Event-Driven: Publishes events for downstream report processing
- SQL Template System: Reusable SQL templates for different report types
Data Flow
Snapshot Generation Process
- Trigger Activation: Service triggered via HTTP request or schedule
- Date Calculation: Calculates start and end dates for snapshot period
- Vendor Query: Retrieves active vendor list from Snowflake
- Vendor Iteration: Processes each vendor individually
- SQL Generation: Creates vendor-specific COPY INTO SQL statements
- Event Publishing: Publishes report generation events to Event Hub
- Report Processing: ReportGenerator service processes the events
SQL Query Structure
COPY INTO 'azure://storage.blob.core.windows.net/container/path/file.csv'
FROM (
SELECT vendor_data, revenue_metrics, performance_data
FROM publisher_data
WHERE date_range AND vendor_filter
)
FILE_FORMAT = (TYPE = CSV COMPRESSION=NONE FIELD_OPTIONALLY_ENCLOSED_BY = '"' NULL_IF=())
SINGLE = TRUE
OVERWRITE = TRUE
CREDENTIALS = (azure_sas_token='token');
Configuration
Date Range Configuration
- snapshotForDays: Number of days back from current date (configurable)
- Start Date: Calculated as current date minus snapshotForDays
- End Date: Current date plus 1 day for inclusive range
File Path Structure
azure://{resourceGroup}.blob.core.windows.net/
{container}/{prefix}/{publisherkey}/{YYYY}/{MM}/{filename}
Snowflake Configuration
- Connection: Snowflake data warehouse connection
- SAS Token: Azure Blob Storage authentication token
- Database: Publisher data warehouse database
- Schema: Revenue and performance data schema
Event Structure
Output Event Format
{
"query": "COPY INTO 'azure://...' FROM (...) FILE_FORMAT = (...)",
"publisher": {
"publisherkey": "vendor-identifier",
"name": "Snapshot Report",
"blobFileName1": "azure://storage.blob.core.windows.net/",
"blobFileName2": "container/prefix/vendor/YYYY/MM/filename.csv",
"deliver": false
},
"request_timestamp": "2023-01-01T00:00:00.000Z"
}
Event Hub Integration
- Destination:
reportEvent Hub - Consumer: ReportGenerator service
- Processing: Executes Snowflake queries and generates files
SQL Templates
Vendor-Specific Query (snapshot.sql)
- Filters data for specific vendor
- Includes vendor clause in WHERE condition
- Optimized for individual vendor processing
All Vendors Query (allSql)
- Processes all vendors in single query
- Used for comprehensive snapshots
- Bulk processing for efficiency
Query Parameters
{startdate}: Snapshot start date{enddate}: Snapshot end date{vendorClause}: Vendor filtering condition
Performance Characteristics
Processing Metrics
- Vendor Processing: Sequential processing per vendor
- Query Execution: Depends on Snowflake performance and data volume
- File Generation: Parallel file creation in Azure Blob Storage
- Event Publishing: Real-time event publishing to Event Hub
Scalability Considerations
- Vendor Volume: Scales linearly with number of vendors
- Data Volume: Performance depends on Snowflake query complexity
- Storage: Azure Blob Storage handles large file volumes
- Concurrent Processing: Sequential vendor processing prevents resource conflicts
Dependencies
External Services
- Snowflake: Data warehouse for source data
- Azure Blob Storage: File storage destination
- Event Hub: Report generation event publishing
- ReportGenerator: Downstream report processing service
Key NPM Packages
snowflake-sdk: Snowflake database connectivitymoment: Date/time manipulation and formattingasync: Asynchronous flow control for vendor processing
Error Handling
Error Scenarios
- Snowflake Connectivity: Database connection or query failures
- Vendor Retrieval: Unable to fetch vendor list
- SQL Generation: Template processing or parameter substitution errors
- Event Publishing: Event Hub connectivity or publishing failures
- Configuration Issues: Missing or invalid configuration parameters
Recovery Mechanisms
- Error Logging: Comprehensive error logging for debugging
- Graceful Degradation: Continue processing other vendors if one fails
- Retry Logic: Built-in retry for transient failures
- Validation: Input validation and parameter checking
Monitoring and Observability
Logging
- Vendor processing progress and status
- SQL query generation and parameters
- Event publishing success/failure
- Error conditions and stack traces
Metrics
- Number of vendors processed
- Snapshot generation success rates
- Query execution times
- File generation volumes
Application Insights Integration
- Custom telemetry for snapshot operations
- Performance monitoring for Snowflake queries
- Error tracking and alerting
Security Considerations
- SAS Token Authentication: Secure Azure Blob Storage access
- Snowflake Credentials: Secure database connection credentials
- Data Privacy: Secure handling of vendor and revenue data
- Access Control: Restricted access to sensitive data exports
File Management
File Naming Convention
- Pattern:
{vendor}_{YYYY}_{MM}_{DD}_snapshot.csv - Organization: Hierarchical folder structure by vendor and date
- Versioning: Overwrite existing files for same date range
Storage Organization
container/
├── prefix/
│ ├── vendor1/
│ │ ├── 2023/
│ │ │ ├── 01/
│ │ │ │ └── vendor1_2023_01_15_snapshot.csv
│ │ │ └── 02/
│ │ └── 2024/
│ └── vendor2/
Related Services
This service integrates with the broader Publisher ecosystem: - ReportGenerator: Processes snapshot generation events - Snowflake Data Warehouse: Source of snapshot data - Azure Blob Storage: Storage destination for snapshot files - Vendor Management: Provides vendor list for processing
Troubleshooting
Common Issues
- Snowflake Connection: Verify database connectivity and credentials
- SAS Token Expiration: Check Azure Blob Storage SAS token validity
- Vendor Query Failures: Verify vendor table structure and permissions
- File Path Issues: Check blob storage container and path configuration
- Date Range Problems: Verify date calculation and format
Debug Steps
- Check Application Insights for processing logs
- Verify Snowflake connection and query execution
- Validate Azure Blob Storage SAS token and permissions
- Review vendor list query and results
- Test SQL template parameter substitution
Development
Local Development Setup
- Clone repository
- Install dependencies:
npm install - Configure Snowflake connection
- Set up Azure Blob Storage SAS token
- Configure Event Hub connection
- Run tests:
npm test
Testing
# Test snapshot generation
node test.js
# Test with specific vendor
# Configure vendor filter in test data
Code Structure
src/Handler.js: Main snapshot processing logicsrc/actions/GetVendors.js: Vendor retrieval from Snowflakesrc/actions/SnapshotReport.js: SQL generation and report configurationsrc/actions/sql/: SQL template filesconfig/: Environment-specific configurations
Operational Considerations
Scheduling
- Typically scheduled to run daily or weekly
- Consider Snowflake data refresh schedules
- Coordinate with downstream report processing
Capacity Planning
- Monitor Snowflake query performance and costs
- Plan Azure Blob Storage capacity for file volumes
- Consider vendor growth and data volume increases
Future Enhancements
Potential Improvements
- Parallel Processing: Concurrent vendor processing for improved performance
- Incremental Snapshots: Delta processing for large datasets
- Compression: File compression for storage optimization
- Retention Policies: Automated cleanup of old snapshot files
- Custom Date Ranges: Support for custom snapshot periods
- Data Validation: Post-export data validation and quality checks