Skip to content

SnapshotFile Service

Overview

The SnapshotFile is a specialized Azure Function microservice that generates snapshot reports from Snowflake data warehouse and exports them to Azure Blob Storage. This service acts as a data extraction and export utility that creates periodic snapshots of publisher revenue and performance data for analysis, backup, and external consumption. It integrates with Snowflake for data querying and Azure Blob Storage for file storage.

Business Purpose

This service serves as a data export and snapshot utility that: - Generates periodic snapshots of publisher revenue and performance data - Exports data from Snowflake data warehouse to Azure Blob Storage - Creates vendor-specific data extracts for analysis and reporting - Supports data archival and backup processes - Provides data exports for external systems and partners - Maintains historical data snapshots for compliance and auditing - Enables offline data analysis and processing

Architecture

Service Type

  • Platform: Azure Functions (Containerized Kubernetes Microservice)
  • Runtime: Node.js
  • Trigger: HTTP Trigger (Anonymous authentication)
  • Pattern: Scheduled Data Export with Snowflake Integration

Key Components

graph TD
    A[HTTP Trigger] --> B[SnapshotFile Handler]
    B --> C[Date Range Calculation]
    C --> D[Get Vendors]
    D --> E[Snowflake Query]
    E --> F[Vendor Processing Loop]

    F --> G[Generate Snapshot Report]
    G --> H[Build SQL Query]
    H --> I[Create COPY INTO Statement]
    I --> J[Event Hub: report]

    K[Snowflake Data Warehouse] --> E
    L[Azure Blob Storage] --> I
    M[SAS Token] --> I

    N[Vendor List] --> F
    O[SQL Templates] --> H
    P[Configuration] --> C

    Q[ReportGenerator] --> J
    R[Azure Blob Storage] --> S[Exported Files]

Core Functionality

Data Export Process

  1. Date Range Calculation: Calculates snapshot date range based on configuration
  2. Vendor Retrieval: Fetches list of active vendors from Snowflake
  3. Report Generation: Creates vendor-specific snapshot reports
  4. SQL Query Building: Constructs Snowflake COPY INTO statements
  5. Event Publishing: Publishes report generation events to Event Hub

Snapshot Configuration

  • Default Range: Configurable number of days back from current date
  • Date Format: MM/DD/YYYY format for Snowflake compatibility
  • Vendor Filtering: Supports individual vendor or all vendors
  • File Naming: Structured file naming with date and vendor information

Key Features

  • Automated Snapshots: Scheduled data export from Snowflake to Azure Blob Storage
  • Vendor-Specific Exports: Individual data exports per vendor
  • Flexible Date Ranges: Configurable snapshot periods
  • Snowflake Integration: Direct integration with Snowflake data warehouse
  • Azure Blob Storage: Secure file storage with SAS token authentication
  • Event-Driven: Publishes events for downstream report processing
  • SQL Template System: Reusable SQL templates for different report types

Data Flow

Snapshot Generation Process

  1. Trigger Activation: Service triggered via HTTP request or schedule
  2. Date Calculation: Calculates start and end dates for snapshot period
  3. Vendor Query: Retrieves active vendor list from Snowflake
  4. Vendor Iteration: Processes each vendor individually
  5. SQL Generation: Creates vendor-specific COPY INTO SQL statements
  6. Event Publishing: Publishes report generation events to Event Hub
  7. Report Processing: ReportGenerator service processes the events

SQL Query Structure

COPY INTO 'azure://storage.blob.core.windows.net/container/path/file.csv'
FROM (
    SELECT vendor_data, revenue_metrics, performance_data
    FROM publisher_data 
    WHERE date_range AND vendor_filter
)
FILE_FORMAT = (TYPE = CSV COMPRESSION=NONE FIELD_OPTIONALLY_ENCLOSED_BY = '"' NULL_IF=())
SINGLE = TRUE
OVERWRITE = TRUE
CREDENTIALS = (azure_sas_token='token');

Configuration

Date Range Configuration

  • snapshotForDays: Number of days back from current date (configurable)
  • Start Date: Calculated as current date minus snapshotForDays
  • End Date: Current date plus 1 day for inclusive range

File Path Structure

azure://{resourceGroup}.blob.core.windows.net/
{container}/{prefix}/{publisherkey}/{YYYY}/{MM}/{filename}

Snowflake Configuration

  • Connection: Snowflake data warehouse connection
  • SAS Token: Azure Blob Storage authentication token
  • Database: Publisher data warehouse database
  • Schema: Revenue and performance data schema

Event Structure

Output Event Format

{
    "query": "COPY INTO 'azure://...' FROM (...) FILE_FORMAT = (...)",
    "publisher": {
        "publisherkey": "vendor-identifier",
        "name": "Snapshot Report",
        "blobFileName1": "azure://storage.blob.core.windows.net/",
        "blobFileName2": "container/prefix/vendor/YYYY/MM/filename.csv",
        "deliver": false
    },
    "request_timestamp": "2023-01-01T00:00:00.000Z"
}

Event Hub Integration

  • Destination: report Event Hub
  • Consumer: ReportGenerator service
  • Processing: Executes Snowflake queries and generates files

SQL Templates

Vendor-Specific Query (snapshot.sql)

  • Filters data for specific vendor
  • Includes vendor clause in WHERE condition
  • Optimized for individual vendor processing

All Vendors Query (allSql)

  • Processes all vendors in single query
  • Used for comprehensive snapshots
  • Bulk processing for efficiency

Query Parameters

  • {startdate}: Snapshot start date
  • {enddate}: Snapshot end date
  • {vendorClause}: Vendor filtering condition

Performance Characteristics

Processing Metrics

  • Vendor Processing: Sequential processing per vendor
  • Query Execution: Depends on Snowflake performance and data volume
  • File Generation: Parallel file creation in Azure Blob Storage
  • Event Publishing: Real-time event publishing to Event Hub

Scalability Considerations

  • Vendor Volume: Scales linearly with number of vendors
  • Data Volume: Performance depends on Snowflake query complexity
  • Storage: Azure Blob Storage handles large file volumes
  • Concurrent Processing: Sequential vendor processing prevents resource conflicts

Dependencies

External Services

  • Snowflake: Data warehouse for source data
  • Azure Blob Storage: File storage destination
  • Event Hub: Report generation event publishing
  • ReportGenerator: Downstream report processing service

Key NPM Packages

  • snowflake-sdk: Snowflake database connectivity
  • moment: Date/time manipulation and formatting
  • async: Asynchronous flow control for vendor processing

Error Handling

Error Scenarios

  1. Snowflake Connectivity: Database connection or query failures
  2. Vendor Retrieval: Unable to fetch vendor list
  3. SQL Generation: Template processing or parameter substitution errors
  4. Event Publishing: Event Hub connectivity or publishing failures
  5. Configuration Issues: Missing or invalid configuration parameters

Recovery Mechanisms

  • Error Logging: Comprehensive error logging for debugging
  • Graceful Degradation: Continue processing other vendors if one fails
  • Retry Logic: Built-in retry for transient failures
  • Validation: Input validation and parameter checking

Monitoring and Observability

Logging

  • Vendor processing progress and status
  • SQL query generation and parameters
  • Event publishing success/failure
  • Error conditions and stack traces

Metrics

  • Number of vendors processed
  • Snapshot generation success rates
  • Query execution times
  • File generation volumes

Application Insights Integration

  • Custom telemetry for snapshot operations
  • Performance monitoring for Snowflake queries
  • Error tracking and alerting

Security Considerations

  • SAS Token Authentication: Secure Azure Blob Storage access
  • Snowflake Credentials: Secure database connection credentials
  • Data Privacy: Secure handling of vendor and revenue data
  • Access Control: Restricted access to sensitive data exports

File Management

File Naming Convention

  • Pattern: {vendor}_{YYYY}_{MM}_{DD}_snapshot.csv
  • Organization: Hierarchical folder structure by vendor and date
  • Versioning: Overwrite existing files for same date range

Storage Organization

container/
├── prefix/
│   ├── vendor1/
│   │   ├── 2023/
│   │   │   ├── 01/
│   │   │   │   └── vendor1_2023_01_15_snapshot.csv
│   │   │   └── 02/
│   │   └── 2024/
│   └── vendor2/

This service integrates with the broader Publisher ecosystem: - ReportGenerator: Processes snapshot generation events - Snowflake Data Warehouse: Source of snapshot data - Azure Blob Storage: Storage destination for snapshot files - Vendor Management: Provides vendor list for processing

Troubleshooting

Common Issues

  1. Snowflake Connection: Verify database connectivity and credentials
  2. SAS Token Expiration: Check Azure Blob Storage SAS token validity
  3. Vendor Query Failures: Verify vendor table structure and permissions
  4. File Path Issues: Check blob storage container and path configuration
  5. Date Range Problems: Verify date calculation and format

Debug Steps

  1. Check Application Insights for processing logs
  2. Verify Snowflake connection and query execution
  3. Validate Azure Blob Storage SAS token and permissions
  4. Review vendor list query and results
  5. Test SQL template parameter substitution

Development

Local Development Setup

  1. Clone repository
  2. Install dependencies: npm install
  3. Configure Snowflake connection
  4. Set up Azure Blob Storage SAS token
  5. Configure Event Hub connection
  6. Run tests: npm test

Testing

# Test snapshot generation
node test.js

# Test with specific vendor
# Configure vendor filter in test data

Code Structure

  • src/Handler.js: Main snapshot processing logic
  • src/actions/GetVendors.js: Vendor retrieval from Snowflake
  • src/actions/SnapshotReport.js: SQL generation and report configuration
  • src/actions/sql/: SQL template files
  • config/: Environment-specific configurations

Operational Considerations

Scheduling

  • Typically scheduled to run daily or weekly
  • Consider Snowflake data refresh schedules
  • Coordinate with downstream report processing

Capacity Planning

  • Monitor Snowflake query performance and costs
  • Plan Azure Blob Storage capacity for file volumes
  • Consider vendor growth and data volume increases

Future Enhancements

Potential Improvements

  • Parallel Processing: Concurrent vendor processing for improved performance
  • Incremental Snapshots: Delta processing for large datasets
  • Compression: File compression for storage optimization
  • Retention Policies: Automated cleanup of old snapshot files
  • Custom Date Ranges: Support for custom snapshot periods
  • Data Validation: Post-export data validation and quality checks