Case Study: For Leading Pharmacy Retail Stores 
	
	
Business Context
A leading US retail pharmacy chain handles vast amounts of transactional, inventory, and customer data daily. This data comes from diverse sources, including APIs (REST and GraphQL), SQL/NoSQL databases, and third-party platforms such as marketing and logistics systems. The company requires a scalable ELT pipeline to ensure high-quality data is transformed and analyzed efficiently.
Problem Statement
The pharmacy faced the following challenges:
    - Scattered Data Sources:
 
        
             - APIs, SQL/NoSQL databases, and third-party platforms provided data in various formats and frequencies, making it difficult to consolidate.
 
        
    - Data Quality Issues:
 
        
            - Inconsistent customer details, duplicate entries, and missing product information created reporting inaccuracies.
 
        
    - Performance Bottlenecks in Legacy ETL System:
 
        
            - The traditional ETL process extracted and transformed data before loading it into the database.
 
            - This approach could not handle 2 million new rows daily (~50 million rows in total).
 
        
    - Inefficient Reporting:
 
        
            - Tableau dashboards were slow, taking several minutes to load due to high query complexity.
 
        
Proposed Solution
To address these challenges, the company implemented a modern ELT (Extract-Load-Transform) architecture with the following components:
    - Data Sources: APIs, SQL/NoSQL databases, and third-party platforms.
 
    - Data Quality Tool: Informatica Data Quality (IDQ) for profiling, cleansing, and standardization.
 
    - Data Destination: MySQL as the centralized data warehouse.
 
    - Reporting: Tableau for analytics and visualization.
 
Scenario Details
Data Sources and Volumes
    - APIs (REST, GraphQL):
 
        
            - Customer transactions, loyalty programs, and prescription refills.
 
            - Daily Volume: 1 million rows.
 
        
    - SQL Databases:
 
        
            - Inventory and store-level sales data.
 
            - Daily Volume: 600,000 rows.
 
        
    - NoSQL Databases:
 
        
            - Real-time clickstream data from the pharmacy's e-commerce platform.
 
            - Daily Volume: 300,000 rows.
 
        
    - Third-Party Platforms:
 
        
            - Logistics, marketing campaigns, and supplier data.
 
            - Daily Volume: 100,000 rows.
 
        
Solution Architecture
    - Extract-Load (EL)
 
        
            - Data is directly extracted from APIs, SQL/NoSQL databases, and third-party platforms in near real-time.
 
            - Extracted data is loaded as-is into staging tables in MySQL without transformations.
 
            - APIs are scheduled to fetch data every 15 minutes, while SQL/NoSQL databases use incremental loads.
 
        
    - Transform (T): ELT in MySQL
 
        
            - Transformations occur inside MySQL using SQL scripts, stored procedures, and views for better performance and scalability.
 
            - Key Transformation Steps:
 
                
                    - Data Deduplication: Identify and remove duplicate customer and transaction records.
 
                    - Standardization: Harmonize product names, customer addresses, and date formats.
 
                    - Aggregations: Summarize daily transactions, inventory levels, and campaign performance for Tableau.
 
                    - Data Enrichment: Join customer loyalty data with transaction history to derive customer lifetime value (CLV).
 
                
        
 
    
    
    - Data Quality with Informatica Data Quality (IDQ):
 
        
            - Profiling: Regularly profile data to identify anomalies such as missing or invalid fields.
 
            - Cleansing: Remove duplicates, validate email addresses, and ensure data consistency across sources.
 
            - Validation Rules: Ensure business rules are met, e.g., product price > $0, valid prescription IDs.
 
        
    - Data Destination: MySQL as Centralized Repository
 
        
            - Partitioning: Data is partitioned by date to optimize query performance and storage.
 
            - Indexing: Key fields like customer ID, transaction date, and product ID are indexed for faster lookups.
 
            - Compression: Reduces storage costs while maintaining query speed for large tables.
 
        
    - Reporting with Tableau
 
        
            - Tableau connects directly to MySQL’s aggregated and transformed tables.
 
            - Dashboards:
 
                
                    - Sales Dashboard: Displays sales trends by store, region, and product category.
 
                    - Inventory Dashboard: Highlights stockouts and overstocked items by location.
 
                    - Customer Insights Dashboard: Provides metrics like CLV, loyalty program engagement, and top customers.
 
                
            - Optimizations: Tableau’s extract mode is used for historical data, while live connections provide real-time updates.
 
        
    
    
Performance Metrics
    - Data Volume Management:
 
        
            - 2 million rows daily (~50 million total) are processed efficiently within the MySQL database.
 
            - ELT ensures that raw data is loaded quickly, with transformations handled during off-peak hours.
 
        
    - Improved Query Performance:
 
        
            - MySQL’s indexing and partitioning reduce query execution times by 40%.
 
            - Tableau dashboards load in under 5 seconds, down from several minutes previously.
 
        
    - Data Quality:
 
        
            - Informatica Data Quality ensures 99% data accuracy by resolving duplicates and fixing errors during profiling.
 
        
KPIs Monitored Post-Implementation
    - Operational Efficiency:
 
        
            - ETL pipeline execution time reduced from 4 hours to 1.5 hours with ELT.
 
        
    - Reporting Speed:
 
        
            - Tableau dashboard loading time reduced from 3 minutes to under 5 seconds.
 
        
    - Sales Growth Analysis:
 
        
            - Enabled real-time insights into sales trends, increasing revenue by 8% through optimized promotions.
 
        
    - Inventory Optimization:
 
        
            - Reduced stockouts by 15% and overstock by 10% using near-real-time inventory monitoring.
 
        
    - Customer Engagement:
 
        
            - Enhanced loyalty program participation by 20% through data-driven campaign targeting.
 
        
Benefits of the ELT Approach
    - Scalability:
 
        
            - MySQL can handle growing data volumes, ensuring scalability beyond the current 50 million records.
 
        
    - Efficiency:
 
        
            - ELT reduces the need for expensive compute resources during extraction and loading.
 
        
    - Improved Data Quality:
 
        
            - Informatica Data Quality ensures reliable data for decision-making.
 
        
    - Faster Reporting:
 
        
            - Tableau’s optimized dashboards empower executives with actionable insights, improving decision-making speed.
 
        
This ELT-based architecture transformed the pharmacy’s data operations, enabling efficient data processing, high-quality reporting, and data-driven decisions.