We understand Spark inside out. Whether the job is profiling Spark for performance or optimizing its workloads to cut costs and run faster, we have done it many times. We have mastered the profile-optimize loop of Spark:
Besides a deep understanding of the profilers available with various Spark distros, we deploy our own super-profiler, ZettaProf, to extract the last bit of performance and cost savings from customers' workloads.
ZettaProf is extensible to accommodate bespoke profiling requirements for specific use cases.
Spark distro expertise:
Open Source, Databricks, Azure Synapse, RAPIDS, EMR
Optimizations applied:
Resource sizing: CPUs, RAM, disk, network, parallelism
Right machines: best-suited machine types on AWS, Azure, GCP, etc.
Query rewrites
Table scan: partitioning, bucketing, skew handling, serialization, working around S3 slowness
Query: JOIN optimizations, shuffle minimization, caching, etc.
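To make the skew handling item above concrete, here is a minimal sketch of key salting, one common way to spread a hot key across partitions. It is a plain-Python simulation, not Spark code and not how ZettaProf works. The helper names (`partition_of`, `salt_key`, `partition_sizes`), the CRC32 stand-in for a hash partitioner, and the sample data are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    # Deterministic stand-in for a hash partitioner (assumption: CRC32, chosen
    # only because it gives the same result on every run).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def salt_key(key: str, row_index: int, num_salts: int) -> str:
    # Append a rotating salt so rows sharing one hot key map to several keys.
    return f"{key}#{row_index % num_salts}"


def partition_sizes(keys, num_partitions: int):
    # Count how many rows land in each partition.
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical skewed dataset: one key accounts for 90% of the rows.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

NUM_PARTITIONS = 8
NUM_SALTS = 16

plain = partition_sizes(keys, NUM_PARTITIONS)
salted = partition_sizes(
    [salt_key(k, i, NUM_SALTS) for i, k in enumerate(keys)],
    NUM_PARTITIONS,
)

print("largest partition without salting:", max(plain))
print("largest partition with salting:   ", max(salted))
```

In a real join, the other side of the join has to be expanded with every salt value so that matching rows still meet. That extra data is the price paid for a more even distribution.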
End-to-end pipelines for:
Reading data from any source (databases, devices, etc.) and connecting to analytics tools (Looker, Tableau, etc.)
Comparing performance across various distros, machine configurations, and cloud options
Benchmarking against other data processing options (Hive, Presto, etc.)