At scale, everything is a challenge. Following systems for programming and for managing storage at scale, this system, in production use at Google, draws attention to the problem of profiling applications that run across very large fleets of machines.
Profiling applications across many machines raises several issues: harvesting application behavior from machines subject to varying production workloads, distilling meaningful information from the harvested data, making that information accessible to developers, and achieving all of this under the added constraint of minimal profiling overhead.
These challenges, along with the system's own performance and scalability concerns, are what the GWP system tackles with its architecture. The scale forces even the profiling activity itself to be engineered as a bespoke distributed application of its own kind.
Harvesting Profiles:
The life cycle of an application profile begins with the Profile Collector (itself a distributed service) periodically activating profiling sessions on a subset of machines in the fleet. It retrieves different types of sampled profiles, either sequentially or concurrently, depending on factors such as machine type and event type. The profiles and related metadata are saved in their raw format.
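The collector's round-based behavior can be sketched as a small scheduling loop. This is a toy illustration, not GWP's implementation: the event types, the sampling fraction, and the rotation strategy are all assumptions made for the sake of the example.

```python
import random

# Hypothetical sketch of a profiling round: pick a random subset of the
# fleet and assign each chosen machine an event type to sample.
EVENT_TYPES = ["cycles", "cache-misses", "heap"]  # assumed event types

def pick_machines(fleet, fraction, seed=None):
    """Choose a random subset of the fleet for this profiling round."""
    rng = random.Random(seed)
    k = max(1, int(len(fleet) * fraction))
    return rng.sample(fleet, k)

def schedule_round(fleet, fraction=0.01, seed=0):
    """Pair each sampled machine with an event type to collect."""
    sessions = []
    for i, machine in enumerate(pick_machines(fleet, fraction, seed)):
        # Rotate through event types so successive sessions cover them all.
        sessions.append((machine, EVENT_TYPES[i % len(EVENT_TYPES)]))
    return sessions

fleet = [f"machine-{n:04d}" for n in range(1000)]
sessions = schedule_round(fleet, fraction=0.01, seed=0)
print(len(sessions))  # 10 sessions: 1% of a 1000-machine fleet
```

Sampling only a small fraction of machines per round is one way such a system can keep fleet-wide overhead negligible while still accumulating representative data over time.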
GWP collects two types of profiles: whole-machine and per-process. Whole-machine profiles capture all activities happening on the machine, including user applications, the kernel, kernel modules, daemons, and other background jobs. Per-process profiles are a collection of various types of profiles from applications running on a machine. These are collected using utilities from Google Performance Tools.
Processing Harvested Profiles:
The sourced profiles are stored on the Google File System. Then comes the most challenging part: to gain runtime efficiency, the application binaries deployed in Google's data centers are stripped of all debug and symbolic information. This makes the raw profiles hard to use, since out of the box they cannot be correlated with the applications' source code.
Therefore a separate symbolization phase is required to augment the collected profiles with debug symbols. GWP derives the symbols from an unstripped version of each application binary. The unstripped versions of all applications are stored in a global repository. Other services already use this repository, for example to symbolize stack traces for automated failure notifications, which was perhaps the motivation for this approach to symbolization.
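The core lookup in symbolization is mapping a raw sample address to the function that contains it. Below is a toy sketch of that step: the symbol table is made up for illustration, whereas a real symbolizer would read it from the unstripped binary's ELF symbol table.

```python
import bisect

# Toy symbolization: map raw sample addresses back to function names
# using a (made-up) symbol table sorted by start address.
SYMBOLS = [          # (start_address, function_name)
    (0x1000, "main"),
    (0x1400, "ParseRequest"),
    (0x2000, "HandleQuery"),
    (0x2c00, "WriteResponse"),
]
STARTS = [addr for addr, _ in SYMBOLS]

def symbolize(address):
    """Return the name of the function whose range contains `address`."""
    i = bisect.bisect_right(STARTS, address) - 1
    return SYMBOLS[i][1] if i >= 0 else "<unknown>"

raw_samples = [0x1004, 0x2010, 0x2010, 0x1450]
print([symbolize(a) for a in raw_samples])
# ['main', 'HandleQuery', 'HandleQuery', 'ParseRequest']
```

Without the unstripped binary, only the raw addresses on the left are available, which is why the profiles cannot be correlated with source code until this phase runs.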
These binaries tend to be very large, so they could become a bottleneck if all processing on them were carried out sequentially. MapReduce is therefore used to distribute this computation across a few hundred machines.
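The shape of that distributed computation can be sketched in miniature. The record format, the grouping key, and the dictionary-based symbol lookup are assumptions; the point is only that grouping samples by binary lets each reducer load one large unstripped binary once instead of per sample.

```python
from collections import defaultdict

def map_phase(profile_records):
    """Map: emit (binary, address) so all work for one binary groups together."""
    for record in profile_records:
        yield record["binary"], record["address"]

def shuffle(pairs):
    """Group mapped pairs by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(binary, addresses, symbol_tables):
    """Reduce: symbolize every address against one binary's symbol table."""
    table = symbol_tables[binary]  # loaded once per binary, not per sample
    return [(binary, table.get(addr, "<unknown>")) for addr in addresses]

records = [
    {"binary": "search", "address": 0x10},
    {"binary": "ads", "address": 0x20},
    {"binary": "search", "address": 0x30},
]
tables = {"search": {0x10: "main", 0x30: "Lookup"}, "ads": {0x20: "Bid"}}
grouped = shuffle(map_phase(records))
out = {b: reduce_phase(b, addrs, tables) for b, addrs in grouped.items()}
print(out["search"])  # [('search', 'main'), ('search', 'Lookup')]
```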
Reporting on Profiles:
Due to the scale of the fleet from which profile information is harvested, and the fact that GWP has collected multi-year performance data, the volume of these profiles runs into several terabytes. To make this information usable, the profiles are loaded into a dimensional database distributed across multiple machines. This database, which supports a subset of SQL, is queried for any performance analysis of machines and applications. A web user interface is layered atop the database to help run these queries.
An API is also offered to read information directly from the database.
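To illustrate the kind of roll-up question such a store answers, here is a sketch against an assumed schema using SQLite. GWP's database is not SQLite and the column names are invented; only the query pattern (aggregate samples across dimensions like day, machine, and function) reflects the text.

```python
import sqlite3

# Assumed schema: one row per symbolized sample aggregate.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE samples (
    day TEXT, machine TEXT, application TEXT,
    function TEXT, cycles INTEGER)""")
db.executemany(
    "INSERT INTO samples VALUES (?, ?, ?, ?, ?)",
    [("2010-01-01", "m1", "search", "Lookup", 900),
     ("2010-01-01", "m2", "search", "Lookup", 700),
     ("2010-01-01", "m1", "ads", "Bid", 400)])

# "Which functions consume the most cycles fleet-wide?"
rows = db.execute("""
    SELECT function, SUM(cycles) AS total
    FROM samples GROUP BY function ORDER BY total DESC""").fetchall()
print(rows)  # [('Lookup', 1600), ('Bid', 400)]
```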
Google-Wide Profiling (GWP), a continuous profiling infrastructure for data centers, provides performance insights for cloud applications. With negligible overhead, GWP provides stable, accurate profiles and a datacenter-scale tool for traditional performance analyses. Furthermore, GWP introduces novel applications of its profiles, such as application platform affinity measurements and identification of platform-specific, microarchitectural peculiarities.
(Abstract from http://research.google.com/pubs/pub36575.html)