LIVERMORE, Calif. — The initial version of “OVIS,” a software tool developed by Sandia National Laboratories that provides intelligent, real-time monitoring of computer clusters, is now available for free download at http://ovis.ca.sandia.gov.
OVIS, say Sandia researchers, offers a statistical approach to the problem of computational platform monitoring and analysis, which can be inefficient and ineffective due to the traditional emphasis on manufacturer-specified, “absolute” thresholds. Instead, OVIS observes the overall statistical properties and environmental effects of a cluster, characterizing individual device behaviors and comparing them to a large number of statistically similar devices.
Thus, individual node values that appear to deviate from the norm (given the current applicable model, as established by real-time analysis) are flagged as aberrant. This technique, say Sandia’s OVIS developers, can accurately expose problems much earlier than the current practice of simply waiting for a pre-determined threshold — necessarily set high to preclude too many false alarms — to be crossed.
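The contrast between a fixed threshold and a comparison against similar devices can be illustrated with a minimal Python sketch. It is not OVIS code: the node names, temperature readings, the 85-degree limit and the `flag_aberrant` helper are hypothetical, and a robust z-score (median and median absolute deviation) stands in for whatever statistical model OVIS actually applies.

```python
from statistics import median

def flag_aberrant(readings, cutoff=3.5):
    """Return the nodes whose reading deviates strongly from the ensemble."""
    values = list(readings.values())
    center = median(values)
    # Median absolute deviation: a spread estimate that a few outliers barely move.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # More than half the nodes report the same value; this sketch
        # does not attempt to score that case.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [node for node, v in readings.items()
            if 0.6745 * abs(v - center) / mad > cutoff]

# Hypothetical CPU temperatures in degrees Celsius for six similar nodes.
temps = {"n01": 51.0, "n02": 52.5, "n03": 50.5,
         "n04": 51.5, "n05": 63.0, "n06": 52.0}

ABSOLUTE_LIMIT = 85.0  # assumed manufacturer-style threshold

over_limit = [n for n, t in temps.items() if t > ABSOLUTE_LIMIT]
aberrant = flag_aberrant(temps)

print("over absolute limit:", over_limit)
print("aberrant relative to peers:", aberrant)
```

In this made-up data set no node comes near the absolute limit, yet node `n05` runs more than 11 degrees hotter than its peers and is the only one flagged by the comparative check.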
OVIS not only addresses the issue of aberrant node detection but also allows the system builder to visualize the spatial distribution of a particular characteristic over the entire system.
Sandia is a National Nuclear Security Administration (NNSA) laboratory.
The baseline capabilities of OVIS currently available for download include:
- Visualization and correlation tools that display information about state variables (such as temperature, CPU utilization and fan speed) and their aggregate statistics.
- Statistical tools that present the cluster as a comparative ensemble (rather than as individual nodes), a convenient and useful method for tuning cluster set-up and determining the effects of real-time changes in the cluster configuration and its environment.
- An XML-based cluster configuration information template.
Though not part of the current download distribution, OVIS also incorporates a novel Bayesian inference scheme to dynamically infer models for the normal behavior of a system and to determine bounds on the probability of values manifested in the system. (“Bayesian” analysis, according to the International Society for Bayesian Analysis, is a well-known approach to data analysis that casts statistical problems in the framework of decision making.) This and other advanced features will be available in future releases.
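The release does not describe the inference scheme itself, so the following Python sketch only illustrates the general idea with a textbook model: a normal prior on a node's typical reading is updated from observations with an assumed known variance, and the resulting predictive distribution assigns a probability to a new value. The function names, the prior and all numbers are hypothetical, and nothing here should be read as the OVIS algorithm.

```python
from statistics import NormalDist

def update_mean(prior_mean, prior_var, obs, obs_var):
    """Conjugate normal update of the belief about the typical reading."""
    n = len(obs)
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mean = post_var * (prior_mean / prior_var + sum(obs) / obs_var)
    return post_mean, post_var

def tail_probability(value, post_mean, post_var, obs_var):
    """Two-sided probability of a reading at least this far from the mean."""
    # A new reading carries both model uncertainty and observation noise.
    predictive = NormalDist(post_mean, (post_var + obs_var) ** 0.5)
    distance = abs(value - post_mean)
    return 2.0 * (1.0 - predictive.cdf(post_mean + distance))

# Hypothetical temperatures (degrees Celsius) observed under normal operation.
history = [51.0, 52.5, 50.5, 51.5, 52.0]

# Assumed vague prior (mean 50, variance 100) and observation variance 1.
post_mean, post_var = update_mean(50.0, 100.0, history, 1.0)

p_typical = tail_probability(52.0, post_mean, post_var, 1.0)
p_unusual = tail_probability(63.0, post_mean, post_var, 1.0)

print(f"inferred normal level: {post_mean:.2f}")
print(f"P(reading as extreme as 52.0): {p_typical:.3f}")
print(f"P(reading as extreme as 63.0): {p_unusual:.2e}")
```

Under these assumptions a reading of 52.0 is unremarkable, while a reading of 63.0 has a vanishingly small probability and would be a candidate for flagging.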