Sandia’s “OVIS” now available as open-source software

Publication Date:

Sandia news media contact

Mike Janes
mejanes@sandia.gov
505-844-4902

LIVERMORE, Calif. — The initial version of “OVIS,” a software tool developed by Sandia National Laboratories that provides intelligent, real-time monitoring of computational clusters, is now available for free download at http://ovis.ca.sandia.gov.

OVIS, say Sandia researchers, offers a statistical approach to computational platform monitoring and analysis. Conventional monitoring can be inefficient and ineffective because it traditionally relies on manufacturer-specified, “absolute” thresholds. Instead, OVIS observes the overall statistical properties and environmental effects of a cluster, characterizing individual device behaviors and comparing them to a large number of statistically similar devices.

Thus, individual node values that appear to deviate from the norm (given the current applicable model, as established by real-time analysis) are flagged as aberrant. This technique, say Sandia’s OVIS developers, can accurately expose problems much earlier than the current practice of simply waiting for a pre-determined threshold — necessarily set high to preclude too many false alarms — to be crossed.
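The contrast between an absolute threshold and an ensemble-relative check can be sketched in a few lines of Python. This is only an illustration, not OVIS code: the release does not say which statistics OVIS computes, so the robust z-score (median and median absolute deviation), the function name flag_aberrant, the cutoff, the temperature limit and the sample readings are all assumptions made for the example.

```python
from statistics import median

def flag_aberrant(readings, cutoff=3.5):
    """Return names of nodes whose reading deviates strongly from the ensemble.

    Uses a robust z-score based on the median and the median absolute
    deviation (MAD). This is an assumed stand-in for whatever model OVIS
    actually applies.
    """
    values = list(readings.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # All readings (or most of them) are identical: nothing to compare against.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard
    # z-score when the data are roughly normal.
    return sorted(
        name for name, v in readings.items()
        if abs(0.6745 * (v - center) / mad) > cutoff
    )

# Hypothetical CPU temperatures (degrees Celsius) for eight similar nodes.
temps_c = {"n01": 41.0, "n02": 42.5, "n03": 40.8, "n04": 41.7,
           "n05": 42.1, "n06": 55.0, "n07": 41.3, "n08": 42.0}

# Hypothetical manufacturer-style absolute limit, set high to avoid false alarms.
ABSOLUTE_LIMIT_C = 70.0
over_limit = [n for n, t in temps_c.items() if t > ABSOLUTE_LIMIT_C]

aberrant = flag_aberrant(temps_c)
print("over absolute limit:", over_limit)
print("aberrant relative to ensemble:", aberrant)
```

In this made-up data set, node n06 is well below the absolute limit yet stands out clearly from its peers, which is the kind of early signal the ensemble comparison is meant to surface.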

OVIS not only addresses the issue of aberrant node detection but also allows the system builder to visualize the spatial distribution of a particular characteristic over the entire system.

Sandia is a National Nuclear Security Administration (NNSA) laboratory.

The baseline capabilities of OVIS currently available for download include:

  • Visualization and correlation tools that display information about state variables (such as temperature, CPU utilization and fan speed) and their aggregate statistics.
  • Statistical tools that present the cluster as a comparative ensemble (rather than as individual nodes), a convenient and useful method for tuning cluster set-up and determining the effects of real-time changes in the cluster configuration and its environment.
  • An XML-based cluster configuration information template.

Though not part of the current download distribution, OVIS also incorporates a novel Bayesian inference scheme to dynamically infer models for the normal behavior of a system and to determine bounds on the probability of values manifested in the system. (“Bayesian” analysis, according to the International Society for Bayesian Analysis, is a well-known approach to data analysis that casts statistical problems in the framework of decision making.) This and other advanced features will be available in future releases.
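As a rough illustration of what inferring a model of normal behavior and then asking how probable a new value is can look like, the Python sketch below uses a textbook conjugate normal model. It is not the OVIS scheme, which the release does not describe; the function names, prior, observation variance and readings are hypothetical.

```python
import math

def update_mean(prior_mean, prior_var, obs, obs_var):
    """Conjugate normal update for an unknown mean with known observation variance."""
    n = len(obs)
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mean = post_var * (prior_mean / prior_var + sum(obs) / obs_var)
    return post_mean, post_var

def tail_probability(value, post_mean, post_var, obs_var):
    """Two-sided probability of a reading at least this far from the predicted mean."""
    # The posterior predictive distribution is normal with variance
    # post_var + obs_var.
    pred_sd = math.sqrt(post_var + obs_var)
    z = abs(value - post_mean) / pred_sd
    return math.erfc(z / math.sqrt(2.0))

# Hypothetical prior belief about node temperature and some observed readings.
history = [41.0, 42.5, 40.8, 41.7, 42.1]
post_mean, post_var = update_mean(prior_mean=40.0, prior_var=25.0,
                                  obs=history, obs_var=1.0)

p_typical = tail_probability(42.0, post_mean, post_var, obs_var=1.0)
p_extreme = tail_probability(55.0, post_mean, post_var, obs_var=1.0)
print(f"learned mean: {post_mean:.2f}")
print(f"P(reading as unusual as 42.0): {p_typical:.3f}")
print(f"P(reading as unusual as 55.0): {p_extreme:.3g}")
```

A reading with a very small tail probability under the learned model would be a candidate for flagging, and the model itself keeps adjusting as new observations arrive.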

 

Sandia National Laboratories is a multimission laboratory operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration. Sandia Labs has major research and development responsibilities in nuclear deterrence, global security, defense, energy technologies and economic competitiveness, with main facilities in Albuquerque, New Mexico, and Livermore, California.
