Functional Verification of SMP, MPP, and Vector-Register Supercomputers through Controlled Randomness

Joseph T. Wunderlich, Elizabethtown College


Prototype supercomputer functionality can be verified by comparing simulated hardware execution with test-program runs on the actual hardware, where each successive test-program run uses randomly changing machine states, operating scenarios, and data. Confidence in the verification increases with repeated program execution. In both multi-processor and vector-register systems, "controlled randomness" can be used to verify the functionality of simultaneously executing processors or functional units. This paper discusses the selection and combining of random number generators such that the "degree-of-randomness" between successive or parallel program runs is controlled. This allows computer engineers to simulate the execution of actual software (application or system-level) in which successive or parallel program runs may or may not involve uncorrelated tasks. Additionally, random number generators are selected to maximize execution speed and cycle length, to ensure reproducibility, and, when desired, to best approximate an independent, identically distributed source of numbers. Generators can also be chosen for ease of implementation, the ability to run backwards, and the ability to split the generator's cycle into uncorrelated segments. "Backward multipliers" that allow a generator to be run in reverse can be easily found for some types of generators; reversibility is critical for functional verification because code execution can then be traced backwards to find the scenarios that led to a detected hardware failure. Careful selection and combination of generators allows the verification process to be optimized. Using this methodology, functional verification of SMP, MPP, and vector-register supercomputers can be achieved.
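The idea of a "backward multiplier" can be illustrated with a minimal sketch. It assumes a multiplicative congruential generator x(n+1) = A * x(n) mod M with a prime modulus, for which the backward multiplier is the modular inverse of A. The constants are the widely published "minimal standard" values (A = 16807, M = 2^31 - 1). They are used here only for illustration and are not necessarily the generators selected in this paper. The function names and the seed are hypothetical.

```python
# Illustrative sketch only: a multiplicative congruential generator
#   x(n+1) = A * x(n) mod M
# with the "minimal standard" constants, assumed here for demonstration.
M = 2**31 - 1            # prime modulus
A = 16807                # forward multiplier
A_BACK = pow(A, -1, M)   # backward multiplier: inverse of A mod M (Python 3.8+)


def step_forward(x):
    """Advance the generator by one state."""
    return (A * x) % M


def step_backward(x):
    """Return the state that preceded x."""
    return (A_BACK * x) % M


def run(x, n, step):
    """Apply the given step function n times starting from state x."""
    for _ in range(n):
        x = step(x)
    return x


seed = 12345                                  # hypothetical starting state
later = run(seed, 1000, step_forward)         # 1000 states forward
recovered = run(later, 1000, step_backward)   # 1000 states back again
assert recovered == seed
```

Because every forward step is undone exactly by one backward step, a test program that detects a failure at some state could, under these assumptions, regenerate the preceding sequence of random machine states and data without storing them.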