The Human Test for Computer-Written Code


How will we know if we can trust machine-written software? A Weizmann alumnus is showing how to test such systems reliably

You are here

Crash-avoidance systems: What constitutes a reliable test?

Our lives might depend on the ability to check the math of our machines. Driving cars, flying planes, sending robots on rescue missions – every one of these activities requires the ability to evaluate situations quickly and make split-second decisions. The sophisticated algorithms these machines will rely on when they become autonomous – driverless cars, self-propelled robots and even self-flying aircraft – may, in all likelihood, be “written” by computers. For such complicated tasks, this new kind of software often outperforms the software humans can produce. Nonetheless, can we put our trust in such programs as those developed for a flight crash-avoidance system?

Dr. Guy Katz, a Weizmann Institute of Science alumnus who will be joining the Rachel and Selim Benin School of Computer Science and Engineering at the Hebrew University, has developed a method that can verify the accuracy of certain machine-written programs. Katz conducted his PhD research in the group of the Institute’s Prof. David Harel.

Dr. Guy Katz is checking the math of machines

These particular programs are known as “deep neural networks,” because their structure resembles that of the neurons in the brain. Such programs are generated by so-called “machine learning,” which is designed to imitate the human learning process. The problem, explains Katz, is that the programs are a sort of black box. A neural network is essentially composed of numerous switches, and each can be either on or off. So if a program has 300 switches, there are 2300 possible configurations. To know if a program will function properly every time, one would theoretically need to test it in all possible configurations, and this is where the problem arises: How can something as complex as a crash-avoidance system be tested? (As the early tests of autonoumous cars have made clear, unforeseen situations will always arise in real-life driving.) 

Katz, in his postdoctoral research at Stanford University, worked with a real-life program of the FAA to find a way to reduce the number of possible configurations that must be tested, by creating a technique for determining which of the many possible situations cannot occur in practice. He, Clark Barrett, David Dill, Kyle Julian and Mykel Kochenderfer, all of Stanford, managed to reduce the number of configurations for testing from 2300 to around a million in the crash-avoidance program they were working with. A million may sound like a lot, but it is within the capability of a computer while the total number of possibilities is far beyond the range of even the most powerful computer.   

Such programs are generated by so-called “machine learning,” which is designed to imitate the human learning process.

Their check of the FAA program – a process technically known as verification – “revealed that all properties being tested held, except for one, which was later fixed,” says Katz. A paper describing these results will appear at this year’s Computer Aided Verification (CAV) conference.

Katz intends to continue investigating means of verifying the functionality of programs produced via machine-learning. In addition to the FAA, he and the Stanford team are working with Intel on an autonomous car project. This project, he says, is several scales larger than the one they have already completed and will require much more pruning. “Although some things are common to all the deep neural network programs, others are specific to each system. We work closely with the engineers and experts who develop these systems to understand what properties need to be checked, and how the checking may be accelerated. That part still requires human input,” says Katz.