At the CADE ATP System Competition (CASC), automatic theorem prover Vampire once again showed its strength. The new and improved version, Vampire 4.9, achieved excellent results and won six of the seven categories, in which the provers competed this year. Martin Suda, the head of the automatic reasoning group at CIIRC, is one of the main developers of Vampire and was involved, among other things, in implementing new strategies in Vampire 4.9
About the championship in automatic reasoning
The CADE ATP System Competition, organized by Dr. Geoff Sutcliffe from the University of Miami, is part of the IJCAR International Conference on Automatic Reasoning. This year, the 30th anniversary edition of the competition was held from July 1-6, 2024 in Nancy, France. The championship is a competition where programmed systems with AI elements solve complex mathematical problems formulated in the machine-understandable language of mathematical logic. The goal is to solve as many problems as possible within a time limit of three or five minutes.
In this year’s competition, Vampire participated in all seven categories. The Vampire 4.9 version succeeded in all categories except the Typed First-order Non-theorems category, where it was defeated by the iProver 3.9 prover, but only in the number of formally accepted solutions. Therefore, the result is considered a great success and Martin Suda’s team hopes to win in all categories next year.
RNDr. Martin Suda, PhD., who designed the new strategies for Vampire 4.9, considers this result very good and is looking forward to the next challenges. „The automatic search for strong and complementary proof strategies is a very interesting algorithmic problem, to which we also contributed this year with a new paper, just published at IJCAR,“ he says. „We showed how to ensure that new strategies that we typically discover as ‚specialists‘ for a particular problem also generalize and have a high chance of solving new problems unknown in training.“
This year the competition categories were as follows:
– Higher-order Theorems
– Typed First-order Theorems
– Typed First-order Non-theorems
– First-order Theorems
– Unit Equality CNF
– Sledge Hammer Theorems
– I Challenge yoU
One of the new categories this year was the I Challenge U category, in which teams submitted 10 challenging problems to test the abilities of other proof systems. Vampire 4.9 excelled in this category, solving 59 problems out of 80, which earned it the first place.
Vampire 4.8, the previous version of the system, has achieved great results in categories such as First-order Non-theorems and Higher-order Theorems in the past year of competition. This year’s version, Vampire 4.9, has also brought improvements thanks to new strategies developed by Martin Suda from our institute. Ahmed Bhayat from the University of Manchester, who is the main author of the support for higher-order logic, also contributed significantly to the development.
In the table below, which shows the detailed results of this year’s competition, you can also find comparisons with last year’s division winners (see systems in pink). The improvement of the new Vampire 4.9 is noticeable. For example, in the Higher-order Theorems category it managed to solve 16 more problems compared to last year, which Martin Suda considers a success. This year, Vampire 4.9 solved 463 problems out of 500 problems in this category. Prover Zipperposition, which came second, solved only 433 problems out of 500.
There has been a huge improvement in the First-order Theorems division this year over last year’s winner Vampire 4.8, specifically by over 100 problems. According to Martin Suda, this is largely due to the timely submission of a prover before the competition for the so-called problem ranking phase: „The competition has a rule that only problems that have at least some chance of being solved by a system can be selected. Problem ranking is the process by which the organizer determines such a set of problems (and eventually selects a number of them for the competition.) By submitting Vampire early this year, many problems that only Vampire can solve, thanks to our new strategies, made it into the pool.“
Performance in CNF’s Unit Equality division has also improved greatly. So much so that Vampire 4.9 won the division for the first time. This was made possible by adding a number of optimizations such as improvements to indexing techniques, including code trees, lazy flatterm creation, lazy application of substitutions, and runtime-specialized ordering checks. Much of the credit for this result goes to Martin’s colleague Márton Hajdu from TU Wien.
The future and challenges of the Vampire team
Martin Suda and his team plan to continue to develop and improve the Vampire prover to win in all categories next year. The team’s motivation and enthusiasm are key to further advancements in automatic proving, which can bring revolutionary applications in various scientific and engineering fields.
Automatic reasoning is not just a competition of algorithms, but has wide applications in practice. Provers like Vampire can make the work of mathematicians, programmers and system designers easier. They are especially important in areas where reliability is crucial, such as medicine or transportation, where human lives depend on the correct functioning of systems.
We keep our fingers crossed for Martin Suda and his team in their endeavours and look forward to more great results in the future.