General propaganda for doing AI and AR over large formal math libraries

Example of today's AI/AR over such libraries

Some future agenda

Minor foundational remarks

Poincaré's Science and Method: ideas about the interplay between correct deduction and induction/intuition

*"And in demonstration itself logic is not all. The true mathematical reasoning is a real induction [...]"*

I believe he was right: strong general reasoning engines have to combine deduction and induction (learning patterns from data, making conjectures, etc.)

*"There is no actual infinite; the Cantorians have forgotten this, and that is why they have fallen into contradiction."*

David Hilbert:

*"No one will drive us from the paradise which Cantor created for us"*

For me, on the larger scale of things, there is only one ultimate hero in what we do (sorry Freek, John, Tom, George, Andrzej)

This hero was born in 1912, just 24 days before Poincaré died

Turing 1936: "On Computable Numbers, with an Application to the Entscheidungsproblem"

Introduced Universal Turing machines (a.k.a. computers)

Halting problem undecidable, therefore FOL undecidable

Unfortunately, for many of today's theoretical computer scientists, Turing's work ended in 1936:

"It is not possible to have a decision procedure for math, therefore attempting to do math automatically is futile"

1948-1950: Turing wrote the first chess program, Turochamp

Turing 1950: "Computing machinery and intelligence" - beginning of AI, Turing test

On pure deduction:

*"For at each stage when one is using a logical system, there is a very large number of alternative steps, any of which one is permitted to apply, so far as obedience to the rules of the logical system is concerned. These choices make the difference between a brilliant and a footling reasoner, not the difference between a sound and a fallacious one."*

*"We may hope that machines will eventually compete with men in all purely intellectual fields."*

Turing 1950 (last section on Learning Machines):

*"But which are the best ones [fields] to start [learning on] with?"*

*"... Even this is a difficult decision. Many people think that a very abstract activity, like the playing of chess, would be best."*

Me: "Let's use large formal libraries (MML and Flyspeck)" (1998: considered crazy by practically everybody I talked to)

My present-day version of Hilbert:

*"No one will drive us from the semantic AI paradise which large formal libraries created for us"*

I don't care much about the foundations, as long as they allow decent formalization and are not too complex to describe

Turing 1936 tells us that we cannot have a decision procedure for math

So anybody trying to automate reasoning has to ultimately fail, right?

But how do WE then prove theorems??

Turing:

*"The fact that a brain can do it seems to suggest that the difficulties [of trying with a machine] may not really be so bad as they now seem."*

(Penrose of course still disagrees)

Unfortunately, the 1936-style CS theoreticians have largely ignored that

- My personal puzzle: The year is 2011. The recent AI successes are *data-driven*, not *theory-driven*.
- Ten years after the success of Google.
- Fifteen years after Deep Blue's success against Kasparov.
- Five years after a car drove autonomously across the Mojave desert.
- Four years after the Netflix prize was announced.
- *Why am I still the only person training AI systems on large repositories of human proofs like the Mizar library???*
- (This finally started to change in 2011)

John Shawe-Taylor and Nello Cristianini -- Kernel Methods for Pattern Analysis (2004):

- Many of the most interesting problems in AI and computer science in general are extremely complex, often making it difficult or even impossible to specify an explicitly programmed solution.
- As an example consider the problem of recognising genes in a DNA sequence. We do not know how to specify a program to pick out the subsequences of, say, human DNA that represent genes.
- Similarly we are not able directly to program a computer to recognise a face in a photo.

- Learning systems offer an alternative methodology for tackling these problems.
- By exploiting the knowledge extracted from a sample of data, they are often capable of adapting themselves to infer a solution to such tasks.
- We will call this alternative approach to software design the *learning methodology*.
- It is also referred to as the *data driven* or *data based* approach, in contrast to the *theory driven* approach that gives rise to precise specifications of the required algorithms.

Learning high-level and low-level guidance of ATPs, methods for problem characterization, etc.

Implementation methods dealing with large knowledge bases and signatures

ITP-to-ATP translation methods, proof reconstruction methods

"Hammers" (L. Paulson): strong real-time (often cloud-based) AI/ATP for ITP users

Slower "more-AI" experimental systems developing various feedback loops between deduction and induction

Large-theory AI/ATP benchmarks/competitions (CASC LTB) since 2006/2008

Not very much in general, more in some special domains

Most recent eval on Mizar/MML, October 2013: 40% of about 50k Mizar top-level theorems fully automated

Was 14-16% in 2003 in the first large-scale eval on the 33k-theorem MML

Similar numbers for Flyspeck in 2012 (14k-20k theorems; very recent methods reach 40-50%). No complete eval for Isabelle/AFP yet.

These numbers look good, but we mostly prove the easier theorems

The average Mizar proof length of an ATP-provable theorem is now 10 lines

- Early 2003: Can existing ATPs be used over the freshly translated Mizar library?
- About 80000 nontrivial math facts at that time.
- Is good premise selection possible at all?
- Or is it a mysterious power of mathematicians? (Penrose!)
- Today: Premise selection is not a mysterious property of mathematicians.
- Reasonably good algorithms started to appear (more below).
- Will extensive human (math) knowledge get obsolete?? (cf. Watson)

- Train premise selection on all previous Mizar/MML proofs (50k)
- Recommend relevant premises when proving new conjectures
- About 70% coverage in the first 100 recommended premises
- Chain the recommendations with strong ATPs to get full proofs
- Used today also for Isabelle/Sledgehammer and HOL/Flyspeck
- Many interesting issues: features, labels, their utility and consistency
- Still easy to improve (waiting for you!)
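The training-and-recommendation loop above can be sketched as a simple k-nearest-neighbours selector over symbol features of previously proved theorems. This is a minimal toy illustration, not the actual MML-scale implementation; all theorem and premise names below are hypothetical.

```python
from collections import Counter

# Toy proof corpus: each proved theorem's symbol features and the premises
# used in its proof (all names here are hypothetical).
corpus = {
    "t_add_comm":  ({"nat", "+", "="}, ["d_add", "l_add_zero"]),
    "t_mul_comm":  ({"nat", "*", "="}, ["d_mul", "d_add"]),
    "t_add_assoc": ({"nat", "+", "="}, ["d_add", "l_add_succ"]),
}

def knn_premises(conjecture_features, k=2):
    """Rank premises by the proofs of the k most feature-similar
    previously proved theorems, weighted by Jaccard similarity."""
    sims = []
    for feats, premises in corpus.values():
        jaccard = len(conjecture_features & feats) / len(conjecture_features | feats)
        sims.append((jaccard, premises))
    sims.sort(key=lambda x: x[0], reverse=True)
    scores = Counter()
    for sim, premises in sims[:k]:
        for p in premises:
            scores[p] += sim  # premises of closer proofs score higher
    return [p for p, _ in scores.most_common()]

# 'd_add' ranks first: it occurs in both of the nearest proofs.
print(knn_premises({"nat", "+", "="}))
```

Production-scale selectors work over sparse features of tens of thousands of theorems and often use faster learners such as naive Bayes; the Jaccard-weighted k-NN here is just one simple instance of the idea.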

ML evaluation (premise recall)

ATP evaluation (problems solved)
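The ML evaluation measures premise recall: the fraction of the premises actually used in a known proof that appear among the first n recommendations (this is the "70% coverage in the first 100" measure mentioned above). A minimal sketch, with hypothetical premise names:

```python
def premise_recall(recommended, used, n=100):
    """Fraction of the premises used in a known proof that appear
    among the first n recommendations."""
    return len(set(recommended[:n]) & set(used)) / len(used)

# Hypothetical ranking and known proof premises:
ranking = ["d_add", "l_add_zero", "l_mul_one", "d_mul"]
needed = ["d_add", "d_mul"]
print(premise_recall(ranking, needed, n=3))  # 0.5: only d_add is in the top 3
```

The ATP evaluation is the stricter end-to-end measure: a recommendation only counts if an ATP actually finds a proof from the selected premises.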

Combining with SInE improves the best method from 726 to 797 (10%)
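SInE selects axioms purely from symbol statistics, which makes it complementary to learned selectors. A simplified sketch of its trigger-based selection over a toy axiom set (an illustration of the idea, not the actual SInE implementation; names are hypothetical):

```python
from collections import Counter

def sine_select(axioms, conjecture_symbols, tolerance=1.5, depth=3):
    """Simplified SInE-style selection: a symbol 'triggers' an axiom when
    its global occurrence count is within `tolerance` times the count of
    the axiom's rarest symbol; triggered axioms are then pulled in
    iteratively, starting from the conjecture's symbols."""
    occ = Counter(s for syms in axioms.values() for s in syms)
    triggers = {}  # symbol -> axioms it triggers
    for name, syms in axioms.items():
        rarest = min(occ[s] for s in syms)
        for s in syms:
            if occ[s] <= tolerance * rarest:
                triggers.setdefault(s, []).append(name)
    selected, processed, frontier = set(), set(), set(conjecture_symbols)
    for _ in range(depth):
        processed |= frontier
        new_syms = set()
        for s in frontier:
            for name in triggers.get(s, []):
                if name not in selected:
                    selected.add(name)
                    new_syms |= axioms[name]  # symbols of new axioms spread
        frontier = new_syms - processed
        if not frontier:
            break
    return selected

# Toy axioms mapping hypothetical names to their symbol sets:
axioms = {
    "a_add_zero": {"+", "0"},
    "a_distrib":  {"+", "*"},
    "a_mul_one":  {"*", "1"},
}
print(sine_select(axioms, {"0"}))
```

Rare symbols trigger only the axioms that define them, so the selection stays small and focused; combining such a symbol-based filter with a learned ranker is one plausible reading of the 726-to-797 improvement above.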