General propaganda for doing AI and AR over large formal math libraries
Example of today's AI/AR over such libraries
Some future agenda
Minor foundational remarks
Poincaré's Science and Method: ideas about the interplay between correct deduction and induction/intuition
"And in demonstration itself logic is not all. The true mathematical reasoning is a real induction [...]"
I believe he was right: strong general reasoning engines have to combine deduction and induction (learning patterns from data, making conjectures, etc.)
"There is no actual infinite; the Cantorians have forgotten this, and that is why they have fallen into contradiction."
David Hilbert: "No one will drive us from the paradise which Cantor created for us"
For me, on the larger scale of things, there is only one ultimate hero in what we do (sorry Freek, John, Tom, George, Andrzej)
This hero was born in 1912, the year Poincaré died (just 24 days before Poincaré's death, in fact)
Turing 1936: "On Computable Numbers, with an Application to the Entscheidungsproblem"
Introduced Universal Turing machines (a.k.a. computers)
Halting problem undecidable, therefore FOL undecidable
Unfortunately, for many of today's theoretical computer scientists, Turing's work ended in 1936:
"It is not possible to have a decision procedure for math, therefore attempting to do math automatically is futile"
1948-1950: Turing wrote the first chess program, Turochamp
Turing 1950: "Computing machinery and intelligence" - beginning of AI, Turing test
On pure deduction: "For at each stage when one is using a logical system, there is a very large number of alternative steps, any of which one is permitted to apply, so far as obedience to the rules of the logical system is concerned. These choices make the difference between a brilliant and a footling reasoner, not the difference between a sound and a fallacious one."
"We may hope that machines will eventually compete with men in all purely intellectual fields."
Turing 1950 (last section on Learning Machines):
"But which are the best ones [fields] to start [learning on] with?"
"... Even this is a difficult decision. Many people think that a very abstract activity, like the playing of chess, would be best."
Me: "Let's use large formal libraries (MML and Flyspeck)" (1998: considered crazy by practically everybody I talked to)
My present-day version of Hilbert: "No one will drive us from the semantic AI paradise which large formal libraries have created for us"
I don't care much about the foundations, as long as they allow decent formalization and are not too complex to describe
Turing 1936 tells us that we cannot have a decision procedure for math
So anybody trying to automate reasoning has to ultimately fail, right?
But how do WE then prove theorems??
Turing: "The fact that a brain can do it seems to suggest that the difficulties [of trying with a machine] may not really be so bad as they now seem." (Penrose of course still disagrees)
Unfortunately, the 1936-style CS theoreticians have largely ignored that
John Shawe-Taylor and Nello Cristianini -- Kernel Methods for Pattern Analysis (2004):
Learning high-level and low-level guidance of ATPs, methods for problem characterization, etc. (a toy premise-selection sketch follows this list)
Implementation methods dealing with large knowledge bases and signatures
ITP-to-ATP translation methods, proof reconstruction methods
"Hammers" (L. Paulson): strong real-time (often cloud-based) AI/ATP for ITP users
Slower "more-AI" experimental systems developing various feedback loops between deduction and induction
Large-theory AI/ATP benchmarks/competitions (CASC LTB) since 2006/2008
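To make the learning ("induction") side concrete, below is a toy sketch of premise selection, the core task behind hammers: given a new conjecture, rank the known library facts by feature similarity and pass only the best ones to the ATP. Everything here (the featurizer, the fact names, the mini-library) is invented for illustration; real systems use much richer features and learners such as naive Bayes, kernels, or k-NN trained on previous proofs.

from collections import Counter
from math import sqrt

def features(formula):
    """Toy featurizer: bag of alphabetic tokens (symbols) in a formula."""
    cleaned = formula.replace("(", " ").replace(")", " ").replace(",", " ")
    return Counter(tok for tok in cleaned.split() if tok.isalpha())

def cosine(a, b):
    """Cosine similarity of two feature bags."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_premises(conjecture, library, k=3):
    """Rank library facts by similarity to the conjecture, return the top k names."""
    cf = features(conjecture)
    ranked = sorted(library, key=lambda name: cosine(cf, features(library[name])),
                    reverse=True)
    return ranked[:k]

# Hypothetical mini-library; the names and statements are invented.
library = {
    "GROUP_1":  "mult ( x , inv ( x ) ) = e",
    "REAL_ADD": "x + y = y + x",
    "TOPS_2":   "open ( union ( A , B ) )",
}
print(select_premises("inv ( mult ( x , y ) ) = mult ( inv ( y ) , inv ( x ) )",
                      library, k=2))   # -> ['GROUP_1', 'REAL_ADD']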
How much can such systems prove automatically? Not very much in general, more in some special domains
Most recent evaluation on Mizar/MML, October 2013: 40% of about 50k Mizar toplevel theorems fully automated
This was 14-16% in 2003, in the first large-scale evaluation on the 33k-theorem MML
Similar numbers for Flyspeck in 2012 (14k-20k theorems; very recent methods reach 40-50%). No complete evaluation for Isabelle/AFP yet.
These numbers look good, but we mostly prove the easier theorems
Average Mizar proof length of ATP-provable theorem is now 10 lines
ML evaluation (premise recall)
ATP evaluation (problems solved)
Combining with SInE improves the best method from 726 to 797 (10%)
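One plausible mechanism behind such gains, sketched below under my own assumptions (not necessarily how the cited evaluation combined the methods), is simple rank averaging of two complementary premise orderings, e.g. a learned ranker and SInE's symbol-based one:

def combine_rankings(rank_a, rank_b):
    """Merge two premise orderings by average rank; unseen facts rank last."""
    pos_a = {name: i for i, name in enumerate(rank_a)}
    pos_b = {name: i for i, name in enumerate(rank_b)}
    names = set(pos_a) | set(pos_b)
    worst = len(names)
    return sorted(names,
                  key=lambda n: pos_a.get(n, worst) + pos_b.get(n, worst))

# Toy illustration with invented fact names:
learned = ["FUNCT_1", "GROUP_1", "XREAL_0"]
sine    = ["XREAL_0", "FUNCT_1", "SUBSET_1"]
print(combine_rankings(learned, sine))  # -> ['FUNCT_1', 'XREAL_0', 'GROUP_1', 'SUBSET_1']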
MaLARea
The strategies are like giraffes, the problems are their food
The better the giraffe specializes for eating problems unsolvable by others, the more it gets fed and further evolved
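Below is a toy sketch of the feedback loop this metaphor describes; the ATP, learner, and problems are invented stand-ins (the real system drives actual ATPs and much stronger learners). Proofs found in one round become training data that changes the premise advice, and hence the set of problems solved, in the next round.

def train(proofs):
    """Toy learner: count how often each premise was used in found proofs."""
    usage = {}
    for premises in proofs.values():
        for p in premises:
            usage[p] = usage.get(p, 0) + 1
    return usage

def rank(model, library):
    """Order library facts by learned usefulness (most used first)."""
    return sorted(library, key=lambda fact: -model.get(fact, 0))

def run_atp(problem, advice):
    """Toy 'ATP': succeeds iff the needed premises are among the advice."""
    return problem["needed"] if set(problem["needed"]) <= set(advice) else None

def malarea_loop(problems, library, advice_size=2, rounds=5):
    proofs = {}
    for _ in range(rounds):
        model = train(proofs)                        # induction: learn from proofs
        advice = rank(model, library)[:advice_size]  # learned premise advice
        for prob in problems:
            if prob["name"] not in proofs:
                proof = run_atp(prob, advice)        # deduction: try to prove
                if proof is not None:
                    proofs[prob["name"]] = proof     # feeds the next round
    return proofs

problems = [{"name": "p1", "needed": ["A"]},
            {"name": "p2", "needed": ["A", "B"]},
            {"name": "p3", "needed": ["C", "D"]}]
print(malarea_loop(problems, library=["A", "B", "C", "D"]))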
In 20 years, 80% of MML and Flyspeck toplevel theorems will be provable automatically (same hardware and same library versions as today, where we now stand at about 40%)
The same in 30 years: I'll give you 2:1 odds. In 10 years: 60%
In 25 years, 50% of the toplevel statements in LaTeX-written MSc-level math curriculum textbooks will be parsed automatically and with correct formal semantics
No betting needed: all this could be done within 5 years from today with reasonable resources - I believe this technology is getting easy ("this problem is data-driven-AI-easy")
Hurry up: I will only accept bets up to 10k EUR total (negotiable)
Disclaimer: I do not understand type theory.
Mizar is to a considerable extent an implementation of the "little theories" approach (Bill Farmer et al., IMPS, 1992)
Artur's example: the abstract theory of topological groups
There is one large theory (ZFC/TG) in which all the little theories ultimately live
This plays the role of an explicit consistency layer; we can (or could) use it for doing model theory
And I do not know anything better than ZFC for doing model theory
Guillaume's motivating example: if G is isomorphic to H, then G is solvable iff H is solvable
This is trivial in model theory: two isomorphic models of L have the same true sentences of L (i.e., they are "elementarily equivalent").
My "correct version" of Guillame's example: if some ultrapowers of G and H are isomorphic then G is solvable iff H is solvable
Why? Keisler-Shelah (Keisler 1961, Shelah 1971): M and N are elementarily equivalent if and only if they have isomorphic ultrapowers.
Of course, "M and N isomorphic" implies that they have some isomorphic ultrapowers, but not vice versa
So this is the right (both sufficient and necessary) condition for two models to exchange sentences.
Isomorphism is only a special case of it (sufficient, but not necessary).
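For reference, here is the principle in display form; this is standard model theory, with $M^{\mathcal{U}}$ denoting the ultrapower of M by the ultrafilter $\mathcal{U}$:

% Keisler--Shelah, and the transport principle it justifies
\[
  M \equiv N
  \iff
  \exists\,\mathcal{U}\ \text{(ultrafilter)}:\ M^{\mathcal{U}} \cong N^{\mathcal{U}}
\]
\[
  M^{\mathcal{U}} \cong N^{\mathcal{U}}
  \implies
  \bigl(\, M \models \varphi \iff N \models \varphi \,\bigr)
  \quad \text{for every first-order sentence } \varphi
\]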
The theorem/proof transporting mechanisms in classical systems like HOL, Isabelle, Mizar can be strengthened to this case
Why not also strengthen the Univalence Axiom in this way?
Let A be the graph consisting of a single infinite beaded chain, or more concretely, the integers under adjacency. That is, A=(Z,~), where n~m just in case they differ by exactly one.
And let B consist of two (or more) disconnected copies of A.
It is easy to see that the ultrapower of either A or B by any nonprincipal ultrafilter on a countable index set consists of continuum many such beaded chains.
Thus, the structures A and B have isomorphic ultrapowers, and so they are elementarily equivalent by Keisler-Shelah, even though they are clearly not isomorphic (A is connected, B is not).
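The same example in display form (restating the above, nothing added):

% The beaded-chain example
\[
  A = (\mathbb{Z}, \sim), \qquad n \sim m \iff |n - m| = 1, \qquad B = A \sqcup A
\]
\[
  A \not\cong B \ \text{($A$ connected, $B$ not)},
  \qquad \text{yet} \qquad
  A^{\mathcal{U}} \cong B^{\mathcal{U}},
  \ \text{hence} \
  A \equiv B
\]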
Another example: two pseudofinite fields with the same absolute numbers (i.e., the same relative algebraic closure of the prime field) are elementarily equivalent.
I hope I have now introduced another schism in type theory:
Surely a pure-blooded constructivist/type-theorist cannot accept a model-theoretic argument based on ZFC, large cardinals, and non-principal ultrafilters?
Thanks for your attention!
CICM 2014 (Coimbra) - Invited talks: Eric Weisstein (Wolfram), Yves Bertot (INRIA), Freek Wiedijk (RU Nijmegen), Herbert Van de Sompel (Los Alamos Nat. Lab.), Antonio Leal Duarte (U. Coimbra), Jaime Carvalho e Silva (U. Coimbra)
QED+20, July 18 2014 (between ITP and IJCAR): Twenty Years of the QED Manifesto
Invited talks: Michael Beeson (San Jose State U.), Georges Gonthier (Microsoft Research), Adam Grabowski (U. of Bialystok), John Harrison (Intel), Gerwin Klein (NICTA), Magnus Myreen and Ramana Kumar (U. of Cambridge), Claudio Sacerdoti Coen (U. of Bologna)
Please come, in large cardinalities!