coq-club AT inria.fr
Subject: The Coq mailing list
List archive
- From: Lasse Blaauwbroek <lasse AT blaauwbroek.eu>
- To: coq-club AT inria.fr
- Subject: [Coq-Club] Announcing Graph2Tac, a prover based on Tactician's new API
- Date: Fri, 12 Jan 2024 16:23:27 +0100
- Authentication-results: mail3-smtp-sop.national.inria.fr; spf=None smtp.pra=lasse AT blaauwbroek.eu; spf=Pass smtp.mailfrom=lasse AT blaauwbroek.eu; spf=None smtp.helo=postmaster AT mail.blaauwbroek.eu
- Ironport-data: A9a23:hjM6qqrB2lpGU5HQmxLM6AzYuWJeBmJjYRIvgKrLsJaIsI4StFCzt garIBmFbvmNZ2SnKt8lbtyz9EMBsJHWzoc2S1Y4pSgyESoX8ePIVI+TRqvSF3PLf5ebFCqLz O1HN4KedJhsJpP4jk3wWlQ0hSAkjclkfpKlVKiefHoZqTZMEE8JkQhkl/MynrlmiN24BxLlk d7pqqUzAnf8s9JPGjxSs/7rRC9H5qyo5GtB5wZmP5ingXeH/5UrJMJHTU2OByCgKmVkNrbSb /rOyri/4lTY838FYj9yuuuTnuUiG9Y+DCDW4pZkc/DKbitq+kTe5p0G2M80Mi+7vdkmc+dZk 72hvbToIesg0zaldO41C3G0GAkmVUFKFSOuzdFSfqV/wmWfG0YAzcmCA2k3DZEy+MdSOFtR+ Mc6cWsNVz241++5lefTpulE3qzPLeHuNYIb/HtkxDjEEvw8Rp3ABajXjTNa9G5g24YUQbCEP ZJfNmMHgBfoO3WjPn8TAYklkc+ignD2biJSslWYpuw6/gA/ySQujua2YIqOJrRmQ+1nphum9 jyd9F2nXB8WM4fG1QG/yH+z07qncSTTA9pJSOPpqJaGmma7zWsKTRYSSFGTuui8kkf4WtRFK kVS9DBGkEQp3EmiT927VRmxqWObtAQbVtkWHvBSBByxJrT84jaFLGolawd7d9EdlpZtRQUH7 Q+EkIa8bdBwi4F5WU5x55/N8mvvYXZJczVdDRLoWzfp9PHFhOkOYvPzojVLSvTdYiXdQ22Y/ txzhHFWanV6pZdjO1+H1V7GmSmwgZPCUxQ44A7aNkr8sVskPND7PtT2sgWFhRqlEGp/ZgfZ1 JTjs5XOhN3i8bnWzXzlrBglRez3vqjt3MP03wQ3d3Xey9hd0yX7JtwOsW8WyLZBLscAcDaha gnN/2tsCGx7YhOXgG4eS9zZNvnGOoC7S4m6B6qFMoMQCnWzHSfelBxTiYer9ziFuCARfWsXY s3znR+EXCZAW5d0hiG7Xfkc2rINzyUzjzGbD5PiwhjtlfLUaHeJQP1XeBGDf8Io3pOi+Q/1y td4M9fV6hN9VOalXDLb37ROJn83LF86J6vMleppSsC5LDFLJkQdGt7K4LZ4e4Vaj6VfzejJ2 XemW35n8ln0hFyZCADTanlcd67eWIZ/nEIFZh0qH0ia5GM+U7aS7YM0VZg+TZ84/sNNkN92S PgkfZ2bI/JtEz7oxRUUXaPfnqdDKiu5oB2oBDW0RgQ/c7pLZR37yvW9ciTBrCAxXzeK7+0gq Lie5ybnaJskRTU6Kv3JafiqnmiDjVJElM1cB0L3c8RuImPy+41XKgv0vP89A+cIDT7hnjK69 QKnMS00lNn3gb0e0Yf23PifjoKTDeFBMFJQHDDb4ZaIJCDqxDef7rEaYtmYXwL2dT3Sw7qjV 9V33vunEfwgnXR2iaRePYtv75oD44rIm+cH4CViRG7Gfna6OINGe3Om5/RChodJ571euDa1Z H6xx8lnCe23H/3hQXEsJ1sDT+Wc1PsrtCHYwtYrLW7buiJm3rq1fn9DHhuLiR1YPLtEAL1+8 OEDpvwp8BGNuicrFv2kjSlk0XuGAVJdcqcgt7AcWJTKjChywH58QJXsMA3Ey7DRVMdta24Ee iS1gojGjJRinnvySWI5TyXx7LAMlKYwtwBv535cAVaww/7upOI9hT9V+hQJFjVl9A1Nib9PC zI6Jn9OBPu8+hlzj5J+REGqIQZKASOZ9mHXy1ckkG74TVGiZlfSLV8SaPq8w0QEz11yJjRr3 qmU6GLAYwbYeMvc2igTW0k8j9fBSdd31BPJmeH5PsCjMqQ5XwHYgf6VVTJVkyfkPMI/v12Yh O9I+O0rV7b3GxRNqIIGCq6b948qdja6GEJ4T8pMxpg5RVPnRGnq2BylCVyARcdWFvmbrW66E 5NPI+xMZTSf1QGPjDERKoAML494nfUstd4La+7vKVUnqJqalCJi67jLxxj9hUgqYtRgqtk8I YXvbAC/EnScqH9Xum3VpuxGBzaIWsYFbwjCw+yFyuUFOJYduuVKc0tp8L+Lk1iKEQlgpTS4g RjiYvLI8ulc1ohcpYvgPaFdDQGSK9moduCp8hi2gutef+H0LsbCmAMEmGbJZz0ME+MqZO12s rCRvPrc/kDP5u82Wl+EvaixLfBC4MHqUddHNs7yEmJhohKDf83R+DoGxXGzLM1YsdFa5/T/f TCCVumLSYc3VetelVpvUAoPIyZFXu6zJu3lqDimpvuBNgkF3EaVZJm7/HvudidAejVOJ5T6D RTutu2z4sxD6r5BHwIAG+osFqoQzIUPgkf6X4aZWfik4miUbpeqv7LjkVwq7TjCF2aOCsH34 tTIW3ASsfh0VL7glLlkX05a53X7z0qRRcE6eksU5sF8kTe3DygLMIzx9L0YX4pMnHWaOI7QP VnwgahLNck5dT5AeBzh/975WQqcQOETUjs8yvrF4GvMAxqL6Ei87HeNO8uuD7qautcu8Q1/F ewjxw==
- Ironport-hdrordr: A9a23:sO7JxqtWaHw+eCKTT20fA1Mt7skDbdV00zEX/kB9WHVpm7+j9v xG/c5w6faaslossR0b6LO90cq7Lk80l6QZ3WB5B97LNzUO01HYTr2Kg7GD/9StIUPDytI=
- Ironport-phdr: A9a23:F64PUBWYTId0YKdfq/guhxR/J6HV8KzzXzF92vMcY1JmTK2v8tzYM VDF4r011RmVB9WdsawZwLWM+4nbGkU+or+580o+OKRWUBEEjchE1ycBO+WiTXPBEfjxciYhF 95DXlI2t1uyMExSBdqsLwaK+i764jEdAAjwOhRoLerpBIHSk9631+ev8JHPfglEnjWwba1xI Rmsswnct8kbjYRgJ6ov1xDEvmZGd+NKyGxnIl6egwzy6sCs8pB97i9eoegh98lOUaX7e6Q3U 7lVByk4Pm42+cPmqwDNQROA6XUAXGoWlAFIAxXe4xHhQpjxqCr6ufFj1yScIMb7UKo7WTWm7 6dsVR/olCIKPCM3/W3LlsB9ir9QrxW8qRxi2I7UeJ+aO+Zifq3TetMaQHBOXsdXVydcBo+xY I8CA+8HMO1FrYfyukEOoAOiCgevCu3gxTFHiGH206MnyOkuCx3K0RY6Et4SrnjZrtP4P7oSX +Cvy6nIyC3OYfVM1jfm84jHbA0qrOyIXb1qa8XR1EcuGBjfjlWXt4PlPC6e2+MJs2eB9OpgT vyghnU8pAF1uDeg3Nosi4fSiYIPzVDL6yZ5zJwrKt2lUUN2Z8OvH5RMuS+ALYR2Xt8iTH9yu CY80rALp5G2cigIxZko2hPRZeKKf5WU7x/+VeucLjl1iWx5dLy/iBi/80qtxOL8WMSo0FhHs DdJn9fQu30R2RLd5MyKR/1g9Umv3jaP0hrc6uBCIU0siKXbKoUhzaI3lpoOq0TDBTH5lF/5j K+McUUk5/Co5/zoYrXnup+cN5J0hRrkPqQvncyzGf43MhIBXmia5+Swzrrj/Vf2QLlSkvI2j LTZsIzAKcsHp6+1GxJZ3ps55xmiETimzswUnHcGLFJGeRKHl5bmNEzPIP/iF/u/hE6skDhzy /DJP73hHpXNLmXYn7v7Ybl97EtcxQwuxt5c/5JZEqwNLfHzV0PrqtDVAQU1PxKqz+r7Etlw1 J4SVXqMD6KdKq/erEOE6+w1L+WRZ4IYuSzxJ+Un6vPokHQ0h1sdfbSy3ZsLdn+4BOloI0SYY HXymtoNDWEHtRckQuPwkl2NSztTam6yX60i4jE7D5qrDYLZSYCshLyNxjy0HplMaW9cF1CNH 2znd5+DW/gSayKeONFukiEcWbigTY8uyw2uuRfnx7dkLufY4DMUuJ352NRr+eHfiQs++D5pA 8iF1mGCVWB0nmcGRz8s26B/pFRwylOZ3qhjmPxXC8de6OhNUgggL5Hc0uJ7BMvoWgLGYNiJS 0yqTcu7AT0pSNI+2McBY1xhFNW6khDDwy2qDqcIm7CTHpw77rrc32TtJ8Z603vJyK4hj0A/T sRTMW2mm7Vw+hPIB4/JlkWZj7yleb4d3C7L7meDzHCBsFtWUA5qAu35WiUUYVKTptDk7GvDS aWvAPIpKFhv08mHf49DccbkxX9CTf7+J97EZG+y0zO0DA2Ty5uGaIPjZnoXxiLQCw4JjlZAr j69KQEiC3L58CrlBzt0GAezMisEkMF7oXK/FQovyh2SKlZmz/yz8wIUgvqVT7US2KgFsWEvs WY8B06zivTRDdfIvA99ZONEe9po5F5czm/xvQh0N4G/Jbpli11Yfhkk91j22UBPA55b2dMvs GtsyQNzLayC11YUczqDxp3YMLneI3Tt9gqoZq2Q10yNmM2O9PIp7/I1407moBnvFkcm9CB/1 MJJ1nKH+pjQJA8bUJa3WEI+/gVlravdbyp76p68OWREF66yv3eC3tsoALBg0RO8Z5JEN7vCE gbuEsocDszoKeowmlHvYAhWdOZVvLU5Oc+rbZ7kkOaiIfpgkTS6jG9G/JE100SC8DB5Q/LJ2 JBNyu+R3w+OXTPxxFm7tcW/lYdBbDAUVm2xrEqsTIpYfbZ/VYwPAG6zPMetwdh9wZPwGjZZ+ FOlG1IayZqxYxPBJ1f53ABWyQEWuSn+yXb+lWAyyWtx6PXGj0msi6z4eREKO3BGXjxnhFboe s2viswCGVOvd04vnQek4kDzw+5aor5+Ji/dWxQtHWC+Imd8X6+3rrfHbdRI7cZitSxMS+mUa 1STQ6PiqQEd3iClEnYUl1VZP3m6/470mRB3kjfXKH9ot3fxc8V5zArA7sbbSP0X0ydMF0waw XHHQ1O7Odeu59CdkZzO5/u/W2yWXZpWaSD3zImEuUNX/EVSCAak17C2k9zjSk0h1DPjksJtT WPOpQr9ZY/i0+K7N/hmdw9mHg20581/E4B42ow+4fNYkXwTnY6Y1XADmG7uLt9B3q/9KncQD TIG2N/a5gH51VYrdyLUgdilEC/Hmo0+PpGzeSsO1zg47txWBavxjvQMhiZzrlei7ErQbfV7g jYB2K4r4X8ejfsOvVllxSGcD7YOWEhAaHW8ylLSs5bk9v0ROz/8FNr4nFBzlt2gEryY9wRVW XKiP4wnATc19cJndlTFzHz07IjgPtjWd9Ma8BOOwHKix6BYLow8kv0SiG9pI2X46DchwvUgg DRk2ZizoZeNMWJg/+S0H1QLU1+9L9NW4TzrgatEy4yS1pu0E71rHjwCQYTiV/WlE3QfqL60U mTGWC15oXCdF73FGAaZ40oztHPDHaegMHSPLWUYx9FvF1GNYVZSiwcOUHAmj4Y0Q0q0kdf5f h4ztVVzrhbo7wFBweVyO1zjX3fD8U22PywsRsHXKQoe50lU7kPRe6Ry98pVGCdVtt2kpQ2Jc SmAYhhQSHoOQgqCDkziOb+n4Z/B9fKZD6ywNamGZ7LGsuFYW/qSoPDnmoJ74zaBMNmONXh+H rU63ERER3VwB8XenX0GVSUWkyvHa8PTqg27/2V7qcW28fKjXwyKh8PHE7xJLdBm4Ay7m4+GP u+Uwix9ID9FyZkWwnLLjrUCnRYThyxoazixAOEAuCrKH8ey0udcCx8WbT82NdMdtvJhmFMRf 5SB0pWqj+0d7LZ9EVpOWF3/l9v8YMULJzr4L1bbHAOQM6zAIzTXwsbxaKf6SLtKjewSuQfj3 FTTW0LlIDmHkCHkEh61NuQZxiWSJwdTkIu5ex9wFmL5S9/lLBCmeowS73V+0fgvi3XGOHRJe yB7aF9Ip6aM4Dlwh/x+HypL6nNsMPaOgSGU7K/VN9xF1JkjSjQxnOVc7nMgzrJT5ywRX/15l hzZqdt2qk2nmO2CotKIeBBKoz9Wm4iRukhhf6jEpMAosZfs+RsM6SOaDx0DusRvENriuOZd1 4qX/EoWAD1G+tvJ4sEGAMXXbs+aYiJJDA==
- Ironport-sdr: 65a15973_Yu5mOVEZwT6OSH70LldJq4bDANP12TRRn15uywADhj5hEjc p10oTpIBscvUA/4oPSk1qg0okpwzqnZCubybpSg==
For a more rich text version of this announcement, please
refer to Coq's
Discourse.
We are pleased to announce Tactician's
API, a new AI interface for theorem proving, building on Tactician. This
includes a new graph-based dataset of over 520k definitions (of
which 260k are theorems) in 120 Coq packages, one of the largest
datasets for AI interactive theorem proving. We also present a new
state-of-the-art neural theorem proving solver Graph2Tac, designed
for proving theorems in Coq projects with new definitions not seen
during training.
The main contributions in this work are as follows:
- A novel method of calculating an internal representation of definitions and theorems, giving Graph2Tac a deeper semantic understanding of a proof state and which lemmas are appropriate for it. Graph2Tac can create representations of objects that were not seen during training, allowing it to perform well even on new Coq projects.
- One of the most comprehensive studies yet of various AI methods in interactive theorem proving including k-NN solvers, transformers, graph-based models, and hammers.
- An interface to Coq making it possible to train and connect your own custom machine learning models.
- A benchmarking system making it easy to give an apples-to-apples comparison to our work.
For Coq users, our neural models Graph2Tac
and Text2Tac, are available as part of Tactician and can be
run on a laptop (no GPU required). One can use Tactician's
`Suggest` command to suggest tactics, and `synth` tactic to find a
complete proof of a proof state.
The details are spelled out in these three papers:
- Graph2Tac: Learning
Hierarchical Representations of Math Concepts in Theorem proving
- The Tactician’s Web
of Large-Scale Formal Knowledge
- Hashing Modulo
Context-Sensitive α-Equivalence
Dataset:
Our dataset, which can be explored online,
faithfully represents the internal representation of Coq's
universe of mathematical knowledge as a single interconnected
graph. The visualization includes hierarchies
of modules, global
context information, definitions,
tactical
proofs, and tactic
proof transformations.
The dataset contains two representations. The text-based,
human-readable representation is useful for training language
models. The graph-based representation is designed such that two
terms are alpha-equivalent terms if and only if the forward
closure of their graphs is equivalent (bisimilar).
This allows us to merge alpha-equivalent subterms, heavily
compressing the dataset. Having a densely connected graph makes it
ideal for graph-based machine learning models. To support this
term-sharing, we introduce a novel graph-sharing
algorithm with `O(n log n)` complexity.
Graph2Tac:
In practical AI theorem proving, one of the main challenges is
handling new definitions and theorems not seen during training. We
want a model which can work online, adapting itself to users' new
projects in real time, so we train on one set of projects and test
on another set never seen during training. Our novel neural
theorem proving architecture, Graph2Tac, adds a new definition
task mechanism that improves theorems solved from 17.4% to 26.1%.
For example, even though our model has never before seen the
Poltac package, it can solve 86.7% of Poltac theorems, more than
any other Coq theorem prover in the literature, including
ProverBot9001 and CoqHammer.
Our definition task works by learning an embedding for each
definition and theorem in the training set, and then
simultaneously training a model to predict those learned
embeddings. At inference time, when we encounter a new definition
not seen during training, we use this definition task to compute
its embedding directly.
Our work contains one of the most extensive comparisons with other
proving methods, including CoqHammer, a baseline transformer, and
Tactician's built-in k-NN model. The transformer model is similar
to those used in GPT-f, PACT, Lisa, etc. The k-NN model is also an
online model, learning from proofs or recent theorems, and is
actually still an excellent model for short time periods, say one
minute, whereas Graph2Tac excels at the longer 10-minute mark.
Appendix H of
our paper
also provides an informal comparison with Proverbot9001 and CoqGym
family of solvers. We hope these comparisons will provide a lot of
discussion and move the field forward.
Tools for AI research We provide a lot of tools for AI
researchers who would like to compare with or build on our results
and for Coq developers who would like to build practical tools for
Coq users.
- Interaction
protocols: External agents can interface with Coq, either by
providing tactic predictions for Tactician’s search procedure for
the `synth` tactic, or by exploring the proof tree themselves
through the `Tactician Explore` command. Agents receive a full
copy of Coq’s internal kernel knowledge, which they can utilize to
make decisions.
- PyTactician:
A Python library to facilitate reading the dataset and interface
with Coq.
- Benchmarking:
Tools to evaluate the proving strength of agents on arbitrary Opam
packages. Benchmarks can be run on a laptop, a high-powered server
and even massively parallelized on a High Performance Computing
(HPC) cluster.
We would love to receive feedback both from Coq users and AI
researchers, including possible future collaborators!
Lasse Blaauwbroek, Jason Rute, Miroslav Olšák, Fidel Ivan
Schaposnik Massolo,
Jelle Piepenbrock, Vasily Pestun
- [Coq-Club] Announcing Graph2Tac, a prover based on Tactician's new API, Lasse Blaauwbroek, 01/12/2024
Archive powered by MHonArc 2.6.19+.