Skip to Content.
Sympa Menu

coq-club - [Coq-Club] Announcing Graph2Tac, a prover based on Tactician's new API

coq-club AT inria.fr

Subject: The Coq mailing list

List archive

[Coq-Club] Announcing Graph2Tac, a prover based on Tactician's new API


Chronological Thread 
  • From: Lasse Blaauwbroek <lasse AT blaauwbroek.eu>
  • To: coq-club AT inria.fr
  • Subject: [Coq-Club] Announcing Graph2Tac, a prover based on Tactician's new API
  • Date: Fri, 12 Jan 2024 16:23:27 +0100
  • Authentication-results: mail3-smtp-sop.national.inria.fr; spf=None smtp.pra=lasse AT blaauwbroek.eu; spf=Pass smtp.mailfrom=lasse AT blaauwbroek.eu; spf=None smtp.helo=postmaster AT mail.blaauwbroek.eu
  • Ironport-data: A9a23:hjM6qqrB2lpGU5HQmxLM6AzYuWJeBmJjYRIvgKrLsJaIsI4StFCzt garIBmFbvmNZ2SnKt8lbtyz9EMBsJHWzoc2S1Y4pSgyESoX8ePIVI+TRqvSF3PLf5ebFCqLz O1HN4KedJhsJpP4jk3wWlQ0hSAkjclkfpKlVKiefHoZqTZMEE8JkQhkl/MynrlmiN24BxLlk d7pqqUzAnf8s9JPGjxSs/7rRC9H5qyo5GtB5wZmP5ingXeH/5UrJMJHTU2OByCgKmVkNrbSb /rOyri/4lTY838FYj9yuuuTnuUiG9Y+DCDW4pZkc/DKbitq+kTe5p0G2M80Mi+7vdkmc+dZk 72hvbToIesg0zaldO41C3G0GAkmVUFKFSOuzdFSfqV/wmWfG0YAzcmCA2k3DZEy+MdSOFtR+ Mc6cWsNVz241++5lefTpulE3qzPLeHuNYIb/HtkxDjEEvw8Rp3ABajXjTNa9G5g24YUQbCEP ZJfNmMHgBfoO3WjPn8TAYklkc+ignD2biJSslWYpuw6/gA/ySQujua2YIqOJrRmQ+1nphum9 jyd9F2nXB8WM4fG1QG/yH+z07qncSTTA9pJSOPpqJaGmma7zWsKTRYSSFGTuui8kkf4WtRFK kVS9DBGkEQp3EmiT927VRmxqWObtAQbVtkWHvBSBByxJrT84jaFLGolawd7d9EdlpZtRQUH7 Q+EkIa8bdBwi4F5WU5x55/N8mvvYXZJczVdDRLoWzfp9PHFhOkOYvPzojVLSvTdYiXdQ22Y/ txzhHFWanV6pZdjO1+H1V7GmSmwgZPCUxQ44A7aNkr8sVskPND7PtT2sgWFhRqlEGp/ZgfZ1 JTjs5XOhN3i8bnWzXzlrBglRez3vqjt3MP03wQ3d3Xey9hd0yX7JtwOsW8WyLZBLscAcDaha gnN/2tsCGx7YhOXgG4eS9zZNvnGOoC7S4m6B6qFMoMQCnWzHSfelBxTiYer9ziFuCARfWsXY s3znR+EXCZAW5d0hiG7Xfkc2rINzyUzjzGbD5PiwhjtlfLUaHeJQP1XeBGDf8Io3pOi+Q/1y td4M9fV6hN9VOalXDLb37ROJn83LF86J6vMleppSsC5LDFLJkQdGt7K4LZ4e4Vaj6VfzejJ2 XemW35n8ln0hFyZCADTanlcd67eWIZ/nEIFZh0qH0ia5GM+U7aS7YM0VZg+TZ84/sNNkN92S PgkfZ2bI/JtEz7oxRUUXaPfnqdDKiu5oB2oBDW0RgQ/c7pLZR37yvW9ciTBrCAxXzeK7+0gq Lie5ybnaJskRTU6Kv3JafiqnmiDjVJElM1cB0L3c8RuImPy+41XKgv0vP89A+cIDT7hnjK69 QKnMS00lNn3gb0e0Yf23PifjoKTDeFBMFJQHDDb4ZaIJCDqxDef7rEaYtmYXwL2dT3Sw7qjV 9V33vunEfwgnXR2iaRePYtv75oD44rIm+cH4CViRG7Gfna6OINGe3Om5/RChodJ571euDa1Z H6xx8lnCe23H/3hQXEsJ1sDT+Wc1PsrtCHYwtYrLW7buiJm3rq1fn9DHhuLiR1YPLtEAL1+8 OEDpvwp8BGNuicrFv2kjSlk0XuGAVJdcqcgt7AcWJTKjChywH58QJXsMA3Ey7DRVMdta24Ee iS1gojGjJRinnvySWI5TyXx7LAMlKYwtwBv535cAVaww/7upOI9hT9V+hQJFjVl9A1Nib9PC zI6Jn9OBPu8+hlzj5J+REGqIQZKASOZ9mHXy1ckkG74TVGiZlfSLV8SaPq8w0QEz11yJjRr3 qmU6GLAYwbYeMvc2igTW0k8j9fBSdd31BPJmeH5PsCjMqQ5XwHYgf6VVTJVkyfkPMI/v12Yh O9I+O0rV7b3GxRNqIIGCq6b948qdja6GEJ4T8pMxpg5RVPnRGnq2BylCVyARcdWFvmbrW66E 5NPI+xMZTSf1QGPjDERKoAML494nfUstd4La+7vKVUnqJqalCJi67jLxxj9hUgqYtRgqtk8I YXvbAC/EnScqH9Xum3VpuxGBzaIWsYFbwjCw+yFyuUFOJYduuVKc0tp8L+Lk1iKEQlgpTS4g RjiYvLI8ulc1ohcpYvgPaFdDQGSK9moduCp8hi2gutef+H0LsbCmAMEmGbJZz0ME+MqZO12s rCRvPrc/kDP5u82Wl+EvaixLfBC4MHqUddHNs7yEmJhohKDf83R+DoGxXGzLM1YsdFa5/T/f TCCVumLSYc3VetelVpvUAoPIyZFXu6zJu3lqDimpvuBNgkF3EaVZJm7/HvudidAejVOJ5T6D RTutu2z4sxD6r5BHwIAG+osFqoQzIUPgkf6X4aZWfik4miUbpeqv7LjkVwq7TjCF2aOCsH34 tTIW3ASsfh0VL7glLlkX05a53X7z0qRRcE6eksU5sF8kTe3DygLMIzx9L0YX4pMnHWaOI7QP VnwgahLNck5dT5AeBzh/975WQqcQOETUjs8yvrF4GvMAxqL6Ei87HeNO8uuD7qautcu8Q1/F ewjxw==
  • Ironport-hdrordr: A9a23:sO7JxqtWaHw+eCKTT20fA1Mt7skDbdV00zEX/kB9WHVpm7+j9v xG/c5w6faaslossR0b6LO90cq7Lk80l6QZ3WB5B97LNzUO01HYTr2Kg7GD/9StIUPDytI=
  • Ironport-phdr: A9a23:F64PUBWYTId0YKdfq/guhxR/J6HV8KzzXzF92vMcY1JmTK2v8tzYM VDF4r011RmVB9WdsawZwLWM+4nbGkU+or+580o+OKRWUBEEjchE1ycBO+WiTXPBEfjxciYhF 95DXlI2t1uyMExSBdqsLwaK+i764jEdAAjwOhRoLerpBIHSk9631+ev8JHPfglEnjWwba1xI Rmsswnct8kbjYRgJ6ov1xDEvmZGd+NKyGxnIl6egwzy6sCs8pB97i9eoegh98lOUaX7e6Q3U 7lVByk4Pm42+cPmqwDNQROA6XUAXGoWlAFIAxXe4xHhQpjxqCr6ufFj1yScIMb7UKo7WTWm7 6dsVR/olCIKPCM3/W3LlsB9ir9QrxW8qRxi2I7UeJ+aO+Zifq3TetMaQHBOXsdXVydcBo+xY I8CA+8HMO1FrYfyukEOoAOiCgevCu3gxTFHiGH206MnyOkuCx3K0RY6Et4SrnjZrtP4P7oSX +Cvy6nIyC3OYfVM1jfm84jHbA0qrOyIXb1qa8XR1EcuGBjfjlWXt4PlPC6e2+MJs2eB9OpgT vyghnU8pAF1uDeg3Nosi4fSiYIPzVDL6yZ5zJwrKt2lUUN2Z8OvH5RMuS+ALYR2Xt8iTH9yu CY80rALp5G2cigIxZko2hPRZeKKf5WU7x/+VeucLjl1iWx5dLy/iBi/80qtxOL8WMSo0FhHs DdJn9fQu30R2RLd5MyKR/1g9Umv3jaP0hrc6uBCIU0siKXbKoUhzaI3lpoOq0TDBTH5lF/5j K+McUUk5/Co5/zoYrXnup+cN5J0hRrkPqQvncyzGf43MhIBXmia5+Swzrrj/Vf2QLlSkvI2j LTZsIzAKcsHp6+1GxJZ3ps55xmiETimzswUnHcGLFJGeRKHl5bmNEzPIP/iF/u/hE6skDhzy /DJP73hHpXNLmXYn7v7Ybl97EtcxQwuxt5c/5JZEqwNLfHzV0PrqtDVAQU1PxKqz+r7Etlw1 J4SVXqMD6KdKq/erEOE6+w1L+WRZ4IYuSzxJ+Un6vPokHQ0h1sdfbSy3ZsLdn+4BOloI0SYY HXymtoNDWEHtRckQuPwkl2NSztTam6yX60i4jE7D5qrDYLZSYCshLyNxjy0HplMaW9cF1CNH 2znd5+DW/gSayKeONFukiEcWbigTY8uyw2uuRfnx7dkLufY4DMUuJ352NRr+eHfiQs++D5pA 8iF1mGCVWB0nmcGRz8s26B/pFRwylOZ3qhjmPxXC8de6OhNUgggL5Hc0uJ7BMvoWgLGYNiJS 0yqTcu7AT0pSNI+2McBY1xhFNW6khDDwy2qDqcIm7CTHpw77rrc32TtJ8Z603vJyK4hj0A/T sRTMW2mm7Vw+hPIB4/JlkWZj7yleb4d3C7L7meDzHCBsFtWUA5qAu35WiUUYVKTptDk7GvDS aWvAPIpKFhv08mHf49DccbkxX9CTf7+J97EZG+y0zO0DA2Ty5uGaIPjZnoXxiLQCw4JjlZAr j69KQEiC3L58CrlBzt0GAezMisEkMF7oXK/FQovyh2SKlZmz/yz8wIUgvqVT7US2KgFsWEvs WY8B06zivTRDdfIvA99ZONEe9po5F5czm/xvQh0N4G/Jbpli11Yfhkk91j22UBPA55b2dMvs GtsyQNzLayC11YUczqDxp3YMLneI3Tt9gqoZq2Q10yNmM2O9PIp7/I1407moBnvFkcm9CB/1 MJJ1nKH+pjQJA8bUJa3WEI+/gVlravdbyp76p68OWREF66yv3eC3tsoALBg0RO8Z5JEN7vCE gbuEsocDszoKeowmlHvYAhWdOZVvLU5Oc+rbZ7kkOaiIfpgkTS6jG9G/JE100SC8DB5Q/LJ2 JBNyu+R3w+OXTPxxFm7tcW/lYdBbDAUVm2xrEqsTIpYfbZ/VYwPAG6zPMetwdh9wZPwGjZZ+ FOlG1IayZqxYxPBJ1f53ABWyQEWuSn+yXb+lWAyyWtx6PXGj0msi6z4eREKO3BGXjxnhFboe s2viswCGVOvd04vnQek4kDzw+5aor5+Ji/dWxQtHWC+Imd8X6+3rrfHbdRI7cZitSxMS+mUa 1STQ6PiqQEd3iClEnYUl1VZP3m6/470mRB3kjfXKH9ot3fxc8V5zArA7sbbSP0X0ydMF0waw XHHQ1O7Odeu59CdkZzO5/u/W2yWXZpWaSD3zImEuUNX/EVSCAak17C2k9zjSk0h1DPjksJtT WPOpQr9ZY/i0+K7N/hmdw9mHg20581/E4B42ow+4fNYkXwTnY6Y1XADmG7uLt9B3q/9KncQD TIG2N/a5gH51VYrdyLUgdilEC/Hmo0+PpGzeSsO1zg47txWBavxjvQMhiZzrlei7ErQbfV7g jYB2K4r4X8ejfsOvVllxSGcD7YOWEhAaHW8ylLSs5bk9v0ROz/8FNr4nFBzlt2gEryY9wRVW XKiP4wnATc19cJndlTFzHz07IjgPtjWd9Ma8BOOwHKix6BYLow8kv0SiG9pI2X46DchwvUgg DRk2ZizoZeNMWJg/+S0H1QLU1+9L9NW4TzrgatEy4yS1pu0E71rHjwCQYTiV/WlE3QfqL60U mTGWC15oXCdF73FGAaZ40oztHPDHaegMHSPLWUYx9FvF1GNYVZSiwcOUHAmj4Y0Q0q0kdf5f h4ztVVzrhbo7wFBweVyO1zjX3fD8U22PywsRsHXKQoe50lU7kPRe6Ry98pVGCdVtt2kpQ2Jc SmAYhhQSHoOQgqCDkziOb+n4Z/B9fKZD6ywNamGZ7LGsuFYW/qSoPDnmoJ74zaBMNmONXh+H rU63ERER3VwB8XenX0GVSUWkyvHa8PTqg27/2V7qcW28fKjXwyKh8PHE7xJLdBm4Ay7m4+GP u+Uwix9ID9FyZkWwnLLjrUCnRYThyxoazixAOEAuCrKH8ey0udcCx8WbT82NdMdtvJhmFMRf 5SB0pWqj+0d7LZ9EVpOWF3/l9v8YMULJzr4L1bbHAOQM6zAIzTXwsbxaKf6SLtKjewSuQfj3 FTTW0LlIDmHkCHkEh61NuQZxiWSJwdTkIu5ex9wFmL5S9/lLBCmeowS73V+0fgvi3XGOHRJe yB7aF9Ip6aM4Dlwh/x+HypL6nNsMPaOgSGU7K/VN9xF1JkjSjQxnOVc7nMgzrJT5ywRX/15l hzZqdt2qk2nmO2CotKIeBBKoz9Wm4iRukhhf6jEpMAosZfs+RsM6SOaDx0DusRvENriuOZd1 4qX/EoWAD1G+tvJ4sEGAMXXbs+aYiJJDA==
  • Ironport-sdr: 65a15973_Yu5mOVEZwT6OSH70LldJq4bDANP12TRRn15uywADhj5hEjc p10oTpIBscvUA/4oPSk1qg0okpwzqnZCubybpSg==

For a more rich text version of this announcement, please refer to Coq's Discourse.

We are pleased to announce Tactician's API, a new AI interface for theorem proving, building on Tactician. This includes a new graph-based dataset of over 520k definitions (of which 260k are theorems) in 120 Coq packages, one of the largest datasets for AI interactive theorem proving. We also present a new state-of-the-art neural theorem proving solver Graph2Tac, designed for proving theorems in Coq projects with new definitions not seen during training.

The main contributions in this work are as follows:

  1. A novel method of calculating an internal representation of definitions and theorems, giving Graph2Tac a deeper semantic understanding of a proof state and which lemmas are appropriate for it. Graph2Tac can create representations of objects that were not seen during training, allowing it to perform well even on new Coq projects.
  2. One of the most comprehensive studies yet of various AI methods in interactive theorem proving including k-NN solvers, transformers, graph-based models, and hammers.
  3. An interface to Coq making it possible to train and connect your own custom machine learning models.
  4. A benchmarking system making it easy to give an apples-to-apples comparison to our work.

For Coq users, our neural models Graph2Tac and Text2Tac, are available as part of Tactician and can be run on a laptop (no GPU required). One can use Tactician's `Suggest` command to suggest tactics, and `synth` tactic to find a complete proof of a proof state.

The details are spelled out in these three papers:
- Graph2Tac: Learning Hierarchical Representations of Math Concepts in Theorem proving
- The Tactician’s Web of Large-Scale Formal Knowledge
- Hashing Modulo Context-Sensitive α-Equivalence

Dataset: Our dataset, which can be explored online, faithfully represents the internal representation of Coq's universe of mathematical knowledge as a single interconnected graph. The visualization includes hierarchies of modules, global context information, definitions, tactical proofs, and tactic proof transformations.

The dataset contains two representations. The text-based, human-readable representation is useful for training language models. The graph-based representation is designed such that two terms are alpha-equivalent terms if and only if the forward closure of their graphs is equivalent (bisimilar). This allows us to merge alpha-equivalent subterms, heavily compressing the dataset. Having a densely connected graph makes it ideal for graph-based machine learning models. To support this term-sharing, we introduce a novel graph-sharing algorithm with `O(n log n)` complexity.

Graph2Tac: In practical AI theorem proving, one of the main challenges is handling new definitions and theorems not seen during training. We want a model which can work online, adapting itself to users' new projects in real time, so we train on one set of projects and test on another set never seen during training. Our novel neural theorem proving architecture, Graph2Tac, adds a new definition task mechanism that improves theorems solved from 17.4% to 26.1%. For example, even though our model has never before seen the Poltac package, it can solve 86.7% of Poltac theorems, more than any other Coq theorem prover in the literature, including ProverBot9001 and CoqHammer.

Our definition task works by learning an embedding for each definition and theorem in the training set, and then simultaneously training a model to predict those learned embeddings. At inference time, when we encounter a new definition not seen during training, we use this definition task to compute its embedding directly.

Our work contains one of the most extensive comparisons with other proving methods, including CoqHammer, a baseline transformer, and Tactician's built-in k-NN model. The transformer model is similar to those used in GPT-f, PACT, Lisa, etc. The k-NN model is also an online model, learning from proofs or recent theorems, and is actually still an excellent model for short time periods, say one minute, whereas Graph2Tac excels at the longer 10-minute mark. Appendix H of
our paper also provides an informal comparison with Proverbot9001 and CoqGym family of solvers. We hope these comparisons will provide a lot of discussion and move the field forward.

Tools for AI research We provide a lot of tools for AI researchers who would like to compare with or build on our results and for Coq developers who would like to build practical tools for Coq users.

- Interaction protocols: External agents can interface with Coq, either by providing tactic predictions for Tactician’s search procedure for the `synth` tactic, or by exploring the proof tree themselves through the `Tactician Explore` command. Agents receive a full copy of Coq’s internal kernel knowledge, which they can utilize to make decisions.

- PyTactician: A Python library to facilitate reading the dataset and interface with Coq.

- Benchmarking: Tools to evaluate the proving strength of agents on arbitrary Opam packages. Benchmarks can be run on a laptop, a high-powered server and even massively parallelized on a High Performance Computing (HPC) cluster.

We would love to receive feedback both from Coq users and AI researchers, including possible future collaborators!

Lasse Blaauwbroek, Jason Rute, Miroslav Olšák, Fidel Ivan Schaposnik Massolo,
Jelle Piepenbrock, Vasily Pestun



  • [Coq-Club] Announcing Graph2Tac, a prover based on Tactician's new API, Lasse Blaauwbroek, 01/12/2024

Archive powered by MHonArc 2.6.19+.

Top of Page