Skip to Content.
Sympa Menu

coq-club - Re: [Coq-Club] Automatically tracking down bugs in coq?

coq-club AT inria.fr

Subject: The Coq mailing list

List archive

Re: [Coq-Club] Automatically tracking down bugs in coq?


Chronological Thread 
  • From: Jason Gross <jasongross9 AT gmail.com>
  • To: Jonas Oberhauser <s9joober AT gmail.com>
  • Cc: coq-club <coq-club AT inria.fr>, David Poetzsch-Heffter <davidpoetzsch AT gmx.net>, Michael Backenköhler <m.backenkoehler AT gmail.com>
  • Subject: Re: [Coq-Club] Automatically tracking down bugs in coq?
  • Date: Thu, 11 Apr 2013 03:45:02 -0400

Hi,
Thanks, everyone, for the suggestions.

I've thrown together a bit of code at https://github.com/JasonGross/coq-bug-finder which does some basic brute-force reduction: it in-lines all of the imports, strips out all of the comment,s, removes all of the code after whichever line generates the bug, and then removes any lemmas, definitions, Ltacs, etc which aren't used anywhere after their definition.  (It doesn't yet remove notations, but adding that shouldn't be too hard.)  It then tries to replace definition bodies with [Admitted.], in a rather dumb way.
I'm currently in the process of testing it, but I'm not sure when I'll get around to adding more, so I figured I'd send it out to the list.  (Contributions from others welcome.)

-Jason


On Wed, Apr 10, 2013 at 3:21 PM, Jonas Oberhauser <s9joober AT gmail.com> wrote:
Am 10.04.2013 10:07, schrieb AUGER Cédric:

Le Tue, 9 Apr 2013 23:55:17 +0200,
Frédéric Besson <frederic.besson AT inria.fr> a écrit :

On 9 avr. 2013, at 22:14, Jason Gross <jasongross9 AT gmail.com> wrote:

Hi,
I am attempting to create minimal reproducing examples for some
bugs I have found in Coq.  Currently my strategy is to: 1. start
with the source file which displays the error in Coq 2. recursively
inline all of the files in my development which that file imports
3. rename things so that this bloated file displays the error
(e.g., if I have duplicate section names, I need to rename one of
them) 4. remove all lines of code which come after the line
displaying the error 5a. replace as many proofs, definition bodies,
etc. with [admit] as I can, while keeping the file compiling and
displaying the bug 5b. work backwards through the file, removing
any definitions, theorems, etc. which are not used later in the
file (and for which removing them does not make the error go away)
6. play around with the drastically smaller file, trying to further
eliminate things.

(I sometimes interleave 5a and 5b.)

I have created a python script that does step 2 for me
automatically (assuming that all [Require Import] commands span
only one line), and steps 3 and 4 do not take much time.
Currently, most of the manual drudge-work and time is spent in 5a
and 5b.  I am considering writing python scripts to automate these
for me, but before I do, I want to ask, does anyone know of any
existing scripts which do this?  (Alternatively, do people have
suggestions about how to better create such examples?)
This would be very handy indeed to have an automatic tool for
isolating bugs. I remember spending quite some time trimming down a
development to eventually come up with a ridiculously small buggy
input!

There is an algorithm/approach called delta debugging
http://www.st.cs.uni-saarland.de/dd/ which automates the process of
isolating a small buggy input by performing some kind of binary
search. I do not precisely remember the requirements for using this
approach and how generic the algorithm is. At the time, it looked
pretty neat to me…

Best,
--
Frédéric


Oh, and I never tried it, but you may consider using the "Separate
Extraction" command, as it will extract only the necessary parts. Of
course you do not want to extract your code, but that could be a good
start to select the smallest set of definitions you require for your
bug. (But all the Prop stuff, like lemmas won't be selected this way as
it lives in Prop and not Type).

Hi all,

I and some friends (in CC) recently programmed a prototype for something similar in and for python.
The idea is to monitor a python program until an exception occurs and then creating a unit test.
This test recreates the environment (i.e. global variables, function stack, parameters) of a python function call that reproduces the error, as close as possible to the line where the exception occurred (but not too close).
We were planning on using delta debugging to minimize the environment, but left it out due to time constraints (and, as I recall, some technical reason).
I can still fill you in on some of the details of delta debugging.

Delta debugging on an abstract level takes as arguments an ordered finite set "S", a function "inp" that maps ordered subsets of S to an input, and a function "test" that decides whether an input is ok or not.
The algorithm produces a smallest ordered subset S' of S such that test(inp(S')) returns "not ok" (i.e. the input generated from S' is not ok, and all ordered strict subsets of S' generate an input that is ok).
The only requirement for this I recall is that of monotonicity: for all ordered subsets S1 of S such that test(inp(S1)) returns "ok", all ordered subsets S2 of S1 must satisfy test(inp(S2)) also be "ok".
A naive implementation of delta debugging requires that test(inp(S)) be "not ok" but that can be easily circumvented.

The hard part is finding the right level of abstraction for the problem and finding a good "test" function.
As an example, you want to debug a function working on HTML. It takes a string as an input but for some inputs it throws an exception.
In this case, "test" should probably be "ok" when the algorithm runs through and "not ok" when the exception is thrown.
However, what should "test" do when another exception is thrown? Much care needs to be taken to not violate the monotonicity constraint. As an example, the function could throw a "syntax is wrong" exception.
If this is considered as "not ok", then you could have one input which is ok and a smaller input which is not syntactically correct and thus breaks the constraint.
There are many options for S, too.
One could use the ordered list of characters in the HTML string.
One could first remove all strings and comments from the HTML string, and then use the ordered list of characters.
One could tokenize the HTML string and use the parsed tokens.
The first creates a rather large set, the second and third might abstract away from information that is required for the bug.
For example if our function throws the exception when the HTML string contains more than one comment the second option will not work.
If the excpetion is thrown when a string contains the two letters "ab", the third option will create something that is too big (namely it will not minimize the string).

In our example from above, S would have been the set of pairs of global variables and their values in the environment.
inp would create a python program that assigns every variable its value, then calls the function we want to debug
test would say "ok" if a variable is referenced in that python program that is not assigned, and "not ok" otherwise.
Delta debugging would thus find the minimum set of global variables such that the python program "runs".

Ok this became a long e-mail, I hope this is helpful in any way.

Kind regards, Jonas Oberhauser




Archive powered by MHonArc 2.6.18.

Top of Page