coq-club AT inria.fr

Subject: The Coq mailing list

List archive

Re: [Coq-Club] Unicode vs. homographic attacks vs. Coq trust

From: roux cody <cody.roux AT gmail.com>
To: coq-club AT inria.fr
Subject: Re: [Coq-Club] Unicode vs. homographic attacks vs. Coq trust
Date: Mon, 7 Mar 2016 15:42:45 -0500
Authentication-results: mail2-smtp-roc.national.inria.fr; spf=None smtp.pra=cody.roux AT gmail.com; spf=Pass smtp.mailfrom=cody.roux AT gmail.com; spf=None smtp.helo=postmaster AT mail-ob0-f169.google.com
Ironport-phdr: 9a23:Ni7cbRZ8cEmdAgQC56lUzUP/LSx+4OfEezUN459isYplN5qZpc++bnLW6fgltlLVR4KTs6sC0LqJ9fC5EjVbv96oizMrTt9lb1c9k8IYnggtUoauKHbQC7rUVRE8B9lIT1R//nu2YgB/Ecf6YEDO8DXptWZBUiv2OQc9HOnpAIma153xjLDtvc2KKFwT2nKUWvBbElaflU3prM4YgI9veO4a6yDihT92QdlQ3n5iPlmJnhzxtY+a9Z9n9DlM6bp6r5YTGfayQ6NtRrtBST8iLmod5cvxtBCFQxHcyGEbVzAMlQdYSwPC4ACyCpz2vjq8rO1gyAGVOMT3SfY/XjH0vPQjcwPhlCpSb21xy2rQkMEl1K8=

One could argue that Coq doesn't really have a duty to defend against such "malicious provers" at the cost of convenience for well meaning users. This issue comes up from time to time on the list, but I think it creates a huge burden on the tool. There are tons of ways to create these "lying proofs" already:

https://www.cs.ru.nl/~freek/talks/rap.pdf

I guess that this isn't really consistent with the notion of "Pollack-consistency", but I feel it's pragmatically justified.

I would *love* to see an "obfuscated Coq code contest" though, where users would attempt to "cheat" formalization of some difficult theorem by such tricks or crafting a deceitfully trivial main theorem.

In practice of course, it would be good practice to avoid homographs in developments.

On Mon, Mar 7, 2016 at 2:15 PM, Jonathan Leivent <jonikelee AT gmail.com> wrote:

Another take on the issue of how best to get Unicode into proof scripts is: maybe sometimes it should not be allowed?

Consider the problem of homographic attacks (as with: https://en.wikipedia.org/wiki/IDN_homograph_attack).

Could this become an issue with trust in the case of Coq vs. the use of possibly homographic Unicode characters in Coq scripts?

Or perhaps Coq should perform some kind of Unicode normalization (see: https://en.wikipedia.org/wiki/Unicode_equivalence)?

One could argue that the issue already exists in many ASCII fonts that obscure the difference between "0" and "O", or between "1", "I" and "l". But, at least in those cases, one can choose a font that renders these characters sufficiently distinctly so that human readers (and OCR devices) are less likely to confuse them. With Unicode, there are too many similarities and near-similarities, even among mathematical symbols. A similar issue occurs when there are distinct input methods (such as Company-Coq's auto-folding of "<=" vs. latex folding of "\leq") that leave the underlying representation different while producing the same appearance.

-- Jonathan

[Coq-Club] Unicode vs. homographic attacks vs. Coq trust, Jonathan Leivent, 03/07/2016
- Re: [Coq-Club] Unicode vs. homographic attacks vs. Coq trust, roux cody, 03/07/2016
- Re: [Coq-Club] Unicode vs. homographic attacks vs. Coq trust, Clément Pit--Claudel, 03/07/2016