coq2html does not like Unicode
jeromesimeon opened this issue · 3 comments
jeromesimeon commented
coq2html gets confused when unicode characters are used in identifiers. Here is a small example
Inductive ωtruth :=
| ωtrue : ωtruth
| ωfalse : ωtruth
| otrue : ωtruth
| ofalse : ωtruth.
Definition otruth := ωtruth.
Reserved Notation "'¬' t" (at level 50).
Definition ωneg (ω:ωtruth) : ωtruth :=
match ω with
| ωtrue => ωfalse
| ωfalse => ωtrue
| otrue => ofalse
| ofalse => otrue
end.
Notation "'¬' x" := (ωneg x).
Definition oneg := ωneg.
Lemma ωdoubleneg :
forall (ω:ωtruth), ¬(¬ω) = ω.
Proof.
destruct ω; reflexivity.
Qed.
Lemma odoubleneg :
forall (o:otruth), oneg (oneg o) = o.
Proof.
destruct o; reflexivity.
Qed.
The o
-prefixed identifiers are handled properly but not the ω
-prefixed ones.
xavierleroy commented
I'm afraid no attention to Unicode was paid so far in the implementation of coq2html! Indeed, the regexp for identifiers
Line 274 in d68b757
jeromesimeon commented
I don't recall how ocamllex does with unicode. I had a very good experience with Alain Frish's sedlex https://github.com/alainfrisch/sedlex
xavierleroy commented
ocamllex doesn't know anything about Unicode but it might be able to do something with the UTF-8 encoding of Unicode. I, too, was thinking of ulex and its successor sedlex.