Commit ebb9ca50 authored by Peter Schwabe's avatar Peter Schwabe
Browse files

Updated the "Lessons learned" paragraph

parent 4a863b49
......@@ -92,25 +92,40 @@ above will soon be integrated in a new version of the library.
% As a result we do not believe the metric person-month to be
% a good representation of the verification effort.
\subheading{Lessons learned.} The effort to verify an existing code base is
significantly harder than synthesizing a proven by construction piece of software.
This difficulty is additionally increased by not having the freedom to modify
the original code, and by the low-level optimization applied in it.
This often requires to write functions that mimic the behavior of the C
code before proving multi-level equivalences to reach the desired level of specifications.
VST provides on one hand a large set of lemmas, and on the second hand tactics to use them.
If a lemma is directly applied, it generates a multiple sub-goals with a large set of dependent existential variables.
The tactics provided try to resolve those, and aim to simplify the workload of its user.
In an ideal world, the user does not need to know the lemmas applied under the hood and can just rely on those tactics.
Unfortunately, there were instances where those were not helping
% (\eg applying unnecessary substitutions, unfolding, exploding the size of our current goal; or simply failing),
at such moment, making it necessary to look into the VST code base and search for the right lemma.
Furthermore, the VST being an academic software, it is very hard to work with a tool
without being involved in the development loop. Additionally newer versions often broke
some of our proofs and it was often needed to adapt to the changes. That being said,
as we reported our bugs and struggles to the development team, since then the toolchain improved a lot.
\subheading{Lessons learned.}
Most efforts in the area of high-assurance crypto are carried out
by teams who at the same time work on tools and proofs and often
even co-develop the implementations with the proofs.
In this effort we set out to verify pre-existing software,
written in a not particularly verification-friendly language
using a set of tools (VST and Coq) whose development we are not
actively involved in.
TweeNaCl comes with a claim of verifiability, but it became clear
rather quickly that this claim is only based on the overall simplicity
of the library and not supported by code written carefully such that it can
efficiently be verified with existing tools. The difference between
our verified version of TweetNaCl and the original TweetNaCl in
Appendix~\ref{verified-C-and-diff} gives an idea of some minimal
changes we had to apply to work with VST; many more small changes
would have made the proof much easier and more elegant. As one
example, \todo{Add example here}.
There were many positive lessons to be learned from this verification effort;
most importantly that it is possible to prove ``legacy'' cryptographic
software written in C correct without having to co-develop proofs
and tools. However, we also learned that it is still necessary to
understand to some extent how these tools (in particular VST)
work under the hood.
VST is a collection of lemmas and proof tactics; the idea is
to expose the user only to the tactics and hide the details of
the underlying lemmas. At least in the VST versions we worked
with, this approach did not quite work and at various stages in
the proofs we had to look into the underlying lemmas \todo{Give an example here? Why did we have to do this?}.
Some struggle with VST also taught us another very pleasant lesson,
namely that the VST development team is very responsive and helpful.
Various of our issues were sorted out with their help and we hope
that some of the experience we brought in also helped improve VST.
\subheading{Extending our work.}
The high-level definition (\sref{sec:maths}) can easily be ported to any
......@@ -59,10 +59,20 @@ This paper presents two formal proofs. The first links a C implementation of the
This works provides a rare fully formalized proof linking a mathematical description of a cryptographic construction to a concrete widely-used C implementation. Bridging this gap with a theorem prover requires expertise in the underlying mathematics, in formal verification, and in cryptographic engineering. We need more papers like this to show how this can be done.
Section 3 describes how RFC 7748 is formalized in Coq. The functions "RFC" and "montgomery\_rec\_swap" are interesting and well documented. It is also useful to observe that the spec needs to take care of low-level details like the little-endian encoding of field elements and points, but the functions ZofList/ListofZ32 don't provide much value here. I would recommend just describing them in text and ending the section with the encode/decode functions.
The ZofList/listofZ32 functions are quite central in our correctness proofs
of arithmetic operations, so decided to keep them in the main body of the paper.
Section 4 shows that the TweetNaCl C code meets the Coq spec of RFC 7748. While details of the proof are interesting to formalists, it would be nice to provide a summary of the proof structure and the workflow. For example: (1) Prove that the code is memory-safe, (2) prove that the field arithmetic functions are correct, (3) prove that the add and double functions meet the RFC spec, (4) prove that the montgomery ladder correctly implements the RFC spec. It would also be useful for readers not familiar with Coq or VST to know which of these steps are "easy" and which of them usually take more time and effort.
\todo{Can we answer this one? Would be good to say something.}
Even though you don't find any bugs in the C code it would be useful for the reader if you could describe the kinds of bugs you would have found. For example, there is a history of carry propagation bugs in X25519 code (both in C and in assembly). You could illustrate one of these bugs and show how the VST/Coq proof would be able to find it.
See below (answer to the first requested change).
On page 23, you say "Constant-timeness is not a property verifiable with VST. This is not verifiable with our framework." Please say this prominently in Section 1. When readers sees your montgomery ladder, and sees the constant-time CSWAP function, they may well assume that you are verifying constant-timeness for the implementation. Stating that this is not the case early will help avoid misunderstanding. I recommend that you also cite other (complementary) tools that could be used just to verify constant-timeness for the code. It may be even better if you were to demonstrate the use of some such tool on the TweetNaCl code.
......@@ -71,7 +81,7 @@ On page 23, you say "Constant-timeness is not a property verifiable with VST. Th
I enjoyed Section 5 and I believe it is one of the more important (and reusable) parts of this work. However, the way it is presented reads like a long list of Lemmas and Theorems with hardly any explanation or motivation in between. By the end of this section, the reader has read 20 lemmas and definitions in a mixture of mathematical and Coq syntax. I recommend that the authors pick and choose 5 key lemmas and explain them, leaving the rest to an appendix. The high-level structure of the proof and some examples of "interesting" or "tricky" cases is the best they can hope to communicate in the body of the paper.
We reduced the amount of technical details in Section 5.
We reduced the amount of technical details in Section 5 and focused on the aspects that we consider particularly interesting.
......@@ -84,6 +84,13 @@ Let me start by stating that I’m no expert but rather an enthusiast hoping to
- One major concern I have is that this only proves the C level implementation. C must get translated to assembly, which seems like a lot of work to prove equivalence for. Also, some design decisions, such as changing the representations, seem to be problematic. What happens if this code is compiled on a 32-bit machine? Does that invalidate the proof? If not why not?
\textbf{Evaluation}: The proof is in the result, however, it would be interesting from a systems security perspective to know how long it took to specify. How long does it take to run? How do your decisions for representation impact performance? Can you show a graph depicting these costs? If I want to use a similar strategy, what should I avoid or try to do? What lessons learned? How easy will it be to use this in a different proof? My experience has been that proof reuse is a long way from being practical.
We find it very hard to estimate what time it took to write the proof.
Also, even if we were able to count the hours we spent, this would not really say much about
how much it would take us \emph{now} to undertake a similar effort with the same toolchain
and approach -- as we now have more experience and also because the toolchain has improved
significantly since we started working on this, it would certainly take much less time.
\textbf{Nits}: Maybe move the TCB into its own section (maybe section 3) title Assumptions/Threat Model?
......@@ -32,6 +32,9 @@ Presents a complete proof chain from mathematical specification to the compiled
\item I couldn't get provided proofs to build – reproducibility issue!
See our answer to the requested changes below.
\subheading{===== Detailed comments for authors =====}
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment