Commit f5059abf by Peter Schwabe

### Some cleanup (removed unnecessary files etc.)

parent 8d9e5f7e
 \section{Curve25519 Implementation} \label{sec2-implem} In this section we present in detail the algorithm computing the crypto scalar multiplication. We then discuss which proofs are required. \subsection{Implementation} \label{sec:impl} \subsection{What needs to be proven?} Verifying \texttt{crypto\_scalarmult} also implies to verify all the functions subsequently called: \texttt{unpack25519}; \texttt{A}; \texttt{Z}; \texttt{M}; \texttt{S}; \texttt{car25519}; \texttt{inv25519}; \texttt{set25519}; \texttt{sel25519}; \texttt{pack25519}. We prove that the implementation of Curve25519 is \textbf{sound} \ie \begin{itemize} \item absence of access out-of-bounds of arrays (memory safety). \item absence of overflows/underflow on the arithmetic. \end{itemize} We also prove that TweetNaCl's code is \textbf{correct}: \begin{itemize} \item Curve25519 is correctly implemented (we get what we expect). \item Operations on \texttt{gf} (\texttt{A}, \texttt{Z}, \texttt{M}, \texttt{S}) are equivalent to operations ($+,-,\times,x^2$) in \Zfield. \item The Montgomery ladder does compute a scalar multiplication between a natural number and a point. \end{itemize} In order to prove the soundness and correctness of \texttt{crypto\_scalarmult}, we first create a skeleton of the Montgomery ladder with abstract operations which can be instanciated over lists, integers, field elements... A high level specification (over a generic field $\K$) allows use to prove the correctness of the ladder with respect to the curves theory. This high specification does not rely on the parameters of Curve25519. By instanciating $\K$ with $\Zfield$, and the parameters of Curve25519 ($a = 486662, b = 1$), we define a middle level specification. Additionally we also provide a low level specification close to the \texttt{C} code (over lists of $\Z$). We show this specification to be equivalent to the \textit{semantic version} of C (\texttt{CLight}) with VST. This low level specification gives us the soundness assurance. By showing that operations over instances ($\K = \Zfield$, $\Z$, list of $\Z$) are equivalent we bridge the gap between the low level and the high level specification with Curve25519 parameters. As such we prove all specifications to equivalent (Fig.\ref{tk:ProofStructure}). This garantees us the correctness of the implementation. \begin{figure}[h] \include{tikz/specifications} \caption{Structural construction of the proof} \label{tk:ProofStructure} \end{figure} \subsection{Correctness Specification} We show the soundness of TweetNaCl by proving the following specification matches a pure Coq function. This defines the equivalence between the Clight representation and a Coq definition of the ladder. \begin{CoqVST} Definition crypto_scalarmult_spec := DECLARE _crypto_scalarmult_curve25519_tweet WITH v_q: val, v_n: val, v_p: val, c121665:val, sh : share, q : list val, n : list Z, p : list Z (*------------------------------------------*) PRE [ _q OF (tptr tuchar), _n OF (tptr tuchar), _p OF (tptr tuchar) ] PROP (writable_share sh; Forall (fun x => 0 <= x < 2^8) p; Forall (fun x => 0 <= x < 2^8) n; Zlength q = 32; Zlength n = 32; Zlength p = 32) LOCAL(temp _q v_q; temp _n v_n; temp _p v_p; gvar __121665 c121665) SEP (sh [{ v_q }] <<(uch32)-- q; sh [{ v_n }] <<(uch32)-- mVI n; sh [{ v_p }] <<(uch32)-- mVI p; Ews [{ c121665 }] <<(lg16)-- mVI64 c_121665) (*------------------------------------------*) POST [ tint ] PROP (Forall (fun x => 0 <= x < 2^8) (CSM n p); Zlength (CSM n p) = 32) LOCAL(temp ret_temp (Vint Int.zero)) SEP (sh [{ v_q }] <<(uch32)-- mVI (CSM n p); sh [{ v_n }] <<(uch32)-- mVI n; sh [{ v_p }] <<(uch32)-- mVI p; Ews [{ c121665 }] <<(lg16)-- mVI64 c_121665 \end{CoqVST} In this specification we state as preconditions: \begin{itemize} \item[] \VSTe{PRE}: \VSTe{_p OF (tptr tuchar)}\\ The function \texttt{crypto\_scalarmult} takes as input three pointers to arrays of unsigned bytes (\VSTe{tptr tuchar}) \VSTe{_p}, \VSTe{_q} and \VSTe{_n}. \item[] \VSTe{LOCAL}: \VSTe{temp _p v_p}\\ Each pointer represent an address \VSTe{v_p}, \VSTe{v_q} and \VSTe{v_n}. \item[] \VSTe{SEP}: \VSTe{sh [{ v_p $\!\!\}\!\!]\!\!\!$ <<(uch32)-- mVI p}\\ In the memory share \texttt{sh}, the address \VSTe{v_p} points to a list of integer values \VSTe{mVI p}. \item[] \VSTe{PROP}: \VSTe{Forall (fun x => 0 <= x < 2^8) p}\\ In order to consider all the possible inputs, we assumed each elements of the list \texttt{p} to be bounded by $0$ included and $2^8$ excluded. \item[] \VSTe{PROP}: \VSTe{Zlength p = 32}\\ We also assumed that the length of the list \texttt{p} is 32. This defines the complete representation of \TNaCle{u8[32]}. \end{itemize} As Post-condition we have: \begin{itemize} \item[] \VSTe{POST}: \VSTe{tint}\\ The function \texttt{crypto\_scalarmult} returns an integer. \item[] \VSTe{LOCAL}: \VSTe{temp ret_temp (Vint Int.zero)}\\ The returned integer has value $0$. \item[] \VSTe{SEP}: \VSTe{sh [{ v_q $\!\!\}\!\!]\!\!\!$ <<(uch32)-- mVI (CSM n p)}\\ In the memory share \texttt{sh}, the address \VSTe{v_q} points to a list of integer values \VSTe{mVI (CSM n p)} where \VSTe{CSM n p} is the result of the \VSTe{crypto_scalarmult} over \VSTe{n} and \VSTe{p}. \item[] \VSTe{PROP}: \VSTe{Forall (fun x => 0 <= x < 2^8) (CSM n p)}\\ \VSTe{PROP}: \VSTe{Zlength (CSM n p) = 32}\\ We show that the computation for \VSTe{CSM} fits in \TNaCle{u8[32]}. \end{itemize} This specification shows that \VSTe{crypto_scalarmult} in C computes the same result as \VSTe{CSM} in Coq provided that inputs are within their respective bounds. By converting those array of 32 bytes into their respective little-endian value we prove the correctness of \VSTe{crypto_scalarmult} (Theorem \ref{CSM-correct}) in Coq (for the sake of simplicity we do not display the conversion in the theorem). \begin{theorem} \label{CSM-correct} For all $n \in \N, n < 2^{255}$ and where the bits 1, 2, 5 248, 249, 250 are cleared and bit 6 is set, for all $P \in E(\F{p^2})$, for all $p \in \F{p}$ such that $P.x = p$, there exists $Q \in E(\F{p^2})$ such that $Q = nP$ where $Q.x = q$ and $q$ = \VSTe{CSM} $n$ $p$. \end{theorem} A more complete description in Coq of Theorem \ref{CSM-correct} with the associated conversions is as follow: \begin{lstlisting}[language=Coq] Theorem Crypto_Scalarmult_Correct: forall (n p:list Z) (P:mc curve25519_Fp2_mcuType), Zlength n = 32 -> Zlength p = 32 -> Forall (fun x => 0 <= x /\ x < 2^8) n -> Forall (fun x => 0 <= x /\ x < 2^8) p -> Fp2_x (ZUnpack25519 (ZofList 8 p)) = P#x0 -> ZofList 8 (Crypto_Scalarmult n p) = (P *+ (Z.to_nat (Zclamp (ZofList 8 n)))) _x0. \end{lstlisting} Its proof is explained in the next section.
 \section{Representation and Reflections} \label{sec4-refl} In this section we describe techniques used to prove the equivalence between the Clight description of TweetNaCl and Coq functions producing similar behaviors. \subsection{Number Representation and C Implementation} As described in Section \ref{sec:impl}, numbers in \TNaCle{gf} are represented in base $2^{16}$ and we use a direct mapping to represent that array as a list integers in Coq. However in order to show the correctness of the basic operations, we need to convert this number as a full integer. \begin{definition} Let \Coqe{ZofList} : $\Z \rightarrow \texttt{list} \Z \rightarrow \Z$, a parametrized map by $n$ betwen a list $l$ and its it's little endian representation with a base $2^n$. \end{definition} We define it in Coq as: \begin{lstlisting}[language=Coq] Fixpoint ZofList {n:Z} (a:list Z) : Z := match a with | [] => 0 | h :: q => h + (pow 2 n) * ZofList q end. \end{lstlisting} We define a notation where $n$ is $16$. \begin{lstlisting}[language=Coq] Notation "Z16.lst A" := (ZofList 16 A). \end{lstlisting} We also define a notation to do the modulo, projecting any numbers in $\Zfield$. \begin{lstlisting}[language=Coq] Notation "A :GF" := (A mod (2^255-19)). \end{lstlisting} Remark that this representation is different from \Coqe{Zmodp}. However the equivalence between operations over $\Zfield$ and $\F{p}$ is easily proven. Using these two definitions, we proved intermediates lemmas such as the correctness of the multiplication \Coqe{M} where \Coqe{M} replicate the computations and steps done in C. \begin{lemma} For all list of integers \texttt{a} and \texttt{b} of length 16 representing $A$ and $B$ in $\Zfield$, the number represented in $\Zfield$ by the list \Coqe{(M a b)} is equal to $A \times B \bmod \p$. \end{lemma} And seen in Coq as follows: \begin{Coq} Lemma mult_GF_Zlength : forall (a:list Z) (b:list Z), Zlength a = 16 -> Zlength b = 16 -> (Z16.lst (M a b)) :GF = (Z16.lst a * Z16.lst b) :GF. \end{Coq} \subsection{Inversions in \Zfield} In a similar fashion we define a Coq version of the inversion mimicking the behavior of \TNaCle{inv25519} over \Coqe{list Z}. \begin{lstlisting}[language=Ctweetnacl] sv inv25519(gf o,const gf a) { gf c; int i; set25519(c,a); for(i=253;i>=0;i--) { S(c,c); if(i!=2 && i!=4) M(c,c,a); } FOR(i,16) o[i]=c[i]; } \end{lstlisting} We specify this with 2 functions: a recursive \Coqe{pow_fn_rev} to to simulate the for loop and a simple \Coqe{step_pow} containing the body. Note the off by one for the loop. \begin{lstlisting}[language=Coq] Definition step_pow (a:Z) (c g:list Z) : list Z := let c := Sq c in if a <>? 1 && a <>? 3 then M c g else c. Function pow_fn_rev (a:Z) (b:Z) (c g: list Z) {measure Z.to_nat a} : (list Z) := if a <=? 0 then c else let prev := pow_fn_rev (a - 1) b c g in step_pow (b - 1 - a) prev g. \end{lstlisting} This \Coqe{Function} requires a proof of termination. It is done by proving the Well-foundness of the decreasing argument: \Coqe{measure Z.to_nat a}. Calling \Coqe{pow_fn_rev} 254 times allows us to reproduce the same behavior as the \texttt{Clight} definition. \begin{lstlisting}[language=Coq] Definition Inv25519 (x:list Z) : list Z := pow_fn_rev 254 254 x x. \end{lstlisting} Similarily we define the same function over $\Z$. \begin{lstlisting}[language=Coq] Definition step_pow_Z (a:Z) (c:Z) (g:Z) : Z := let c := c * c in if a <>? 1 && a <>? 3 then c * g else c. Function pow_fn_rev_Z (a:Z) (b:Z) (c:Z) (g: Z) {measure Z.to_nat a} : Z := if (a <=? 0) then c else let prev := pow_fn_rev_Z (a - 1) b c g in step_pow_Z (b - 1 - a) prev g. Definition Inv25519_Z (x:Z) : Z := pow_fn_rev_Z 254 254 x x. \end{lstlisting} And prove their equivalence in $\Zfield$. \begin{lstlisting}[language=Coq] Lemma Inv25519_Z_GF : forall (g:list Z), length g = 16 -> (Z16.lst (Inv25519 g)) :GF = (Inv25519_Z (Z16.lst g)) :GF. \end{lstlisting} In TweetNaCl, \TNaCle{inv25519} computes an inverse in $\Zfield$. It uses the Fermat's little theorem by doing an exponentiation to $2^{255}-21$. This is done by applying a square-and-multiply algorithm. The binary representation of $p-2$ implies to always do a multiplications aside for bit 2 and 4, thus the if case. To prove the correctness of the result we can use multiple strategies such as: \begin{itemize} \item Proving it is special case of square-and-multiply algorithm applied to a specific number and then show that this number is indeed $2^{255}-21$. \item Unrolling the for loop step-by-step and applying the equalities $x^a \times x^b = x^{(a+b)}$ and $(x^a)^2 = x^{(2 \times a)}$. We prove that the resulting exponent is $2^{255}-21$. \end{itemize} We use the second method for the benefits of simplicity. However it requires to apply the unrolling and exponentiation formulas 255 times. This can be automated in Coq with tacticals such as \Coqe{repeat}, but it generates a proof object which will take a long time to verify. \subsection{Speeding up with Reflections} In order to speed up the verification, we use a technique called reflection. It provides us with flexibility such as we don't need to know the number of times nor the order in which the lemmas needs to be applied (chapter 15 in \cite{CpdtJFR}). The idea is to \textit{reflect} the goal into a decidable environment. We show that for a property $P$, we can define a decidable boolean property $P_{bool}$ such that if $P_{bool}$ is \Coqe{true} then $P$ holds. $$reify\_P : P_{bool} = true \implies P$$ By applying $reify\_P$ on $P$ our goal become $P_{bool} = true$. We then compute the result of $P_{bool}$. If the decision goes well we are left with the tautology $true = true$. To prove that the \Coqe{Inv25519_Z} is computing $x^{2^{255}-21}$, we define a Domain Specific Language. \begin{definition} Let \Coqe{expr_inv} denote an expression which is either a term; a multiplication of expressions; a squaring of an expression or a power of an expression. And Let \Coqe{formula_inv} denote an equality between two expressions. \end{definition} \begin{lstlisting}[language=Coq] Inductive expr_inv := | R_inv : expr_inv | M_inv : expr_inv -> expr_inv -> expr_inv | S_inv : expr_inv -> expr_inv | P_inv : expr_inv -> positive -> expr_inv. Inductive formula_inv := | Eq_inv : expr_inv -> expr_inv -> formula_inv. \end{lstlisting} The denote functions are defined as follows: \begin{lstlisting}[language=Coq] Fixpoint e_inv_denote (m:expr_inv) : Z := match m with | R_inv => term_denote | M_inv x y => (e_inv_denote x) * (e_inv_denote y) | S_inv x => (e_inv_denote x) * (e_inv_denote x) | P_inv x p => pow (e_inv_denote x) (Z.pos p) end. Definition f_inv_denote (t : formula_inv) : Prop := match t with | Eq_inv x y => e_inv_denote x = e_inv_denote y end. \end{lstlisting} All denote functions also take as an argument the environment containing the variables. We do not show it here for the sake of readability. Given that an environment, \Coqe{term_denote} returns the appropriate variable. With such Domain Specific Language we have the equality between: \begin{lstlisting}[backgroundcolor=\color{white}] f_inv_denote (Eq_inv (M_inv R_inv (S_inv R_inv)) (P_inv R_inv 3)) = (x * x^2 = x^3) \end{lstlisting} On the right side, \Coqe{(x * x^2 = x^3)} depends on $x$. On the left side, \texttt{(Eq\_inv (M\_inv R\_inv (S\_inv R\_inv)) (P\_inv R\_inv 3))} does not depend on $x$. This allows us to use computations in our decision precedure. We define \Coqe{step_inv} and \Coqe{pow_inv} to mirror the behavior of \Coqe{step_pow_Z} and respectively \Coqe{pow_fn_rev_Z} over our DSL and we prove their equality. \begin{lstlisting}[language=Coq] Lemma step_inv_step_pow_eq : forall (a:Z) (c:expr_inv) (g:expr_inv), e_inv_denote (step_inv a c g) = step_pow_Z a (e_inv_denote c) (e_inv_denote g). Lemma pow_inv_pow_fn_rev_eq : forall (a:Z) (b:Z) (c:expr_inv) (g:expr_inv), e_inv_denote (pow_inv a b c g) = pow_fn_rev_Z a b (e_inv_denote c) (e_inv_denote g). \end{lstlisting} We then derive the following lemma. \begin{lemma} \label{reify} With an appropriate choice of variables, \Coqe{pow_inv} denotes \Coqe{Inv25519_Z}. \end{lemma} In order to prove formulas in \Coqe{formula_inv}, we have the following a decidable procedure. We define \Coqe{pow_expr_inv}, a function which returns the power of an expression. We then compare the two values and decide over their equality. \begin{Coq} Fixpoint pow_expr_inv (t:expr_inv) : Z := match t with | R_inv => 1 (* power of a term is 1. *) | M_inv x y => (pow_expr_inv x) + (pow_expr_inv y) (* power of a multiplication is the sum of the exponents. *) | S_inv x => 2 * (pow_expr_inv x) (* power of a squaring is the double of the exponent. *) | P_inv x p => (Z.pos p) * (pow_expr_inv x) (* power of a power is the multiplication of the exponents. *) end. Definition decide_e_inv (l1 l2:expr_inv) : bool := (pow_expr_inv l1) ==? (pow_expr_inv l2). Definition decide_f_inv (f:formula_inv) : bool := match f with | Eq_inv x y => decide_e_inv x y end. \end{Coq} We prove our decision procedure correct. \begin{lemma} \label{decide} For all formulas $f$, if the decision over $f$ returns \Coqe{true}, then the denoted equality by $f$ is true. \end{lemma} Which is formalized as: \begin{Coq} Lemma decide_formula_inv_impl : forall (f:formula_inv), decide_f_inv f = true -> f_inv_denote f. \end{Coq} By reification to over DSL (lemma \ref{reify}) and by applying our decision (lemma \ref{decide}). we proved the following theorem. \begin{theorem} \Coqe{Inv25519_Z} computes an inverse in \Zfield. \end{theorem} \begin{Coq} Theorem Inv25519_Z_correct : forall (x:Z), Inv25519_Z x = pow x (2^255-21). \end{Coq} From \Coqe{Inv25519_Z_correct} and \Coqe{Inv25519_Z_GF}, we conclude the functionnal correctness of the inversion over \Zfield. \begin{corollary} \Coqe{Inv25519} computes an inverse in \Zfield. \end{corollary} \begin{Coq} Corollary Inv25519_Zpow_GF : forall (g:list Z), length g = 16 -> Z16.lst (Inv25519 g) :GF = (pow (Z16.lst g) (2^255-21)) :GF. \end{Coq} \subsection{Packing and other Applications of Reflection} We prove the functional correctness of \Coqe{Inv25519} with reflections. This technique can also be used where proofs requires some computing or a small and finite domain of variable to test e.g. for all $i$ such that $0 \le i < 16$. Using reflection we prove that we can split the for loop in \TNaCle{pack25519} into two parts. \begin{lstlisting}[language=Ctweetnacl] for(i=1;i<15;i++) { m[i]=t[i]-0xffff-((m[i-1]>>16)&1); m[i-1]&=0xffff; } \end{lstlisting} The first loop is computing the substraction while the second is applying the carrying. \begin{lstlisting}[language=Ctweetnacl] for(i=1;i<15;i++) { m[i]=t[i]-0xffff } for(i=1;i<15;i++) { m[i]=m[i]-((m[i-1]>>16)&1); m[i-1]&=0xffff; } \end{lstlisting} This loop separation allows simpler proofs. The first loop is seen as the substraction of a number in \Zfield. We then prove that with the iteration of the second loop, the number represented in \Zfield stays the same. This leads to the proof that \TNaCle{pack25519} is effectively reducing mod $\p$ and returning a number in base $2^8$. \begin{Coq} Lemma Pack25519_mod_25519 : forall (l:list Z), Zlength l = 16 -> Forall (fun x => -2^62 < x < 2^62) l -> ZofList 8 (Pack25519 l) = (Z16.lst l) mod (2^255-19). \end{Coq}
 \section{Trust in the proof and VST} \label{sec5-vst} Any formal system relies on a trusted base. In this section we describe our chain of trust bringing up to light some tripping points a user may encounter when using VST. \subsection{The Trusted Base} \subsection{Time and Space Complexity} Our work can be split into multiple parts. Bellow we provide an approximation of the number of lines of codes and thus an idea of the size of the proofs required to complete this work. \begin{itemize} \item The proof that the Montgomery ladder over a generic field $\K$ respects the elliptic curve theory and then it's specialization for $\F{p^2}$. This represent 4218 lines of code for 0.75 man-year. \item The multiple level definitions of the ladder and operations over lists, their proof of correctness and soundness and additionally that the match with the generic Montgomery ladder. This count for 22967 lines of code for about 1.5 man-year. \item The proof that our specification matches the Clight translation with VST. This count for 6934 lines of code for about 0.75 man-year. \end{itemize} While the proof with VST took slightly less time, having to provide proof that every intermediate computations slows down the process significantly. However it is necessary in order to garantee our the absences of overflows and underflows.
 # Proving correctness of TweetNaCl's Elliptic Curve implementation with Coq --------------------------------------------------------------------------- ## Abstract During his master's thesis Timmy Weerwag proved with Coq that the operations in TweetNaCl were correct with respect to the Ed25519 specifications. However, he did not provide any Coq proof of the correctness of the field arithmetic. The order of the field operations might be proven correct, but this does not garantees that they are actually correctly implemented. This work therefore aims to fill this gap. ## Modus Operandi In order to verify TweetNaCl implementation, we decided to use [VST][1] and the [Coq proof assistant][2]. +-------------+ | | | | | tweetNaCl.c | | | | | +------+------+ +---------- | --------------------------------+ | | clightgen code.c | | | | | v COQ | | +------+------+ VERIFIED | | | | ENCLAVE | | | | | | | tweetNaCl.v | | | | | | | | | | | +------+------+ | | ^ | | | Verified | | | Software | | | Toolchain | | v | | +------+------+ | | | | | | | | | | | verif_ | | | | tweetNaCl.v | | | | | | | | | | | +------+------+ | | ^ | | | | | v | | +------+------+ | | | | | | | | | | | Proofs.v | | | | | | | | | | | +-------------+ | +---------------------------------------------+ Using clightgen from [compcert][3], we can generate the *semantic version* of the C code. Using the Hoare-Floyd logic with VST we can show that the semantic of the program is equivalent to some functions in Coq. We can then prove that these *equivalent functions* do respect the semantic of the arithmetic. ## Number representation and operations The field elements used in Curve25519 are 256 bits longs. They are split into 16 limbs of 16 bits, placed into 64 bits **signed** containers. Basic operations are defined as follow: #### addition sv A(gf o,const gf a,const gf b) { int i; FOR(i,16) o[i]=a[i]+b[i]; } #### substraction sv Z(gf o,const gf a,const gf b) { int i; FOR(i,16) o[i]=a[i]-b[i]; } #### multiplication sv M(gf o,const gf a,const gf b) { i64 i,j,t[31]; FOR(i,31) t[i]=0; FOR(i,16) FOR(j,16) t[i+j] = a[i]*b[j]; FOR(i,15) t[i]+=38*t[i+16]; FOR(i,16) o[i]=t[i]; car25519(o); car25519(o); } #### car25519 sv car25519(gf o) { int i; i64 c; FOR(i,16) { o[i]+=(1LL<<16); c=o[i]>>16; o[(i+1)*(i<15)]+=c-1+37*(c-1)*(i==15); o[i]-=c<<16; } } ## Proofs of correctness In order to guarantee the soundness of the implementation, we need to be sure that these operations are correct with respect to the field representation. The A, Z, M operations are defined as of type GF[16] -> GF[16] -> GF[16]. The car25519 operation is of type GF[16] -> GF[16]. We can assume the existance of two auxiliary functions: - ZofGF of type GF[16] -> ℤ