From 2778006655799a1ca62df58b3d400f0ea9316e8a Mon Sep 17 00:00:00 2001
From: Peter Schwabe <peter@cryptojedi.org>
Date: Sun, 18 Aug 2019 18:07:33 +0200
Subject: [PATCH] Various small edits throughout the paper.

---
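Note: the curve constants that this patch moves around in prelim.tex can be
sanity-checked with a few lines of Python. The sketch below assumes the values
quoted in the diff (p = 2^255 - 19, a = -3, b = 13318, the order N, and the
point G = (-7, 114) from the removed text); the helper names are hypothetical.
It only checks that G satisfies the curve equation and that N lies in the
Hasse interval. It does not show that N is prime or that N is the actual
group order.

```python
# Constants as quoted in the patch (assumed to be transcribed correctly).
P = 2**255 - 19          # field prime
A = -3                   # curve parameter a
B = 13318                # curve parameter b
N = 2**255 + 325610659388873400306201440571661405155  # claimed group order
G = (-7, 114)            # generator given in the removed text


def on_curve(x, y):
    """Return True if (x, y) satisfies y^2 = x^3 + A*x + B modulo P."""
    return (y * y - (x * x * x + A * x + B)) % P == 0


def in_hasse_interval(n):
    """Return True if |P + 1 - n| <= 2*sqrt(P), using integers only."""
    t = P + 1 - n
    return t * t <= 4 * P


if __name__ == "__main__":
    print("G on curve:", on_curve(*G))
    print("N within Hasse interval:", in_hasse_interval(N))
```

Both checks are necessary conditions only; a full verification would need a
point-counting tool.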
 conclusion.tex |  11 ++++-
 curve13318.tex |   5 +--
 intro.tex      |  13 ++++--
 prelim.tex     | 112 ++++++++++++++++++++-----------------------------
 refs.bib       |   2 +
 results.tex    |  54 +++---------------------
 6 files changed, 73 insertions(+), 124 deletions(-)

diff --git a/conclusion.tex b/conclusion.tex
index 9cede2f..8f30072 100644
--- a/conclusion.tex
+++ b/conclusion.tex
@@ -1,7 +1,14 @@
 \section{Conclusion}
 \label{sec:conclusion}
 
-% TODO(_) We still need some argument on the needed perfomance here.
+\todo{Rework this.}
+We see that the slowdown factor ranges from about $1.5\times$
+for the Haswell platform, to $2.9\times$ on Cortex M4.
+In terms of cycle counts,
+Curve13318 is considerably slower than Curve25519.
+However, the small factor involved can still be considered
+insignificant for complex protocols,
+which are often not designed for performance in the first place.
 
 In this paper, we introduced Baretto's Weierstraß curve,
 which we call ``Curve13318''.
@@ -36,4 +43,4 @@ Instead, eliminate the non-trivial cofactor; use Ristretto.
 %       but not because of their Weierstrass form. Only because
 %       the constants and field are worse than Curve25519s.
 % \end{Verbatim}
-%
\ No newline at end of file
+%
diff --git a/curve13318.tex b/curve13318.tex
index 3660089..a6e8220 100644
--- a/curve13318.tex
+++ b/curve13318.tex
@@ -14,7 +14,7 @@
 \usepackage{multicol}
 \usepackage{graphicx}
 \usepackage{dsfont}
-\usepackage{amsmath}
+\usepackage{amsmath,amsfonts}
 \usepackage{units}
 \usepackage{doi}
 \usepackage{float}
@@ -53,8 +53,7 @@
 % to display URLs in blue roman font according to Springer's eBook style:
 % \renewcommand\UrlFont{\color{blue}\rmfamily}
 
-\newcommand{\Add}{\hyperref[alg:add]{\textsc{Add}}}
-\newcommand{\Double}{\hyperref[alg:double]{\textsc{Double}}}
+\input{macros}
 
 \begin{document}
 
diff --git a/intro.tex b/intro.tex
index 5542bd3..c326f02 100644
--- a/intro.tex
+++ b/intro.tex
@@ -39,7 +39,7 @@ for the construction of the Ed25519 digital-signature scheme.
 The simplicity and efficiency of X25519 key exchange and Ed25519 signatures
 resulted in quick adoption in a variety of applications, such as 
 SSH, the Signal protocol, and the Tor anonymity project.
-Both protocols are also used in TLS 1.3.
+Both schemes are also used in TLS~1.3.
 
 \subheading{Complete addition or prime order.}
 Unfortunately, 
@@ -49,7 +49,7 @@ have to be weighed against a disadvantage:
 the group of points cannot have prime order 
 as it always has a cofactor of a multiple of 4.
 Consequently, a somewhat simplified view on choosing curves for cryptographic applications
-is that we have to choose between either efficient formulas
+is that we have to choose between either efficient complete formulas
 through Montgomery or (twisted) Edwards curves,
 or prime-order groups through Weierstraß curves.
 
@@ -100,7 +100,8 @@ formulas for Weierstraß curves~\cite{RCB16}.
 Unfortunately, these formulas are still considerably less efficient
 than the incomplete addition formulas that possibly require handling of 
 special cases.
-They are even less efficient than the addition
+The performance gap is even larger compared to
+the complete addition
 formulas for twisted Edwards curves, and the complete differential
 additional used in the scalar-multiplication ladder on Montgomery
 curves.
@@ -119,7 +120,8 @@ curves.
   Almost surprisingly however, there are no such optimized implementations of
   elliptic-curve scalar multiplication using complete formulas on Weierstraß curves.
   In this paper we present such implementations and answer the question about the
-  actual cost of complete cofactor-1 ECC arithmetic.
+  actual cost of complete cofactor-1 ECC arithmetic using the formulas
+  from~\cite{RCB16}.
 
   More specifically, we present highly optimized software targeting three
   different microarchitectures for variable-basepoint scalar multiplication
@@ -189,6 +191,9 @@ curves.
   \todo{Write this section}
 
   \todo{Also mention line of work on even faster ECC arithmetic.}
+
+  \subheading{Notation.}
+  \todo{Explain how we denote cost for multiplications etc.}
   
   \subheading{Availability of software.}
   We place all software related to this paper into the public domain
diff --git a/prelim.tex b/prelim.tex
index 19831ba..ab8b0f7 100644
--- a/prelim.tex
+++ b/prelim.tex
@@ -2,8 +2,8 @@
 \label{sec:prelim}
 
 \subsection{Weierstraß, Montgomery, and twisted Edwards curves}
-The typical way to introduce elliptic curves over a field with large characteristic $\F$ is
-through the \emph{short} Weierstraß equation
+The typical way to introduce elliptic curves over a field $\F$ with large characteristic is
+through the \emph{short Weierstraß equation}
 $$
 E_W: y^2 = x^3 + ax + b,
 $$
@@ -16,24 +16,42 @@ in the cryptographic setup is the group of $\F_p$-rational points $E(\F_p)$.
 Whenever we talk about ``the order of an elliptic curve'' in this paper we 
 mean the order of this group.
 The typical way to use Weierstraß curves in cryptography is to pick curve
-parameter $a=-3$ for somewhat more efficient arithmetic, represent a point
+parameter $a=-3$ for somewhat more efficient arithmetic and to represent a point
 $P=(x,y)$ in Jacobian coordinates $(X:Y:Z)$ with $(x,y) = (X/Z^2, Y/Z^3)$.
-Point addition is using the formulas from~\cite[]{BL07} (improving on~\cite{}) 
+Point addition follows the formulas from~\cite{BL07} (improving on~\cite{})
 and uses $11\mathbf{M}+5\mathbf{S}+9\mathbf{a}$.
-Doubling also uses formulas from~\cite[]{BL07} (improving on~\cite{}) 
-and uses $3\mahtbf{M}+5\mathbf{S}+8\mathbf{a}$.
+Doubling also uses formulas from~\cite{BL07} (improving on~\cite{}) 
+and uses $3\mathbf{M}+5\mathbf{S}+8\mathbf{a}$.
 Alternatively, one can use a ladder with differential additions, for
 example using the co-$Z$ approach from~\cite{}
+\todo{finish this}
 
 Scalar multiplication using a ladder is also what Montgomery proposed in~\cite{XXX}
 for a different class of elliptic curves, so-called \emph{Montgomery curves}.
 These are described through an equation of the form
 $$
-E_M: XXX
+E_M: XXX,
 $$
-
-The most common way to implement arithmt
-\todo{Write this section.}
+again with $a, b \in \F$. The ``ladder step'' consisting of one differential
+addition and one doubling costs XXX. The formulas were shown to be complete
+by Bernstein in the Curve25519 paper~\cite{Ber06}. 
+One peculiarity of the formulas is that
+they only involve the $x$-coordinate of a point. For Diffie-Hellman protocols
+this has the advantage of free point compression and decompression, but for
+signatures this involves extra effort to recover the $y$-coordinate.
+
+The most efficient complete formulas for full addition (and doubling)
+are on \emph{twisted Edwards} curves~\cite{BBJ+08}, i.e., curves with equation
+$$
+E_{tE}: ax^2+y^2=1+dx^2y^2.
+$$
+For the special case of $a = -1$, the formulas from~\cite{XXX}
+need only XXX for addition and XXX for doubling. If $-1$ is a square in $\F_p$
+then the formulas are complete. 
+Every twisted Edwards curve is birationally equivalent to a Montgomery
+curve~\cite[Thm.~3.2]{BBJ+08}, and in the case of Curve25519 both shapes are
+used in protocols: the Montgomery shape and corresponding ladder for
+X25519 key exchange and the twisted Edwards shape for Ed25519 signatures.
 
 \subsection{Curve13318}
 The goal of this paper is to investigate the performance of complete addition and doubling
@@ -44,67 +62,27 @@ choose a curve that is as similar to Curve25519 as possible, except that it is i
 form and has prime order. This means that in particular, we want a curve that
 \begin{itemize}
   \item is defined over the field $\F_p$ with $p = 2^{255}-19$;
-  \item is twist secure (for a definition, see~\cite[Sec XXX]{XXX});
-  \item has curve parameter $a = -3$ to support common speedups of the group law;
-  \item has small curve parameter $b$;
+  \item is twist secure (for a definition, see~\cite{Ber06} or~\cite{safecurves});
+  \item has parameter $a = -3$ to support common speedups of the group law; and
+  \item has small parameter $b$.
 \end{itemize}
 
 It turns out that we were not the first to have the idea to look for a curve with
-precisely these properties. In 
-
-Many factors may contribute to the general performance of scalar multiplication on an elliptic curve. 
-In our research we have tried to produce a benchmark that is unaffected by irrelevant curve properties. For example, in the case of some of the traditional curves---like those from \cite{FIPS186-4,Brainpool,SEC2}---a performance penalty is expected from the use of large values for $a$ and $b$.
-\todo{check large values of $a$}
-
-Therefore, we choose a (new) prime-order curve that---except for its struc\-ture---is similar to Bernstein's Curve25519. 
-Aside from being able to make a good comparison, we can furthermore build on some of the optimizations that have been developed specifically
-for the field used in Curve25519 \cite{BS12,Cho16,DHH+15,FL15}.
-Our second priority is to choose properties that are ``straightforward'', 
-i.e.\ properties that are often found in other standardized curves. 
-In general, we try to find an answer to the question: 
-what could ``Curve25519'' have looked like, had it been a prime-order curve?
-
-The curve that we chose was proposed by Barreto in May 2017~\cite{BarretoCurve}. 
-The curve is defined over $\mathds{F}_{2^{255}-19}$. 
-It is described by equation~\ref{eq:curve13318} and a suitable generator is $G = (-7, 114)$.
-\begin{equation}\label{eq:curve13318}
-    E : y^2 = x^3 -3x + 13318
-\end{equation}
-
-\noindent
-Its order is given by
-\begin{equation}
-    N = \ell = 2^{255} + 325610659388873400306201440571661405155\texttt{.}
-\end{equation}
-In the following we will be calling this curve Curve13318,
-which at the same time points to the curve parameter $b$ and
+precisely these properties. 
+In May 2017, Barreto proposed a curve on Twitter with equation 
+$$
+E : y^2 = x^3 -3x + 13318,
+$$
+defined over $\mathds{F}_{2^{255}-19}$~\cite{BarretoCurve}. 
+In a follow-up tweet Barreto clarified that the
+selection criteria for this curve were
+``all old SafeCurves properties (with recent improvements) plus prime order''.
+Barreto did not name this curve; in the following we
+refer to it as Curve13318.
+This name reflects both the curve parameter $b$ and
 its intended similarities to Curve25519.
-
-% XXX(dsprenkels) Cannot choose a=0. Craig mentioned this to me once. It had
-%                 something to do with the fact that there would exist no
-%                 a=0 curve over GF(25519). I don't think we need to mention
-%                 this.
-\subheading{Choice of $a$.}
-Alternatively to choosing $a=-3$, we could choose $a=1$. 
-The first reason we chose $a=-3$ instead is because---depending on the addition formulas that are used---it results in more efficient curve arithmetic~\cite{BJ03}.
-Indeed, the Renes-Costello-Batina complete addition formulas have a specialized case for $a=-3$~\cite[Section 3.2]{RCB16}.
-The second reason is that various cryptographic standards have adopted these kinds of curves~\cite{ETSI07,BSI12,FIPS186-4,Brainpool,SEC2}. 
-Our results will apply to more commonly used curves if we mimic the standards.
-
-\subheading{Twist security.}
-In the case an implementor uses formulas that do not depend on any of the constants $a$ and $b$, 
-they could choose to omit checking whether the input point lies on the curve. 
-To prevent invalid curve attacks in this case, $E$'s twist ($E^d$) must also be of prime order. 
-Then, the first valid value for $b$ is $13318$.
-\todo{Comment on $x$-coordinate only.}
-
-\subheading{Point validation.}\label{sec:pointvalidation}
-All scalar-multiplication algorithms on Curve13318---or any short
-Weierstraß curve for that matter---should implement appropriate
-point-validation routines. That is, they should check whether the input point
-lies on the curve, and that it is not the neutral element.
-Together, these checks prevent any invalid-curve and small-subgroup attacks.
-\todo{Update according to previous paragraph's update}
+The order of the group of $\F_p$-rational points on Curve13318 is
+$N = \ell = 2^{255} + 325610659388873400306201440571661405155$.
 
 \subsection{The Renes-Costello-Batina formulas}
 \label{sec:additionformulas}
diff --git a/refs.bib b/refs.bib
index 9e3d072..b08eac4 100644
--- a/refs.bib
+++ b/refs.bib
@@ -1,6 +1,8 @@
 @String { LNCS  = LNCS }
 @String { SV    = {Springer} }
 
+@inproceedings{
+
 @book{Aum,
   author        = {Jean-Philippe Aumasson},
   title         = {Serious Cryptography},
diff --git a/results.tex b/results.tex
index d79a2cd..4e7633a 100644
--- a/results.tex
+++ b/results.tex
@@ -1,31 +1,9 @@
 \section{Performance results}
 \label{sec:results}
 
-% \begin{Verbatim}
-%     - First present the results
-%     - Show the slowdown factor for each
-%     - We see that platform matters. Due to Curve25519 having no 4\times batched operations.
-%     - In the end. Ballpark: factor 2?
-%     - Now it is up to the user, whether they want to use this or not. They must
-%       ask themselves: How much is the cofactor stuff worth? That is, if
-%       Ristretto hadn't been there; for god's sake, just use Ristretto.
-%     - ``The cost of completeness''
-%         - Mention that RCB reported 1.5x (or something)
-%         - Then, look at the other implementation from [BCLN16] which is
-%           defined over 2^256-189 w/ b=152961. They mention that completeness
-%           costs a factor of 2. Their w-256-mers implementation uses 278kcc. on
-%           Sandy Bridge
-%           As such they predicted our benchmark would be ~550kcc (which is a bit
-%           pessimistic). More like 40% (which is arguably fine).
-%     - Should we also do a comparison to traditional curves? I.e. NIST etc.?
-%
-%     - Note: I really want to criticize the use of radix 2^21.25. I don't think
-%       anybody should use that ever again.
-% \end{Verbatim}
-
 The complete scalar multiplication algorithm was tested and benchmarked on Sandy Bridge\footnote{Model: Intel Core i7-2600}, Ivy Bridge\footnote{Model: Intel Core i5-3210}, Haswell\footnote{Model: Intel Core i7-4770}, and Cortex M4\footnote{Device: STM32F407} CPU's. On the Intel processors, all measurements were done with Turbo Boost disabled, all Hyper-Threading cores shut down, and with the CPU clocked at the maximum frequency.
 
-We listed the benchmarking results in Table~\ref{tab:benchmarks}. As expected,
+We list the benchmarking results in Table~\ref{tab:benchmarks}. As expected,
 none of our implementations exceed the performance of Curve25519.
 
 \ctable[
@@ -57,40 +35,20 @@ none of our implementations exceed the performance of Curve25519.
 }
 
 It can immediately be seen that the slowdown factor is dependent on the platform.
-In particular, the Haswell implementation performs better than the others.
+In particular, the Haswell implementation of scalar multiplication on Curve13318
+performs much better than the others, also in relative terms.
 The source of this is seems to be that Algorithms~\ref{alg:add} and~\ref{alg:double}
 lend themselves for very efficient 4-way parallelization,
 which is not supported by Curve25519's ladder algorithm.
 Through AVX2, 4-way parallelization is very powerful on Haswell,
-whereas on the other platforms it is not.
+whereas on the other platforms it is not, at least not to the same extent.
 This makes it possible to write
 a Haswell implementation that is significantly faster than the others.
 
-\begin{Verbatim}
-sFIXME(cryptojedi) Unsure about this paragraph. It touches the debate
-about performance that is needed. I feel that we have brought that much
-authority to say something about this here.
-
-We could take the argument that Hanno makes on the IETF mailing list about
-FourQ, or the fact that the draft expired, but all that feels a little thin.
-
-Moreover, there is possibly enough context that we need more space than just
-the conclusion. And also this section feels too brief without this paragraph.
-Am I maybe missing something else?
-
--- Daan
-\end{Verbatim}
-We see that the slowdown factor ranges from about $1.5\times$
-for the Haswell platform, to $2.9\times$ on Cortex M4.
-In terms of cycle counts,
-Curve13318 is considerably slower than Curve25519.
-However, the small factor involved can still be considered
-insignificant for complex protocols,
-which are often not designed for performance in the first place.
-
 \subheading{The cost of completeness.}
+\todo{go through this text again.}
 Aside from comparing the Renes-Costello-Batina formulas to Curve25519's formulas,
-we can also look briefly at other prime-order curve implementations,
+we can also look briefly at other prime-order-curve implementations,
 in order to see how the formulas fare against others.
 
 In their paper, Renes, Costello, and Batina reported the overhead to be $1.38\times$.
-- 
GitLab