From 2778006655799a1ca62df58b3d400f0ea9316e8a Mon Sep 17 00:00:00 2001 From: Peter Schwabe <peter@cryptojedi.org> Date: Sun, 18 Aug 2019 18:07:33 +0200 Subject: [PATCH] Various small edits throughout the paper. --- conclusion.tex | 11 ++++- curve13318.tex | 5 +-- intro.tex | 13 ++++-- prelim.tex | 112 ++++++++++++++++++++----------------------------- refs.bib | 2 + results.tex | 54 +++--------------------- 6 files changed, 73 insertions(+), 124 deletions(-) diff --git a/conclusion.tex b/conclusion.tex index 9cede2f..8f30072 100644 --- a/conclusion.tex +++ b/conclusion.tex @@ -1,7 +1,14 @@ \section{Conclusion} \label{sec:conclusion} -% TODO(_) We still need some argument on the needed perfomance here. +\todo{Rework this.} +We see that the slowdown factor ranges from about $1.5\times$ +on the Haswell platform to $2.9\times$ on the Cortex M4. +In terms of cycle counts, +Curve13318 is considerably slower than Curve25519. +However, a slowdown of this size can still be considered +insignificant for complex protocols, +which are often not designed for performance in the first place. In this paper, we introduced Baretto's Weierstraß curve, which we call ``Curve13318''. @@ -36,4 +43,4 @@ Instead, eliminate the non-trivial cofactor; use Ristretto. % but not because of their Weierstrass form. Only because % the constants and field are worse than Curve25519s. 
% \end{Verbatim} -% \ No newline at end of file +% diff --git a/curve13318.tex b/curve13318.tex index 3660089..a6e8220 100644 --- a/curve13318.tex +++ b/curve13318.tex @@ -14,7 +14,7 @@ \usepackage{multicol} \usepackage{graphicx} \usepackage{dsfont} -\usepackage{amsmath} +\usepackage{amsmath,amsfonts} \usepackage{units} \usepackage{doi} \usepackage{float} @@ -53,8 +53,7 @@ % to display URLs in blue roman font according to Springer's eBook style: % \renewcommand\UrlFont{\color{blue}\rmfamily} -\newcommand{\Add}{\hyperref[alg:add]{\textsc{Add}}} -\newcommand{\Double}{\hyperref[alg:double]{\textsc{Double}}} +\input{macros} \begin{document} diff --git a/intro.tex b/intro.tex index 5542bd3..c326f02 100644 --- a/intro.tex +++ b/intro.tex @@ -39,7 +39,7 @@ for the construction of the Ed25519 digital-signature scheme. The simplicity and efficiency of X25519 key exchange and Ed25519 signatures resulted in quick adoption in a variety of applications, such as SSH, the Signal protocol, and the Tor anonymity project. -Both protocols are also used in TLS 1.3. +Both schemes are also used in TLS~1.3. \subheading{Complete addition or prime order.} Unfortunately, @@ -49,7 +49,7 @@ have to be weighed against a disadvantage: the group of points cannot have prime order as it always has a cofactor of a multiple of 4. Consequently, a somewhat simplified view on choosing curves for cryptographic applications -is that we have to choose between either efficient formulas +is that we have to choose between either efficient complete formulas through Montgomery or (twisted) Edwards curves, or prime-order groups through Weierstraß curves. @@ -100,7 +100,8 @@ formulas for Weierstraß curves~\cite{RCB16}. Unfortunately, these formulas are still considerably less efficient than the incomplete addition formulas that possibly require handling of special cases. 
-They are even less efficient than the addition +The performance gap is even larger compared to +the complete addition formulas for twisted Edwards curves, and the complete differential additional used in the scalar-multiplication ladder on Montgomery curves. @@ -119,7 +120,8 @@ curves. Almost surprisingly however, there are no such optimized implementations of elliptic-curve scalar multiplication using complete formulas on Weierstraß curves. In this paper we present such implementations and answer the question about the - actual cost of complete cofactor-1 ECC arithmetic. + actual cost of complete cofactor-1 ECC arithmetic using the formulas + from~\cite{RCB16}. More specifically, we present highly optimized software targeting three different microarchitectures for variable-basepoint scalar multiplication @@ -189,6 +191,9 @@ curves. \todo{Write this section} \todo{Also mention line of work on even faster ECC arithmetic.} + + \subheading{Notation.} + \todo{Explain how we denote cost for multiplications etc.} \subheading{Availability of software.} We place all software related to this paper into the public domain diff --git a/prelim.tex b/prelim.tex index 19831ba..ab8b0f7 100644 --- a/prelim.tex +++ b/prelim.tex @@ -2,8 +2,8 @@ \label{sec:prelim} \subsection{Weierstraß, Montgomery, and twisted Edwards curves} -The typical way to introduce elliptic curves over a field with large characteristic $\F$ is -through the \emph{short} Weierstraß equation +The typical way to introduce elliptic curves over a field $\F$ with large characteristic is +through the \emph{short Weierstraß equation} $$ E_W: y^2 = x^3 + ax + b, $$ @@ -16,24 +16,42 @@ in the cryptographic setup is the group of $\F_p$-rational points $E(\F_p)$. Whenever we talk about ``the order of an elliptic curve'' in this paper we mean the order of this group. 
The typical way to use Weierstraß curves in cryptography is to pick curve -parameter $a=-3$ for somewhat more efficient arithmetic, represent a point +parameter $a=-3$ for somewhat more efficient arithmetic and to represent a point $P=(x,y)$ in Jacobian coordinates $(X:Y:Z)$ with $(x,y) = (X/Z^2, Y/Z^3)$. -Point addition is using the formulas from~\cite[]{BL07} (improving on~\cite{}) +Point addition relies on the formulas from~\cite{BL07} (improving on~\cite{}) and uses $11\mathbf{M}+5\mathbf{S}+9\mathbf{a}$. -Doubling also uses formulas from~\cite[]{BL07} (improving on~\cite{}) -and uses $3\mahtbf{M}+5\mathbf{S}+8\mathbf{a}$. +Doubling also relies on formulas from~\cite{BL07} (improving on~\cite{}) +and uses $3\mathbf{M}+5\mathbf{S}+8\mathbf{a}$. Alternatively, one can use a ladder with differential additions, for example using the co-$Z$ approach from~\cite{} +\todo{finish this} Scalar multiplication using a ladder is also what Montgomery proposed in~\cite{XXX} for a different class of elliptic curves, so-called \emph{Montgomery curves}. These are described through an equation of the form $$ -E_M: XXX +E_M: XXX, $$ - -The most common way to implement arithmt -\todo{Write this section.} +again with $a, b \in \F$. The ``ladder step'' consisting of one differential +addition and one doubling costs XXX. The formulas were shown to be complete +by Bernstein in the Curve25519 paper~\cite{Ber06}. +One peculiarity of the formulas is that +they only involve the $x$-coordinate of a point. For Diffie-Hellman protocols +this has the advantage of free point compression and decompression, but for +signatures recovering the $y$-coordinate requires extra effort. + +The most efficient complete formulas for full addition (and doubling) +are on \emph{twisted Edwards} curves~\cite{BBJ+08}, i.e., curves with equation +$$ +E_{tE}: ax^2+y^2=1+dx^2y^2, +$$ +where $a, d \in \F$ are distinct and nonzero. +For the special case of $a = -1$, the formulas from~\cite{XXX} +need only XXX for addition and XXX for doubling.
If $-1$ is a square and $d$ is a non-square in $\F_p$, +then the formulas are complete. +Every twisted Edwards curve is birationally equivalent to a Montgomery +curve~\cite[Thm.~3.2]{BBJ+08} and in the case of Curve25519 both shapes are +used in protocols: the Montgomery shape and corresponding ladder for +X25519 key exchange and the twisted Edwards shape for Ed25519 signatures. \subsection{Curve13318} The goal of this paper is to investigate the performance of complete addition and doubling @@ -44,67 +62,27 @@ choose a curve that is as similar to Curve25519 as possible, except that it is i form and has prime order. This means that in particular, we want a curve that \begin{itemize} \item is defined over the field $\F_p$ with $p = 2^{255}-19$; - \item is twist secure (for a definition, see~\cite[Sec XXX]{XXX}); - \item has curve parameter $a = -3$ to support common speedups of the group law; - \item has small curve parameter $b$; + \item is twist secure (for a definition, see~\cite{Ber06} or~\cite{safecurves}); + \item has parameter $a = -3$ to support common speedups of the group law; and + \item has small parameter $b$. \end{itemize} It turns out that we were not the first to have the idea to look for a curve with -precisely these properties. In - -Many factors may contribute to the general performance of scalar multiplication on an elliptic curve. -In our research we have tried to produce a benchmark that is unaffected by irrelevant curve properties. For example, in the case of some of the traditional curves---like those from \cite{FIPS186-4,Brainpool,SEC2}---a performance penalty is expected from the use of large values for $a$ and $b$. -\todo{check large values of $a$} - -Therefore, we choose a (new) prime-order curve that---except for its struc\-ture---is similar to Bernstein's Curve25519.
-Aside from being able to make a good comparison, we can furthermore build on some of the optimizations that have been developed specifically -for the field used in Curve25519 \cite{BS12,Cho16,DHH+15,FL15}. -Our second priority is to choose properties that are ``straightforward'', -i.e.\ properties that are often found in other standardized curves. -In general, we try to find an answer to the question: -what could ``Curve25519'' have looked like, had it been a prime-order curve? - -The curve that we chose was proposed by Barreto in May 2017~\cite{BarretoCurve}. -The curve is defined over $\mathds{F}_{2^{255}-19}$. -It is described by equation~\ref{eq:curve13318} and a suitable generator is $G = (-7, 114)$. -\begin{equation}\label{eq:curve13318} - E : y^2 = x^3 -3x + 13318 -\end{equation} - -\noindent -Its order is given by -\begin{equation} - N = \ell = 2^{255} + 325610659388873400306201440571661405155\texttt{.} -\end{equation} -In the following we will be calling this curve Curve13318, -which at the same time points to the curve parameter $b$ and +precisely these properties. +In May 2017, Barreto proposed a curve on Twitter with equation +$$ +E : y^2 = x^3 -3x + 13318, +$$ +defined over $\mathds{F}_{2^{255}-19}$~\cite{BarretoCurve}. +In a follow-up tweet, Barreto clarified that the +selection criteria for this curve were +``all old SafeCurves properties (with recent improvements) plus prime order''. +Barreto did not name this curve; in the following we +refer to it as Curve13318. +This name points both to the curve parameter $b$ and to its intended similarities to Curve25519. - -% XXX(dsprenkels) Cannot choose a=0. Craig mentioned this to me once. It had -% something to do with the fact that there would exist no -% a=0 curve over GF(25519). I don't think we need to mention -% this. -\subheading{Choice of $a$.} -Alternatively to choosing $a=-3$, we could choose $a=1$.
-The first reason we chose $a=-3$ instead is because---depending on the addition formulas that are used---it results in more efficient curve arithmetic~\cite{BJ03}. -Indeed, the Renes-Costello-Batina complete addition formulas have a specialized case for $a=-3$~\cite[Section 3.2]{RCB16}. -The second reason is that various cryptographic standards have adopted these kinds of curves~\cite{ETSI07,BSI12,FIPS186-4,Brainpool,SEC2}. -Our results will apply to more commonly used curves if we mimic the standards. - -\subheading{Twist security.} -In the case an implementor uses formulas that do not depend on any of the constants $a$ and $b$, -they could choose to omit checking whether the input point lies on the curve. -To prevent invalid curve attacks in this case, $E$'s twist ($E^d$) must also be of prime order. -Then, the first valid value for $b$ is $13318$. -\todo{Comment on $x$-coordinate only.} - -\subheading{Point validation.}\label{sec:pointvalidation} -All scalar-multiplication algorithms on Curve13318---or any short -Weierstraß curve for that matter---should implement appropriate -point-validation routines. That is, they should check whether the input point -lies on the curve, and that it is not the neutral element. -Together, these checks prevent any invalid-curve and small-subgroup attacks. -\todo{Update according to previous paragraph's update} +The order of the group of $\F_p$-rational points on Curve13318 is +$N = \ell = 2^{255} + 325610659388873400306201440571661405155$. 
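As a hedged sanity check of the parameters above (not part of the paper's software), the following minimal Python sketch assumes Python 3.8+ for `pow(z, -1, p)`. It checks three things: that the point $G=(-7,114)$, which the earlier version of this text gave as a generator, satisfies $y^2 = x^3 - 3x + 13318$ over $\F_p$; that the Jacobian substitution $(x,y) = (X/Z^2, Y/Z^3)$ turns the curve equation into $Y^2 = X^3 + aXZ^4 + bZ^6$; and that the stated order $N$ lies within the Hasse interval. All function names are hypothetical. The sketch does not check that $N$ is prime or that it is the exact group order, which would require a primality test and point counting.

```python
# Hypothetical sanity checks for the Curve13318 parameters quoted in the text.
# Illustrative sketch only, not the paper's implementation.
# Requires Python 3.8+ for modular inverses via pow(z, -1, p).

P = 2**255 - 19                 # field prime p
A = -3 % P                      # curve parameter a = -3, reduced mod p
B = 13318                       # curve parameter b
N = 2**255 + 325610659388873400306201440571661405155  # stated group order


def on_curve(x, y):
    """Affine check of y^2 = x^3 + a*x + b over F_p."""
    return (y * y - (x * x * x + A * x + B)) % P == 0


def to_jacobian(x, y, z):
    """Represent (x, y) as (X:Y:Z) with x = X/Z^2 and y = Y/Z^3."""
    return (x * z * z % P, y * z * z * z % P, z % P)


def from_jacobian(X, Y, Z):
    """Recover affine (x, y) = (X/Z^2, Y/Z^3); assumes Z != 0 mod p."""
    zinv = pow(Z, -1, P)
    zinv2 = zinv * zinv % P
    return (X * zinv2 % P, Y * zinv2 * zinv % P)


def on_curve_jacobian(X, Y, Z):
    """Check Y^2 = X^3 + a*X*Z^4 + b*Z^6, the Jacobian form of the curve."""
    z2 = Z * Z % P
    z4 = z2 * z2 % P
    z6 = z4 * z2 % P
    return (Y * Y - (X * X * X + A * X * z4 + B * z6)) % P == 0


def within_hasse_bound(n):
    """Check |p + 1 - n| <= 2*sqrt(p) as t^2 <= 4p, using integers only."""
    t = P + 1 - n
    return t * t <= 4 * P


if __name__ == "__main__":
    G = (-7 % P, 114)           # point given as generator in the earlier text
    J = to_jacobian(*G, 5)      # arbitrary nonzero Z for the round trip
    print(on_curve(*G), on_curve_jacobian(*J), from_jacobian(*J) == G,
          within_hasse_bound(N))  # prints: True True True True
```

Comparing $t^2 \le 4p$ with integers, rather than taking a floating-point square root, avoids precision loss for 255-bit values.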
\subsection{The Renes-Costello-Batina formulas} \label{sec:additionformulas} diff --git a/refs.bib b/refs.bib index 9e3d072..b08eac4 100644 --- a/refs.bib +++ b/refs.bib @@ -1,6 +1,8 @@ @String { LNCS = LNCS } @String { SV = {Springer} } +@inproceedings{ + @book{Aum, author = {Jean-Philippe Aumasson}, title = {Serious Cryptography}, diff --git a/results.tex b/results.tex index d79a2cd..4e7633a 100644 --- a/results.tex +++ b/results.tex @@ -1,31 +1,9 @@ \section{Performance results} \label{sec:results} -% \begin{Verbatim} -% - First present the results -% - Show the slowdown factor for each -% - We see that platform matters. Due to Curve25519 having no 4\times batched operations. -% - In the end. Ballpark: factor 2? -% - Now it is up to the user, whether they want to use this or not. They must -% ask themselves: How much is the cofactor stuff worth? That is, if -% Ristretto hadn't been there; for god's sake, just use Ristretto. -% - ``The cost of completeness'' -% - Mention that RCB reported 1.5x (or something) -% - Then, look at the other implementation from [BCLN16] which is -% defined over 2^256-189 w/ b=152961. They mention that completeness -% costs a factor of 2. Their w-256-mers implementation uses 278kcc. on -% Sandy Bridge -% As such they predicted our benchmark would be ~550kcc (which is a bit -% pessimistic). More like 40% (which is arguably fine). -% - Should we also do a comparison to traditional curves? I.e. NIST etc.? -% -% - Note: I really want to criticize the use of radix 2^21.25. I don't think -% anybody should use that ever again. -% \end{Verbatim} - The complete scalar multiplication algorithm was tested and benchmarked on Sandy Bridge\footnote{Model: Intel Core i7-2600}, Ivy Bridge\footnote{Model: Intel Core i5-3210}, Haswell\footnote{Model: Intel Core i7-4770}, and Cortex M4\footnote{Device: STM32F407} CPU's. 
On the Intel processors, all measurements were done with Turbo Boost disabled, all Hyper-Threading cores shut down, and with the CPU clocked at the maximum frequency. -We listed the benchmarking results in Table~\ref{tab:benchmarks}. As expected, +We list the benchmarking results in Table~\ref{tab:benchmarks}. As expected, none of our implementations exceed the performance of Curve25519. \ctable[ @@ -57,40 +35,20 @@ none of our implementations exceed the performance of Curve25519. } It can immediately be seen that the slowdown factor is dependent on the platform. -In particular, the Haswell implementation performs better than the others. +In particular, relative to Curve25519, the Haswell implementation of scalar multiplication +on Curve13318 performs much better than the others. The source of this is seems to be that Algorithms~\ref{alg:add} and~\ref{alg:double} lend themselves for very efficient 4-way parallelization, which is not supported by Curve25519's ladder algorithm. Through AVX2, 4-way parallelization is very powerful on Haswell, -whereas on the other platforms it is not. +whereas on the other platforms it is not, or at least not to the same extent. This makes it possible to write a Haswell implementation that is significantly faster than the others. -\begin{Verbatim} -sFIXME(cryptojedi) Unsure about this paragraph. It touches the debate -about performance that is needed. I feel that we have brought that much -authority to say something about this here. - -We could take the argument that Hanno makes on the IETF mailing list about -FourQ, or the fact that the draft expired, but all that feels a little thin. - -Moreover, there is possibly enough context that we need more space than just -the conclusion. And also this section feels too brief without this paragraph. -Am I maybe missing something else? - --- Daan -\end{Verbatim} -We see that the slowdown factor ranges from about $1.5\times$ -for the Haswell platform, to $2.9\times$ on Cortex M4.
-In terms of cycle counts, -Curve13318 is considerably slower than Curve25519. -However, the small factor involved can still be considered -insignificant for complex protocols, -which are often not designed for performance in the first place. - \subheading{The cost of completeness.} +\todo{go through this text again.} Aside from comparing the Renes-Costello-Batina formulas to Curve25519's formulas, -we can also look briefly at other prime-order curve implementations, +we can also look briefly at other prime-order-curve implementations, in order to see how the formulas fare against others. In their paper, Renes, Costello, and Batina reported the overhead to be $1.38\times$. -- GitLab