Commit 27780066 authored by Peter Schwabe

Various small edits throughout the paper.

parent 049f4678
\section{Conclusion} \section{Conclusion}
\label{sec:conclusion} \label{sec:conclusion}
% TODO(_) We still need some argument on the needed performance here. \todo{Rework this.}
We see that the slowdown factor ranges from about $1.5\times$
for the Haswell platform, to $2.9\times$ on Cortex M4.
In terms of cycle counts,
Curve13318 is considerably slower than Curve25519.
However, the small factor involved can still be considered
insignificant for complex protocols,
which are often not designed for performance in the first place.
In this paper, we introduced Barreto's Weierstraß curve, In this paper, we introduced Barreto's Weierstraß curve,
which we call ``Curve13318''. which we call ``Curve13318''.
...@@ -36,4 +43,4 @@ Instead, eliminate the non-trivial cofactor; use Ristretto. ...@@ -36,4 +43,4 @@ Instead, eliminate the non-trivial cofactor; use Ristretto.
% but not because of their Weierstrass form. Only because % but not because of their Weierstrass form. Only because
% the constants and field are worse than Curve25519s. % the constants and field are worse than Curve25519s.
% \end{Verbatim} % \end{Verbatim}
% %
\ No newline at end of file
...@@ -14,7 +14,7 @@ ...@@ -14,7 +14,7 @@
\usepackage{multicol} \usepackage{multicol}
\usepackage{graphicx} \usepackage{graphicx}
\usepackage{dsfont} \usepackage{dsfont}
\usepackage{amsmath} \usepackage{amsmath,amsfonts}
\usepackage{units} \usepackage{units}
\usepackage{doi} \usepackage{doi}
\usepackage{float} \usepackage{float}
...@@ -53,8 +53,7 @@ ...@@ -53,8 +53,7 @@
% to display URLs in blue roman font according to Springer's eBook style: % to display URLs in blue roman font according to Springer's eBook style:
% \renewcommand\UrlFont{\color{blue}\rmfamily} % \renewcommand\UrlFont{\color{blue}\rmfamily}
\newcommand{\Add}{\hyperref[alg:add]{\textsc{Add}}} \input{macros}
\newcommand{\Double}{\hyperref[alg:double]{\textsc{Double}}}
\begin{document} \begin{document}
......
...@@ -39,7 +39,7 @@ for the construction of the Ed25519 digital-signature scheme. ...@@ -39,7 +39,7 @@ for the construction of the Ed25519 digital-signature scheme.
The simplicity and efficiency of X25519 key exchange and Ed25519 signatures The simplicity and efficiency of X25519 key exchange and Ed25519 signatures
resulted in quick adoption in a variety of applications, such as resulted in quick adoption in a variety of applications, such as
SSH, the Signal protocol, and the Tor anonymity project. SSH, the Signal protocol, and the Tor anonymity project.
Both protocols are also used in TLS 1.3. Both schemes are also used in TLS~1.3.
\subheading{Complete addition or prime order.} \subheading{Complete addition or prime order.}
Unfortunately, Unfortunately,
...@@ -49,7 +49,7 @@ have to be weighed against a disadvantage: ...@@ -49,7 +49,7 @@ have to be weighed against a disadvantage:
the group of points cannot have prime order the group of points cannot have prime order
as its cofactor is always a multiple of 4. as its cofactor is always a multiple of 4.
Consequently, a somewhat simplified view on choosing curves for cryptographic applications Consequently, a somewhat simplified view on choosing curves for cryptographic applications
is that we have to choose between either efficient formulas is that we have to choose between either efficient complete formulas
through Montgomery or (twisted) Edwards curves, through Montgomery or (twisted) Edwards curves,
or prime-order groups through Weierstraß curves. or prime-order groups through Weierstraß curves.
...@@ -100,7 +100,8 @@ formulas for Weierstraß curves~\cite{RCB16}. ...@@ -100,7 +100,8 @@ formulas for Weierstraß curves~\cite{RCB16}.
Unfortunately, these formulas are still considerably less efficient Unfortunately, these formulas are still considerably less efficient
than the incomplete addition formulas that possibly require handling of than the incomplete addition formulas that possibly require handling of
special cases. special cases.
They are even less efficient than the addition The performance gap is even larger compared to
the complete addition
formulas for twisted Edwards curves, and the complete differential formulas for twisted Edwards curves, and the complete differential
addition used in the scalar-multiplication ladder on Montgomery addition used in the scalar-multiplication ladder on Montgomery
curves. curves.
...@@ -119,7 +120,8 @@ curves. ...@@ -119,7 +120,8 @@ curves.
Almost surprisingly however, there are no such optimized implementations of Almost surprisingly however, there are no such optimized implementations of
elliptic-curve scalar multiplication using complete formulas on Weierstraß curves. elliptic-curve scalar multiplication using complete formulas on Weierstraß curves.
In this paper we present such implementations and answer the question about the In this paper we present such implementations and answer the question about the
actual cost of complete cofactor-1 ECC arithmetic. actual cost of complete cofactor-1 ECC arithmetic using the formulas
from~\cite{RCB16}.
More specifically, we present highly optimized software targeting three More specifically, we present highly optimized software targeting three
different microarchitectures for variable-basepoint scalar multiplication different microarchitectures for variable-basepoint scalar multiplication
...@@ -189,6 +191,9 @@ curves. ...@@ -189,6 +191,9 @@ curves.
\todo{Write this section} \todo{Write this section}
\todo{Also mention line of work on even faster ECC arithmetic.} \todo{Also mention line of work on even faster ECC arithmetic.}
\subheading{Notation.}
\todo{Explain how we denote cost for multiplications etc.}
\subheading{Availability of software.} \subheading{Availability of software.}
We place all software related to this paper into the public domain We place all software related to this paper into the public domain
......
...@@ -2,8 +2,8 @@ ...@@ -2,8 +2,8 @@
\label{sec:prelim} \label{sec:prelim}
\subsection{Weierstraß, Montgomery, and twisted Edwards curves} \subsection{Weierstraß, Montgomery, and twisted Edwards curves}
The typical way to introduce elliptic curves over a field with large characteristic $\F$ is The typical way to introduce elliptic curves over a field $\F$ with large characteristic is
through the \emph{short} Weierstraß equation through the \emph{short Weierstraß equation}
$$ $$
E_W: y^2 = x^3 + ax + b, E_W: y^2 = x^3 + ax + b,
$$ $$
...@@ -16,24 +16,42 @@ in the cryptographic setup is the group of $\F_p$-rational points $E(\F_p)$. ...@@ -16,24 +16,42 @@ in the cryptographic setup is the group of $\F_p$-rational points $E(\F_p)$.
Whenever we talk about ``the order of an elliptic curve'' in this paper we Whenever we talk about ``the order of an elliptic curve'' in this paper we
mean the order of this group. mean the order of this group.
The typical way to use Weierstraß curves in cryptography is to pick curve The typical way to use Weierstraß curves in cryptography is to pick curve
parameter $a=-3$ for somewhat more efficient arithmetic, represent a point parameter $a=-3$ for somewhat more efficient arithmetic and to represent a point
$P=(x,y)$ in Jacobian coordinates $(X:Y:Z)$ with $(x,y) = (X/Z^2, Y/Z^3)$. $P=(x,y)$ in Jacobian coordinates $(X:Y:Z)$ with $(x,y) = (X/Z^2, Y/Z^3)$.
Point addition is using the formulas from~\cite[]{BL07} (improving on~\cite{}) Point addition is using the formulas from~\cite{BL07} (improving on~\cite{})
and uses $11\mathbf{M}+5\mathbf{S}+9\mathbf{a}$. and uses $11\mathbf{M}+5\mathbf{S}+9\mathbf{a}$.
Doubling also uses formulas from~\cite[]{BL07} (improving on~\cite{}) Doubling also uses formulas from~\cite{BL07} (improving on~\cite{})
and uses $3\mahtbf{M}+5\mathbf{S}+8\mathbf{a}$. and uses $3\mathbf{M}+5\mathbf{S}+8\mathbf{a}$.
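To make the Jacobian representation concrete (a short derivation added for
illustration, using only the definitions above): substituting
$(x,y) = (X/Z^2, Y/Z^3)$ into $E_W$ and multiplying both sides by $Z^6$ gives
$$
Y^2 = X^3 + aXZ^4 + bZ^6,
$$
so $(X:Y:Z)$ and $(\lambda^2 X : \lambda^3 Y : \lambda Z)$ represent the same
point for every nonzero $\lambda \in \F$, and a point with $Z \neq 0$ can be
converted back to affine coordinates at the cost of one inversion in $\F$.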
Alternatively, one can use a ladder with differential additions, for Alternatively, one can use a ladder with differential additions, for
example using the co-$Z$ approach from~\cite{} example using the co-$Z$ approach from~\cite{}
\todo{finish this}
Scalar multiplication using a ladder is also what Montgomery proposed in~\cite{XXX} Scalar multiplication using a ladder is also what Montgomery proposed in~\cite{XXX}
for a different class of elliptic curves, so-called \emph{Montgomery curves}. for a different class of elliptic curves, so-called \emph{Montgomery curves}.
These are described through an equation of the form These are described through an equation of the form
$$ $$
E_M: XXX E_M: XXX,
$$ $$
again with $a, b \in \F$. The ``ladder step'' consisting of one differential
The most common way to implement arithmt addition and one doubling costs XXX. The formulas were shown to be complete
\todo{Write this section.} by Bernstein in the Curve25519 paper~\cite{Ber06}.
One peculiarity of the formulas is that
they only involve the $x$-coordinate of a point. For Diffie-Hellman protocols
this has the advantage of free point compression and decompression, but for
signatures this involves extra effort to recover the $y$-coordinate.
The most efficient complete formulas for full addition (and doubling)
are on \emph{twisted Edwards} curves~\cite{BBJ+08}, i.e., curves with equation
$$
E_{tE}: ax^2+y^2=1+dx^2y^2.
$$
For the special case of $a = -1$, the formulas from~\cite{XXX}
need only XXX for addition and XXX for doubling. If $-1$ is a square in $\F_p$
then the formulas are complete.
Every twisted Edwards curve is birationally equivalent to a Montgomery
curve~\cite[Thm.~3.2]{BBJ+08} and in the case of Curve25519 both shapes are
used in protocols: the Montgomery shape and corresponding ladder for
X25519 key exchange and the twisted Edwards shape for Ed25519 signatures.
\subsection{Curve13318} \subsection{Curve13318}
The goal of this paper is to investigate the performance of complete addition and doubling The goal of this paper is to investigate the performance of complete addition and doubling
...@@ -44,67 +62,27 @@ choose a curve that is as similar to Curve25519 as possible, except that it is i ...@@ -44,67 +62,27 @@ choose a curve that is as similar to Curve25519 as possible, except that it is i
form and has prime order. This means that in particular, we want a curve that form and has prime order. This means that in particular, we want a curve that
\begin{itemize} \begin{itemize}
\item is defined over the field $\F_p$ with $p = 2^{255}-19$; \item is defined over the field $\F_p$ with $p = 2^{255}-19$;
\item is twist secure (for a definition, see~\cite[Sec XXX]{XXX}); \item is twist secure (for a definition, see~\cite{Ber06} or~\cite{safecurves});
\item has curve parameter $a = -3$ to support common speedups of the group law; \item has parameter $a = -3$ to support common speedups of the group law; and
\item has small curve parameter $b$; \item has small parameter $b$.
\end{itemize} \end{itemize}
It turns out that we were not the first to have the idea to look for a curve with It turns out that we were not the first to have the idea to look for a curve with
precisely these properties. In precisely these properties.
In May 2017, Barreto proposed a curve on Twitter with equation
Many factors may contribute to the general performance of scalar multiplication on an elliptic curve. $$
In our research we have tried to produce a benchmark that is unaffected by irrelevant curve properties. For example, in the case of some of the traditional curves---like those from \cite{FIPS186-4,Brainpool,SEC2}---a performance penalty is expected from the use of large values for $a$ and $b$. E : y^2 = x^3 -3x + 13318,
\todo{check large values of $a$} $$
defined over $\mathds{F}_{2^{255}-19}$~\cite{BarretoCurve}.
Therefore, we choose a (new) prime-order curve that---except for its struc\-ture---is similar to Bernstein's Curve25519. In a follow-up tweet Barreto clarified that the
Aside from being able to make a good comparison, we can furthermore build on some of the optimizations that have been developed specifically selection criteria for this curve were
for the field used in Curve25519 \cite{BS12,Cho16,DHH+15,FL15}. ``all old SafeCurves properties (with recent improvements) plus prime order''.
Our second priority is to choose properties that are ``straightforward'', Barreto did not name this curve; we will in the following
i.e.\ properties that are often found in other standardized curves. refer to it as Curve13318.
In general, we try to find an answer to the question: This name at the same time points to the curve parameter $b$ and
what could ``Curve25519'' have looked like, had it been a prime-order curve?
The curve that we chose was proposed by Barreto in May 2017~\cite{BarretoCurve}.
The curve is defined over $\mathds{F}_{2^{255}-19}$.
It is described by equation~\ref{eq:curve13318} and a suitable generator is $G = (-7, 114)$.
\begin{equation}\label{eq:curve13318}
E : y^2 = x^3 -3x + 13318
\end{equation}
\noindent
Its order is given by
\begin{equation}
N = \ell = 2^{255} + 325610659388873400306201440571661405155\texttt{.}
\end{equation}
In the following we will be calling this curve Curve13318,
which at the same time points to the curve parameter $b$ and
its intended similarities to Curve25519. its intended similarities to Curve25519.
The order of the group of $\F_p$-rational points on Curve13318 is
% XXX(dsprenkels) Cannot choose a=0. Craig mentioned this to me once. It had $N = \ell = 2^{255} + 325610659388873400306201440571661405155$.
% something to do with the fact that there would exist no
% a=0 curve over GF(25519). I don't think we need to mention
% this.
\subheading{Choice of $a$.}
Alternatively to choosing $a=-3$, we could choose $a=1$.
The first reason we chose $a=-3$ instead is because---depending on the addition formulas that are used---it results in more efficient curve arithmetic~\cite{BJ03}.
Indeed, the Renes-Costello-Batina complete addition formulas have a specialized case for $a=-3$~\cite[Section 3.2]{RCB16}.
The second reason is that various cryptographic standards have adopted these kinds of curves~\cite{ETSI07,BSI12,FIPS186-4,Brainpool,SEC2}.
Our results will apply to more commonly used curves if we mimic the standards.
\subheading{Twist security.}
In the case an implementor uses formulas that do not depend on any of the constants $a$ and $b$,
they could choose to omit checking whether the input point lies on the curve.
To prevent invalid curve attacks in this case, $E$'s twist ($E^d$) must also be of prime order.
Then, the first valid value for $b$ is $13318$.
\todo{Comment on $x$-coordinate only.}
\subheading{Point validation.}\label{sec:pointvalidation}
All scalar-multiplication algorithms on Curve13318---or any short
Weierstraß curve for that matter---should implement appropriate
point-validation routines. That is, they should check whether the input point
lies on the curve, and that it is not the neutral element.
Together, these checks prevent any invalid-curve and small-subgroup attacks.
\todo{Update according to previous paragraph's update}
\subsection{The Renes-Costello-Batina formulas} \subsection{The Renes-Costello-Batina formulas}
\label{sec:additionformulas} \label{sec:additionformulas}
......
@String { LNCS = LNCS } @String { LNCS = LNCS }
@String { SV = {Springer} } @String { SV = {Springer} }
@inproceedings{
@book{Aum, @book{Aum,
author = {Jean-Philippe Aumasson}, author = {Jean-Philippe Aumasson},
title = {Serious Cryptography}, title = {Serious Cryptography},
......
\section{Performance results} \section{Performance results}
\label{sec:results} \label{sec:results}
% \begin{Verbatim}
% - First present the results
% - Show the slowdown factor for each
% - We see that platform matters. Due to Curve25519 having no 4\times batched operations.
% - In the end. Ballpark: factor 2?
% - Now it is up to the user, whether they want to use this or not. They must
% ask themselves: How much is the cofactor stuff worth? That is, if
% Ristretto hadn't been there; for god's sake, just use Ristretto.
% - ``The cost of completeness''
% - Mention that RCB reported 1.5x (or something)
% - Then, look at the other implementation from [BCLN16] which is
% defined over 2^256-189 w/ b=152961. They mention that completeness
% costs a factor of 2. Their w-256-mers implementation uses 278kcc. on
% Sandy Bridge
% As such they predicted our benchmark would be ~550kcc (which is a bit
% pessimistic). More like 40% (which is arguably fine).
% - Should we also do a comparison to traditional curves? I.e. NIST etc.?
%
% - Note: I really want to criticize the use of radix 2^21.25. I don't think
% anybody should use that ever again.
% \end{Verbatim}
The complete scalar multiplication algorithm was tested and benchmarked on Sandy Bridge\footnote{Model: Intel Core i7-2600}, Ivy Bridge\footnote{Model: Intel Core i5-3210}, Haswell\footnote{Model: Intel Core i7-4770}, and Cortex M4\footnote{Device: STM32F407} CPUs. On the Intel processors, all measurements were done with Turbo Boost disabled, all Hyper-Threading cores shut down, and with the CPU clocked at the maximum frequency. The complete scalar multiplication algorithm was tested and benchmarked on Sandy Bridge\footnote{Model: Intel Core i7-2600}, Ivy Bridge\footnote{Model: Intel Core i5-3210}, Haswell\footnote{Model: Intel Core i7-4770}, and Cortex M4\footnote{Device: STM32F407} CPUs. On the Intel processors, all measurements were done with Turbo Boost disabled, all Hyper-Threading cores shut down, and with the CPU clocked at the maximum frequency.
We listed the benchmarking results in Table~\ref{tab:benchmarks}. As expected, We list the benchmarking results in Table~\ref{tab:benchmarks}. As expected,
none of our implementations exceed the performance of Curve25519. none of our implementations exceed the performance of Curve25519.
\ctable[ \ctable[
...@@ -57,40 +35,20 @@ none of our implementations exceed the performance of Curve25519. ...@@ -57,40 +35,20 @@ none of our implementations exceed the performance of Curve25519.
} }
It can immediately be seen that the slowdown factor is dependent on the platform. It can immediately be seen that the slowdown factor is dependent on the platform.
In particular, the Haswell implementation performs better than the others. In particular, the Haswell implementation of scalar multiplication on Curve13318
performs much better than the others, also in relative terms.
The source of this seems to be that Algorithms~\ref{alg:add} and~\ref{alg:double} The source of this seems to be that Algorithms~\ref{alg:add} and~\ref{alg:double}
lend themselves to very efficient 4-way parallelization, lend themselves to very efficient 4-way parallelization,
which is not supported by Curve25519's ladder algorithm. which is not supported by Curve25519's ladder algorithm.
Through AVX2, 4-way parallelization is very powerful on Haswell, Through AVX2, 4-way parallelization is very powerful on Haswell,
whereas on the other platforms it is not. whereas on the other platforms it is not, at least not to the same extent.
This makes it possible to write This makes it possible to write
a Haswell implementation that is significantly faster than the others. a Haswell implementation that is significantly faster than the others.
\begin{Verbatim}
sFIXME(cryptojedi) Unsure about this paragraph. It touches the debate
about performance that is needed. I feel that we have brought that much
authority to say something about this here.
We could take the argument that Hanno makes on the IETF mailing list about
FourQ, or the fact that the draft expired, but all that feels a little thin.
Moreover, there is possibly enough context that we need more space than just
the conclusion. And also this section feels too brief without this paragraph.
Am I maybe missing something else?
-- Daan
\end{Verbatim}
We see that the slowdown factor ranges from about $1.5\times$
for the Haswell platform, to $2.9\times$ on Cortex M4.
In terms of cycle counts,
Curve13318 is considerably slower than Curve25519.
However, the small factor involved can still be considered
insignificant for complex protocols,
which are often not designed for performance in the first place.
\subheading{The cost of completeness.} \subheading{The cost of completeness.}
\todo{go through this text again.}
Aside from comparing the Renes-Costello-Batina formulas to Curve25519's formulas, Aside from comparing the Renes-Costello-Batina formulas to Curve25519's formulas,
we can also look briefly at other prime-order curve implementations, we can also look briefly at other prime-order-curve implementations,
in order to see how the formulas fare against others. in order to see how the formulas fare against others.
In their paper, Renes, Costello, and Batina reported the overhead to be $1.38\times$. In their paper, Renes, Costello, and Batina reported the overhead to be $1.38\times$.
......