\subsection{Weierstraß, Montgomery, and twisted Edwards curves}
The typical way to introduce elliptic curves over a field $\F$ of large characteristic is
through the \emph{short} Weierstraß equation
$$
E_W: y^2= x^3+ ax + b,
$$
where $a,b \in\F$.
As long as the discriminant $\delta=-16(4a^3+27b^2)$ is nonzero, this equation describes
an elliptic curve, and any elliptic curve over a field $\F$ of characteristic not equal to two or three
can be described through such an equation.
For cryptography we typically choose a field of large prime order $p$; the relevant group
in the cryptographic setup is the group of $\F_p$-rational points $E(\F_p)$.
Whenever we talk about ``the order of an elliptic curve'' in this paper we
mean the order of this group.
The typical way to use Weierstraß curves in cryptography is to pick curve
parameter $a=-3$ for somewhat more efficient arithmetic and to represent a point
$P=(x,y)$ in Jacobian coordinates $(X:Y:Z)$ with $(x,y)=(X/Z^2, Y/Z^3)$.
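Substituting $(x,y)=(X/Z^2, Y/Z^3)$ into $E_W$ and multiplying both sides by $Z^6$ shows that,
in these coordinates, the curve equation becomes
$$
Y^2 = X^3 + aXZ^4 + bZ^6,
$$
so affine points correspond to $Z\neq 0$ and field inversions can be deferred
until the final conversion back to $(x,y)$.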
Point addition with the formulas from~\cite[]{BL07} (improving on~\cite{})
uses $11\mathbf{M}+5\mathbf{S}+9\mathbf{a}$.
Doubling, also with formulas from~\cite[]{BL07} (improving on~\cite{}),
uses $3\mathbf{M}+5\mathbf{S}+8\mathbf{a}$.
Alternatively, one can use a ladder with differential additions, for
example using the co-$Z$ approach from~\cite{}.
Scalar multiplication using a ladder is also what Montgomery proposed in~\cite{XXX}
for a different class of elliptic curves, so-called \emph{Montgomery curves}.
These are described through an equation of the form
$$
E_M: XXX
$$
The most common way to implement arithmt
\todo{Write this section.}
\subsection{Curve13318}
The goal of this paper is to investigate the performance of complete addition and doubling
on a Weierstraß curve and compare it to the performance of Curve25519.
Many aspects contribute to the performance of elliptic-curve arithmetic, and as we are
mainly interested in the impact of formulas implementing the group law, we decided to
choose a curve that is as similar to Curve25519 as possible, except that it is in Weierstraß
form and has prime order. In particular, we want a curve that
\begin{itemize}
\item is defined over the field $\F_p$ with $p =2^{255}-19$;
\item is twist secure (for a definition, see~\cite[Sec XXX]{XXX});
\item has curve parameter $a =-3$ to support common speedups of the group law; and
\item has a small curve parameter $b$.
\end{itemize}
It turns out that we were not the first to have the idea to look for a curve with
precisely these properties. In
Many factors may contribute to the general performance of scalar multiplication on an elliptic curve.
In our research we have tried to produce a benchmark that is unaffected by irrelevant curve properties. For example, in the case of some of the traditional curves---like those from \cite{FIPS186-4,Brainpool,SEC2}---a performance penalty is expected from the use of large values for $a$ and $b$.
\todo{check large values of $a$}
Therefore, we choose a (new) prime-order curve that---except for its struc\-ture---is similar to Bernstein's Curve25519.
Aside from being able to make a good comparison, we can furthermore build on some of the optimizations that have been developed specifically
for the field used in Curve25519 \cite{BS12,Cho16,DHH+15,FL15}.
Our second priority is to choose properties that are ``straightforward'',
i.e.\ properties that are often found in other standardized curves.
In general, we try to find an answer to the question:
what could ``Curve25519'' have looked like, had it been a prime-order curve?
The curve that we chose was proposed by Barreto in May 2017~\cite{BarretoCurve}.
The curve is defined over $\mathds{F}_{2^{255}-19}$.
It is described by equation~\ref{eq:curve13318} and a suitable generator is $G =(-7, 114)$.
\begin{equation}\label{eq:curve13318}
E : y^2 = x^3 -3x + 13318
\end{equation}
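As a quick sanity check, worked out by hand here for illustration, the generator satisfies
the curve equation already over the integers, and hence also modulo $p=2^{255}-19$:
$$
114^2 = 12996 = (-7)^3 - 3\cdot(-7) + 13318.
$$
Likewise, the discriminant of the short Weierstraß form with $a=-3$ and $b=13318$ is
$$
-16\,(4\cdot(-3)^3 + 27\cdot 13318^2) = -16\cdot 4788966240 = -76623459840,
$$
which is nonzero modulo $p$ because its absolute value is smaller than $p$.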
...
...
Its order is given by
\begin{equation}
N = \ell = 2^{255} + 325610659388873400306201440571661405155\texttt{.}
\end{equation}
In the following we call this curve Curve13318,
a name that refers both to the curve parameter $b$ and
to its intended similarity to Curve25519.
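As a consistency check on the stated order, write $p=2^{255}-19$ and let $t=p+1-N$
denote the trace of Frobenius. Then
$$
t = -325610659388873400306201440571661405173,
$$
and $|t|\approx 3.26\cdot 10^{38}$ lies within the Hasse bound $2\sqrt{p}\approx 4.81\cdot 10^{38}$.
The quadratic twist of the curve therefore has order
$$
N' = p+1+t = 2(p+1)-N = 2^{255} - 325610659388873400306201440571661405191,
$$
which is the quantity whose primality is required for twist security.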
% XXX(dsprenkels) Cannot choose a=0. Craig mentioned this to me once. It had
% something to do with the fact that there would exist no
% a=0 curve over GF(25519). I don't think we need to mention
% this.
\subheading{Choice of $a$.}
Instead of choosing $a=-3$, we could have chosen $a=1$.
The first reason we chose $a=-3$ is that---depending on the addition formulas that are used---it results in more efficient curve arithmetic~\cite{BJ03}.
Indeed, the Renes-Costello-Batina complete addition formulas have a specialized case for $a=-3$~\cite[Section 3.2]{RCB16}.
The second reason is that various cryptographic standards have adopted these kinds of curves~\cite{ETSI07,BSI12,FIPS186-4,Brainpool,SEC2}.
Our results will apply to more commonly used curves if we mimic the standards.
\subheading{Twist security.}
If an implementor uses formulas that do not depend on any of the constants $a$ and $b$,
they could choose to omit checking whether the input point lies on the curve.
To prevent invalid-curve attacks in this case, $E$'s twist ($E^d$) must also be of prime order.
Then, the first valid value for $b$ is $13318$.
The cost of the doubling formulas is $8\mathbf{M}+3\mathbf{S}+2\mathbf{m_b}+21\mathbf{a}$.
The algorithm for doubling (\Double{}) is listed in~\Cref{alg:double}.
\begin{algorithm}[h]
\caption{Renes-Costello-Batina formula for $a=-3$. Used for exception-free doubling on Curve13318.}\label{alg:double}
...
...
\end{algorithmic}
\end{algorithm}
In the next sections, these algorithms will form the basis for the implementations of the \textsc{Double} and \textsc{Add} routines in our scalar-multiplication algorithm. We apply further small optimizations when the platform allows for it.
We can try to reduce the cost of the doubling algorithm by replacing (some of) the multiplications $v_{1}$, $v_{4}$, $v_{6}$, $v_{28}$ with squarings,
using the rule that $2\alpha\beta=(\alpha+\beta)^2-\alpha^2-\beta^2$.
By applying this rule, we trade $1\mathbf{M}+1\mathbf{a}$ for $1\mathbf{S}+3\mathbf{a}$.
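To see where these operation counts come from, expand the square:
$$
(\alpha+\beta)^2-\alpha^2-\beta^2 = \alpha^2+2\alpha\beta+\beta^2-\alpha^2-\beta^2 = 2\alpha\beta.
$$
Computing $2\alpha\beta$ directly costs one multiplication and one doubling ($1\mathbf{M}+1\mathbf{a}$),
whereas the right-hand side of the rule costs one addition, one squaring, and two subtractions ($1\mathbf{S}+3\mathbf{a}$).
This accounting presumably assumes that $\alpha^2$ and $\beta^2$ are already available as intermediate values;
otherwise the squarings needed to compute them would add to the cost.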
As we will describe in Section~\ref{sec:implementation},
this trick is beneficial only on the \emph{Haswell} platform,