Our intuition is that in practice they will end up slightly slower than
the signed fixed-window scalar multiplication using Renes-Costello-Batina formulas
we employed here, but settling this question clearly needs more implementation
effort.
In this paper, we introduced Barreto's Weierstraß curve,
which we call ``Curve13318''.
We implemented optimized algorithms for variable-basepoint scalar multiplication
on the Intel Haswell and Sandy Bridge microarchitectures,
and on the ARM Cortex M4 architecture, using the formulas from~\cite{RCB16}.
We compared their performance to well-known implementations
for scalar multiplication on Curve25519.
The cycle counts show that Curve13318 is considerably slower than Curve25519
(by a factor ranging from 1.47 on Intel Haswell to 2.87 on ARM Cortex M4),
though this slowdown should still be reasonable for most cryptographic protocols.
\subheading{Conclusion.}
The analysis in this paper shows that using prime-order Weierstraß curves
with complete addition formulas is between $\approx1.5$ times and $\approx2.9$
times slower than using state-of-the-art Montgomery curve arithmetic.
In an area where even a $10$\% improvement in performance is often
considered important and worth publication in major venues, this is
a fairly heavy price to pay, at least for some applications that
are bottlenecked by ECC performance.
We argue that,
% We "believe" that?
while prime-order Weierstraß curves may be slower,
they are easier to implement securely in complex cryptographic protocols than their (twisted) Edwards counterparts.
Moreover, we saw (again) that the overhead of complete formulas
for arithmetic on Weierstraß curves is not that large.
However, for applications that primarily aim at simplicity and safety against
subgroup attacks, the performance penalty might be acceptable.
This point of view is also supported, for example, by the fact that
the attempt to standardize the high-performance ``Four$\mathbb{Q}$''
curve~\cite{CL15} in CFRG~\cite{LLB17} was only very short-lived\footnote{For the full discussion, see \url{https://mailarchive.ietf.org/arch/msg/cfrg/sCqu86nFiAw_9beBXVqBM_zES_k}.}.
The discussion around this proposal acknowledged that Four$\mathbb{Q}$
offers considerably faster arithmetic than Curve25519, but questioned
whether any applications really need that performance.
In the end, Weierstraß curves are still superseded by Curve25519.
However, for complex protocols,
we discourage the unmodified use of this curve.
Instead, eliminate the non-trivial cofactor; use Ristretto.
% \subheading{Future work.}
% Should be even do this?
% \begin{Verbatim}
% - Implement the other formulas for other GF(2^255-19)-curves
% - Complete formulas useful for scalar randomization
% - Try the Susella-Montrasio formulas.
% - Make the argument that the traditional curves are bad,
% but not because of their Weierstrass form. Only because
% the constants and field are worse than Curve25519s.
% \end{Verbatim}
%
In our opinion, for the design of new protocols the best compromise
is to use Curve25519 in twisted Edwards form with the Ristretto encoding