Commit 5b5ae3ee authored by Daan Sprenkels's avatar Daan Sprenkels

Try to improve the run of a factor-1.4 paragraph

parent 26addff0
@@ -54,6 +54,7 @@ This makes it possible to write
a Haswell implementation that is significantly faster than the others.
\subheading{The cost of completeness.}
Another question we might be able to answer is whether the factor-$1.4$ penalty
claimed in~\cite{RCB16}---for complete formulas vs.\ incomplete formulas---%
% On versus as 'vs.': https://english.stackexchange.com/a/5395
@@ -63,12 +64,38 @@ for optimized implementations.
In~\cite{BCLN16}, Bos, Costello, Longa, and Naehrig present performance results
for scalar multiplication on a prime-order Weierstraß curve over $\mathds{F}_{2^{256}-189}$
using parameter $a=-3$. The implementation uses non-complete formulas for addition
and doubling and they report $278\,000$ for variable-base scalar multiplication
on Intel Sandy Bridge.
The curve is very similar to Curve13318, the software in~\cite{BCLN16} is
seriously optimized and also claimed to run in constant time, so these
using parameter $a=-3$.
The curve is very similar to Curve13318, and
the implementation uses incomplete formulas for addition
and doubling.
The authors report $278\,000$ cycles for variable-base scalar multiplication on Intel Sandy Bridge.
The software in~\cite{BCLN16} is heavily optimized and claimed to
run in constant time,
so these
$278\,000$ cycles are reasonably comparable to our $389\,546$ cycles with
complete formulas.
In other words, this comparison confirms the factor-$1.4$ performance-penalty
complete formulas.
In other words, this comparison affirms the factor-$1.4$ performance-penalty
claim from~\cite{RCB16}.
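As a quick sanity check on that factor, using only the two cycle counts quoted
above (and ignoring any difference in measurement setup between the two
implementations), the ratio is
\[
  \frac{389\,546}{278\,000} \approx 1.40,
\]
that is, an overhead of roughly $40\%$ for the complete formulas.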
% -Aside from comparing the Renes-Costello-Batina formulas to Curve25519's formulas,
% -we can also look briefly at other prime-order curve implementations,
% -in order to see how the formulas fare against others.
%
% [...]
%
% -In their paper, Renes, Costello, and Batina reported the overhead to be $1.38\times$.
% -On the other hand, \cite{BCLN16},
% -implement a similar $a=-3$ curve over $\mathds{F}_{2^{256}-189}$
% -(which they call ``w-256-mers'');
% -they chose to use incomplete formulas for their implementation,
% -as using complete formulas would incur a performance cost of a factor 2.\footnote{%
% -A relevant note to this estimate is that it was constructed \emph{before} \cite{RCB16} was published.
% -I.e.\ it is based on the formulas from Bosma and Lenstra~(\cite{BL95}),
% -\emph{not} those from Renes, Costello, and Batina~(\cite{RCB16}).
% -}
% -Their variable-basepoint scalar-multiplication runs in $278\unit{kcc}$
% -on the Sandy Bridge microarchitecture.
% -Comparing that measurement to ours suggests that the complete formulas add---%
% -relative to their incomplete formulas based on conditional masking---%
% -an overhead of about $40\%$,
% -which strongly affirms the overhead measured by Renes, Costello, and Batina.