% Commit 9858aedb by Daan Sprenkels (parent 4785c782): Tweak text
The table lookup is implemented in a traditional scanning fashion:
selecting the required value using a bitwise AND operation.
Where we use an unsigned representation,
we compute the conditional negation of $Y$
by negating $Y$ and selecting the correct result using bitwise operations.
When using a floating-point representation,
we use a single XOR operation to conditionally flip the sign bit.
These operations, like the rest of the code, are implemented in constant time.
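The scanning lookup and the two conditional-negation strategies can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the element type \texttt{fe}, its limb count \texttt{LIMBS}, the table size \texttt{TABLE\_LEN} and the single-word modulus \texttt{MODULUS\_P} are hypothetical, and the floating-point variant assumes 64-bit IEEE-754 \texttt{double}s. Writing branch-free C does not by itself guarantee constant-time machine code; the compiled output still has to be checked.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_LEN 8                          /* hypothetical table size */
#define LIMBS 4                              /* hypothetical limbs per element */
#define MODULUS_P 0x1FFFFFFFFFFFFFFFull      /* hypothetical modulus 2^61 - 1 */

typedef struct { uint64_t limb[LIMBS]; } fe; /* hypothetical field element */

/* All-ones if a == b, zero otherwise, without branches. */
static uint64_t ct_eq_mask(uint64_t a, uint64_t b) {
    uint64_t d = a ^ b;                      /* zero iff a == b */
    /* (d | -d) has its top bit set iff d != 0 */
    return ((d | (0 - d)) >> 63) - 1;
}

/* Scan every entry; AND it with a mask that is all-ones only for the
 * requested index and OR the results, so the access pattern is fixed. */
static fe ct_lookup(const fe table[TABLE_LEN], uint64_t index) {
    fe r;
    memset(&r, 0, sizeof r);
    for (uint64_t i = 0; i < TABLE_LEN; i++) {
        uint64_t m = ct_eq_mask(i, index);
        for (int j = 0; j < LIMBS; j++)
            r.limb[j] |= table[i].limb[j] & m;
    }
    return r;
}

/* Bitwise select: a where mask is all-ones, b where mask is zero. */
static uint64_t ct_select(uint64_t a, uint64_t b, uint64_t mask) {
    return (a & mask) | (b & ~mask);
}

/* Unsigned representation: always compute the negation, then select.
 * MODULUS_P - y is congruent to -y for 0 <= y <= MODULUS_P. */
static uint64_t ct_cneg(uint64_t y, uint64_t negate /* 0 or 1 */) {
    uint64_t mask = 0 - (negate & 1);        /* all-ones iff negate == 1 */
    uint64_t neg_y = MODULUS_P - y;
    return ct_select(neg_y, y, mask);
}

/* Floating-point representation: one XOR on the IEEE-754 sign bit. */
static double ct_cneg_double(double y, uint64_t negate /* 0 or 1 */) {
    uint64_t bits;
    memcpy(&bits, &y, sizeof bits);
    bits ^= (negate & 1) << 63;              /* flip sign bit iff negate == 1 */
    memcpy(&y, &bits, sizeof bits);
    return y;
}
```

Every entry of the table is read regardless of \texttt{index}, which is what makes the scan independent of the secret index.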
Finally, we found that shuffling values within the \texttt{ymm} registers is
relatively limited and expensive.
This is because Sandy Bridge has no arbitrary cross-lane shuffle instruction
(like the \texttt{vpermq} instruction in AVX2).
To shuffle every value in a \texttt{ymm} register into the correct lane,
we would need at least two µops on port 5.
It is therefore cheaper to put all the values in the first lane and
accept that most of the additions and subtractions are not batched.
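The lane restriction can be made concrete with a plain-C model of the shuffles; this is neither intrinsics code nor the authors' code. We assume a $4 \times 64$-bit vector whose elements 0 and 1 form the low 128-bit lane, and the hypothetical helpers \texttt{swap\_lanes}, \texttt{permute\_in\_lane} and \texttt{blend} model the AVX instructions \texttt{vperm2f128} (immediate \texttt{0x01}), \texttt{vpermilpd} and \texttt{vblendpd}. An arbitrary permutation, which AVX2 performs with a single \texttt{vpermq}, then takes a lane swap, two in-lane permutes and a blend. Port assignment and latency are not modelled.

```c
#include <stdint.h>

/* Model of a 256-bit ymm register holding four 64-bit values.
 * v[0], v[1] form the low 128-bit lane; v[2], v[3] the high lane. */
typedef struct { uint64_t v[4]; } ymm_model;

/* Models vperm2f128 with immediate 0x01 (both sources equal):
 * the two 128-bit lanes are swapped. */
static ymm_model swap_lanes(ymm_model x) {
    ymm_model r = {{ x.v[2], x.v[3], x.v[0], x.v[1] }};
    return r;
}

/* Models vpermilpd: output element i may only pick one of the two
 * elements of its own lane; sel[i] & 1 chooses which. */
static ymm_model permute_in_lane(ymm_model x, const int sel[4]) {
    ymm_model r;
    for (int i = 0; i < 4; i++) {
        int lane_base = (i / 2) * 2;
        r.v[i] = x.v[lane_base + (sel[i] & 1)];
    }
    return r;
}

/* Models vblendpd: element i comes from b if bit i of mask is set, else from a. */
static ymm_model blend(ymm_model a, ymm_model b, unsigned mask) {
    ymm_model r;
    for (int i = 0; i < 4; i++)
        r.v[i] = ((mask >> i) & 1) ? b.v[i] : a.v[i];
    return r;
}

/* r.v[i] = x.v[idx[i]] for idx[i] in 0..3: one vpermq on AVX2, but four
 * shuffle/blend steps when only the AVX1 primitives above are available. */
static ymm_model permute_any(ymm_model x, const int idx[4]) {
    int sel[4];
    unsigned mask = 0;
    for (int i = 0; i < 4; i++) {
        sel[i] = idx[i] & 1;               /* position inside the source lane */
        if ((idx[i] >> 1) != (i >> 1))     /* source lane differs from target lane */
            mask |= 1u << i;
    }
    ymm_model same  = permute_in_lane(x, sel);              /* in-lane sources */
    ymm_model cross = permute_in_lane(swap_lanes(x), sel);  /* cross-lane sources */
    return blend(same, cross, mask);
}
```

Any element that must cross the 128-bit boundary forces the extra lane swap, which is why the model needs more than one shuffle where AVX2 needs a single \texttt{vpermq}.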

all Hyper-Threading cores shut down, and with the CPU clocked at the maximum
nominal frequency.
The STM32F407 device was run with its default settings,
as listed in the datasheet~\cite{STM32F407}
(i.e.~clocked from the 16\unit{MHz} internal RC-oscillator).
We list the benchmarking results in Table~\ref{tab:benchmarks}. As expected,
none of our implementations exceed the performance of Curve25519.
caption = {Measured cycle counts
of the variable-basepoint scalar-multiplication routines
on the {Sandy Bridge} (SB), {Ivy Bridge} (IB), {Haswell} (H) and {Cortex M4} (M4) architectures.
For completeness, we have included cycle
counts for Ed25519 signature verification (the operation in Ed25519
that computes a variable-basepoint scalar multiplication).
},