From 9858aedb98c3c9a48ca8167d14d8e98d8c45d15f Mon Sep 17 00:00:00 2001
From: Daan Sprenkels <daan@dsprenkels.com>
Date: Thu, 3 Oct 2019 13:13:40 +0200
Subject: [PATCH] Tweak wording in implementation.tex and results.tex

---
 implementation.tex | 6 +++---
 results.tex        | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/implementation.tex b/implementation.tex
index 5bef68b..b1f51b2 100644
--- a/implementation.tex
+++ b/implementation.tex
@@ -24,7 +24,7 @@ The table lookup is implemented in a traditional scanning fashion:
 selecting the required value using a bitwise AND operation.
 Where we use an unsigned representation,
 we compute the conditional negation of $Y$
-by negating $Y$ and selecting the correct result using bitwise operations. When the representation is signed,
+by negating $Y$ and selecting the correct result using bitwise operations. When using a floating-point representation,
 we use a single XOR operation to conditionally flip the sign bit.
 These operations are---as well as the rest of the code---implemented in constant-time.
 
@@ -215,11 +215,11 @@ This substitutes $8\mathbf{a}$ for $4\mathbf{m}$ in \Add{}, and $10\mathbf{a}$ f
 Last, we found that shuffling the \texttt{ymm} registers turns out to be
 relatively weak and expensive.
 That is because Sandy Bridge has no arbitrary shuffle instruction
-(like the \texttt{vpermq} instruction from AVX2).
+(like the \texttt{vpermq} instruction in AVX2).
 To shuffle every value in a \texttt{ymm} register into the correct lane,
 we would need at least two µops on port 5.
 Then it is cheaper to put all the values in the first lane, and
-accept most of the additions and subtractions are not batched.
+accept that most of the additions and subtractions are not batched.
 
 
 
diff --git a/results.tex b/results.tex
index f177f35..46848bc 100644
--- a/results.tex
+++ b/results.tex
@@ -11,7 +11,7 @@ all Hyper-Threading cores shut down, and with the CPU clocked at the maximum
 nominal frequency.
 The STM32F407 device was run with its default settings,
 as listed in the datasheet~\cite{STM32F407}
-(i.e.~clocked from the internal 16\unit{MHz} internal RC-oscillator).
+(i.e.~clocked from the 16\unit{MHz} internal RC-oscillator).
 We list the benchmarking results in Table~\ref{tab:benchmarks}. As expected,
 none of our implementations exceed the performance of Curve25519.
 
@@ -21,7 +21,7 @@ none of our implementations exceed the performance of Curve25519.
     caption = {Measured cycle counts
     of the variable-basepoint scalar-multiplication routines
     on the {Sandy Bridge} (SB), {Ivy Bridge} (IB), {Haswell} (H) and {Cortex M4} (M4) architectures.
-    For completeness (by the request of our reviewers), we have included cycle
+    For completeness, we have included cycle
     counts for Ed25519 signatures verification (which is the operation in Ed25519
     that computes variable-basepoint scalar-multiplication).    
     },
-- 
GitLab