if(CMAKE_BUILD_TYPE STREQUAL "Release" OR CMAKE_BUILD_TYPE STREQUAL "MinSizeRel")
set(CMAKE_OSX_ARCHITECTURES arm64 x86_64)# needs to be set before project(), see https://cmake.org/cmake/help/latest/variable/CMAKE_OSX_ARCHITECTURES.html
endif()
endif()
project(pep LANGUAGES C CXX)
set(CMAKE_C_STANDARD 11)
if(CMAKE_SYSTEM_NAME MATCHES "Android")
set(CMAKE_CXX_STANDARD 17)
else()
set(CMAKE_CXX_STANDARD 20)
endif()
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
OPTION(ALL_WARNINGS "Enable all possible warnings" ON)
OPTION(TEST "Build the tests" ON)
if(UNIX AND NOT APPLE)
# reduce binary size, by making it easier to garbage collect redundant/unused binary code.
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-over-aligned")# getting weird errors that the default allocator guarantees 4 bytes, and 8 bytes is needed if std::function's are used
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unknown-warning-option -Wno-zero-as-null-pointer-constant")# lots of false positives
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-extra-semi-stmt")# lots of issues with checkAssert(..); (warning there is an additional ; at the end)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-ctad-maybe-unsupported")# disable errors about missing angle brackets
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-c99-extensions")# disable warnings that designated initializers are a C99 extension (they are allowed in C++20) (needed for macOS Catalina)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-template")# needed for Catch2 TEMPLATE_LIST_TEST_CASE
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-alloca")# for now disabled, sqlite3 integration (only place alloca is used) is possibly removed anyway
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-undefined-func-template")# goes wrong with Prometheus imports (goes wrong on compile time)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-array-bounds")# false positive in memcpy of NumberToStringObject (see https://bitpowder.com:2443/bitpowder/indigo/-/jobs/47193)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-redundant-move")# conflicts with clang 7.0; which requires the moves
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-stringop-overflow")# too many false positives in simplestring.h
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Werror=missing-field-initializers")# needed for HTTPRequestOptions{}
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-mismatched-new-delete")# needed for debug.h InstanceDebug
endif()
if(NOT MSVC)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Werror")
endif()
endif(ALL_WARNINGS)
if(MSVC)
# Get rid of warnings about unsafe standard functions such as _open and _ftime64
# Get rid of warnings about passing "unchecked iterators" such as pointers to standard functions such as std::copy. See https://msdn.microsoft.com/en-us/library/aa985965.aspx and e.g. https://stackoverflow.com/a/1301343
# libpep: Library for polymorphic pseudonymisation and encryption
Author: Bernard van Gastel
Licence: Apache
This library implements the PEP encryption based on ElGamal, and operations on these encrypted messages. A message `M` can be encrypted for a receiver which has public key `Y` associated with it, belonging to secret key `y`. This encryption is random: every time a different random `r` is used, resulting in different ciphertexts (encrypted messages). We represent this encryption function as `EG(r, M, Y)`.
The library supports three operations on ciphertext `in` (= `EG(r, M, Y)`, encrypting message `M` for public key `Y` with random `r`):
- `out = rerandomize(in, s)`: scrambles a ciphertext. Both `in` and `out` can be decrypted by the same secret key `y`, and both decrypt to the same message `M`. However, the binary forms of `in` and `out` differ. Spec: `in = EG(r, M, Y)` is transformed to `out = EG(r+s, M, Y)`.
- `out = reshuffle(in, n)`: modifies a ciphertext `in` (an encrypted form of `M`), so that decrypting `out` yields `n*M`. Spec: `in = EG(r, M, Y)` is transformed to `out = EG(r, n*M, Y)`.
- `out = rekey(in, k)`: if `in` can be decrypted by secret key `y`, then `out` can be decrypted by secret key `k*y`. Both decryptions yield the same message `M`. Spec: `in = EG(r, M, Y)` is transformed to `out = EG(r, M, k*Y)`.
The `rekey(in, k)` and `reshuffle(in, n)` operations can be combined into a single `RKS(in, k, n)` operation.
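The algebra behind these operations can be checked in a toy group. The sketch below is an illustration only and not the library's API. It assumes a ciphertext is the triple `(B, C, Y)` with `B = r*G` and `C = M + r*Y`, and it replaces Ristretto by the order-11 subgroup of the integers modulo 23. All names (`add`, `mul`, `neg`, `inverse`, `Ciphertext`, `rks`, and so on) are hypothetical. In this textbook construction `reshuffle` and `rekey` also change the effective random value (to `n*r` and `r/k`), which does not affect the decrypted message.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Toy parameters (assumption, far too small for real use).
constexpr u64 P = 23;  // modulus of the toy group
constexpr u64 Q = 11;  // prime order of the subgroup; scalars live modulo Q
constexpr u64 G = 2;   // generator of the subgroup

u64 powmod(u64 base, u64 exp, u64 mod) {
    u64 result = 1;
    base %= mod;
    for (; exp > 0; exp >>= 1) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
    }
    return result;
}

// The README's additive notation mapped onto the toy group.
u64 add(u64 a, u64 b) { return a * b % P; }            // A + B
u64 mul(u64 s, u64 a) { return powmod(a, s % Q, P); }  // s * A
u64 neg(u64 a) { return powmod(a, Q - 1, P); }         // -A
u64 inverse(u64 k) { return powmod(k, Q - 2, Q); }     // 1/k for a scalar k != 0

struct Ciphertext {
    u64 B;  // r*G
    u64 C;  // M + r*Y
    u64 Y;  // public key the ciphertext is encrypted for
};

Ciphertext encrypt(u64 r, u64 M, u64 Y) { return {mul(r, G), add(M, mul(r, Y)), Y}; }

// M = C - y*B
u64 decrypt(const Ciphertext& in, u64 y) { return add(in.C, neg(mul(y, in.B))); }

// Adds fresh randomness s; message and key are unchanged.
Ciphertext rerandomize(const Ciphertext& in, u64 s) {
    return {add(in.B, mul(s, G)), add(in.C, mul(s, in.Y)), in.Y};
}

// Multiplies the encrypted message by n (n must be non-zero modulo Q).
Ciphertext reshuffle(const Ciphertext& in, u64 n) {
    return {mul(n, in.B), mul(n, in.C), in.Y};
}

// Moves the ciphertext to the key pair (k*y, k*Y) (k must be non-zero modulo Q).
Ciphertext rekey(const Ciphertext& in, u64 k) {
    return {mul(inverse(k), in.B), in.C, mul(k, in.Y)};
}

// Combination of reshuffle and rekey.
Ciphertext rks(const Ciphertext& in, u64 k, u64 n) { return rekey(reshuffle(in, n), k); }
```

For example, with `y = 3`, `Y = mul(y, G)` and `in = encrypt(4, M, Y)`, the call `decrypt(rekey(in, 5), 5 * y % Q)` returns `M` again, while `decrypt(in, y)` and `decrypt(rerandomize(in, 3), y)` both return `M` although their `B` components differ.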
There are also zero-knowledge proof versions of these operations. These are needed so that a party can prove to another party that it has applied the operation to the input data, without revealing the factors used in the operation.
When distributing trust over multiple central servers, these zero-knowledge proofs are essential, so that a malfunctioning server cannot violate the security guarantees of the system.
## Applications
For pseudonymisation, the core operation is *reshuffle* with `n`. It modifies a main pseudonym with a factor `n` that is specific to the user (or user group) receiving the pseudonym. After a user-specific factor has been applied, a pseudonym is called a *local pseudonym*.
Using only a reshuffle is insufficient, as the pseudonym is still encrypted with the public key `Y` (which can be decrypted by the secret key `y`). To allow a user to decrypt the encrypted pseudonym, a *rekey* with `k` is needed, in combination with a protocol to hand the user the secret key `k*y`. The factor `k` is typically tied to the *current session of a user*.
To make pseudonyms harder to trace, rerandomize is applied frequently. This way a binary comparison of encrypted pseudonyms will not leak any information.
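The flow described in this section can be sketched in a toy group. This is an illustration under assumptions and not the library's API: the group is the order-11 subgroup of the integers modulo 23 instead of Ristretto, a ciphertext is assumed to be the triple `(B, C, Y)`, and every name (including `forUser`) is hypothetical. It shows that the same user receives the same local pseudonym in different sessions, that another user receives a different one, and that the ciphertexts themselves differ.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Toy parameters (assumption, far too small for real use).
constexpr u64 P = 23, Q = 11, G = 2;

u64 powmod(u64 base, u64 exp, u64 mod) {
    u64 result = 1;
    base %= mod;
    for (; exp > 0; exp >>= 1) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
    }
    return result;
}

u64 add(u64 a, u64 b) { return a * b % P; }            // A + B in additive notation
u64 mul(u64 s, u64 a) { return powmod(a, s % Q, P); }  // s * A
u64 neg(u64 a) { return powmod(a, Q - 1, P); }         // -A
u64 inverse(u64 k) { return powmod(k, Q - 2, Q); }     // 1/k for a scalar k != 0

struct Ciphertext { u64 B, C, Y; };  // B = r*G, C = M + r*Y, public key Y

Ciphertext encrypt(u64 r, u64 M, u64 Y) { return {mul(r, G), add(M, mul(r, Y)), Y}; }
u64 decrypt(const Ciphertext& in, u64 y) { return add(in.C, neg(mul(y, in.B))); }

Ciphertext rerandomize(const Ciphertext& in, u64 s) {
    return {add(in.B, mul(s, G)), add(in.C, mul(s, in.Y)), in.Y};
}
Ciphertext reshuffle(const Ciphertext& in, u64 n) {
    return {mul(n, in.B), mul(n, in.C), in.Y};
}
Ciphertext rekey(const Ciphertext& in, u64 k) {
    return {mul(inverse(k), in.B), in.C, mul(k, in.Y)};
}

// Hypothetical helper: prepare an encrypted main pseudonym for one user.
// n: user-specific factor, k: session-specific factor, s: fresh randomness.
Ciphertext forUser(const Ciphertext& in, u64 n, u64 k, u64 s) {
    return rekey(reshuffle(rerandomize(in, s), n), k);
}
```

A user who was handed the secret key `k*y` for the current session decrypts the result of `forUser` to the local pseudonym `n*M`, where `M` is the main pseudonym. The value of `k` and `s` does not influence that local pseudonym, only `n` does.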
## Implementation
We use the Ristretto encoding of Curve25519, as implemented by libsodium. In the source code, scalars are lower case and group elements are upper case. A number of arithmetic rules apply to scalars and group elements: group elements can be added to and subtracted from each other. Scalars support addition, subtraction, and multiplication. A scalar can be converted to a group element (by multiplying it with the special generator `G`), but not the other way around. Group elements can also be multiplied by a scalar.
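These rules can be encoded in the type system, so that unsupported combinations do not compile. The sketch below is a toy illustration and not the library's classes: `ToyScalar` and `ToyGroupElement` are hypothetical names, and the group is the order-11 subgroup of the integers modulo 23 rather than Ristretto.

```cpp
#include <cassert>
#include <cstdint>

// Toy parameters (assumption, far too small for real use).
constexpr std::uint64_t P = 23;  // modulus of the toy group
constexpr std::uint64_t Q = 11;  // group order; scalars are reduced modulo Q

struct ToyScalar {
    std::uint64_t value;
    explicit ToyScalar(std::uint64_t v) : value(v % Q) {}
};

struct ToyGroupElement {
    std::uint64_t value;
    explicit ToyGroupElement(std::uint64_t v) : value(v % P) {}
};

// Scalars: addition, subtraction, multiplication.
ToyScalar operator+(ToyScalar a, ToyScalar b) { return ToyScalar(a.value + b.value); }
ToyScalar operator-(ToyScalar a, ToyScalar b) { return ToyScalar(a.value + Q - b.value); }
ToyScalar operator*(ToyScalar a, ToyScalar b) { return ToyScalar(a.value * b.value); }

// Group elements: addition (written as modular multiplication in the toy group).
ToyGroupElement operator+(ToyGroupElement A, ToyGroupElement B) {
    return ToyGroupElement(A.value * B.value);
}

// Scalar times group element (modular exponentiation in the toy group).
ToyGroupElement operator*(ToyScalar s, ToyGroupElement A) {
    std::uint64_t result = 1;
    std::uint64_t base = A.value;
    for (std::uint64_t e = s.value; e > 0; e >>= 1) {
        if (e & 1) result = result * base % P;
        base = base * base % P;
    }
    return ToyGroupElement(result);
}

// Group elements: subtraction, as A + (-1)*B.
ToyGroupElement operator-(ToyGroupElement A, ToyGroupElement B) {
    return A + ToyScalar(Q - 1) * B;
}

bool operator==(ToyGroupElement A, ToyGroupElement B) { return A.value == B.value; }

// The generator; a scalar s becomes a group element as s * G.
const ToyGroupElement G(2);
```

There is deliberately no `operator*` for two group elements and no conversion from `ToyGroupElement` to `ToyScalar`, so expressions such as `G * G` are rejected by the compiler.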
Group elements have an *almost* 32 byte range (top bit is always zero, and some other values are invalid). Therefore, not all AES-256 keys (using the full 32 bytes range) are valid group elements. But all group elements are valid AES-256 keys. Group elements can be generated by `GroupElement::Random()` or `GroupElement::FromHash(..)`. Scalars are also 32 bytes, and can be generated with `Scalar::Random()` or `Scalar::FromHash(..)`.
The zero-knowledge proofs are offline (non-interactive) Schnorr proofs, based on the Fiat-Shamir transform.
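As a rough illustration of the technique, the sketch below shows a plain Schnorr proof of knowledge of `x` for `X = x*G`, made non-interactive by deriving the challenge from a hash. It is not the library's proof format: the statements that libpep proves for each operation are not spelled out here, the group is the order-11 subgroup of the integers modulo 23 instead of Ristretto, the hash is a non-cryptographic FNV-1a-style stand-in, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

using u64 = std::uint64_t;

// Toy parameters (assumption, far too small for real use).
constexpr u64 P = 23, Q = 11, G = 2;

u64 add(u64 a, u64 b) { return a * b % P; }  // A + B in additive notation

u64 mul(u64 s, u64 a) {  // s * A
    u64 result = 1;
    a %= P;
    for (s %= Q; s > 0; s >>= 1) {
        if (s & 1) result = result * a % P;
        a = a * a % P;
    }
    return result;
}

// Fiat-Shamir: the challenge is a hash of the public values.
// Stand-in hash (assumption); a real implementation needs a cryptographic hash.
u64 challenge(u64 X, u64 T) {
    u64 h = 0xcbf29ce484222325ULL;
    for (u64 v : {G, X, T}) {
        h ^= v;
        h *= 0x100000001b3ULL;
    }
    return h % Q;
}

struct Proof {
    u64 T;  // commitment t*G
    u64 s;  // response t + c*x (modulo Q)
};

// Proves knowledge of x with X = x*G. The nonce t must be fresh and secret
// for every proof; it is a parameter here only to keep the sketch deterministic.
Proof prove(u64 x, u64 t) {
    const u64 X = mul(x, G);
    const u64 T = mul(t, G);
    const u64 c = challenge(X, T);
    return {T, (t + c * x) % Q};
}

// Accepts if s*G == T + c*X, with c recomputed from the public values.
bool verify(u64 X, const Proof& proof) {
    const u64 c = challenge(X, proof.T);
    return mul(proof.s, G) == add(proof.T, mul(c, X));
}
```

The verifier never sees `x` or `t`. The check holds for an honest prover because `s*G = t*G + c*x*G = T + c*X`.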
The key derivation function used is BLAKE2b. The hashing algorithm used is SHA-512.
Unit tests can be added by creating a `unit-tests/foo.test.cpp` file.
## Background
This library is based on the article by Eric Verheul and Bart Jacobs, *Polymorphic Encryption and Pseudonymisation in Identity Management and Medical Research*. In **Nieuw Archief voor Wiskunde (NAW)**, 5/18, nr. 3, 2017, p. 168-172. A local copy is available in docs/naw5-2017-18-3-168.pdf. This article does not contain the zero-knowledge proofs.