JSON encoding messes up UTF8 strings
The JSON encoder escapes all non-ASCII characters byte-by-byte. It uses an encoding of the form \uXXXX
, which is intended for escaping unicode characters, which don't necessarily consist of single bytes. This messes up UTF8 with non-single byte characters, as those will be encoded as multiple unicode characters, instead of a single one.
JSON does not support a notation for single bytes, so I could not find an encoding-agnostic way for escaping. It is however not required to escape non-ASCII characters at all, only the lower-numbered control characters. So simply not escaping higher-number characters seems to yield valid JSON and allows for an encoding-agnostic solution.