Commit fda79915 authored by Erik Poll

clarified & condensed section 4.3

parent a4a23db1
@@ -53,9 +53,8 @@ their rules were tied to routines in the code, so had to be slightly adapted
to fit the different implementations. In contrast, our properties are
defined at an abstract level so do not need such tailoring. Moreover,
our black box approach means we can analyse any implementation
of SSH, not just C implementations. \marginpar{\tiny @Paul: do these
properties bear any resemblance to your LTL properties?}
Formal models of SSH in the form of state machines have been used
before, namely for a manual code review of OpenSSH
@@ -195,79 +195,77 @@ Overall, we notice that in many ways, the {\dmapper} acts similarly to an SSH client
implementation.
\subsection{Practical complications}
There are three practical complications in learning models of SSH
servers: (1) an SSH server may exhibit \emph{non-deterministic} behaviour; (2)
a single input to the server can produce a \emph{sequence} of outputs
rather than just a single output; and (3) the \emph{buffering} behaviour of
the server. These complications are discussed below.
SSH implementations can exhibit non-deterministic behaviour. The
learning algorithm cannot cope with non-determinism -- learning will
not terminate -- so this has to be detected, which our {\dmapper} does.
There are a few sources of non-determinism in SSH:
\begin{enumerate}
\item Underspecification in the SSH specification (for example, by not
      specifying the order of certain messages) allows some
      non-deterministic behaviour. And even if client
      and server do implement a fixed order for the messages they send, the
      asynchronous nature of communication means that the
      interleaving of sent and received messages may vary. Moreover,
client and server are free to intersperse \textsc{debug} and
      \textsc{ignore} messages at any given time.\footnote{The \textsc{ignore}
      messages are aimed at thwarting traffic analysis.}
\item Timing is another source of non-deterministic behaviour. For
      example, the {\dmapper} might time out before the {\dsut} has
      sent its response.
In our experiments we tried to set time-out periods so that this
did not occur. However, other timing-related quirks can still
cause non-determinism. For example, some {\dsuts} behave
unexpectedly when a new query is received too shortly after the
previous one.
%For example, a trace in which a valid user authentication is performed within five milliseconds after an authentication request on DropBear can cause the authentication to (wrongly) fail.
\end{enumerate}
%
To detect non-determinism, the {\dmapper} caches all observations
in an SQLite database and verifies whether a new observation differs
from one cached in a previous protocol run. If so, it raises
a warning, which then needs to be manually investigated.
An added benefit of this cache is that it allows the {\dmapper} to
supply answers to some inputs without actually sending them to the
{\dsut}. This sped up learning considerably when we had to restart
experiments: any new experiment on the same {\dsut} could resume where
the previous experiment left off, without re-running all inputs. This
was an important benefit, as experiments could take several days.
%A subsequent identical learning run can quickly resume from where the previous one was ended, as the cache from the previous run is used to quickly respond to all queries up to the point the previous run ended.
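The cache-and-compare step can be sketched as follows. This is an
illustrative Python sketch, not the actual {\dmapper} code; the table
layout, the \texttt{ObservationCache} name, and the message symbols are
all assumptions made for the example.

```python
import sqlite3

class ObservationCache:
    """Illustrative sketch of caching observations in SQLite and
    flagging non-determinism; names and schema are assumptions."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS obs (inputs TEXT PRIMARY KEY, outputs TEXT)")

    def check(self, inputs, outputs):
        """Record the observation and return (cached outputs, warning flag).
        The flag is True when the new outputs differ from an earlier run,
        i.e. when the SUT behaved non-deterministically."""
        key, val = "|".join(inputs), "|".join(outputs)
        row = self.db.execute(
            "SELECT outputs FROM obs WHERE inputs = ?", (key,)).fetchone()
        if row is None:
            self.db.execute("INSERT INTO obs VALUES (?, ?)", (key, val))
            return val, False
        return row[0], row[0] != val
```

A cache hit with matching outputs also means the query need not be run
on the {\dsut} at all, which is what makes resuming experiments cheap.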
Another practical problem besides non-determinism is that an SSH server
may produce a sequence of outputs in response to a single input. This
means it is not behaving as a Mealy machine, which allows for only one
output per input. Dealing with this is simple: the {\dmapper}
concatenates all outputs into one, and it produces this sequence as
the single output to the {\dlearner}.
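The concatenation amounts to the following sketch, where the separator
and the \textsc{no\_resp} symbol for an empty response sequence are
assumptions for illustration, not details taken from the {\dmapper}:

```python
def to_mealy_output(responses):
    """Fold a sequence of server responses into the single output symbol
    that a Mealy machine transition requires (symbol names illustrative)."""
    if not responses:
        return "NO_RESP"  # assumed symbol for 'no output observed'
    return "+".join(responses)
```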
A final challenge is presented by forms of `buffering', which we
encountered in two situations. Firstly, some implementations buffer
incoming requests during a key re-exchange; only once the rekeying is
complete are all these messages processed. This leads to a
\textsc{newkeys} response (indicating rekeying has completed),
directly followed by all the responses to the buffered requests. This
would lead to non-termination of the learning algorithm, as for every
sequence of buffered messages the response is different. To
prevent this, we treat the sequence of queued responses as a single
output \textsc{buffered}.
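A minimal sketch of this abstraction, assuming the responses released
after rekeying always follow the \textsc{newkeys} message (the function
name and symbols are illustrative, not the actual {\dmapper} code):

```python
def collapse_buffered(responses):
    """Replace the run of responses released after NEWKEYS by the single
    abstract symbol BUFFERED, so every rekey yields one fixed output
    regardless of how many requests were queued."""
    if "NEWKEYS" in responses:
        i = responses.index("NEWKEYS")
        if len(responses) > i + 1:
            return responses[:i + 1] + ["BUFFERED"]
    return responses
```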
Secondly, buffering happens when opening and closing channels, since a
{\dsut} can close only as many channels as have previously been opened.
Learning this behaviour would lead to an infinite state machine, as we
would need a state `there are $n$ channels open' for every number $n$.
For this reason, we restrict the number of simultaneously open
channels to one. The {\dmapper} returns a custom response
\textsc{ch\_max} to a \textsc{ch\_open} message whenever this limit is
reached.
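The channel restriction can be sketched as a small filter in front of the
{\dsut}; this is a hypothetical illustration of the idea (class name,
symbols, and the limit parameter are assumptions):

```python
class ChannelLimiter:
    """Sketch of the mapper's open-channel restriction: a CH_OPEN beyond
    the limit is answered with the custom symbol CH_MAX, without the
    message ever being sent to the SUT."""

    def __init__(self, limit=1):
        self.limit = limit
        self.open_channels = 0

    def on_input(self, symbol):
        """Return CH_MAX when the limit is hit; None means the input
        should be forwarded to the SUT as usual."""
        if symbol == "CH_OPEN":
            if self.open_channels >= self.limit:
                return "CH_MAX"
            self.open_channels += 1
        elif symbol == "CH_CLOSE" and self.open_channels > 0:
            self.open_channels -= 1
        return None
```

Bounding the channel count like this keeps the state space finite at the
cost of not modelling multi-channel behaviour.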