Commit 477037ed authored by Paul Fiterau Brostean's avatar Paul Fiterau Brostean

Some updates

parent 36a8011c
......@@ -15,7 +15,8 @@ We have adapted the setting of timing parameters to each implementation.
\caption{Model of the OpenSSH server. {\normalfont States are collected in 3 clusters,
indicated by the rectangles, where each cluster corresponds to
one of the protocol layers.
We eliminate redundant states and information induced by the {\dmapper}, as well as states present in successful rekeying sequences. Wherever rekeying was permitted, we replaced the rekeying states and transitions by a single \textsl{REKEY SEQUENCE} transition. We also factor out edges common to states within a cluster. We replace common disconnecting edges, by one edge from the cluster to the disconnect state. Common self loop edges are colored, and the actual i/o information only appears on one edge. Transitions with similar start and end states are joined together on the same edge. Transition labels are kept short by regular expressions(UA\_* stands for all inputs starting with UA\_) or by factoring out common start strings. Green edges highlight the happy flow. }}
We eliminate redundant states and information induced by the {\dmapper}, as well as states present in successful rekeying sequences. Wherever rekeying was permitted, we replaced the rekeying states and transitions by a single \textsl{REKEY SEQUENCE} transition. We also factor out edges common to states within a cluster. We replace common disconnecting edges, by one edge from the cluster to the disconnect state. Common self loop edges are colored, and the actual i/o information only appears on one edge. Transitions with similar start and end states are joined together on the same edge. Transition labels are kept short by regular expressions(UA\_* stands for inputs starting with UA\_) or by factoring out common start strings. Green edges highlight the happy flow. '+' concatenates
multiple outputs.}}
\label{fig:sshserver}
\end{figure*}
......@@ -34,7 +35,7 @@ challenges, as it would take a varying amount of time until the system responded
For the test queries we used random and exhaustive variants of the testing algorithm described in
\cite{SMJV15}, which generate efficient test suites. Tests generated comprise an access sequence, a middle section of length {\dk} and a
distinguishing sequence. The exhaustive variant for a set {\dk} generates tests for all possible middle sections and all states. Passing all tests then provides some notion of confidence,
distinguishing sequence. The exhaustive variant generates tests for all possible middle sections of length {\dk} and all states. Passing all tests then provides some notion of confidence,
namely, that the learned model is correct unless the (unknown) model of the implementation has at least {\dk} more states than the learned hypothesis. The random variant produces tests
with randomly generated middle sections. No formal confidence is provided, but past experience shows this to be more effective at finding counterexamples since {\dk}
can be set to higher values. We executed a random test suite with {\dk} of 4 comprising 40000 tests for OpenSSH, and 20000 tests for BitVise and DropBear.
......@@ -42,7 +43,7 @@ We then ran an exhaustive test suite with {\dk} of 2 for all implementations.
Table~\ref{tab:experiments} describes the exact versions of the systems analyzed together with statistics on learning and testing, namely:
(1) the number of states in the learned model, (2) the number of hypotheses built during the learning process and (3) the total number of learning and test queries run. For test queries, we only consider those run on the last hypothesis. All learned models along with the specifications checked can be found at \url{https://gitlab.science.ru.nl/pfiteraubrostean/Learning-SSH-Paper/tree/master/models}.
(1) the number of states in the learned model, (2) the number of hypotheses built during the learning process and (3) the total number of learning and test queries run. For test queries, we only consider those run on the last hypothesis. All learned models plus the specifications checked can be found at \url{https://gitlab.science.ru.nl/pfiteraubrostean/Learning-SSH-Paper/tree/master/models}.
%BitVise: MemQ: 24996 TestQ: 58423
%Dropbear: MemQ: 3561 TestQ: 30629
%OpenSSH: MemQ: 19836 TestQ: 76418
......@@ -64,19 +65,17 @@ DropBear v2014.65 & 29 & 8 & 8357
The large number of states is down to several reasons. First of all, some systems exhibited buffering behavior. In particular, BitVise would queue
responses for higher layer inputs sent during key re-exchange, and would deliver them all at once after rekeying was done. Rekeying was also
a major contributor to the number of states. For each state where rekeying
is possible, the sequence of transitions constituting the complete
rekeying process should lead back to that state. This
leads to two additional rekeying states for every state allowing rekey. Many states were also added due to {\dmapper} generated outputs such as \textsc{ch\_none} or \textsc{ch\_max}, outputs which signal that no channel is open or that the maximum number of channels have been opened.
a major contributor to the number of states. For each state where rekeying is possible, the sequence of transitions constituting the complete
rekeying process should lead back to that state. This leads to two additional rekeying states for every state allowing rekey. Many states were also added due to {\dmapper} generated outputs such as \textsc{ch\_none} or \textsc{ch\_max}, outputs which signal that no channel is open or that the maximum number of channels have been opened.
Figure~\ref{fig:sshserver} shows the model learned for OpenSSH, with some edits to improve readability. The happy flow, in green, is fully explored in the model and mostly matches our earlier description of it\footnote{The only exception is in the Transport layer, where unlike in our happy flow definition, the server is the first to send the \textsc{newkeys} message. This is also accepted behavior, as the protocol does not specify which side should send \textsc{newkeys} first.}. Also explored is what happens when a rekeying sequence is attempted. We notice that rekeying is only allowed in states of the Connection layer. Strangely, for these states, rekeying is not state preserving, as the generated output on receiving a \textsc{sr\_auth}, \textsc{sr\_conn} or \textsc{kex30} changes from \textsc{unimpl} to \textsc{no\_resp}. This leads to two sub-clusters of states, one before the first rekey, the other afterward. In all other states, the first step of a
rekeying (\textsc{kexinit}) yields (\textsc{unimpl}), while the last step (\textsc{newkeys}) causes the system to disconnect.
We were puzzled by how systems reacted to \textsc{sr\_conn}, the request for services of the Connection layer. These services can be accessed once the user had authenticated, without the need of a prior service request. That in itself was not strange, as authentication messages already mention that connection services should start after authentication \footnote{This is a technical detail, the message format of authentication messages requires a field which says the service started after authentication. The only option is to start Connection layer services. }.
Unexpected was that an explicit request either lead to \textsc{unimpl}/\textsc{no\_resp} with no state change, as in the case of OpenSSH, or termination of the connection, as in the case of BitVise. The latter was particularly strange, as in theory, once authenticated, the user should always have access to the service, and not be disconnected when requesting this service. Only DropBear seems to respond positively (\textsc{sr\_accept}) to \textsc{sr\_conn} after authentication.
%We were puzzled by how systems reacted to \textsc{sr\_conn}, the request for services of the Connection layer. These services can be accessed once the user had authenticated, without the need of a prior service request. That in itself was not strange, as authentication messages already mention that connection services should start after authentication \footnote{This is a technical detail, the message format of authentication messages requires a field which says the service started after authentication. The only option is to start Connection layer services. }.
%Unexpected was that an explicit request either lead to \textsc{unimpl}/\textsc{no\_resp} with no state change, as in the case of OpenSSH, or termination of the connection, as in the case of BitVise. The latter was particularly strange, as in theory, once authenticated, the user should always have access to the service, and not be disconnected when requesting this service. Only DropBear seems to respond positively (\textsc{sr\_accept}) to \textsc{sr\_conn} after authentication.
We also notice the intricate authentication behavior:
after an unsuccessful authentication attempt the only authentication method still allowed is password authentication. Finally, only BitVise allowed multiple terminals to be requested over the same channel. As depicted in the model, OpenSSH abruptly terminates on requesting a second terminal. DropBear exhibits a similar behavior.
%Systems reacted differently to \textsc{sr\_conn}, input which requests services of the Connection layer. These services are automatically requested after successful authentication, so technically, this input is not required to open channels for example. Nevertheless, in authenticated states, OpenSSH responded with \textsc{unimpl} and maintained state. BitVise disconnected the user. Only DropBear generated \textsc{sr\_accept}, the output we were expecting in view of the RFC.
We also note the intricate authentication behavior: after an unsuccessful authentication attempt the only authentication method still allowed is password authentication. Finally, only BitVise allowed multiple terminals to be requested over the same channel. As depicted in the model, OpenSSH abruptly terminates on requesting a second terminal. DropBear exhibits a similar behavior.
......
This diff is collapsed.
......@@ -41,7 +41,7 @@ $\M$ and $\N$ if and only if $A_{\M}(\sigma) \neq A_{\N}(\sigma)$.
\subsection{MAT Framework} \label{ssec:mat}
The most efficient algorithms for model learning all follow
The most efficient algorithms for model learning (see~\cite{Isberner2015} for a recent overview) all follow
the pattern of a \emph{minimally adequate teacher (MAT)} as proposed by Angluin~\cite{Angluin1987Learning}.
Here learning is viewed as a game in which a \textit{learner} has to infer an unknown automaton by asking queries to a teacher. The teacher knows the automaton, which in our setting is a Mealy machine $\M$,
also called the System Under Learning ({\dsut}).
......@@ -65,8 +65,7 @@ An equivalence query can be approximated using a conformance testing tool \cite{
A test query consists of asking the {\dsut} for the response to an input sequence
$\sigma \in I^{\ast}$, similar to a membership query.
Note that this cannot rule out that there is more behavior
that has not been discovered. For a recent overview of model learning
algorithms for this setting see \cite{Isberner2015}.
that has not been discovered.
\subsection{Abstraction} \label{ssec:mappers}
Most current learning algorithms are only applicable to Mealy machines with small alphabets comprising abstract messages. Practical systems typically
......
......@@ -18,8 +18,7 @@ SSHv2 follows a client-server paradigm. The protocol consists of three layers (F
\end{figure}
Each layer has its own specific messages. The SSH protocol is interesting in that outer layers do not encapsulate inner layers. This means that different layers can interact. Hence, it makes sense to analyze SSH as a whole, instead of analyzing its constituent layers independently. Below we discuss each layer, outlining the relevant messages which are later used in learning, and characterising the so-called \textit{happy flow} that a normal protocol
run follows.
Each layer has its own specific messages. The SSH protocol is interesting in that outer layers do not encapsulate inner layers, and different layers can interact. For this reason, we opt to analyze SSH as a whole, instead of analyzing its constituent layers independently. Below we discuss each layer, outlining the relevant messages which are later used in learning, and characterizing the so-called \textit{happy flow} that a normal protocol run follows.
At a high level, a typical SSH protocol run uses the three constituent protocols in the order given above: after the client establishes a TCP connection with the server, (1) the two sides use the Transport layer protocol to negotiate key exchange and encryption algorithms, and use these to establish session keys, which are then used to secure further communication; (2) the client uses the user authentication protocol to authenticate to the server; (3) the client uses the connection protocol to access services on the server, for example the terminal service.
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment