Commit a7a9048f authored by Erik Poll

fixed some times & condensed

parent 7145da0a
......@@ -9,10 +9,10 @@ Abstraction was provided by a {\dmapper} component placed between the
{\dlearner} and the {\dsut}. The {\dmapper} was constructed from an
existing SSH implementation. The input alphabet of the {\dmapper}
explored key exchange, setting up a secure connection, several
authentication methods and opening and closing channels over which the
authentication methods, and opening and closing channels over which the
terminal service could be requested. We used two input alphabets, a
full version for OpenSSH, and a restricted version for the other
implementations. The restricted alphabet was still sufficient to
full version for OpenSSH, and a restricted version for Bitvise and
DropBear. The restricted alphabet was still sufficient to
explore most aforementioned behavior.
We encountered several challenges. Firstly, building a {\dmapper} presented a considerable technical challenge, as it required re-structuring of an actual
......
......@@ -86,7 +86,7 @@ also used model checkers \cite{TCP2016,Chalupar2014Automated}.
Instead of using active learning as we do, it is also possible to use
passive learning to obtain protocol state machines
\cite{Wang2011Inferring}. Here network traffic is observed, and not
actively generated. This can then also provide a probabilistic
actively generated. This can then provide a probabilistic
characterization of normal network traffic, but it cannot uncover
implementation flaws that occur in strange message flows, which is our
goal.
......
......@@ -21,8 +21,9 @@ We have adapted the setting of timing parameters to each implementation.
\end{figure*}
OpenSSH was learned using a full alphabet, whereas DropBear and BitVise were learned using a restricted alphabet (as defined in Subsection~\ref{subsec:alphabet}).
The main reason for using a restricted alphabet reduce learning times. Based on the model learned for OpenSSH (the first implementation analyzed) and the specification, we excluded
inputs that seemed unlikely to produce state change (like \textsc{debug} or \textsc{unimpl}). We also excluded inputs that proved costly time-wise (like \textsc{disconnect}), yet were not were not needed to visit all states in our happy flow. We excluded, for example, the user/password based authentication inputs (\textsc{ua\_pw\_ok} and
The reason for using a restricted alphabet was to reduce learning times. Based on the model learned for OpenSSH (the first implementation analyzed) and the specification, we excluded
inputs that seemed unlikely to produce state change (such as
\textsc{debug} or \textsc{unimpl}). We also excluded inputs that proved costly time-wise (such as \textsc{disconnect}) but were not needed to visit all states in the happy flow. We excluded, for example, the user/password based authentication inputs (\textsc{ua\_pw\_ok} and
\textsc{ua\_pw\_nok}) as they would take the system 2-3 seconds to respond to. By contrast, public key authentication resulted in quick responses.
%The \textsc{disconnect} input presented similar
......@@ -45,8 +46,8 @@ can be set to higher values. We executed a random test suite with {\dk} of 4 com
%To that end, we built a java adapter that would automatically run the model checker on the hypothesis, and transform any counterexamples into tests. This proved essential in learning DropBear, as the last counterexample was generated by the model checker.
Table~\ref{tab:experiments} describes the exact versions of the systems analyzed together with statistics on learning and testing, namely:
(1) the number of states in the learned model, (2) the number of hypotheses built during the learning process and (3) the total number of learning and test queries run. For test queries, we only consider those run on the last hypothesis. All learned models plus the specifications checked can be found at \url{https://gitlab.science.ru.nl/pfiteraubrostean/Learning-SSH-Paper/tree/master/models}. The statistics
Table~\ref{tab:experiments} describes the exact versions of the systems analyzed together with statistics on learning and testing:
(1) the number of states in the learned model, (2) the number of hypotheses built during the learning process and (3) the total number of learning and test queries run. For test queries, we only consider those run on the last hypothesis. All learned models and the properties checked are at \url{https://gitlab.science.ru.nl/pfiteraubrostean/Learning-SSH-Paper/tree/master/models}. The statistics
give a glimpse into the issue of scalability. Assuming each input took 0.5 seconds to process and an average query length of 10 inputs, performing 40000 queries would have taken roughly 55 hours. This is consistent with the time the experiments took, which spanned several days. The long duration compelled us to resort to restricted alphabets, which led to a reduction in the number of queries needed. Our work could have benefited from parallel execution.
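The run-time estimate above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name and its parameters are hypothetical, and the figures (40000 queries, 10 inputs per query, 0.5 seconds per input) are the assumptions stated in the text, not measurements.

```python
def estimated_learning_hours(num_queries: int,
                             avg_query_length: int,
                             seconds_per_input: float) -> float:
    """Estimate the sequential run time, in hours, of a learning experiment.

    Every query consists of avg_query_length inputs, and every input is
    assumed to take seconds_per_input seconds to process.
    """
    total_seconds = num_queries * avg_query_length * seconds_per_input
    return total_seconds / 3600.0


# Figures assumed in the text: 40000 queries, 10 inputs each, 0.5 s per input.
hours = estimated_learning_hours(40000, 10, 0.5)
print(f"{hours:.1f} hours")  # 200000 s / 3600 s per hour, i.e. about 55.6 hours
```

Since queries are independent of one another, splitting them over several {\dsut} instances would divide this figure roughly by the number of instances, which is why parallel execution would have helped.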
%BitVise: MemQ: 24996 TestQ: 58423
%Dropbear: MemQ: 3561 TestQ: 30629
......
......@@ -100,7 +100,7 @@ convey its own parameter preferences before key exchange can proceed. Also inclu
\label{trans-alphabet}
\end{table}
The Authentication layer defines one single client message type in the form of the authentication request~\cite[p. 4]{rfc4252}. Its parameters contain all information needed for authentication. Four authentication methods exist: none, password, public key and host-based. Our mapper supports all methods except the host-based authentication because some SUTs don't support this feature. Both the public key and password methods have \textsc{ok} and \textsc{nok} variants, which provide respectively correct and incorrect credentials. Our restricted alphabet supports only public key authentication, as the implementations processed this faster than the other authentication methods.
The Authentication layer defines a single client message type for the authentication requests~\cite[p. 4]{rfc4252}. Its parameters contain all information needed for authentication. Four authentication methods exist: none, password, public key and host-based. Our mapper supports all methods except host-based authentication because some SUTs don't support this feature. Both the public key and password methods have \textsc{ok} and \textsc{nok} variants, which provide respectively correct and incorrect credentials. Our restricted alphabet supports only public key authentication, as the implementations processed this faster than the other authentication methods.
\begin{table}[!ht]
\centering
......@@ -118,9 +118,9 @@ The Authentication layer defines one single client message type in the form of t
\label{auth-alphabet}
\end{table}
The Connection layer allows the client to manage channels and to request/run services over them. In accordance with our learning goal,
The Connection layer allows clients to manage channels and request services over them. In accordance with our learning goal,
our mapper only supports inputs for requesting terminal emulation, plus inputs for channel management as shown in Table~\ref{conn-alphabet}.
The restricted alphabet only supports the most general channel management inputs. Those excluded are not expected to produce state change.
The restricted alphabet only supports the most general channel management inputs, and excludes those not expected to produce state change.
\begin{table}[!ht]
......@@ -141,7 +141,7 @@ The restricted alphabet only supports the most general channel management inputs
\label{conn-alphabet}
\end{table}
\emph{The output alphabet} subsumes all messages an SSH server generates, which may include, with identical meaning, any of the messages defined as inputs. They also include responses to various requests: \textsc{kex31}~\cite[p. 21]{rfc4253} as reply to \textsc{kex30}, \textsc{sr\_succes} in response to service requests (\textsc{sr\_auth} and \textsc{sr\_conn}), \textsc{ua\_success} and \textsc{ua\_failure}~\cite[p. 5,6]{rfc4252} in response to authentication requests, and \textsc{ch\_open\_success}~\cite[p. 6]{rfc4254} and \textsc{ch\_success}~\cite[p. 10]{rfc4254} , in positive response to \textsc{ch\_open} and \textsc{ch\_request\_pty} respectively. To these outputs, we add \textsc{no\_resp} for when the {\dsut} generates no output, and the special outputs \textsc{ch\_none}, \textsc{ch\_max} and \textsc{no\_conn}, and \textsc{buffered}, which we discuss in the next Subsections.
\emph{The output alphabet} includes all messages an SSH server generates, which may include, with identical meaning, any of the messages defined as inputs. This also includes responses to various requests: \textsc{kex31}~\cite[p. 21]{rfc4253} as reply to \textsc{kex30}, \textsc{sr\_succes} in response to service requests (\textsc{sr\_auth} and \textsc{sr\_conn}), \textsc{ua\_success} and \textsc{ua\_failure}~\cite[p. 5,6]{rfc4252} in response to authentication requests, and \textsc{ch\_open\_success}~\cite[p. 6]{rfc4254} and \textsc{ch\_success}~\cite[p. 10]{rfc4254}, in positive response to \textsc{ch\_open} and \textsc{ch\_request\_pty} respectively. To these outputs, we add \textsc{no\_resp} for when the {\dsut} generates no output, and the special outputs \textsc{ch\_none}, \textsc{ch\_max}, \textsc{no\_conn}, and \textsc{buffered}, which we discuss in the next subsections.
%The learning alphabet comprises of input/output messages by which the {\dlearner} interfaces with the {\dmapper}. Section~\ref{sec:ssh} outlines essential inputs, while Table X provides a summary
%of all messages available at each layer. \textit{\textit{}}
......@@ -160,7 +160,9 @@ is the \textsc{no\_resp} message.
The sheer complexity of the {\dmapper} meant that it was easier to
adapt an existing SSH implementation, rather than construct the
{\dmapper} from scratch. Paramiko already provides mechanisms for
{\dmapper} from scratch.
After all, in many ways the {\dmapper} acts like an SSH client.
Paramiko already provides mechanisms for
encryption/decryption, as well as routines for constructing and
sending the different types of packets, and for receiving them. These
routines are called by control logic dictated by Paramiko's own state
......@@ -186,7 +188,7 @@ negotiated earlier in place of the older ones, if such existed.
The {\dmapper} also contains a buffer for storing opened channels, which is initially empty.
On a \textsc{ch\_open} from the learner, the {\dmapper} adds a channel to the buffer
with a randomly generated channel identifier, on a \textsc{ch\_close}, it removes the channel
with a randomly generated channel identifier; on a \textsc{ch\_close}, it removes the channel
(if there is one). The buffer size, or the maximum number of opened channels, is limited to one. The {\dmapper} also stores the sequence number of the last message received from the {\dsut}. This number is then used when constructing \textsc{unimpl} inputs.
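The channel bookkeeping described above might look roughly as follows. This is a minimal sketch, not the actual {\dmapper} code: the class and method names are hypothetical, and it assumes that \textsc{ch\_max} is answered when the buffer is already full and \textsc{ch\_none} when no channel is open, conditions this excerpt does not spell out.

```python
import random


class ChannelBuffer:
    """Hypothetical bookkeeping of opened channels inside the mapper."""

    MAX_CHANNELS = 1  # the text limits the buffer size to one

    def __init__(self):
        self.channels = []  # the buffer is initially empty

    def on_ch_open(self):
        """Handle CH_OPEN; return a direct answer, or None to forward to the SUT."""
        if len(self.channels) >= self.MAX_CHANNELS:
            return "CH_MAX"  # assumption: answered directly when the buffer is full
        # Store a randomly generated channel identifier (32-bit range assumed).
        self.channels.append(random.randrange(2 ** 32))
        return None

    def on_ch_close(self):
        """Handle CH_CLOSE; return a direct answer, or None to forward to the SUT."""
        if not self.channels:
            return "CH_NONE"  # assumption: answered directly when nothing is open
        self.channels.pop()
        return None
```

Returning `None` here stands for "no direct answer", meaning the input would be translated into a concrete message and sent on to the {\dsut}.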
In the following cases, inputs are answered by the {\dmapper} directly
......@@ -204,8 +206,6 @@ responds with a \textsc{no\_conn} message, as sending further messages to the {\
% messages to the {\dsut} is pointless in that case;
%\end{enumerate}
%
In many ways, the {\dmapper} acts similar to an SSH client, hence the
decision to built it by adapting an existing client implementation.
\subsection{Practical complications}
......@@ -278,7 +278,7 @@ complete are all these messages processed. This leads to a
\textsc{newkeys} response (indicating rekeying has completed),
directly followed by all the responses to the buffered requests. This
would lead to non-termination of the learning algorithm, as for every
sequence of buffered messages the response is different. To
sequence of buffered messages the response differs. To
prevent this, we treat the sequence of queued responses as the single
output \textsc{buffered}.
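The abstraction of queued responses can be illustrated with a short Python sketch. The function name is hypothetical, outputs are represented as plain strings, and the sketch assumes that the \textsc{newkeys} response itself is kept while everything following it is collapsed; the text does not state how the real {\dmapper} represents this.

```python
def abstract_rekey_outputs(responses):
    """Collapse all responses that follow NEWKEYS into one BUFFERED output.

    `responses` is the list of abstract outputs received for a single input.
    Whatever was queued during rekeying arrives right after NEWKEYS, so the
    tail of the list is replaced by a single symbol.
    """
    if "NEWKEYS" in responses:
        idx = responses.index("NEWKEYS")
        if idx + 1 < len(responses):
            # Keep everything up to and including NEWKEYS, abstract the rest.
            return responses[: idx + 1] + ["BUFFERED"]
    return list(responses)
```

With this abstraction, two queries that differ only in which requests were queued during rekeying yield the same output, so the learner sees finitely many distinct responses.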
......
......@@ -92,8 +92,8 @@ tabsize=2
\begin{abstract}
We apply model learning on three SSH implementations to infer state machine models, and then use model checking
to verify that these models satisfy basic security properties and conform to the RFCs. Our analysis showed that
all tested SSH server models satisfy the stated security properties.
However, our analysis uncovered several violations of the standard.
all tested SSH server models satisfy the stated security properties,
but uncovered several violations of the standard.
%Frits: I would say the fingerprinting is a detail, standard violations much more important.
%The state machines of the implementations differ significantly, allowing them to be
%effectively fingerprinted.
......
......@@ -251,7 +251,6 @@ machine learning algorithms},
abstract = {The secure shell ({SSH}) protocol is one of the most popular cryptographic protocols on the Internet. Unfortunately, the current {SSH} authenticated encryption mechanism is insecure. In this paper, we propose several fixes to the {SSH} protocol and, using techniques from modern cryptography, we prove that our modified versions of {SSH} meet strong new chosen-ciphertext privacy and integrity requirements. Furthermore, our proposed fixes will require relatively little modification to the {SSH} protocol and to {SSH} implementations. We believe that our new notions of privacy and integrity for encryption schemes with stateful decryption algorithms will be of independent interest.},
author = {Bellare, M. and Kohno, T. and Namprempre, C.},
journal = {ACM Trans. Inf. Syst. Secur.},
month = may,
number = {2},
pages = {206--241},
publisher = {ACM},
......@@ -357,7 +356,7 @@ machine learning algorithms},
author = {Aarts, F. and Ruiter, J. {de} and Poll, E.},
booktitle = {Software Testing, Verification and Validation Workshops (ICSTW)},
pages = {461--468},
publisher = {IEEE CS},
publisher = {IEEE},
title = {Formal Models of Bank Cards for Free},
year = {2013}
}
......