Commit 028a7869 authored by Paul Fiterau Brostean's avatar Paul Fiterau Brostean

Merge branch 'master' of gitlab.science.ru.nl:pfiteraubrostean/Learning-SSH-Paper

parents 88acfa77 c13141c8
......@@ -3,7 +3,7 @@
We have combined model learning with abstraction techniques to infer models of the OpenSSH, Bitvise and DropBear SSH server implementations. We have also
formalized several security and functional properties drawn from the SSH RFC specifications. We have verified these
properties on the learned models using model checking and have uncovered several minor standard violations.
The security critical properties were met by all implementations.
The security-critical properties were met by all implementations.
Abstraction was provided by a {\dmapper} component placed between the
{\dlearner} and the {\dsut}. The {\dmapper} was constructed from an
......@@ -19,15 +19,15 @@ We encountered several challenges. Firstly, building a {\dmapper} presented a co
SSH implementation. Secondly, because we used classical learning algorithms, we had to ensure that the abstracted implementation behaved
like a (deterministic) Mealy Machine. Here time-induced non-determinism was difficult to eliminate. Buffering also presented problems,
leading to a considerable increase in the number of states. Moreover, the systems analyzed were relatively slow, which meant learning took
several days\marginpar{\tiny Erik: For a single server, right??}. This was compounded by the size of the learning alphabet, and it forced us into using a reduced alphabet for two of the analyzed implementations.
several days. This was compounded by the size of the learning alphabet, and it forced us into using a reduced alphabet for two of the implementations.
Limitations of the work, hence possibilities for future work, are several. First of all, the {\dmapper} was not formalized, unlike in~\cite{TCP2016}, thus we did not
produce a concretization of the abstract models. Consequently, model checking results cannot be fully transferred to the actual implementations. Formal definition
of the mapper and concretization of the learned models (as defined in \cite{AJUV15}) would tackle this. The {\dmapper} also caused considerable redundancy in the learned models, re-tweaking the abstractions used, in particular those for managing channels, could alleviate this problem while also improving learning times. This in turn would facilitate learning using expanded alphabets instead of resorting to restricted alphabets.
of the mapper and concretization of the learned models (as defined in \cite{AJUV15}) would tackle this. The {\dmapper} also caused considerable redundancy in the learned models; tweaking the abstractions used, in particular those for managing channels, could alleviate this problem while also improving learning times. This in turn would facilitate learning using expanded alphabets instead of resorting to restricted alphabets.
Furthermore, the {\dmapper} abstraction could be refined, to give more
insight into the implementations. In particular, parameters,
such as the session identifier or data sent over channels, could be extracted from the {\dmapper} and potentially handled by existing Register Automata learners\cite{ralib2015,tomte2015}. These learners
can infer systems with parameterized alphabets, state variables and simple operations on data. Finally, we had to eliminate any timing-related behavior, as it could not be handled by the classical learners used. To that end, preliminary work on learning timed automata\cite{GrinchteinJL10} could be leveraged.
can infer systems with parameterized alphabets, state variables and simple operations on data. Finally, we ignored all timing-related behavior, as it could not be handled by the classical learners used; there is preliminary work on learning timed automata\cite{GrinchteinJL10} which could use timing behavior.
Despite these limitations, our work provides a compelling application of learning and model checking in a security setting, on a widely used protocol. We hope this lays
......
......@@ -14,16 +14,17 @@ rather than cryptographic flaws. Such logical flaws have occurred in
implementations of other security protocols, notably TLS, with Apple's
'goto fail' bug and the FREAK attack \cite{Beurdouche:2017}. For this
we use model learning (a.k.a.\ active automata learning)
\cite{Angluin1987Learning,Vaa17} to infer state machines of three SSH
\cite{Angluin1987Learning,PeVaYa02,Vaa17} to infer state machines of three SSH
implementations, which we then analyze by model checking for
conformance to both functional and security properties.
The properties we verify for the inferred state machines are based on
the RFCs that specify SSH \cite{rfc4251,rfc4252,rfc4253,rfc4254}.
These properties are formalized in LTL and verified using
NuSMV~\cite{NuSMV2}. Manually verifying these properties would be
difficult, as the learned models are complex. Moreover, formalizing
properties means we can also better assess and overcome vagueness or
NuSMV~\cite{NuSMV2}.
We use a model checker since the models are too complex for manual inspection (they are trivial for NuSMV).
Moreover, by formalizing the
properties we can better assess and overcome vagueness or
under-specification in the RFC standards.
Model learning obtains a state machine model of a black-box system by
......@@ -55,7 +56,7 @@ options. Our analysis only considers the protocol logic. However,
their rules were tied to routines in the code, so had to be slightly adapted
to fit the different implementations. In contrast, our properties are
defined at an abstract level so do not need such tailoring. Moreover,
our black box approach approach means we can analyze any implementation
our black box approach means we can analyze any implementation
of SSH, not just C implementations.
......@@ -67,7 +68,7 @@ testing of SSH implementations \cite{Boss2012}. All this research
only considered the SSH transport layer, and not the other SSH
protocol layers.
This paper is borne out of 2 recent theses \cite{Verleg2016,Toon2016}
This paper is born out of two recent theses \cite{Verleg2016,Toon2016},
and is to our knowledge the first combined application of model
learning and model checking in verifying SSH implementations, or more
generally, implementations of any network security protocol.
......
This diff is collapsed.
......@@ -19,7 +19,6 @@ The {\dlearner} uses LearnLib ~\cite{LearnLib2009}, a Java library implementing
SSH is a complex client-server protocol. In our work so far we concentrated on learning models of the implementation of the server, and not of the client.
We further restrict learning to only exploring the terminal service of the Connection layer, as we consider it to be the most interesting
from a security perspective. Algorithms for encryption, compression and hashing are left to default settings and are not purposefully explored. Also, the starting
state of the {\dsut} is one where a TCP connection has already been established and where SSH versions have been exchanged, which are prerequisites for starting the Transport layer protocol.
%figure
......@@ -40,11 +39,11 @@ intended to ever be sent to the server\footnote{This means we exclude
the messages \textsc{service\_accept}, \textsc{ua\_accept},
\textsc{ua\_failure}, \textsc{ua\_banner}, \textsc{ua\_pk\_ok},
\textsc{ua\_pw\_changereq}, \textsc{ch\_success} and
\textsc{ch\_failure} from our alphabet.} Furthermore, from the
\textsc{ch\_failure} from our alphabet.}. Furthermore, from the
Connection layer we only use messages for channel management and the
terminal functionality. Finally, because we will only explore
protocol behavior after SSH versions have been exchanged, we exclude
these messages for exchanging version numbers.
the messages for exchanging version numbers.
%\marginpar{\tiny Erik: I
%rephrased all this to make it simpler. Is it still ok?}
......@@ -57,7 +56,7 @@ models while only marginally limiting explored behavior. We discuss
this again in Section~\ref{sec:result}. Inputs included in the
restricted alphabet are marked by '*' in the tables below.
Table~\ref{trans-alphabet} introduces the Transport layer inputs. We include a version of the \textsc{kexinit} message with \texttt{first\_kex\_packet\_follows} disabled.
Table~\ref{trans-alphabet} lists the Transport layer inputs. We include a version of the \textsc{kexinit} message with \texttt{first\_kex\_packet\_follows} disabled.
This means no guess~\cite[p. 17]{rfc4253} is attempted on the {\dsut}'s parameter preferences. Consequently, the {\dsut} will have to send its own \textsc{kexinit} in order to
convey its own parameter preferences before key exchange can proceed. Also included are inputs for establishing new keys (\textsc{kex30}, \textsc{newkeys}), disconnecting (\textsc{disconnect}), as well as the special inputs \textsc{ignore}, \textsc{unimpl} and \textsc{debug}. The latter are not interesting, as they are normally ignored by implementations. Hence they are excluded from our restricted alphabet. \textsc{disconnect} proved costly time wise, so was also excluded.
%We include two versions of the \textsc{kexinit} message, one where \texttt{first\_kex\_packet\_follows} is disabled, the other when it is enabled, in which case, the message would make a guess on the security parameters~\cite[p. 17]{rfc4253}. Our mapper can only handle correct key guesses, so the wrong-guess procedure as described in ~\cite[p. 19]{rfc4253} was not supported.
......@@ -84,7 +83,7 @@ convey its own parameter preferences before key exchange can proceed. Also inclu
\label{trans-alphabet}
\end{table}
The Authentication layer defines one single client message type in the form of the authentication request~\cite[p. 4]{rfc4252}. Its parameters contain all information needed for authentication. Four authentication methods exist: none, password, public key and host-based. Our mapper supports all methods except the host-based authentication because various SUTs lack support for this feature. Both the public key and password methods have \textsc{ok} and \textsc{nok} variants, which provide respectively correct and incorrect credentials. Our restricted alphabet supports only public key authentication, as it was processed the fastest by implementations out of all authentication forms.
The Authentication layer defines one single client message type in the form of the authentication request~\cite[p. 4]{rfc4252}. Its parameters contain all information needed for authentication. Four authentication methods exist: none, password, public key and host-based. Our mapper supports all methods except the host-based authentication because some SUTs lack support for this feature. Both the public key and password methods have \textsc{ok} and \textsc{nok} variants, which provide respectively correct and incorrect credentials. Our restricted alphabet supports only public key authentication, as the implementations processed this faster than the other authentication methods.
\begin{table}[!ht]
\centering
......@@ -189,9 +188,9 @@ instead of being sent to the {\dsut} to find out its response:
responds with a \textsc{no\_conn} message, as sending further
messages to the {\dsut} is pointless in that case;
\end{enumerate}
Overall, we notice that in many ways, the {\dmapper} acts similarly to an SSH client. Hence it is unsurprising that it was built off an existing
implementation.
%
In many ways, the {\dmapper} acts similar to an SSH client, hence the
decision to built it by adapting an existing client implementation.
\subsection{Practical complications}
......
......@@ -89,11 +89,13 @@ tabsize=2
\begin{abstract}
We apply model learning on three SSH implementations to infer state machine models, which
are then verified by a model checker for functional and security properties. Our analysis showed that
all tested SSH server models satisfy the security properties. Nevertheless, we uncovered several minor standard violations.
The state machines of the implementations differ significantly, allowing them to be
effectively fingerprinted.
We apply model learning on three SSH implementations to infer state machine models, and then use model checking
to verify that these models satisfy basic security properties and conform to the RFCs. Our analysis showed that
all tested SSH server models satisfy the stated security properties.
However, our analysis uncovered several violations of the standard.
%Frits: I would say the fingerprinting is a detail, standard violations much more important.
%The state machines of the implementations differ significantly, allowing them to be
%effectively fingerprinted.
%Various shortcomings with regards to the RFCs were found. Opening multiple channels is not properly implemented on CiscoSSH and PowerShell. OpenSSH contains a bug which can result in connection closure after rekeying in some circumstances. Both Tectia and OpenSSH implement a liberal message acceptance policy in the first phase of the protocol. Such a liberal policy is unwise in this error-prone stage.
......@@ -145,7 +147,7 @@ effectively fingerprinted.
% We no longer use \terms command
%\terms{Theory}
\keywords{ACM proceedings, \LaTeX, text tagging}
\keywords{Model learning, model checking, SSH protocol}
\maketitle
......
......@@ -24,7 +24,7 @@
}
@misc{SSHResults,
title = {Learned SSH Models},
title = {Learned {SSH} Models},
year = {2017},
publisher = {GitLab},
journal = {GitLab repository},
......@@ -222,7 +222,8 @@ machine learning algorithms},
shorttitle = {{RALib}},
urldate = {2016-09-05},
booktitle = {{DIFTS}},
author = {Cassel, Sofia and Howar, Falk and Jonsson, Bengt},
author = {Cassel, S. and Howar, F. and Jonsson, B.},
url = {http://www.faculty.ece.vt.edu/chaowang/difts2015/papers/paper_5.pdf},
year = {2015},
keywords = {learning, ralib, register-automata},
}
......@@ -574,3 +575,13 @@ machine learning algorithms},
}
@ARTICLE{PeVaYa02,
AUTHOR = "Peled, D. and Vardi, M.Y. and Yannakakis, M.",
TITLE = "{Black Box Checking}",
JOURNAL = "Journal of Automata, Languages, and Combinatorics",
YEAR = "2002",
VOLUME = "7",
NUMBER = "2",
PAGES = "225--246"
}
......@@ -9,32 +9,33 @@ SSHv2 follows a client-server paradigm. The protocol consists of three layers (F
\item The \textit{connection protocol} (RFC 4254~\cite{rfc4254}) allows the encrypted channel to be multiplexed in different channels. These channels enable a user to run multiple applications, such as terminal emulation or file transfer, over a single SSH connection.
\end{enumerate}
Each layer has its own specific messages. The SSH protocol is interesting in that outer layers do not encapsulate inner layers. This means that different layers can interact. Hence, it makes sense to analyze SSH as a whole, instead of analyzing its constituent layers independently. We review each layer, outlining the relevant messages which are later used in learning, and characterising the so-called \textit{happy flow} that a normal protocol
\begin{figure}[t] % was !hb
\centering
\includegraphics[scale=0.35]{SSH_protocols.png}
\caption{SSH protocol layers}
\label{fig:sshcomponents}
\end{figure}
Each layer has its own specific messages. The SSH protocol is interesting in that outer layers do not encapsulate inner layers. This means that different layers can interact. Hence, it makes sense to analyze SSH as a whole, instead of analyzing its constituent layers independently. Below we discuss each layer, outlining the relevant messages which are later used in learning, and characterising the so-called \textit{happy flow} that a normal protocol
run follows.
At a high level, a typical SSH protocol run uses the three constituent protocols in the order given above: after the client establishes a TCP connection with the server, (1) the two sides use the transport layer protocol to negotiate key exchange and encryption algorithms, and use these to establish session keys, which are then used to secure further communication; (2) the client uses the user authentication protocol to authenticate to the server; (3) the client uses the connection protocol to access services on the server, for example the terminal service.
%Different layers are identified by their message numbers. These message numbers will form the basis of the state fuzzing. The SSH protocol is especially interesting because outer layers do not encapsulate inner layers. This means that different layers can interact. One could argue that this is a less systematic approach, in which a programmer is more likely to make state machine-related errors.
\begin{figure}[!hb]
\centering
\includegraphics[scale=0.35]{SSH_protocols.png}
\caption{SSH protocol components running on a TCP/IP stack.}
\label{fig:sshcomponents}
\end{figure}
\subsection{Transport layer}\label{ssh-run-trans}
SSH runs over TCP, and provides end-to-end confidentiality and integrity using session keys. Once a TCP connection has been established with the server, these session keys are securely negotiated using a \textsl{key exchange} algorithm, the first step of the protocol. The key exchange begins by the two sides exchanging their preferences for the key exchange algorithm to be used, as well as encryption, compression and hashing algorithms. Preferences are sent with a \textsc{kexinit} message.
%TODO How is the algorithm picked?
Subsequently, key exchange using the negotiated algorithm takes place. Following this algorithm, one-time session keys for encryption and hashing are generated by each side, together with an identifier for the session. The main key exchange algorithm is Diffie-Hellman, which is also the only one required by the RFC. For the Diffie-Hellman scheme, \textsc{kex30} and \textsc{kex31} are exchanged to establish fresh session keys. These keys are used from the moment the \textsc{newkeys} command has been issued by both parties. A subsequent \textsc{sr\_auth} requests the authentication service. The happy flow thus consists of the succession of the three steps comprising key exchange, followed up by a successful authentication service request. The sequence is shown in Figure~\ref{fig:hf-trans}.
\begin{figure}[!hb]
\begin{figure}[!hb]
\includegraphics[scale=0.285]{hf-trans.pdf}
\caption{The happy flow for the transport layer.}
\label{fig:hf-trans}
\end{figure}
\textsl{Key re-exchange}~\cite[p. 23]{rfc4253}, or \textsl{rekeying}, is a near identical process, with the difference being that instead of taking place at the beginning, it takes place once session keys are already in place. The purpose is to renew session keys so as to foil potential replay attacks~\cite[p. 17]{rfc4251}. It follows the same steps as key exchange. Messages exchanged as part of it are encrypted using the old set of keys, messages exchanged afterwards are encrypted using the new keys. A fundamental property of rekeying is that it should preserve the state; that is, after the rekeying procedure is completed, the protocol should be in the same state as
\textsl{Key re-exchange}~\cite[p. 23]{rfc4253}, or \textsl{rekeying}, is an almost identical process, the difference being that instead of taking place at the beginning, it takes place once session keys are already in place. The purpose is to renew session keys so as to foil potential replay attacks~\cite[p. 17]{rfc4251}. It follows the same steps as key exchange. A fundamental property of rekeying is that it should preserve the state; that is, after the rekeying procedure is completed, the protocol should be in the same state as
it was before the rekeying started, with as only difference that new keys are now in use. %Some implementations are known not support rekeying in certain states of the protocol.
%We consider an transport layer state machine secure if there is no path from the initial state to the point where the authentication service is invoked without exchanging and employing cryptographic keys.
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment