Commit ca56c4a7 authored by Paul Fiterau Brostean's avatar Paul Fiterau Brostean

Some more files/writing

parent 0621a994
......@@ -2,12 +2,12 @@
The SSH protocol is used interact securely with remote machines. Alongside TLS and IPSec, SSH is amongst the most frequently used security suites~\cite{Albrecht2009Plaintext}. Due to its significant user base and sensitive nature, flaws in the protocol or its implementation could have major impact. It therefore comes as no surprise that SSH has attracted scrutiny from the security community. Problems with information leakage via unencrypted MACs were identified~\cite{Bellare2004Breaking}, as well as possibilities for plain text recovery when a block cipher in cipher block chaining mode was used~\cite{Albrecht2009Plaintext}. Other analysis found no serious issues~\cite{Williams2011Analysis}\cite{Paterson2010PlaintextDependent}. Academic researchers have so far focused more on the theoretical aspects than on implementations of the protocol.
In this work, we use classical active automata learning, which we refer to as model learning, to infer state machines of 3 SSH implementations. Model learning has previously been applied to infer state machines of EMV bank cards~\cite{Aarts2013Formal}, electronic passports~\cite{Aarts2010Inference} and hand-held readers for online banking~\cite{Chalupar2014Automated}. More recently, it was used to learn implementations of TCP~\cite{Janssen2015Learning} and TLS~\cite{RuiterProtocol}. Model learning's goal is obtaining a state model of a black-box system by providing inputs and observing outputs. The learned state model corresponds to the observed behavior, and can be used in system analysis. Since model learning builds from a finite number of observations, we can never be sure that the learned model is correct. To that end, advanced conformance algorithms are employed, which yield some confidence that the system inferred is in fact correct. In the context of testing protocols, model learning can be seen as a form of protocol state fuzzing, whereby unexpected inputs are sent to a system under test in the hope of exposing hidden anomalies. In model learning, inputs are sent with no regard to the order imposed by the protocol.
In this work, we use classical active automata learning, which we refer to as model learning, to infer state machines of 2 SSH implementations. Model learning has previously been applied to infer state machines of EMV bank cards~\cite{Aarts2013Formal}, electronic passports~\cite{Aarts2010Inference} and hand-held readers for online banking~\cite{Chalupar2014Automated}. More recently, it was used to learn implementations of TCP~\cite{Janssen2015Learning} and TLS~\cite{RuiterProtocol}. Model learning's goal is obtaining a state model of a black-box system by providing inputs and observing outputs. The learned state model corresponds to the observed behavior, and can be used in system analysis. Since model learning builds from a finite number of observations, we can never be sure that the learned model is correct. To that end, advanced conformance algorithms are employed, which yield some confidence that the system inferred is in fact correct. In the context of testing protocols, model learning can be seen as a form of protocol state fuzzing, whereby unexpected inputs are sent to a system under test in the hope of exposing hidden anomalies. In model learning, inputs are sent with no regard to the order imposed by the protocol.
Inputs can therefore be sent disorderly. Any anomalies are then exposed in the learned model.
Having obtained models, we use model checkers to automatically verify their conformance to security properties. Such verification would be difficult manually, as the learned models are large. Moreover, the very satisfaction of properties can be used to refine the learned model. In cases where the learned model does not satisfy a security property, the proving sequence of inputs can either signal non-conformance of the system, or can be used to further refine the learned model.
Our work most closely resembles work on TCP~\cite{Janssen2015Learning}, where they also combine classical learning and model checking to analyze
Our work is borne out of 3 recent theses. It most closely resembles work on TCP~\cite{Janssen2015Learning}, where they also combine classical learning and model checking to analyze
various TCP implementations, with our work being more focused on security properties. Other case studies rely on
manual analysis of the learned models ~\cite{Aarts2013Formal},\cite{Chalupar2014Automated}, \cite{RuiterProtocol}. Our work differs,
since we use a model checker to automatically check specifications. Moreover, the model checker is also used as means of testing the learned model.
......
......@@ -10,7 +10,8 @@ The learning setup consists of three components: a \emph{learner}, \emph{mapper}
The learner uses LearnLib ~\cite{LearnLib2009}, a Java library implementing $L^{\ast}$ based algorithms for learning Mealy machines. The {\dmapper} is based on Paramiko, an open source SSH implementation written in Python\footnote{Paramiko is available at \url{http://www.paramiko.org/}}. We opted for Paramiko because its code is relatively well structured and documented. The {\dsut} can be any existing implementation of a SSH server. The {\dlearner} communicates with the {\dmapper} over sockets. A graphical representation of our setup is shown in Figure~\ref{fig:components}.
\begin{figure}
\noindent\makebox[\textwidth]{\includegraphics[width=1.1\textwidth]{example-components.pdf}}
\centering
\includegraphics[scale=0.35]{example-components.pdf}
\caption{The setup consists of a learner, mapper and SUT.}
\label{fig:components}
\end{figure}
......@@ -18,13 +19,14 @@ The learner uses LearnLib ~\cite{LearnLib2009}, a Java library implementing $L^{
%figure
\subsection{The learning alphabet}
Learning time tends to grow rapidly as the input alphabet grows. It is therefore important to focus on messages for which interesting state-changing behaviour can be expected. As a general principle, we therefore chose not to query protocol messages that are not intended to be sent from a client to a server\footnote{Applying this principle to the RFC's messages results in not including \textsc{service\_accept}, \textsc{ua\_accept}, \textsc{ua\_failure}, \textsc{ua\_banner}, \textsc{ua\_pk\_ok}, \textsc{ua\_pw\_changereq}, \textsc{ch\_success} and \textsc{ch\_failure} in our alphabet.}.
Learning time grows rapidly as the input alphabet grows. It is therefore important to focus on messages for which interesting state-changing behaviour can be expected. As a general principle, we therefore chose not to query protocol messages that are not intended to be sent from a client to a server\footnote{Applying this principle to the RFC's messages results in not including \textsc{service\_accept}, \textsc{ua\_accept}, \textsc{ua\_failure}, \textsc{ua\_banner}, \textsc{ua\_pk\_ok}, \textsc{ua\_pw\_changereq}, \textsc{ch\_success} and \textsc{ch\_failure} in our alphabet.}.
Applying this ``outgoing only'' principle to the transport layer results in the messages of Table~\ref{trans-alphabet}\footnote{A mapping to the official RFC names is provided in Appendix~\ref{appendixa}.}. The only message that is out of the ordinary is \textsc{guessinit}. This is a special instance of \textsc{kexinit} for which \texttt{first\_kex\_packet\_follows} is enabled~\cite[p. 17]{rfc4253}. Our mapper can only handle correct key guesses, so the wrong-guess procedure as described in ~\cite[p. 19]{rfc4253} was not supported. When needed, SUTs were configured to make this guess work by altering their cipher preferences. The SSH version and comment string (described in Section~\ref{ssh-run-trans}) was not queried because it does not follow the binary packet protocol.
Applying this ``outgoing only'' principle to the transport layer results in the messages of Table~\ref{trans-alphabet}. This is a special instance of \textsc{kexinit} for which \texttt{first\_kex\_packet\_follows} is enabled~\cite[p. 17]{rfc4253}. Our mapper can only handle correct key guesses, so the wrong-guess procedure as described in ~\cite[p. 19]{rfc4253} was not supported. When needed, SUTs were configured to make this guess work by altering their cipher preferences. The SSH version and comment string (described in Section~\ref{ssh-run-trans}) was not queried because it does not follow the binary packet protocol.
\begin{table}[!ht]
\centering
\small
\begin{tabular}{ll}
\textbf{Message} & \textbf{Description} \\
\textsc{discon} & Terminates the current connection~\cite[p. 23]{rfc4253} \\
......@@ -42,10 +44,13 @@ Applying this ``outgoing only'' principle to the transport layer results in the
\label{trans-alphabet}
\end{table}
For the user authentication layer, applying our ``outgoing only'' principle results in just one message: the authentication request~\cite[p. 4]{rfc4252}. Its parameters contain all information needed for authentication. As stated in Section~\ref{ssh-run-auth}, four authentication methods exist: none, password, public key and host-based. Our mapper supports all methods except the host-based authentication because various SUTs lack support for this feature. As shown in Table~\ref{auth-alphabet}, both the public key as well as the password method have a \textsc{ok} and \textsc{nok} variant which provides respectively correct and incorrect credentials.
For the user authentication layer, applying our ``outgoing only'' principle results in just one message: the authentication request~\cite[p. 4]{rfc4252}. Its parameters contain all information needed for authentication. Four authentication methods exist: none, password, public key and host-based. Our mapper supports all methods except the host-based authentication because various SUTs lack support for this feature. For all other types, we define message
As shown in Table~\ref{auth-alphabet}, both the public key as well as the password method have a \textsc{ok} and \textsc{nok} variant which provides respectively correct and incorrect credentials.
\begin{table}[!ht]
\centering
\small
\begin{tabular}{ll}
\textbf{Message} & \textbf{Description} \\
\textsc{ua\_none} & Authenticates with the ``none'' method~\cite[p. 7]{rfc4252} \\
......@@ -62,6 +67,7 @@ The connection protocol allows the client to request different processes over a
\begin{table}[!ht]
\centering
\small
\begin{tabular}{ll}
\textbf{Message} & \textbf{Description} \\
\textsc{ch\_open} & Opens a new channel~\cite[p. 5]{rfc4254} \\
......@@ -84,21 +90,7 @@ The connection protocol allows the client to request different processes over a
The {\dmapper} must provide translation between abstract message representations and well-formed SSH messages. A special case is for when no output is received from the {\dsut}, in which case the {\dmapper} gives back to the learner a {\dtimeout} message, concluding a timeout occurred. The sheer complexity of the {\dmapper}, meant that it was easier to adapt an existing SSH implementation, rather than construct the {\dmapper} from scratch. Paramiko had already provided routines for sending the different types of packets, and for receiving them. These routines were called by control logic dictated by Paramiko's own state machine. The {\dmapper} was constructed by replacing this control logic with one dictated by messages received from the {\dlearner}. %over a socket connection
The {\dmapper} updates its state with information from concrete messages. This enables it to craft concrete messages corresponding to the learner abstractions.
Table~\{table:state} shows how the mapper is updated.
\begin{table}[!ht]
\centering
\begin{tabular}{ll}
\textbf{Message} & \textbf{Influence on mapper state} \\
\textsc{kexinit} & Saves SUT's parameter preferences\tablefootnote{The parameters that must be supported according to the RFCs to ensure interoperability are used if no \textsc{kexinit} has been received.}. \\
\textsc{kex31} & Saves the exchange hash resulting from key exchange. \\
\textsc{newkeys} & Takes in use new keys for all outgoing messages\tablefootnote{Silently ignored when key exchange has not yet been completed.}. \\
\textsc{ch\_accept} & Saves the channel identifier, which is used in implementing later channel related inputs. \tablefootnote{Zero is used if no \textsc{ch\_accept} has been received.}. \\
\textit{any} & Saves the sequence number, used for implementing the \textsc{unimpl} input\tablefootnote{Zero is used if no message has been received.}. \\
\end{tabular}
\caption{State-changing responses implemented by the mapper. These combinedly result in the mapper's state.}
\label{table:state}
\end{table}
Table~\ref{table:state} shows how the mapper is updated.
%As and thus to interact with the {\dsut}, at times similar to how an SSH client would.
......@@ -122,4 +114,6 @@ Aside from non-determinism, SSH implementations can also produce a sequence of o
concatenates all outputs into one, delivering a single output to the {\dlearner}.
Another challenge is presented by buffering, leading to an explosion of states, as buffers cannot be described succintly by Mealy Machines.
We have encountered buffers in two occasions. Firstly, some implementations buffer certain responses when in key re-exchange. As soon as rekeying completes, these queued messages are released all at once. This leads to a \textsc{newkeys} message (indicating rekeying has completed), directly followed by all buffered messages. Buffer behaviour can also be observed when opening and closing channels, since a SUT can close only as many channels as have previously been opened. For this reason, we restricted the number of simultaneously open channels to one. The {\dmapper} returns a custom response \textsc{ch\_max} to a \textsc{ch\_open} message whenever this limit is reached.
\ No newline at end of file
We have encountered buffers in two occasions. Firstly, some implementations buffer certain responses when in re-key exchange. As soon as re-keying completes, these queued messages are released all at once. This leads to a \textsc{newkeys} response (indicating re-keying has completed), directly followed by all buffered responses. This combined with concatenation of the multiple output response would lead to non-termination of the learning algorithm, as for every variant of the response queue there would be a different output. To counter this, we replace by concatenation of queued output responses by the single word \textsc{buffered}, thus forming \textsc{newkeys\_buffered}.
Buffer behaviour can also be observed when opening and closing channels, since a SUT can close only as many channels as have previously been opened. For this reason, we restricted the number of simultaneously open channels to one. The {\dmapper} returns a custom response \textsc{ch\_max} to a \textsc{ch\_open} message whenever this limit is reached.
\ No newline at end of file
\section{Checking security specifications} \label{sec:specs}
We use the NuSMV model checker to verify security properties. NuSMV is a model checker where a state of a model is specified as a set of finite variables, and a transition-function, which can then be defined to change these variables. Specifications in temporal logic, such as CTL and LTL, can be checked for truth on specified models. NuSMV provides a counterexample if a certain specification is not true. NuSMV models are generated automatically from the learned models. Generation proceeds by first defining in an empty NuSMV file three variables, corresponding to inputs, outputs and states. The transition-function is then extracted from the learned model and appended to the file. This function updates the output and state variable for given valuations of the input variable. Below we give an example of a Mealy machine and it's associated NuSMV model.
The properties we verify are primarily drawn from Verleg's security definitions. To these, we add properties on re-keying, as well as other properties outlined in the RFC specification.
\subsection{General security properties}
Verleg defined three security properties, one for each layer. We formalized and checked two, leaving out the one defined for the Connection layer, as it implied manually checking the model for problems.
The security property for the Transport layer is defined as follows: ``We consider a transport layer state machine secure if there is no path from the initial state to the point where the authentication service is invoked without exchanging and employing cryptographic keys.'' In other words, authentication request cannot succeed if key exchange wasn't performed successfully. Key exchange implies performing three steps (*give steps*). These steps have to be performed in order but can be interleaved by other actions. Successful authentication request should necessarily imply successful execution of the key exchange steps. We can tell a authentication request or a key exchange step was successful from the values of the input and output variables. For example, an authentication request is successful
if for the input SERVICE\_REQUEST\_AUTH, the output SERVICE\_ACCEPT was received. Following these principles, we define the LTL specification in Figure~\ref{fig:prop1}. O stands for the Once-operator, a past time LTL operator, which is true if its argument is true in a past time instant.
\begin{figure}[h]
\begin{lstlisting}[basicstyle=\footnotesize]
G (
(input=SERVICE_REQUEST_AUTH & output=SERVICE_ACCEPT)
->
O ( (input=NEWKEYS & output= NO_RESP) &
O ( (input=KEX30 & output=KEX31_NEWKEYS) &
O (output=KEXINIT)
)
)
)
\end{lstlisting}
\label{fig:prop1}
\caption{Transport layer security property.}
\end{figure}
For the authentication layer, Verleg defined the authentication layer secure ``if there is no path from the unauthenticated state to the authenticated state without providing the correct credentials.''
Since we have no method to directly determine if the model is in an authenticated state, we consider the authenticated state to be a state where opening a channel can be done successfully.
Since opening a channel should only be possible after user authentication, this should show if we are in an authenticated state. The specification then is rather simple: Always, if a channel is opened successfully, has there not been a failure in user authentication since a successful user authentication. Opening a channel successfully in the model is straightforward, a request to open a channel is sent to the server, and a success message is received back. Checking whether there has been a success or failure in user authentication is done by checking if the server ever sends the UA\_FAILURE or UA\_SUCCESS packets. Additional checks if the user actually made a user authentication request could potentially be added. The resulting LTL is shown in Figure~\ref{fig:prop2}. This LTL uses the Since-operator (S) which holds if its first argument holds for every state since its second argument holds. Note that the second argument has to hold somewhere in the path for the Since operator to hold.
\begin{figure}[h]
\begin{lstlisting}[basicstyle=\footnotesize]
G ( (input= CH_OPEN & output= CH_OPEN_SUCCESS)
-> (!(output=UA_FAILURE) S output= UA_SUCCESS) )
\end{lstlisting}
\label{fig:prop2}
\caption{User Authentication layer LTL in NuSMV syntax.}
\end{figure}
\subsection{Re-key properties}
An important property is that re-exchanging keys can be done in any state of the protocol, and its successful execution doesn't change the protocol state. We consider 2 general protocol states, pre-authenticated (after a successful authentication request, before authentication) and authenticated. For these, we specify that performing key exchange should always preserve the state, until specific inputs or outputs are encountered that necessarily change it. An example of such inputs or outputs are those corresponding to a disconnect action from either side.
\subsection{RFC Properties}
We have also formalized and checked properties drawn from the RFC specifications. A problem was that the specification for each layer seemed to only consider messages specific to that layer. In particular, it didn't consider disconnect messages, that may be sent at any point in the protocol as per (*ref*). For the Transport Layer, the RFC states that after sending a KEXINIT message, a party MUST not send SERVICE\_REQUEST or SERVICE\_ACCEPT messages, until it has sent a NEWKEYS message. This is translated to the LTL.
The RFC also states that in case the server rejects a service request, it should send an appropriate DISCONNECT message.
The Connection Layer RFC states that upon receiving a CH\_CLOSE message, a side should send back a CH\_CLOSE message, unless it has already sent this message for the channel. This of course, ignores the case when a side disconnects, in which case a CH\_CLOSE would no longer have to be issued.
\subsection{Model checking results}
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment