Commit b34f19c0 authored by Erik Poll

shit

parent 00a3ccc9
@misc{TeXFAQ,
author = {{UK \TeX{} Users Group}},
howpublished = {\url{http://www.tex.ac.uk}},
title = {{UK} List of {\TeX} Frequently Asked Questions},
year = {2016},
}
@Manual{Downes04:amsart,
title = {The \textsf{amsart}, \textsf{amsproc}, and
\textsf{amsbook} document~classes},
author = {Michael Downes and Barbara Beeton},
organization = {American Mathematical Society},
year = 2004,
month = {August},
note = {\url{http://www.ctan.org/pkg/amslatex}}
}
@Manual{instr-l,
title = {Instructions for Preparation of Papers and
Monographs, {AMS\LaTeX}},
organization = {American Mathematical Society},
month = {August},
year = 2004,
note = {\url{http://www.ctan.org/pkg/amslatex}}
}
@Manual{Fiorio15,
title = {{a}lgorithm2e.sty---package for algorithms},
author = {Christophe Fiorio},
month = {October},
year = 2015,
annote = {\url{http://www.ctan.org/pkg/algorithm2e}}
}
@Manual{Brito09,
title = {The algorithms bundle},
author = {Rog\'erio Brito},
month = {August},
year = 2009,
annote = {\url{http://www.ctan.org/pkg/algorithms}}
}
@Manual{Heinz15,
title = {The Listings Package},
author = {Carsten Heinz and Brooks Moses and Jobst Hoffmann},
month = {June},
year = 2015,
note = {\url{http://www.ctan.org/pkg/listings}}
}
@Manual{Fear05,
title = {Publication quality tables in {\LaTeX}},
author = {Simon Fear},
month = {April},
year = 2005,
note = {\url{http://www.ctan.org/pkg/booktabs}}
}
@Manual{ACMIdentityStandards,
title = {{ACM} Visual Identity Standards},
organization = {Association for Computing Machinery},
year = 2007
}
@Manual{Sommerfeldt13:Subcaption,
title = {The subcaption package},
author = {Axel Sommerfeldt},
month = {April},
year = 2013,
note = {\url{http://www.ctan.org/pkg/subcaption}},
}
@Manual{Nomencl,
title = {A package to create a nomenclature},
author = {Boris Veytsman and Bernd Schandl and Lee Netherton
and CV Radhakrishnan},
month = {September},
note = {\url{http://www.ctan.org/pkg/nomencl}},
year = 2005}
@Manual{Talbot16:Glossaries,
title = {User Manual for glossaries.sty v4.25},
author = {Nicola L. C. Talbot},
month = {June},
year = 2016,
note = {\url{http://www.ctan.org/pkg/glossaries}}}
...@@ -5,16 +5,21 @@ formalized several security and functional properties drawn from the SSH RFC spe ...@@ -5,16 +5,21 @@ formalized several security and functional properties drawn from the SSH RFC spe
properties on the learned models using model checking and have uncovered several minor inconsistencies, though crucially, the security critical properties were met properties on the learned models using model checking and have uncovered several minor inconsistencies, though crucially, the security critical properties were met
by all implementations. by all implementations.
Abstraction was provided by a {\dmapper} component placed between the {\dlearner} and the {\dsut}. The {\dmapper} was constructed from an existing SSH Abstraction was provided by a {\dmapper} component placed between the
implementation. The alphabet the {\dmapper} exposed to the {\dlearner}, explored key exchange and setting up a secure connection, several authentication methods {\dlearner} and the {\dsut}. The {\dmapper} was constructed from an
and opening and closing single of channels over which the terminal service could be requested. We used two alphabets, a full version for OpenSSH, and existing SSH implementation. The input alphabet of the {\dmapper}
a restricted version for the other implementations. The restricted alphabet was still sufficient to explore most aforementioned behavior. explored key exchange, setting up a secure connection, several
authentication methods and opening and closing channels over which the
terminal service could be requested. We used two input alphabets, a
full version for OpenSSH, and a restricted version for the other
implementations. The restricted alphabet was still sufficient to
explore most aforementioned behavior.
There were several challenges encountered. Firstly, building a {\dmapper} presented a considerable technical challenge, as it required re-structuring of an actual There were several challenges encountered. Firstly, building a {\dmapper} presented a considerable technical challenge, as it required re-structuring of an actual
SSH implementation. Secondly, because we used classical learning algorithms, we had to ensure that the abstracted implementation behaved SSH implementation. Secondly, because we used classical learning algorithms, we had to ensure that the abstracted implementation behaved
like a deterministic Mealy Machine. Herein, time induced non-determinism was difficult to eliminate. Buffering also presented problems, like a deterministic Mealy Machine. Herein, time induced non-determinism was difficult to eliminate. Buffering also presented problems,
leading to a considerable increase in the number of states. Moreover, the systems analyzed were relatively slow, which meant learning and testing took leading to a considerable increase in the number of states. Moreover, the systems analyzed were relatively slow, which meant learning took
several days. This was compounded by the size of the learning alphabet, and it forced us into using a reduced alphabet for two of the analyzed implementations. several days\marginpar{\tiny Erik: For a single server, right??}. This was compounded by the size of the learning alphabet, and it forced us into using a reduced alphabet for two of the analyzed implementations.
Limitations of the work, hence elements for future work, are several. First of all, the {\dmapper} was not formalized, unlike in~\cite{TCP2016}, thus we did not Limitations of the work, hence elements for future work, are several. First of all, the {\dmapper} was not formalized, unlike in~\cite{TCP2016}, thus we did not
produce a concretization of the abstract models. Consequently, model checking results cannot be fully transferred to the actual implementations. Formal definition produce a concretization of the abstract models. Consequently, model checking results cannot be fully transferred to the actual implementations. Formal definition
...@@ -25,4 +30,4 @@ can infer systems with parameterized alphabets, state variables and simple opera ...@@ -25,4 +30,4 @@ can infer systems with parameterized alphabets, state variables and simple opera
Despite these limitations, our work provides a compelling application of learning and model checking in a security setting, on a widely used protocol. We hope this lays Despite these limitations, our work provides a compelling application of learning and model checking in a security setting, on a widely used protocol. We hope this lays
some more groundwork for further case studies, as well as fresh advancements in learning techniques. some more groundwork for further case studies, as well as advances in learning techniques.
\ No newline at end of file
\section{Introduction}\label{introduction} \section{Introduction}\label{introduction}
The SSH protocol is used to interact securely with remote machines. Alongside TLS and IPSec, SSH is amongst the most frequently used security suites~\cite{Albrecht2009Plaintext}. Due to its significant user base and sensitive nature, flaws in the protocol or its implementation could have major impact. It therefore comes as no surprise that SSH has attracted scrutiny from the security community. The SSH protocol is used to interact securely with remote machines. Alongside TLS and IPSec, SSH is among the most frequently used network security protocols~\cite{Albrecht2009Plaintext}. Due to its significant user base and sensitive nature, flaws in the protocol or its implementation could have major impact. It therefore comes as no surprise that SSH has attracted scrutiny from the security community.
The protocol specification has been subjected to various security analyses in \cite{Albrecht2009Plaintext,Bellare2004Breaking,Williams2011Analysis,Paterson2010PlaintextDependent}. SSH has been subjected to various security analyses \cite{Albrecht2009Plaintext,Bellare2004Breaking,Williams2011Analysis,Paterson2010PlaintextDependent}.
Formal methods have also been applied. Poll et. al. in \cite{Poll_rigorous_2011} formulate a thorough specification of SSH's Transport layer. They then use this specification to verify OpenSSH by manually inspecting the source code. The same specification is later used by Erik Boss\cite{Boss2012} in model based testing of the implementation. Udrea et al.\cite{Udrea_rule-based_2008} use static analysis to check two implementations of SSH against an extensive set of rules. Formal methods have been applied in analysing implementations of
SSH: Protocol state machines of SSH's transport layer
were used in a manual code review of OpenSSH
\cite{Poll_rigorous_2011} and a formal program verification of a Java
implementation of SSH \cite{PollSchubert07}. These models have also
been used for model based testing \cite{Boss2012}.
Udrea et al.\cite{Udrea_rule-based_2008} use static analysis to check
two implementations of SSH against an extensive set of rules.
%In \cite{Boss2012}, model based testing was used to verify
%Academic researchers have so far focused more on the theoretical aspects than on implementations of the protocol. %Academic researchers have so far focused more on the theoretical aspects than on implementations of the protocol.
In this work, we use classical active automata learning, or simply, model learning, to infer state machines of three SSH implementations, which we then verify by model checking. In this work, we use classical active automata learning, or simply, model learning, to infer state machines of three SSH implementations, which we then verify by model checking.
Model learning has previously been applied to infer state machines of EMV bank cards~\cite{Aarts2013Formal}, electronic passports~\cite{Aarts2010Inference} and hand-held readers for online banking~\cite{Chalupar2014Automated}. More recently, it was used to learn implementations of TCP~\cite{TCP2016} and TLS~\cite{RuiterProtocol}. Model learning's goal is to obtain a state model of a black-box system by providing inputs and observing outputs. The learned state model corresponds to the observed behavior, and can be used in system analysis. Since model learning builds from a finite number of observations, we can never be sure that the learned model is correct. To that end, advanced conformance algorithms are employed\cite{LeeY96}, which yield some confidence that the system inferred is in fact correct. In the context of testing protocols, model learning can be seen as a form of protocol state fuzzing, whereby unexpected inputs are sent to a system under test in the hope of exposing hidden anomalies. In model learning, inputs are sent with no regard to the order imposed by the protocol. Any anomalies are then exposed in the learned model. Model learning has previously been applied to infer state machines of EMV bank cards~\cite{Aarts2013Formal}, electronic passports~\cite{Aarts2010Inference}, hand-held readers for online banking~\cite{Chalupar2014Automated}, and implementations of TCP~\cite{TCP2016} and TLS~\cite{RuiterProtocol}. Model learning aims to obtain a state machine model of a black-box system by providing inputs and observing outputs. The learned state model corresponds to the observed behavior, and can be used in system analysis. Since model learning builds from a finite number of observations, we can never be sure that the learned model is correct. To that end, advanced conformance algorithms are employed\cite{LeeY96}, which yield some confidence that the system inferred is in fact correct. 
In the context of testing protocols, model learning can be seen as a form of protocol state fuzzing, whereby unexpected inputs are sent to a system under test in the hope of exposing hidden anomalies. In model learning, inputs are sent with no regard to the order imposed by the protocol. Any anomalies are then exposed in the learned model.
Having obtained models, we use model checking to automatically verify their conformance to both functional and security properties. The properties are drawn out of the RFC specifications\cite{rfc4251,rfc4252,rfc4253,rfc4254} and formalized in LTL. They are then checked for truth on the learned model using NuSMV~\cite{NuSMV}. Manually verifying these properties would be difficult, as the learned models are reasonably large. Moreover, formalizing properties means we can also better assess and overcome vagueness or under-specification in the RFC standards. Having obtained models, we use model checking to automatically verify their conformance to both functional and security properties. The properties are drawn out of the RFC specifications\cite{rfc4251,rfc4252,rfc4253,rfc4254} and formalized in LTL. They are then checked for truth on the learned model using NuSMV~\cite{NuSMV}. Manually verifying these properties would be difficult, as the learned models are reasonably large. Moreover, formalizing properties means we can also better assess and overcome vagueness or under-specification in the RFC standards.
...@@ -43,4 +49,4 @@ information available from traces without knowledge of the security key. ...@@ -43,4 +49,4 @@ information available from traces without knowledge of the security key.
%Besides security-related logical flaws, inferred state machines can show quirks such as superfluous states. Although these might not be directly exploitable, OpenBSD auditors illustrate why these small bugs should be resolved: ``we are not so much looking for security holes, as we are looking for basic software bugs, and if years later someone discovers the problem used to be a security issue, and we fixed it because it was just a bug, well, all the better''\footnote{\url{http://www.openbsd.org/security.html}}. %Besides security-related logical flaws, inferred state machines can show quirks such as superfluous states. Although these might not be directly exploitable, OpenBSD auditors illustrate why these small bugs should be resolved: ``we are not so much looking for security holes, as we are looking for basic software bugs, and if years later someone discovers the problem used to be a security issue, and we fixed it because it was just a bug, well, all the better''\footnote{\url{http://www.openbsd.org/security.html}}.
% %
%\textit{Organization.} An outline of the SSH protocol will be provided in Section~\ref{ssh}. The experimental setup is discussed in Section~\ref{setup}. The results are subsequently discussed in Section~\ref{results}, after which we conclude in Section~\ref{conclusions}. %\textit{Organization.} An outline of the SSH protocol will be provided in Section~\ref{ssh}. The experimental setup is discussed in Section~\ref{setup}. The results are subsequently discussed in Section~\ref{results}, after which we conclude in Section~\ref{conclusions}.
% %
\ No newline at end of file
...@@ -12,7 +12,9 @@ Certain arrangements had to be made including the setting of timing parameters t ...@@ -12,7 +12,9 @@ Certain arrangements had to be made including the setting of timing parameters t
\begin{figure*} \begin{figure*}
\centering \centering
\includegraphics[scale=0.30]{ssh-server} \includegraphics[scale=0.30]{ssh-server}
\caption{OpenSSH server. States are distributed into 3 clusters, one for each layer, plus a state for when connection was lost. \caption{OpenSSH server. States are collected in 3 clusters,
indicated by the rectangles, where each cluster corresponds to
one of the protocol layers.
We eliminate redundant states and information induced by the {\dmapper}, as well as states present in successful rekey sequences. Wherever rekey was permitted, we replaced the rekey states and transitions by a single \textsl{REKEY SEQUENCE} transition. We also factor out edges common to states within a cluster. We replace common disconnecting edges, by one edge from the cluster to the disconnect state. Common self loop edges are colored, and the actual i/o information only appears on one edge. Transitions with similar start and end states are joined together on the same edge. Transition labels are kept short by regular expressions(UA\_* stands for all inputs starting with UA\_) or by factoring out common start strings. Green edges highlight the happy flow. } We eliminate redundant states and information induced by the {\dmapper}, as well as states present in successful rekey sequences. Wherever rekey was permitted, we replaced the rekey states and transitions by a single \textsl{REKEY SEQUENCE} transition. We also factor out edges common to states within a cluster. We replace common disconnecting edges, by one edge from the cluster to the disconnect state. Common self loop edges are colored, and the actual i/o information only appears on one edge. Transitions with similar start and end states are joined together on the same edge. Transition labels are kept short by regular expressions(UA\_* stands for all inputs starting with UA\_) or by factoring out common start strings. Green edges highlight the happy flow. }
\label{fig:sshserver} \label{fig:sshserver}
\end{figure*} \end{figure*}
...@@ -38,7 +40,7 @@ distinguishing sequence. The exhaustive variant for a set {\dk}, generates tests ...@@ -38,7 +40,7 @@ distinguishing sequence. The exhaustive variant for a set {\dk}, generates tests
namely, that the learned model is correct unless the (unknown) model of the implementation has at least {\dk} more states. The random variant produces tests namely, that the learned model is correct unless the (unknown) model of the implementation has at least {\dk} more states. The random variant produces tests
with randomly generated middle sections. No formal confidence is provided, but past experience shows this to be more effective at finding counterexamples since {\dk} with randomly generated middle sections. No formal confidence is provided, but past experience shows this to be more effective at finding counterexamples since {\dk}
can be set to higher values. We executed a random test suite with {\dk} of 4 comprising 40000 tests for OpenSSH, and 20000 tests for BitVise and DropBear. can be set to higher values. We executed a random test suite with {\dk} of 4 comprising 40000 tests for OpenSSH, and 20000 tests for BitVise and DropBear.
We then ran an exhaustive test suite with {\dk} of 2 for for all implementations. We then ran an exhaustive test suite with {\dk} of 2 for all implementations.
Table~\ref{tab:experiments} describes the exact versions of the systems analyzed together with statistics on learning and testing, namely: Table~\ref{tab:experiments} describes the exact versions of the systems analyzed together with statistics on learning and testing, namely:
...@@ -62,11 +64,13 @@ DropBear v2014.65 & 17 & 2 & 19863 ...@@ -62,11 +64,13 @@ DropBear v2014.65 & 17 & 2 & 19863
\label{tab:experiments} \label{tab:experiments}
\end{table} \end{table}
The large number of states is down to several reasons. First of all, some systems exhibited buffering behavior. In particular, BitVise would queue The large number of states is down to several reasons. First of all, some systems exhibited buffering behavior. In particular, BitVise would queue
responses for higher layer inputs sent during key re-exchange, and would deliver them all at once, after the exchange was done. Re-exchanging keys (rekey-ing) was also responses for higher layer inputs sent during key re-exchange, and would deliver them all at once after the exchange was done. Rekeying was also
a major contributor to the number of states. In states allowing rekey, following the sequence of transitions comprising the rekey should lead back to the starting state. This a major contributor to the number of states. For each state where rekeying
leads to 2 additional rekey states for every state permitting rekey. A considerable number of states were also added due to {\dmapper} generated outputs such as \textsc{ch\_none} or \textsc{ch\_max}, outputs which signal that no channel is open or that the maximum number of channels have been opened. is possible, the sequence of transitions constituting the complete
rekeying process should lead back to that state. This
leads to two additional rekeying states for each state where rekeying
is possible. A considerable number of states were also added due to {\dmapper} generated outputs such as \textsc{ch\_none} or \textsc{ch\_max}, outputs which signal that no channel is open or that the maximum number of channels has been opened.
Figure~\ref{fig:sshserver} shows the model learned for OpenSSH, with various changes applied to improve readability. The happy flow, contoured in green, is fully explored in the model and mostly matches our earlier description of it\footnote{The only exception is in the transport layer, where unlike in our happy flow definition, the server is the first to send the \textsc{newkeys} message. This is also accepted behavior, as the protocol does not specify which side should send \textsc{newkeys} first.}. Also explored is what happens when a rekey sequence is attempted. We notice that rekey is only allowed in states of the Connection layer. Strangely, for these states, rekey is not state preserving, as the generated output on receiving a \textsc{sr\_auth}, \textsc{sr\_conn} or \textsc{kex30} changes from \textsc{unimpl} to \textsc{no\_resp}. This leads to two sub-clusters of states, one before the first rekey, the other afterward. In all other states, the first step of a rekey (\textsc{kexinit}) yields (\textsc{unimpl}), while the last step (\textsc{newkeys}) causes the system to disconnect. Figure~\ref{fig:sshserver} shows the model learned for OpenSSH, with various changes applied to improve readability. The happy flow, contoured in green, is fully explored in the model and mostly matches our earlier description of it\footnote{The only exception is in the transport layer, where unlike in our happy flow definition, the server is the first to send the \textsc{newkeys} message. This is also accepted behavior, as the protocol does not specify which side should send \textsc{newkeys} first.}. Also explored is what happens when a rekey sequence is attempted. We notice that rekey is only allowed in states of the Connection layer. Strangely, for these states, rekey is not state preserving, as the generated output on receiving a \textsc{sr\_auth}, \textsc{sr\_conn} or \textsc{kex30} changes from \textsc{unimpl} to \textsc{no\_resp}. 
This leads to two sub-clusters of states, one before the first rekey, the other afterward. In all other states, the first step of a rekey (\textsc{kexinit}) yields (\textsc{unimpl}), while the last step (\textsc{newkeys}) causes the system to disconnect.
...@@ -26,10 +26,13 @@ input enabled and deterministic. ...@@ -26,10 +26,13 @@ input enabled and deterministic.
%\end{figure} %\end{figure}
\subsection{MAT Framework} \label{ssec:mat} \subsection{MAT Framework} \label{ssec:mat}
The most efficient algorithms for model learning all follow The most efficient algorithms for model learning all follow
the pattern of a \emph{minimally adequate teacher (MAT)} as proposed by Angluin~\cite{Angluin1987Learning}. the pattern of a \emph{minimally adequate teacher (MAT)} as proposed by Angluin~\cite{Angluin1987Learning}.
In the MAT framework, learning is viewed as a game in which a \emph{learner} has to infer an unknown automaton by asking queries to a teacher. The teacher knows the automaton, which in our setting is a deterministic Mealy machine $\M$. Here learning is viewed as a game in which a \emph{learner} has to infer an unknown automaton by asking queries to a teacher. The teacher knows the automaton, which in our setting is a deterministic Mealy machine $\M$,
Initially, the {\dlearner} only knows the inputs $I$ and outputs $O$ of $\M$. also called the System Under Learning ({\dsut}).
Initially, the {\dlearner} only knows the input alphabet $I$ and
output alphabet $O$ of $\M$.
The task of the {\dlearner} is to learn $\M$ through two types of queries: The task of the {\dlearner} is to learn $\M$ through two types of queries:
\begin{itemize} \begin{itemize}
\item \item
...@@ -41,9 +44,15 @@ whether $\CH \approx \M$. The teacher answers \emph{yes} if this is the case. Ot ...@@ -41,9 +44,15 @@ whether $\CH \approx \M$. The teacher answers \emph{yes} if this is the case. Ot
\emph{no} and supplies a \emph{counterexample}, which is a sequence $\sigma \in I^{\ast}$ that triggers \emph{no} and supplies a \emph{counterexample}, which is a sequence $\sigma \in I^{\ast}$ that triggers
a different output sequence for both Mealy machines, that is, $A_{\CH}(\sigma) \neq A_{\M}(\sigma)$. a different output sequence for both Mealy machines, that is, $A_{\CH}(\sigma) \neq A_{\M}(\sigma)$.
\end{itemize} \end{itemize}
%
If $\M$ is treated as a black box, the equivalence query can only be
approximated by a \emph{test query}, which uses conformance testing
\cite{LeeY96} -- more specifically, model-based testing --
to look for a counterexample with a finite number of queries. Note
that this cannot exclude the possibility that there is more behaviour
that has not been discovered. For a recent overview of model learning
algorithms for this setting see \cite{Isberner2015}.
Model learning algorithms have been developed developed for learning deterministic Mealy machines using
a finite number of queries. We point to \cite{Isberner2015} for a recent overview. These algorithms are leveraged
in applications where one wants to learn a model of a black-box reactive system, or System Under Learning ({\dsut}). The teacher typically in applications where one wants to learn a model of a black-box reactive system, or System Under Learning ({\dsut}). The teacher typically
consists of the {\dsut}, which answers membership queries, and a conformance consists of the {\dsut}, which answers membership queries, and a conformance
testing tool \cite{LeeY96} that approximates equivalence queries using a set testing tool \cite{LeeY96} that approximates equivalence queries using a set
...@@ -101,4 +110,4 @@ concrete inputs to abstract inputs and concrete outputs to abstract outputs. For ...@@ -101,4 +110,4 @@ concrete inputs to abstract inputs and concrete outputs to abstract outputs. For
%\item The transport layer protocol. This creates the basis for communication between server and client, providing a key exchange protocol and server authentication. The key exchange protocol is performed through three roundtrips. During the first, both client and server send a KEXINIT message. Then, the client sends a KEX30 message, the server responds with a KEX31 message. Finally, both parties send a NEWKEYS message, which indicates that the keys sent in the second step can be used. %\item The transport layer protocol. This creates the basis for communication between server and client, providing a key exchange protocol and server authentication. The key exchange protocol is performed through three roundtrips. During the first, both client and server send a KEXINIT message. Then, the client sends a KEX30 message, the server responds with a KEX31 message. Finally, both parties send a NEWKEYS message, which indicates that the keys sent in the second step can be used.
%\item The user authentication protocol. This component is used to authenticate a client to the server, for example, through a username and password combination, or through SSH-keys. %\item The user authentication protocol. This component is used to authenticate a client to the server, for example, through a username and password combination, or through SSH-keys.
%\item The connection protocol. This is used to provide different services to the connected client, it can thus multiplex the encrypted channel into different channels. The provided services can be services like file transfer or a remote terminal. Typical messages are requests for opening or closing channels, or requests for earlier named services. %\item The connection protocol. This is used to provide different services to the connected client, it can thus multiplex the encrypted channel into different channels. The provided services can be services like file transfer or a remote terminal. Typical messages are requests for opening or closing channels, or requests for earlier named services.
%\end{itemize} %\end{itemize}
\ No newline at end of file
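Illustrative note on the MAT hunk above: the added paragraph says that, for a black-box system, the equivalence query can only be approximated by a test query using a finite number of tests. The following is a minimal sketch of that idea, not the tooling used in the paper. The names `MealyMachine` and `approximate_equivalence`, the toy machines, and the input symbols `a` and `b` are assumptions introduced only for illustration. The sketch uses plain random testing rather than the conformance testing algorithms the text cites.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

State = str
# (state, input) -> (next state, output); hypothetical encoding for this sketch
Transitions = Dict[Tuple[State, str], Tuple[State, str]]


class MealyMachine:
    """Deterministic, input-enabled Mealy machine (illustrative only)."""

    def __init__(self, initial: State, transitions: Transitions) -> None:
        self.initial = initial
        self.transitions = transitions

    def run(self, inputs: Sequence[str]) -> List[str]:
        """Return the output sequence produced for an input sequence."""
        state = self.initial
        outputs: List[str] = []
        for symbol in inputs:
            state, out = self.transitions[(state, symbol)]
            outputs.append(out)
        return outputs


def approximate_equivalence(
    hypothesis: MealyMachine,
    sul: MealyMachine,
    alphabet: Sequence[str],
    num_tests: int = 1000,
    max_len: int = 6,
    seed: int = 0,
) -> Optional[List[str]]:
    """Approximate an equivalence query with random tests.

    Returns a counterexample (an input word on which the outputs differ),
    or None if no difference was found. None does NOT prove equivalence:
    behaviour outside the tested words may remain undiscovered.
    """
    rng = random.Random(seed)
    for _ in range(num_tests):
        word = [rng.choice(alphabet) for _ in range(rng.randint(1, max_len))]
        if hypothesis.run(word) != sul.run(word):
            return word
    return None


# Toy system under learning: the second "a" produces "z" instead of "x".
sul = MealyMachine("s0", {
    ("s0", "a"): ("s1", "x"), ("s0", "b"): ("s0", "y"),
    ("s1", "a"): ("s1", "z"), ("s1", "b"): ("s1", "y"),
})
# One-state hypothesis that has not yet discovered state s1.
hypothesis = MealyMachine("h0", {
    ("h0", "a"): ("h0", "x"), ("h0", "b"): ("h0", "y"),
})

counterexample = approximate_equivalence(hypothesis, sul, ["a", "b"])
```

In a learning loop, a returned counterexample would be handed back to the learner to refine the hypothesis. A result of `None` only gives some confidence in the model, which matches the caveat in the added paragraph.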
...@@ -29,8 +29,8 @@ This function updates the output and state variables for a given valuation of th ...@@ -29,8 +29,8 @@ This function updates the output and state variables for a given valuation of th
%} %}
%\lstset{showspaces=true} %\lstset{showspaces=true}
\begin{figure}[h]
\begin{figure}[h]
\centering \centering
%\begin{subfigure} %\begin{subfigure}
\begin{tikzpicture}[>=stealth',shorten >=1pt,auto,node distance=2.8cm] \begin{tikzpicture}[>=stealth',shorten >=1pt,auto,node distance=2.8cm]
...@@ -70,6 +70,7 @@ This function updates the output and state variables for a given valuation of th ...@@ -70,6 +70,7 @@ This function updates the output and state variables for a given valuation of th
\label{fig:nusmvex} \label{fig:nusmvex}
\end{figure} \end{figure}
The remainder of this section defines the properties we formalized and verified. We group these properties into four categories: The remainder of this section defines the properties we formalized and verified. We group these properties into four categories:
\begin{enumerate} \begin{enumerate}
...@@ -83,7 +84,7 @@ A key note is that properties are checked not on the actual concrete model of th ...@@ -83,7 +84,7 @@ A key note is that properties are checked not on the actual concrete model of th
\subsection{Basic characterizing properties} \subsection{Basic characterizing properties}
%cannot be translated. %Though in practical terms, these results are still translatable, in particular for cases where properties are not met. %cannot be translated. %Though in practical terms, these results are still translatable, in particular for cases where properties are not met.
In our setting, once the connection is lost, it can no longer be recovered and the {\dmapper} will itself respond with \textsc{no\_conn} to any subsequent non-Connection layer inputs sent by the {\dlearner}. This behavior is described by Property~\ref{prop:noconn}, where \emph{isConnectionInput} is a predicate which only holds if the input supplied is a Connection layer input. The reason we exclude connection inputs is due to a mapper characteristic we touched on in Section~\ref{sec:result}. The {\dmapper} maintains a buffer of opened channels and limits its size to 1. From the perspective of the {\dmapper}, a channel is open, and thus added to the buffer, whenever \textsc{ch\_open} is received from the learner, regardless if a channel was actually opened on the {\dsut}. If an attempt to open an additional channel is made, the {\dmapper} itself responds by \textsc{ch\_max} without querying the {\dsut}. Conversely, if there is no channel open (the buffer is empty) and an input operating on a channel is sent, the {\dmapper} responds by \textsc{ch\_none}, again, without querying the {\dsut}. Additionally, a channel opened on the {\dmapper} is closed and removed from the buffer on a {ch\_close} from the {\dlearner}, with a corresponding SSH CHANNEL CLOSE message being sent to the {\dsut}. We use Property~\label{prop:channel} to describe this mapper induced behavior. In our setting, once the connection is lost, it can no longer be recovered and the {\dmapper} will itself respond with \textsc{no\_conn} to any subsequent non-Connection layer inputs sent by the {\dlearner}. This behavior is described by Property~\ref{prop:noconn}, where \emph{isConnectionInput} is a predicate which only holds if the input supplied is a Connection layer input. The reason we exclude connection inputs is due to a mapper characteristic we touched on in Section~\ref{sec:result}. The {\dmapper} maintains a buffer of opened channels and limits its size to 1. 
From the perspective of the {\dmapper}, a channel is open, and thus added to the buffer, whenever \textsc{ch\_open} is received from the {\dlearner}, regardless of whether a channel was actually opened on the {\dsut}. If an attempt to open an additional channel is made, the {\dmapper} itself responds with \textsc{ch\_max} without querying the {\dsut}. Conversely, if no channel is open (the buffer is empty) and an input operating on a channel is sent, the {\dmapper} responds with \textsc{ch\_none}, again without querying the {\dsut}. Additionally, a channel opened on the {\dmapper} is closed and removed from the buffer on a \textsc{ch\_close} from the {\dlearner}, with a corresponding SSH CHANNEL CLOSE message being sent to the {\dsut}. We use Property~\ref{prop:channel} to describe this mapper-induced behavior.
\begin{property}%[h] \begin{property}%[h]
......
...@@ -29,10 +29,10 @@ ...@@ -29,10 +29,10 @@
@inproceedings{SMJV15, @inproceedings{SMJV15,
author = {W. Smeenk and J. Moerman and D.N. Jansen and F.W. Vaandrager}, author = {W. Smeenk and J. Moerman and D.N. Jansen and F.W. Vaandrager},
booktitle = {Proceedings 17th International Conference on Formal Engineering Methods (ICFEM 2015), Paris, 3--6 November 2015}, booktitle = {Proceedings 17th International Conference on Formal Engineering Methods (ICFEM 2015), Paris, 3--6 November 2015},
series = {Lecture Notes in Computer Science}, series = {LNCS},
volume = 9407, volume = 9407,
pages = {1--17}, pages = {1--17},
publisher = {Springer-Verlag}, publisher = {Springer},
year = 2015, year = 2015,
editor = {M. Butler and S. Conchon and F. Zaidi}, editor = {M. Butler and S. Conchon and F. Zaidi},
title = {Applying Automata Learning to Embedded Control Software} title = {Applying Automata Learning to Embedded Control Software}
...@@ -40,22 +40,12 @@ ...@@ -40,22 +40,12 @@
@MastersThesis{Boss2012, @MastersThesis{Boss2012,
title={Evaluating implementations of SSH by means of model-based testing}, title={Evaluating implementations of SSH by means of model-based testing},
author={Lenaerts, T}, author={Lenaerts, T.},
year={2012}, year={2012},
document_type = {Bachelor's Thesis}, document_type = {Bachelor's Thesis},
type = {Bachelor's Thesis}, type = {Bachelor's Thesis},
howpublished = {Online \url{https://pdfs.semanticscholar.org/8841/47071555f50b614c6af640cea5152fee10f2.pdf}}, howpublished = {Online \url{https://pdfs.semanticscholar.org/8841/47071555f50b614c6af640cea5152fee10f2.pdf}},
journal = {Radboud University} school = {Radboud University}
}
@article{Poll_rigorous_2011,
title = {Rigorous specifications of the {SSH} {Transport} {Layer}},
url = {http://www.cs.kun.nl/~erikpoll/publications/ssh.pdf},
urldate = {2017-02-13},
journal = {Radboud University Nijmegen, Tech. Rep. ICIS–R11004},
author = {Poll, Erik and Schubert, Aleksy},
year = {2011},
keywords = {ssh},
} }
@article{Udrea_rule-based_2008, @article{Udrea_rule-based_2008,
...@@ -83,7 +73,7 @@ ...@@ -83,7 +73,7 @@
document_type = {Bachelor's Thesis}, document_type = {Bachelor's Thesis},
type = {Bachelor's Thesis}, type = {Bachelor's Thesis},
howpublished = {Online}, howpublished = {Online},
journal = {Radboud University} school = {Radboud University}
} }
...@@ -96,10 +86,10 @@ ...@@ -96,10 +86,10 @@
@MastersThesis{Verleg2016, @MastersThesis{Verleg2016,
title={Inferring SSH state machines using protocol state fuzzing}, title={Inferring SSH state machines using protocol state fuzzing},
author={Verleg, P}, author={Verleg, P.},
year={2016}, year={2016},
howpublished = {Online}, howpublished = {Online},
journal = {Radboud University} school = {Radboud University}
} }
@article{rfc760, @article{rfc760,
...@@ -146,8 +136,8 @@ ...@@ -146,8 +136,8 @@
pages = {188--204}, pages = {188--204},
posted-at = {2015-12-07 11:15:39}, posted-at = {2015-12-07 11:15:39},
priority = {2}, priority = {2},
publisher = {Springer Berlin Heidelberg}, publisher = {Springer},
series = {Lecture Notes in Computer Science}, series = {LNCS},
title = {Generating Models of {Infinite-State} Communication Protocols Using Regular Inference with Abstraction}, title = {Generating Models of {Infinite-State} Communication Protocols Using Regular Inference with Abstraction},
url = {http://dx.doi.org/10.1007/978-3-642-16573-3_14}, url = {http://dx.doi.org/10.1007/978-3-642-16573-3_14},
volume = {6435}, volume = {6435},
...@@ -195,7 +185,7 @@ machine learning algorithms}, ...@@ -195,7 +185,7 @@ machine learning algorithms},
issn= {0925-9856}, issn= {0925-9856},
doi= {10.1007/s10703-014-0216-x}, doi= {10.1007/s10703-014-0216-x},
url= {http://dx.doi.org/10.1007/s10703-014-0216-x}, url= {http://dx.doi.org/10.1007/s10703-014-0216-x},
publisher= {Springer US}, publisher= {Springer},
keywords= {Active automata learning; Mealy machines; Abstraction techniques; Communication protocols; Session initiation protocol; Transmission control protocol}, keywords= {Active automata learning; Mealy machines; Abstraction techniques; Communication protocols; Session initiation protocol; Transmission control protocol},
volume = 46, volume = 46,
number = 1, number = 1,
...@@ -233,7 +223,7 @@ machine learning algorithms}, ...@@ -233,7 +223,7 @@ machine learning algorithms},
booktitle={International Colloquium on Theoretical Aspects of Computing}, booktitle={International Colloquium on Theoretical Aspects of Computing},
pages={165--183}, pages={165--183},
year={2015}, year={2015},
organization={Springer International Publishing} organization={Springer}
} }
@inproceedings{ralib2015, @inproceedings{ralib2015,
...@@ -248,10 +238,11 @@ machine learning algorithms}, ...@@ -248,10 +238,11 @@ machine learning algorithms},
file = {2015-RALib A LearnLib extension for inferring EFSMs.pdf:C\:\\Users\\Paul\\AppData\\Roaming\\Zotero\\Zotero\\Profiles\\tt1zn5x1.default\\zotero\\storage\\TQUR4TS7\\2015-RALib A LearnLib extension for inferring EFSMs.pdf:application/pdf} file = {2015-RALib A LearnLib extension for inferring EFSMs.pdf:C\:\\Users\\Paul\\AppData\\Roaming\\Zotero\\Zotero\\Profiles\\tt1zn5x1.default\\zotero\\storage\\TQUR4TS7\\2015-RALib A LearnLib extension for inferring EFSMs.pdf:application/pdf}
} }
@article{Chalupar2014Automated, @inproceedings{Chalupar2014Automated,
author = {Chalupar, Georg and Peherstorfer, Stefan and Poll, Erik and de Ruiter, Joeri}, author = {Chalupar, Georg and Peherstorfer, Stefan and Poll, Erik and de Ruiter, Joeri},
citeulike-article-id = {13837720}, citeulike-article-id = {13837720},
journal = {WOOT'14 Proceedings of the 8th USENIX conference on Offensive Technologies}, booktitle = {Proceedings of the 8th USENIX Workshop on
Offensive Technologies (WOOT'14)},
pages = {1--10}, pages = {1--10},
posted-at = {2015-11-13 14:58:54}, posted-at = {2015-11-13 14:58:54},
priority = {2}, priority = {2},
...@@ -270,8 +261,8 @@ machine learning algorithms}, ...@@ -270,8 +261,8 @@ machine learning algorithms},
pages = {673--686}, pages = {673--686},
posted-at = {2015-11-13 14:56:16}, posted-at = {2015-11-13 14:56:16},
priority = {2}, priority = {2},
publisher = {Springer Berlin Heidelberg}, publisher = {Springer},
series = {Lecture Notes in Computer Science}, series = {LNCS},
title = {Inference and Abstraction of the Biometric Passport}, title = {Inference and Abstraction of the Biometric Passport},
url = {http://dx.doi.org/10.1007/978-3-642-16558-0_54}, url = {http://dx.doi.org/10.1007/978-3-642-16558-0_54},
volume = {6415}, volume = {6415},
...@@ -370,8 +361,8 @@ machine learning algorithms}, ...@@ -370,8 +361,8 @@ machine learning algorithms},
pages = {345--361}, pages = {345--361},
posted-at = {2015-11-13 12:20:10}, posted-at = {2015-11-13 12:20:10},
priority = {2}, priority = {2},
publisher = {Springer Berlin Heidelberg}, publisher = {Springer},
series = {Lecture Notes in Computer Science}, series = {LNCS},
title = {{Plaintext-Dependent} Decryption: A Formal Security Treatment of {SSH}-{CTR}}, title = {{Plaintext-Dependent} Decryption: A Formal Security Treatment of {SSH}-{CTR}},
url = {http://dx.doi.org/10.1007/978-3-642-13190-5_18}, url = {http://dx.doi.org/10.1007/978-3-642-13190-5_18},
volume = {6110}, volume = {6110},
...@@ -389,8 +380,8 @@ machine learning algorithms}, ...@@ -389,8 +380,8 @@ machine learning algorithms},
pages = {356--374}, pages = {356--374},
posted-at = {2015-11-13 12:16:22}, posted-at = {2015-11-13 12:16:22},
priority = {2}, priority = {2},
publisher = {Springer Berlin Heidelberg}, publisher = {Springer},
series = {Lecture Notes in Computer Science}, series = {LNCS},
title = {Analysis of the {SSH} Key Exchange Protocol}, title = {Analysis of the {SSH} Key Exchange Protocol},
url = {http://dx.doi.org/10.1007/978-3-642-25516-8_22}, url = {http://dx.doi.org/10.1007/978-3-642-25516-8_22},
volume = {7089}, volume = {7089},
...@@ -454,7 +445,6 @@ machine learning algorithms}, ...@@ -454,7 +445,6 @@ machine learning algorithms},
citeulike-linkout-0 = {http://dx.doi.org/10.1109/icstw.2013.60}, citeulike-linkout-0 = {http://dx.doi.org/10.1109/icstw.2013.60},
citeulike-linkout-1 = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6571671}, citeulike-linkout-1 = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6571671},
doi = {10.1109/icstw.2013.60}, doi = {10.1109/icstw.2013.60},
institution = {Inst. for Comput. \& Inf. Sci., Radboud Univ. Nijmegen, Nijmegen, Netherlands},
isbn = {978-1-4799-1324-4}, isbn = {978-1-4799-1324-4},
month = mar, month = mar,
pages = {461--468}, pages = {461--468},
...@@ -475,7 +465,7 @@ machine learning algorithms}, ...@@ -475,7 +465,7 @@ machine learning algorithms},
title="Combining Model Learning and Model Checking to Analyze TCP Implementations", title="Combining Model Learning and Model Checking to Analyze TCP Implementations",
bookTitle="Computer Aided Verification: 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part II", bookTitle="Computer Aided Verification: 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part II",
year="2016", year="2016",
publisher="Springer International Publishing", publisher="Springer",
address="Cham", address="Cham",
pages="454--471", pages="454--471",
isbn="978-3-319-41540-6", isbn="978-3-319-41540-6",
...@@ -507,7 +497,6 @@ machine learning algorithms}, ...@@ -507,7 +497,6 @@ machine learning algorithms},
priority = {2}, priority = {2},
publisher = {USENIX Association}, publisher = {USENIX Association},
title = {{Not-Quite}-{So-Broken} {TLS}: Lessons in {Re-Engineering} a Security Protocol Specification and Implementation}, title = {{Not-Quite}-{So-Broken} {TLS}: Lessons in {Re-Engineering} a Security Protocol Specification and Implementation},
url = {https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/kaloper-mersinjak},
year = {2015} year = {2015}
} }
...@@ -523,12 +512,10 @@ machine learning algorithms}, ...@@ -523,12 +512,10 @@ machine learning algorithms},
priority = {2}, priority = {2},
publisher = {USENIX Association}, publisher = {USENIX Association},
title = {Protocol State Fuzzing of {TLS} Implementations}, title = {Protocol State Fuzzing of {TLS} Implementations},
url = {https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/de-ruiter},
year = {2015} year = {2015}
} }
@unpublished{Poll2011Rigorous, @techreport{Poll_rigorous_2011,
abstract = {Abstract. This document presents (semi-)formal specifications of the security protocol {SSH}, more specifically the transport layer protocol, and describe a source code review of {OpenSSH}, the leading implementation of {SSH}, using these specifications. Our specifications, in the form of finite state machines, are at a different level of abstraction that the typical formal descriptions used to study security protocols. Our motivation is to understand actual implementations of {SSH}, so we try to capture some of the details from the official (informal) specification that are irrelevant to the security of the abstract protocol, but which do complicate the implementation. Our specifications should be useful to anyone trying to understand or implement {SSH}. First versions of our specifications were developed for the formal verification of a Java implementation of {SSH} [17]. 1},
author = {Poll, Erik and Schubert, Aleksy}, author = {Poll, Erik and Schubert, Aleksy},
citeulike-article-id = {13778664}, citeulike-article-id = {13778664},
citeulike-linkout-0 = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.194.1815}, citeulike-linkout-0 = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.194.1815},
...@@ -536,7 +523,8 @@ machine learning algorithms}, ...@@ -536,7 +523,8 @@ machine learning algorithms},
priority = {2}, priority = {2},
title = {Rigorous specifications of the {SSH} Transport Layer}, title = {Rigorous specifications of the {SSH} Transport Layer},
url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.194.1815}, url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.194.1815},
year = {Radboud University, 2011} year = {2011},
institution = {Radboud University}
} }
@article{Pasareanu2008Learning, @article{Pasareanu2008Learning,
...@@ -558,7 +546,7 @@ machine learning algorithms}, ...@@ -558,7 +546,7 @@ machine learning algorithms},
pages = {175--205}, pages = {175--205},
posted-at = {2015-09-30 07:58:08}, posted-at = {2015-09-30 07:58:08},
priority = {2}, priority = {2},
publisher = {Springer US}, publisher = {Springer},
title = {Learning to divide and conquer: applying the {L}* algorithm to automate assume-guarantee reasoning}, title = {Learning to divide and conquer: applying the {L}* algorithm to automate assume-guarantee reasoning},
url = {http://dx.doi.org/10.1007/s10703-008-0049-6}, url = {http://dx.doi.org/10.1007/s10703-008-0049-6},
volume = {32}, volume = {32},
...@@ -576,8 +564,8 @@ machine learning algorithms}, ...@@ -576,8 +564,8 @@ machine learning algorithms},
pages = {1--18}, pages = {1--18},
posted-at = {2015-09-30 07:56:06}, posted-at = {2015-09-30 07:56:06},
priority = {2}, priority = {2},
publisher = {Springer Berlin Heidelberg}, publisher = {Springer},
series = {Lecture Notes in Computer Science}, series = {LNCS},
title = {Inferring Protocol State Machine from Network Traces: A Probabilistic Approach}, title = {Inferring Protocol State Machine from Network Traces: A Probabilistic Approach},
url = {http://dx.doi.org/10.1007/978-3-642-21554-4_1}, url = {http://dx.doi.org/10.1007/978-3-642-21554-4_1},
volume = {6715}, volume = {6715},
...@@ -639,10 +627,18 @@ machine learning algorithms}, ...@@ -639,10 +627,18 @@ machine learning algorithms},
pages = {75--90}, pages = {75--90},
posted-at = {2015-09-30 07:41:58}, posted-at = {2015-09-30 07:41:58},
priority = {2}, priority = {2},
publisher = {Springer US}, publisher = {Springer},
series = {IFIP — The International Federation for Information Processing}, series = {IFIP — The International Federation for Information Processing},
title = {Automatic executable test case generation for extended finite state machine protocols}, title = {Automatic executable test case generation for extended finite state machine protocols},
url = {http://dx.doi.org/10.1007/978-0-387-35198-8_6}, url = {http://dx.doi.org/10.1007/978-0-387-35198-8_6},
year = {1997} year = {1997}
} }
@inproceedings{PollSchubert07,
author = {E. Poll and A. Schubert},
title = {Verifying an implementation of {SSH}},
booktitle = {WITS'07},
year = {2007},
pages = {164--177},
}
...@@ -2,19 +2,17 @@ ...@@ -2,19 +2,17 @@
The Secure Shell Protocol (or SSH) is a protocol used for secure remote login and other secure network services over an insecure network. It is an application layer protocol running on top of TCP, which provides reliable data transfer, but does not provide any form of connection security. The initial version of SSH was superseded by a second version (SSHv2), as the former was found to contain design flaws~\cite{FutoranskyAttack} which could not be fixed without losing backwards compatibility. This work focuses on SSHv2. The Secure Shell Protocol (or SSH) is a protocol used for secure remote login and other secure network services over an insecure network. It is an application layer protocol running on top of TCP, which provides reliable data transfer, but does not provide any form of connection security. The initial version of SSH was superseded by a second version (SSHv2), as the former was found to contain design flaws~\cite{FutoranskyAttack} which could not be fixed without losing backwards compatibility. This work focuses on SSHv2.
SSHv2 follows a client-server paradigm consisting of three components (Figure~\ref{fig:sshcomponents}): SSHv2 follows a client-server paradigm. The protocol consists of three layers (Figure~\ref{fig:sshcomponents}):
\begin{itemize} \begin{enumerate}
\item The \textit{transport layer protocol} (RFC 4253~\cite{rfc4253}) forms the basis for any communication between a client and a server. It provides confidentiality, integrity and server authentication as well as optional compression. \item The \textit{transport layer protocol} (RFC 4253~\cite{rfc4253}) forms the basis for any communication between a client and a server. It provides confidentiality, integrity and server authentication as well as optional compression.