introduction.tex 6.08 KB
Newer Older

The SSH protocol is used interact securely with remote machines. Alongside TLS and IPSec, SSH is amongst the most frequently used security suites~\cite{Albrecht2009Plaintext}. Due to its significant user base and sensitive nature, flaws in the protocol or its implementation could have major impact. It therefore comes as no surprise that SSH has attracted scrutiny from the security community. Problems with information leakage via unencrypted MACs were identified~\cite{Bellare2004Breaking}, as well as possibilities for plain text recovery when a block cipher in cipher block chaining mode was used~\cite{Albrecht2009Plaintext}. Other analysis found no serious issues~\cite{Williams2011Analysis}\cite{Paterson2010PlaintextDependent}. Academic researchers have so far focused more on the theoretical aspects than on implementations of the protocol. 

In this work, we use classical active automata learning, which we refer to as model learning, to infer state machines of 2 SSH implementations.  Model learning  has previously been applied to infer state machines of EMV bank cards~\cite{Aarts2013Formal}, electronic passports~\cite{Aarts2010Inference} and hand-held readers for online banking~\cite{Chalupar2014Automated}. More recently, it was used to learn implementations of TCP~\cite{TCP2016} and TLS~\cite{RuiterProtocol}. Model learning's goal is obtaining a state model of a black-box system by providing inputs and observing outputs. The learned state model corresponds to the observed behavior, and can be used in system analysis. Since model learning builds from a finite number of observations, we can never be sure that the learned model is correct. To that end, advanced conformance algorithms are employed, which yield some confidence that the system inferred is in fact correct.  In the context of testing protocols, model learning can be seen as a form of protocol state fuzzing, whereby unexpected inputs are sent to a system under test in the hope of exposing hidden anomalies. In model learning, inputs are sent with no regard to the order imposed by the protocol. Inputs can therefore be sent disorderly. Any anomalies are then exposed in the learned model.

Paul Fiterau Brostean's avatar
Paul Fiterau Brostean committed
Having obtained models, we use model checking to automatically verify their conformance to security properties. Such verification would be difficult manually, as the learned models are large. Moreover, the very satisfaction of properties can be used to refine the learned model. In cases where the learned model does not satisfy a security property, the proving sequence of inputs can either signal non-conformance of the system, or can be used to further refine the learned model. The security properties are drawn out of the RFC specifications\cite{rfc4251}\cite{rfc4252}\cite{rfc4253}\cite{rfc4254} and formalized in LTL. They are then checked for truth on the learned model using NuSMV~\cite{NuSMV}.

Our work is borne out of 2 recent theses \cite{Verleg2016}, \cite{Toon2016}.  It most closely resembles work on TCP~\cite{TCP2016}, where they also combine classical learning and model checking to analyze various TCP implementations, with our work being more focused on security properties. Other case studies rely on 
manual analysis of the learned models ~\cite{Aarts2013Formal},\cite{Chalupar2014Automated}, \cite{RuiterProtocol}. Our work differs, 
Paul Fiterau Brostean's avatar
Paul Fiterau Brostean committed
since we use a model checker to automatically check specifications. %Moreover, the model checker is also used as means of testing the learned model. 
Inference of protocols has also been done from observing network traffic ~\cite{Wang2011Inferring}. Such inference is limited to the traces that
occur over a network. Such inference is further hampered if the analyzed protocols communicate over encrypted channels, as this severely limits 
information available from traces without knowledge of the security key. 


%Model checkers provide for an automatic way of verifying properties. 
%Manual verification would be difficult manually, as the state machines are large.

%State machines inferred are large and hard to check manually for properties. Moreover, the very satisfaction of these properties by the learned model can be used to generate tests. Model checkers provide for an automated way to check conformance of learned models. 
%In this thesis, we will be using protocol state fuzzing to extract an implementations state machine. All fuzzing algorithms rely on the idea of sending unexpected input data to a system under test (SUT) in the hope that this triggers anomalies. Using protocol state fuzzing, however, we will fuzz on the \textit{order} of otherwise correctly-formed messages. 
%This technique has previously been applied to infer state machines of EMV bank cards~\cite{Aarts2013Formal}, electronic passports~\cite{Aarts2010Inference} and hand-held readers for online banking~\cite{Chalupar2014Automated}. Furthermore, implementations of TCP~\cite{Janssen2015Learning} and TLS~\cite{RuiterProtocol} have state-fuzzed. As in these works, we implement state fuzzing
%State machines inferred are large and hard to check manually. 
%Once a state machine has been inferred, security-related logical flaws are usually easily spotted by an auditor with some knowledge about the protocol~\cite{RuiterProtocol}. An example of a logical flaw is exchanging user credentials before an encrypted connection has been established. 
%Besides security-related logical flaws, inferred state machines can show quirks such as superfluous states. Although these might not be directly exploitable, OpenBSD auditors illustrate why these small bugs should be resolved: ``we are not so much looking for security holes, as we are looking for basic software bugs, and if years later someone discovers the problem used to be a security issue, and we fixed it because it was just a bug, well, all the better''\footnote{\url{}}.
%\textit{Organization.} An outline of the SSH protocol will be provided in Section~\ref{ssh}. The experimental setup is discussed in Section~\ref{setup}. The results are subsequently discussed in Section~\ref{results}, after which we conclude in Section~\ref{conclusions}.