
The SSH protocol is used to interact securely with remote machines. Alongside TLS and IPSec, SSH is amongst the most frequently used security suites~\cite{Albrecht2009Plaintext}. Due to its significant user base and sensitive nature, flaws in the protocol or its implementations could have a major impact. It therefore comes as no surprise that SSH has attracted scrutiny from the security community. 
The protocol specification has been subjected to various security analyses in \cite{Albrecht2009Plaintext,Bellare2004Breaking,Williams2011Analysis,Paterson2010PlaintextDependent}. 
Formal methods have also been applied. Poll et al.~\cite{Poll_rigorous_2011} formulate a thorough specification of SSH's Transport layer. They then use this specification to verify OpenSSH by manually inspecting the source code. The same specification is later used by Boss~\cite{Boss2012} in model-based testing of the implementation. Udrea et al.~\cite{Udrea_rule-based_2008} use static analysis to check two implementations of SSH against an extensive set of rules.


In this work, we use classical active automata learning, or simply model learning, to infer state machines of three SSH implementations, which we then verify by model checking.
Model learning has previously been applied to infer state machines of EMV bank cards~\cite{Aarts2013Formal}, electronic passports~\cite{Aarts2010Inference} and hand-held readers for online banking~\cite{Chalupar2014Automated}. More recently, it was used to learn implementations of TCP~\cite{TCP2016} and TLS~\cite{RuiterProtocol}. The goal of model learning is to obtain a state model of a black-box system by providing inputs and observing outputs. The learned state model corresponds to the observed behavior and can be used in system analysis. Since model learning builds on a finite number of observations, we can never be sure that the learned model is correct. To address this, advanced conformance testing algorithms are employed~\cite{LeeY96}, which yield some confidence that the learned model is in fact correct. In the context of testing protocols, model learning can be seen as a form of protocol state fuzzing, whereby unexpected inputs are sent to a system under test in the hope of exposing hidden anomalies. In model learning, inputs are sent with no regard to the order imposed by the protocol; any anomalies are then exposed in the learned model.
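As a toy illustration of the query interface a learner uses, consider the sketch below. It is not part of our setup: the three-message handshake, the message names, and the \texttt{UNIMPLEMENTED} response are hypothetical simplifications. The learner only resets the black box and runs input sequences, including sequences that violate the protocol's intended order.

```python
# Toy illustration of "output queries" in active model learning.
# The handshake, states and message names below are hypothetical
# simplifications, not taken from any real SSH implementation.

class ToyServer:
    """A hypothetical 3-state handshake: KEXINIT must precede AUTH,
    and AUTH must precede DATA."""
    def __init__(self):
        self.state = "INIT"

    def step(self, msg):
        if self.state == "INIT" and msg == "KEXINIT":
            self.state = "KEXED"
            return "KEXINIT_ACK"
        if self.state == "KEXED" and msg == "AUTH":
            self.state = "AUTHED"
            return "AUTH_OK"
        if self.state == "AUTHED" and msg == "DATA":
            return "DATA_OK"
        return "UNIMPLEMENTED"  # out-of-order input is rejected

def output_query(seq):
    """Reset the system, then run one input sequence, as a learner would."""
    sut = ToyServer()
    return [sut.step(m) for m in seq]

# An in-order trace versus an out-of-order probe:
print(output_query(["KEXINIT", "AUTH", "DATA"]))
# -> ['KEXINIT_ACK', 'AUTH_OK', 'DATA_OK']
print(output_query(["AUTH", "KEXINIT", "DATA"]))
# -> ['UNIMPLEMENTED', 'KEXINIT_ACK', 'UNIMPLEMENTED']
```

From the responses to many such queries, a learning algorithm hypothesizes a state machine, which conformance testing then attempts to refute.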

Having obtained models, we use model checking to automatically verify their conformance to both functional and security properties. The properties are drawn from the RFC specifications~\cite{rfc4251,rfc4252,rfc4253,rfc4254} and formalized in LTL. They are then verified on the learned models using NuSMV~\cite{NuSMV}. Manually verifying these properties would be difficult, as the learned models are reasonably large. Moreover, formalizing the properties means we can better assess and overcome vagueness or under-specification in the RFC standards.
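As an illustration of the kind of property involved (the atomic propositions $\mathit{auth\_success}$ and $\mathit{kex\_done}$ are hypothetical names chosen here, not drawn from our formalization), a requirement that authentication may only succeed after key exchange has completed could be written in LTL with the past-time operator $O$ (``once'') as:
\[
  G \,\bigl(\mathit{auth\_success} \rightarrow O\,(\mathit{kex\_done})\bigr)
\]
A model checker such as NuSMV either confirms that every path through the learned model satisfies such a formula, or produces a counterexample trace.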


Our work grew out of two recent theses~\cite{Verleg2016,Toon2016} and is, to our knowledge, the first combined application of model learning and model checking to the verification of SSH implementations. 
Previous applications of formal verification methods involve model-based testing~\cite{Boss2012} and static analysis~\cite{Poll_rigorous_2011,Udrea_rule-based_2008}. The former limited its analysis to the Transport layer. The latter required access to source code, which may not be available for proprietary implementations. Udrea et al.~\cite{Udrea_rule-based_2008} formulated an extensive set of rules, which were then used by a static analysis tool to verify C implementations. The rules covered all layers and described both message ordering and data flow, whereas we analyze only message ordering. However, the rules were tied to routines in the code, so they had to be slightly adapted to fit different implementations. By contrast, our properties are defined at an abstract level and need no such tailoring. Moreover, our black-box approach is easily portable and imposes no constraints on the actual implementation of the system (for example, that it be written in C).

Our approach most closely resembles work on TCP~\cite{TCP2016}, which also combines classical learning and model checking to analyze various TCP implementations; our work focuses more on security properties. Other case studies involving model learning rely on manual analysis of the learned models~\cite{Aarts2013Formal,Chalupar2014Automated,RuiterProtocol}. Our work differs in that we use a model checker to automatically verify specifications. Protocols have also been inferred by observing network traffic~\cite{Wang2011Inferring}. Such inference is limited to the traces that occur over a network, and is further hampered if the analyzed protocols communicate over encrypted channels, as this severely limits the information available from traces without knowledge of the security keys. 
\textit{Organization.} An outline of the SSH protocol is provided in Section~\ref{ssh}. The experimental setup is discussed in Section~\ref{setup}. The results are subsequently discussed in Section~\ref{results}, after which we conclude in Section~\ref{conclusions}.