Commit 7ce02209 authored by Erik Poll's avatar Erik Poll
Browse files

tweaked introduction

parent 40274912
\section{Introduction}\label{introduction}
The SSH protocol is used interact securely with remote machines. Alongside TLS and IPSec, SSH is among the most frequently used network security protocols~\cite{Albrecht2009Plaintext}. Due to its significant user base and sensitive nature, flaws in the protocol or its implementation could have major impact. It therefore comes as no surprise that SSH has attracted scrutiny from the security community.
SSH has been subjected to various security analyses \cite{Albrecht2009Plaintext,Bellare2004Breaking,Williams2011Analysis,Paterson2010PlaintextDependent}.
Formal methods have been applied in analysing implementations of
SSH: Protocol state machines of SSH's transport layer
were used in a manual code review of OpenSSH
\cite{Poll_rigorous_2011} and a formal program verification of a Java
implementation of SSH \cite{PollSchubert07}. These models have also
been used for model based testing \cite{Boss2012}.
Udrea et al.\cite{Udrea_rule-based_2008} use static analysis to check
two implementations of SSH against an extensive set of rules.
SSH is security protocols that is widely to interact securely with
remote machines. SSH -- or more precisely, the transport layer
protocol of SSH -- has been subjected to security analysis
\cite{Williams2011Analysis}, incl.\ analyses that revealed
cryptographic shortcomings
\cite{Albrecht2009Plaintext,Bellare2004Breaking,Paterson2010PlaintextDependent}.
%Academic researchers have so far focused more on the theoretical aspects than on implementations of the protocol.
But whereas these analyses consider the abstract cryptographic
protocol, this paper looks at actual implementations of SSH, and
investigates flaws in the program logic of these implementations,
rather than cryptographic flaws. Such logical flaws have occurred in
implementations of other security protocols, notably TLS, with Apple's
'goto fail' bug and the FREAK attack \cite{Beurdouche:2017}. For this
we use model learning (a.k.a.\ active automata learning)
\cite{Angluin1987Learning,Vaa17} to infer state machines of three SSH
implementations, which we then analyse by model checking for
conformance to both functional and security properties.
In this work, we use model learning (a.k.a.\ active automata learning) \cite{Angluin1987Learning,Vaa17},
to infer state machines of three SSH implementations, which we then verify by model checking.
Model learning has previously been applied to infer state machines of EMV bank cards~\cite{Aarts2013Formal}, electronic passports~\cite{Aarts2010Inference}, hand-held readers for online banking~\cite{Chalupar2014Automated}, and implementations of TCP~\cite{TCP2016} and TLS~\cite{RuiterProtocol}. Model learning aims to obtain a state machine model of a black-box system by providing inputs and observing outputs. The learned state model corresponds to the observed behavior, and can be used in system analysis. Since model learning builds from a finite number of observations, we can never be sure that the learned model is correct. To that end, advanced conformance algorithms are employed \cite{LeeY96}, which yield some confidence that the system inferred is in fact correct. In the context of testing protocols, model learning can be seen as a form of protocol state fuzzing, whereby unexpected inputs are sent to a system under test in the hope of exposing hidden anomalies. In model learning, inputs are sent with no regard to the order imposed by the protocol. Any anomalies are then exposed in the learned model.
The properties we verify for the inferred state machines are based on
the RFCs that specify SSH \cite{rfc4251,rfc4252,rfc4253,rfc4254}.
These properties are formalized in LTL and verified using
NuSMV~\cite{NuSMV2}. Manually verifying these properties would be
difficult, as the learned models are complex. Moreover, formalizing
properties means we can also better assess and overcome vagueness or
under-specification in the RFC standards.
Having obtained models, we use model checking to automatically verify their conformance to both functional and security properties. The properties are drawn out of the RFC specifications \cite{rfc4251,rfc4252,rfc4253,rfc4254} and formalized in LTL. They are then checked for truth on the learned model using NuSMV~\cite{NuSMV2}. Manually verifying these properties would be difficult, as the learned models are reasonably large. Moreover, formalizing properties means we can also better assess and overcome vagueness or under-specification in the RFC standards.
Model learning obtains a state machine model of a black-box system by
providing inputs and observing outputs. Since model learning uses a
finite number of observations, we can never be sure that the learned
model is complete\marginpar{\tiny Erik: I prefer 'complete' to
'correct'.} To that end, advanced conformance algorithms are employed
\cite{LeeY96}, which yield some confidence that the system inferred is
in fact complete. In the context of testing security protocols, model
learning can be seen as a form of fuzzing, where inputs are sent in
unexpected orders to expose any hidden anomalies.
%Moreover, the very satisfaction of properties can be used to refine the learned model.
%In cases where the learned model does not satisfy a security property, the proving sequence of inputs can either signal non-conformance of the system, or can be used to further refine the learned model.
\paragraph{Related work}
Our work is borne out of 2 recent theses \cite{Verleg2016,Toon2016} and to our knowledge, is the first combined application of model learning and model checking in verifying SSH implementations.
Previous applications of formal verification methods involve model based testing\cite{Boss2012} or static analysis\cite{Poll_rigorous_2011,Udrea_rule-based_2008}. The former limited analysis to the Transport layer. The latter required access to source code, which might not be available for proprietary implementations. \cite{Udrea_rule-based_2008} formulated an extensive set of rules, which were then used by a static analysis tool to verify C implementations. The set of rules covered all layers and it described both elements of message ordering and data flow, whereas we only analyze message ordering. However, rules were tied to routines in the code, so had to be slightly adapted to fit the different implementations. By contrast, our properties are defined at an abstract level so don't need such tailoring. Moreover employing a black box approach makes our approach easily portable and means we don't impose any constraints on the actual implementation of the system (like for example, that it is implemented in C).
%The SSH protocol has been investigated for cryptographic flaws, e.g.\
%\cite{Williams2011Analysis,Albrecht2009Plaintext,Bellare2004Breaking,Paterson2010PlaintextDependent},
%but that research does not consider logical flaws in actual software
%implementations.
Our approach, most closely resembles work on TCP~\cite{TCP2016}, where they also combine classical learning and model checking to analyze various TCP implementations, with our work being more focused on security properties. Other case studies involving model learning rely on manual analysis of the learned models ~\cite{Aarts2013Formal,Chalupar2014Automated,RuiterProtocol}. Our work differs, since we use a model checker to automatically check specifications. Inference of protocols has also been done from observing network traffic ~\cite{Wang2011Inferring}. Such inference is limited to the traces that occur over a network. Inference is further hampered if the analyzed protocols communicate over encrypted channels, as this severely limits
information available from traces without knowledge of the security key.
%Whereas the combined work of \cite{Poll_rigurous2011}, \cite{Boss2012} leverages model
%based testing on a specification model, we
Udrea et al.\cite{Udrea_rule-based_2008} also investigated SSH
implementations for logical flaws. They used a static analysis tool to
check two C implementations of SSH against an extensive set of rules.
These rules not only express properties of the SSH protocol logic, but
also of message formats and support for earlier versions and various
options. Our analysis only considers the protocol logic. However,
their rules were tied to routines in the code, so had to be slightly adapted
to fit the different implementations. In contrast, our properties are
defined at an abstract level so do not need such tailoring. Moreover,
our black box approach approach means we can analyse any implemention
of SSH, not just C implementations.
\marginpar{\tiny @Paul: do these properties bear any resemblance o
your LTL properties}
%Moreover, the model checker is also used as means of testing the learned model.
Formal models of SSH in the form of state machines have been used
before, namely for a manual code review of OpenSSH
\cite{Poll_rigorous_2011}, formal program verification of a Java
implementation of SSH \cite{PollSchubert07}, and for model based
testing of SSH implementations \cite{Boss2012}. All this research
only considered the SSH transport layer, and not the other SSH
protocol layers.
This paper is borne out of 2 recent theses \cite{Verleg2016,Toon2016}
and is to our knowledge the first combined application of model
learning and model checking in verifying SSH implementations, or more
generally, implementations of any network security protocol.
Model learning has previously been applied to infer state machines of
EMV bank cards~\cite{Aarts2013Formal}, electronic
passports~\cite{Aarts2010Inference}, hand-held readers for online
banking~\cite{Chalupar2014Automated}, and implementations of
TCP~\cite{TCP2016} and TLS~\cite{RuiterProtocol}. Some of these case
studies relied on manual analysis of the learned models
\cite{Aarts2013Formal,Aarts2010Inference,RuiterProtocol}, but some
also used model checkers \cite{TCP2016,Chalupar2014Automated}.
Instead of using active learning as we do, it is also possible to use
passive learning to obtain protocol state machines
\cite{Wang2011Inferring}. Here network traffic is observed, and not
actively generated. This can then also provide a probabilistic
characterisation of normal network traffic, but it cannot uncover
implementation flaws that occur in strange message flows, which is our
goal.
%Our approach, most closely resembles work on TCP~\cite{TCP2016},
%where they also combine classical learning and model checking to
%analyze various TCP implementations, with our work being more focused
%on security properties. Other case studies involving model learning
%rely on manual analysis of the learned models
%~\cite{Aarts2013Formal,TCP2016,Chalupar2014Automated,RuiterProtocol}.
%Our work differs, since we use a model checker to automatically check
%specifications. Inference of protocols has also been done from
%observing network traffic ~\cite{Wang2011Inferring}. Such inference
%is limited to the traces that occur over a network. Inference is
%further hampered if the analyzed protocols communicate over encrypted
%channels, as this severely limits information available from traces without knowledge of the security key.
%Model checkers provide for an automatic way of verifying properties.
%Manual verification would be difficult manually, as the state machines are large.
%State machines inferred are large and hard to check manually for properties. Moreover, the very satisfaction of these properties by the learned model can be used to generate tests. Model checkers provide for an automated way to check conformance of learned models.
%
%In this thesis, we will be using protocol state fuzzing to extract an implementations state machine. All fuzzing algorithms rely on the idea of sending unexpected input data to a system under test (SUT) in the hope that this triggers anomalies. Using protocol state fuzzing, however, we will fuzz on the \textit{order} of otherwise correctly-formed messages.
%This technique has previously been applied to infer state machines of EMV bank cards~\cite{Aarts2013Formal}, electronic passports~\cite{Aarts2010Inference} and hand-held readers for online banking~\cite{Chalupar2014Automated}. Furthermore, implementations of TCP~\cite{Janssen2015Learning} and TLS~\cite{RuiterProtocol} have state-fuzzed. As in these works, we implement state fuzzing
%
......
......@@ -576,9 +576,22 @@ machine learning algorithms},
title = {Model Learning},
volume = {60},
number = {2},
month = feb,
year = {2017},
pages = {86--95},
publisher = {ACM},
address = {New York, NY, USA},
}
@article{Beurdouche:2017,
author = {Beurdouche, B. and Bhargavan, K. and Delignat-Lavaud, A. and Fournet, C. and Kohlweiss, M. and Pironti, A. and Strub, P.-Y. and Zinzindohoue, J. K.},
title = {A Messy State of the Union: Taming the Composite State Machines of TLS},
journal = {Communications of the ACM},
volume = {60},
number = {2},
year = {2017},
pages = {99--107},
publisher = {ACM},
address = {New York, NY, USA},
}
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment