introduction.tex 7.34 KB
Newer Older
1
2
\section{Introduction}\label{introduction}

Erik Poll's avatar
Erik Poll committed
3
4
5
6
7
8
SSH is security protocols that is widely to interact securely with
remote machines. SSH -- or more precisely, the transport layer
protocol of SSH -- has been subjected to security analysis
\cite{Williams2011Analysis}, incl.\ analyses that revealed
cryptographic shortcomings
\cite{Albrecht2009Plaintext,Bellare2004Breaking,Paterson2010PlaintextDependent}.
9

Erik Poll's avatar
Erik Poll committed
10
11
12
13
14
15
16
17
18
19
But whereas these analyses consider the abstract cryptographic
protocol, this paper looks at actual implementations of SSH, and
investigates flaws in the program logic of these implementations,
rather than cryptographic flaws.  Such logical flaws have occurred in
implementations of other security protocols, notably TLS, with Apple's
'goto fail' bug and the FREAK attack \cite{Beurdouche:2017}.  For this
we use model learning (a.k.a.\ active automata learning)
\cite{Angluin1987Learning,Vaa17} to infer state machines of three SSH
implementations, which we then analyse by model checking for
conformance to both functional and security properties.
20

Erik Poll's avatar
Erik Poll committed
21
22
23
24
25
26
27
The properties we verify for the inferred state machines are based on
the RFCs that specify SSH \cite{rfc4251,rfc4252,rfc4253,rfc4254}.
These properties are formalized in LTL and verified using
NuSMV~\cite{NuSMV2}. Manually verifying these properties would be
difficult, as the learned models are complex.  Moreover, formalizing
properties means we can also better assess and overcome vagueness or
under-specification in the RFC standards.
28

Erik Poll's avatar
Erik Poll committed
29
30
31
32
33
34
35
36
37
Model learning obtains a state machine model of a black-box system by
providing inputs and observing outputs.  Since model learning uses a
finite number of observations, we can never be sure that the learned
model is complete\marginpar{\tiny Erik: I prefer 'complete' to
'correct'.} To that end, advanced conformance algorithms are employed
\cite{LeeY96}, which yield some confidence that the system inferred is
in fact complete.  In the context of testing security protocols, model
learning can be seen as a form of fuzzing, where inputs are sent in
unexpected orders to expose any hidden anomalies. 
38

Erik Poll's avatar
Erik Poll committed
39
\paragraph{Related work}
40

Erik Poll's avatar
Erik Poll committed
41
42
43
44
%The SSH protocol has been investigated for cryptographic flaws, e.g.\
%\cite{Williams2011Analysis,Albrecht2009Plaintext,Bellare2004Breaking,Paterson2010PlaintextDependent},
%but that research does not consider logical flaws in actual software
%implementations.
45

Erik Poll's avatar
Erik Poll committed
46
47
48
49
50
51
52
53
54
55
56
57
58
Udrea et al.\cite{Udrea_rule-based_2008} also investigated SSH
implementations for logical flaws.  They used a static analysis tool to
check two C implementations of SSH against an extensive set of rules.
These rules not only express properties of the SSH protocol logic, but
also of message formats and support for earlier versions and various
options. Our analysis only considers the protocol logic.  However,
their rules were tied to routines in the code, so had to be slightly adapted
to fit the different implementations. In contrast, our properties are
defined at an abstract level so do not need such tailoring. Moreover,
our black box approach approach means we can analyse any implemention
of SSH, not just C implementations.
\marginpar{\tiny @Paul: do these properties bear any resemblance o
your LTL properties}
59

Erik Poll's avatar
Erik Poll committed
60
61
62
63
64
65
66
Formal models of SSH in the form of state machines have been used
before, namely for a manual code review of OpenSSH
\cite{Poll_rigorous_2011}, formal program verification of a Java
implementation of SSH \cite{PollSchubert07}, and for model based
testing of SSH implementations \cite{Boss2012}.  All this research
only considered the SSH transport layer, and not the other SSH
protocol layers.
67

Erik Poll's avatar
Erik Poll committed
68
69
70
71
This paper is borne out of 2 recent theses \cite{Verleg2016,Toon2016}
and is to our knowledge the first combined application of model
learning and model checking in verifying SSH implementations, or more
generally, implementations of any network security protocol.
72

Erik Poll's avatar
Erik Poll committed
73
74
75
76
77
78
79
80
Model learning has previously been applied to infer state machines of
EMV bank cards~\cite{Aarts2013Formal}, electronic
passports~\cite{Aarts2010Inference}, hand-held readers for online
banking~\cite{Chalupar2014Automated}, and implementations of
TCP~\cite{TCP2016} and TLS~\cite{RuiterProtocol}.  Some of these case
studies relied on manual analysis of the learned models
\cite{Aarts2013Formal,Aarts2010Inference,RuiterProtocol}, but some
also used model checkers \cite{TCP2016,Chalupar2014Automated}.
81

Erik Poll's avatar
Erik Poll committed
82
83
84
85
86
87
88
Instead of using active learning as we do, it is also possible to use
passive learning to obtain protocol state machines
\cite{Wang2011Inferring}.  Here network traffic is observed, and not
actively generated.  This can then also provide a probabilistic
characterisation of normal network traffic, but it cannot uncover
implementation flaws that occur in strange message flows, which is our
goal.
89

Erik Poll's avatar
Erik Poll committed
90
91
92
93
94
95
96
97
98
99
100
101
%Our approach, most closely resembles work on TCP~\cite{TCP2016},
%where they also combine classical learning and model checking to
%analyze various TCP implementations, with our work being more focused
%on security properties. Other case studies involving model learning
%rely on  manual analysis of the learned models
%~\cite{Aarts2013Formal,TCP2016,Chalupar2014Automated,RuiterProtocol}.
%Our work differs, since we use a model checker to automatically check
%specifications. Inference of protocols has also been done from
%observing network traffic ~\cite{Wang2011Inferring}. Such inference
%is limited to the traces that occur over a network. Inference is
%further hampered if the analyzed protocols communicate over encrypted
%channels, as this severely limits information available from traces without knowledge of the security key. 
102
103
104
105
106
107
108
109
110
111

%In this thesis, we will be using protocol state fuzzing to extract an implementations state machine. All fuzzing algorithms rely on the idea of sending unexpected input data to a system under test (SUT) in the hope that this triggers anomalies. Using protocol state fuzzing, however, we will fuzz on the \textit{order} of otherwise correctly-formed messages. 
%This technique has previously been applied to infer state machines of EMV bank cards~\cite{Aarts2013Formal}, electronic passports~\cite{Aarts2010Inference} and hand-held readers for online banking~\cite{Chalupar2014Automated}. Furthermore, implementations of TCP~\cite{Janssen2015Learning} and TLS~\cite{RuiterProtocol} have state-fuzzed. As in these works, we implement state fuzzing
%
%State machines inferred are large and hard to check manually. 
%Once a state machine has been inferred, security-related logical flaws are usually easily spotted by an auditor with some knowledge about the protocol~\cite{RuiterProtocol}. An example of a logical flaw is exchanging user credentials before an encrypted connection has been established. 
%
%Besides security-related logical flaws, inferred state machines can show quirks such as superfluous states. Although these might not be directly exploitable, OpenBSD auditors illustrate why these small bugs should be resolved: ``we are not so much looking for security holes, as we are looking for basic software bugs, and if years later someone discovers the problem used to be a security issue, and we fixed it because it was just a bug, well, all the better''\footnote{\url{http://www.openbsd.org/security.html}}.
%
%\textit{Organization.} An outline of the SSH protocol will be provided in Section~\ref{ssh}. The experimental setup is discussed in Section~\ref{setup}. The results are subsequently discussed in Section~\ref{results}, after which we conclude in Section~\ref{conclusions}.
Erik Poll's avatar
shit    
Erik Poll committed
112
%