learning_setup.tex 11.6 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
\section{Learning setup} \label{sec:setup}
%This chapter will cover the setup used to infer the state machines. We provide a general setup outline in Section~\ref{components}. The tested SSH servers are described in Section~\ref{suts}, which were queried with the alphabet described in Section~\ref{alphabet}. Section~\ref{setup-handling} will cover the challenging SUT behaviour faced when implementing the mapper, and the adaptations that were made to overcome these challenges. Section~\ref{layers-individual} will discuss the relation between state machines for individual layers and the state machine of the complete SSH protocol. The conventions on visualisation of the inferred state machines are described in Section~\ref{visualisation}.

%Throughout this chapter, an individual SSH message to a SUT is denoted as a \textit{query}. A \textit{trace} is a sequence of multiple queries, starting from a SUT's initial state. Message names in this chapter are usually self-explanatory, but a mapping to the official RFC names is provided in Appendix~\ref{appendixa}.

%\section{Components}\label{components}

The learning setup consists of three components: a \emph{learner}, \emph{mapper} and the SUT. The {\dlearner} generates abstract inputs, representing SSH messages. The {\dmapper} transforms these messages into well-formed SSH packets and sends them to the {\dsut}. The {\dsut} sends response packets back to the {\dmapper}, which in turn, translates these packets to abstract outputs. The {\dmapper} then sends the abstract outputs back to the learner. 


The learner uses LearnLib ~\cite{LearnLib2009}, a Java library implementing $L^{\ast}$ based algorithms for learning Mealy machines. The {\dmapper} is based on Paramiko, an open source SSH implementation written in Python\footnote{Paramiko is available at \url{http://www.paramiko.org/}}. We opted for Paramiko because its code is relatively well structured and documented. The {\dsut} can be any existing implementation of a SSH server. The {\dlearner} communicates with the {\dmapper} over sockets. A graphical representation of our setup is shown in Figure~\ref{fig:components}.
\begin{figure}
13
14
	\centering
  \includegraphics[scale=0.35]{example-components.pdf}
15
16
17
18
19
20
21
  \caption{The setup consists of a learner, mapper and SUT.}
  \label{fig:components}
\end{figure}

%figure

\subsection{The learning alphabet}
22
Learning time grows rapidly as the input alphabet grows. It is therefore important to focus on messages for which interesting state-changing behaviour can be expected. As a general principle, we therefore chose not to query protocol messages that are not intended to be sent from a client to a server\footnote{Applying this principle to the RFC's messages results in not including \textsc{service\_accept}, \textsc{ua\_accept}, \textsc{ua\_failure}, \textsc{ua\_banner}, \textsc{ua\_pk\_ok}, \textsc{ua\_pw\_changereq}, \textsc{ch\_success} and \textsc{ch\_failure} in our alphabet.}. 
23

24

25
Applying this ``outgoing only'' principle to the transport layer results in the messages of Table~\ref{trans-alphabet}. This is a special instance of \textsc{kexinit} for which \texttt{first\_kex\_packet\_follows} is enabled~\cite[p. 17]{rfc4253}. Our mapper can only handle correct key guesses, so the wrong-guess procedure as described in ~\cite[p. 19]{rfc4253} was not supported. When needed, SUTs were configured to make this guess work by altering their cipher preferences. The SSH version and comment string (described in Section~\ref{ssh-run-trans}) was not queried because it does not follow the binary packet protocol.
26
27
28

\begin{table}[!ht]
\centering
29
\small
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
\begin{tabular}{ll}
\textbf{Message} & \textbf{Description} \\
\textsc{discon} & Terminates the current connection~\cite[p. 23]{rfc4253} \\
\textsc{ignore} & Has no intended effect~\cite[p. 24]{rfc4253} \\
\textsc{unimpl} & Intended response to an unimplemented message~\cite[p. 25]{rfc4253} \\
\textsc{debug} & Provides other party with debug information~\cite[p. 25]{rfc4253} \\
\textsc{kexinit} & Sends parameter preferences~\cite[p. 17]{rfc4253} \\
\textsc{guessinit} & A \textsc{kexinit} after which a guessed \textsc{kex30} follows~\cite[p. 19]{rfc4253} \\
\textsc{kex30} & Initializes the Diffie-Hellman key exchange~\cite[p. 21]{rfc4253} \\
\textsc{newkeys} & Requests to take new keys into use~\cite[p. 21]{rfc4253} \\
\textsc{sr\_auth} & Requests the authentication protocol~\cite[p. 23]{rfc4253} \\
\textsc{sr\_conn} & Requests the connection protocol~\cite[p. 23]{rfc4253}
\end{tabular}
\caption{Alphabet used to query the transport layer.}
\label{trans-alphabet}
\end{table}

47
48
49
For the user authentication layer, applying our ``outgoing only'' principle results in just one message: the authentication request~\cite[p. 4]{rfc4252}. Its parameters contain all information needed for authentication. Four authentication methods exist: none, password, public key and host-based. Our mapper supports all methods except the host-based authentication because various SUTs lack support for this feature. For all other types, we define message

As shown in Table~\ref{auth-alphabet}, both the public key as well as the password method have a \textsc{ok} and \textsc{nok} variant which provides respectively correct and incorrect credentials. 
50
51
52

\begin{table}[!ht]
\centering
53
\small
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
\begin{tabular}{ll}
\textbf{Message} & \textbf{Description} \\
\textsc{ua\_none} & Authenticates with the ``none'' method~\cite[p. 7]{rfc4252} \\
\textsc{ua\_pk\_ok} & Provides a valid name/key combination~\cite[p. 8]{rfc4252} \\
\textsc{ua\_pk\_nok} & Provides an invalid name/key combination~\cite[p. 8]{rfc4252} \\
\textsc{ua\_pw\_ok} & Provides a valid name/password combination~\cite[p. 10]{rfc4252} \\
\textsc{ua\_pw\_nok} & Provides an invalid name/password combination~\cite[p. 10]{rfc4252} \\
\end{tabular}
\caption{Alphabet used to query the user authentication layer.}
\label{auth-alphabet}
\end{table}

The connection protocol allows the client to request different processes over a single channel. Our mapper only implements requesting terminal emulation because availability of other processes depends heavily on a SUTs configuration. Moreover, little security-relevant information is expected to be gained by thoroughly testing other process requests. Combining this premise with the aforementioned ``outgoing only'' principle resulted in the alphabet of Table~\ref{conn-alphabet}. 

\begin{table}[!ht]
\centering
70
\small
71
72
73
74
75
76
77
78
79
80
81
82
83
\begin{tabular}{ll}
\textbf{Message} & \textbf{Description} \\
\textsc{ch\_open} & Opens a new channel~\cite[p. 5]{rfc4254} \\
\textsc{ch\_close} & Closes a channel~\cite[p. 9]{rfc4254} \\
\textsc{ch\_eof} & Indicates that no more data will be sent~\cite[p. 9]{rfc4254} \\
\textsc{ch\_data} & Sends data over the channel~\cite[p. 7]{rfc4254} \\
\textsc{ch\_edata} & Sends typed data over the channel~\cite[p. 8]{rfc4254} \\
\textsc{ch\_window\_adjust} & Adjusts the window size~\cite[p. 7]{rfc4254} \\
\textsc{ch\_request\_pty} & Requests terminal emulation~\cite[p. 11]{rfc4254} \\
\end{tabular}
\caption{Alphabet used to query the connection layer.}
\label{conn-alphabet}
\end{table}
84
85
%The learning alphabet comprises of input/output messages by which the {\dlearner} interfaces with the {\dmapper}. Section~\ref{sec:ssh} outlines essential inputs, while Table X provides a summary
%of all messages available at each layer. 
86

87
%table
88

89
90
\subsection{The mapper}
The {\dmapper} must provide translation between abstract message representations and well-formed SSH messages. A special case is for when no output is received from the {\dsut}, in which case the {\dmapper} gives back to the learner a {\dtimeout} message, concluding a timeout occurred.  The sheer complexity of the {\dmapper}, meant that it was easier to adapt an existing SSH implementation, rather than construct the {\dmapper} from scratch. Paramiko had already provided routines for sending the different types of packets, and for receiving them. These routines were called by control logic dictated by Paramiko's own state machine. The {\dmapper} was constructed by replacing this control logic with one dictated by messages received from the {\dlearner}. %over a socket connection
91

92
The {\dmapper} updates its state with information from concrete messages. This enables it to craft concrete messages corresponding to the learner abstractions. 
93
Table~\ref{table:state} shows how the mapper is updated.
94

95
%As  and thus to interact with the {\dsut}, at times similar to how an SSH client would.
96

97
%, sourced  from interacting with the {\dsut}. 
98

99
100
101
102
103
104
105
106
107
108
\subsection{Compacting SSH into a small Mealy machine}
The {\dmapper} not only provides abstraction, it also ensures that the abstract representation shown to the learner behaves like a deterministic Mealy Machine. 
This is needed, as SSH implementations are inherently \textit{non-determistic}.  Sources of non-determinism can be divided into three categories:
\begin{enumerate}
\item SSH's \textit{protocol design} is inherently non-deterministic. Firstly, because underspecification leads to multiple options for developers, from which one can be selected in a non-deterministic manner. Secondly, because non-deterministic behaviour directly results from the specifications. An example of the latter is allowing to insert \textsc{debug} and \textsc{ignore} messages at any given time.
\item \textit{Response timing} is a source of non-determinism as well. For example, the {\dmapper} might conclude a timeout before the {\dsut} had sent its response. We had to set
timeout values accordingly, so that enough time was allowed for the response to be received. 
\item Other \textit{timing-related quirks} can cause non-deterministic behaviour as well. Some  {\dsuts} behave unexpectedly when a new query is received shortly after the previous one. 
%For example, a trace in which a valid user authentication is performed within five milliseconds after an authentication request on DropBear can cause the authentication to (wrongly) fail.  
\end{enumerate}
109

110
111
To detect non-determinism, we cache all new observations in an SqlLite database and we verify observations against this cache. The cache also enables us to answer to queries answered before
without running inputs on the {\dsut}. A subsequent identical learning run can quickly resume from where the previous one was ended, as the cache from the previous run is used to quickly respond to all queries up to the point the previous run ended.
112

113
114
Aside from non-determinism, SSH implementations can also produce a sequence of outputs in response to an input, whereas Mealy machines allow for only one output. To that end, the {\dmapper} 
concatenates all outputs into one, delivering a single output to the {\dlearner}. 
115

116
Another challenge is presented by buffering, leading to an explosion of states, as buffers cannot be described succintly by Mealy Machines.
117
118
119
We have encountered buffers in two occasions. Firstly, some implementations buffer certain responses when in re-key exchange.  As soon as re-keying completes, these queued messages are released all at once. This leads to a \textsc{newkeys} response (indicating re-keying has completed), directly followed by all buffered responses. This combined with concatenation of the multiple output response would lead to non-termination of the learning algorithm, as for every variant of the response queue there would be a different output. To counter this, we replace by concatenation of queued output responses by the single word \textsc{buffered}, thus forming \textsc{newkeys\_buffered}.

Buffer behaviour can also be observed when opening and closing channels, since a SUT can close only as many channels as have previously been opened. For this reason, we restricted the number of simultaneously open channels to one. The {\dmapper} returns a custom response \textsc{ch\_max} to a \textsc{ch\_open} message whenever this limit is reached.