Commit 9be40284 authored by Paul Fiterau Brostean's avatar Paul Fiterau Brostean

Updated text. Fixed inconsistencies. Added .dot files. Also added powershell...

Updated text. Fixed inconsistencies. Added .dot files. Also added powershell script for (re-)building dot files
parent c588106d
......@@ -8,19 +8,21 @@ by all implementation.
Abstraction was provided by a {\dmapper} component placed between the {\dlearner} and the {\dsut}. The {\dmapper} was constructed from an existing SSH
implementation. The alphabet the {\dmapper} exposed to the {\dlearner} explored key exchange and setting up a secure connection, several authentication methods
and opening and closing single of channels over which the terminal service could be requested. We used two alphabets, a full version for OpenSSH, and
a restricted version for the other implementations. The restricted alphabet was still sufficient explore all aforementioned behavior.
a restricted version for the other implementations. The restricted alphabet was still sufficient to explore all aforementioned behavior.
There were several challenges encountered. Firstly, building a {\dmapper} presented a considerable technical challenge, as it required re-structuring of an actual
SSH implementation. Secondly, because we used classical learning algorithms, we had to ensure that the abstracted implementation behaved
like a deterministic Mealy Machine. Herein, time induced non-determinism, like in other works, was difficult to eliminate. Buffering also presented problems,
like a deterministic Mealy Machine. Herein, time induced non-determinism was difficult to eliminate. Buffering also presented problems,
leading to a considerable increase in the number of states. Moreover, the systems analyzed were relatively slow, which meant learning and testing took
several days. This was compounded by the size of the learning alphabet, and it forced us into using a reduced alphabet for two of the analyzed implementations.
Limitations of the work, thus elements for future work, are several. First of all, the {\dmapper} was not formalized, unlike in~\ref{TCP2016}, thus we could not
Limitations of the work, thus elements for future work, are several. First of all, the {\dmapper} was not formalized, unlike in~\cite{TCP2016}, thus we did not
produce a concretization of the abstract models. Consequently, model checking results cannot be fully transferred to the actual implementations. Moreover,
abstraction produced by the {\dmapper} could be made less coarse, so more insight into the implementation is gained. Parameters which are now abstracted,
such as the channel identifier or data sent over channels, could be extracted and potentially handled by Register Automata learners. Another thing which
begs for improvement is the handling of buffering behavior. Buffering leads to many uninteresting states and greatly lengthens the learning process.
such as the channel identifier or data sent over channels, could be extracted and potentially handled by Register Automata learners. The {\dmapper} also caused
redundancy in the learned model, re-tweaking the abstractions used, in particular those for managing channels, could alleviate this problem while also
improving learning times, which in turn would make facilitate learning using full alphabets for all systems. Another thing which
warrants improvement is the handling of buffering behavior. Buffering leads to many uninteresting states and greatly lengthens the learning process.
Despite these limitations, our work provides a compelling application of learning and model checking in a security setting, on a widely used protocol. We hope this lays
some more groundwork for further case studies, as well as fresh advancements in learning techniques.
\ No newline at end of file
B gci . | Where-Object { $_.Name.EndsWith(".dot") } | ForEach-Object {$s = ($_.Name.Substring(0, ($_.Name.Length - 4)) + ".pdf"); echo $s; dot -Tpdf $_.Name -o $s}
digraph G {
rankdir=LR
preauth[label="pre-auth"]
preauth -> auth [color=green style=bold label="UA_PK_OK/UA_SUCCESS
UA_PW_OK/UA_SUCCESS"]
}
No preview for this file type
digraph G {
rankdir=LR
auth -> chan[color=green style=bold label="CH_OPEN/
CH_OPEN_SUCCESS"]
chan -> pty[color=green style=bold label="CH_REQUEST_PTY/
CH_SUCCESS"]
pty -> auth[color=green style=bold label="CH_CLOSE/
CH_CLOSE_SUCCESS"]
}
No preview for this file type
digraph G {
rankdir=LR;
prekex[label="pre-kex"]
preauth[label="pre-auth"]
init -> prekex[color=green style=bold label="KEXINIT/
KEXINIT"]
prekex -> kexed[color=green style=bold label="KEX30/
KEX31"]
kexed -> keyed[color=green style=bold label="NEWKEYS/
NEWKEYS"]
keyed -> preauth[color=green style=bold label="SR_AUTH/
SR_SUCCESS"]
}
No preview for this file type
This diff is collapsed.
digraph G {
label="";
//rank=TB;
compound=true;
subgraph cluster_noconn {
style=invisible;
s2
dn [style=invisible shape=point]
//dntrans [style=invisible shape=point]
s2 [label="conn_lost"]
//conn_lost [label="conn_lost"]
}
dn -> s2 [label = "/ DISCONNECT"]
//dntrans -> conn_lost [label = "/ DISCONNECT"]
//dntrans:sw -> s2:nw [label = "* \\ {DEBUG,UNIMPL}/ DISCONNECT"]
subgraph cluster_trans {
label="Transport layer";
rankdir="LR";
//style=invisible;
s0
s1
//s3
s4
s5
//s6 kex ch
//s7 cl_auth
//s8 ch
//s9 cl_auth
//s10
//s11 cl_auth
//s12 cl_auth
//s16 ch
//s17
//s18
//s19 auth rekey
//s22
//s23 has channel rekey
//s24
//s25 auth rekey
//s27
//s28 has channel rekey
//s29 has commands pty rekey
//s30 has commands pty rekey
//s50
/*
Changelog:
These changes have been performed in order from top to bottom, some earlier changes may
no longer be visible in the final dot-code.
Note: with "x -> x" I mean edges that start from a certain edge, and end at the same edge; "s0 -> s0" for example.
---------------------
COLLAPSING EDGES:
Removed all edges with the dupicate starting and ending points, and merged their labels to a single label.
---------------------
REDUCING LABELS 1st pass:
added abbreviations:
debug = {DEBUG, IGNORE, UNIMPL} (not DISCONNECTED!, since this often leads to a different state) (stopped doing this after some states, since these commands are pretty much useless, always x -> x with NO_RESP or UNIMPL, they will be removed in a following pass (the exception is init, where it receives a KEXINIT and goes to prekex))
all_ua = {UA_PK_OK, UA_PK_NOK, UA_PW_OK, UA_PW_NOK, UA_NONE}
other_ch = messages starting with CH not yet defined for the source state
other_ua = messages starting with UA not yet defined for the source state
other = messages not yet defined for the source state (should be seen as "after" other_ch ect.)
---------------------
REDUCING LABELS 2nd pass:
removed debug (see 1st pass)
assume that DISCONNECT|NO_CONN|NO_CONN (result from REKEY) can be seen as DISCONNECT (in SUL-output) (also for any other REKEY output that ends with disconnecting)
assume that NO_CONN can be seen as DISCONNECT (in SUL-output)
more use of "other" (lots of other/DISCONNECT, not really helpful if we're removing disconnect states later on)
---------------------
OVERAL READABILITY: FROM NOW ON LOSING INFORMATION, AND POTENTIALLY HAVING "OTHER" CATCH WRONG PACKETS
removed s3, in favor of a single disconnect state (opening/closing a channel in a disconnected state seems useless anyways, although it means OpenSSH still keeps track of this info, could be interesting) (will have to be removed / cut into several "small" disconnect states)
removed all irrelevant connection layer info (CH packets) from s0, s1, s4, s5, s6, s7, s8 (transport layer states) and s9, s11, s12, s13 (authentication layer states)
removed all irrelevant authentication layer info (UA packets) from s0, s4 (transport layer states) and s17, s21, s22, s27 (connection layer states)
removed DISCONNECT as input (no case where this does not lead to disconnecting)
hiding edges to s2 if it only contains other / DISCONNECT
---------------------
OVERALL READABILTIY 2nd pass: (large removals of information, therefore a lot of edges are commented away, rather than deleted outright)
removing CH_OPEN / CH_MAX, as this only gives x -> x, exception for s15, s20, s21, since it contains other_ch with usefull behaviour
removing other_ch if it does x -> x
renamed "KEXINIT|KEX31+NEWKEYS|NO_RESP" to "rekey_ok"
removed x -> x edges for states without an edge to disconnect (commented if they contain potentially interesting behaviour)
removed NEWKEYS / DISCONNECT if not currently doing rekey (all of them)
redid removing x->x edges
removed s2 (disconnect)
removed REKEY / rekey_ok (hopefully this makes dot show the three step rekey cycle more clearly in the graph)
---------------------
REMOVING USELESS STATES:
removed rekey-related states from authed_channel (s20, s21, s26), since authed_channel is basically useless, and the rekey states take up a lot of space.
removed the edge authed_channel -> authed_channel
removed "other" from all x -> x edges
correction: switched names of none_fail and unauthed_none
removed "unauthed_channel", "authed_channel", "pw_fail_channel", "pk_fail_channel" and "unauthed_none" since the connection layer cannot be reached from these states, and they make the authentication layer part hard to read
Turns out the connection layer was reachable from pw_fail_channel, and thus from unauthed_channel...
readded pw_fail_channel, unauthed_channel
---------------------
READABILITY OF THE Dot-file:
removed commented lines
---------------------
*/
s0 [label="init" color="red"];
s0 -> s1[label="KEXINIT, SR_*, UNIMPL,
DEBUG, IGNORE, KEX30
/ KEXINIT" color="green"]
s0 -> s2[xlabel="UA_*, CH_* /
KEXINIT+DISCONNECT"]
s1 [label="keyed"];
s1 -> s5[label="KEX30 / KEX31+NEWKEYS" color="green" style=bold]
dnexit [style=invisible shape=point]
dnexit -> dn[label="*\\{DEBUG, UNIMPL}"]
s1 -> dnexit//[label="*\\{DEBUG, UNIMPL}"]
//s3 [label="conn_lost_channel"]
s4 [label="kexed"];
//s4 -> s6[label="CH_OPEN / UNIMPL"]
s5 [label="pre-kex"];
s5 -> s4[label="NEWKEYS / NO_RESP" color="green" style=bold]
s5 -> dnexit //dn[label="*\\{DEBUG, UNIMPL}"]
//s6 [label="kexed_channel"]; //ch_open from s4 (kexed)
//s6 -> s4[label="CH_CLOSE / UNIMPL"]
}
//s0:sw -> dntrans:nw[ltail=cluster_trans]
subgraph cluster_auth {
label="Authentication layer"
s7 [label="pre-auth"];
s7 -> s12[label="UA_NONE / UA_FAILURE"]
s7 -> s9[label="UA_PK_NOK / UA_FAILURE"]
//s7 -> s8[label="CH_OPEN / UNIMPL"]
s7 -> s11[label="UA_PW_NOK / UA_FAILURE"]
s7:sw -> s7:nw[label="CH_*, KEXINIT, KEX30 / UNIMPL" color="blue" fontcolor="blue"]
s9 [label="pk_fail"];
s9 -> s9 [label="UA_PK_NOK / UA_FAILURE"]
//s9 -> s2 [label="UA_{NONE, PK_OK, PW_OK, PW_NOK} / DISCONNECT"]
s9 -> dn [label="UA_{*\\PK_NOK}"]
s9:sw -> s9:nw[color="blue" fontcolor="blue"]
s11 [label="pw_fail"];
s11 -> dn[label="UA_{NONE, PK_*}"]
s11 -> s11[label="UA_PW_NOK / UA_FAILURE"]
s11:sw -> s11:nw[color="blue" fontcolor="blue" side=l]
s12 [label="none_fail"]; //UA_NONE from s7 (unauthed) or CH_CLOSE from s13 (none_fail) (although no channel is open in s13)
//s12 -> s2[label="UA_{PK_OK, PK_NOK, PW_OK, PW_NOK} / DISCONNECT"]
s12 -> dn[label="UA_{*\\NONE}"]
s12 -> s12[label="UA_NONE / UA_FAILURE"]
s12:sw -> s12:nw[color="blue" fontcolor="blue"]
}
s4 -> s7[label="SR_AUTH / SR_ACCEPT" color="green" style=bold]
s12 -> s2[label="SR_CONN / DISCONNECT
DISCONNECT, NEWKEYS / NO_CONN" ltail=cluster_auth]
subgraph cluster_conn {
label="Connection layer"
subgraph cluster_conn_states {
label=""
style=dotted;
s10 [label="auth"];
//s10 -> s19[label="KEXINIT / KEXINIT"]
s10 -> s18[label="CH_OPEN / CH_OPEN_SUCCESS" color="green" style=bold]
//s10 -> s2[label="{DISCONNECT, NEWKEYS} / NO_CONN" color="green" style=bold]
//s10 -> s50[label="REKEY SEQUENCE" style="dashed"]
s10 -> s10[color=red fontcolor=red]
s18 [label="chan"];
s18 -> s10[label="CH_CLOSE / CH_CLOSE"]
//s18 -> s23[label="KEXINIT / KEXINIT"]
s18 -> s24[label="CH_REQUEST_PTY / CH_SUCCESS" color="green" style=bold]
s18 -> s18[label="SR_*, UA_*, KEX30
/ UNIMPL" color=red fontcolor=red]
//s18 -> s2[label="{DISCONNECT, NEWKEYS}/NO_CONN"]
s24 [label="pty"];
s24 -> s10[label="CH_CLOSE / CH_CLOSE" color="green" style=bold]
//s24 -> s24[label="REKEY SEQUENCE"]
//s24 -> s29[label="KEXINIT / KEXINIT"]
s24 -> s24[color=red fontcolor=red]//label="SR_AUTH / UNIMPL"]
//s24 -> s2[label="{DISCONNECT, NEWKEYS}/NO_CONN
// CH_REQUEST_PTY/DISCONNECT"]
}
//s18 -> s2[label="{DISCONNECT, NEWKEYS}/NO_CONN" ltail=cluster_conn_states]
s7 -> s10[label="UA_{PK_OK, PW_OK} / UA_SUCCESS" color="green" style=bold]
s11 -> s10[label="UA_PW_OK / UA_SUCCESS"]
subgraph cluster_rekey_states {
label=""
style=dotted;
shape=circle;
rauth; //s17
rchan; //s22
rpty; //s27
}
}
rauth -> rchan;
rchan -> rpty;
rchan -> rauth;
rauth -> rauth [color=blue]
rpty -> rpty [color=blue]
rchan -> rchan [color=blue label="SR_*, KEX30 / NO_RESP
UA_* / UNIMPL" fontcolor=blue]
rpty -> rauth;
//rchan -> rpty[label="SR_AUTH/NO_RESP" color=blue lhead=cluster_rekey_states ltail=cluster_rekey_states];
//s10 -> rauth[label="REKEY SEQUENCE" style="dashed" ltail=cluster_conn_states lhead=cluster_rekey_states]
s10 -> rauth[label="REKEY SEQUENCE" style="dashed"] //lhead=cluster_rekey_states]
s18 -> rchan[style="dashed"]//[label="REKEY SEQUENCE" style="dashed"] //lhead=cluster_rekey_states]
s24 -> rpty[style="dashed"]//[label="REKEY SEQUENCE" style="dashed"] //lhead=cluster_rekey_states]
s18 -> s2 [label="DISCONNECT, NEWKEYS
/ NO_CONN" ltail=cluster_conn]
//s4 -> s9 [style=invisible]
//s1 -> s5 [style=invisible]
//s25 [label="authed_kexed"];
//s25 -> s17[label="NEWKEYS / NO_RESP"]
//s27 [label="has_pty_rekeyed"];
//s27 -> s17[label="CH_CLOSE / CH_CLOSE"]
//s27 -> s29[label="KEXINIT / KEXINIT"]
//s27 -> s27[label="REKEY SEQUENCE" style="dashed"]
//s27 -> s27[label="SR_AUTH / NO_RESP
// REKEY SEQUENCE"]
//s24 -> s2[label="{DISCONNECT, NEWKEYS}/NO_CONN
// CH_REQUEST_PTY/DISCONNECT"]
//s50 [label="rekey_conn_states"]
//s50 -> s2[label="{DISCONNECT, NEWKEYS}/NO_CONN"]
//s50 -> s50[label="SR_AUTH / NO_RESP"]
//s28 [label="has_channel_kexed"];
//s28 -> s22[label="NEWKEYS / NO_RESP"]
//s29 [label="has_commands_pty_prekex"];
//s29 -> s30[label="KEX30 / KEX31+NEWKEYS"]
//s30 [label="has_commands_pty_kexed"];
//s30 -> s27[label="NEWKEYS / NO_RESP"]
}
......@@ -9,12 +9,20 @@ In our experimental setup, the {\dlearner} and {\dmapper} were running in a Linu
learned over a localhost connection, whereas BitVise was learned over a virtual connection with the Windows host machine.
Certain arrangements had to be made including the setting of timing parameters to fit each implementation.
\begin{figure*}
\centering
\includegraphics[scale=0.32]{ssh-server}
\caption{OpenSSH server. States are distributed into 3 clusters, one for each layer, plus a state for when connection was lost.
We eliminate redundant states and information induced by the {\dmapper}, as well as states present in successful rekey sequences. Wherever rekey was permitted, we replaced the rekey states and transitions by a single \textsl{REKEY SEQUENCE} transition. We also factor out edges common to states within a cluster. We replace common disconnecting edges, by one edge from the cluster to the disconnect state. Common self loop edges are colored, and the actual i/o information only appears on one edge. Transitions with similar start and end states are joined together on the same edge. Transition labels are kept short by regular expressions(UA\_* stands for all inputs starting with UA\_) or by factoring out common start strings. Green edges highlight the happy flow. }
\label{fig:sshserver}
\end{figure*}
OpenSSH was learned using a full alphabet, whereas DropBear and BitVise were learned using a reduced alphabet. Both versions of
the alphabets are described in Subsection~\ref{subsec:alphabet}. The primary reason for using a reduced alphabet was to reduce learning times.
Most inputs excluded were inputs that either didn't change behavior (like \textsc{debug} or \textsc{unimpl}), or that proved costly time-wise,
and were not critical to penetrating all layers. A concrete example is the user/password based authentication inputs (\textsc{ua\_pw\_ok} and
\textsc{ua\_pw\_nok}). It would take the system 2-3 seconds to respond to an invalid password, perhaps in an attempt to thwart brute force attacks.
By contrast, public key authentication resulted in quick responses. The \textsc{disconnect} input presented similar
\textsc{ua\_pw\_nok}). It would take the system 2-3 seconds to respond to an invalid password, perhaps in an attempt by the designers to thwart
brute force attacks. By contrast, public key authentication resulted in quick responses. The \textsc{disconnect} input presented similar
challenges, particularly for BitVise.
%Not only that, but failed password authentication
......@@ -32,6 +40,7 @@ with randomly generated middle sections. No formal confidence is provided, but p
can be set to higher values. We executed a random test suite with {\dk} of 4 comprising 40000 tests for OpenSSH, and 20000 tests for BitVise and DropBear.
We then ran an exhaustive test suite with {\dk} of 2 for for all implementations.
Table~\ref{tab:experiments} describes the exact versions of the systems analyzed together with statistics on learning and testing, namely:
(1) the number of states in the learned model, (2) the number of hypotheses built during the learning process and (3) the total number of learning and test queries run. For test queries, we only consider those run on the last hypothesis.
......@@ -57,9 +66,14 @@ DropBear v2014.65 & 17 & 2 & 19863
The large number of states is down to several reasons. First of all, some systems exhibited buffering behavior. In particular, BitVise would queue
responses for higher layer inputs sent during key re-exchange, and would deliver them all at once, after the exchange was done. Re-exchanging keys (rekey-ing) was also
a major contributor to the number of states. In states permitting rekey, following the sequence of transitions comprising the rekey should lead back to the starting state. This
leads to 2 additional rekey states for every state permitting rekey. A considerable number of states were also added due to {\dmapper} generated outputs such as \textsl{ch\_none} or \textsl{ch\_max}, outputs which signal that no channel is open or that the maximum number of channels have been opened.
leads to 2 additional rekey states for every state permitting rekey. A considerable number of states were also added due to {\dmapper} generated outputs such as \textsc{ch\_none} or \textsc{ch\_max}, outputs which signal that no channel is open or that the maximum number of channels have been opened.
Figure~{fig:sshserver} shows the model learned for OpenSSH, with various changes applied to improve readability. The happy flow, contoured in green, is fully explored in the model and mostly matches our earlier description of it\footnote{The only exception is in the transport layer, where unlike in our happy flow definition, the server is the first to send the \textsc{newkeys} message. This is also accepted behavior, as the protocol does not specify which side should send the \textsc{newkeys} first.}. Also explored is what happens when a rekey sequence is attempted. We notice that rekey is only allowed in states of the Connection layer. Strangely, for these states, rekey is not state preserving, as the generated output on receiving a \textsc{sr\_auth}, \textsc{sr\_conn} or \textsc{kex30} changes from \textsc{unimpl} to \textsc{no\_resp}. This leads to two sub-clusters of states, one before the first rekey, the other afterward. In all other states, the first step of a rekey (\textsc{kexinit}) yields (\textsc{unimpl}), while the last step (\textsc{newkeys}) causes the system to disconnect. We also notice the intricate authentication behavior, it seems that password authentication is the only form that allows you to authenticate after an unsuccessful attempt.
%Figure~{fig:sshserver} shows
%To give a concrete example, the {\dmapper} on every \textsl{ch\_open} saves a channel identifier and sends
%a corresponding message to the {\dsut}. If \textsl{ch\_open} is called again, the {\dmapper} responds with a \textsl{ch\_max}. The channel identifier is removed
......
......@@ -11,7 +11,7 @@ The learning setup consists of three components: a \emph{learner}, \emph{mapper}
The learner uses LearnLib ~\cite{LearnLib2009}, a Java library implementing $L^{\ast}$ based algorithms for learning Mealy machines. The {\dmapper} is based on Paramiko, an open source SSH implementation written in Python\footnote{Paramiko is available at \url{http://www.paramiko.org/}}. We opted for Paramiko because its code is relatively well structured and documented. The {\dsut} can be any existing implementation of a SSH server. The {\dlearner} communicates with the {\dmapper} over sockets. A graphical representation of our setup is shown in Figure~\ref{fig:components}.
\begin{figure}
\centering
\includegraphics[scale=0.40]{components.png}
\includegraphics[scale=0.29]{components.pdf}
\caption{The SSH learning setup.}
\label{fig:components}
\end{figure}
......@@ -19,7 +19,7 @@ The learner uses LearnLib ~\cite{LearnLib2009}, a Java library implementing $L^{
SSH is a complex client-server type protocol. It would be exceedingly difficult to learn all its facets, thus we narrow down the learning goal to learning SSH
server implementations. We further restrict learning to only exploring the terminal service of the connection layer, as we consider it to be the most interesting
from a security perspective. Algorithms for encryption, compression and hashing are set to default settings and are not purposefully explored. Also, the starting
state of the {\dsut} is one where a TCP connection has been established and where SSH versions have been exchanged, a required first step of the Transport Layer protocol.
state of the {\dsut} is one where a TCP connection has already been established and where SSH versions have been exchanged, a required first step of the Transport layer protocol.
%figure
%It is therefore important to focus on messages for which interesting state-changing behaviour can be expected.
......@@ -29,13 +29,17 @@ Learning doesn't scale with a growing alphabet, hence it is important to reduce
only selecting inputs that are consistent with our learning goal.
Since we are learning only SSH server implementations, we filter out messages that were not intended to be sent to the server. \footnote{Applying this principle to
the RFC's messages results in not including \textsc{service\_accept}, \textsc{ua\_accept}, \textsc{ua\_failure}, \textsc{ua\_banner}, \textsc{ua\_pk\_ok}, \textsc{ua\_pw\_changereq}, \textsc{ch\_success} and \textsc{ch\_failure} in our alphabet.} Furthermore, from the Connection layer we only select general inputs plus those relating to the terminal functionality.
We reduce the alphabet further by only selecting inputs which follow the binary packet protocol, hence we don't include the identification input which should
be sent by both client and server at the start of every connection. The exchange of these inputs is made implicit. Finally, from the inputs defined, we make a selection
of essential inputs. These comprise the restricted alphabet, which we will use in some experiments. Essential inputs are marked with '*'.
the RFC's messages results in not including \textsc{service\_accept}, \textsc{ua\_accept}, \textsc{ua\_failure}, \textsc{ua\_banner}, \textsc{ua\_pk\_ok}, \textsc{ua\_pw\_changereq}, \textsc{ch\_success} and \textsc{ch\_failure} in our alphabet.} Furthermore, from the Connection layer we only select general inputs for channel management plus those relating to the terminal functionality.
As mentioned previously, the starting state is already one where SSH versions have been exchanged, thus we also exclude these messages from our alphabet as they are no longer useful
in exploring SSH's behavior once the initial exchange took place. From the inputs defined, we make a selection of essential inputs. The selection forms the \textit{restricted alphabet}, which we use in some experiments. The restricted alphabet significantly decreases the number of queries needed to learn while only marginally limiting explored behavior. We touch on this again in the Section~\ref{sec:result}. Inputs included in the restricted alphabet are marked by '*'.
%messages which don't adhere
%to the binary packet p
%We reduce the alphabet further by only selecting inputs which follow the binary packet protocol, hence we don't include the identification input which should
%be sent by both client and server at the start of every connection. The exchange of these inputs is made implicit.
Table~\ref{trans-alphabet} introduces the Transport layer inputs. We include two versions of the \textsc{kexinit} message, one where \texttt{first\_kex\_packet\_follows} is disabled, the other when it is enabled, in which case, the message would make a guess on the security parameters~\cite[p. 17]{rfc4253}. Our mapper can only handle correct key guesses, so the wrong-guess procedure as described in ~\cite[p. 19]{rfc4253} was not supported.
Table~\ref{trans-alphabet} introduces the Transport layer inputs. We include two versions of the \textsc{kexinit} message, one where \texttt{first\_kex\_packet\_follows} is disabled, the other when it is enabled, in which case, the message would make a guess on the security parameters~\cite[p. 17]{rfc4253}. Our mapper can only handle correct key guesses, so the wrong-guess procedure as described in ~\cite[p. 19]{rfc4253} was not supported. \textsc{ignore}, \textsc{unimpl} and \textsc{debug} are not expected to alter behavior so excluded from our restricted alphabet. \textsc{discon} proved costly time wise, so was also excluded.
%When needed, SUTs were configured to make this guess work by altering their cipher preferences. The SSH version and comment string (described in Section~\ref{ssh-run-trans}) was not queried because it does not follow the binary packet protocol.
\begin{table}[!ht]
......@@ -48,17 +52,17 @@ Table~\ref{trans-alphabet} introduces the Transport layer inputs. We include two
\textsc{unimpl} & Intended response to an unimplemented message~\cite[p. 25]{rfc4253} \\
\textsc{debug} & Provides other party with debug information~\cite[p. 25]{rfc4253} \\
\textsc{kexinit}* & Sends parameter preferences~\cite[p. 17]{rfc4253} \\
\textsc{guessinit} & A \textsc{kexinit} after which a guessed \textsc{kex30} follows~\cite[p. 19]{rfc4253} \\
\textsc{guessinit}* & A \textsc{kexinit} after which a guessed \textsc{kex30} follows~\cite[p. 19]{rfc4253} \\
\textsc{kex30}* & Initializes the Diffie-Hellman key exchange~\cite[p. 21]{rfc4253} \\
\textsc{newkeys}* & Requests to take new keys into use~\cite[p. 21]{rfc4253} \\
\textsc{sr\_auth}* & Requests the authentication protocol~\cite[p. 23]{rfc4253} \\
\textsc{sr\_conn}* & Requests the connection protocol~\cite[p. 23]{rfc4253}
\end{tabular}
\caption{Transport Layer inputs}
\caption{Transport layer inputs}
\label{trans-alphabet}
\end{table}
The Authentication Layer defines one single client message type in the form of the authentication request~\cite[p. 4]{rfc4252}. Its parameters contain all information needed for authentication. Four authentication methods exist: none, password, public key and host-based. Our mapper supports all methods except the host-based authentication because various SUTs lack support for this feature. Both the public key and password methods have an \textsc{ok} and a \textsc{nok} variant which provides respectively correct and incorrect credentials.
The Authentication layer defines one single client message type in the form of the authentication request~\cite[p. 4]{rfc4252}. Its parameters contain all information needed for authentication. Four authentication methods exist: none, password, public key and host-based. Our mapper supports all methods except the host-based authentication because various SUTs lack support for this feature. Both the public key and password methods have an \textsc{ok} and a \textsc{nok} variant which provides respectively correct and incorrect credentials. Our restricted alphabet supports only public key authentication, as it was processed the fastest by implementations out of all authentication forms.
\begin{table}[!ht]
\centering
......@@ -71,12 +75,13 @@ The Authentication Layer defines one single client message type in the form of t
\textsc{ua\_pw\_ok} & Provides a valid name/password combination~\cite[p. 10]{rfc4252} \\
\textsc{ua\_pw\_nok} & Provides an invalid name/password combination~\cite[p. 10]{rfc4252} \\
\end{tabular}
\caption{Authentication Layer inputs}
\caption{Authentication layer inputs}
\label{auth-alphabet}
\end{table}
The Connection Layer allows the client manage channels and to request/run services over them. In accordance with our learning goal,
The Connection layer allows the client manage channels and to request/run services over them. In accordance with our learning goal,
our mapper only supports inputs for requesting terminal emulation, plus inputs for channel management as shown in Table~\ref{conn-alphabet}.
The restricted alphabet only supports the most general channel management inputs. Those excluded are not expected to produce state change.
\begin{table}[!ht]
......@@ -92,7 +97,7 @@ our mapper only supports inputs for requesting terminal emulation, plus inputs f
\textsc{ch\_window\_adjust} & Adjusts the window size~\cite[p. 7]{rfc4254} \\
\textsc{ch\_request\_pty}* & Requests terminal emulation~\cite[p. 11]{rfc4254} \\
\end{tabular}
\caption{Connection Layer inputs}
\caption{Connection layer inputs}
\label{conn-alphabet}
\end{table}
%The learning alphabet comprises of input/output messages by which the {\dlearner} interfaces with the {\dmapper}. Section~\ref{sec:ssh} outlines essential inputs, while Table X provides a summary
......
......@@ -10,7 +10,7 @@
\usepackage{ upgreek }
\usepackage{graphicx}
\usepackage{tikz}
\usepackage{subcaption}
%\usepackage{subcaption}
%\usepackage{subcaption}
\usetikzlibrary{automata, arrows,shapes.multipart}
\lstset{
......
This diff is collapsed.
......@@ -167,6 +167,7 @@ machine learning algorithms},
author = {Futoransky, Ariel and Kargieman, Emiliano},
citeulike-article-id = {13837770},
edition = {oct. 1998},
year={1998},
howpublished = {Online. \url{http://www.coresecurity.com/files/attachments/CRC32.pdf}},
posted-at = {2015-11-13 16:33:04},
priority = {2},
......
\section{The Secure Shell Protocol} \label{sec:ssh}
The Secure Shell Protocol (or SSH) is a protocol used for secure remote login and other secure network services over an insecure network. The initial version of the protocol was superseded by a second version (SSHv2), as the former was found to contain design flaws~\cite{FutoranskyAttack} which could not be fixed without losing backwards compatibility. This work focuses on SSHv2.
The Secure Shell Protocol (or SSH) is a protocol used for secure remote login and other secure network services over an insecure network. It is an application layer protocol running on top of TCP, which provides reliable data transfer, but does not provide any form of connection security. The initial version of SSH was superseded by a second version (SSHv2), as the former was found to contain design flaws~\cite{FutoranskyAttack} which could not be fixed without losing backwards compatibility. This work focuses on SSHv2.
SSHv2 follows a client-server paradigm consisting of three components (Figure~\ref{fig:sshcomponents}):
\begin{itemize}
\item The \textit{transport layer protocol} (RFC 4253~\cite{rfc4253}) forms the basis for any communication between a client and a server. It provides confidentiality, integrity and server authentication as well as optional compression.
\item The \textit{transport layer protocol} (RFC 4253~\cite{rfc4253}) forms the basis for any communication between a client and a server. It provides confidentiality, integrity and server authentication as well as optional compression.
\item The \textit{user authentication protocol} (RFC 4252~\cite{rfc4252}) is used to authenticate the client to the server.
\item The \textit{connection protocol} (RFC 4254~\cite{rfc4254}) allows the encrypted channel to be multiplexed in different channels. These channels enable a user to run multiple processes, such as terminal emulation or file transfer, over a single SSH connection.
\end{itemize}
Each layer has its own specific messages. The SSH protocol is interesting in that outer layers do not encapsulate inner layers. This means that different layers can interact. Hence, it makes sense to learn SSH as a whole, instead of analyzing its constituent layers independently. We review each layer, outlining the relevant messages which are later used in learning. We refer to as \textit{happy flow},
the common case of a successful operation. The SSH protocol happy flow at a high level consists of the following steps: (1) the client first establishes a TCP connection with the server, (2) the two sides negotiate encryption algorithms, and subsequently establish one-time session keys via a negotiated key exchange method, further communication is secured using these keys, (3) the client authenticates to the server by providing valid credentials and finally, (4) the client accesses services on the server like for example the terminal service. We omit the first step in our detailed description. The following steps are an integral part of each of SSH's layers.
the common case of a successful operation.
The SSH protocol happy flow at a high level consists of the following steps: (1) the client first establishes a TCP connection with the server, (2) the two sides negotiate encryption algorithms, and subsequently establish one-time session keys via a negotiated key exchange method, further communication is secured using these keys, (3) the client authenticates to the server by providing valid credentials and finally, (4) the client accesses services on the server like for example the terminal service. Leaving out the first step, each step that follows is an integral part of each SSH layers.
%perhaps room for figure
......@@ -24,34 +26,33 @@ the common case of a successful operation. The SSH protocol happy flow at a high
\end{figure}
\subsection{Transport layer}\label{ssh-run-trans}
SSH provides end-to-end encryption based on pseudo-random keys which are securely negotiated as part of a \textsl{key exchange} method.
Key exchange begins by the two sides exchanging their preferences for negotiable parameters relating to the actual key exchange algorithm used, as well as encryption, compression and hashing algorithms preferred and supported by either side. Preferences are sent via the \textsc{kexinit} message. Subsequently, key exchange using the negotiated algorithm takes place. Following this algorithm, one-time session keys for encryption and hashing are generated by each side, together with an identifier for the session. Diffie-Hellman is the main key exchange algorithm, and the only one required for support by the RFC. Under the Diffie-Hellman scheme, \textsc{kex30} and \textsc{kex31} are exchanged and new session keys are produced. These keys are used from the moment the \textsc{newkeys} command has been issued by both parties. A subsequent \textsc{sr\_auth} requests the authentication service. The happy flow thus consists of the succession of the three steps comprising key exchange, followed up by a successful authentication service request. The sequence is shown in Figure~\ref{fig:sshcomponents}.
SSH runs over TCP, and provides end-to-end encryption based on pseudo-random keys. Once a TCP connection has been established with the server, these keys are securely negotiated as part of a \textsl{key exchange} method, the first step of the protocol. Key exchange begins by the two sides exchanging their preferences for negotiable parameters relating to the actual key exchange algorithm used, as well as encryption, compression and hashing algorithms preferred and supported by either side. Preferences are sent via the \textsc{kexinit} message. Subsequently, key exchange using the negotiated algorithm takes place. Following this algorithm, one-time session keys for encryption and hashing are generated by each side, together with an identifier for the session. Diffie-Hellman is the main key exchange algorithm, and the only one required for support by the RFC. Under the Diffie-Hellman scheme, \textsc{kex30} and \textsc{kex31} are exchanged and new session keys are produced. These keys are used from the moment the \textsc{newkeys} command has been issued by both parties. A subsequent \textsc{sr\_auth} requests the authentication service. The happy flow thus consists of the succession of the three steps comprising key exchange, followed up by a successful authentication service request. The sequence is shown in Figure~\ref{fig:hf-trans}.
\begin{figure}[!hb]
\includegraphics[scale=0.32]{hf-trans.pdf}
\includegraphics[scale=0.285]{hf-trans.pdf}
\caption{The happy flow for the transport layer.}
\label{fig:sshcomponents}
\label{fig:hf-trans}
\end{figure}
\textsl{Key re-exchange}, or rekey, is a near identical process, with the difference being that instead of taking place at the beginning, it takes place once session keys are already in place. The purpose is to renew session keys so as to foil potential replay attacks. It follows the same steps as key exchange. Messages exchanged as part of it are encrypted using the old set of keys, messages exchanged afterward are encrypted using the new keys.
\textsl{Key re-exchange}~\cite[p. 23]{rfc4253}, or \textsl{rekey}, is a near identical process, with the difference being that instead of taking place at the beginning, it takes place once session keys are already in place. The purpose is to renew session keys so as to foil potential replay attacks~\cite[p. 17]{rfc4251}. It follows the same steps as key exchange. Messages exchanged as part of it are encrypted using the old set of keys, messages exchanged afterward are encrypted using the new keys. A key property of rekey is that it should preserve the state. That is, the state of the protocol before a rekey should be the same as afterward, with the difference that new keys are in use. %Some implementations are known not support rekey in certain states of the protocol.
%We consider an transport layer state machine secure if there is no path from the initial state to the point where the authentication service is invoked without exchanging and employing cryptographic keys.
\subsection{Authentication layer}\label{ssh-run-auth}
Once the authentication service has been summoned, authentication can commence. To this end, RFC 4252~\cite{rfc4252} defines four authentication methods (password, public-key, host-based and none). The authentication request includes a user name, service name and authentication data, which includes both the authentication method as well as the data needed to perform the actual authentication, such the password or public key. The happy flow for this layer is defined as the sequence that results in a successful authentication. The messages \textsc{ua\_pw\_ok} and \textsc{ua\_pk\_ok} achieve this for respectively password or public key authentication.
Once the authentication service has been summoned, authentication can commence. To this end, RFC 4252~\cite{rfc4252} defines four authentication methods (password, public-key, host-based and none). The authentication request includes a user name, service name and authentication data, which includes both the authentication method as well as the data needed to perform the actual authentication, such the password or public key. The happy flow for this layer is defined as the sequence that results in a successful authentication. The messages \textsc{ua\_pw\_ok} and \textsc{ua\_pk\_ok} achieve this for respectively password or public key authentication. Figure~\ref{fig:hf-auth} presents the case involving password authentication.
%We consider a user authentication layer state machine secure if there is no path from the unauthenticated state to the authenticated state without providing correct credentials.
\begin{figure}[!ht]
\includegraphics[scale=0.45]{hf-auth.pdf}
\caption{The happy flow for the user authentication layer.}
\label{fig:sshcomponents}
\label{fig:hf-auth}
\end{figure}
\subsection{Connection layer}\label{ssh-run-conn}
Successful authentication makes available services of the Connection layer. The Connection layer enables the user to open and close channels of different types, with each type providing access to specific services. From the various services available, we focus on the remote terminal over a session channel, a standout feature of SSH. A happy flow would involve opening a session channel, \textsc{ch\_open}, requesting a terminal \textsc{ch\_request\_pty}, sending and managing data via the messages \textsc{ch\_send\_data}, \textsc{ch\_adj\_win}, \textsc{ch\_send\_eof}, and eventually closing the channel via \textsc{ch\_close}.
Successful authentication makes services of the Connection layer available. The Connection layer enables the user to open and close channels of different types, with each type providing access to specific services. From the various services available, we focus on the remote terminal over a session channel, a standout feature of SSH. The happy flow involves opening a session channel, \textsc{ch\_open}, requesting a ``pseudo terminal'' \textsc{ch\_request\_pty}, sending and managing data via the messages \textsc{ch\_send\_data}, \textsc{ch\_adj\_win}, \textsc{ch\_send\_eof}, and eventually closing the channel via \textsc{ch\_close}. Figure~\ref{fig:hf-conn} describes the aforementioned sequence.
%Because the connection protocol offers a wide range of functionalities, we it is hard to define a single happy flow. Requesting a terminal is one of the main features of SSH and has therefore been selected as the happy flow. This behaviour is typically triggered by the trace \textsc{ch\_open}; \textsc{ch\_request\_pty}. Other
......@@ -61,5 +62,5 @@ Successful authentication makes available services of the Connection layer. The
\begin{figure}[!ht]
\includegraphics[scale=0.35]{hf-conn.pdf}
\caption{The happy flow for the connection layer.}
\label{fig:sshcomponents}
\label{fig:hf-conn}
\end{figure}
\ No newline at end of file
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment