Commit 6203adf7 authored by Paul Fiterau Brostean's avatar Paul Fiterau Brostean
Browse files

added Toon's bachelor thesis

parents
\documentclass[10pt,a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{fancyref}
\author{Toon Lenaerts s4321219}
\begin{document}
\section{Conclusion} \label{conclusion}
In this thesis we improved upon Verlegs thesis in three aspects. We expand on our conclusions regarding these three aspects in Sections \ref{conctraining} to \ref{concchecking}. We also present some suggestions for further work in Section \ref{concfurther}.
\subsection{Training Complete Model} \label{conctraining}
We have trained a model of the entire SSH-protocol, regardless of time concerns noted by Verleg. The model contains the desired behavior of SSH. Therefore, we have been able to successfully train all three layers of SSH simultaneously. Our model uses the entire training alphabet for all three layers, rather than a specialized alphabet per layer. Because of this, our model can contain information not present in Verlegs models. Therefore, we have improved upon Verlegs work by learning the three layers of SSH in a single model.
\subsection{Testing} \label{conctesting}
By using a more advanced testing method, we have gained a measure of confidence in our generated model; our model is representative of the SSH implementation, or the real model is at least two states larger. Using a random walk algorithm does not give this measure of confidence. We have therefore improved upon Verlegs work by using this testing algorithm.
\subsection{Model Checking} \label{concchecking}
We have checked two LTL properties with our model. These properties are formalized from security definitions for SSH defined by Verleg. According to NuSMV, both properties hold for our generated model. We have shown an application of using model checking to verify properties of a model generated through protocol state fuzzing. Through model checking we have gained a measure of confidence for the conclusions made about our model. This confidence is not present in models that are purely checked by hand. Therefore, we have improved upon Verlegs work by using model checking for the generated model.
\subsection{Further Work} \label{concfurther}
Possible further work is extending our setup with different implementations of SSH, similar to Verlegs original work. It is also possible to extend the tested properties and test different aspects than security.
Some parameter abstractions are included in our setup, these can be removed, and used in learning. This would require a different approach however, since Mealy machines are not compatible with parameters. For example, various parameters are negotiated in the key exchange protocol, these could be extracted. SSH also uses a sequence number which could be extracted.
Verlegs thesis contains additional suggestions for further work that remain relevant; Extending the alphabet, using different SSH settings or implementing additional SSH behavior. Some of the suggestions Verleg does are related to decisions they made to keep learning times shorter, applying the suggestions would extend learning time. A different direction would be to attempt to shorten learning time, this makes it feasible to run the testing algorithm with higher parameters, resulting in more confidence in learned models.
\end{document}
\ No newline at end of file
\documentclass[10pt,a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{fancyref}
\author{Toon Lenaerts s4321219}
\title{Introduction}
\begin{document}
\section{Introduction}
Fuzz testing or fuzzing is a testing technique where specific kinds of input data are sent to a software system. The behavior of the system is then monitored for anomalies. Different kinds of input data can be sent to the system under testing (SUT) in order to test for different properties. For example, invalid input can be sent to provoke an exception in the SUT. Another example is sending a longer input than expected to provoke buffer overflow errors.
In this work, the fuzzing technique we focus on is protocol state fuzzing. In protocol state fuzzing a state machine is inferred from a protocol implementation. This state machine is then inspected for unwanted behavior. Protocol state fuzzing is done as follows: First, we choose an implementation of a certain protocol. Then, we send messages compliant with the protocol to the implementation. All of the received output is recorded. Based on the order of inputs and received outputs we generate a state machine which is representative of the SUT. Finally, we can verify whether the machine is complaint with the protocol the SUT is based on.
Recently, Verleg has performed protocol state fuzzing on implementations of the Secure Shell protocol. Verleg learns three models for each of the tested implementations; one for each of the layers defined in SSH. These models are then evaluated by hand to find security related issues. For this evaluation, Verleg has defined several security properties from the RFCs of SSH. With this approach, Verleg has found a bug in the OpenSSH implementation and security related issues in several other implementations.
The work of Verleg, however, has not gone without criticism. There are three aspects where the work can be improved upon. First, it is possible to learn a model for the entirety of the SSH protocol, which could show additional security issues. Secondly, Verleg uses a method of testing that can be improved upon. Finally, checking the generated models with security properties can be done automatically, by using a model checker. In this thesis, we repeat Verlegs experiment, and improve upon all three of the mentioned aspects.
Verleg split his models into layers due to time considerations. Training the entire protocol causes an increase in state space, which increases the learning time. Verleg found learning the entire protocol infeasible because of this. We do not believe so, and have implemented a method of caching to ease learning the larger model.
Verleg uses a relatively simple algorithm to test generated models. This algorithm simply runs random tests on the model, and assumes the model is correct after some time. This does not provide a measure of confidence in the generated models. However, more advanced testing methods exist that do give confidence in tested models. One such method is developed by Moerman, it is based on the adaptive distinguishing sequence algorithm by Yannakakis and Lee. We replace Verlegs testing algorithm with this algorithm.
Verleg checks his generated models by hand. As shown in ??, models obtained from fuzzing can be checked with a model checker. Testing properties with a model checker is often more thorough than checking a model by hand, especially for larger models. Model checking thus gives a higher confidence in generated conclusions. We formalize the properties Verleg used, and verify our learned model with these formal properties.
In this thesis we first provide an overview of protocol state fuzzing and the Secure Shell algorithm in Section \ref{prelim}. A brief overview of related work is given in section \ref{sota}. Then, in Section \ref{method} we explain in detail what we altered from Verlegs work. In Section \ref{result} we give the results of our improved learning setup. We further expand on some of the decisions made for this thesis in Section \ref{discus}. Finally, we present our conclusions in Section \ref{conclusion}.
\end{document}
\ No newline at end of file
\documentclass[10pt,a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{listings}
\author{Toon Lenaerts s4321219}
\title{Method}
\begin{document}
Note: I assume that the reader has read Patrick's thesis, and is therefore familiar with the basic principles of protocol state fuzzing and its terminology. Altough I am not sure if I should explain NuSMV / LTL in more detail.
\section{Method}
Since we change Verlegs work in three ways, we will expand on the three categories of changes separately. In Section 3.1, we explain our changes to Verlegs program to enable mapping of all three SSH-Layers. Then, in Secion 3.2, we provide a brief explanation of the testing algorithm we used. Finally, in Section 3.3, we explain how we used model checking to verify security properties Verleg defined for SSH.
\subsection{Building a complete model}
As said previously, Verleg split building the model of SSH into three layers as defined in the SSH-specification. Part of the reason for this is practical, it takes a very long time to infer the entire model. Combining this with problems of non-determinism Verleg experienced, and lack of handling this non-determinism, meant that inferring the full model would often fail partway through. The other reason Verleg mentioned is that the setup was unable to detect previously run layers interacting when messages are sent, we took this for granted, and merely expected this to cause an increase in state-space.
With the most prominent problems in inferring the entire SSH-protocol being time and the process failing due to non-determinism, we investigated the methods of caching Verleg used in their mapper. The results of queries were saved during the process, however, it was mainly used to detect non-determinism by comparing the output of partial new queries with the output of old queries. A way to resume an old run of the program through the cache was implemented, but contained errors. We implemented using the cache to resume an earlier run of the program. We did not expand further on handling non-determinism and caching, we explain why in more detail in Section 5: Discussion. In order to reduce the amount of non-determinism, we had to fine-tune time-out parameters in the mapper, this was done mostly on a trial-and-error basis. The result was that running a relatively long query would take a considerable time, and thus that the entire fuzzing-process would take be time-consuming.
Since we want to improve the process of state fuzzing used by Verleg, rather than add upon the size of his research, we only map a single SSH implementation. The implementation used is OpenSSH, version 6.9p1-2, an implementation of SSH that Verleg also used.
\subsection{Improving Testing}
We attempt to improve the testing used in Verlegs process by using a different equivalence oracle. Verleg used a random-walk oracle, this oracle randomly chooses a path through the model, sends the required inputs to the SUT, and checks if the output corresponds to the expected output of the chosen path. This is repeated for some time, until the model is assumed to be correct. We instead use an implementation of an algorithm specified by (?) based on an algorithm by Lee \& Yannakakis to find adaptive distinguishing sequences.
The algorithm uses distinguishing sequences for every state of the hypothesis, a distinguishing sequence is a set of in- and outputs that provides a unique result for a certain state. Practically, it is possible to check if an unknown state $x$ is actually state $a$ by entering the distinguishing state starting from $x$, if the resulting output is equal to the predicted output, then we started from state $a$. The algorithm, when run exhausively on a model, will check all paths of an incrementing length between states. If the algorithm does not find a counterexample up to a specific path-length, we can say that the tested model is correct, unless the actual model is larger than the checked hypothesis by an amount of states larger than the tested path-length. Typically, the algorithm can be run either exhaustively or randomly, running it randomly is usually faster for finding counter-examples, but does not provide the confidence that using exhaustive testing does.
\subsection{Model Checking}
We use the NuSMV model checker to verify the properties defined by Patrick. NuSMV is a model checker where a state of a model is specified as a set of finite variables, and a transition-function can then be defined to change these variables. Specifications in temporal logic, such as CTL and LTL, can be checked for truth on specified models. NuSMV will provide a counterexample if a certain specification is not true.
The NuSMV model is generated from the result of the fuzzing process, these results are in .dot-format. Note that, as a side-effect of using a simple way to generate the NuSMV-code, we do not use arrays or any form of collection for transitions with multiple packets. Rather, we create a separate word with the concatenation of packets. This can be accounted for in an LTL-specification by simply replacing a packet with the conjunction of all words that contain that packet.
The properties we will verify are LTL-representations of the security properties Verleg defined for each layer of SSH. Verleg has defined three properties. The property of the connection layer, however, is very open ended. It is: ``We will ... look at unexpected state machine transitions that can point towards potential implementation flaws'', we can not properly represent this in LTL. The properties for the transport- and user authentication layers, however, can be represented in LTL.
\subsubsection{Transport Layer}
The security property for the Transport layer is defined as follows: ``We consider a transport layer state machine secure if there is no path from the initial state to the point where the authentication service is invoked without exchanging and employing cryptographic keys.'' We consider this path to be a path where a key exchange is performed successfully and followed by a successful service authentication request. Note that the key exchange does not have to be performed in consecutive steps, it can be temporarily be interrupted by other actions.
In natural languange, we can say the property should be something like: ``If the authentication service is invoked successfully, then the final step of the key exchange must have been done, the second step must have been done earlier, and the first step even earlier.'' The key exchange consists of three specific steps which have to be performed in order. The LTL as in figure 1 is the result. The O-operator used here is the Once-operator, a past time LTL operator, which is true if its argument is true in a past time instant.
\begin{figure}[h]
\begin{align*}
&G (\text{authentication request successful} \rightarrow \\
&O (\text{NEWKEYS successful } \& \\
&O( \text{KEX\_30 successful } \& \\
&O(\text{ KEXINIT successful}) )))
\end{align*}
\caption{LTL for Transport layer with natural language}
\end{figure}
Translating this LTL to NuSMV syntax means that the remaining natural language has to be replaced with equality checks for the variables in the NuSMV state. These variables are equal to the in- and output of a transition in the Mealy machine generated by the mapper. A successful invocation of the authentication service is translated as a request for the authentication service being done, and the request being accepted by the server. The third step of the key exchange, the client sending a newkeys-packet, is successful if the server sends no response. Conveniently, no server response is an indication that the newkeys was received correctly, in our tests, the server would return a packet if the newkeys was performed incorrectly. The second step, sending the kex30 packet, is very much like the third step. Finally, the first step of the protocol is translated a bit differently, since we found that sending any packet to the server from the initial state was recognized as the first step of the key exchange protocol. Therefore, sending the first step, a kexinit-packet, can be replaced by almost any packet.
\begin{figure}[h]
\begin{lstlisting}[basicstyle=\footnotesize]
G (
(input=SERVICE_REQUEST_AUTH & output=SERVICE_ACCEPT)
->
O ( (input=NEWKEYS & output= NO_RESP) &
O ( (input=KEX30 & output=KEX31_NEWKEYS) &
O (output=KEXINIT)
)
)
)
\end{lstlisting}
\caption{Transport layer LTL in NuSMV syntax.}
\end{figure}
\subsubsection{User Authentication Layer}
The security property is as follows: ``We consider a user authentication layer state machine secure if there is no path from the unauthenticated state to the authenticated state without providing the correct credentials.'' Since we have no method to directly determine if the model is in an authenticated state, we consider the authenticated state to be a state where opening a channel can be done successfully. Since opening a channel should only be possible after user authentication, this should show if we are in an authenticated state. The specification then is rather simple: Always, if a channel is opened succesfully, then there has not been a failure in user authentication since a successful user authentication. Opening a channel successfully in the model is straightforward, a request to open a channel is sent to the server, and a success message is received back. Checking whether there has been a success or failure in user authentication is done by checking if the server ever sends the UA\_FAILURE or UA\_SUCCESS packets. Additional checks if the user actually made a user authenication request could potentially be added. The resulting LTL is shown in figure 3. This LTL uses another past time operator: The S- (or Since-) operator is true if its first argument holds for every state since its second argument holds, note that the second argument has to hold somewhere in the path for the since-operator to hold.
\begin{figure}[h]
\begin{lstlisting}[basicstyle=\footnotesize]
G ( (input= CH_OPEN & output= CH_OPEN_SUCCESS)
-> (!(output=UA_FAILURE) S output= UA_SUCCESS) )
\end{lstlisting}
\caption{User Authentication layer LTL in NuSMV syntax.}
\end{figure}
\section{Results}
\subsection{Trained Model}
The result of mapping OpenSSH is a model with 26 states and 619 transitions. A version, edited for readability has been included in the appendices. (Note: I'm still working on this readable version, apologies.)
\subsection{Testing}
We used the Yannakakis-tool on the generated hypotheses. For the final hypothesis, we completed an exhaustive test up to a path length of two. During the entire mapping process, only two hypotheses were made, the counter example to the first hypothesis was found after 865 tests. Verifying the second hypothesis took 32728 tests.
\subsection{Model Checking}
Both LTL specifications are true for the generated model, according to NuSMV.
\end{document}
\ No newline at end of file
\documentclass[10pt,a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{fancyref}
\usepackage{graphicx}
\usepackage{tikz}
\usetikzlibrary{automata, arrows,shapes.multipart}
\author{Toon Lenaerts s4321219}
\begin{document}
\section{Preliminaries}
In this section, we provide some background information for the rest of the thesis. First, we explain Mealy machines in section \ref{mealy machines}. In section \ref{learning}, we expand on typical learning setups and the setup Verleg used. Finally, in section \ref{secure shell} we provide a brief overview of the SSH-protocol.
\subsection{Mealy Machines} \label{mealy machines}
We use a Mealy machine to represent the state machine of an SSH implementation. A Mealy machine is a DFA represented by a 6-tuple $\mathcal{M} = (Q, q_0, I, O, \delta, \lambda)$, where:
\begin{itemize}
\item $Q$ is the set of states.
\item $q_0 \in Q$ is the initial state.
\item $I$ is the input alphabet.
\item $O$ is the output alphabet.
\item $\delta$ is the transition function $Q \times I \rightarrow Q$, which gives the resulting state from sending a certain input when in a certain state.
\item $\lambda$ is the transition function $Q \times I \rightarrow O$, which gives the received output when sending a certain input from a certain state.
\end{itemize}
Mealy machines can be graphically represented as a graph. States are shown as nodes and state transitions as edges. Edges are labeled as ``$<input>/<output>$'' corresponding to the transition functions: If $\delta(q_0, A) = q_1$ and $\lambda(q_0, A) = X$ then the label for the edge between $q_0$ and $q_1$ will be ``$A/X$''. An example visualization for a Mealy machine is shown in figure \ref{mealy example}.
\begin{figure}[h]
\centering
\begin{tikzpicture}[>=stealth',shorten >=1pt,auto,node distance=2.8cm]
\node[initial,state] (q0) {$q_0$};
\node[state] (q1) [right of=q0] {$q_1$};
\node[state] (q2) [right of=q1] {$q_2$};
\path[->] (q0) edge node [below] {X/A} (q1);
\path[->] (q1) edge node [below] {Y/B} (q2);
\path[->] (q0) edge [bend left] node {Y/A} (q2);
\path[->] (q2) edge [loop right] node [text centered, text width = 1cm] {X/A \\ Y/B} (q2);
\path[->] (q1) edge [loop below] node {X/B} (q1);
\end{tikzpicture}
\caption{Example Mealy machine}
\label{mealy example}
\end{figure}
\subsection{Learning state machines} \label{learning}
In protocol state fuzzing, a state machine of the implementation is learned by actively sending and receiving messages to and from the System Under Learning (SUL). A single message is called a \textit{query}, and a sequence of messages a \textit{trace}. The state machine is formed by sending consecutive traces to the SUL and analyzing the output. The trace to be sent is, in our case, determined by the L*-algorithm. When a trace is completed the SUL has to be reset to its initial state.
After sending a sufficient amount of traces, a hypothesis is formed. The hypothesis is checked for accuracy by a testing oracle. Said oracle sends traces to the SUL, predicts an output through the hypothesis, and compares this output to the actual output received by the SUL. Since the SUL can potentially be infinitely large, it is impossible to guarantee that the oracle will find any existing counterexample in finite time. However, measures can be taken such that the testing oracle provides a certain degree of confidence in the model.
In our case, sending messages to the SUL means sending packets that contain, for example, the length of the packet, a sequence number or a list of supported encryptions. This information is unnecessary for the learning process, and is not supported by the Learnlib implementation of L* that we use. We abstract all this information away, leaving us with an abstract representation of messages. To convert these abstract messages to correct packets and back, a mapper is used. This mapper has to keep track of state variables, and has to be able to perform actions such as encryption and compression. This means that the mapper itself contains a state machine, which is based on existing knowledge about the protocol used in the SUL.
\begin{figure}[h]
\centering
\includegraphics[scale=0.8]{Naamloos.png}
\caption{A learning setup}
\label{learning setup}
\end{figure}
The setup shown in figure \ref{learning setup} is the setup Verleg used, and serves as an example for a typical learning setup, here ``Learner'' is the program that uses the L*-algorithm to generate traces to send to the SUL, these traces are in the form of abstract messages. The learner sends these messages to the mapper, which translates them to concrete packages which are sent to the SUL. A response of the SUL is then converted by the mapper to an abstract message and sent to the learner.
\subsection{Secure Shell} \label{secure shell}
The SSH-protocol uses a client-server structure consisting of three components. These components will be referred to as layers, however note that outer layers do not wrap inner layers, instead messages of different layers are distinguished through their message number. The three components are as follows:
\begin{itemize}
\item The transport layer protocol. This creates the basis for communication between server and client, providing a key exchange protocol and server authentication. The key exchange protocol is performed through three roundtrips. During the first, both client and server send a KEXINIT message. Then, the client sends a KEX30 message, the server responds with a KEX31 message. Finally, both parties send a NEWKEYS message, which indicates that the keys sent in the second step can be used.
\item The user authentication protocol. This component is used to authenticate a client to the server, for example, through a username and password combination, or through SSH-keys.
\item The connection protocol. This is used to provide different services to the connected client, it can thus multiplex the encrypted channel into different channels. The provided services can be services like file transfer or a remote terminal. Typical messages are requests for opening or closing channels, or requests for earlier named services.
\end{itemize}
\end{document}
\ No newline at end of file
\documentclass[10pt,a4paper]{article}
\usepackage[latin1]{inputenc}
\usepackage[english]{babel}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{fancyref}
\usepackage{listings}
\author{Toon Lenaerts s4321219}
\begin{document}
\section{Results}
In this section, we present our results for the three improvements we made to Verleg's work. In Section \ref{results model} we show the state machine we inferred from the OpenSSH implementation. Then, in Section \ref{results testing} we give some statistics about the used test algorithm, and explain some properties we found in the generated model. Finally, in Section \ref{results modelchecking} we show the LTL properties we checked with the model.
\subsection{Trained Model} \label{results model}
The result of mapping OpenSSH is a model with 26 states and 619 transitions. An edited version can be seen in Figure \ref{resultmachine}. The happy path, as defined by Verleg, is indicated in green. A more complete version, also edited for readability, has been included in the appendices. The more complete version contains nine states that can be reached by attempting to open a channel before performing user authentication. These states make for a less readable model, however, they do not contain any usable behavior of the SSH protocol; Although they can perform a successful authentication, no channels can be opened, and all possible paths from these states eventually lead to a disconnect-state. Therefore we have removed them from Figure \ref{resultmachine}.
\begin{figure}[h]
\centering
\includegraphics[scale=0.2]{100perc.png}
\caption{Inferred state machine, edited for readability}
\label{resultmachine}
\end{figure}
\subsection{Testing} \label{results testing}
We used the Yannakakis-tool on the generated hypotheses. For the final hypothesis, we completed an exhaustive test up to a path length of two. During the entire mapping process, only two hypotheses were made, the counter example to the first hypothesis was found after 865 tests. Verifying the second hypothesis took 32728 tests.
When we compare our model with Verleg's model, we can find some significant differences. To illustrate them, we isolated the connection layer from Figure \ref{resultmachine}, and emphasized the differences. The result is shown in Figure \ref{modeldifference}. Verleg's state machine of the connection layer of OpenSSH showed a fault, where closing a channel from the ``has\_commands\_pty'' would result in connection termination. Our model does not show this behavior, closing a channel from the ``has\_commands\_pty'' state leads to a new state, from which other actions are still allowed. We have checked these differences with our setup, and have found that our model holds. This means that Verleg's model contains an error which our model does not have.
\begin{figure}[h]
\centering
\includegraphics[scale=0.25]{differences.png}
\caption{Connection layer of the inferred state machine. Transitions in red are unique to Verleg's models, the state and transitions in green are unique to our work.}
\label{modeldifference}
\end{figure}
\subsection{Model Checking} \label{results modelchecking}
We have converted both security properties into LTL specifications and used the NuSMV model checker to check the properties for truthfulness with the state machine we have inferred. Since there are two properties, we expand on each property individually, first the transport layer, then the user authentication layer.
\subsubsection{Transport Layer}
The security property for the Transport layer is defined as follows: ``We consider a transport layer state machine secure if there is no path from the initial state to the point where the authentication service is invoked without exchanging and employing cryptographic keys.'' We consider this path to be a path where a key exchange is performed successfully and followed by a successful service authentication request. Note that the key exchange does not have to be performed in consecutive steps, it can temporarily be interrupted by other actions.
In natural languange, we can say the property should be something like: ``If the authentication service is invoked successfully, then the final step of the key exchange must have been done, the second step must have been done earlier, and the first step even earlier.'' The key exchange consists of three specific steps which have to be performed in order. The LTL as in Figure \ref{transportltlnatural} is the result. The O-operator used here is the Once-operator, a past time LTL operator, which is true if its argument is true in a past time instant.
\begin{figure}[h]
\begin{align*}
&G (\text{authentication request successful} \rightarrow \\
&O (\text{NEWKEYS successful } \& \\
&O( \text{KEX\_30 successful } \& \\
&O(\text{ KEXINIT successful}) )))
\end{align*}
\caption{LTL for Transport layer with natural language}
\label{transportltlnatural}
\end{figure}
Translating this LTL to NuSMV syntax means that the remaining natural language has to be replaced with equality checks for the variables in the NuSMV state. These variables are equal to the in- and output of a transition in the Mealy machine generated by the mapper. A successful invocation of the authentication service is translated as a request for the authentication service being done, and the request being accepted by the server. The third step of the key exchange, the client sending a newkeys-packet, is successful if the server sends no response. Conveniently, no server response is an indication that the newkeys was received correctly, in our tests, the server would return a packet if the newkeys was performed incorrectly. The second step, sending the kex30 packet, is very much like the third step. Finally, the first step of the protocol is translated a bit differently, since we found that sending any packet to the server from the initial state was recognized as the first step of the key exchange protocol. Therefore, sending the first step, a kexinit-packet, can be replaced by almost any packet. The resulting LTL-specification is shown in Figure \ref{transportltl}.
\begin{figure}[h]
\begin{lstlisting}[basicstyle=\footnotesize]
G (
(input=SERVICE_REQUEST_AUTH & output=SERVICE_ACCEPT)
->
O ( (input=NEWKEYS & output= NO_RESP) &
O ( (input=KEX30 & output=KEX31_NEWKEYS) &
O (output=KEXINIT)
)
)
)
\end{lstlisting}
\caption{Transport layer LTL in NuSMV syntax.}
\label{transportltl}
\end{figure}
\subsubsection{User Authentication Layer}
The security property is as follows: ``We consider a user authentication layer state machine secure if there is no path from the unauthenticated state to the authenticated state without providing the correct credentials.'' Since we have no method to directly determine if the model is in an authenticated state, we consider the authenticated state to be a state where opening a channel can be done successfully. Since opening a channel should only be possible after user authentication, this should show if we are in an authenticated state. The specification then is rather simple: Always, if a channel is opened succesfully, then there has not been a failure in user authentication since a successful user authentication. Opening a channel successfully in the model is straightforward, a request to open a channel is sent to the server, and a success message is received back. Checking whether there has been a success or failure in user authentication is done by checking if the server ever sends the UA\_FAILURE or UA\_SUCCESS packets. Additional checks if the user actually made a user authenication request could potentially be added. The resulting LTL is shown in Figure \ref{authenticationltl}. This LTL uses another past time operator: The S- (or Since-) operator is true if its first argument holds for every state since its second argument holds, note that the second argument has to hold somewhere in the path for the since-operator to hold.
\begin{figure}[h]
\begin{lstlisting}[basicstyle=\footnotesize]
G ( (input= CH_OPEN & output= CH_OPEN_SUCCESS)
-> (!(output=UA_FAILURE) S output= UA_SUCCESS) )
\end{lstlisting}
\caption{User Authentication layer LTL in NuSMV syntax.}
\label{authenticationltl}
\end{figure}
The learned model satisfies both LTL specifications, according to NuSMV.
\section{Discussion}
In this Section we discuss and motivate some of the choices we made in this thesis. In Section \ref{disc caching} we expand on the caching method we use. Then, in Section \ref{disc confidence} we give our consideration for the testing parameters used. In Section \ref{disc faults} we provide explanation for some faults in the generated model. Finally, in Section \ref{disc property} we expand on the security properties used.
\subsection{Caching} \label{disc caching}
In our setup, we improve upon the caching methods Verleg used. This allows us to resume an old, possibly crashed, run of the setup. Because of this, we are able to build a state machine of the entire SSH protocol. However, it is possible to implement much more extensive methods of caching and detection of non-determinism. For instance, we have not implemented any way of checking a trace which results in non-determinism. This can be done by, for example, sending a trace which results in non-determinism again, and comparing the output of both traces. If the second output does not result in non-determinism, we might assume that something went wrong during the first run of the trace (a packet was lost, for example), and can continue learning with the second output. We decided the caching we used was adequate, and thus have not implemented more advanced methods
\subsection{Chosen Confidence} \label{disc confidence}
For our equivalence oracle, we use the Yannakakis-based algorithm in order to build a measure of confidence for the generated state machine. The confidence increases by running the algorithm with a longer path-length. As explained, we run up to a path length of two. We do this due to time considerations; for our generated model, testing up to a path length of two requires 32.728 tests, A path length of up to four would take 7.276.082 tests. These tests are often long traces of inputs, and running them can take up to a couple of minutes per trace. Testing seven million traces with our setup is not feasible. However, one of Verleg's concerns for training all three layers of SSH simultaneously is the rekey procedure. The concern that the model will not implement the rekey procedure in higher layers correctly. The procedure contains three steps before returning to the state before the rekey. A confidence with a path length up to four should correctly implement the rekey procedure for all the states in the SSH-specification where rekey is possible. To remedy this problem, we have used the same solution Verleg has used. We use a separate command which performs an entire rekey procedure in the setup. Because of this, the model should still handle the entire procedure correctly. However, this does not provide confidence in performing only parts of the rekey procedure, or performing the procedure differently, for example.
\subsection{Faulty rekeys in State Machine} \label{disc faults}
As explained in Section \ref{results testing}, our model does not show the disconnecting behavior Verleg experienced. However, it still contains some strange behavior. In SSH, performing a rekey procedure in the ``higher'' connection layer should not influence actions in that layer. Therefore, in the generated state model, performing a rekey from a certain state in a higher layer should always return to that state upon finishing the procedure. In our model, performing a rekey from the ``authed'', ``has\_channel'' and ``has\_pty'' states ends in the ``authed\_rekeyed'', ``has\_commands'' and ``has\_commands\_pty'' states respectively.
\subsection{Security property coverage} \label{disc property}
As explained in Section \ref{method checking}, we have not formalized a security property for the connection layer. This however, means that the section of the state machine where some interesting behavior happens (see Section \ref{results testing} and Section \ref{disc faults}) is not covered by a security definition. From observing the problems we have found we have not found any significant security related problem, however, observing by eye for mistakes goes against the goal of this work, and therefore leaves us with a weakness in our models.
\end{document}
\ No newline at end of file
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment