preliminaries.tex 8.69 KB
 Paul Fiterau Brostean committed Feb 17, 2017 1 \section{Model learning}\label{sec:prelims}  Paul Fiterau Brostean committed Jan 20, 2017 2 \subsection{Mealy machines} \label{ssec:mealy}  Frits Vaandrager committed Feb 17, 2017 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 A \emph{Mealy machine} is a tuple $\M = (I, O, Q, q_0, \delta, \lambda)$, where $I$ is a finite set of inputs, $O$ is a finite set of outputs, $Q$ is a finite set of states, $q_0 \in Q$ is the initial state, $\delta: Q \times I \rightarrow Q$ is a transition function, and $\lambda: Q \times I \rightarrow O$ is an output function. % Output function $\lambda$ is extended to sequences of inputs by defining, for all $q \in Q$, $i \in I$ and $\sigma \in I^{\ast}$, %\begin{eqnarray*} %\delta(q, \epsilon) = q &,~~~ & \delta(q, i \sigma) = \delta(\delta(q,i), \sigma),\\ $\lambda(q, \epsilon) = \epsilon$ and $\lambda(q, i \sigma) = \lambda(q, i) \lambda(\delta(q, i), \sigma)$. %\end{eqnarray*} % The behavior of Mealy machine $\M$ is defined by function $A_\M : I^{\ast} \rightarrow O^{\ast}$ with $A_\M(\sigma) = \lambda (q^0, \sigma)$, for $\sigma \in I^{\ast}$. Mealy machines $\M$ and $\N$ are \emph{equivalent}, denoted $\M \approx \N$, iff $A_{\M} = A_{\N}$. Sequence $\sigma \in I^{\ast}$ \emph{distinguishes} $\M$ and $\N$ if and only if $A_{\M}(\sigma) \neq A_{\N}(\sigma)$.  Paul Fiterau Brostean committed Jan 18, 2017 23 24   Paul Fiterau Brostean committed Jan 20, 2017 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 %\begin{figure}[h] %\centering %\begin{tikzpicture}[>=stealth',shorten >=1pt,auto,node distance=2.8cm] % \node[initial,state] (q0) {$q_0$}; % \node[state] (q1) [right of=q0] {$q_1$}; % \node[state] (q2) [right of=q1] {$q_2$}; % % \path[->] (q0) edge node [below] {X/A} (q1); % \path[->] (q1) edge node [below] {Y/B} (q2); % \path[->] (q0) edge [bend left] node {Y/A} (q2); % \path[->] (q2) edge [loop right] node [text centered, text width = 1cm] {X/A \\ Y/B} (q2); % \path[->] (q1) edge [loop below] node {X/B} (q1); %\end{tikzpicture} %\caption{Example Mealy machine} %\label{mealy example} %\end{figure}  Paul Fiterau Brostean committed Jan 18, 2017 41   Paul Fiterau Brostean committed Jan 20, 2017 42 \subsection{MAT Framework} \label{ssec:mat}  Erik Poll committed Feb 15, 2017 43   Paul Fiterau Brostean committed May 16, 2017 44 The most efficient algorithms for model learning (see~\cite{Isberner2015} for a recent overview) all follow  Paul Fiterau Brostean committed Jan 29, 2017 45 the pattern of a \emph{minimally adequate teacher (MAT)} as proposed by Angluin~\cite{Angluin1987Learning}.  Paul Fiterau Brostean committed Feb 18, 2017 46 Here learning is viewed as a game in which a \textit{learner} has to infer an unknown automaton by asking queries to a teacher. The teacher knows the automaton, which in our setting is a Mealy machine $\M$,  Erik Poll committed Feb 15, 2017 47 also called the System Under Learning ({\dsut}).  Paul Fiterau Brostean committed Feb 18, 2017 48 Initially, the {\dlearner} only knows the input alphabet $I$ and output alphabet $O$ of $\M$.  Paul Fiterau Brostean committed Feb 18, 2017 49 The task of the {\dlearner} is to learn $\M$ via two types of queries:  Paul Fiterau Brostean committed Jan 20, 2017 50 51 \begin{itemize} \item  Paul Fiterau Brostean committed Feb 18, 2017 52 With a \emph{membership query}, the {\dlearner} asks what the response is to an input sequence $\sigma \in I^{\ast}$.  Paul Fiterau Brostean committed Jan 20, 2017 53 54 The teacher answers with the output sequence in $A_{\M}(\sigma)$. \item  Paul Fiterau Brostean committed Feb 14, 2017 55 With an \emph{equivalence query}, the {\dlearner} asks whether a hypothesized Mealy machine $\CH$ is correct, that is,  Paul Fiterau Brostean committed Jan 20, 2017 56 57 58 59 whether $\CH \approx \M$. The teacher answers \emph{yes} if this is the case. Otherwise it answers \emph{no} and supplies a \emph{counterexample}, which is a sequence $\sigma \in I^{\ast}$ that triggers a different output sequence for both Mealy machines, that is, $A_{\CH}(\sigma) \neq A_{\M}(\sigma)$. \end{itemize}  Erik Poll committed Feb 15, 2017 60 %  Frits Vaandrager committed Feb 17, 2017 61 62 The MAT framework can be used to learn black box models of software. If the behavior of a software system, or System Under Learning ({\dsut}), can be described by some unknown Mealy machine $\M$,  Paul Fiterau Brostean committed Feb 17, 2017 63 then a membership query can be implemented by sending inputs to the {\dsut} and observing resulting outputs.  Frits Vaandrager committed Feb 17, 2017 64 65 66 An equivalence query can be approximated using a conformance testing tool \cite{LeeY96} via a finite number of \emph{test queries}. A test query consists of asking the {\dsut} for the response to an input sequence $\sigma \in I^{\ast}$, similar to a membership query.  Paul Fiterau Brostean committed Feb 17, 2017 67 Note that this cannot rule out that there is more behavior  Paul Fiterau Brostean committed May 16, 2017 68 that has not been discovered.  Paul Fiterau Brostean committed Jan 18, 2017 69   Paul Fiterau Brostean committed Jan 20, 2017 70 71 72 \subsection{Abstraction} \label{ssec:mappers} Most current learning algorithms are only applicable to Mealy machines with small alphabets comprising abstract messages. Practical systems typically have parameterized input/output alphabets, whose application triggers updates on the system's state variables. To learn  Paul Fiterau Brostean committed Feb 18, 2017 73 these systems we place a \emph{mapper} between the {\dlearner} and the {\dsut}. The {\dmapper} is a transducer which translates  Frits Vaandrager committed Feb 17, 2017 74 concrete inputs to abstract inputs and concrete outputs to abstract outputs. For a thorough discussion of mappers, we refer to \cite{AJUV15}.  Paul Fiterau Brostean committed Jan 18, 2017 75   Paul Fiterau Brostean committed Jan 20, 2017 76 %Perhaps some explanation  Paul Fiterau Brostean committed Jan 18, 2017 77   Paul Fiterau Brostean committed Jan 20, 2017 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 % % %concrete inputs in $I$ to abstract inputs in $X$, %concrete outputs in $O$ to abstract outputs in $Y$, and vice versa. %alphabets % % %Starting from Angluin's seminal $L^{\ast}$ algorithm \cite{Ang87}, many %algorithms have been proposed for learning finite, deterministic Mealy machines %via a finite number of queries. We refer to \cite{Isberner2015} for recent overview. %In applications in which one wants to learn a model of a black-box reactive system, the teacher typically %consists of a System Under Learning (\dsut) that answers the membership queries, and a conformance %testing tool \cite{LeeY96} that approximates the equivalence queries using a set %of \emph{test queries}. A test query consists of asking to the \dsut for the response to an input sequence %$\sigma \in I^{\ast}$, similar to a membership query. % % % % % % %In protocol state fuzzing, a state machine of the implementation is learned by actively sending and receiving messages to and from the System Under Learning (SUL). A single message is called a \textit{query}, and a sequence of messages a \textit{trace}. The state machine is formed by sending consecutive traces to the SUL and analyzing the output. The trace to be sent is, in our case, determined by the L*-algorithm. When a trace is completed the SUL has to be reset to its initial state. % %After sending a sufficient amount of traces, a hypothesis is formed. The hypothesis is checked for by a testing oracle. Said oracle sends traces to the SUL, predicts an output through the hypothesis, and compares this output to the actual output received by the SUL. % %In our case, sending messages to the SUL means sending packets that contain, for example, the length of the packet, a sequence number or a list of supported encryptions. This information is unnecessary for the learning process, and is not supported by the Learnlib implementation of L* that we use. We abstract all this information away, leaving us with an abstract representation of messages. To convert these abstract messages to correct packets and back, a mapper is used. This mapper has to keep track of state variables, and has to be able to perform actions such as encryption and compression. This means that the mapper itself contains a state machine, which is based on existing knowledge about the protocol used in the SUL. % %\begin{figure}[h] %\centering %\includegraphics[scale=0.8]{Naamloos.png} %\caption{A learning setup} %\label{learning setup} %\end{figure} % %The setup shown in figure \ref{learning setup} is the setup Verleg used, and serves as an example for a typical learning setup, here Learner'' is the program that uses the L*-algorithm to generate traces to send to the SUL, these traces are in the form of abstract messages. The learner sends these messages to the mapper, which translates them to concrete packages which are sent to the SUL. A response of the SUL is then converted by the mapper to an abstract message and sent to the learner. % %\subsection{Secure Shell} \label{secure shell} % %The SSH-protocol uses a client-server structure consisting of three components. These components will be referred to as layers, however note that outer layers do not wrap inner layers, instead messages of different layers are distinguished through their message number. The three components are as follows: %\begin{itemize} %\item The transport layer protocol. This creates the basis for communication between server and client, providing a key exchange protocol and server authentication. The key exchange protocol is performed through three roundtrips. During the first, both client and server send a KEXINIT message. Then, the client sends a KEX30 message, the server responds with a KEX31 message. Finally, both parties send a NEWKEYS message, which indicates that the keys sent in the second step can be used. %\item The user authentication protocol. This component is used to authenticate a client to the server, for example, through a username and password combination, or through SSH-keys. %\item The connection protocol. This is used to provide different services to the connected client, it can thus multiplex the encrypted channel into different channels. The provided services can be services like file transfer or a remote terminal. Typical messages are requests for opening or closing channels, or requests for earlier named services.  Erik Poll committed Feb 15, 2017 121 %\end{itemize}