Commit 6c25843e authored by Joshua Moerman

Small changes in text.

parent 8ccaa580
For a description of discriminator finalization, we refer to \cite{Isberner2014a}.
%\end{figure}
\subsection{Conformance testing} \label{sec:testing}
Conformance testing for FSMs is an efficient way of finding counterexamples.
Let $H = (I, O, Q_H, q_H, \delta_H, \lambda_H)$ be a hypothesis with $n$ states.
We call a conformance testing method $m$-complete if it can identify the hypothesis in the set of all FSMs with at most $m$ states.
Such $m$-complete methods are generally polynomial in the size of the hypothesis and exponential in $m - n$, which is far more efficient than an exhaustive search.
Moreover, it is hard to estimate an upper bound for $M$ in practice.
%Therefore, alternative methods for finding counterexamples might yield better results.
\subsection{Fuzzing} \label{sec:fuzzing}
A \emph{mutation-based fuzzer} is a program that applies a set of tests (i.e.\ input sequences) to a target program, and then iteratively mutates these tests to monitor if `something interesting' happens.
This could be a crash of the target program, a change in its output, or an increase in code coverage (measured via instrumentation).
The \emph{American Fuzzy Lop} (AFL) fuzzer \citep{afl-website} is interesting for its approach in combining mutation-based test case generation with \emph{code coverage} monitoring.
AFL supports programs written in C, C++, or Objective-C, and there are variants that allow fuzzing programs written in Python, Go, Rust, or OCaml.
This, however, is 2--5{$\times$} slower than compile-time instrumentation \citep{afl-website}.
%\end{figure}
From a high-level, simplified perspective, AFL works by taking a program and a queue of tests, and iteratively mutating these tests to see if the \emph{coverage} of the program is increased; new tests that increase coverage are added to the queue.
Test execution speed is increased by using a \emph{fork server}.
In the next paragraphs, we will describe in more detail how coverage is measured by AFL, which mutation strategies are applied, and how execution time is minimized.
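To make the loop above concrete, here is a deliberately simplified sketch of such a mutation-based fuzzing loop (this is not AFL's actual implementation: the \texttt{target} callback stands in for the instrumented binary plus its trace bitmap, and only a single bit-flip mutation is shown):

```python
import random

def fuzz(target, seeds, budget=1000):
    # `target` maps a test (bytes) to the set of control-flow edges it
    # covers; in AFL this comes from the instrumented binary's trace bitmap.
    queue = list(seeds)
    seen = set()
    for test in queue:
        seen |= target(test)
    for _ in range(budget):
        test = bytearray(random.choice(queue))
        if test:
            pos = random.randrange(len(test))
            test[pos] ^= 1 << random.randrange(8)  # single bit-flip mutation
        mutated = bytes(test)
        edges = target(mutated)
        if edges - seen:  # 'something interesting': new coverage
            seen |= edges
            queue.append(mutated)
    return queue, seen
```

Tests that do not increase coverage are simply discarded; only coverage-increasing tests survive into the queue for further mutation.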
% I have commented out this algorithm, because it is quite easy to capture its purpose in a small piece of text (above).
%\begin{algorithm}[b]
Every location in the array is represented by a compile-time random value.
When an edge in the control flow is taken, the bitmap is updated at the position given by the xor of the current location value and the previous location value.
The intention is that every edge in the control flow is mapped to a different byte in the bitmap.
Note that because the size of the bitmap is finite and the values that represent locations in the code are random, the bitmap is probabilistic: there is a chance that collisions will occur.
This is especially the case when the bitmap fills up, which can happen when fuzzing large programs with many edges in their control flow.
AFL can detect and resolve this situation by applying instrumentation on fewer edges in the target or by increasing the size of the bitmap.
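The bitmap update described above can be modelled in a few lines. This is a sketch of the mechanism, not AFL's C code; AFL uses a 64 KiB shared-memory byte array, and the right-shift of the previous location is what makes an edge $A \to B$ map to a different byte than $B \to A$:

```python
MAP_SIZE = 1 << 16  # 64 KiB trace bitmap, as in AFL

class TraceBitmap:
    def __init__(self):
        self.bitmap = bytearray(MAP_SIZE)
        self.prev_location = 0

    def visit(self, cur_location):
        # cur_location: the compile-time random ID of the basic block
        idx = (cur_location ^ self.prev_location) % MAP_SIZE
        self.bitmap[idx] = (self.bitmap[idx] + 1) & 0xFF  # 8-bit hit counter
        # The shift distinguishes edge direction: A->B vs. B->A.
        self.prev_location = cur_location >> 1
```

Because distinct edges can hash to the same byte, collisions of the kind mentioned above become more likely as the bitmap fills up.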
%A trace bitmap for the program in \autoref{fig:fsm} is shown in \autoref{fig:afl-cfg}.
The RERS challenge consists of two parts:
\item a set of nine problems, numbered 10 through 18, for which one has to determine whether or not a set of error statements present in the source code are \emph{reachable}, and provide a sequence of inputs such that the error statement is executed.
\end{enumerate}
\noindent In our approach, we have used a state-of-the-art learner in combination with a tester to learn FSMs for the RERS 2016 problems.
In addition, we have used a fuzzer to generate potentially interesting traces independently of the learner and the tester.
As we have executed the learner/tester and the fuzzer independently of one another, we describe their experimental setups and results in turn.
The code for our experiments is available at \url{https://gitlab.science.ru.nl/moerman/rers-2016/}.
Finally, we sample a suffix uniformly from the set of state-specific discriminators.
By using a geometric distribution for the infix, we are not bounding the length of the test sequence.
In our tool, the minimal and expected length of the infix can be set by parameters.
In our experiments, its minimal length was three, and its expected length was eleven.
\item[Counterexample handling] Counterexamples were processed using the~\textsc{LinearForward} handler in LearnLib.
\item[Cache] We have used the cache that is implemented in LearnLib to avoid sending duplicate queries to the system under learning.
\end{description}
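The test-query sampling described above can be sketched as follows. The dictionaries are hypothetical stand-ins for the learner's internal data structures (access sequences and state-specific discriminators), and the parameter values are illustrative:

```python
import random

def geometric(p):
    # Number of failures before the first success; mean (1 - p) / p.
    n = 0
    while random.random() >= p:
        n += 1
    return n

def sample_test(access_seqs, discriminators, alphabet, p=0.125, min_len=3):
    # access_seqs: state -> input sequence reaching that state (the prefix)
    # discriminators: state -> list of state-specific suffixes
    state = random.choice(sorted(access_seqs))
    infix = [random.choice(alphabet) for _ in range(min_len + geometric(p))]
    suffix = random.choice(discriminators[state])
    return access_seqs[state] + infix + suffix
```

The geometric tail is what leaves the test length unbounded while keeping its expectation finite.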
As these traces are stored in a separate results folder by AFL, we could easily retrieve them for further processing.
\item[Post-processing] As the input bytes that AFL considers are not limited to the valid inputs for the challenge problems, we filtered out the bytes that were not accepted.
\end{description}
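The post-processing step can be as simple as the following sketch (\texttt{valid\_inputs} is a hypothetical set of input bytes accepted by the challenge problem):

```python
def keep_valid(trace, valid_inputs):
    # AFL mutates raw bytes, so its traces may contain bytes that the
    # problem rejects; keep only the accepted input bytes.
    return bytes(b for b in trace if b in valid_inputs)
```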
\noindent The traces that were found by AFL were simulated on the final hypothesis of the learner to see if its output differed from that of the program binary.
To do so, we have created a tool that wraps a DOT file, so that it mimics the I/O behaviour of the program binary.
The output was compared to that of the program binary using \texttt{diff}.
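Simulating a trace on a hypothesis amounts to walking the Mealy machine and recording outputs. A minimal sketch, assuming the DOT file has already been parsed into a transition dictionary (the actual tool wraps the DOT file directly):

```python
def run_trace(machine, initial, trace):
    # machine: (state, input) -> (next_state, output)
    state, outputs = initial, []
    for symbol in trace:
        state, out = machine[(state, symbol)]
        outputs.append(out)
    return outputs
```

The resulting output sequence can then be compared line by line with the binary's output.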
\section{Results}
The results for the learning/testing setup described in \autoref{sec:exp_learning} are shown in \autoref{tab:learn_ltl} and \autoref{tab:learn_reach}.
We are confident that the learned models for the LTL problems (1--9) are complete, as the last hypothesis was learnt within 1 day and no further counterexamples were found in the following week.
The same holds for the first of the reachability problems (10).
Beware, however, that we can never guarantee completeness with black-box techniques.
For problems 11--18 we know that we do \emph{not} have complete models, as the learner was still finding new states every 10 minutes when the server rebooted for maintenance.
The learner ran for a bit more than 7 days and saved all hypotheses.
As a result of this reboot, we do not have statistics on the number of queries.
\begin{table}
\centering
\begin{tabular}{ l | l l l }
size & plain & arithmetic & data structures \\
\hline
small & \parbox[t]{0.25\textwidth}{\textbf{Problem 1}\\time: 50s\\states: 13\\hypotheses: 6} & \parbox[t]{0.25\textwidth}{\textbf{Problem 2}\\time: 1m22s\\states: 22\\hypotheses: 10} & \parbox[t]{0.25\textwidth}{\textbf{Problem 3}\\time: 7m05s\\states: 26\\hypotheses: 13} \\[37pt]
medium & \parbox[t]{0.25\textwidth}{\textbf{Problem 4}\\time: 34m\\states: 157\\hypotheses: 77} & \parbox[t]{0.25\textwidth}{\textbf{Problem 5}\\time: 2h43m\\states: 121\\hypotheses: 50} & \parbox[t]{0.25\textwidth}{\textbf{Problem 6}\\time: 4h51m\\states: 238\\hypotheses: 156} \\[37pt]
large & \parbox[t]{0.25\textwidth}{\textbf{Problem 7}\\time: 11h45m\\states: 610\\hypotheses: 407} & \parbox[t]{0.25\textwidth}{\textbf{Problem 8}\\time: 24h22m\\states: 646\\hypotheses: 432} & \parbox[t]{0.25\textwidth}{\textbf{Problem 9}\\time: 18h31m\\states: 854\\hypotheses: 550} \\
\end{tabular}
}
\begin{tabular}{ l | l l l }
size & plain & arithmetic & data structures \\
\hline
small & \parbox[t]{0.25\textwidth}{\textbf{Problem 10}\\time: 2m39s\\states: 59\\hypotheses: 3} & \parbox[t]{0.25\textwidth}{\textbf{Problem 11}\\time: 1w+\\states: 22 589\\hypotheses: 8 314} & \parbox[t]{0.25\textwidth}{\textbf{Problem 12}\\time: 1w+\\states: 12 771\\hypotheses: 4 325} \\[37pt]
medium & \parbox[t]{0.25\textwidth}{\textbf{Problem 13}\\time: 1w+\\states: 12 848\\hypotheses: 5 564} & \parbox[t]{0.25\textwidth}{\textbf{Problem 14}\\time: 1w+\\states: 11 632\\hypotheses: 4 513} & \parbox[t]{0.25\textwidth}{\textbf{Problem 15}\\time: 1w+\\states: 7 821\\hypotheses: 3 792} \\[37pt]
large & \parbox[t]{0.25\textwidth}{\textbf{Problem 16}\\time: 1w+\\states: 8 425\\hypotheses: 3 865} & \parbox[t]{0.25\textwidth}{\textbf{Problem 17}\\time: 1w+\\states: 11 758\\hypotheses: 5 584} & \parbox[t]{0.25\textwidth}{\textbf{Problem 18}\\time: 1w+\\states: 8 863\\hypotheses: 4 246} \\
\end{tabular}
}
traces discovered, fuzzed them, and looped back to the very beginning.
\begin{tabular}{ l | l l l }
size & plain & arithmetic & data structures \\
\hline
small & \parbox[t]{0.25\textwidth}{\textbf{Problem 1}\\cycles: 46 521\\execs: $2.64 \times 10^9$\\paths: 253} & \parbox[t]{0.25\textwidth}{\textbf{Problem 2}\\cycles: 30 088\\execs: $2.68 \times 10^9$\\paths: 480} & \parbox[t]{0.25\textwidth}{\textbf{Problem 3}\\cycles: 19 551\\execs: $2.52 \times 10^9$\\paths: 453} \\[37pt]
medium & \parbox[t]{0.25\textwidth}{\textbf{Problem 4}\\cycles: 460\\execs: $8.68 \times 10^8$\\paths: 3 453} & \parbox[t]{0.25\textwidth}{\textbf{Problem 5}\\cycles: 4 191\\execs: $1.63 \times 10^9$\\paths: 1 115} & \parbox[t]{0.25\textwidth}{\textbf{Problem 6}\\cycles: 109\\execs: $7.32 \times 10^8$\\paths: 4 494} \\[37pt]
large & \parbox[t]{0.25\textwidth}{\textbf{Problem 7}\\cycles: 35\\execs: $6.86 \times 10^8$\\paths: 9 556} & \parbox[t]{0.25\textwidth}{\textbf{Problem 8}\\cycles: 16\\execs: $6.75 \times 10^8$\\paths: 10 906} & \parbox[t]{0.25\textwidth}{\textbf{Problem 9}\\cycles: 71\\execs: $7.63 \times 10^8$\\paths: 11 305} \\
\end{tabular}
}
\begin{tabular}{ l | l l l }
size & plain & arithmetic & data structures \\
\hline
small & \parbox[t]{0.25\textwidth}{\textbf{Problem 10}\\cycles: 70 336\\execs: $2.58 \times 10^9$\\paths: 139} & \parbox[t]{0.25\textwidth}{\textbf{Problem 11}\\cycles: 10 365\\execs: $2.34 \times 10^9$\\paths: 801} & \parbox[t]{0.25\textwidth}{\textbf{Problem 12}\\cycles: 5 971\\execs: $2.14 \times 10^9$\\paths: 1 032} \\[37pt]
medium & \parbox[t]{0.25\textwidth}{\textbf{Problem 13}\\cycles: 779\\execs: $1.35 \times 10^9$\\paths: 4 235} & \parbox[t]{0.25\textwidth}{\textbf{Problem 14}\\cycles: 621\\execs: $1.02 \times 10^9$\\paths: 3 838} & \parbox[t]{0.25\textwidth}{\textbf{Problem 15}\\cycles: 1 040\\execs: $1.77 \times 10^9$\\paths: 3 685} \\[37pt]
large & \parbox[t]{0.25\textwidth}{\textbf{Problem 16}\\cycles: 50\\execs: $7.22 \times 10^8$\\paths: 11 908} & \parbox[t]{0.25\textwidth}{\textbf{Problem 17}\\cycles: 19\\execs: $4.58 \times 10^8$\\paths: 10 283} & \parbox[t]{0.25\textwidth}{\textbf{Problem 18}\\cycles: 21\\execs: $4.58 \times 10^8$\\paths: 10 237} \\
\end{tabular}
}
For the LTL problems of the challenge, none of the test traces that have a unique path through the program revealed a difference with the learned models.
This, in combination with the large number of cycles completed by the fuzzer, strengthens our belief that the learned models for these problems (1--9) are complete.
The number of reachable error states found by the learner and the fuzzer are shown in \autoref{tab:errors}.
The first entry in each cell is the number of unique error states that were found, and the second entry is the number of error states that were found by the given technique, but were not found by the other technique (e.g. ``fuzzing: 28 (2)'' means that the fuzzer has found 28 error states, and 2 of those were not found by the learner).
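The two entries per technique are simple set arithmetic over the sets of error states found. A small sketch with made-up sets (not the actual challenge data) illustrating the ``found (exclusive)'' notation:

```python
def summarize(learner_found, fuzzer_found):
    # For each technique: (number found, number the other technique missed).
    return {
        "learner": (len(learner_found), len(learner_found - fuzzer_found)),
        "fuzzer":  (len(fuzzer_found),  len(fuzzer_found - learner_found)),
        "total":   len(learner_found | fuzzer_found),
    }
```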
\begin{table}
\centering
\begin{tabular}{ l | l l l }
size & plain & arithmetic & data structures \\
\hline
small & \parbox[t]{0.25\textwidth}{\textbf{Problem 10}\\learner: 45 (0)\\fuzzer: 45 (0)\\total: 45} & \parbox[t]{0.25\textwidth}{\textbf{Problem 11}\\learner: 20 (0)\\fuzzer: 22 (2)\\total: 22} & \parbox[t]{0.25\textwidth}{\textbf{Problem 12}\\learner: 21 (0)\\fuzzer: 21 (0)\\total: 21} \\
medium & \parbox[t]{0.25\textwidth}{\textbf{Problem 13}\\learner: 28 (0)\\fuzzer: 30 (2)\\total: 30} & \parbox[t]{0.25\textwidth}{\textbf{Problem 14}\\learner: 27 (0)\\fuzzer: 30 (3)\\total: 30} & \parbox[t]{0.25\textwidth}{\textbf{Problem 15}\\learner: 27 (0)\\fuzzer: 32 (5)\\total: 32} \\
large & \parbox[t]{0.25\textwidth}{\textbf{Problem 16}\\learner: 29 (1)\\fuzzer: 31 (3)\\total: 32} & \parbox[t]{0.25\textwidth}{\textbf{Problem 17}\\learner: 27 (1)\\fuzzer: 28 (2)\\total: 29} & \parbox[t]{0.25\textwidth}{\textbf{Problem 18}\\learner: 28 (0)\\fuzzer: 32 (4)\\total: 32} \\
\end{tabular}
}
\end{table}
From these results we conclude that the fuzzer discovered more reachable error states than the learner/tester, although in some cases the learner/tester found error states that were not discovered by the fuzzer.
This strengthens our belief that the two techniques complement each other.
\section{Present and Future Work on Learning and Fuzzing}
The goal of our present and future research in this area is to combine model learning and mutation-based fuzzing in the following ways.
\begin{enumerate}
\item use fuzzing as a source of counterexamples \emph{during} learning, and
\item use (intermediate) learning results to guide mutation-based fuzzing.
\end{enumerate}
This speeds up the execution of learning, independent of the technique used to find counterexamples.
\noindent There were some other issues that we had to address:
\begin{itemize}
\item A design decision in AFL is that it does not care about the target program's output.
\item AFL is designed such that it does not care about the target program's output.
Instead, only coverage data is used as a measure of test-case relevance.
The learner, however, relies on output behaviour.
Therefore, we have extended AFL to always save data from the target's \lstinline{stdout} into a shared memory buffer (shared between libafl and the fork server process).