@@ -94,7 +94,7 @@ If the system's response is the same as the predicted response (by the hypothesi

Otherwise, if there is a test for which the target and the hypothesis produce different outputs, then this input sequence can be used as a counterexample.

One of the main advantages of using conformance testing is that it can efficiently identify the hypothesis from the set of all finite state machines of size at most $m$.

This means that if we know a bound $m$ for the size of the system we learn, we are guaranteed to find a counterexample.

This means that if we know a bound $m$ for the size of the system we learn, we are guaranteed to find a counterexample if there exists one.

Unfortunately, conformance testing has some notable drawbacks.

First, it is hard (or even impossible) in practice to determine an upper bound on the number of states of the system's target FSM.

Second, it is known that testing becomes exponentially more expensive for higher values of $m$~\citep{Vasilevskii1973}. %, as a complete test set should include each sequence in the so-called \emph{traversal set}, which contains all input sequences of length $l = m - n + 1$ .

...

...

@@ -400,8 +400,12 @@ For a description of discriminator finalization, we refer to \cite{Isberner2014a

One of the main advantages of using conformance testing for finding counterexamples is that it can discover all counterexamples for a hypothesis $H =(I, O, Q_H, q_H, \delta_H, \lambda_H)$ with $n$ states, under the assumption that the target FSM $M =(I, O, Q_M, q_M, \delta_M, \lambda_M)$ has at most $m$ states, $m \geq n$.

Several different methods exist for constructing a so-called \emph{$m$-complete} test set (for a complete introduction and an overview of these methods, we refer to \citet{Dorofeeva2010}).

Conformance testing for FSMs is an efficient way of finding counterexamples.

Many methods exist that are guaranteed to find a counterexample to a hypothesis, if one exists, in a fairly efficient way.

Let $H =(I, O, Q_H, q_H, \delta_H, \lambda_H)$ be a hypothesis with $n$ states.

We call a conformance testing method $m$-complete if it can identify the hypothesis in the set of all FSMs with at most $m$ states.

Such $m$-complete methods are generally polynomial in the size of the hypothesis and exponential in $m - n$, which is far more efficient than an exhaustive search.

For an overview of some $m$-complete methods, we refer to \citet{Dorofeeva2010}.

All of these methods require the following information:

\begin{itemize}

\item A set of \emph{access sequences} $S =\{\lfloor q \rfloor_H \mid q \in Q_H\}$, possibly extended to a \emph{transition cover} set $S \cdot I$.

...

...

@@ -409,12 +413,12 @@ All of these methods require the following information:

$.

\item A means of pairwise distinguishing all states of $H$, such as a set of \emph{discriminators} $E$ for all pairs of states in $H$.

\end{itemize}

A test set is then constructed by taking the product of these sets, or subsets of these sets, e.g.\ $S \cdot I^{l}\cdot E$.

A test suite is then constructed by combining these sets, or subsets of these sets, e.g.\ $S \cdot I^{l}\cdot E$.

The difference between different testing methods is how states are distinguished (i.e.\ the last part).

In the so-called \emph{partial W-method}, or \emph{Wp-method}~\citep{Fujiwara1991}, states are distinguished based on the current state of the hypothesis:

In the so-called \emph{partial W-method}, or \emph{Wp-method}~\citep{Fujiwara1991}, states are distinguished pairwise:

For each state $q \in Q_H$ a set $E_{q}\subset E$ of discriminators is constructed, such that for each state $q' \in Q_H \setminus\{q\}$ there is a sequence $w \in E_{q}$ that distinguishes $q$ and $q'$, i.e.\ $\lambda_H(q, w)\neq\lambda_H(q', w)$.

Then, each trace $uv, u \in S \cdot I, v \in I^{l}$ is concatenated with the set $E_q$ such that $q =\delta_H(q_H, uv)$.

Then, each trace $uv, u \in S \cdot I, v \in I^{l}$ is extended with the set $E_q$ where $q =\delta_H(q_H, uv)$.
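
Written out in the notation above, the resulting test suite can be sketched as
\[
  T = \bigcup_{u \in S \cdot I,\; v \in I^{l}} \{uv\} \cdot E_{\delta_H(q_H, uv)} ,
\]
so every trace is followed only by the discriminators of the hypothesis state it reaches, rather than by all of $E$.

The name $T$ is introduced here only for illustration and merely restates the construction described in the text.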

%The method that uses a set of discriminators for all pairs of states is called the \emph{W-method} \cite{chow1978testing}.

...

...

@@ -424,12 +428,12 @@ Then, each trace $uv, u \in S \cdot I, v \in I^{l}$ is concatenated with the set

%Recently, however, \citet{Smeenk2015} have described a test method that uses adaptive distinguishing sequences to construct a test sets that (in most cases) is smaller than one constructed by the W-method (although they have the same worst case complexity).

%We will refer to this method as \emph{ADS}.

Conformance testing is typically very expensive due to the exponential size of the traversal set.

Given a hypothesis $H$ with $n$ states and $k$ inputs, the worst-case length of a test set (i.e.\ the sum of the lengths of all sequences) is of order $\mathcal{O}(k^{l}n^3)$ (recall that $l = m - n +1$, where $m$ is the upper bound on the number of states of $M$).

Conformance testing is typically expensive due to the exponential size of the traversal set.

Given a hypothesis $H$ with $n$ states and $k$ inputs, the worst-case length of a test suite (i.e.\ the sum of the lengths of all sequences) is of order $\mathcal{O}(k^{l}n^3)$ (recall that $l = m - n +1$, where $m$ is the upper bound on the number of states of $M$).
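
To give a sense of scale, consider purely illustrative values (assumed here, not taken from any experiment) of $k = 10$ inputs and $n = 20$ states, ignoring constant factors:
\[
  m = 22:\quad l = 3,\quad k^{l} n^{3} = 10^{3} \cdot 20^{3} = 8 \cdot 10^{6}, \qquad
  m = 25:\quad l = 6,\quad k^{l} n^{3} = 10^{6} \cdot 20^{3} = 8 \cdot 10^{9}.
\]
Allowing for three more states in $M$ thus multiplies the bound by $k^{3} = 1000$.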

Moreover, it is hard to estimate an upper bound for $M$ in practice.

Therefore, instead of iterating over the traversal set $I^{l}$ (for each prefix in $S$), we sample the set $I^{\ast}$ by randomly generating sequences $v \in I^{\ast}$ according to a geometric distribution.

While generating a sequence, the probability of terminating the sequence is $p =1/x$.

Therefore, the expected (mean) length of such a sequence is $E(l)=1/p = x$, and the probability that a sequence is of length $l \geq 1$ is $(1-p)^{l-1}p$.

Therefore, instead of exhausting these test suites, we sample from them at random, with the length of each test being random as well. To be precise, we first sample a prefix uniformly from $S$, then generate a sequence $v \in I^{\ast}$ whose length follows a geometric distribution, and finally sample a suffix uniformly from $E$.

By using a geometric distribution we are not bounding the length of the sequence.

For our tools the expected length can be set by a parameter.
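
As a sketch of how such a parameter determines the distribution, assume (as one possible reading) that after each generated symbol the sequence is terminated with a fixed probability $p$, so that every sequence contains at least one symbol.

Writing $j$ for the length of $v$, this gives
\[
  P(|v| = j) = (1-p)^{j-1} p \quad (j \geq 1), \qquad
  \sum_{j \geq 1} j\,(1-p)^{j-1} p = \frac{1}{p}.
\]
For instance, $p = 1/10$ gives an expected length of $10$, while $P(|v| > 20) = (1-p)^{20} \approx 0.12$, so considerably longer sequences still occur regularly.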

%often a value close to $n$ is picked for $m$, in the hope that the test set contains at least some counterexamples.