Commit 462edec4 by Thom Badings: Update README.md (parent dcc8fb8c)
# Scenario-Based Verification of Uncertain Parametric MDPs
This is an implementation of the approach proposed in the paper:
- [1] "Sampling-Based Verification of CTMCs with Uncertain Rates" by Thom Badings, Sebastian Junges, Nils Jansen, Marielle Stoelinga, and Matthias Volk, CAV 2022
Our implementation is called SLURF, named after the resemblance between a typical result figure and the trunk of an elephant, which is called a *slurf* in Dutch (Photo by [Rachel Claire](https://www.pexels.com/nl-nl/@rachel-claire/) via [Pexels](https://www.pexels.com/nl-nl/foto/voedsel-natuur-zonnig-veld-4577802/?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels)):
![img/slurf-intro.png](./img/slurf-intro.png)
- [1] "Scenario-Based Verification of Uncertain Parametric MDPs" by Thom Badings, Murat Cubuktepe, Nils Jansen, Sebastian Junges, Joost-Pieter Katoen, and Ufuk Topcu, Journal on STTT 2022.
## 1. Installation from source
While we recommend using this artifact via the Docker container, you can also build it from source as follows:
- Install [Storm](https://www.stormchecker.org/documentation/obtain-storm/build.html), [pycarl](https://moves-rwth.github.io/pycarl/installation.html#installation-steps) and [stormpy](https://moves-rwth.github.io/stormpy/installation.html#installation-steps) using the instructions in the documentation.
Note that one must use the master branches of all three tools.
Preferably, install pycarl and stormpy in a virtual environment.
We have tested the artifact using Storm version 1.6.4.
- Install dependencies: on macOS, tkinter needs to be available.
It can be installed via [Homebrew](https://brew.sh/):
`brew install python-tk`
- Install the artifact using
`python setup.py install`
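For reference, a minimal sketch of a from-source setup could look as follows (this is an illustration only, assuming Storm 1.6.4 and its dependencies, such as carl, are already built; see the linked documentation for the authoritative steps):
```
# Sketch only: create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install pycarl and stormpy from their master branches (prerequisite: a local Storm/carl build)
git clone https://github.com/moves-rwth/pycarl.git && cd pycarl && python setup.py develop && cd ..
git clone https://github.com/moves-rwth/stormpy.git && cd stormpy && python setup.py develop && cd ..

# Install this artifact from the root of this repository
python setup.py install
```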
## 2. Run using a Docker container
We provide a docker container. We assume you have Docker installed (if not, see the [Docker installation guide](https://docs.docker.com/get-docker/)). Then, run:
```
docker pull thombadings/upmdps:sttt
```
or in case you downloaded this container from an (unpacked) archive:
```
docker load -i upMDPs_STTT.tar
```
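In either case, you can verify that the image is available locally, for example (the exact repository name may differ for the image loaded from the archive):
```
docker images | grep upmdps
```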
Our Docker container is built upon a container for the probabilistic model checker Storm (see [this documentation](https://www.stormchecker.org/documentation/obtain-storm/docker.html) for details).
To use the docker container, open a terminal and navigate to the folder where you want to save the results. Then, run the following command (for Windows platforms, please see the [documentation on the Storm website](https://www.stormchecker.org/documentation/obtain-storm/docker.html#run-the-docker-image-windows)):
```
docker run --mount type=bind,source="$(pwd)",target=/opt/slurf/output -w /opt/slurf --rm -it --name slurf thombadings/slurf:cav22
docker run --mount type=bind,source="$(pwd)",target=/opt/upmdps/output -w /opt/upmdps --rm -it --name upmdps movesrwth/storm:1.6.4
```
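Instead of mounting the current working directory, you can also mount an explicit results directory (a sketch; the path `/home/user/results` is only an example):
```
docker run --mount type=bind,source="/home/user/results",target=/opt/upmdps/output -w /opt/upmdps --rm -it --name upmdps thombadings/upmdps:sttt
```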
You will see a prompt inside the docker container. The README in this folder is what you are reading. Now you are ready to run the code for a single model (Section 3) or to replicate the experiments presented in [1] (Section 4).
## 3. How to run for a single model?
We need two input files: (1) a parametric MDP in [the PRISM format](https://prismmodelchecker.org/manual/ThePRISMLanguage/Introduction), and (2) a file that describes the property to verify against. The current implementation assumes a uniform distribution over the parameters of the parametric MDP.
A minimal command to run the BRP (256,5) benchmark is as follows:
```
python3 sampler.py --model models/brp/brp_256_5.pm --property models/brp/brp.prctl --threshold 0.5 --bisimulation weak --comparator 'leq' --num_samples 10000 --beta 0.999 --outfile 'brp_256_5_output';
```
The `model` and `property` arguments refer to the PRISM model and the property file, respectively, and the `bisimulation` can either be strong, weak, or none. Moreover, the `threshold` is the probability for the property that we check against. Finally, `num_samples` is the number of parameter samples that is used, `beta` is the confidence level (for which we compute the lower bound on the satisfaction probability), and `outfile` is the file (prefix) in which we store the results (note that this is relative to the `output/` folder already).
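For example, the same benchmark can be run with fewer samples and a lower confidence level (a sketch; only the sample size, confidence level, and output prefix differ from the command above, and the prefix `brp_256_5_N1000` is arbitrary):
```
python3 sampler.py --model models/brp/brp_256_5.pm --property models/brp/brp.prctl --threshold 0.5 --bisimulation weak --comparator 'leq' --num_samples 1000 --beta 0.99 --outfile 'brp_256_5_N1000';
```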
For details on all possible arguments and their default values, see below.
### Inspecting results
The results for individual experiments are saved in the `output/` folder. Depending on the arguments provided to the script, at most three output files are created here: one .txt file with some logs, and two .csv files with results that are also shown in the tables in the main paper, [1].
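For instance, after the BRP run from Section 3, you could inspect the generated files as follows (a sketch; the exact file names depend on the `--outfile` prefix and the arguments used):
```
ls output/
cat output/brp_256_5_output*
```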
## 4. How to run experiments?
The tables in the experimental section of [1] can be reproduced by running the shell script `RUN_ALL.sh` in the `experiments/` folder. This script runs a number of subscripts that make calls to the Python code. All results are stored in the `output/` folder, and the tables in [1] can be replicated as described below.
Before running the experiments, we recommend removing any existing files/folders in the `output/` folder (except the `.keep` file).
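A typical invocation is then (a sketch, assuming you start from the root of this repository):
```
cd experiments
bash RUN_ALL.sh
```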
Table 1 can be replicated by collecting the CSV output files from the drone experiment that end with `_satprob.csv`. Three such CSV files are created, one for each wind condition in the experiment; each file represents a row in Table 1.
Note that warnings may pop up (related to the model checker Storm), but these can safely be ignored.
Table 3 can be replicated by collecting all CSV output files that end with `_confprob.csv` (except for the drone results, which are presented in Table 1). One such file is created for every benchmark instance, and each file represents one row in Table 3.
Table 4 can be replicated by collecting all CSV output files that end with `_satprob.csv` (except for the drone results, which are presented in Table 1). One such file is created for every benchmark instance; Table 4 is obtained by putting together the rows from these CSV files for the corresponding values of N.
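For example, the relevant CSV files can be listed and concatenated as follows (a sketch; the file names depend on which experiments you ran, and `table4_combined.csv` is an arbitrary name):
```
ls output/*_satprob.csv output/*_confprob.csv
cat output/*_satprob.csv > output/table4_combined.csv
```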
## 5. List of possible arguments
Below, we list all arguments that can be passed to the command for running the sampler (`sampler.py`).
## 6. Rebuilding the Docker container
The included Docker image is based on the Stormpy 1.6.4 image (by the Storm developers; see [this documentation](https://www.stormchecker.org/documentation/obtain-storm/docker.html) for details). If one makes changes to our source code, the Docker container must be built again using the included Dockerfile. Rebuilding the image can be done by executing the following command in the root directory (here, 1.0 indicates the version):
```
docker build -f Dockerfile --tag upmdps:1.0 .
```
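The rebuilt image can then be used in the same way as described in Section 2, for example (a sketch, assuming the tag `upmdps:1.0` from the command above):
```
docker run --mount type=bind,source="$(pwd)",target=/opt/upmdps/output -w /opt/upmdps --rm -it --name upmdps upmdps:1.0
```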