# Preliminaries

## Install Rust
See: [https://www.rust-lang.org/tools/install](https://www.rust-lang.org/tools/install)

## Install Anaconda
See: [https://docs.anaconda.com/anaconda/install/index.html](https://docs.anaconda.com/anaconda/install/index.html)

## (Recommended) install mamba for faster virtual environment installations
See: [https://github.com/mamba-org/mamba](https://github.com/mamba-org/mamba)

## Install and activate the virtual environment:
```bash
# Use `conda` in place of `mamba` if mamba is not installed.
mamba env create -f conda_environment.yml && \
conda activate natural_computing
```

# The dataset

## Obtain the raw data
Run the following commands:
```bash
mkdir -p data
wget https://nilsgolembiewski.nl/public_files/uploads/2IGcXY8HeE69JlgFk1QLCvBh7NRxAV/full_export.txt.gz -O - | gunzip -c > data/raw_data.txt
```

## Generate dataset from raw data
```bash
cargo run --manifest-path=data_generation/Cargo.toml --release -- -d ./data/raw_data.txt -o ./data/dataset -l 20
```

This will result in the following dataset:

Folder structure: `folder/<canvas_id>/<canvas_id>_<user_id>_<idx>_<label>_<info>.<data_type>`.
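
The naming scheme above can be taken apart programmatically. The following is a minimal sketch, not part of the repository: the helper `parse_sample_path` and the example path are hypothetical, and it assumes that the first four fields never contain an underscore while `<info>` may (as in `mask_points`).

```python
from pathlib import Path
from typing import NamedTuple


class SampleFile(NamedTuple):
    canvas_id: str
    user_id: str
    idx: str
    label: str
    info: str
    data_type: str


def parse_sample_path(path):
    """Split `<canvas_id>_<user_id>_<idx>_<label>_<info>.<data_type>` into fields.

    Assumption: only <info> may contain underscores.
    """
    p = Path(path)
    # maxsplit=4 keeps underscores inside <info> (e.g. "mask_points") intact.
    parts = p.stem.split("_", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected file name: {p.name}")
    canvas_id, user_id, idx, label, info = parts
    return SampleFile(canvas_id, user_id, idx, label, info, p.suffix.lstrip("."))


# Hypothetical example path; the IDs are made up for illustration.
sample = parse_sample_path("dataset/12/12_345_0_1_mask_points.txt")
print(sample.info, sample.data_type)
```

All fields are kept as strings because the README does not state their types.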

### `before.png`
The state of the canvas as it was before the modification.

### `delta.png`
The modifications made since the canvas was moved.

### `mask_points.txt` columns
`x`, `y`. The origin is at the top of the image: the first row is `y = 0`.

### `sequence.txt` columns
Each row concatenates the information of the latest placed pixel and of the previously placed pixel (if any):

- placed pixel: `canvas_id`, `user_id`, `x`, `y`, `r`, `g`, `b`, `timestamp`, `is_grief`
- previous pixel: `exists`, `user_id`, `r`, `g`, `b`, `timestamp`. If `exists` is zero, all other values are -1.
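
To make the column order concrete, here is a minimal sketch that names the fields of one row. It is an illustration only: the function `parse_sequence_row` is hypothetical, and it assumes that fields are separated by commas and/or whitespace, which the README does not specify.

```python
import re

PLACED_COLUMNS = ["canvas_id", "user_id", "x", "y", "r", "g", "b", "timestamp", "is_grief"]
PREVIOUS_COLUMNS = ["exists", "user_id", "r", "g", "b", "timestamp"]


def parse_sequence_row(line):
    """Split one row into a (placed, previous) pair of dicts.

    Assumption: fields are separated by commas and/or whitespace.
    Values stay strings because their exact types are not documented.
    `previous` is None when its `exists` field is zero.
    """
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    expected = len(PLACED_COLUMNS) + len(PREVIOUS_COLUMNS)
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    n = len(PLACED_COLUMNS)
    placed = dict(zip(PLACED_COLUMNS, fields[:n]))
    previous = dict(zip(PREVIOUS_COLUMNS, fields[n:]))
    if float(previous["exists"]) == 0:
        previous = None
    return placed, previous


# Made-up row: a pixel placed where no earlier pixel exists.
placed, previous = parse_sequence_row("1 2 3 4 255 0 0 1000 0 0 -1 -1 -1 -1 -1")
print(placed["x"], placed["y"], previous)
```

Returning `None` for a missing previous pixel avoids treating the `-1` placeholders as real colour or timestamp values.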


## Downloads
A pregenerated dataset can be downloaded here: [https://nilsgolembiewski.nl/public_files/uploads/fDhANiJtdVw7EZSoW3sFyunk6mRL9q/dataset.zip](https://nilsgolembiewski.nl/public_files/uploads/fDhANiJtdVw7EZSoW3sFyunk6mRL9q/dataset.zip).

The corresponding `train_metadata.yml` can be downloaded from [https://nilsgolembiewski.nl/public_files/uploads/dXuJ7lqc6WPKh4ebgfVOw523vnSAjN/train_metadata.yml.gz](https://nilsgolembiewski.nl/public_files/uploads/dXuJ7lqc6WPKh4ebgfVOw523vnSAjN/train_metadata.yml.gz).

Or use the following commands (unzipping may take a while):
```bash
mkdir -p data
cd data
wget https://nilsgolembiewski.nl/public_files/uploads/fDhANiJtdVw7EZSoW3sFyunk6mRL9q/dataset.zip -O dataset.zip \
    && unzip -q dataset.zip \
    && rm dataset.zip
wget https://nilsgolembiewski.nl/public_files/uploads/dXuJ7lqc6WPKh4ebgfVOw523vnSAjN/train_metadata.yml.gz -O - | gunzip -c > train_metadata.yml
```


# Training

## Vision model
### Train for the folds
```bash
python train_vision.py -t data/train_metadata.yml -j 4 -e configurations/train_vision/resnet_18.yml -o output/vision_models
```

### Inspect results
Run:
```bash
mlflow ui
```
Then open the printed link in a browser to view the results. Each fold is logged as a separate run, but the runs share a common `unique_id`, which can be found in each run's parameters.

The best models for each fold can be found in the output folder (`output/vision_models/<unique_id>`) if the command above was used.
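
The runs that share a `unique_id` can also be collected without the UI. This is a minimal sketch, assuming the `mlflow` Python package is installed, the script is started from the directory that holds the tracking data, and the runs were logged to the active (default) experiment; if `train_vision.py` logs elsewhere, pass the experiment explicitly. Both helper names are hypothetical.

```python
def unique_id_filter(unique_id):
    """Build an MLflow filter string that matches the `unique_id` parameter."""
    return f"params.unique_id = '{unique_id}'"


def runs_for_unique_id(unique_id):
    """Return the matching runs (a pandas DataFrame by default).

    Assumption: the runs live in the currently active experiment.
    """
    import mlflow  # imported lazily so the filter helper works without mlflow

    return mlflow.search_runs(filter_string=unique_id_filter(unique_id))


# "abc123" is a placeholder; copy the real value from the run parameters.
print(unique_id_filter("abc123"))
```

Calling `runs_for_unique_id` with the real `unique_id` should then give one row per fold, provided the assumptions above hold.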


# TODO
Mistake analysis: are the mistakes correlated?