Sometimes, our [local Science's cluster](Cluster/Slurm-Quickstart) is not enough for your computations: you may need too many nodes or too much compute time. In that case, the best strategy is to use our cluster for development and testing, and then run your production-ready code on the larger Snellius cluster.
Unlike Science's cluster, which is readily available to all TCM members, machine time on Snellius is distributed more restrictively: you have to file an application before you can actually run any code there. In this application, you need to estimate and justify how many resources (core-hours, memory, and storage) you need for a specific project. We recommend filing a so-called ["Small Application"](https://servicedesk.surf.nl/wiki/pages/viewpage.action?pageId=30660193), which is relatively easy. Note that it takes about 2 weeks for an application to be considered.
One caveat is that you may file only one application per year, so if you have several projects in mind, it is better to combine them into one application.
A comprehensive template and an example application are presented on the [official webpage](https://servicedesk.surf.nl/wiki/pages/viewpage.action?pageId=30660193), which we recommend consulting first. Below, we present some additional example applications specifically related to physics.
If you need any help preparing your Snellius application, feel free to ask Tom Westerhout.
## Example 1
### Description
Many interesting phenomena come from many-body effects in quantum systems. Describing such systems, however, is a very challenging task because of a) very fast scaling of the Hilbert space dimension with system size, and b) the so called "sign problem". One of the standard methods for tackling such problems is exact diagonalization. We are working on a state-of-the-art implementation of this technique that prioritizes user friendliness without sacrificing performance.
### Scientific project description
Exact diagonalization is one of the oldest and most established numerical methods for simulation of small quantum systems. Exponential scaling of the computational resources is the main limiting factor in its applicability, and requires highly parallel implementations if one wants to consider slightly larger systems. We are working on implementing a scalable and user-friendly exact diagonalization package where Chapel, instead of MPI, is used for both shared- and distributed-memory parallelism.
We have a very competitive single-node implementation that we have so far scaled to 16 nodes. By making use of Snellius, we hope to improve the scaling of our code to 50-100 nodes. We will tune our implementation so that reproducing numerical experiments such as https://doi.org/10.1103/PhysRevE.98.033309 becomes possible.
### Technical project requirements
We have a multilingual project where Chapel (https://chapel-lang.org/) is used for all shared- and distributed-memory parallelism. Kernels for the most compute-intensive operations are compiled and tuned for the target architecture using Halide (https://halide-lang.org/). All user-facing code is written in Haskell.
We will run most large-scale jobs on the thin CPU nodes since we will require 50 to 100 nodes at once. However, we also wish to explore the opportunity to compile our code for the GPUs (Halide supports this natively, and the Chapel team is working on it, with initial support available since version 1.29).
Assistance from SURF in tuning the code for the Snellius supercomputer would be greatly appreciated. We are familiar with EPYC processors and InfiniBand networks, but might be lacking some Snellius-specific details.
In total, we will require 1000000 SBUs, consisting of:
- 768000 SBUs on thin CPU nodes: 5 runs * 24 hours * 128 SBUs * 50 nodes
- The remaining 232000 SBUs to be used for testing and small-scale runs on both thin CPU and GPU nodes.
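The budget above can be sanity-checked with a quick calculation (a minimal sketch; the 128 SBU per node-hour rate for a thin CPU node is the one used in this example, not an official figure):

```python
# Rough SBU budget check for the breakdown above.
SBU_PER_NODE_HOUR = 128  # assumed rate for one thin CPU node
runs, hours, nodes = 5, 24, 50

production = runs * hours * nodes * SBU_PER_NODE_HOUR
total = 1_000_000
remaining = total - production  # left over for testing and small-scale runs

print(production)  # 768000
print(remaining)   # 232000
```

Writing the request this way (a few well-defined production runs plus an explicit testing reserve) makes the justification easy for reviewers to verify.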
We will also require 10 TB of project space since jobs operate on multi-TB objects in memory, and we wish to store both checkpoints and output files (which are 100-500 GB per run).