Parallel Computing
Introduction
As of version 2.0, NEST is capable of running simulations distributed over multiple computers. This feature is implemented on top of the Message Passing Interface (MPI). It allows NEST to simulate much larger networks than would fit into the memory of a single machine. Additionally, the overall connect time and the runtime of the simulation can be reduced. The following paragraphs describe the facilities for distributed computing in detail.
Build Requirements
To compile NEST for distributed computing, you need an implementation of the MPI library installed on your system. If you are on a cluster, you most likely have this already. We support MPICH 1.2.7, MPICH 2, OpenMPI and ScaliMPI. Several other implementations are known to work but may require a little tweaking. Most Linux distributions provide packaged versions of MPICH and OpenMPI. Note that in the case of a pre-packaged MPI library you will need both the library and the development packages.
Compilation
If the MPI library and header files are installed to the standard directories of your system, it is likely that a simple
./configure --with-mpi
will find them. If your MPI installation resides in a non-standard location, you may use
./configure --with-mpi=/path/to/mpi
to specify the base directory in which the MPI bin, include and lib directories are found. This variant is more flexible and can also be used if MPI has been installed from source.
Running distributed simulations
Distributed simulations cannot be run interactively, which means that the simulation has to be provided as a script. However, the script does not have to be changed compared to the script for a serial simulation: inter-process communication and node distribution are managed transparently inside NEST.
To run a simulation distributed on 128 processes, simply run
mpirun -np 128 nest script.sli
If using PyNEST, the command is
mpirun -np 128 python script.py
Note that PyNEST only works well together with OpenMPI. For details on the usage of mpirun, refer to the documentation of your MPI implementation.
Results are also saved in a distributed fashion, in files of the form label-r-t-g.ext, where label is the type of the device (e.g. voltmeter or spike_detector) or the label of the device (if set), r is the MPI rank of the process, t is the thread of the device and g is its global ID. ext is the file extension, which depends on the type of the device.
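The naming scheme above can be taken apart again when collecting results. The following is a minimal sketch in plain Python, not part of NEST: the helper parse_output_name and the file name used in the example (including the extension gdf) are assumptions made for illustration only.

```python
from collections import namedtuple

# Components of a file name of the form label-r-t-g.ext
OutputFile = namedtuple("OutputFile", "label rank thread gid ext")


def parse_output_name(filename):
    """Split a name of the form label-r-t-g.ext into its components.

    Hypothetical helper: splitting from the right keeps labels that
    themselves contain hyphens intact.
    """
    stem, ext = filename.rsplit(".", 1)
    label, rank, thread, gid = stem.rsplit("-", 3)
    return OutputFile(label, int(rank), int(thread), int(gid), ext)


# Illustrative file name; the actual extension depends on the device.
info = parse_output_name("spike_detector-0-1-4.gdf")
print(info.label, info.rank, info.thread, info.gid, info.ext)
```

Splitting from the right is a deliberate choice here, because a user-defined device label may contain hyphens while the three numeric fields cannot.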
Node distribution
In the default configuration, nodes are distributed over the virtual processes (VPs) by using a modulo operation: a node with global ID gid is assigned to the virtual process gid mod N_vp, where N_vp is the total number of virtual processes. This corresponds to a random distribution of nodes. If nothing is known about the activity in the network, this will probably be a good distribution, as the communication load is distributed evenly across the processes. This is shown for an example network in figure 1.
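The modulo assignment can be illustrated with a few lines of plain Python. This is only a sketch of the idea and does not call NEST: the function name assign_vp, the number of virtual processes and the range of global IDs are assumptions chosen for the example.

```python
from collections import Counter


def assign_vp(gid, n_vp):
    """Return the virtual process of a node under a plain modulo rule."""
    return gid % n_vp


n_vp = 4             # assumed number of virtual processes
gids = range(1, 13)  # assumed global IDs of twelve nodes

# Map every node to its virtual process and count the nodes per VP.
placement = {gid: assign_vp(gid, n_vp) for gid in gids}
load = Counter(placement.values())

print(placement)
print(load)
```

With consecutive global IDs, every virtual process receives the same number of nodes, which is why this rule spreads the load evenly when nothing is known about the network activity.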
However, for networks where the activity is coarsely known, better distributions can usually be found. To allow the user to specify which nodes are to be placed in the same virtual process, the model subnet has a boolean parameter children_on_same_vp, which can be used to achieve more fine-grained control over the distribution of nodes.
One important thing to note in a distributed scenario is that devices are replicated once for each virtual process (VP). The reason for this is mostly performance: for devices injecting signals into the network, we want to avoid unnecessary communication between VPs. Analog recording devices (e.g. voltmeter) send request events to the neurons they record from. This request mechanism works only locally and would be very costly over the network. For devices that record spikes, the main reason is to reduce the load on the hard disks by only writing local spikes. However, this leads to many data files that have to be sorted and combined manually after the simulation is finished. Devices only stimulate or record from nodes local to their VP.
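Combining the per-process data files after a simulation can be scripted. The sketch below uses only the Python standard library and rests on assumptions that are not stated in this document: that each line of a spike file holds a sender ID followed by a spike time, separated by whitespace, and that the files are named as in the example. The function merge_spike_files is hypothetical; adapt the parsing to the format your devices actually write.

```python
import glob
import os
import tempfile


def merge_spike_files(pattern):
    """Collect (time, sender) pairs from all matching files, sorted by time.

    Assumes each line reads "<sender id> <spike time>".
    """
    events = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            for line in handle:
                fields = line.split()
                if len(fields) < 2:
                    continue  # skip empty or malformed lines
                events.append((float(fields[1]), int(fields[0])))
    events.sort()
    return events


# Demonstration with two small made-up files, one per MPI rank.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "spike_detector-0-0-4.gdf"), "w") as f:
        f.write("1 12.5\n3 20.0\n")
    with open(os.path.join(tmp, "spike_detector-1-0-4.gdf"), "w") as f:
        f.write("2 3.0\n4 15.1\n")
    merged = merge_spike_files(os.path.join(tmp, "spike_detector-*-4.gdf"))

print(merged)
```

Storing the time first in each tuple lets the default tuple ordering sort the events chronologically, with the sender ID as a tie-breaker.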
MPI related commands and concepts
Each node provides some distribution related information via its status dictionary. The most important fields are
- vp: the VP a node is attached to
- thread: the id of the thread a node is attached to
- local: boolean that indicates if a node exists locally in a process
In MPI programs, each process has a unique identifier, its rank. Normally, you will not have to care about this number. However, in special cases it may be necessary to use it (e.g. for opening files with a different name for each process). The rank of a process is available via the command
Rank
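As an illustration of the file-naming use case mentioned above, the following plain Python sketch builds a distinct file name for each rank. The function rank_file_name and its parameters are hypothetical; in a PyNEST script the rank and the number of processes would presumably be obtained from nest.Rank() and nest.NumProcesses(), whereas here they are passed in as plain integers so that the example runs without NEST.

```python
def rank_file_name(prefix, rank, num_processes, ext="dat"):
    """Return a per-process file name such as 'log-007.dat'.

    The rank is zero-padded so that the names sort in rank order.
    """
    width = len(str(num_processes - 1))
    return f"{prefix}-{rank:0{width}d}.{ext}"


# Assumed values; in a real run each process would use its own rank.
names = [rank_file_name("log", rank, 128) for rank in (0, 7, 127)]
print(names)
```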
Extended addresses
As explained above, devices are replicated for each VP. Unless specified otherwise, all commands that operate on global IDs or addresses implicitly use the node on thread 0. The replicas on other threads can be accessed using extended addresses, which work as follows:
Assume that [0 1 2] is the address of a node. By appending the id of the thread to the normal address, we obtain the extended address of the node on that thread. The following example extracts the spike count from the spike_detector with global ID 4 on thread 2:
4 GetAddress 2 append GetStatus /n_events get
