Hi!

 

I have now created a GitHub issue for this:

 

https://github.com/nest/nest-simulator/issues/3108

 

Best,

Hans Ekkehard

 

-- 

 

Prof. Dr. Hans Ekkehard Plesser

 

Department of Data Science

Faculty of Science and Technology

Norwegian University of Life Sciences

PO Box 5003, 1432 Aas, Norway

 

Phone +47 6723 1560

Email hans.ekkehard.plesser@nmbu.no

Home http://arken.nmbu.no/~plesser

 

 

 

From: Hans Ekkehard Plesser <hans.ekkehard.plesser@nmbu.no>
Date: Tuesday, 20 February 2024 at 08:23
To: NEST User Mailing List <users@nest-simulator.org>
Subject: [NEST Users] Re: Problems when using a mask inside the conn_dict on 3 or 4 MPI processes

 

Hello Miriam,

 

I have explored further and found at least two independent bugs. The problems are caused by the combination of slicing a layer and using MPI.

 

I will create an issue on GitHub myself with further reduced reproducers.

 

Best,

Hans Ekkehard

 

 

 


 

 

 

From: Hans Ekkehard Plesser <hans.ekkehard.plesser@nmbu.no>
Date: Monday, 19 February 2024 at 20:33
To: NEST User Mailing List <users@nest-simulator.org>
Subject: [NEST Users] Re: Problems when using a mask inside the conn_dict on 3 or 4 MPI processes

 

Hello Miriam,

 

I have been able to reproduce the error. I assume it is related to the fact that you slice the layer of neurons, i.e., pick out individual neurons and use them as sources. I will have a closer look at this soon.

 

Ideally, such slicing should not be necessary. Rather, the need for it seems to point to a lack of support in NEST for certain connection patterns. We can look at that at a later stage.

 

Best,

Hans Ekkehard

 


 

 

 

From: Kempter, Miriam <m.kempter@fz-juelich.de>
Date: Friday, 9 February 2024 at 10:59
To: users@nest-simulator.org <users@nest-simulator.org>
Subject: [NEST Users] Problems when using a mask inside the conn_dict on 3 or 4 MPI processes


Dear NEST Community, 
 
While adapting my model to run on multiple MPI processes, I have run into problems when using masks in the connection dictionary for a population that is spatially distributed in 2D. 
 
A minimal example with further details is attached as a .txt file; to execute it, simply change the extension from "txt" to "py". The line numbers in the explanation below refer to this example. 
 

My setup:
    - NEST version 3.4
    - Python version 3.10
    - Executed with mpirun inside a conda environment
        "mpirun -np 4 python3 minEx_4MPIprocesses_problem.py" 
    - System: Ubuntu 22.04

     
    - The same problem occurred when the model was executed on JURECA. 
 
 
Problem description:
    While using multiple MPI processes:
         1. Create a circular mask (see line 119) and a spatial 2D population of neurons, "neurons" (see lines 125-132).
         2. Select some of the neurons as source neurons (see line 154).
         3. Set up the connection dictionary with the mask inside (see lines 157-163).
         4. For each source neuron (see line 173), connect it to the "neurons" population using the connection dictionary (see line 183).


Where does the problem occur?

    The error occurs when nest.Connect(…) is called in the loop at line 183, which connects each source neuron in turn.  


When does the problem occur?
 
    Whether it occurs depends on whether a mask is used in the connection dictionary passed to nest.Connect(...).  
  

    If the mask is removed, as in the connection dictionary in lines 164-169, no error occurs in any of the tested settings (however, the result is then not as desired). 
  

    If the mask is used, how the problem shows itself depends on the number of MPI processes and on the settings: number of neurons, extent, mask radius, number of source neurons, and whether edge wrap is used.
        - For 1 and 2 MPI processes the model runs correctly, regardless of the connection dictionary and settings used.
        - For 3 MPI processes, either
            the model runs, but the distances between connected neurons do not correspond to the mask dimensions; in other words, the established connections are longer or shorter than the mask should allow,
            or execution aborts with an error from mpirun ("segmentation fault").
                The terminal output for an example run is in the attached file "minEx_4MPIprocesses_problem_error_output_3_MPI".
        - For 4 MPI processes, either
            the model runs correctly,
            or it produces a "segmentation fault" error and does not terminate (when run in a terminal, it must be stopped with Ctrl+C). The terminal output for an example run is in the attached file "minEx_4MPIprocesses_problem_error_output_4_MPI".
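To make the "wrong distance" symptom concrete, one way to check whether established connections respect a circular mask is to compare each source-target distance against the mask radius. The following is a minimal plain-Python sketch, not NEST code: it assumes positions are available as a dict of (x, y) tuples (for example collected beforehand with nest.GetPosition) and connections as (source, target) pairs; the helper names and the input format are hypothetical. With edge wrap, it assumes periodic boundaries, so along each axis the shorter of the direct and wrapped-around separations counts.

```python
import math


def wrapped_distance(p, q, extent=None):
    """2D Euclidean distance; with extent=(width, height), periodic boundaries are assumed."""
    deltas = []
    for axis in range(2):
        d = abs(p[axis] - q[axis])
        if extent is not None:
            d = min(d, extent[axis] - d)  # shorter way around the torus
        deltas.append(d)
    return math.hypot(*deltas)


def connections_outside_mask(positions, connections, radius, extent=None, tol=1e-9):
    """Return the (source, target) pairs that are farther apart than the mask radius.

    positions   -- dict mapping node ID to an (x, y) tuple (hypothetical input format)
    connections -- iterable of (source_id, target_id) pairs
    radius      -- radius of the circular mask
    extent      -- (width, height) of the layer if edge wrap is used, else None
    """
    return [
        (s, t)
        for s, t in connections
        if wrapped_distance(positions[s], positions[t], extent) > radius + tol
    ]


# Hypothetical example: node 3 lies outside a mask of radius 0.5 around node 1,
# unless edge wrap on a 1.0 x 1.0 layer brings it within 0.1 of node 1.
positions = {1: (0.0, 0.0), 2: (0.3, 0.0), 3: (0.9, 0.0)}
conns = [(1, 2), (1, 3)]
print(connections_outside_mask(positions, conns, radius=0.5))  # [(1, 3)]
print(connections_outside_mask(positions, conns, radius=0.5, extent=(1.0, 1.0)))  # []
```

A non-empty result would indicate connections that the mask should not have allowed. The small tolerance guards against floating-point round-off at the mask boundary, which this sketch treats as inside the mask (an assumption).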
        

    If the model is run with the same setting on 3 and on 4 MPI processes, there are three possible combinations of the problems described above:
        1. The model runs and terminates in both cases; however, the connection distances on 3 MPI processes are wrong.
        2. The model works on 4 MPI processes but produces a "segmentation fault" on 3 MPI processes.
        3. The model produces a "segmentation fault" in both cases.
 
Please keep in mind that whether an error occurs, and in which form, depends strongly on the setting used. The minimal example provides different settings that represent the cases described above, but it might not cover everything that can occur. 
 
Workaround:
For my own use case I found a workaround, but it is rather costly in computing time.  
It involves applying nest.SelectNodesByMask() to every source neuron and then choosing the targets from the resulting set with the desired probability. 
This requires several additional loops, and the position data of every neuron must be communicated to every MPI process at the beginning. 
With this approach, the connection distances seem to be correct and no error occurs during execution. However, while writing this I started to question whether I have tested it enough, so there might be problems I have not yet discovered. 
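The workaround's selection logic (mask first, then an independent trial with the desired probability for each candidate) can be emulated in plain Python as a rough sketch. It does not call NEST: the distance filter merely stands in for nest.SelectNodesByMask(), and the grid, node IDs, inclusive mask boundary, and exclusion of the source itself are all illustrative assumptions.

```python
import math
import random


def wrapped_distance(p, q, extent=None):
    """2D Euclidean distance; with extent=(width, height), periodic boundaries are assumed."""
    deltas = []
    for axis in range(2):
        d = abs(p[axis] - q[axis])
        if extent is not None:
            d = min(d, extent[axis] - d)  # shorter way around the torus
        deltas.append(d)
    return math.hypot(*deltas)


def select_targets(source, positions, radius, p, rng, extent=None):
    """Emulate the workaround for one source neuron (hypothetical helper).

    Step 1: collect all other nodes within `radius` of the source
            (the role nest.SelectNodesByMask() plays in the real workaround).
    Step 2: keep each candidate independently with probability p.
    """
    src_pos = positions[source]
    candidates = [
        node
        for node, pos in sorted(positions.items())
        if node != source  # assumption: no self-connections
        and wrapped_distance(src_pos, pos, extent) <= radius  # boundary counts as inside
    ]
    return [node for node in candidates if rng.random() < p]


# Hypothetical 5 x 5 grid with spacing 0.2 in a 1.0 x 1.0 layer; node ID = 5 * i + j.
positions = {5 * i + j: (0.2 * i, 0.2 * j) for i in range(5) for j in range(5)}
rng = random.Random(42)

# p = 1.0 keeps every candidate, which makes the mask step easy to inspect.
print(select_targets(0, positions, radius=0.25, p=1.0, rng=rng))  # [1, 5]
print(select_targets(0, positions, radius=0.25, p=1.0, rng=rng, extent=(1.0, 1.0)))  # [1, 4, 5, 20]
```

Drawing from one seeded random.Random instance keeps the selection reproducible. The chosen targets would then still have to be connected in NEST separately, which this sketch does not cover.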
 

Is there something I have overlooked or done wrong when using masks in the connection dictionary with multiple MPI processes?  
 
Thanks in advance!  
 
     
Best, 
Miriam Kempter

 



------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
Forschungszentrum Jülich GmbH
52425 Jülich
Registered office: Jülich
Registered in the commercial register of the Düren District Court (Amtsgericht Düren), No. HR B 3498
Chairman of the Supervisory Board: MinDir Stefan Müller
Management: Prof. Dr. Astrid Lambrecht (Chair),
Karsten Beneke (Deputy Chair), Dr. Ir. Pieter Jansens
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------