Examples of GYACOMO23 performance are presented here to give some intuition about how to distribute the data among processes for parallel runs.
Marconi, 21x6x192x96x24, single precision
We parallelized this simulation on two Marconi SKL nodes (48 CPUs each) with np=2, nky=12, nz=4, i.e. 2 x 12 x 4 = 96 processes in total.
The profile (cf. figure below) shows the time spent in each routine for one CPU.
Local (Coffee Lake i7), 3x2x64x48x16, single precision
Here, a small problem is solved locally (6 CPUs) with np=1, nky=3, nz=2.
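As a quick sanity check on the two decompositions above, the short Python sketch below multiplies the three split parameters and compares the result with the number of CPUs quoted for each machine. It assumes that np, nky and nz are the numbers of processes along each parallelized direction and that the total process count is their product; this is consistent with the numbers given here but is not taken from the GYACOMO source. The function name `process_count` and the `runs` dictionary are hypothetical helpers written for this illustration only.

```python
def process_count(n_p: int, n_ky: int, n_z: int) -> int:
    """Total number of processes for a given (np, nky, nz) split.

    Assumption: the total is the product of the three split parameters.
    """
    return n_p * n_ky * n_z


# Split parameters and CPU counts as quoted in the two examples.
runs = {
    "Marconi (2 SKL nodes)": {"split": (2, 12, 4), "cpus": 2 * 48},
    "Local (Coffee Lake i7)": {"split": (1, 3, 2), "cpus": 6},
}

for name, run in runs.items():
    n = process_count(*run["split"])
    status = "matches" if n == run["cpus"] else "does not match"
    print(f"{name}: {n} processes, {status} the {run['cpus']} available CPUs")
```

The same check can be applied before submitting a new run: if the product of the split parameters differs from the number of CPUs requested, the decomposition probably needs to be revisited.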
The profiling follows: