Parallel performance analysis · Wiki · Antoine Cyril David Hoffmann / Gyacomo

Examples of GYACOMO23 performance are presented here to give some intuition on how to distribute the data among processes for parallel runs.

Marconi, 21x6x192x96x24, single precision

We parallelized this simulation on two Marconi SKL nodes (48 CPUs each) with np=2, nky=12, nz=4 (2 x 12 x 4 = 96 processes). The profiling (cf. the figure below) shows the time spent in each routine for one CPU.
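As a rough illustration of what this distribution means per process, the sketch below computes the total number of processes and the largest local block along each dimension. It assumes that the grid is ordered as p x j x kx x ky x z, that np, nky and nz split the p, ky and z dimensions respectively, and that points are split as evenly as possible. The function names are hypothetical and the remainder handling is not taken from the GYACOMO source.

```python
from math import prod


def split_extent(n, nparts):
    """Split n grid points over nparts processes as evenly as possible.

    Assumption: the first (n % nparts) processes get one extra point.
    This is an illustrative choice, not necessarily what GYACOMO does.
    """
    base, rem = divmod(n, nparts)
    return [base + 1 if r < rem else base for r in range(nparts)]


def decomposition(grid, procs):
    """Return (total process count, per-dimension list of local block sizes).

    grid  : dict mapping dimension name -> global number of points
    procs : dict mapping dimension name -> number of processes along it
            (dimensions not listed are not parallelized)
    """
    total = prod(procs.values())
    local = {name: split_extent(n, procs.get(name, 1)) for name, n in grid.items()}
    return total, local


# Marconi case: 21x6x192x96x24 with np=2, nky=12, nz=4
# (dimension names and ordering are assumptions, see the text above)
grid = {"p": 21, "j": 6, "kx": 192, "ky": 96, "z": 24}
procs = {"p": 2, "ky": 12, "z": 4}

total, local = decomposition(grid, procs)
print("processes:", total)
print("largest local block:", {name: max(sizes) for name, sizes in local.items()})
```

Under these assumptions the 96 processes each hold at most 11 x 6 x 192 x 8 x 6 points. Since 21 is not divisible by 2, the split along p is slightly uneven (11 and 10 points), which is one thing to keep in mind when choosing np.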

Local (Coffee Lake i7), 3x2x64x48x16, single precision

Here, a small problem is solved locally on 6 CPUs with np=1, nky=3, nz=2. The profiling follows: