Performance Considerations

When running Monte Carlo simulations, consider the performance of your system so that large datasets are processed efficiently. The dual energy inversion algorithm is computationally intensive and needs a few thousand realizations to produce reliable statistical estimates. For example, \(5000\) Monte Carlo realizations on a modest \(1000^3\)-voxel image result in 5 trillion solution searches for the governing system of equations.
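
As a quick check of that figure, assuming one solution search per voxel per realization (the symbols \(N_\text{searches}\), \(N_\text{realizations}\), and \(N_\text{voxels}\) are labels introduced here for illustration only):

\[
N_\text{searches} = N_\text{realizations} \times N_\text{voxels} = 5000 \times 1000^3 = 5 \times 10^{12}
\]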

To get the most out of your system, it helps to understand how the calculation is organized. Here are some key considerations:

  • The algorithm processes one chunk at a time, with all MPI processes cooperating by dividing the voxels of that chunk among themselves. After each chunk is done, a checkpoint is written to the file system, so the simulation can be resumed from that point if it crashes.

  • Choosing Chunk Size: A smaller chunk size results in more frequent checkpoints. However, if the chunk size is too small, efficiency drops because of the overhead of reading and writing data. This is most noticeable when using GPUs, as data has to be transferred back and forth between system memory and GPU memory. Based on our early tests, a chunk size of around (250, 250, 250) seems to be a good compromise between memory access and checkpoint frequency.

  • Number of MPI Processes for CPU processing: The number of MPI processes determines how many processes will cooperate to process each chunk. When using only CPUs, choose the number of MPI processes equal to the maximum number of parallel slots available.

  • Number of MPI Processes for GPU processing: When using GPUs, RockVerse will automatically distribute the MPI processes as evenly as possible across the GPU devices. The optimum number of MPI processes is therefore a multiple of the number of available GPU devices. If you’re working with large-memory devices, start with the number of MPI processes equal to the number of available GPUs. If you see that the GPUs are under-utilized, increase the number of MPI processes but keep it a multiple of the number of GPU devices, for example twice as many, so that each GPU runs two parallel processes. If you encounter memory problems, reduce the chunk size.
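
The bookkeeping behind these guidelines can be sketched in plain Python. This is only an illustration and does not use the RockVerse API: the function names `count_chunks`, `voxels_per_process`, and `assign_gpus` are hypothetical, and the round-robin mapping of processes to GPUs is an assumption, since the text above only states that processes are distributed as evenly as possible.

```python
import math


def count_chunks(image_shape, chunk_shape):
    """Number of chunks (one checkpoint each) needed to cover the image."""
    return math.prod(math.ceil(n / c) for n, c in zip(image_shape, chunk_shape))


def voxels_per_process(chunk_shape, n_procs):
    """Split the voxels of one chunk as evenly as possible across processes."""
    total = math.prod(chunk_shape)
    base, extra = divmod(total, n_procs)
    # The first `extra` ranks take one additional voxel each.
    return [base + 1 if rank < extra else base for rank in range(n_procs)]


def assign_gpus(n_procs, n_gpus):
    """Assumed round-robin mapping of MPI ranks to GPU device indices."""
    return [rank % n_gpus for rank in range(n_procs)]


image_shape = (1000, 1000, 1000)
chunk_shape = (250, 250, 250)

n_chunks = count_chunks(image_shape, chunk_shape)
shares = voxels_per_process(chunk_shape, n_procs=8)
gpu_map = assign_gpus(n_procs=8, n_gpus=4)

print(f"chunks to process: {n_chunks}")
print(f"voxels per process in one chunk: {shares}")
print(f"GPU index per MPI rank: {gpu_map}")
```

With 8 processes on 4 GPUs, each device hosts two processes, matching the "twice as many" example above. A process count that is not a multiple of the GPU count would leave some devices with more processes than others.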

The considerations above are general guidelines. Ultimately, the key to achieving optimal performance is to test and benchmark different simulation parameters on your specific system configuration.