Preparing the Dataset#

Now that we have successfully imported the C04B21 dual-energy CT dataset in the previous part of the tutorial, we can proceed with preparing the dataset for the Monte Carlo inversion. In this part, we will perform the necessary pre-processing steps to get the data ready, which include:

  • Creating the dual-energy class instance.

  • Importing the original images.

  • Assigning a mask and specifying the standard properties.

  • Running the pre-processing step to generate the inversion coefficients.

  • Performing appraisal.

Start a parallel MPI cluster#

In this tutorial, we will process data directly within the Jupyter Notebook, in a parallel MPI (Message Passing Interface) environment using ipyparallel.

The machine used in this tutorial has 8 GPUs available, which will significantly enhance processing speed due to the parallel computation capabilities of RockVerse. Let’s create a cluster with a set of 8 MPI engines. RockVerse will automatically assign one MPI process to each GPU.

[1]:
import ipyparallel as ipp

# Create an MPI cluster with 8 engines
cluster = ipp.Cluster(engines="mpi", n=8)

# Start and connect to the cluster
rc = cluster.start_and_connect_sync()

# Enable IPython magics for parallel processing
rc[:].activate()

Starting 8 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>

This will enable the %%px cell magic, which allows RockVerse to perform parallel processing interactively within this Jupyter notebook.

Create the Dual Energy group and import the CT images#

Next, let’s create the dual energy group and copy the images we imported in the previous part of this tutorial.

The copy_image method below will create a new image in the Zarr store, giving us the opportunity to change the chunk size. Monte Carlo dual energy processing utilizes all MPI processes to share the load of each individual chunk. Once processing for each chunk is finished, RockVerse will write a checkpoint to the file system, allowing the simulation to be resumed in case it crashes midway.

Therefore, prefer smaller chunks if you want frequent checkpoints, but not so small that the read/write overhead becomes considerable. Considering our image shape and the 8 available GPUs, we’ll copy the original CT scans creating 16 chunks.

[ ]:
%%px --block

import matplotlib.pyplot as plt
from IPython.display import display
import rockverse as rv

# Create the Dual Energy CT group
dectgroup = rv.dect.create_group(
    store='/path/to/dual_energy_ct/C04B21',
    overwrite=True)

# Copy the low energy CT image
dectgroup.copy_image(
    image=rv.open(store='/path/to/imported/dual_energy_carbonate/C04B21Raw100keV'),
    name='lowECT',
    chunks=16,
    overwrite=True)

# Copy the high energy CT image
dectgroup.copy_image(
    image=rv.open(store='/path/to/imported/dual_energy_carbonate/C04B21Raw140keV'),
    name='highECT',
    chunks=16,
    overwrite=True)

[stdout:0] [2025-07-28 11:36:12] Copying: 100% 16/16 [00:03<00:00,  5.24chunk/s]
[2025-07-28 11:36:15] Copying: 100% 16/16 [00:02<00:00,  5.45chunk/s]

Now, let’s take a quick look at the data using the orthogonal viewer:

[3]:
%%px --block

# Create orthogonal viewers for low and high energy images
lowE_viewer = rv.OrthogonalViewer(image=dectgroup.lowECT, histogram_bins=2**10)
highE_viewer = rv.OrthogonalViewer(image=dectgroup.highECT, histogram_bins=2**10)

# Matplotlib's figure object can be accessed through
# the figure property in the orthogonal viewer.
# Let's increase the figure size
lowE_viewer.figure.set_size_inches(10, 10)
highE_viewer.figure.set_size_inches(10, 10)

# Each process will create its own repeated image,
# let's close all but rank zero:
if rv.mpi_rank != 0:
    plt.close(lowE_viewer.figure)
    plt.close(highE_viewer.figure)
[stdout:0] [2025-07-28 11:36:19] Histogram Low energy attenuation (min/max): 100% 16/16 [00:00<00:00, 92.63chunk/s]
[2025-07-28 11:36:19] Histogram Low energy attenuation (counting voxels): 100% 16/16 [00:02<00:00,  6.53chunk/s]
[2025-07-28 11:36:25] Histogram High energy attenuation (min/max): 100% 16/16 [00:00<00:00, 86.90chunk/s]
[2025-07-28 11:36:25] Histogram High energy attenuation (counting voxels): 100% 16/16 [00:01<00:00,  9.31chunk/s]

[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_5_3.png
[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_5_5.png

Intermediate step (optional): perform image segmentation#

Now that the original CT images have been successfully imported, the next step is to define a mask. The mask helps optimize computational resources by excluding voxels that are not relevant to our analysis. Additionally, we need to provide information about the calibration materials, namely compositions and probability density functions for the X-ray attenuation.

This information comes from image segmentation. If a segmentation or a mask image have already been created elsewhere, you can use the copy_image method to import them (refer to the API documentation for detailed instructions). However, the original dataset from the Digital Rocks Portal does not include this information. Nevertheless, the rock sample and the standard materials are reasonably aligned with the image’s z-axis. Let’s quickly build a (not very accurate but useful for our tutorial…) segmentation image using RockVerse’s cylindrical regions. A little trial and error is all it takes in our case:

Air Region#

[4]:
%%px --block

# Adjusting viewer properties will help us in this task
highE_viewer.update_image_dict(clim=(-1200, 3000))
highE_viewer.mask_color = 'gold'
highE_viewer.mask_alpha = 0.5

# This is the final cylindrical region for probing air attenuation
air_region = rv.region.Cylinder(p=(126, 20, 461), v=(0, 0, 1), r=10, l=750)

# Set the region in the viewer and visualize the result
highE_viewer.region = air_region

# Changing region rebuilds the histogram. Let's set scales again
highE_viewer.ax_histogram.set_xlim(-1250, 3050)
highE_viewer.ax_histogram.set_ylim(0, 5e5)

# Only display the figure for rank 0
if rv.mpi_rank == 0:
    display(highE_viewer.figure)
[stdout:0] [2025-07-28 11:36:38] Histogram High energy attenuation (min/max): 100% 16/16 [00:00<00:00, 24.40chunk/s]
[2025-07-28 11:36:38] Histogram High energy attenuation (counting voxels): 100% 16/16 [00:00<00:00, 65.83chunk/s]

[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_7_3.png

Water region#

[5]:
%%px --block

# Final water region
water_region = rv.region.Cylinder(p=(55, 176.2, 461), v=(0, 0, 1), r=10, l=900)

# Adjust the viewer and display for rank 0
highE_viewer.region = water_region
highE_viewer.ref_point = water_region.p
highE_viewer.ax_histogram.set_xlim(-1250, 3050)
highE_viewer.ax_histogram.set_ylim(0, 3e5)

if rv.mpi_rank == 0:
    display(highE_viewer.figure)
[stdout:0] [2025-07-28 11:36:43] Histogram High energy attenuation (min/max): 100% 16/16 [00:00<00:00, 28.33chunk/s]
[2025-07-28 11:36:43] Histogram High energy attenuation (counting voxels): 100% 16/16 [00:00<00:00, 67.40chunk/s]

[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_9_3.png

Silica region#

[6]:
%%px --block

# Final silica region
silica_region = rv.region.Cylinder(p=(124.7, 228, 461), v=(0, 0, 1), r=6, l=900)

# Adjust the viewer and display for rank 0
highE_viewer.region = silica_region
highE_viewer.ref_point = silica_region.p
highE_viewer.ax_histogram.set_xlim(-1250, 3050)
highE_viewer.ax_histogram.set_ylim(0, 1e5)

if rv.mpi_rank == 0:
    display(highE_viewer.figure)
[stdout:0] [2025-07-28 11:36:49] Histogram High energy attenuation (min/max): 100% 16/16 [00:00<00:00, 27.63chunk/s]
[2025-07-28 11:36:49] Histogram High energy attenuation (counting voxels): 100% 16/16 [00:00<00:00, 68.04chunk/s]

[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_11_3.png

Teflon region#

[7]:
%%px --block

# Final teflon region
teflon_region = rv.region.Cylinder(p=(196, 179, 461), v=(0, 0, 1), r=8.5, l=900)

# Adjust the viewer and display for rank 0
highE_viewer.region = teflon_region
highE_viewer.ref_point = teflon_region.p
highE_viewer.ax_histogram.set_xlim(-1250, 3050)
highE_viewer.ax_histogram.set_ylim(0, 2e5)

if rv.mpi_rank == 0:
    display(highE_viewer.figure)
[stdout:0] [2025-07-28 11:36:55] Histogram High energy attenuation (min/max): 100% 16/16 [00:00<00:00, 28.66chunk/s]
[2025-07-28 11:36:55] Histogram High energy attenuation (counting voxels): 100% 16/16 [00:00<00:00, 69.17chunk/s]

[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_13_3.png

Rock sample region#

[8]:
%%px --block --group-outputs=type

# Final rock region
rock_region = rv.region.Cylinder(p=(126, 114, 461), v=(0, 0, 1), r=63, l=875)

# Adjust the viewer and display for rank 0
highE_viewer.region = rock_region
highE_viewer.ref_point = rock_region.p
highE_viewer.ax_histogram.set_xlim(-1250, 3050)
highE_viewer.ax_histogram.set_ylim(0, 2e6)

if rv.mpi_rank == 0:
    display(highE_viewer.figure)
[stdout:0] [2025-07-28 11:37:01] Histogram High energy attenuation (min/max): 100% 16/16 [00:00<00:00, 22.90chunk/s]
[2025-07-28 11:37:01] Histogram High energy attenuation (counting voxels): 100% 16/16 [00:00<00:00, 18.18chunk/s]

[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_15_3.png

Combined segmentation image#

Now, we can use these regions to create the final segmentation image:

[9]:
%%px --block

# Create the segmentation voxel image inside the dual energy group
dectgroup.create_segmentation(fill_value=0, overwrite=True)

# Use the VoxelImage math method to assign each region
dectgroup.segmentation.math(value=1, op='set', region=air_region)     # Air will be phase 1
dectgroup.segmentation.math(value=2, op='set', region=water_region)   # Water will be phase 2
dectgroup.segmentation.math(value=3, op='set', region=silica_region)  # Silica will be phase 3
dectgroup.segmentation.math(value=4, op='set', region=teflon_region)  # Teflon will be phase 4
dectgroup.segmentation.math(value=5, op='set', region=rock_region)    # Rock sample will be phase 5

# Adjust the viewer and display for rank 0
highE_viewer.region = None
highE_viewer.segmentation = dectgroup.segmentation
highE_viewer.ref_point = rock_region.p
highE_viewer.ax_histogram.set_xlim(-1250, 3050)
highE_viewer.ax_histogram.set_ylim(0, 2.6e7)

if rv.mpi_rank == 0:
    display(highE_viewer.figure)
[stdout:0] [2025-07-28 11:37:08] (segmentation) Set: 100% 16/16 [00:03<00:00,  4.81chunk/s]
[2025-07-28 11:37:12] (segmentation) Set: 100% 16/16 [00:00<00:00, 32.33chunk/s]
[2025-07-28 11:37:12] (segmentation) Set: 100% 16/16 [00:00<00:00, 44.61chunk/s]
[2025-07-28 11:37:13] (segmentation) Set: 100% 16/16 [00:00<00:00, 48.65chunk/s]
[2025-07-28 11:37:13] (segmentation) Set: 100% 16/16 [00:00<00:00, 42.84chunk/s]
[2025-07-28 11:37:14] Histogram High energy attenuation (min/max): 100% 16/16 [00:00<00:00, 101.32chunk/s]
[2025-07-28 11:37:14] Histogram High energy attenuation (counting voxels): 100% 16/16 [00:01<00:00,  9.32chunk/s]
[2025-07-28 11:37:17] Histogram High energy attenuation (min/max): 100% 16/16 [00:00<00:00, 105.06chunk/s]
[2025-07-28 11:37:17] Histogram High energy attenuation (reading segmentation): 100% 16/16 [00:00<00:00, 107.79chunk/s]
[2025-07-28 11:37:18] Histogram High energy attenuation (counting voxels): 100% 16/16 [00:02<00:00,  7.54chunk/s]

[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_17_3.png

Assigning the mask image#

Our segmentation result conveniently delivers the inversion mask (voxels we want to exclude from our inversion) through segmentation phase 0. Let’s assign it to our dect group:

[10]:
%%px --block

# Create the empty mask
dectgroup.create_mask(fill_value=False, overwrite=True)

# Use VoxelImage math method to set mask to True where segmentation is 0
dectgroup.mask.math(value=True,
                    op='set',
                    segmentation=dectgroup.segmentation,
                    phases=(0,))

# Adjust the viewer and display for rank 0
highE_viewer.mask = dectgroup.mask
highE_viewer.mask_color = 'k'
highE_viewer.mask_alpha = 0.75
highE_viewer.ax_histogram.set_xlim(-1250, 3050)
highE_viewer.ax_histogram.set_ylim(0, 2e6)

if rv.mpi_rank == 0:
    display(highE_viewer.figure)

[stdout:0] [2025-07-28 11:37:25] (mask) Set: 100% 16/16 [00:00<00:00, 34.15chunk/s]
[2025-07-28 11:37:26] Histogram High energy attenuation (min/max): 100% 16/16 [00:00<00:00, 39.81chunk/s]
[2025-07-28 11:37:26] Histogram High energy attenuation (reading segmentation): 100% 16/16 [00:00<00:00, 507.53chunk/s]
[2025-07-28 11:37:26] Histogram High energy attenuation (counting voxels): 100% 16/16 [00:00<00:00, 17.83chunk/s]

[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_19_3.png

The black voxels in the image above will be ignored during the Monte Carlo inversion. Let’s rebuild both viewers with all the updates so far. This time we will increase the number of histogram bins and write proper labels for the segmentation phases.

[11]:
%%px --block

# Common properties can be set at once at OrthogonalViewer creation
kwargs = {
    'region': None,
    'mask': dectgroup.mask,
    'mask_color': 'k',
    'mask_alpha': 0.75,
    'histogram_bins': 5000,
    'segmentation': dectgroup.segmentation,
    'ref_point': rock_region.p,
}

lowE_viewer = rv.OrthogonalViewer(image=dectgroup.lowECT, **kwargs)
highE_viewer = rv.OrthogonalViewer(image=dectgroup.highECT, **kwargs)

# Fine tuning viewer properties
for viewer in [lowE_viewer, highE_viewer]:
    viewer.figure.set_size_inches(10, 10)    # figure size
    viewer.update_image_dict(clim=(-1200, 4000)) # X-ray CT clims
    viewer.ax_histogram.set_xlim(-1250, 4500)  # Histogram limits
    # Set segmentation names in the legend labels
    viewer.ax_histogram.legend(
        [
            highE_viewer.histogram_lines['full'],
            highE_viewer.histogram_lines['1'],
            highE_viewer.histogram_lines['2'],
            highE_viewer.histogram_lines['3'],
            highE_viewer.histogram_lines['4'],
            highE_viewer.histogram_lines['5'],
        ], [
            'Full',
            'Air',
            'Water',
            'SiO2',
            'Teflon',
            'Rock'
        ]
    )

# Close all but rank 0
if rv.mpi_rank != 0:
    plt.close(lowE_viewer.figure)
    plt.close(highE_viewer.figure)

[stdout:0] [2025-07-28 11:37:34] Histogram Low energy attenuation (min/max): 100% 16/16 [00:00<00:00, 60.42chunk/s]
[2025-07-28 11:37:34] Histogram Low energy attenuation (reading segmentation): 100% 16/16 [00:00<00:00, 438.91chunk/s]
[2025-07-28 11:37:34] Histogram Low energy attenuation (counting voxels): 100% 16/16 [00:03<00:00,  5.01chunk/s]
[2025-07-28 11:37:42] Histogram High energy attenuation (min/max): 100% 16/16 [00:00<00:00, 61.05chunk/s]
[2025-07-28 11:37:42] Histogram High energy attenuation (reading segmentation): 100% 16/16 [00:00<00:00, 455.31chunk/s]
[2025-07-28 11:37:42] Histogram High energy attenuation (counting voxels): 100% 16/16 [00:02<00:00,  5.37chunk/s]

[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_21_3.png
[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_21_5.png

Filling Standard Material Information and running pre-process#

Now we can finally populate the information and the X-ray attenuation probability density functions (PDFs) for the standard materials. There are two ways to inform RockVerse about these PDFs:

  1. By passing a two-element list or tuple containing the x (attenuation values) and y (PDF values) as Numpy arrays for an arbitrary PDF model. Input PDF values do not need to be normalized, as RockVerse will handle the normalization when assigning the values.

  2. By passing a tuple with mean and standard deviation for a gaussian (normal) model.

Passing the PDFs as x, y arrays#

In our case, the simplest (although not very accurate…) way is to use option 1 with the segmentation histograms we have already calculated in the orthogonal viewers (ensure you have a high number of bins to avoid inversion errors due to histogram interpolation). Let’s try this first:

[12]:
%%px --block

dectgroup.calibration_material[0].description = 'Air'
dectgroup.calibration_material[0].lowE_pdf = (
    lowE_viewer.histogram.bin_centers, lowE_viewer.histogram.count[1].values)
dectgroup.calibration_material[0].highE_pdf = (
    highE_viewer.histogram.bin_centers, highE_viewer.histogram.count[1].values)

dectgroup.calibration_material[1].description = 'Water'
dectgroup.calibration_material[1].bulk_density = 1.0
dectgroup.calibration_material[1].composition = {'H': 2, 'O': 1}
dectgroup.calibration_material[1].lowE_pdf = (
    lowE_viewer.histogram.bin_centers, lowE_viewer.histogram.count[2].values)
dectgroup.calibration_material[1].highE_pdf = (
    highE_viewer.histogram.bin_centers, highE_viewer.histogram.count[2].values)

dectgroup.calibration_material[2].description = 'SiO2'
dectgroup.calibration_material[2].bulk_density = 2.2
dectgroup.calibration_material[2].composition = {'Si': 1, 'O': 2}
dectgroup.calibration_material[2].lowE_pdf = (
    lowE_viewer.histogram.bin_centers, lowE_viewer.histogram.count[3].values)
dectgroup.calibration_material[2].highE_pdf = (
    highE_viewer.histogram.bin_centers, highE_viewer.histogram.count[3].values)

dectgroup.calibration_material[3].description = 'Teflon'
dectgroup.calibration_material[3].bulk_density = 2.2
dectgroup.calibration_material[3].composition = {'C': 2, 'F': 4}
dectgroup.calibration_material[3].lowE_pdf = (
    lowE_viewer.histogram.bin_centers, lowE_viewer.histogram.count[4].values)
dectgroup.calibration_material[3].highE_pdf = (
    highE_viewer.histogram.bin_centers, highE_viewer.histogram.count[4].values)

There is a special function in the DECTGroup class for quick PDF visualization:

[13]:
%%px --block

fig, _ = dectgroup.view_pdfs(bins=30, percentile_interval=(0.1, 99.9), plot_cdfs=True)

# Close all but rank 0
if rv.mpi_rank != 0:
    plt.close(fig)
[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_25_1.png

Run the Preprocessing Step#

Pre-processing will check all the necessary details in the dectgroup and pre-calculate the dual energy inversion coefficients \(A\), \(B\), and \(n\) for low and high energy. Additionally, this step verifies all the internal hash values to ensure simulation integrity, which is crucial for resuming any interrupted simulations.

The simulations are controlled by the following parameters:

  • maxA: Maximum search value for inversion coefficient \(A\).

  • maxB: Maximum search value for inversion coefficient \(B\).

  • maxn: Maximum value for inversion coefficient \(n\).

  • tol: Tolerance value for terminating the Newton-Raphson optimizations.

  • whis: The boxplot whisker length for determining Monte Carlo outlier results.

  • required_iterations: The required number of valid Monte Carlo iterations for each voxel.

  • maximum_iterations: The maximum number of trials to get valid Monte Carlo iterations per voxel.

  • threads_per_block: The number of threads per block when processing using GPUs, which can optimize performance based on available hardware.

RockVerse has default values for these parameters, which you can get or set using the corresponding properties (see the API documentation for more details):

  • dectgroup.maxA

  • dectgroup.maxB

  • dectgroup.maxn

  • dectgroup.tol

  • dectgroup.whis

  • dectgroup.required_iterations

  • dectgroup.maximum_iterations

  • dectgroup.threads_per_block

Let’s call the preprocess method:

[14]:
%%px --block
dectgroup.preprocess()
[stdout:0] [2025-07-28 11:38:06] Hashing Low energy attenuation: 100% 16/16 [00:00<00:00, 73.02chunk/s]
[2025-07-28 11:38:06] Hashing High energy attenuation: 100% 16/16 [00:00<00:00, 74.38chunk/s]
[2025-07-28 11:38:06] Hashing mask: 100% 16/16 [00:00<00:00, 233.90chunk/s]
[2025-07-28 11:38:06] Hashing segmentation: 100% 16/16 [00:00<00:00, 246.06chunk/s]
[2025-07-28 11:38:06] Generating inversion coefficients: 100% 100000/100000 [00:13<00:00, 7569.18/s]

Pre-process appraisal#

It is important to check consistency between the generated coefficients and the expected PDFs before launching the full inversion. The pre-calculated inversion coefficients can be accessed through the following attributes:

  • lowE_inversion_coefficients

  • highE_inversion_coefficients

Let’s take a look:

[15]:
%%px --block

# ATTENTION: Assigning/retrieving information from the dect group is a collective MPI call.
# Ensure all processes make the call when running in a parallel environment.
lowEcoefs = dectgroup.lowE_inversion_coefficients

# Now we can print from only rank 0
if rv.mpi_rank == 0:
    print('Low energy inversion coefficients\n', lowEcoefs)

[stdout:0] Low energy inversion coefficients
          CT_0  CT_1    CT_2    CT_3       Z_1        Z_2       Z_3  \
0     -1030.5 -34.5  1597.0  1101.5  6.671270  10.835973  8.259450
1      -991.0  44.5  1663.0  1167.5  6.882240  10.961650  8.291675
2      -995.5   0.5  1531.5  1171.0  6.042790  10.585431  8.191185
3      -983.0 -17.0  1654.5  1101.5  6.764494  10.887532  8.272813
4      -948.0 -26.0  1628.0   869.0  7.639039  12.141179  8.560826
...       ...   ...     ...     ...       ...        ...       ...
49995 -1030.5  -4.0  1641.0   939.0  7.781500  12.756306  8.697968
49996  -983.0 -31.0  1601.5  1114.0  6.456008  10.734712  8.232567
49997 -1018.0 -48.0  1619.0  1136.5  6.344823  10.689802  8.220350
49998 -1000.0 -48.0  1663.0  1062.0  6.840251  10.933911  8.284662
49999 -1035.0  -4.0  1676.0  1092.0  7.134244  11.170285  8.342858

                A             B         n           err
0      466.613636  5.474260e+01  1.086738  2.273737e-13
1      630.513810  2.064419e+01  1.391308  5.209785e-13
2       83.505888  3.412196e+02  0.483090  6.122224e-13
3      446.444298  4.181777e+01  1.211336  1.136868e-13
4      789.719417  2.202488e-03  4.832876  4.687428e-13
...           ...           ...       ...           ...
49995  919.104927  3.654279e-07  8.057914  4.687428e-13
49996  261.494552  1.237727e+02  0.842812  0.000000e+00
49997  174.555904  1.798998e+02  0.734732  3.215549e-13
49998  445.488630  3.231922e+01  1.323814  2.542115e-13
49999  702.758676  5.329334e+00  1.906965  1.607775e-13

[50000 rows x 11 columns]

The view_pdfs method we used before will show the corresponding histograms when the inversion coefficients have been calculated:

[16]:
%%px --block

fig, _ = dectgroup.view_pdfs(bins=30, percentile_interval=(0.1, 99.9), plot_cdfs=True)

# Close all but rank 0
if rv.mpi_rank != 0:
    plt.close(fig)
[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_31_2.png

Note the long tails in the input PDFs, a result of segmentation phase for water touching the water pipe area and segmentation phases for SiO2 and Teflon slightly overlapping with the empty region. This can be improved by using Gaussian models for the PDFs.

Passing mean and variance for Gaussian PDF Models#

The second form to pass a PDF model to the dect group is to provide the mean and standard deviation of a Gaussian (normal) distribution. If you have these values from some other analysis, you can manually pass a tuple containing the corresponding values (mu, sigma) to the Gaussian PDF attributes:

dectgroup.calibration_material[0].lowE_gaussian_pdf = (mu_0_l, sigma_0_l)
dectgroup.calibration_material[0].highE_gaussian_pdf = (mu_0_h, sigma_0_h)

dectgroup.calibration_material[1].lowE_gaussian_pdf = (mu_1_l, sigma_1_l)
dectgroup.calibration_material[1].highE_gaussian_pdf = (mu_1_h, sigma_1_h)

dectgroup.calibration_material[2].lowE_gaussian_pdf = (mu_2_l, sigma_2_l)
dectgroup.calibration_material[2].highE_gaussian_pdf = (mu_2_h, sigma_2_h)

dectgroup.calibration_material[3].lowE_gaussian_pdf = (mu_3_l, sigma_3_l)
dectgroup.calibration_material[3].highE_gaussian_pdf = (mu_3_h, sigma_3_h)

Gaussian PDF models take precedence over the PDF arrays when calculating the inversion coefficients. To delete Gaussian PDF models, set the corresponding attributes to None.

dectgroup.calibration_material[0].lowE_gaussian_pdf = None

RockVerse has a convenience method to call the gaussian_fit function from its optimization module and fit Gaussian distributions to the assigned lowE_pdf and highE_pdf values:

[17]:
%%px --block

dectgroup.calibration_material[0].fit_lowE_gaussian_pdf()
dectgroup.calibration_material[0].fit_highE_gaussian_pdf()

dectgroup.calibration_material[1].fit_lowE_gaussian_pdf()
dectgroup.calibration_material[1].fit_highE_gaussian_pdf()

dectgroup.calibration_material[2].fit_lowE_gaussian_pdf()
dectgroup.calibration_material[2].fit_highE_gaussian_pdf()

dectgroup.calibration_material[3].fit_lowE_gaussian_pdf()
dectgroup.calibration_material[3].fit_highE_gaussian_pdf()

# Let's print one of the values just for checking
# Remember all processes have to call operations on the dectgroup
mu, sigma = dectgroup.calibration_material[3].highE_gaussian_pdf
name = dectgroup.calibration_material[3].description

# Now we can print from only rank 0
if rv.mpi_rank == 0:
    print(f'Gaussian model for {name.lower()} low energy attenuation: mu={mu:.4f}, sigma={sigma:.4f}.')

[stdout:0] Gaussian model for teflon low energy attenuation: mu=1056.8979, sigma=35.6190.

Another call to view_pdfs will show the Gaussian models as well:

[18]:
%%px --block

fig, _ = dectgroup.view_pdfs(bins=30, percentile_interval=(0.1, 99.9), plot_cdfs=True)

# Close all but rank 0
if rv.mpi_rank != 0:
    plt.close(fig)
[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_35_2.png

Let’s call the preprocess again to generate a new set of inversion coefficients. RockVerse will prioritize the Gaussian models over the lowE_pdf and highE_pdf attributes.

[19]:
%%px --block
dectgroup.preprocess(restart=True)

fig, _ = dectgroup.view_pdfs(bins=30, percentile_interval=(0.1, 99.9), plot_cdfs=True)

# Close all but rank 0
if rv.mpi_rank != 0:
    plt.close(fig)
[stdout:0] [2025-07-28 11:38:30] Hashing Low energy attenuation: 100% 16/16 [00:00<00:00, 73.39chunk/s]
[2025-07-28 11:38:30] Hashing High energy attenuation: 100% 16/16 [00:00<00:00, 73.18chunk/s]
[2025-07-28 11:38:30] Hashing mask: 100% 16/16 [00:00<00:00, 247.02chunk/s]
[2025-07-28 11:38:31] Hashing segmentation: 100% 16/16 [00:00<00:00, 238.03chunk/s]
[2025-07-28 11:38:31] Generating inversion coefficients: 100% 100000/100000 [00:02<00:00, 46305.78/s]

[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_37_3.png

This concludes our data preparation. There are also convenience functions for quick visualization of the inversion coefficients and the resulting \(Z_{eff}\):

[20]:
%%px --block

fig, _ = dectgroup.view_inversion_coefs()

if rv.mpi_rank != 0:
    plt.close(fig)
[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_39_1.png
[21]:
%%px --block

fig, _ = dectgroup.view_inversion_Zeff()

if rv.mpi_rank != 0:
    plt.close(fig)

[output:0]
../../../_images/tutorials_digitalrock_dual_energy_dual_energy_tutorial_prepare_data_40_1.png

The tails in these histograms indicate that we have adequately captured the distributions, and therefore we have reasonable values for the maxA, maxB, and maxn attributes. If you notice truncated tails on the right side of the histograms, consider rerunning the preprocess function after increasing those maximum values.

That is the end of this part of the tutorial. We can now stop the cluster and proceed to the next part.

[ ]:
cluster.stop_cluster()
<coroutine object Cluster.stop_cluster at 0x7f781451fb90>
The Kernel crashed while executing code in the current cell or a previous cell.

Please review the code in the cell(s) to identify a possible cause of the failure.

Click <a href='https://aka.ms/vscodeJupyterKernelCrash'>here</a> for more info.

View Jupyter <a href='command:jupyter.viewOutput'>log</a> for further details.