HDF5 reader for image data


Hi guys,

I’m one of the ilastik developers, and we have users who use ilastik to segment data for CellProfiler. This data can become quite large (larger than RAM), so we usually advise saving it to h5 (HDF5) to make use of block-wise reading and writing.
Is there any way to get CellProfiler to read h5 datasets as images?

Thank you and best regards


@k-dominik Is there a standard for storing images inside HDF5 arrays?


Hi @agoodman,

I don’t know about standards, but we always put the whole volume into a single dataset using h5py, which provides a NumPy-compatible interface to h5 datasets.

So, for example, our data would be in myfile.h5/exported_data, i.e. the dataset exported_data inside the file myfile.h5.
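A minimal sketch of what block-wise access to such a file can look like, assuming one dataset (here `exported_data`) that is sliced along its first axis. h5py `Dataset` objects support NumPy-style slicing and read only the data needed for the requested slab, so the full volume never has to be in memory. The helper names (`iter_blocks`, `count_foreground`, `count_foreground_h5`) are hypothetical and not part of ilastik or CellProfiler.

```python
import numpy as np


def iter_blocks(dataset, block_size):
    """Yield (start, block) slabs along the first axis.

    Works for any array-like that supports .shape and slicing: a NumPy array,
    a numpy.memmap, or an h5py.Dataset. With h5py, each slice is read from
    disk on demand, so only one slab is held in memory at a time.
    """
    n = dataset.shape[0]
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        yield start, np.asarray(dataset[start:stop])


def count_foreground(dataset, block_size=64):
    """Count nonzero voxels (e.g. segmented pixels) one block at a time."""
    return sum(int(np.count_nonzero(block)) for _, block in iter_blocks(dataset, block_size))


def count_foreground_h5(path, key="exported_data", block_size=64):
    # Hypothetical convenience wrapper; h5py is imported lazily so the
    # helpers above can be used without h5py installed.
    import h5py

    with h5py.File(path, "r") as f:
        return count_foreground(f[key], block_size)
```

For example, `count_foreground_h5("myfile.h5")` would walk `exported_data` in slabs of 64 planes. The same helpers accept a plain NumPy array, which makes them easy to test without an HDF5 file.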


@k-dominik It would be very difficult to implement in a flexible way. Is there a reason you recommend people use HDF5 rather than NumPy arrays?


I suppose we could also use NumPy arrays, but then our data would only be readable with NumPy. HDF5 libraries, on the other hand, are available for many programming languages. You can also store multiple datasets in one file, and h5 supports attributes and grouping.

Don’t you guys already use h5 internally?


Yes, but our use is entirely private.

We don’t use HDF5 as an interchange format because it would require us to provide a schema.


(This is why we prefer standard interchange formats like CSV and common image formats.)


OK, I see. So you mentioned NumPy arrays as an option. Does your LoadImages module support this type of data?


@k-dominik No. :stuck_out_tongue_winking_eye: But it is something we’ve considered. It’d be a straightforward implementation:

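```python
import os
import numpy

# pathname: the image file being loaded (provided by the caller)
# splitext (not split) returns (root, ext); ext keeps the leading dot
_, extension = os.path.splitext(pathname)
if extension == ".npy":
    image = numpy.load(pathname)
```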

Would NumPy support help? Would you be interested in contributing? I can walk you through the details.
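For the larger-than-RAM case, `numpy.load` can memory-map a .npy file through its `mmap_mode` argument instead of reading it all at once. A minimal sketch, using a hypothetical `load_npy_image` helper (not an existing CellProfiler function):

```python
import os
import numpy as np


def load_npy_image(pathname, lazy=True):
    """Hypothetical .npy loader (not part of CellProfiler).

    With lazy=True the file is memory-mapped read-only: data are paged in
    from disk only when a region is accessed, so arrays larger than RAM can
    still be processed piece by piece.
    """
    _, extension = os.path.splitext(pathname)
    if extension.lower() != ".npy":
        raise ValueError(f"not a .npy file: {pathname}")
    return np.load(pathname, mmap_mode="r" if lazy else None)
```

Slicing the returned `numpy.memmap` (e.g. `img[0:256]`) then reads only that part of the file.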


I wanted to chime in, since I use ilastik output for CellProfiler frequently. What I typically do is export the ilastik output as a TIFF sequence and then import these TIFF files into CellProfiler like any other images.


Thank you, that is the way to go for images that fit into RAM. We do not, however, support writing larger-than-RAM images to TIFF.


Ah! I missed your point about larger-than-RAM images.

Do pyramid TIFFs work with larger-than-RAM images? I think they use internal tiling, which sounds similar to the block-wise reading you describe for HDF5.
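The reason tiling helps with larger-than-RAM images: a tiled TIFF stores the image as fixed-size rectangular tiles (the TileWidth and TileLength tags), much like the chunks of an HDF5 dataset, so a reader that wants a sub-region only has to decode the tiles that overlap it. A minimal sketch of that lookup; `tiles_for_region` is a hypothetical helper, not a function from any TIFF library:

```python
def tiles_for_region(y0, y1, x0, x1, tile_h, tile_w):
    """Return (row, col) indices of the tiles overlapping the half-open
    pixel region [y0, y1) x [x0, x1) of a tiled image.

    Only these tiles need to be read from disk to extract the region,
    regardless of how large the whole image is.
    """
    if y1 <= y0 or x1 <= x0:
        raise ValueError("region must be non-empty")
    rows = range(y0 // tile_h, (y1 - 1) // tile_h + 1)
    cols = range(x0 // tile_w, (x1 - 1) // tile_w + 1)
    return [(r, c) for r in rows for c in cols]
```

For instance, a 100 x 100 region of a 100,000 x 100,000 image stored in 256 x 256 tiles touches at most 4 of its 152,881 tiles.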


I hadn’t heard of pyramid TIFFs before, but BigTIFF definitely handles that.


Is BigTIFF a potential solution? Is BigTIFF support already built into ilastik?


We can read BigTIFF, but haven’t implemented writing …


I found this MATLAB description of processing BigTIFF files interesting. Could it be used as a roadmap for an ilastik and CellProfiler solution?
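Whatever the container (BigTIFF, HDF5 or .npy), block-wise processing of larger-than-RAM data comes down to one loop: read a block, process it, write it out. A minimal sketch of that loop with NumPy, using `numpy.lib.format.open_memmap` to create the output .npy file on disk. This is a stand-in to illustrate the pattern, not BigTIFF writing, and `threshold_blockwise` is a hypothetical name:

```python
import numpy as np
from numpy.lib.format import open_memmap


def threshold_blockwise(src, out_path, threshold, block_rows=256):
    """Hypothetical example: write (src > threshold) to a new .npy file,
    one slab of rows at a time.

    `src` can be any array-like that supports .shape and slicing along the
    first axis (a NumPy array, a numpy.memmap, an h5py.Dataset). Only one
    slab of input and output is materialized in memory at a time.
    """
    # Create the output file on disk with the final shape up front.
    out = open_memmap(out_path, mode="w+", dtype=np.uint8, shape=src.shape)
    for start in range(0, src.shape[0], block_rows):
        stop = min(start + block_rows, src.shape[0])
        block = np.asarray(src[start:stop])      # read one slab
        out[start:stop] = block > threshold      # process and write it back
    out.flush()
    return out
```

Swapping the thresholding line for another per-block operation keeps the same memory behaviour, as long as the operation does not need pixels from neighbouring blocks.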