CP2.2: No HDF5 output in headless mode


#1

I have a pipeline with CreateBatchFiles at the end to create the HDF5 experiment description and then submit jobs to a cluster. In “View output settings” I selected HDF5 in “Output file format”. I get CSV files from ExportToSpreadsheet, but no HDF5 output file is created. The HDF5 output does appear when I run CP interactively (without CreateBatchFiles). Is this normal behaviour?

When I use the ExportToCellH5 module, I get the following error:

Error detected during run of module ExportToCellH5
Traceback (most recent call last):
  File "/home.nis/NIScp2/CellProfiler/cellprofiler/pipeline.py", line 1819, in run_with_yield
    self.run_module(module, workspace)
  File "/home.nis/NIScp2/CellProfiler/cellprofiler/pipeline.py", line 2067, in run_module
    module.run(workspace)
  File "/home.nis/NIScp2/CellProfiler/cellprofiler/modules/exporttocellh5.py", line 398, in run
    c5_image_writer = c5_pos.add_image(shape=shape5D, dtype=dtype5D)
  File "/usr/local/lib/python2.7/dist-packages/cellh5/cellh5write.py", line 137, in add_image
    img_dset = self.get_group(CH5Const.IMAGE).create_dataset(CH5Const.RAW_IMAGE, shape=shape, dtype=dtype)
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/group.py", line 108, in create_dataset
    self[name] = dset
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2577)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2536)
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/group.py", line 277, in __setitem__
    h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2577)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2536)
  File "h5py/h5o.pyx", line 202, in h5py.h5o.link (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/h5o.c:3574)
RuntimeError: Unable to create link (Name already exists)
Thu Nov 30 14:55:19 2017: Image # 2, module ExportToCellH5 # 15: 0.01 sec


#2

Hi,

You definitely don’t want to use the ExportToCellH5 module when making a batch file; that won’t work.

Are you RUNNING the pipeline that contains CreateBatchFiles headless? To help troubleshoot, can you try the following steps in the GUI?

Do you get any message at all when you run the pipeline after adding CreateBatchFiles? For example, I get a message that looks like this.

If you’re getting spreadsheets, though, and no message like that, I suspect you may have accidentally disabled CreateBatchFiles (otherwise I wouldn’t expect you to be getting spreadsheets at all). Does it by any chance look a lighter, hazier green like this? If so, just click the green check mark to re-enable it, resave your pipeline, and try again.



#3

Hi,
I do get the HDF5 file with the experiment description from the CreateBatchFiles module, and I also get the pop-up message “CreateBatchFiles saved pipeline to…”. I can use this file to create jobs and submit them to a queue on the cluster. The result of the whole analysis is an output folder with sub-folders corresponding to the groups defined in the Grouping module. These sub-folders contain the CSV files defined in ExportToSpreadsheet. So everything works fine here.

When I run the pipeline interactively (GUI, no CreateBatchFiles module), I can select HDF5 as the OUTPUT file format in “View output settings” and get the H5 file with all results TOGETHER with the CSV files. However, when the same option is selected in batch mode, no H5 OUTPUT file appears, only CSVs.

Why do I want the HDF5 output file? When tracking cells with the LAP algorithm, it is sometimes necessary to readjust the tracking parameters AFTER the analysis using Data Tools > TrackObjects, which requires an HDF5 file as input. Since all our analyses run in batch mode, it would be convenient to have the results in this format. Because batch mode does not save the results as HDF5, I currently have to rerun the analysis in the GUI whenever I want to use Data Tools.


#4

Ahhh, I see more clearly what you were asking, I apologize for the confusion!

AFAIK, creating h5 and MAT output files is GUI-only, EXCEPT that it seems you can ask CP on the command line to keep the h5 file when running in batch mode, according to the instructions here, by explicitly giving the name and location you want the file kept in. I don’t know how (if at all) that h5 file differs from the one CP makes in GUI mode, but try it and see if it works for you.


#5

No worries, I wasn’t clear in my initial question.

Do you happen to know the command-line parameter for naming the output h5 file? I’ve only found the “-o” switch, and I’m already using it. The command I’m submitting to the PBS queue is something like:

/usr/local/bin/cellprofiler -c -r -b -p Batch_data.h5 -f 1 -l 100 -o output/out_0001 -t /tmp/cp2

The h5 files with results are indeed saved in the tmp folder (with file names such as Cpmeasurements25DqGv.hdf5), but they are removed as soon as the job finishes.
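For reference, here is the command above with each switch annotated, based on the CellProfiler 2.x command-line help (check `cellprofiler --help` for your version). It makes the key point visible: `-o` sets the output *directory* where ExportToSpreadsheet writes its CSVs; it does not name the measurements file. The sketch only builds and prints the command as a bash array, so it runs without CellProfiler installed.

```shell
#!/usr/bin/env bash
# Build the headless command from this post as a bash array so each switch
# can carry a comment. Switch meanings follow the CellProfiler 2.x --help text.
cmd=(
  /usr/local/bin/cellprofiler
  -c                  # run headless, without the GUI
  -r                  # run the pipeline at startup
  -b                  # do not build C/Cython extensions
  -p Batch_data.h5    # batch file written by CreateBatchFiles
  -f 1                # first image set for this job (1-based)
  -l 100              # last image set for this job
  -o output/out_0001  # output DIRECTORY (CSV files from ExportToSpreadsheet)
  -t /tmp/cp2         # temporary directory (the temp measurements .hdf5 lands here)
)

# Print the command instead of running it.
printf '%s ' "${cmd[@]}"
echo
```

Note that nothing in this command names a measurements file, which is why CellProfiler falls back to a temporary one.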


#6

I think you need to give it without any flag at all; from the link:

When run on the command-line, CellProfiler interprets the first non-switch command-line argument as the name of the HDF5 file to use for measurement output; it uses a temporary file for output if no name is supplied.
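To see which argument that rule picks out, the hypothetical helper below scans a command line roughly the way an optparse-style parser would: it skips each switch (and its value, for switches that take one) and reports the first argument left over. It is only an illustration, not CellProfiler’s own parsing code, and it assumes the only switches are the ones used in this thread.

```shell
#!/usr/bin/env bash
# Illustration only: report the "first non-switch argument" of a CellProfiler
# command line. Assumes -p, -f, -l, -o and -t take a value and that every
# other switch (-c, -r, -b) is a plain flag.
first_positional() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -p|-f|-l|-o|-t) shift 2 || shift ;;  # switch plus its value
      -*)             shift ;;             # flag without a value
      *)              printf '%s\n' "$1"; return 0 ;;
    esac
  done
  return 1  # no positional argument: CellProfiler would use a temporary file
}

# The command from the question: no positional argument.
first_positional -c -r -b -p Batch_data.h5 -f 1 -l 100 -o output/out_0001 -t /tmp/cp2 \
  || echo "(none: measurements go to a temporary file)"

# The same command with a results file name added as a bare argument.
first_positional -c -r -b -p Batch_data.h5 -f 1 -l 100 output/out_0001/resultsOUT.h5 \
  -o output/out_0001 -t /tmp/cp2
```

Running it prints `(none: measurements go to a temporary file)` followed by `output/out_0001/resultsOUT.h5`.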

If you do this, does the h5 file persist after the job finishes, or does it still get nuked?


#7

Success! The file stays with this command:

/usr/local/bin/cellprofiler -c -r -b -p Batch_data.h5 -f 1 -l 100 output/out_0001/resultsOUT.h5 -o output/out_0001 -t /tmp/cp2
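When the analysis is split into several jobs, it makes sense to give each job its own measurements file, for example inside that job’s output folder. Below is a minimal dry-run sketch in bash, assuming hypothetical values of 250 image sets in batches of 100 and the same paths as the command above. It prints one command per batch, with the results file passed as the bare (non-switch) argument. The commented-out `qsub` line is an assumption about a typical PBS/Torque setup and will likely need site-specific options.

```shell
#!/usr/bin/env bash
# Dry run: print one CellProfiler command per batch of image sets, each with
# its own measurements file given as a positional (non-switch) argument.
# N_SETS and BATCH are hypothetical; set them to match your experiment.
N_SETS=250
BATCH=100

for (( first = 1; first <= N_SETS; first += BATCH )); do
  last=$(( first + BATCH - 1 ))
  (( last > N_SETS )) && last=$N_SETS          # clamp the final, shorter batch
  outdir=$(printf 'output/out_%04d' "$first")  # e.g. output/out_0001
  cmd="/usr/local/bin/cellprofiler -c -r -b -p Batch_data.h5 -f $first -l $last $outdir/resultsOUT.h5 -o $outdir -t /tmp/cp2"
  echo "$cmd"
  # To submit instead of printing (qsub reads the job script from stdin):
  # echo "mkdir -p $outdir && $cmd" | qsub -N "cp_$first"
done
```

With these values the last line printed is the batch `-f 201 -l 250`, writing to `output/out_0201/resultsOUT.h5`.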

Thanks a million for your assistance!