CP parallel processing -- re-defining the output directory


#1

Dear all,

we’re having some trouble with CP (2.1.0) parallel processing on our grid infrastructure. I’ve successfully created the *.mat batch file (using the CreateBatchFiles module), as described [here on your pages](cellprofiler.org/CPmanual/Help_Other%20Features_Batch_Processing.html). However, we then realized that when multiple jobs process a set of images, the output files generated by the ExportToSpreadsheet module overwrite each other (all the cluster nodes write to a shared location).

Although we’ve found that the ExportToDatabase module should help (as described [here](discourse.cellprofiler.org/t/cp2-batch-jobs-export-csv-file-suffix/1088/3)), I’m wondering why redefining the default output directory didn’t help in our experiments with ExportToSpreadsheet.

More precisely:

  • I created a *.mat batch file containing the ExportToSpreadsheet module (with the output directory set to “default”)

  • then I created a set of directories and copied the *.mat file into each of them

  • within each cluster job, I ran CP with the “-o” option set to a dedicated directory for that job

Based on this, I expected CP to store each job’s results in a separate directory, so that nothing would be overwritten. However, even with the “-o” option, CP kept storing the data in the original directory (the one that had been set as “default” when the batch file was created).
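The per-job setup described above could be sketched in Python roughly as follows. This is a hypothetical helper (`build_job_commands` and the `job_NNN` directory names are illustrative, not part of CellProfiler) that splits the image sets into chunks and gives each chunk its own output directory. It assumes CellProfiler is started on the nodes as `cellprofiler` and uses the CellProfiler 2.x command-line switches `-p` (pipeline/batch file), `-c` (headless), `-r` (run on startup), `-f`/`-l` (first/last image set) and `-o` (output directory). Whether 2.1.0 actually honours `-o` for a *.mat batch file is exactly the open question in this thread, so treat it as a sketch only.

```python
import os
import shlex


def build_job_commands(batch_file, output_root, n_image_sets, sets_per_job,
                       executable="cellprofiler"):
    """Return one argv list per cluster job, each writing to its own directory.

    Hypothetical helper. Assumes `executable` is how CellProfiler is started
    on the nodes (a source install might use "python CellProfiler.py").
    """
    commands = []
    starts = range(1, n_image_sets + 1, sets_per_job)
    for job, first in enumerate(starts):
        last = min(first + sets_per_job - 1, n_image_sets)
        # Hypothetical naming scheme: output_root/job_000, job_001, ...
        out_dir = os.path.join(output_root, "job_%03d" % job)
        os.makedirs(out_dir, exist_ok=True)
        commands.append(shlex.split(executable) + [
            "-p", batch_file,   # *.mat file produced by CreateBatchFiles
            "-c",               # run headless
            "-r",               # run the pipeline on startup
            "-f", str(first),   # first image set for this job
            "-l", str(last),    # last image set for this job
            "-o", out_dir,      # per-job default output directory
        ])
    return commands
```

For example, `build_job_commands("Batch_data.mat", "results", 10, 4)` yields three jobs covering image sets 1-4, 5-8 and 9-10; each list can be turned into a shell line with `" ".join(shlex.quote(a) for a in cmd)` for a job script. Keeping one directory per job means that, if `-o` is honoured, ExportToSpreadsheet files with identical names no longer collide.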

What am I doing wrong?

Thank you very much for your answers!

–best
Tom.


#2

Dear friends,

can nobody provide any hint on this? Is there any information missing, or should I provide something else?

Maybe I should rephrase the question: once the *.mat file has been generated, is it possible to change the default output directory via the “-o” option when starting batch processing with CellProfiler?

Thanks a lot for any clue you can provide…

–best
Tom.


#3

Hi Tom,

Sorry, busy here. I believe the ‘-o’ switch had a bug that has been partially fixed in versions more recent than the release. Please see this GitHub issue, which also refers to this forum post with similar (same?) issues.

So you can try installing a trunk build here (or, perhaps slightly safer, the 2.1.1 release candidate there). Also, according to the last comment on the GitHub issue, using a project file seems to respect the ‘-o’ switch from the command line.

Having said all that, we do recommend using ExportToDatabase rather than ExportToSpreadsheet for batch processing. You need to set up a MySQL database, but the tables get populated by all jobs simultaneously, without the intervening step of concatenating all your CSVs (assuming you are doing that). CSVs can also get unwieldy (certainly if you plan to use Excel or other limited tools) at the scale of processing jobs that require headless cluster runs. But these are just our suggestions!
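If you do stay with ExportToSpreadsheet and per-job output directories, the CSV concatenation step could look roughly like this. It is a minimal sketch with a hypothetical helper name (`concatenate_job_csvs`), assuming each job directory under one root holds a CSV with the same file name and a single, identical header row; it is not part of CellProfiler.

```python
import csv
import glob
import os


def concatenate_job_csvs(output_root, csv_name, merged_path):
    """Merge output_root/*/csv_name into merged_path, writing the header once.

    Hypothetical helper. Assumes every job directory contains a CSV with the
    same file name and a single, identical header row. Returns the number of
    non-empty files merged.
    """
    paths = sorted(glob.glob(os.path.join(output_root, "*", csv_name)))
    header = None
    merged = 0
    with open(merged_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file: nothing to merge
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError("header mismatch in %s" % path)
                writer.writerows(reader)  # copy the remaining data rows
                merged += 1
    return merged
```

Write the merged file outside the job directories (e.g. directly under the root) so the glob does not pick it up, and use zero-padded job directory names so the sorted order matches the job order. With ExportToDatabase this step disappears entirely, which is part of why it is the recommended route.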

Hope that helps,
David