Batch Analysis - Running one pipeline after the other automatically?


#1

Hello,

I was wondering how it would be possible to run different pipelines one after the other automatically in CP.

Our setup is that we have 100 384-well plates, and we would like to run the same pipeline on all of them. But since we would like one Excel file per plate, we would have to run the same pipeline 100 times (each copy with a distinct prefix in ExportToSpreadsheet, like “plate1_”, etc.).

So, any ideas, code, or suggestions for chaining pipelines to run one after the other, without having to wait for each one to finish before loading the next, and so on?

Thanks in advance!
🙂


#2

You could write a Bash script that runs each pipeline inside a loop.
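
For example, a minimal sketch, assuming one pipeline per plate named plate1.cppipe through plate100.cppipe and that PATH_TO_CP points at your CellProfiler installation (both naming schemes are assumptions, not CP defaults):

#!/bin/bash
# Minimal sketch: run the per-plate pipelines one after the other.
# The plateN.cppipe names and output/plateN folders are illustrative.
for i in $(seq 1 100); do
    PATH_TO_CP/cellprofiler -c -r -b -p "plate${i}.cppipe" -o "output/plate${i}"
done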


#3

What kind of machine will you use to process the images? A multi-core desktop, or a cluster with a queueing system? What operating system?

How many fields of view per well do you have?

The brute-force approach (likely to choke CP while preparing the pipeline, because of the large number of images) would be to add all of your images to the Images module in “Input modules”, then group them by plate in the Groups module. Add CreateBatchFiles at the end of the pipeline and press “Analyze Images”. You’ll get a Batch_data.h5 file that describes the entire analysis. When you run the following command in the command line:

PATH_TO_CP/cellprofiler --get-batch-commands Batch_data.h5

you will get a list of commands to execute the actual analysis:

PATH_TO_CP/cellprofiler -c -r -b -p Batch_data.h5 -f 1 -l 384 -o output_dir
PATH_TO_CP/cellprofiler -c -r -b -p Batch_data.h5 -f 385 -l 768 -o output_dir
etc…

where the parameters “-f” and “-l” determine the first and the last image set from the image list; only these will be processed by that command. Each of these “jobs” will create a separate folder with outputs in the “output_dir”. They can be executed independently, thus taking advantage of a distributed computing environment (a cluster) or a multi-core desktop.
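
If you end up on a multi-core desktop, here is a rough sketch of executing those generated commands a few at a time (the concurrency of 4 is illustrative, and “wait -n” needs bash 4.3 or newer):

#!/bin/bash
# Sketch: run the commands printed by --get-batch-commands,
# at most N at a time, on a multi-core desktop.
N=4
while read -r cmd; do
    $cmd &                                  # launch one CellProfiler job
    while [ "$(jobs -rp | wc -l)" -ge "$N" ]; do
        wait -n                             # block until one job finishes
    done
done < <(PATH_TO_CP/cellprofiler --get-batch-commands Batch_data.h5)
wait                                        # let the remaining jobs finish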

Another approach would be to prepare your pipeline with the LoadData module instead of the four modules in the “Input modules” section. LoadData lets you specify a CSV file listing your input image files. So you can either prepare such a file with all of your images and set up grouping per well, or prepare a separate CSV for every plate and skip grouping if you insist on having a separate results file per plate. In the latter case, once you have your pipeline, you can add the “--data-file” option, making the line that executes CP for a particular plate look as follows:

PATH_TO_CP/cellprofiler -c -r -b -p your_pipeline.cppipe -o output_dir --data-file image_list_for_a_plate.csv

Of course, if you’re comfortable with the command line and shell scripting, you can automate the entire process.
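
For instance, a minimal sketch assuming one CSV per plate, named csv/plate1.csv through csv/plate100.csv (the file names and folder layout are assumptions):

#!/bin/bash
# Sketch: run the same pipeline once per plate, each run with its
# own image list (--data-file) and its own output folder.
for i in $(seq 1 100); do
    PATH_TO_CP/cellprofiler -c -r -b -p your_pipeline.cppipe \
        -o "output/plate${i}" \
        --data-file "csv/plate${i}.csv"
done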

Some additional info on batch processing.


#4

How do you manage to get the output into separate folders? I get all the output in “output_dir”.


#5

I’m sorry, I wasn’t clear. Each of these jobs will need to have a different output_dir, for example “output/out_0001”, etc.

Here is a bash script that takes a Batch_data.h5 file and creates jobs for submission to a PBS queueing system. If you don’t have a queueing system, you can just execute each file from the pbs.jobs folder produced by this script.
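
In rough outline, such a script does something like the following (a sketch of the idea, not the attached script itself; the pbs.jobs folder, the out_NNNN naming, and the #PBS resource line are all illustrative):

#!/bin/bash
# Sketch: turn the output of --get-batch-commands into one PBS job
# file per chunk, each writing to its own output folder.
set -euo pipefail
mkdir -p pbs.jobs output

i=0
PATH_TO_CP/cellprofiler --get-batch-commands Batch_data.h5 | while read -r cmd; do
    i=$((i + 1))
    outdir=$(printf "output/out_%04d" "$i")
    job=$(printf "pbs.jobs/job_%04d.sh" "$i")
    # Swap the shared “-o output_dir” for this job’s own folder.
    cat > "$job" <<EOF
#!/bin/bash
#PBS -l nodes=1:ppn=1,walltime=04:00:00
mkdir -p $outdir
${cmd/-o output_dir/-o $outdir}
EOF
    qsub "$job"    # without a queueing system, run: bash "$job"
done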


#6

Yes, that was what I thought from the beginning. However, it would have been nice to have an option that creates the separate output folders automatically.


#7

You can use metadata tags to set output sub-folders, FWIW: for example, point a module’s output location at a sub-folder named with the plate metadata, something like “\g&lt;Plate&gt;”.