Memory requirements for batch processing


#1

Hi,

I am trying to process a batch of around 45000 images on 3 Windows PCs (15000 per PC). I set up a pipeline that works well when tested on a small subset of images in the appropriate file structure. However, when I ran the full sets, all three machines crashed with the same error message, at around the same time, after processing about the same number of images (~1100). The error message was:

CellProfiler>AnalyzeImagesButton_Callback at 7988

CellProfiler>gui_mainfcn at 9740

CellProfiler at 61

This was running the latest version of CellProfiler on machines with 2GB RAM. At the time of the crash, all machines were using 1.5GB of memory and 1.5GB of virtual memory. The error reported in the MATLAB crash dump file was a segmentation violation.

Any idea what is going on? Is it running out of memory?

Cheers,

Dave


#2

Hi David,

You are correct in assuming that CellProfiler is crashing because of memory requirements. Memory management is unfortunately not one of MATLAB’s strong points. I would recommend running such a large data set on a cluster of computers, but this takes time to set up. Another approach might be to insert the SpeedUpCellProfiler module, which is in the “Other” category. With it you can clear the memory and “pack” it before every new cycle, and you can also save the output file only every Nth cycle instead of every cycle. Just be sure not to clear any images that are not reloaded each cycle (e.g. images produced by Tile or CorrectIllumination_Calculate).

Good luck!
Mike


#3

Hi Mike,

Thanks for the reply. I was thinking another way to get around the problem would be to run CellProfiler separately on small subsets of the data. Can CellProfiler be run from the command line so that I can set up a batch job?

David


#4

Hi David,

This is precisely how CellProfiler distributes jobs to a cluster computer. Instead of actually using a cluster, you could just add the CreateBatchFiles module to your pipeline; after the first cycle it will produce .m script files, one for each batch. Since you will be running the jobs on the same computer that created the batch files, you will not need to change any parameters in this module, except perhaps the number of images processed in each batch. Once you have run the first cycle and the script files are in your output folder, add that output folder to the MATLAB path and you can then run each script individually. They will be named like this: Batch_2_101.m and so on.
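To get a feel for how many batch scripts to expect, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not part of CellProfiler: the function name batch_ranges and the batch size of 100 are assumptions made for the example, and the numbering CreateBatchFiles actually uses may differ. The 15000 images per PC comes from David's first post.

```python
def batch_ranges(n_images, batch_size):
    """Return inclusive, 1-based (first, last) index pairs covering n_images.

    Hypothetical helper for illustration only; it does not call CellProfiler.
    """
    if n_images < 0 or batch_size < 1:
        raise ValueError("n_images must be >= 0 and batch_size must be >= 1")
    # Step through the image indices in strides of batch_size; the last
    # batch is cut short if n_images is not an exact multiple.
    return [
        (first, min(first + batch_size - 1, n_images))
        for first in range(1, n_images + 1, batch_size)
    ]


if __name__ == "__main__":
    # 15000 images per PC (from the thread); batch size 100 is an assumed value.
    ranges = batch_ranges(15000, 100)
    print(len(ranges), "batches; first:", ranges[0], "last:", ranges[-1])
```

Since the crashes happened after roughly 1100 images, any batch size comfortably below that should keep each run clear of the point where memory ran out.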

Good luck!
Mike


#5

For a set of 45000 images, particularly if significant measurements are being made, it does indeed sound like the images should be split into batches (as David suggested and Mike explained). For other people’s projects, another thing to try if you are on the borderline of memory issues is to turn off the image displays, which is an option in File > Set Preferences.

Anne