Improving CellProfiler Speed


#1

Hello,

I have been running a CellProfiler 3.0 pipeline locally on my laptop for some time. For a typical 96-well plate (4 fields, 3 channels), it takes around 3-5 hours to run. We recently upgraded to 384-well plates, hoping to start screening, and it's now taking well over 20 hours per plate. The program always indicates that it is running 4 workers, and I have done as much as I can to shorten the processing time (hiding all windows on run, reducing the measurements collected, etc.). What would be the best way to improve processing time? We are open to getting a new desktop if necessary, or running on the cloud if that is feasible (my understanding of cloud computing is minimal). Thank you very much!
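To check whether the 384-well times are simply proportional to the 96-well ones, here is a back-of-the-envelope estimate. It is a minimal sketch that assumes run time is roughly proportional to the number of image sets (one image set = one field of view with all 3 channels) and that the 384-well plates are also imaged at 4 fields per well. The helper names are hypothetical. Under those assumptions, a 96-well plate has 384 image sets (1,152 images), so 3-5 hours works out to about 28-47 s per image set, and a 384-well plate (1,536 image sets) would be expected to take roughly 12-20 hours.

```python
# Back-of-the-envelope scaling estimate.
# Assumption: run time is roughly proportional to the number of image sets,
# where one image set = one field of view with all of its channels.
FIELDS_PER_WELL = 4  # from the post; assumed identical for 384-well plates
CHANNELS = 3         # from the post


def image_sets(wells: int, fields_per_well: int = FIELDS_PER_WELL) -> int:
    """Number of image sets on a plate."""
    return wells * fields_per_well


def total_images(wells: int) -> int:
    """Number of individual image files on a plate."""
    return image_sets(wells) * CHANNELS


def seconds_per_image_set(hours: float, wells: int) -> float:
    """Average wall-clock seconds per image set for a run that took `hours`."""
    return hours * 3600 / image_sets(wells)


def predicted_hours(sec_per_set: float, wells: int) -> float:
    """Predicted run time in hours for a plate with `wells` wells."""
    return sec_per_set * image_sets(wells) / 3600


for hours_96 in (3, 5):  # the observed 3-5 h range for a 96-well plate
    rate = seconds_per_image_set(hours_96, 96)
    print(f"{hours_96} h per 96-well plate -> {rate:.1f} s per image set -> "
          f"~{predicted_hours(rate, 384):.0f} h per 384-well plate "
          f"({total_images(384)} images)")
```

If a 384-well plate takes noticeably longer than this linear estimate, something other than the image count (for example memory or disk pressure) may be adding cost, although that is only one possible explanation.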


#2

Without knowing anything about your setup or pipeline, I can't give much advice beyond the following:

  • Make sure you're using ExportToDatabase (SQLite mode is a good place to start) rather than ExportToSpreadsheet.
  • Give CellProfiler a temp directory with plenty of free disk space.
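Before pointing CellProfiler at a temp directory (it can be set in CellProfiler's preferences), it is worth confirming how much free space that location actually has. The sketch below is a minimal check using only the Python standard library. The helper names and the 50 GiB threshold are hypothetical placeholders, not a CellProfiler requirement; how much space a run really needs depends on your images and pipeline.

```python
import shutil
import tempfile
from pathlib import Path


def free_gib(path) -> float:
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def check_temp_dir(path=None, min_free_gib: float = 50.0) -> bool:
    """Return True if `path` exists as a directory and its filesystem has at
    least `min_free_gib` GiB free. With no path, check the system temp dir.

    The 50 GiB default is an arbitrary placeholder, not a CellProfiler limit.
    """
    target = Path(path) if path is not None else Path(tempfile.gettempdir())
    if not target.is_dir():
        return False
    return free_gib(target) >= min_free_gib


# Example: print free space in the system temp directory and check it.
if __name__ == "__main__":
    tmp = tempfile.gettempdir()
    print(f"{tmp}: {free_gib(tmp):.1f} GiB free, ok = {check_temp_dir(tmp)}")
```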

A dedicated image-processing machine with plenty of RAM and disk space isn't a bad idea if you're planning to do screening. If you want to try running in the cloud, there are ways to do that, but whether it's better than buying a new machine depends on a lot of factors: your comfort with executing things from the terminal, whether you'd rather spend a lot of money up front or smaller amounts over time, exactly how much data you want to run, and how complex your pipeline is.


#3

I can attach my pipeline if that could help identify the best solution. I am completely unfamiliar with running programs in the cloud, but if it is simple to learn, I am open to it.


#4

@bcimini Why is exporting to a database faster than exporting to a CSV file? I did some preliminary testing with a small image set (only 6 images): the pipeline took 4 min 10 s with ExportToSpreadsheet and 5 min 30 s with ExportToDatabase (SQLite). I realize that with such a small sample there might be some fixed overhead that confounds the results, and I will try larger samples as soon as I have figured out how to see the total run time after the pipeline completes, so that I don't have to wait around to time it. But it would be interesting to hear about your experience and why database export would be faster.
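One way to get the total run time without watching the clock is to launch CellProfiler headless from a small script that times it. A single run also cannot separate fixed overhead from per-image cost; timing the same pipeline on two different numbers of image sets and fitting `t = overhead + per_set * n` can. The sketch below assumes CellProfiler 3.x's command-line flags `-c` (no GUI), `-r` (run the pipeline), `-p` (pipeline file), `-i` (input folder) and `-o` (output folder); check `cellprofiler --help` for your version. The file names and helper names are hypothetical.

```python
import subprocess
import time


def timed_run(cmd: list) -> float:
    """Run `cmd` to completion and return elapsed wall-clock seconds.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def split_overhead(n1: int, t1: float, n2: int, t2: float):
    """Fit t = overhead + per_set * n through two timed runs of the same
    pipeline on n1 and n2 image sets. Returns (overhead_s, per_set_s)."""
    if n1 == n2:
        raise ValueError("need two runs with different numbers of image sets")
    per_set = (t2 - t1) / (n2 - n1)
    overhead = t1 - per_set * n1
    return overhead, per_set


# Hypothetical usage (paths are placeholders; flags assumed, see lead-in):
# cmd = ["cellprofiler", "-c", "-r", "-p", "my_pipeline.cppipe",
#        "-i", "images_small/", "-o", "output_small/"]
# seconds = timed_run(cmd)
# print(f"Total run time: {seconds / 60:.1f} min")
```

Running this for both export modules on two sample sizes would show whether the 250 s vs 330 s gap seen with 6 images comes mostly from fixed overhead or from per-image cost.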