Data interpretation


I’ve been really impressed with CellProfiler and it’s doing a great job with my analysis tasks… but analysis is one thing, and pretty graphs are another.

Right now, my post-CP “pipeline” looks something like:

  • use CP to generate .tab output
  • run .tab files through a brief Python script to make them more database-friendly (delete dubiously useful header at top of file, copy the set name down to each individual row)
  • import modified .tab files into Access and apply filters and get counts using SQL queries
  • export count data into Excel and manually compute statistics and generate graphs
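
For anyone wanting to script the clean-up step above, here is a minimal sketch of the header removal and set-name fill-down. The file layout is an assumption: a fixed number of junk lines at the top, tab-delimited rows after that, and the set name present only on the first row of each set, in the first column. The function name and its parameters are hypothetical, so adjust them to whatever the real .tab output looks like.

```python
import csv
import io


def fill_down_set_name(lines, header_lines=1, set_column=0):
    """Drop leading header lines and copy each set name down to the blank cells below it.

    `lines` is any iterable of text lines (an open file works).
    `header_lines` and `set_column` are assumptions about the file layout.
    """
    reader = csv.reader(lines, delimiter="\t")
    rows = []
    current_set = ""
    for index, row in enumerate(reader):
        if index < header_lines or not row:
            continue  # skip the header block and any empty lines
        name = row[set_column].strip()
        if name:
            current_set = name  # a new set starts on this row
        row[set_column] = current_set  # blank cells inherit the last set name seen
        rows.append(row)
    return rows


# Made-up sample data, purely for illustration.
sample = "some header text\nSet\tCellCount\nplateA\t10\n\t12\nplateB\t7\n\t9\n"
for cleaned_row in fill_down_set_name(io.StringIO(sample)):
    print("\t".join(cleaned_row))
```

The cleaned rows can then be written back out with csv.writer (again with delimiter="\t") before the database import.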

This has been fine for development but I’m not looking forward to trying to automate and scale this. What kind of solutions do you all have in place? I’ve been trying to avoid installing MySQL so I haven’t taken a look at CP Analyst yet… will it ease some of my pain?

thanks all!


Hi Tim,

Well, whether or not CP Analyst will ease your analysis is a hard one for us to answer; it really depends on your data and how you want to analyze it. Take a look at the CellProfiler Analyst Examples videos for a demo of its capabilities. This is perhaps the best way to see what CPA can do until the papers detailing it are published.

Basically, our workflow is to:
(1) Run CP. Get lots of measurements, and use ExportToDatabase, which outputs Per_Image and Per_Object tables as *.csv files, plus database setup scripts
(2) Upload the *.csv files to a MySQL database using the generated SETUP scripts (the database itself may be a pain to set up, but we can’t help much with that). Usually we add a metadata table as well with treatment/well information, or say, gene or RNAi descriptors.
(3) Run CPA accessing the database for both data visualization and to use a machine learning classifier, which (we think) is a very powerful method to find, say, unique or rare phenotypes
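
To give a feel for what the metadata table in step (2) buys you, here is a rough sketch of a filter-and-count query that joins per-object measurements to treatment information. Several things here are assumptions for illustration only: Python's built-in sqlite3 stands in for MySQL so the example runs without a server, and the Image_Metadata table, the Area and Treatment columns, and the function name are hypothetical. Check the actual tables and column names that ExportToDatabase produces for your pipeline.

```python
import sqlite3


def count_objects_per_treatment(per_object_rows, metadata_rows, min_area=0.0):
    """Count objects at or above `min_area`, grouped by treatment.

    per_object_rows: (ImageNumber, ObjectNumber, Area) tuples (hypothetical columns)
    metadata_rows:   (ImageNumber, Treatment) tuples (hypothetical columns)
    """
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
    conn.execute(
        "CREATE TABLE Per_Object (ImageNumber INTEGER, ObjectNumber INTEGER, Area REAL)"
    )
    conn.execute("CREATE TABLE Image_Metadata (ImageNumber INTEGER, Treatment TEXT)")
    conn.executemany("INSERT INTO Per_Object VALUES (?, ?, ?)", per_object_rows)
    conn.executemany("INSERT INTO Image_Metadata VALUES (?, ?)", metadata_rows)
    query = """
        SELECT m.Treatment, COUNT(*)
        FROM Per_Object AS o
        JOIN Image_Metadata AS m ON m.ImageNumber = o.ImageNumber
        WHERE o.Area >= ?
        GROUP BY m.Treatment
        ORDER BY m.Treatment
    """
    counts = conn.execute(query, (min_area,)).fetchall()
    conn.close()
    return counts


# Made-up measurements, purely for illustration.
objects = [(1, 1, 50.0), (1, 2, 120.0), (2, 1, 200.0), (2, 2, 90.0), (2, 3, 150.0)]
metadata = [(1, "control"), (2, "drug")]
print(count_objects_per_treatment(objects, metadata, min_area=100.0))
```

The same SELECT should carry over to MySQL more or less unchanged; only the connection and the placeholder style would differ.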

See the help in ExportToDatabase for more info. Good luck!



Another option (if you are comfortable programming in Matlab, and if the analysis you want to do is pretty routine rather than exploratory) is to rewrite your script in Matlab as a CellProfiler module, which you would place directly in your pipeline. Or, you could modify the code of an existing DataTool or “DisplayXX” module (these do much the same job as DataTools) in CellProfiler to generate the graphs you want.

As David says, getting your data into MySQL and using CP Analyst may be better if you want to explore your data more interactively and filter it for certain properties (using SQL queries as you are accustomed). It also allows you to use machine learning for complex phenotypes.