Satisfying the Per_Object_checked table from the ExportToDatabase module

exporttodatabase

#1

Hello CellProfilers! \o

The Classifier in CellProfiler-Analyst only draws on positive instances of my phenotype. The ExportToDatabase module in CellProfiler warns that the objects exported are not in a one-to-one relationship, and reading the module help and various forum posts indicates that SQL finagling may be necessary.

I’m hoping you can confirm or comment on the colocalization pipeline implementation logic. To help you developers and volunteers understand the analysis I’m interested in, I’ve spent the last week generating clear, representative TIF images of the phenotype to be identified (link to TIF images and the pipeline: http://engr.uconn.edu/~pan14001/cp-forum-post-1/). Here is a montage made using FIJI of those 5 multi-channel image frames:

The only difference between each of these 5 images is the degree of overlap of the red centromeres with green inside the blue nuclei; the red fluorescent centromeres in image 1 are radially a full diameter length away and they progress to image 5 which has complete overlap.

I’m interested in:

  1. Counting whether or not a given nucleus contains any centromere overlap.
  2. (Stretch goal) Counting how many overlaps there are per nucleus.

I’ve been building the pipeline off of the excellent colocalization example. The reason I need CellProfiler-Analyst with my data is that the thresholded radius for the centromeres is necessarily wider than the true diameter, so object-based overlap yields false positives. Combating this requires combining the overlap with the measured correlation coefficients, but inferring which correlation coefficient is appropriate, as well as its value range, is challenging purely inside of CellProfiler and from playing with the data and images inside R.

As mentioned, the obstacle I have now is that CellProfiler-Analyst does not draw any nuclei with no overlaps, and so I can’t establish the negative phenotype to generate a training set.

To debug why this is happening, I enabled Debug level logging as suggested by the first page of the CellProfiler-Analyst documentation and ran the SQL commands with sqlite3:

sqlite> SELECT ImageNumber FROM Per_Image GROUP BY ImageNumber;
1
2
3
4
5
sqlite> SELECT Per_Object_checked.ImageNumber, COUNT(Per_Object_checked.ObjectNumber) FROM Per_Object_checked GROUP BY Per_Object_checked.ImageNumber;
2|8
3|13
4|13
5|13
sqlite> SELECT Per_Image.ImageNumber FROM Per_Image;
1
2
3
4
5
sqlite>

Thus the Per_Object_checked table seems to be excluding the first image. I tried replacing the None values in the table with zeros, but that makes no difference, presumably because Per_Object_checked is limiting the selections.
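
The same comparison can be done in one step rather than by eyeballing two listings. Below is a minimal sketch using Python's standard sqlite3 module; the helper name images_missing_objects is hypothetical, and the in-memory tables are toy stand-ins that only reproduce the row counts shown above, not the real database.

```python
import sqlite3


def images_missing_objects(conn, image_table="Per_Image",
                           checked_table="Per_Object_checked"):
    """Return ImageNumbers in image_table that have no rows in checked_table."""
    query = (
        "SELECT i.ImageNumber FROM {img} AS i "
        "LEFT JOIN {obj} AS o ON o.ImageNumber = i.ImageNumber "
        "WHERE o.ImageNumber IS NULL "
        "GROUP BY i.ImageNumber ORDER BY i.ImageNumber"
    ).format(img=image_table, obj=checked_table)
    return [row[0] for row in conn.execute(query)]


# Toy stand-in for the exported database (assumption: only the key columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Per_Image (ImageNumber INTEGER)")
conn.execute(
    "CREATE TABLE Per_Object_checked (ImageNumber INTEGER, ObjectNumber INTEGER)"
)
conn.executemany("INSERT INTO Per_Image VALUES (?)", [(n,) for n in range(1, 6)])

# Object counts per image, copied from the sqlite3 session above.
counts = {2: 8, 3: 13, 4: 13, 5: 13}
rows = [(img, obj) for img, n in counts.items() for obj in range(1, n + 1)]
conn.executemany("INSERT INTO Per_Object_checked VALUES (?, ?)", rows)

missing = images_missing_objects(conn)
print(missing)
```

Against the real file, the connection would be opened on the exported SQLite database instead of ":memory:".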

If I don’t export the cen_overlap object, strangely, I get even fewer Per_Object_checked hits:

sqlite> SELECT Per_Object_checked.ImageNumber, COUNT(Per_Object_checked.ObjectNumber) FROM Per_Object_checked GROUP BY Per_Object_checked.ImageNumber;
3|8
4|9
5|10
sqlite>

I can draw from image 1 if I set check_tables = no but then of course get this error:

Exception in thread TileLoader_Thread-4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/omsai/code/CellProfiler-Analyst/cpa/tilecollection.py", line 142, in run
    new_data = imagetools.FetchTile(obKey, display_whole_image=display_whole_image)
  File "/home/omsai/code/CellProfiler-Analyst/cpa/imagetools.py", line 36, in FetchTile
    pos = list(db.GetObjectCoords(obKey))
  File "/home/omsai/code/CellProfiler-Analyst/cpa/dbconnect.py", line 658, in GetObjectCoords
    raise Exception(message)
Exception: Failed to load coordinates for object key ImageNumber:1, ObjectNumber:39. This may indicate a problem with your per-object table.
You can check your per-object table "Per_Object" in TableViewer

But I don’t understand how I should logically remedy this, namely:

  1. Are there changes I need to make in the pipeline?
  2. What changes should I make to the properties file? It seems SQL statement injection is limited; perhaps a more expanded “Dynamic Group” would work?

For completeness, I’m using Ubuntu 16.04.1 LTS. At the time of writing there are no binary GNU/Linux releases, so I’ve installed from Git using CellProfiler-Analyst 2.2.1 and the stable branch of CellProfiler (not sure where the most appropriate place is to fetch the version number in this case; stdout in the terminal says “Version: 2016-05-03T18:31:00 ac0529e / 20160503183100” and the title bar says “2.2.0 (rev ac0529e)”). I no longer remember what exactly I was investigating in May with CellProfiler that required me to move up from the stable release.

By the way, the CellProfiler workshop at ASCB in 2015 was extremely helpful. I had tried on two occasions before to learn on my own but it finally all came together there. Seeing the move to pip from non-relocatable RPMs also made it a lot easier to install and work with. The documentation built into the CellProfiler GUI is the best I’ve seen in a few years of working with imaging software.


#2

the ExportToDatabase module in CellProfiler warns that the objects exported are not in a one-to-one relationship

This is your issue in a nutshell. You need to export the data to separate tables, not to a single table; otherwise the fact that there are different numbers of nuc vs cen vs ect vs cen_overlap objects is going to make this very hard to untangle. I can go into more detail about why if you’d like, but that’s the short answer as to why you are getting errors and never getting nuclei with 0 overlaps in them.

Are there changes I need to make in the pipeline?

You should move your ‘Relate’ modules to the very last thing before your ‘Export’ modules- that’ll allow you to calculate per-nucleus means for each of your object measurements. You have ‘Calculate per-parent means’ turned on, but it (confusingly IMO) only applies to measurements taken BEFORE that point, so in order to actually calculate the per-nucleus means your Relate modules need to be AFTER your measurements.


Right now you’re finding overlaps by doing MaskObjects, keeping only the overlap area, then trying to figure out if it’s truly an overlap later in CPA based on the correlation coefficient and the % of the original area the overlap occupies, yes? There’s potentially an easier way to do this, which I’ve posted below; based on whether or not it works your workflow to your ‘Stretch goal’ would follow one of two different paths. Theoretically you shouldn’t need to use CPA to analyze the ‘nuc’ at all in either pathway.

  • There’s an option in ‘MaskObjects’ where instead of keeping only the intersection area you call it a colocalization or not based on the amount of overlap between the two objects- instead of setting ‘Handling of objects that are partially masked’ to ‘Keep overlapping region’, you’d set it to ‘Remove depending on overlap’, and decide just how overlapping two objects need to be (10%? 50%? 75%?) to actually designate two objects as colocalized or not.
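
To make the idea of an overlap cutoff concrete, here is a minimal sketch of the arithmetic, not CellProfiler's actual implementation. It assumes each object is a set of (x, y) pixel coordinates; the function names and the choice of ">=" at the cutoff are assumptions for illustration.

```python
def overlap_fraction(object_pixels, mask_pixels):
    """Fraction of the object's pixels that also lie inside the mask."""
    object_pixels = set(object_pixels)
    if not object_pixels:
        return 0.0
    shared = object_pixels & set(mask_pixels)
    return len(shared) / float(len(object_pixels))


def is_colocalized(object_pixels, mask_pixels, min_fraction):
    """Call a colocalization when enough of the object is covered."""
    return overlap_fraction(object_pixels, mask_pixels) >= min_fraction


# Hypothetical 4x4 centromere spot and a green spot shifted 2 pixels in x.
centromere = {(x, y) for x in range(0, 4) for y in range(0, 4)}
green_spot = {(x, y) for x in range(2, 6) for y in range(0, 4)}

fraction = overlap_fraction(centromere, green_spot)
print(fraction)  # 8 shared pixels out of 16 -> 0.5
print(is_colocalized(centromere, green_spot, 0.10))  # True
print(is_colocalized(centromere, green_spot, 0.75))  # False
```

The same pair of objects is kept or removed depending only on the cutoff, which is why the percentage is worth tuning on representative images.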

If you can accurately call colocalizations based on setting MaskObjects like that alone:
Look at nuc->Children_cen_overlap column in your spreadsheet. That will tell you how many overlaps there are, and therefore by extension if any are present or not. Congratulations, you’re done!
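
Reading that column off programmatically might look like the sketch below. It assumes the per-nucleus spreadsheet is a CSV and that the child count column is named Children_cen_overlap_Count; check the header of your own export, since the exact name is an assumption here, as are the sample rows.

```python
import csv
import io


def summarize_overlaps(handle, count_column="Children_cen_overlap_Count"):
    """Return (per-nucleus counts, nuclei with overlap, nuclei without)."""
    per_nucleus = []
    for row in csv.DictReader(handle):
        # Counts may be written as "3" or "3.0" depending on the exporter.
        n_overlaps = int(float(row[count_column]))
        per_nucleus.append(
            (int(row["ImageNumber"]), int(row["ObjectNumber"]), n_overlaps)
        )
    with_overlap = sum(1 for _, _, n in per_nucleus if n > 0)
    return per_nucleus, with_overlap, len(per_nucleus) - with_overlap


# Made-up rows standing in for the nuc spreadsheet.
sample = io.StringIO(
    "ImageNumber,ObjectNumber,Children_cen_overlap_Count\n"
    "1,1,0\n"
    "1,2,0\n"
    "2,1,1\n"
    "2,2,3\n"
)

per_nucleus, with_overlap, without_overlap = summarize_overlaps(sample)
print(with_overlap, without_overlap)
```

The per-nucleus list answers the stretch goal (how many overlaps per nucleus), and the two totals answer the yes/no question.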

If you can’t accurately call colocalizations based on setting MaskObjects like that alone:
Run your pipeline with the modifications to ExportToDatabase and RelateObjects I discussed above, then do CPA classification of the cen_overlap objects with the Fast Gentle Boosting algorithm- this will give you a series of rules that distinguish true overlapping objects from false ones. Copy those rules into a text file, then run your pipeline again with two additional modules just before the Export steps: FilterObjects, where you can filter the cen_overlap objects based on the text file containing those rules (for the sake of the example, we’ll say that you call the resulting objects true_overlap), then RelateObjects to relate true_overlap to nuc. Look at nuc->Children_true_overlap.


Hopefully that made sense, feel free to ask follow up questions if it didn’t. Good luck!


#3

Thank you for the thoughtful reply!

The “Remove depending on overlap” option in MaskObjects is gold: I didn’t notice till you mentioned it. That option works very nicely. I ended up using 15% overlap for the actual data pipeline (as opposed to the generated images above).

I understand the rationale for the “Single object table” option of the ExportToDatabase module now - it makes the assumption that a valid cell must have one of each object that has been related (either explicitly with RelateObjects or implicitly with IdentifySecondaryObjects / IdentifyTertiaryObjects). I see that the ExportToDatabase module help for “Create one table per object, a single object table or a single object view?” explains which columns the “One table per object” needs for one to manually create the “Per_Object” table.

That said, the workaround I used to get the nuclei without centromere objects recognized by CellProfiler Analyst was to set the centromere object area to a small non-zero value and to zero the other NAs. I arrived at this after reading how CreateObjectCheckedTable() in dbconnect.py filters them out (found from ipdb output):

    > /home/omsai/code/CellProfiler-Analyst/cpa/dbconnect.py(1481)CreateObjectCheckedTable()
       1479             self.execute(query)
       1480             query = 'CREATE TABLE %s AS SELECT * FROM %s WHERE (%s) AND (%s) AND (%s)'%(p.object_table, p.object_table[:-8], " IS NOT NULL AND ".join(all_cols), " != '' AND ".join(all_cols), " > 0 AND ".join(AreaShape_Area))
    -> 1481         self.execute(query)
       1482
       1483     def CreateObjectImageTable(self):
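
To illustrate why zeroing the NULLs alone is not enough, here is a minimal sketch of a filter with the same three conditions (not NULL, not empty, area > 0) run against a toy table. It does not reproduce CPA's string-building code; the column names, the build_filter helper, and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Per_Object (ImageNumber INTEGER, ObjectNumber INTEGER, "
    "nuc_AreaShape_Area REAL, cen_AreaShape_Area REAL)"
)
conn.executemany(
    "INSERT INTO Per_Object VALUES (?, ?, ?, ?)",
    [
        (1, 1, 500.0, None),   # no centromere object: NULL area, dropped
        (1, 2, 480.0, 0.0),    # NULL replaced by zero: fails "> 0", dropped
        (1, 3, 510.0, 0.001),  # small non-zero area: kept
        (2, 1, 495.0, 12.0),   # genuine centromere object: kept
    ],
)


def build_filter(all_cols, area_cols):
    """Build a WHERE clause with the three conditions seen in the query."""
    not_null = " AND ".join("%s IS NOT NULL" % c for c in all_cols)
    not_empty = " AND ".join("%s != ''" % c for c in all_cols)
    positive = " AND ".join("%s > 0" % c for c in area_cols)
    return "(%s) AND (%s) AND (%s)" % (not_null, not_empty, positive)


area_cols = ["nuc_AreaShape_Area", "cen_AreaShape_Area"]
checked_table = "Per_Object_checked"
source_table = checked_table[:-8]  # strips "_checked", as in the query above
conn.execute(
    "CREATE TABLE %s AS SELECT * FROM %s WHERE %s"
    % (checked_table, source_table, build_filter(area_cols, area_cols))
)

kept = conn.execute(
    "SELECT ImageNumber, ObjectNumber FROM Per_Object_checked "
    "ORDER BY ImageNumber, ObjectNumber"
).fetchall()
print(kept)
```

Only the rows whose area columns are all strictly positive survive, which matches the behaviour I saw: zeros are filtered just like NULLs.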

Even if the pipeline no longer requires machine learning, CellProfiler Analyst is super useful to review the accuracy of the processed thumbnails, overlay statistics on the well plates, etc. Thanks again.

Edit: I figured the above out a few days after your post, but only got around to following up now. Sorry for the delay.