CP2.2 does not write CSV file


#1

Hi,
I have a basic pipeline comparing the area of positive staining to the area covered by the structure (an IHC slide). The program tells me that the CSV file is written by “ExportToSpreadsheet” and also shows the correct output location + name, but no CSV file appears in the directory. I checked that I actually selected “measurements to export” (I did) and that the Input/Output folders are set to writable locations. The actual images from the pipeline steps are written properly.
I also checked for hidden files in the output directories, but nothing was found.
I created the pipeline with CP 2.1.1 and then moved it to our “production” computer, which has more memory; that one runs CP2.2.
Is it possible that CP2.2 has an issue with writing files? Could a workaround be to use ExportToDatabase and extract the data with some clever scripting?

This is the pipeline:

UnmixColors (for DAB color images)
ColorToGray
IdentifyPrimaryObjects (DAB channel)
MeasureImageAreaOccupied
IdentifyPrimaryObjects (nuclei channel)
MeasureImageAreaOccupied
CalculateMath
ExportToSpreadsheet
SaveImages
SaveImages
SaveImages
SaveImages

Thank you for your help.

Philipp


#2

You certainly could write to a database and then export the data as a CSV (you wouldn’t even need a script, most database programs will do this for you easily), but that shouldn’t strictly be necessary.
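(For what it’s worth, if you did go the database route with ExportToDatabase pointed at an SQLite file, a few lines of Python are enough to dump a table back out to CSV. A minimal sketch; the database path and table name here are hypothetical and depend on your ExportToDatabase settings:)

```python
import csv
import sqlite3

# Hypothetical names -- adjust to match your ExportToDatabase settings.
db_path = "DefaultDB.db"
table = "Per_Image"

conn = sqlite3.connect(db_path)
cur = conn.execute("SELECT * FROM " + table)

with open(table + ".csv", "w", newline="") as f:
    writer = csv.writer(f)
    # Column names come from the cursor description.
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)

conn.close()
```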

One time I’ve seen this behavior before, it was for this reason:

Scroll allllllll the way down to the bottom of ExportToSpreadsheet; there’s an option called ‘Export all measurement types?’ (this is different from the ‘Select the measurements to export’ option that’s in the middle of the window). Is that set to ‘No’, and if it is, did you enumerate any/all of the CSVs you wanted written? If you set that to ‘No’ but don’t list any CSVs, none will be written.

If that’s not it, you can upload the .cppipe pipeline file so that someone can take a look to see if other settings are potentially incorrect.


#3

Thank you bcimini for your fast reply.

The ‘Export all measurement types’ option is set to ‘No’. Unfortunately, I do not understand what you mean by “enumerate any/all of the CSVs you wanted written”. Could you specify?
I assume you mean that I need a way to keep my CSV file from being overwritten by the parallel processes, a general problem of parallelization. To avoid that, I used the metadata (tokens, variables, “green fields”) to specify a sub-folder, built from the metadata variables, so that each image analyzed gets a unique folder name. This also works well for the output files written by the SaveImages modules (here: binary masks of recognized objects).

Additional settings: the export setting ‘Add a prefix to file name?’ is set to ‘Yes’, with ‘Filename prefix’ set to “results_Image”; this generates a file “resultsImageImage.csv” in my CP2.1.1 pipeline for my test input image set (verified again, it works).

In another thread it was mentioned that the “Filename prefix” field does not support metadata variables in the way the “Output file location” > “Sub-folder” field does. Thus, while the output sub-folder is created in the folder defined by the input image file, the output CSV file is always named the same (in my case ‘resultsImageImage.csv’), so adding all data to a single file is currently not possible without using the SQL approach.
Are there plans to change this in an upcoming revision? If not off topic, I can post my cygwin-bash solution for combining the output (using find and xargs), if that would be useful to others (it’s just a two-liner); it avoids the need for the more elaborate ExportToDatabase approach.


#4

The ‘Export all measurement types’ option is set to ‘No’. Unfortunately, I do not understand what you mean by “enumerate any/all of the CSVs you wanted written”. Could you specify?

When you set ‘Export all measurement types’ to ‘No’, you’re telling it not to make CSVs for any measurements except the ones you select in the dropdown menu (see figure below, red arrow). If you want to export more than one CSV, you can keep clicking ‘Add another dataset’ (blue arrow) until all of your CSVs are listed. If the dropdown is still set to ‘Do not use’, then there won’t be any CSVs created, because you haven’t given it any information to export.

[Screenshot of the ExportToSpreadsheet settings: red arrow marks the measurement dropdown, blue arrow marks ‘Add another dataset’.]
I assume you mean that I need a way to keep my CSV file from being overwritten by the parallel processes, a general problem of parallelization. To avoid that, I used the metadata (tokens, variables, “green fields”) to specify a sub-folder, built from the metadata variables, so that each image analyzed gets a unique folder name. This also works well for the output files written by the SaveImages modules (here: binary masks of recognized objects). […] adding all data to a single file is currently not possible without using the SQL approach

If you’re running CP on a cluster (or if you opened CP 5 times and ran the same pipeline with each), this is true; if you’re just running this on a single production computer (which you imply), it isn’t: a single running copy of CP won’t overwrite its own data even if it has multiple ‘workers’ running.

If you’re running one copy of CP on a single computer, this is how it works no matter how many parallel workers you’re running:
For example, say you have 10 images, where Metadata_Treatment is ‘Treatment 1’ for 5 of them and ‘Control’ for the other 5, and you’re identifying ‘Cells’.

  • If you tell ExportToSpreadsheet to export to DefaultOutputFolder, you’ll get 3 spreadsheets, ‘Images’, ‘Experiment’, and ‘Cells’, each of which has the data from all 10 images.
  • If you tell it to use DefaultOutputFolder, sub-folder:‘Metadata_Treatment’, you’ll get two subfolders, each of which has 3 spreadsheets, ‘Images’, ‘Experiment’, and ‘Cells’, each of which has the data from the relevant 5 images.
  • If you tell it to use DefaultOutputFolder, sub-folder:‘Metadata_Treatment’_‘ImageNumber’, you’ll get 10 subfolders, each of which has 3 spreadsheets, ‘Images’, ‘Experiment’, and ‘Cells’, each of which has the data from only a single image.

Does this make more sense?

If you ARE running on a cluster, you do need to use SQL or re-concatenate the CSVs at the end with a script, but as you stated that’s a general problem of parallelization, not of CP specifically.
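If it helps, here is a minimal sketch of that re-concatenation step in Python (the folder layout and the ‘Cells.csv’ name below are hypothetical, following the Metadata_Treatment example above; a find/xargs one-liner in a shell does the same job):

```python
import csv
import glob

# Hypothetical layout: one Cells.csv per sub-folder, as in the
# sub-folder examples above. Adjust the pattern to your output tree.
paths = sorted(glob.glob("output/*/Cells.csv"))

with open("Cells_combined.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for i, path in enumerate(paths):
        with open(path, newline="") as f:
            rows = csv.reader(f)
            header = next(rows)
            if i == 0:
                writer.writerow(header)  # write the header only once
            writer.writerows(rows)
```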


In another thread it was mentioned that the “Filename prefix” field does not support metadata variables in the way the “Output file location” > “Sub-folder” field does. Thus, while the output sub-folder is created in the folder defined by the input image file, the output CSV file is always named the same (in my case ‘resultsImageImage.csv’), so adding all data to a single file is currently not possible without using the SQL approach.
Are there plans to change this in an upcoming revision?

Not to my knowledge, but you can feel free to make feature requests on our GitHub.


#5

Thank you for your help on this. I finally understood the file-writing process and will look into the DB modules in the future. One of the issues was the differing sizes of the input images, which crashed individual threads, so the final data table was not written.

In summary, I learned about a limitation of the TIFF format: in CellProfiler (2.3) it appears that a single level (TIFF plane) can be at most ~840 MB, while ImageJ oddly allows more (~1200 MB) before the image is no longer displayed correctly. With this knowledge, I am able to analyze slide scans of single mouse organs (examples: liver, pancreas, brain) at 10x magnification without relying on a tiled format.


Is there a discussion or group talking about “tiling” formats? I want to get to a point where I can feed all the different formats (lif, czi, ndpi, ome-tiff) directly into a CP pipeline via converters (example: ndpisplit from the IMNC laboratory, Paris). Tiled czi files are an issue, since for some reason they are not properly ordered. Maybe not a CP-forum question, but maybe there is interest.


#6

The image size limits are related to the amount of memory of the respective computer, not anything intrinsic in CP, so it’s possible you’d be able to get increased performance/stability on large images if you are able to run on a more powerful machine.
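As a rough back-of-the-envelope illustration of that memory link (the image dimensions here are hypothetical, and the 8-bytes-per-pixel figure assumes pixels are held as 64-bit floats in memory, which is common in NumPy-based tools):

```python
# A hypothetical 10x whole-organ scan stored as 8-bit grayscale TIFF.
width, height = 30000, 28000

bytes_on_disk = width * height * 1  # 1 byte per pixel on disk: ~840 MB
bytes_in_ram = width * height * 8   # 8 bytes per pixel as float64: ~6.7 GB

print("on disk: %.0f MB" % (bytes_on_disk / 1e6))
print("in RAM:  %.1f GB" % (bytes_in_ram / 1e9))
```

And that is just a single copy; every intermediate image a pipeline holds on to (unmixed channels, masks, and so on) adds its own share.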

There are certainly people thinking about how to use CP with large images; for example, while I haven’t tried it myself, we know people are using CP as part of Orbit. If you have more specific questions I can pass them on to our software engineers, but I don’t know of a specific discussion about them or a group addressing that in our “space”. It might be a good idea to start a new thread, though, and see if you can get one going!


#7

Thanks for all the info. I’ll look into that and check out Orbit (I am using, and would like to use more, the OMERO system).

About the performance issues: I tried all combinations (increasing heap space, disabling HT in the BIOS, running one core with ~128 GB to 512 GB (!) of RAM), and there still seems to be a hard limit on what CP (2.1, 2.3) can process when using single-page TIFF as input (the situation is different with JPEG). I’ll document this once I have another project, and also the 3.0rc, at hand.

Thanks.


#8

We’d love to see your documentation of that, thanks a lot!