Creating a new module - problems


#1

Hi,

Let me just start with a small introduction: I’m a student at the ETH working in the Pelkmans lab, together with another PhD student, on setting up image analysis for high-throughput studies of, among other things, virus entry. The modular design of CellProfiler and the fact that it is written in MATLAB are interesting for us because they would allow us to experiment easily with different analysis methods and combinations. We therefore set out to create our own CellProfiler module that could do some final statistics and data analysis using the measurements and objects created by standard CP modules.

So, we have been trying to add our own module to CP. We have figured out how to access the measurements and object properties created by several CP modules, and we are now trying to set up a new module. We have created a file in the Modules directory whose name corresponds to its function name, and in the CellProfiler.m file we have inserted references to our module at the following positions:

line 36: "%#function …"
line 3760: created a new modulename toolhelp
line 3789 (line numbering is altered from original file): We have added this module to the handles.Current.ModuleFilenames array
line 8761: we have added it to the load_listbox function, in the ‘OtherFiles’ array.

Doing this allowed us to add our new module to pipelines. Apart from a revision number error message, and the fact that we cannot remove the new module after adding it, we get an error stating that the module is not valid because it appears to have no variables. There are several textVars present in the module, and of course also plain variables used for storing data (cells, matrices). Perhaps we are going about this totally the wrong way - if so, please tell us :smile: To keep a long story short, our questions are the following:

  1. Is there any documentation on how to add your own modules, and reference internal values of CP?
  2. If yes, then we will RTFM; if not, could you explain the ‘no-variables’ error we get and possibly give us some hints on how to go about setting up our own module?

Thanx in advance.

Kind regards,
Berend Snijder.


#2

Hi Berend,

This is a mistake on our part, and not yours.

In order to compile the CellProfiler code into executables for the Macintosh and PC, we need to hard-code the Modules into CellProfiler. Our normal developer's version looks for all .m files in the Modules folder (same for Data Tools and Image Tools) and adds them to the list automatically, but this does not work in a compiled version since there are no .m files.

We have corrected this error and the new Developer's version is available on the website. Please download this version, and you will see that the code in load_listbox automatically adds .m files to the Add Modules window.
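
For readers following along, here is a rough sketch of the two approaches described above: listing .m files at run time versus falling back to a fixed list when compiled. It is not the actual load_listbox code. The folder location, the variable names, and the placeholder module list are all assumptions; only dir, fullfile, fileparts, and isdeployed are standard MATLAB functions.

```matlab
% Hypothetical sketch, NOT the real load_listbox implementation.
ModulesDir = fullfile(pwd, 'Modules');   % assumed folder location

if isdeployed
    % Compiled application: there are no .m files on disk to discover,
    % so the list has to be fixed in advance (placeholder entries here).
    ModuleNames = {'LoadImages' 'SaveImages'};
else
    % Developer's version: discover every .m file in the folder.
    Listing = dir(fullfile(ModulesDir, '*.m'));
    ModuleNames = cell(1, numel(Listing));
    for k = 1:numel(Listing)
        % Strip the .m extension to get the function name.
        [~, ModuleNames{k}] = fileparts(Listing(k).name);
    end
end
```

If the folder does not exist, dir returns an empty listing and ModuleNames ends up as an empty cell array rather than raising an error.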

Sorry for the inconvenience!

Regards,
Mike


#3

Ok, thanx!

That certainly fixed our problems. We are still wondering whether you have an overview of which data is stored where in the ‘handles’ object. So far we have found everything we need for our preliminary analysis (measurement values, etc.), but only through trial and error and good old research. Since we would like to store data in the handles object ourselves, it would be nice if we could do it in such a way that, for instance, the viewData function is compatible with the data from our modules (we’ll have to comply with the CP naming conventions, such as the +‘Features’ one). If such an overview of where and how to store data in the ‘handles’ object exists, that would surely help our programming; if not, we will continue the old way.

Either way, your help is greatly appreciated!

Bye,
Berend.


#4

No problem.

If you look at HelpProgrammingNotes.m in the Help folder of CellProfiler, there is a section on handles.Measurements. This gives some information about how to store your measurements. However, it looks like it could use some re-formatting so I will give you some more direction here and then add it to the Programming Notes of future releases.

All measurements are stored in handles.Measurements
All IMAGE measurements are stored in handles.Measurements.Image
All OBJECT measurements are stored in handles.Measurements.(ObjectName)

An image measurement is something that is not based on objects, such as the threshold used to get rid of the background pixel intensity.

An object measurement is something that is specific to each object, such as its area or intensity measures.

Within each object’s structure, you have MeasureFeatures and Measure, where Measure is whatever you want to store (its name is arbitrary). The MeasureFeatures field is a cell array of strings, with one string per measurement in that group. For example, an imaginary MeasureAreaShape module measures Area, Perimeter, and Zernike0_0, so the features look like this:

handles.Measurements.Nuclei.AreaShapeFeatures = {'Area' 'Perimeter' 'Zernike0_0'};

The Measure field holds all the data for each object. However, we separate these measurements by image. Each image gets its own matrix of measurements, which has as many COLUMNS as there are FEATURES and as many ROWS as there are OBJECTS in that image.

For example, if we have two images and we measure AreaShape with the above features and assume the following:

Image 1 has 5 objects
Image 2 has 8 objects
AreaShape has 3 features (Area, Perimeter, Zernike0_0)

Then we should get a structure which looks like this:

handles.Measurements.Nuclei.AreaShape =
{[5x3 double] [8x3 double]}

So if you want to look at the data for Image 1 you can type the following:

handles.Measurements.Nuclei.AreaShape{1}

Which will give the data in the matrix:

[50 20 2;50 20 2;50 20 2;50 20 2;50 20 2]

If you want to look at a specific FEATURE of the objects, you can do the following. Here we will look at the Perimeter for all objects in Image 1:

handles.Measurements.Nuclei.AreaShape{1}(:,2)

Which should return:

[20;20;20;20;20]

It will probably be easiest to look at our modules to see how we store measurements, but this is the general setup (where Measure can be anything you want):

handles.Measurements.(ObjectName).MeasureFeatures = {'Measure1' 'Measure2' 'Measure3'};

handles.Measurements.(ObjectName).Measure{handles.Current.SetBeingAnalyzed} = MyData;

Where MyData MUST have as many columns as there are features and as many rows as there are objects.
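
To tie the pieces together, here is a minimal, self-contained sketch of the layout described above. The handles structure is built from scratch and the numbers are invented purely for illustration (they match the AreaShape example). The size check and the lookup by feature name are my own additions, not something CellProfiler requires.

```matlab
% Minimal sketch of the storage layout; all values are made up.
handles = struct();
handles.Current.SetBeingAnalyzed = 1;   % pretend we are on image 1
ObjectName = 'Nuclei';                  % example object name

% One string per measured feature; the order defines the column order.
Features = {'Area' 'Perimeter' 'Zernike0_0'};
handles.Measurements.(ObjectName).AreaShapeFeatures = Features;

% Five objects in this image: one row per object, one column per feature.
MyData = repmat([50 20 2], 5, 1);

% Guard against a column/feature mismatch before storing (my addition).
assert(size(MyData, 2) == numel(Features), 'Columns must match features');
handles.Measurements.(ObjectName).AreaShape{handles.Current.SetBeingAnalyzed} = MyData;

% Look up a feature column by name instead of a hard-coded index.
col = find(strcmp(Features, 'Perimeter'));
Perimeters = handles.Measurements.(ObjectName).AreaShape{1}(:, col);
```

Looking the column up by name keeps the code working if the feature order ever changes, at the cost of one strcmp per access.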

I understand it may be confusing, so please ask questions and we can keep modifying this information so that it makes more sense.

Mike


#5

That’s great, thanks! Your explanation and HelpProgrammingNotes.m were exactly what we were looking for. Perhaps useful for others attempting the same thing: I find the easiest way to add data (measurements in our case) is by using the CPaddmeasurements function. Furthermore, one can easily adjust/copy the code from the data export functions to get specialized data export.

So now we are running the developer's version of CP on fairly big datasets (approx. 200 images, hoping to increase this to 600 per cycle) and we are bumping into some other issues. One is that after a while continuous Java errors appear (apparently from the GUI). An example:

Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException: HDC for component at sun.java2d.loops.DrawGlyphList.DrawGlyphList(Native Method)

It either slows down the process a lot or eventually crashes the analysis. It would therefore probably be helpful to disable the GUI for all modules, and just do an automated data export at the end of the analysis. Is there an easy way to stop the GUI windows from being updated/displayed, or should we adjust the modules we use to disable their GUI interfaces?

And another question: if we eventually would like to run our CP modules in a compiled version (performance gain), you said in your previous post that you use an adjusted version of CP to hard-code the available modules, etc. Since we already have this version of the source code, could you explain how to add our modules so that we can also use a compiled version of CP containing the added functionality? Or is there another way to go about this?

Thanks again for your help!

Berend.


#6

Hi Berend,

I’m glad to hear things are working out.

This error usually occurs because the system is running low on memory. The best fix is to add more memory; we typically have 1.5 GB or more in our machines. Also, in the next release of CellProfiler, we will provide an option in the preferences to run CellProfiler without display windows. This has fixed other people’s problems with running so many images. You still load the main CP GUI, but none of the display windows for each module will appear. If you want to try this before the release (which will hopefully be soon), you can simply close all the windows as they pop up during the first cycle of an analysis.

However, we can confirm that large image sets (over 200) do slow down CellProfiler. MATLAB is not efficient at running loops, and when you begin to add so many measurements (200 x however many measurements) it is noticeably slower. The solution is to run them on a cluster. The cluster breaks up 600 images into many small batches which MATLAB can handle easily. We can process about 45,000 images in less than 24 hours on our cluster. If you have a cluster available and are interested in doing this type of large-scale analysis, please let me know. I may start a separate topic just for cluster questions, as this is where the ultimate power of CellProfiler comes from. It would be pretty hard to analyze 45,000 images by eye!

From my experience so far, there is not a large performance gain using the compiled version of CellProfiler. This is because the program still calls MatLab library functions (which is why you must install MCR). The main reason for compiling is that you do not need another MatLab license to run the software. If you are still adamant about compiling CellProfiler, we actually have a script which produces the necessary additions to CellProfiler.m for compiling. We didn’t actually type in all that hard-coded information! :smile:

Mike


#7

Yes, since we have a cluster available and since we will definitely have to upscale our analysis this doesn’t sound like a bad idea :smile: We would at least be very interested if there was a ‘How To’ somewhere, so we could just experiment a little.

Concerning the performance/stability gain, do MatLab & CP become more stable when running under Linux in comparison with running under Windows?

Cheers!


#8

Hi Berend,

We have seen very little difference in stability between platforms, although there were more issues setting up the Mac OS X and Linux compiled versions due to compiler issues. I am assuming your cluster runs Linux, as it is a more stable operating system in general.

You can first read the help in CreateBatchScripts.m which basically explains how to set up the cluster. However, the information there only works if you have matlab licenses for every node. We have just recently (yesterday) successfully run cluster jobs with a compiled version of CP. You can now essentially run CP on a cluster without a single MatLab license.

I have pasted the contents of runcluster.m at the bottom of this post. We load all the CellProfiler modules into this small program and do not use any of the GUI code. This simple program takes in batchfile, which contains the handles structure, and clusterfile, which contains the information for a small chunk of the images. It then processes the images as the GUI would and produces an output file for that set of images.
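
To illustrate what "a small chunk of the images" amounts to, here is a hypothetical sketch that splits an image set into contiguous chunks. The field names StartImage and EndImage follow the cluster structure used in runcluster.m; the image count, the chunk size, and the way the chunks are generated are assumptions, since the real cluster-file creation code is not shown in this thread.

```matlab
% Hypothetical sketch: describe contiguous chunks of an image set.
NumberOfImages = 600;   % total image sets (example value)
BatchSize = 50;         % assumed chunk size; tune for your cluster

% First and last image of each chunk; the last chunk is clipped.
StartImages = 1:BatchSize:NumberOfImages;
EndImages = min(StartImages + BatchSize - 1, NumberOfImages);

% One struct per chunk, mirroring the fields runcluster.m reads.
Chunks = struct('StartImage', num2cell(StartImages), ...
                'EndImage', num2cell(EndImages));

for k = 1:numel(Chunks)
    fprintf('Chunk %d: images %d to %d\n', k, Chunks(k).StartImage, Chunks(k).EndImage);
end
```

Each chunk would then be saved to its own cluster file and handed to one job, so no single MATLAB process has to accumulate measurements for the whole image set.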

There are many small technical details, such as creating a command file to load the correct libraries (you can see an example in the Mac compiled version). We also have a batch script which submits all of the individual cluster jobs. The compiled version of this batch script (it is called batchrun.sh) differs from the MATLAB version which you can see in CreateBatchScripts.

Since this is such a big topic to discuss, I am going to start a new forum topic dedicated to running CellProfiler on the cluster. It will contain all the information necessary to set up a cluster and run jobs. However, this will take some time and I will be out of town next week. We will also provide the necessary files (batchrun.sh, runcluster.m) upon request.

The next release of CellProfiler (which is soon) will provide a module to produce the cluster files necessary for a compiled CP on a cluster. It also contains many pipeline bug fixes.

I hope this gives you some idea of how we run CP on the cluster.

-Mike


function runcluster(batchfile,clusterfile)

%%% Must list all CellProfiler modules here
%#function Align ApplyThreshold Average CalculateRatios CalculateStatistics ClassifyObjects ColorToGray ConvertToImage CorrectIllumination_Apply CorrectIllumination_Calculate CreateBatchScripts CreateWebPage Crop DefineGrid DisplayDataOnImage DisplayGridInfo DisplayHistogram DisplayImageHistogram DisplayMeasurement Exclude ExpandOrShrink FilterByObjectMeasurement Flip GrayToColor IdentifyObjectsInGrid IdentifyPrimAutomatic IdentifyPrimManual IdentifySecondary IdentifyTertiarySubregion InvertIntensity LoadImages LoadSingleImage LoadText MeasureCorrelation MeasureImageAreaOccupied MeasureImageIntensity MeasureImageSaturationBlur MeasureObjectAreaShape MeasureObjectIntensity MeasureObjectNeighbors MeasureTexture MergeBatchOutput OverlayOutlines PlaceAdjacent Relate RenameOrRenumberFiles RescaleIntensity Resize Restart Rotate SaveImages SendEmail Smooth SpeedUpCellProfiler SplitOrSpliceMovie Subtract SubtractBackground Tile WriteSQLFiles CreateClusterFiles

%%% batchfile provides 'handles'; clusterfile provides 'cluster'
load(batchfile);
load(clusterfile);
tic
handles.Current.BatchInfo.Start = cluster.StartImage;
handles.Current.BatchInfo.End = cluster.EndImage;
for BatchSetBeingAnalyzed = cluster.StartImage:cluster.EndImage,
    disp(sprintf('Analyzing set %d', BatchSetBeingAnalyzed));
    toc;
    handles.Current.SetBeingAnalyzed = BatchSetBeingAnalyzed;
    %%% Run every module in the pipeline on this image set
    for SlotNumber = 1:handles.Current.NumberOfModules,
        ModuleNumberAsString = sprintf('%02d', SlotNumber);
        ModuleName = char(handles.Settings.ModuleNames(SlotNumber));
        handles.Current.CurrentModuleNumber = ModuleNumberAsString;
        try
            handles = feval(ModuleName,handles);
        catch
            handles.BatchError = [ModuleName ' ' lasterr];
            disp(['Batch Error: ' ModuleName ' ' lasterr]);
            rethrow(lasterror);
            quit; %%% not reached: rethrow exits first
        end
    end
end
cd(cluster.OutputFolder);
%%% Drop the pipeline contents before saving the output
handles.Pipeline = [];
save(sprintf('%s%d_to_%d_OUT',cluster.BatchFilePrefix,cluster.StartImage,cluster.EndImage),'handles');