Using the ExportToDatabase module


#1

Hi,

I tried to use the ExportToDatabase module and got an ‘Identifier too long’ error. I’m using MySQL 5.0.26, and it appears that identifiers for column names have a 64-character limit. This is the first identifier that caused a problem:

Mean_ThresholdedCells_Intensity_CorrGreen_IntegratedIntensityEdge

It turns out this one is 65 characters. Any comments?
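For anyone who wants to reproduce it, here is a minimal sketch (the table name is made up; the column name is the one above):

    CREATE TABLE identifier_test (
        Mean_ThresholdedCells_Intensity_CorrGreen_IntegratedIntensityEdge DOUBLE
    );
    -- MySQL rejects this with:
    -- ERROR 1059 (42000): Identifier name
    -- 'Mean_ThresholdedCells_Intensity_CorrGreen_IntegratedIntensityEdge' is too long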

Which database server are you all using? Is there any documentation on your database strategy?

After deleting the columns whose names were too long, I was able to create a database. From a first-pass examination of the tables created by the SBS example, it appears that a new column is made for every new measurement. That means tables from different types of analysis will have different numbers of columns, which is a bit strange. Granted, this keeps all measurements related to a single object on one row, but as more channels/colors are taken of one field, the number of columns is going to keep growing. Was there a reason for doing it this way?
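To make that concrete, here is a hypothetical sketch of the kind of per-object table I am describing (the table and column names are invented for illustration, following the Object_Category_Image_Measurement naming pattern):

    CREATE TABLE Per_Object (
        ImageNumber  INT,
        ObjectNumber INT,
        -- one column per object/channel/measurement combination:
        Nuclei_Intensity_CorrGreen_IntegratedIntensity DOUBLE,
        Nuclei_Intensity_CorrGreen_MeanIntensity       DOUBLE,
        Nuclei_Intensity_CorrBlue_IntegratedIntensity  DOUBLE,
        -- ...every new measurement or channel adds another column here
        PRIMARY KEY (ImageNumber, ObjectNumber)
    );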

Has anyone considered putting the outline for each object in the database?
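For example (just a sketch of what I have in mind; the table and column names are hypothetical), outlines could go in their own table as ordered boundary points:

    CREATE TABLE Object_Outline (
        ImageNumber  INT,
        ObjectNumber INT,
        PointIndex   INT,   -- position of the point along the boundary
        X            INT,
        Y            INT,
        PRIMARY KEY (ImageNumber, ObjectNumber, PointIndex)
    );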

One final question: is this a list of all the possible measurements in CP?

IntegratedIntensity
MeanIntensity
StdIntensity
MinIntensity
MaxIntensity
IntegratedIntensityEdge
MeanIntensityEdge
StdIntensityEdge
MinIntensityEdge
MaxIntensityEdge
MassDisplacement
Area
Eccentricity
Solidity
Extent
EulerNumber
Perimeter
FormFactor
MajorAxisLength
MinorAxisLength
Orientation
AngularSecondMoment
Contrast
Correlation
Variance
InverseDifferenceMoment
SumAverage
SumVariance
SumEntropy
Entropy
DifferenceVariance
DifferenceEntropy
InformationMeasure1
InformationMeasure2
GaborX
GaborY

thanks, John


#2

The quick answer is that we have also run into this problem and have had to modify the names ourselves. Let me look into your other questions and get back to you. In the last release we made many changes to how data is stored, partly to reduce the size of column names. Since then, we haven’t run into this problem.

Mike


#3

Hi Mike,

Thanks for the quick reply. It’s nice to know somebody’s on the other end. I’m not a database expert, but given the binary-based nature of computers, keeping the size under 64 (or some other power of 2) would be a good idea.

I’m curious: do you all do batch processing of images independent of the experiment context, and then pull the numbers out of the batch databases later? In that case, not being properly normalized is probably a good enough way to get the numbers out of the image processing, which is of course the time-consuming part of big image screens. Uncoupling that part from the later experimental analysis could be a good way to go, particularly in a cluster environment.

You could have normalized your object table by adding a column called “object type”. In that column you would have ‘Nuclei’ or ‘ThresholdedCells’, and the rest of the row would hold the numbers for that object. That way, instead of having five columns (or more) that contain ‘IntegratedIntensity’, you would have one column and five rows. Then you could have a separate table of object types, and you could add new object types as you create them without ever having to add columns to your object table. Something like the sketch below.
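Roughly (again, the table and column names are just hypothetical illustrations of the idea):

    -- lookup table: adding an object type means adding a row here,
    -- not a column somewhere else
    CREATE TABLE Object_Type (
        ObjectTypeID INT PRIMARY KEY,
        Name         VARCHAR(64)    -- e.g. 'Nuclei', 'ThresholdedCells'
    );

    -- normalized version of the object table
    CREATE TABLE Per_Object (
        ImageNumber         INT,
        ObjectNumber        INT,
        ObjectTypeID        INT,    -- references Object_Type
        IntegratedIntensity DOUBLE, -- one column, many rows
        MeanIntensity       DOUBLE,
        PRIMARY KEY (ImageNumber, ObjectNumber, ObjectTypeID)
    );

A query for just the thresholded cells then becomes a join on ObjectTypeID rather than picking out a set of specially named columns.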

thanks, John