We aim to use CellProfiler in headless mode on a cluster for the automated analysis of large amounts of histology data. We have about twenty thousand slide images (TIFFs of several GB each) in several different stainings. These are masked and converted into 2000x2000 pixel tiles by a toolkit specifically designed for that purpose. The tiles are then fed into the CellProfiler pipeline for the respective staining.
...or that was the idea.
We had CellProfiler 2.x.x installed on our CentOS 6 based High-Performance Cluster, and some test runs (two years ago) were successful. However, the cluster has recently been updated to CentOS 7 and we needed to reinstall CP, which has been a nightmarish experience. On the one hand, we do not have administrative permissions on our system, and the modular installation system (LMOD) employed by the HPC facility is unsuitable for installing CP. On the other hand, we have had difficulties getting older releases of CP to work.
Because our pipelines were designed for an earlier version of CellProfiler, we tried to install that earlier version. We cloned the repository with Git, checked out an older commit and used make to build CP from source. The build halts at the point where the script attempts to download dependency files (through an SVN system) that are apparently no longer hosted or accessible on the website (the connections time out).
We prefer to install CellProfiler in a contained environment so that other users of the HPC facility cannot meddle with our software. Ideally, it should not depend on the system-wide Python installation, so we opted for the Anaconda method of installation. A lot of effort led to an error-free installation and correct display of the --help and --version messages (although CP was still displaying rc3.0.0 as its version, whereas the actual commit dated back to version 2.1). Running a pipeline then, perhaps unsurprisingly, leads to a warning message about potentially unexpected behavior (as the pipeline was designed for a 'previous' version of CP). No error message is displayed, and CellProfiler quits without notice. Since there is no output, and we found no debug mode or verbose option, I have no idea what kind of error occurs.
From what I understand (and have experienced), converted older pipelines do not work properly in headless mode out of the box on a newer installation. In our tests, CellProfiler (or at least rc3.0.0) seemed to ignore the -i flag and instead searched for the test directory that was specified in the pipeline/project file at the design stage.
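To make the kind of invocation discussed here concrete, below is a minimal sketch of a wrapper that builds a headless command line and captures the exit code and both output streams, so that a silent exit at least leaves a trace. It is an illustration only, not the exact command we ran: it assumes the commonly documented flags -c (no GUI), -r (run the pipeline), -p (pipeline file), -i (input folder) and -o (output folder), the function names and paths are hypothetical, and the wrapper itself assumes a separate Python 3.7+ interpreter.

```python
import subprocess
import sys


def build_headless_command(pipeline, input_dir, output_dir, executable="cellprofiler"):
    """Return the argument list for a headless CellProfiler run.

    Assumed flags: -c (no GUI), -r (run the pipeline on startup),
    -p (pipeline file), -i (default input folder), -o (default output folder).
    """
    return [
        executable, "-c", "-r",
        "-p", str(pipeline),
        "-i", str(input_dir),
        "-o", str(output_dir),
    ]


def run_and_report(cmd):
    """Run cmd, capture both streams, and return (exit_code, stdout, stderr)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # A non-zero exit code is the only hint when nothing is printed.
        print(f"command exited with code {result.returncode}", file=sys.stderr)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_headless_command(
        "staining_A.cppipe", "/data/tiles/staining_A", "/data/results/staining_A"
    )
    print(" ".join(cmd))
```

Passing the list returned by build_headless_command to run_and_report would record whether CP exits with a non-zero code even when it prints nothing, which is the information we are currently missing.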
We are quite desperate for a solution and no longer know where to look for one. We would very much appreciate your advice in rethinking our approach, given the following constraints:
- We need to run headless on a cluster, because of the overwhelming amount of data.
- We cannot perform administrative tasks during installation, and installing all dependencies manually is a complete hell, so that is a no-go.
- We would preferably run 2.1 or maybe 2.2 to prevent having to alter the pipelines.
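As an illustration of the first constraint: with this many tiles, a headless run would typically be split into ranges of image sets, one range per cluster job. The sketch below assumes CellProfiler's -f and -l options (first and last image set to process) and 1-based image set numbering. The helper names and the batch size are hypothetical, and this is not something we currently have running.

```python
def image_set_ranges(total_image_sets, batch_size):
    """Split image sets 1..total_image_sets into inclusive (first, last) ranges."""
    if total_image_sets < 0 or batch_size < 1:
        raise ValueError("need total_image_sets >= 0 and batch_size >= 1")
    ranges = []
    first = 1
    while first <= total_image_sets:
        # The last batch may be shorter than batch_size.
        last = min(first + batch_size - 1, total_image_sets)
        ranges.append((first, last))
        first = last + 1
    return ranges


def batch_arguments(first, last):
    """Extra command-line arguments for one job (assumed -f/-l semantics)."""
    return ["-f", str(first), "-l", str(last)]


if __name__ == "__main__":
    # Hypothetical numbers: 10 image sets, 4 per cluster job.
    for first, last in image_set_ranges(10, 4):
        print(" ".join(batch_arguments(first, last)))
```

Each printed argument pair would be appended to the headless command of one cluster job, so that jobs can run independently of each other.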
The best thing for us (we think) would be for the SVN repository to be restored, so that we can attempt to build from source using a previous commit. That way, it should work as it did before.
If this is not possible, or if there is a better alternative, please advise us on the best approach to setting up such a new pipeline. For example: what is the best installation strategy for our situation, and what are the important considerations or differences when loading data into a pipeline in headless mode compared to the design stage on a desktop?
Thank you in advance!