Getting CP2 running a batch job on Amazon EC2


#1

I’ve been at this off and on for a couple weeks now, trying a mish-mash of different approaches and tools from the different builds and directions available, and I realized it’s high time I asked what your recommendation is for a new user who’s trying to get CP2 running on an EC2 instance. It’s running Ubuntu 12.04.

My current goal is just to be able to run the simplest batch job I can think of: essentially a batch job I saved from the example ExampleHumanImages pipeline. So far I’ve run into a combination of build errors when trying to set up CP 2.0, and Java library errors, after a seemingly successful build, when trying to run CellProfiler 2.1.

Following mainly the directions from (github.com/CellProfiler/CellPro … evelopment), I think I’ve come closest to success following Option 3, using ubuntubuild.sh. I’ve had a couple of seemingly successful builds of CP 2.1 this way, but ran into Java library issues on trying to run that I haven’t been able to resolve so far. I’ve also tried altering ubuntubuild.sh a bit to build CP 2.0, but haven’t yet managed a successful build that way.

So mainly I’m just looking for someone in the know to point the way to the most likely-to-succeed paths to try, and I’ll focus my efforts there.

Thanks.
Blake


#2

We should be coming out with a Linux build for Ubuntu for our upcoming release (most likely before the end of 2013). That will give you a binary package that you should be able to install. I can see that you do pay a hefty premium for using Windows on EC2, but it is a straightforward and well-tested install.

The Linux build is extremely difficult, and we’re devoting a good portion of our resources to making a reproducible build procedure for CentOS and Ubuntu. The Java library issues might be solved by placing the directory containing libjvm.so on your LD_LIBRARY_PATH. You should be able to find out that path by navigating to the directory “cellprofiler/utilities” in the checked-out source and typing “java findlibjvm”.
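For example, something like the following (a sketch only; the JVM path shown is a placeholder, and on your machine you should use whatever “java findlibjvm” actually prints):

```shell
# In the checked-out source:
#   cd cellprofiler/utilities && java findlibjvm
# prints the directory containing libjvm.so. Suppose it printed the
# (hypothetical) Oracle JDK 1.6 path below:
JVM_DIR="/usr/lib/jvm/java-6-oracle/jre/lib/amd64/server"
# Prepend it to LD_LIBRARY_PATH, preserving any existing value:
export LD_LIBRARY_PATH="$JVM_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```

Put the export in the shell profile of whatever account runs CellProfiler so headless jobs pick it up too.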

We use Oracle’s JDK 1.6 on our clusters and have had problems with OpenJDK.

–Lee


#3

Hi Lee,

I still have had no luck in getting CellProfiler 2 running in headless mode on Ubuntu, although I felt at times that I was closing in.

Do you think you’re still on track to have an Ubuntu build done before the end of the year? If not, have you considered, or would you consider, devoting a small amount of time to getting CellProfiler installed and running on an Amazon EC2 instance, which could then be saved as an AMI (Amazon Machine Image) and made available to any user wanting to run CellProfiler on Amazon’s cloud? I can’t imagine I’m the only one interested in this. I’d be glad to provide sudo access to an EC2 instance, and I’d pay for time on it until someone in your group could get CellProfiler running using whatever standard practices and tricks you use. I’d then post the AMI in the publicly available listing in EC2.

Another alternative gaining traction in the startup community is Docker (blog.docker.io/2013/11/docker-0- … tribution/). The disadvantage compared to installing CellProfiler on a shared EC2 instance is more learning and effort for your team. The advantage is that it’s more portable for those willing to take the plunge and learn how to use Docker effectively.

Cheers,
Blake


#4

Hi Blake,

We have beta instructions for CentOS now here: cellprofiler.org/linux.shtml
I realize this is not Ubuntu, but I thought I’d at least mention it. And I’ll ping our Linux developers, who might have some more recommendations regarding your comments.

Cheers,
David


#5

David,

Thanks times 100. The CentOS Linux builder worked like a charm for me. It took a bit of grappling to figure out that there seem to be some significant issues when the 2.1 that the CentOS builder installs tries to run a batch job exported from, say, 2.0 using the GUI. When I moved to the latest build of the GUI, I was able to get through the remaining issues and run your ExampleHuman pipeline successfully headless on the CentOS Amazon EC2 machine image I found from the CentOS people.

I now just have a few remaining issues and questions that I think I’ve come across in my forays on the forums recently, but which you can probably answer easily.

  1. So far I haven’t been able to get any .mat or .h5 output from the run. Have you seen this work using the latest build on CentOS? No errors are reported even in DEBUG mode–there’s just no OUT file. I am able to export to a database successfully, so that’s sufficient for the moment at least (although the interface I’ve written was previously against the OUT file), but exporting to a spreadsheet also isn’t working for me (see #2).

  2. The ExportToSpreadsheet module seems to have some issue that prevents it from loading when running headless. This same error occurs whether I select the default output directory, default input directory, or specify a subdirectory of the input directory. Here’s the error message seen when running in DEBUG:

Failed to load pipeline
Traceback (most recent call last):
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/pipeline.py", line 1089, in loadtxt
    module_name, from_matlab)
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/cpmodule.py", line 187, in set_settings_from_values
    from_matlab)
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/modules/exporttospreadsheet.py", line 1019, in upgrade_settings
    directory = cps.DirectoryPath.upgrade_setting(directory)
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/settings.py", line 315, in upgrade_setting
    dir_choice, custom_path = DirectoryPath.split_string(value)
ValueError: need more than 1 value to unpack

  3. If we’re right now about to purchase a machine specifically to run CellProfiler, is there any guidance in a document or on the forums as to our best bet? Is there an OS (Windows/Mac/Linux) that’s currently (versions 2.0 and 2.1) more stable and/or has some compelling features like multithreading?

I’ll come back and update this post with an outline of the steps I’ve taken to get it running headless on centos along with a link to the functional Amazon EC2 image, which I’ll make publicly available, in case anyone else finds this post and finds those things useful.

Thanks,
Blake


#6

Hi Blake,

Glad the CentOS rpms worked for you on Ubuntu. As far as I know we hadn’t tested that.

I know this may sound simplistic (sorry), but have you double-checked your “View Output Settings” button settings? We will eventually change how the output settings are handled, but for now they live in obscurity behind this button. We are a little worried that folks will ‘misplace’ their output because of this.

We normally run CentOS only for batch/cluster runs, which output directly to the database, so we haven’t done much testing at all with the Linux GUI. Afaik, the output should be the same for Linux as the other OSes, as specified in the View Output Settings button.

Hmm we will check this out. Can you provide a project file and image set for us to test?

We don’t expect differences between OSes; however, our development is on Win64 (Windows 7), so Mac- and Linux-specific issues are found and fixed post hoc. So Win64 with a decent amount of RAM (8GB is probably fine) is probably your best bet. All 2.1 versions should have the same multithreading benefits over 2.0. But a cluster of any OS (usually Linux) is your best method of analyzing lots of images, vs. a single monster machine.

Cheers,
David


#7

Regarding the ExportToSpreadsheet issue, I just tried a simple pipeline including ExportToSpreadsheet and I could not reproduce your error. I used CreateBatchFiles and then ran the single batch of three images on our Linux cluster, headless, and there were no issues. I used the 2013_12_03 version of CP 2.1.
Are you still seeing this error?

Side note: ExportToSpreadsheet will only output if you run all the images as a single batch, as it won’t (afaik) merge all the batches’ output back into a single CSV.
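If you do run multiple batches and want one spreadsheet afterwards, you could stitch the per-batch CSVs together yourself. A hedged sketch, not something we ship: it assumes each batch wrote its own Image.csv with an identical header row, and the directory names here are made up (the first few lines just fabricate example inputs for illustration):

```shell
# Fabricated example inputs standing in for per-batch output folders:
mkdir -p batch_1 batch_2
printf 'ImageNumber,Count\n1,10\n' > batch_1/Image.csv
printf 'ImageNumber,Count\n2,12\n' > batch_2/Image.csv

# Merge: keep the header from the first file only, append data rows
# from the rest.
out="merged_Image.csv"
first=1
for f in batch_*/Image.csv; do
  if [ "$first" -eq 1 ]; then
    cat "$f" > "$out"
    first=0
  else
    tail -n +2 "$f" >> "$out"   # skip that file's header row
  fi
done
cat "$out"
```

This only works if every batch emits the same columns in the same order, which should be the case when the batches come from one pipeline.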
-David


#8

David, thanks for your responses and digging into this.

First I should correct a couple of items I was unclear on. I’m not using the GUI on Linux; I’m running it headless, using CreateBatchFiles from OS X. Ubuntu vs. CentOS is no concern for me, as I’m just running on Amazon EC2 and can use any OS or distribution; I used the CentOS installer you linked to on CentOS, not on Ubuntu. Also, in the time since my last post I’ve moved past some of the issues I mentioned, and ExportToSpreadsheet isn’t really of critical importance for me at this point; I should have updated my post about that. I also resolved my problem with not getting h5 output. That was just a mistake on my part: I hadn’t read the docs thoroughly enough to understand that in headless mode it doesn’t keep the h5 output unless you specify a non-temporary path to use.

Currently I have it working fine between 2.1 on OS X and 2.1 on CentOS, except for one remaining hurdle that’s been the death of me over the last few days: I can’t find any functional way of specifying, via the command line at runtime on the cluster, a set of images to process. No matter how I’ve tried to specify the images, and no matter what command line arguments or csv files I provide to LoadData, it still wants to process the exact same list of images found using the GUI on OS X. The most promising route I went down was the LoadData module. The --data-file argument seems to be ignored from what I can tell: when headless, it looks for a csv file with the same name in the same location as when the batch was created in the GUI, and even when I put a changed images.csv file there (headless, it looks for it and seems to find it), it ignores the actual contents of that csv file and instead wants to process the image files specified in the original images.csv file on the GUI computer.

Could you point me to or briefly describe what the “standard” or best-tested route is for achieving what I’m looking for on 2.1? Basically I want to be able to test a pipeline on the gui on osx or windows using a small set of images, then use CreateBatchFiles to put the pipeline on the server, and on the server run that pipeline on many different sets of images, without having to go back to the client for each different set of images I need to process.

I’m also going back and trying to get 2.0 running on the server to see if the LoadImages or LoadData modules work for me with that version.

Thanks!
Blake


#9

Hi Blake,
OK, good - glad many of the issues are sorted out for (and by!) you.

As for the batch processing issue, the way we submit the batches is not via a single command-line command, but by scripting the batches with image number ranges. We use cgi scripts in house to make nice web forms for our less-command-line-savvy users, but you could write a shell script to do this. Maybe I am misunderstanding, but I think what you need is stated at the end of the Help section called Batch Processing, namely:

[quote]Submit your batches to the cluster. Log on to your cluster, and navigate to the directory where you have installed CellProfiler on the cluster. A single batch can be submitted with the following command:
./python-2.6.sh CellProfiler.py -p <Default_Output_Folder_path>/Batch_data.mat -c -r -b -f <first_image_set_number> -l <last_image_set_number>

This command runs the batch by using additional options to CellProfiler that specify the following (type “CellProfiler.py -h” to see a list of available options):

-p <Default_Output_Folder_path>/Batch_data.mat: The location of the batch file, where <Default_Output_Folder_path> is the output folder path as seen by the cluster computer.

-c: Run “headless”, i.e., without the GUI

-r: Run the pipeline specified on startup, which is contained in the batch file.

-b: Do not build extensions, since by this point, they should already be built.

-f <first_image_set_number>: Start processing with the image set specified, <first_image_set_number>

-l <last_image_set_number> : Finish processing with the image set specified, <last_image_set_number>
To submit all the batches for a full image set, you will need a script that calls CellProfiler with these options with sequential image set numbers, e.g, 1-50, 51-100, etc and submit each as an individual job. [/quote]

So you would write a script with incrementing ‘-f’ and ‘-l’ image set number ranges. Something like this for images 1 to 5, then 6 to 10, etc:

./python-2.6.sh CellProfiler.py -p <YOUR_PIPELINE_DIR>/Batch_data.h5 -c -r -b --do-not-fetch -o <YOUR_OUTPUT_DIR> -f 1 -l 5 -d <YOUR_DONE_FILE>/Batch_1_to_5_DONE.mat
./python-2.6.sh CellProfiler.py -p <YOUR_PIPELINE_DIR>/Batch_data.h5 -c -r -b --do-not-fetch -o <YOUR_OUTPUT_DIR> -f 6 -l 10 -d <YOUR_DONE_FILE>/Batch_6_to_10_DONE.mat
...
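To generate those ranges automatically, you could wrap this in a small shell loop. A sketch only: the pipeline path, FIRST/LAST, and STEP below are placeholders, and the loop just prints the commands rather than submitting them (pipe each line to your scheduler, e.g. qsub, in real use):

```shell
# Emit one CellProfiler command per batch of STEP image sets.
FIRST=1; LAST=20; STEP=5          # placeholder image-set range and batch size
f=$FIRST
CMDS=""
while [ "$f" -le "$LAST" ]; do
  l=$((f + STEP - 1))
  [ "$l" -gt "$LAST" ] && l=$LAST # clamp the final range to LAST
  CMDS="${CMDS}./python-2.6.sh CellProfiler.py -p Batch_data.h5 -c -r -b -f $f -l $l
"
  f=$((l + 1))
done
printf '%s' "$CMDS"
```

With the values above this prints four commands covering image sets 1-5, 6-10, 11-15, and 16-20.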

Sorry if that was obvious, but does it help?
David


#10

Hi David,

I’ve got it working, and am doing my first cloud run right now, thanks! I think the main trouble I’ve had over the last few days came from wanting to set up the pipeline and create the batch files without having filesystem access to the full set of images I wanted to run on the cluster. I’d been treating that as a hard requirement for my workflow due to our hardware arrangement, so the LoadData module seemed the most likely approach for my needs. This afternoon I decided to revisit that assumption, and made all the images for processing accessible via the filesystem from the OS X GUI, in a file structure matching how I have it set up for the cluster. That allowed me to use the new 2.1 Images and related modules as designed, including the Groups module, which worked headless on CentOS just as described in the docs (unlike the -f and -l arguments discussed below). Kudos, by the way, on the Metadata and Groups modules: they’re great.

Just by way of reporting my troubles with the widely advertised -f and -l approach, I’ll share my experience. I’m somewhat familiar with the docs and have previously tried your suggestion among a number of approaches, but was never able to get the -l argument specifically to be accepted without crashing. Maybe the -f and -l options are specific to a certain way of loading the images that I haven’t tried yet, although I’ve now tried both LoadData and the new 2.1 Images modules. Or it could be that the two versions of CP 2.1 I’m running are not compatible: on OS X I’m running 20140102, and on CentOS I’m running 20131205. I get a warning with every run stating that they could be incompatible, but I’m guessing this is expected behavior, and assumed they were probably compatible enough. I’ve just tried the CentOS installer again, but it still pulls the same 20131205 version. I also tried this flow using a batch file created on a Windows machine running CP 2.1, to see whether the error was specific to making the batch files on OS X, but got the same error message. The error message I get every time I supply a -l argument is (if not exactly the same every time, then roughly similar):

Failed during initial processing of /tmp/CpmeasurementsZpx7OT.hdf5
Traceback (most recent call last):
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/utilities/hdf5_dict.py", line 293, in __init__
    maxshape = (None, ))
  File "/usr/cellprofiler/lib/python2.7/site-packages/h5py/_hl/group.py", line 94, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/usr/cellprofiler/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 76, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5t.pyx", line 1379, in h5py.h5t.py_create (h5py/h5t.c:12683)
  File "h5t.pyx", line 1451, in h5py.h5t.py_create (h5py/h5t.c:12533)
TypeError: Object dtype dtype('object') has no native HDF5 equivalent
Error loading HDF5 /root/testpilot2/Batch_data_short_win.h5
Traceback (most recent call last):
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/measurements.py", line 1700, in load_measurements
    image_numbers = image_numbers)
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/measurements.py", line 266, in __init__
    image_numbers=image_numbers)
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/utilities/hdf5_dict.py", line 293, in __init__
    maxshape = (None, ))
  File "/usr/cellprofiler/lib/python2.7/site-packages/h5py/_hl/group.py", line 94, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/usr/cellprofiler/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 76, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5t.pyx", line 1379, in h5py.h5t.py_create (h5py/h5t.c:12683)
  File "h5t.pyx", line 1451, in h5py.h5t.py_create (h5py/h5t.c:12533)
TypeError: Object dtype dtype('object') has no native HDF5 equivalent
So sorry. CellProfiler failed to remove the temporary file, /tmp/CpmeasurementsZpx7OT.hdf5 and there it sits on your disk now.

Note that I only get this error message or anything like it when supplying the -l argument–it seems to deal with -f fine.

Glad to share more of my experiences and dead ends, especially concerning the LoadData module, but at this point I’m guessing you all probably want to deprecate that pretty soon anyway. If what I’ve described above is true, and it’s widely known by users that you really need to have filesystem access on your gui client to the full set of files you want to analyze in the cloud, it could be useful to make that more clear in guides and documentation, although I definitely could have just missed it.

Thanks again for your helpful and rapid responses today!
Blake


#11

I discovered the same error (though not in the same way), and filed a bug report on it: github.com/CellProfiler/CellProfiler/issues/995. I’ll repost here when it’s fixed, and you can confirm whether it solves your problem.
-Mark


#12

We fixed the bug; can you try to pull from source again, and see if it solves your problem?
-Mark


#13

Actually, I am having the same problem with the -l argument. We are using the latest version 2.1, installed on a CentOS 6.4 cluster. My pipeline will run fine with no image numbers set, or with just a first number (-f) set, but crashes when I add a -l argument. Here is a typical output:

[scott@login1 ~]$ cellprofiler -c -r -p /home/scott/CellProfilerBatchFiles/131111_Dextran/Batch_data.h5 -f 1 -l 8
Version: 2014-01-24T15:02:55 2.1.0.Release / 20140124150255
Plugin directory doesn't point to valid folder: /home/scott/plugins
Failed during initial processing of /tmp/CpmeasurementsY9kXHN.hdf5
Traceback (most recent call last):
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/utilities/hdf5_dict.py", line 293, in __init__
    maxshape = (None, ))
  File "/usr/cellprofiler/lib/python2.7/site-packages/h5py/_hl/group.py", line 94, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/usr/cellprofiler/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 76, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5t.pyx", line 1379, in h5py.h5t.py_create (h5py/h5t.c:12683)
  File "h5t.pyx", line 1451, in h5py.h5t.py_create (h5py/h5t.c:12533)
TypeError: Object dtype dtype('object') has no native HDF5 equivalent
Error loading HDF5 /home/scott/CellProfilerBatchFiles/131111_Dextran/Batch_data.h5
Traceback (most recent call last):
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/measurements.py", line 1703, in load_measurements
    image_numbers = image_numbers)
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/measurements.py", line 269, in __init__
    image_numbers=image_numbers)
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/utilities/hdf5_dict.py", line 293, in __init__
    maxshape = (None, ))
  File "/usr/cellprofiler/lib/python2.7/site-packages/h5py/_hl/group.py", line 94, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/usr/cellprofiler/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 76, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5t.pyx", line 1379, in h5py.h5t.py_create (h5py/h5t.c:12683)
  File "h5t.pyx", line 1451, in h5py.h5t.py_create (h5py/h5t.c:12533)
TypeError: Object dtype dtype('object') has no native HDF5 equivalent
So sorry. CellProfiler failed to remove the temporary file, /tmp/CpmeasurementsY9kXHN.hdf5 and there it sits on your disk now.
Pipeline saved with CellProfiler version 20140124145122
stopping worker thread 0
stopping worker thread 1
stopping worker thread 2
stopping worker thread 3
stopping worker thread 4
stopping worker thread 5
stopping worker thread 6
Exiting the JVM monitor thread

Is it possible this is the same bug or am I missing something else?

-cscott


#14

You should wait and see what the developers have to say on this, but as for me, I’ve never gone back and tried using -f and -l. Using the new Images, Metadata and Groups modules, however, has worked perfectly for me. That enables calling CP on the command line for just one or more groups at a time. For instance, here’s how I’ve started a run on just a subset of images from a particular plate and site (where the plate and site variables are $p and $s):
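Something along these lines (a sketch: the key names Metadata_Plate and Metadata_Site are examples and must match whatever your Metadata module actually defines; here the command is just built as a string and echoed rather than executed, so you can feed it to your scheduler):

```shell
# Example values standing in for a real plate and site:
p="Plate1"
s="1"
# Select a single group via -g; the pipeline must define these keys
# in its Metadata/Groups modules for this to be accepted headless.
CMD="CellProfiler -c -r -p Batch_data.h5 -g Metadata_Plate=$p,Metadata_Site=$s"
echo "$CMD"   # e.g. pipe this to qsub, one job per group
```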


#15

Thanks for the reply,

I actually tried to specify the images using the -g argument but was also unable to get that to work, which is strange. As I said, I can get the pipeline to run in a single thread if I don’t specify -l, and if I add the --get-batch-commands argument I get what seems to me a valid output (truncated here to just a few lines):

CellProfiler -c -r -b -p /home/scott/CellProfilerBatchFiles/131111_Dextran_test/Batch_data.h5 -g Metadata_Site=1,Metadata_Well=H11
CellProfiler -c -r -b -p /home/scott/CellProfilerBatchFiles/131111_Dextran_test/Batch_data.h5 -g Metadata_Site=2,Metadata_Well=H11
CellProfiler -c -r -b -p /home/scott/CellProfilerBatchFiles/131111_Dextran_test/Batch_data.h5 -g Metadata_Site=3,Metadata_Well=H11
CellProfiler -c -r -b -p /home/scott/CellProfilerBatchFiles/131111_Dextran_test/Batch_data.h5 -g Metadata_Site=4,Metadata_Well=H11
CellProfiler -c -r -b -p /home/scott/CellProfilerBatchFiles/131111_Dextran_test/Batch_data.h5 -g Metadata_Site=1,Metadata_Well=H12
CellProfiler -c -r -b -p /home/scott/CellProfilerBatchFiles/131111_Dextran_test/Batch_data.h5 -g Metadata_Site=2,Metadata_Well=H12
CellProfiler -c -r -b -p /home/scott/CellProfilerBatchFiles/131111_Dextran_test/Batch_data.h5 -g Metadata_Site=3,Metadata_Well=H12
CellProfiler -c -r -b -p /home/scott/CellProfilerBatchFiles/131111_Dextran_test/Batch_data.h5 -g Metadata_Site=4,Metadata_Well=H12

But if I try to run something like:

cellprofiler -c -r -p /home/scott/CellProfilerBatchFiles/131111_Dextran_test/Batch_data.h5 -g "Metadata_Site=4,Metadata_Well=H12" ~/output/test.h5

I get an error indicating that it doesn’t like the specified grouping key like so:

Version: 2014-01-24T15:02:55 2.1.0.Release / 20140124150255
Plugin directory doesn't point to valid folder: /home/scott/plugins
Pipeline saved with CellProfiler version 20140124145122
Times reported are CPU times for each module, not wall-clock time
Uncaught exception in CellProfiler.py
Traceback (most recent call last):
  File "/usr/cellprofiler/src/CellProfiler/CellProfiler.py", line 228, in main
    run_pipeline_headless(options, args)
  File "/usr/cellprofiler/src/CellProfiler/CellProfiler.py", line 717, in run_pipeline_headless
    initial_measurements = initial_measurements)
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/pipeline.py", line 1627, in run
    initial_measurements = measurements):
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/pipeline.py", line 1737, in run_with_yield
    in group(workspace):
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/pipeline.py", line 1655, in group
    ", ".join(grouping.keys()), ", ".join(keys)))
ValueError: The grouping keys specified on the command line (Metadata_Site, Metadata_Well) must be the same as those defined by the modules in the pipeline ()
stopping worker thread 0
stopping worker thread 1
stopping worker thread 2
stopping worker thread 3
stopping worker thread 4
stopping worker thread 5
stopping worker thread 6
Exiting the JVM monitor thread

I guess this is a separate bug (or an error on my part), but it still means I can’t break the job up into parts for the cluster.

cscott


#16

Hi cscott,

We may have fixed the problem, as Mark noted above. It looks like you’re using the release version, so if you haven’t installed the developer’s version, you could try the 2.1 “trunk build” here: cellprofiler.org/cgi-bin/trunk_build.cgi

If you run with the GUI, does the Groups module report the proper groups when you click the Update button?

Cheers,
David


#17

Hello, I’m the administrator of the cluster that the user “cscott” is talking about.

Previously I had installed CellProfiler using the yum repo on CentOS 6.4. You said we should try the “trunk build” version, but as far as I saw, there is no Linux distribution of it. So I tried the trunk on GitHub.

I did a clone of https://github.com/CellProfiler/CellProfiler.git and tried to build it from source. I didn’t have a HOSTTYPE variable, so I set one to 64 (not sure if this is correct).

I did a make -f Makefile.CP2 PREFIX=/opt/cellProfiler

but then I got a 404 error when the script tried to download this file: http://zlib.net/zlib-1.2.5.tar.bz2

Is there a better way to test it? I probably already have all the dependencies, since I have already installed CellProfiler on this machine.

Thanks!


#18

Sorry about the “trunk build” comment. My mistake, you’re right, there is no Linux trunk build – I must have been answering too many questions here :smile:

I don’t know about the zlib issue, but I will ask our Linux gurus if they have any updates on that.

Cheers,
David


#19

I am guessing that the “TypeError: Object dtype dtype(‘object’) has no native HDF5 equivalent” is due to some subtle difference in flavor between the Linux and Windows builds of h5py or HDF5. CellProfiler is failing at a place where it takes a chunk of data from the HDF5 batch file and then tries saving it in the HDF5 measurement file, and it gives up, saying in effect “that data type you got from HDF5 isn’t supported in HDF5”. So I will experiment with a little defensive programming in that piece of code and see if I can send you a patch that fixes the problem (and if it doesn’t, I may have to try a couple of times).


#20

I’ve updated the file that’s almost certainly the cause of the problem with a fix that should solve it. If this doesn’t fix the problem, tell me; I have one more trick I can try. The patch is github.com/CellProfiler/CellPro … 3fa4d9af8f.

If you don’t have a Git repository, you can navigate to /usr/cellprofiler/src/CellProfiler and apply the patch like this (you may have to use sudo to override permissions):

    wget --no-check-certificate https://github.com/CellProfiler/CellProfiler/commit/4c07c3539d3ca0949c6f3ce81a91983fa4d9af8f.patch -O - | git am
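If `git am` gives you trouble (it creates a commit, so it wants a committer identity configured), `git apply` should apply the same diff without committing. Here is a self-contained sketch of the idea on a throwaway repository; in practice you would pipe the downloaded .patch file into `git apply` the same way as above:

```shell
# Demonstration of applying a patch with `git apply` (no commit created).
# Everything below is a throwaway example repository, not CellProfiler.
dir=$(mktemp -d)
cd "$dir"
git init -q repo && cd repo
echo "old line" > file.txt
git add file.txt
git -c user.name=t -c user.email=t@t commit -qm "base"
echo "new line" > file.txt      # make a change...
git diff > ../example.patch     # ...capture it as a patch...
git checkout -q -- file.txt     # ...and revert the working tree
git apply ../example.patch      # re-apply the change from the patch
cat file.txt                    # file.txt once again contains "new line"
```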

Let me know if it works.