Workers not finding files on S3 when attempting Distributed-CellProfiler


#1

Hi,

I’m trying to use your new Distributed-CellProfiler tool and I feel like I’m pretty (delusionally) close, but CellProfiler doesn’t seem to be retrieving the pipeline or images from my S3 bucket. I notice when I start the run that an ecsconfigs folder appears in my bucket, so it seems that something has access. I go through all the steps and an SQS queue is created, as well as a cluster, task, and spot fleet, but nothing is written to my bucket, and when I look in the CloudWatch logs it seems like there is a problem accessing files. I’ve modified the “ExampleSBS” pipeline to use the LoadData module and a file list; I’ve attached both below. It runs locally on my laptop without issue. I’ve also attached a screenshot of one of the CloudWatch logs for a well:

ExampleSBS_03.cppipe (17.0 KB)
setlist.csv (10.2 KB)

When I purge the queue to stop things (after it’s clear nothing is getting analyzed), the monitor function stops with this error:

Service has been downscaled
Old alarms deleted
Shutting down spot fleet sfr-f83e82b3-d07a-47f9-befd-c571055f50bc
Job done.
Deleting existing queue.
Deleting service
De-registering task
Removing cluster if it’s not the default and not otherwise in use

An error occurred (InvalidParameterException) when calling the CreateExportTask operation: GetBucketAcl call on the given bucket failed. Please check if the specified Amazon S3 Bucket is in the same AWS region as CloudWatch Logs.
Traceback (most recent call last):
File "run.py", line 317, in <module>
monitor()
File "run.py", line 293, in monitor
result = getAWSJsonOutput(cmd)
File "run.py", line 21, in getAWSJsonOutput
requestInfo = json.loads(out)
File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

The problem here is that my S3 bucket appears to be “Global”, while my VPC and CloudWatch are in us-west-2. Thanks in advance! -John
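For what it’s worth, the “Global” label in the S3 console just reflects the global bucket namespace; every bucket still lives in one specific region, which the CLI can report (the cp.bucket name here is the one used elsewhere in this thread):

# Prints the bucket's region; a null/empty LocationConstraint means us-east-1
aws s3api get-bucket-location --bucket cp.bucket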


#2

I don’t have a ton of experience troubleshooting this sort of thing, but if you log into your instance, can you get into the bucket (even just an ls to see if you can access the files inside it)?

My guess is you won’t be able to; can you check your bucket permissions and your instance role permissions to make sure they’re set up to allow instances with that role to access the bucket? Alternately, can you try making a bucket in the same region as your VPC and putting the files there?
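A minimal sketch of those checks run from the instance itself, assuming the AWS CLI is available there and reusing the bucket and role names that appear in this thread:

# Which identity/role is the instance actually using?
aws sts get-caller-identity

# Can that identity list the bucket?
aws s3 ls s3://cp.bucket/

# Which managed policies are attached to the ECS instance role?
aws iam list-attached-role-policies --role-name ecsInstanceRole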


#3

Hi, no, I can’t seem to access the bucket from the CP instance via an ls. I can do so from a different instance where I have installed and configured the AWS CLI tools:

ubuntu@ip-172-31-39-238:~$ aws s3 ls s3://cp.bucket
PRE ExampleSBSImages/
PRE ecsconfigs/
PRE exportedlogs/

but from the CP instance I can’t do that, or install the necessary tools:

[ec2-user@ip-172-31-45-189 ~]$ aws s3 ls s3://cp.bucket
-bash: aws: command not found
[ec2-user@ip-172-31-45-189 ~]$ sudo apt-get update
-bash: apt-get: command not found
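(Side note: judging from the ec2-user prompt, the ECS-optimized AMI is Amazon Linux, which is why apt-get isn’t there. If you did want the CLI on that box, something like the following should work; the aws-cli package name is my assumption for that AMI:)

# Confirm the distro, then install the AWS CLI through yum (Amazon Linux)
cat /etc/system-release
sudo yum install -y aws-cli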

I made a new bucket with open permissions and it’s in the same region as the instances. I also added AmazonS3FullAccess to the ecsInstanceRole and the aws-ec2-spot-fleet-role. I’m getting logging from the EC2 instance, and when I set up the bucket I enabled server access logging, which records who is trying to connect to S3. Here’s an example of the S3 logging; 54.202.161.253 is the public IP of the CP instance:

414b004f24c8fcb0a14884fb1cb913d989e441b6dc1463e33e99b2acdbec105a cp.bucket [13/Feb/2017:18:21:51 +0000] 54.202.161.253 arn:aws:iam::524277596825:user/admin 091DB0F63A4553E8 REST.GET.BUCKET - "GET /cp.bucket?prefix=ExampleSBSImages%2Foutput%2FC06%2F&encoding-type=url HTTP/1.1" 200 - 295 - 21 20 "-" "Boto3/1.4.1 Python/2.7.6 Linux/4.4.41-36.55.amzn1.x86_64 Botocore/1.4.61" -

here’s an example of the cp instance logging that was exported to s3:

2017-02-13T19:05:06.686Z cellprofiler -c -r -b -p /home/ubuntu/bucket/ExampleSBSImages/ExampleSBS_03.cppipe -i /home/ubuntu/bucket/ExampleSBSImages/images/ -o /home/ubuntu/local_output/A10 -d /home/ubuntu/local_output/A10/cp.is.done --data-file=/home/ubuntu/bucket/ExampleSBSImages/setlist.csv -g Metadata_Well=A10
2017-02-13T19:05:06.969Z Plugin directory doesn’t point to valid folder: /home/ubuntu/plugins
2017-02-13T19:05:12.039Z Version: 2016-09-16T14:16:40 7a8b7d5 / 20160916141640
2017-02-13T19:05:12.039Z Failed to stop Ilastik
2017-02-13T19:05:12.583Z Uncaught exception in CellProfiler.py
2017-02-13T19:05:12.584Z Traceback (most recent call last):
2017-02-13T19:05:12.584Z File "/usr/local/src/CellProfiler/cellprofiler/main.py", line 252, in main
2017-02-13T19:05:12.584Z run_pipeline_headless(options, args)
2017-02-13T19:05:12.584Z File "/usr/local/src/CellProfiler/cellprofiler/main.py", line 887, in run_pipeline_headless
2017-02-13T19:05:12.584Z pipeline.load(options.pipeline_filename)
2017-02-13T19:05:12.584Z File "/usr/local/src/CellProfiler/cellprofiler/pipeline.py", line 852, in load
2017-02-13T19:05:12.585Z raise IOError("Could not find file, " + fd_or_filename)
2017-02-13T19:05:12.585Z IOError: Could not find file, /home/ubuntu/bucket/ExampleSBSImages/ExampleSBS_03.cppipe
2017-02-13T19:05:12.585Z Failed to stop Ilastik
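(For readers less familiar with CellProfiler’s headless flags, here is that worker command broken out over several lines; the glosses in the comments are my reading of the CellProfiler 2.x options rather than quoted documentation:)

# -c: run headless   -r: run the pipeline   -b: don't rebuild extensions
# -p: pipeline file   -i: default input folder   -o: output folder
# -d: "done" flag file   --data-file: CSV for LoadData   -g: restrict the run to one group (well A10)
cellprofiler -c -r -b \
  -p /home/ubuntu/bucket/ExampleSBSImages/ExampleSBS_03.cppipe \
  -i /home/ubuntu/bucket/ExampleSBSImages/images/ \
  -o /home/ubuntu/local_output/A10 \
  -d /home/ubuntu/local_output/A10/cp.is.done \
  --data-file=/home/ubuntu/bucket/ExampleSBSImages/setlist.csv \
  -g Metadata_Well=A10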

Not sure what to look for at this point. Thanks for your help. -John


#4

You don’t have to use aws s3 ls; the bucket should be mounted with s3fs. Can you follow the steps below and then post either the output or the error message?

1. SSH into the instance.
2. Use docker ps to get the container ID of your DCP docker - this will be the very first thing on the line. Be sure to use a DCP docker and not the ecs-agent one.
3. Enter the container with docker exec -i -t {CONTAINER} /bin/bash
4. Type ls bucket/ExampleSBSImages (the same sequence is sketched as a single session just below)
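For reference, a hypothetical version of that session; the container ID is only an example of what docker ps might print for a DCP worker, and the key and IP are placeholders:

# 1. SSH into the ECS instance
ssh -i your-key.pem ec2-user@<instance-public-ip>

# 2. List running containers; the first column is the container ID.
#    Pick the DCP worker image, not amazon/amazon-ecs-agent.
docker ps

# 3. Enter the DCP container, using the literal ID (no braces)
docker exec -i -t 63296715c19b /bin/bash

# 4. From inside the container, check that the bucket is mounted
ls bucket/ExampleSBSImages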

Edited for clarity of future users.


#5

[ec2-user@ip-172-31-45-189 ~]$ docker ps
CONTAINER ID   IMAGE                            COMMAND    CREATED        STATUS        PORTS   NAMES
7a2d37bff5c1   amazon/amazon-ecs-agent:latest   "/agent"   36 hours ago   Up 36 hours           ecs-agent
[ec2-user@ip-172-31-45-189 ~]$ docker exec -i -t {ecs-agent} /bin/bash
Error response from daemon: No such container: {ecs-agent}
[ec2-user@ip-172-31-45-189 ~]$ docker exec -i -t {7a2d37bff5c1} /bin/bash
Error response from daemon: No such container: {7a2d37bff5c1}
[ec2-user@ip-172-31-45-189 ~]$ ls bucket/ExampleSBSImages
ls: cannot access bucket/ExampleSBSImages: No such file or directory

I’m not currently trying to run a batch. Should I be? -John


#6

Yes, that’s the Amazon container that manages the other Docker containers for ECS; you want to be looking inside one of the DCP dockers, which means something needs to be running. Sorry for the confusion!


#7

When I log into the worker that keeps running after emptying the queue, this is the output:
[ec2-user@ip-172-31-45-189 ~]$ docker ps
CONTAINER ID   IMAGE                                        COMMAND              CREATED         STATUS         PORTS   NAMES
63296715c19b   bethcimini/distributed-cellprofiler:latest   "./run-worker.sh "   2 minutes ago   Up 2 minutes           ecs-testCP3Task-2-testCP3-eccd86cbc4b8dcefcc01
7a2d37bff5c1   amazon/amazon-ecs-agent:latest               "/agent"             37 hours ago    Up 37 hours            ecs-agent
[ec2-user@ip-172-31-45-189 ~]$ docker exec -i -t {ecs-testCP3Task-2-testCP3-eccd86cbc4b8dcefcc01} /bin/bash
Error response from daemon: No such container: {ecs-testCP3Task-2-testCP3-eccd86cbc4b8dcefcc01}
[ec2-user@ip-172-31-45-189 ~]$ ls bucket/ExampleSBSImages
ls: cannot access bucket/ExampleSBSImages: No such file or directory
[ec2-user@ip-172-31-45-189 ~]$

When I log into one of the spot fleet instances, I get this:

[ec2-user@ip-172-31-41-41 ~]$ docker ps
-bash: docker: command not found


#8

Try docker exec -i -t 63296715c19b /bin/bash - sorry, I said name before when I meant ID; I’ll fix the post above to clarify.


#9

[ec2-user@ip-172-31-45-189 ~]$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
63296715c19b bethcimini/distributed-cellprofiler:latest "./run-worker.sh " 11 minutes ago Up 11 minutes ecs-testCP3Task-2-testCP3-eccd86cbc4b8dcefcc01
7a2d37bff5c1 amazon/amazon-ecs-agent:latest "/agent" 37 hours ago Up 37 hours ecs-agent

[ec2-user@ip-172-31-45-189 ~]$ docker exec -i -t {63296715c19b} /bin/bash
Error response from daemon: No such container: {63296715c19b}


#10

No brackets, exactly as I typed it.


#11

ok sorry

root@63296715c19b:/home/ubuntu#

nothing else


#12

No problem!

ls bucket/ExampleSBSImages


#13

right:

root@63296715c19b:/home/ubuntu# ls bucket/ExampleSBSImages
ls: cannot access bucket/ExampleSBSImages: No such file or directory
root@63296715c19b:/home/ubuntu#


#14

OK, I think I see the bug!
Try stdbuf -o0 s3fs $AWS_BUCKET /home/ubuntu/bucket -o passwd_file=/credentials.txt, then the ls command again.
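(If it helps, a quick way to see whether that mount took, using only standard tools inside the container:)

# An s3fs/FUSE entry here means the bucket is mounted at /home/ubuntu/bucket
grep s3fs /proc/mounts

# If it mounted, the example files should now be visible
ls /home/ubuntu/bucket/ExampleSBSImages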


#15

bingo

root@63296715c19b:/home/ubuntu# ls bucket/ExampleSBSImages
1049_Metadata.csv Channel2ILLUM.mat ExampleSBS.cpproj ExampleSBS_02.cppipe ExampleSBS_03.cpproj images setlist.csv
Channel1ILLUM.mat ExampleSBS.cppipe ExampleSBSIllumination.cppipe ExampleSBS_03.cppipe Thumbs.db output


#16

I’m so sorry about the bug! I’ll have an updated docker container made in about 5 minutes (I’ll post here then), and then you should be good to go.

This also means you’re officially the first person outside of our lab to get DCP working for them - much :cake: to you for your persistence!


#17

OK, the debugged docker is uploaded - you can try just scaling your ECS tasks down to 0, then back up to however many you originally wanted once they hit 0; they should pull the updated docker when they restart.
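A rough sketch of that scaling step with the AWS CLI; the cluster and service names are placeholders, since the real ones depend on your DCP config:

# Scale the DCP service down to zero tasks...
aws ecs update-service --cluster <your-cluster> --service <your-service> --desired-count 0

# ...watch runningCount until it reaches 0, then scale back up (4 is just an example)
aws ecs describe-services --cluster <your-cluster> --services <your-service>
aws ecs update-service --cluster <your-cluster> --service <your-service> --desired-count 4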


#18

Excellent! I started using CP when Michael Sjaastad, the co-developer of the Discovery-1, mentioned it to me. Awesome to be part of pushing CP forward!


#19

No cake yet, unfortunately… I restarted a few minutes ago, and though it seems like the pipeline is now being opened, there is a problem with LoadData (run_with_yield?) and accessing files (the path is doubled).
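(Purely as an illustration of one way a path can end up doubled, assuming relative PathName entries in the CSV get joined onto the -i default input folder; the real setlist.csv may be arranged differently:)

# -i (default input folder) passed by the worker, from the log above:
#   /home/ubuntu/bucket/ExampleSBSImages/images/
# a hypothetical relative PathName value in the CSV:
#   ExampleSBSImages/images
# the doubled path CellProfiler would then try to open:
#   /home/ubuntu/bucket/ExampleSBSImages/images/ExampleSBSImages/images/<filename>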


#20

What’s the path in your LoadData CSV?