response problem?
Author | Message |
---|---|
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
04/08/16 17:12:17 (pid:8067) Response problem from schedd <130.246.180.120:9818?noUDP&sock=50946_f1aa> on ALIVE job 1064599.0. 04/08/16 17:12:22 (pid:8067) Response problem from schedd <130.246.180.120:9818?noUDP&sock=50946_f1aa> on ALIVE job 1064599.0. I am getting this message every 5 sec in startd.log |
Send message Joined: 20 Jan 15 Posts: 1129 Credit: 7,874,101 RAC: 116 |
04/08/16 17:12:17 (pid:8067) Response problem from schedd <130.246.180.120:9818?noUDP&sock=50946_f1aa> on ALIVE job 1064599.0. Hmm, mine is working OK, tho' the file is rather large. Could be a network or firewall problem. I'll let RAL know, in case there might be something their end. |
Send message Joined: 20 Jan 15 Posts: 1129 Credit: 7,874,101 RAC: 116 |
As of 1640 BST, job 1064599.0 didn't exist in the queue. Is the problem persisting? |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Yes, and it looks like another task is having problems as well. Need to investigate. If it is a firewall issue, why is it starting now, after several successful BOINC tasks? EDIT: It uploaded the first job in this BOINC task successfully (job 6006) but can't get a new one. It has been attempting to connect for 45 min now. |
Send message Joined: 20 Jan 15 Posts: 1129 Credit: 7,874,101 RAC: 116 |
Yes, and it looks like another task is having problems as well. I'd say abort it, as the job it's trying to connect to does not exist any more. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I reset the router: no change. I disabled the firewall: no change. Finally, I disabled the network adapter and re-enabled it, and that seems to have worked. Very strange, I never had that before. Sorry to have bothered you. |
Send message Joined: 20 Jan 15 Posts: 1129 Credit: 7,874,101 RAC: 116 |
Reply from RAL: The job seemed to complete successfully (job has exit status 0 and shadow has status 100), although the shadow did lose contact with the startd: 04/08/16 15:50:14 (1064599.0) (23142): CCBClient: received failure message from CCB server collector 130.246.180.120:9623 in response to request for reversed connection to <10.0.2.15:38071>: CCB server rejecting request for ccbid 3100 because no daemon is currently registered with that id (perhaps it recently disconnected). 04/08/16 15:50:14 (1064599.0) (23142): Failed to reverse connect to <10.0.2.15:38071> via CCB. 04/08/16 15:50:14 (1064599.0) (23142): RemoteResource::killStarter(): Could not send command to startd 04/08/16 15:50:15 (1064599.0) (23142): Job 1064599.0 terminated: exited with status 0 04/08/16 15:50:15 (1064599.0) (23142): **** condor_shadow (condor_SHADOW) pid 23142 EXITING WITH STATUS 100 Those times would be BST (GMT+1). |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks, Ivan. |
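
For anyone who wants to check how often a message like the one above repeats in their own startd.log, here is a minimal Python sketch. It is not part of the CMS or HTCondor tooling. The function names (`parse_line`, `repeat_intervals`, `bst_to_utc`) are made up for illustration, and it assumes each log line starts with a month-first `MM/DD/YY HH:MM:SS` stamp, as the quoted excerpts appear to. The last helper only applies the "BST (GMT+1)" note from the RAL reply as a fixed one-hour offset.

```python
import re
from datetime import datetime, timedelta, timezone

# Leading "MM/DD/YY HH:MM:SS" stamp, as in the quoted log excerpts.
# Month-first ordering is an assumption about the log format.
STAMP_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+(.*)$")
STAMP_FMT = "%m/%d/%y %H:%M:%S"

# BST treated as a fixed GMT+1 offset, per the note in the RAL reply.
BST = timezone(timedelta(hours=1))


def parse_line(line):
    """Return (naive datetime, message), or None if the line has no stamp."""
    match = STAMP_RE.match(line.strip())
    if match is None:
        return None
    return datetime.strptime(match.group(1), STAMP_FMT), match.group(2)


def repeat_intervals(lines, needle="Response problem from schedd"):
    """Seconds between consecutive log lines whose message contains needle."""
    stamps = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and needle in parsed[1]:
            stamps.append(parsed[0])
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


def bst_to_utc(stamp):
    """Interpret a log stamp as BST (GMT+1) and convert it to UTC."""
    local = datetime.strptime(stamp, STAMP_FMT).replace(tzinfo=BST)
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    # The two lines quoted in the first post of this thread.
    sample = [
        "04/08/16 17:12:17 (pid:8067) Response problem from schedd "
        "<130.246.180.120:9818?noUDP&sock=50946_f1aa> on ALIVE job 1064599.0.",
        "04/08/16 17:12:22 (pid:8067) Response problem from schedd "
        "<130.246.180.120:9818?noUDP&sock=50946_f1aa> on ALIVE job 1064599.0.",
    ]
    print(repeat_intervals(sample))                     # [5.0]
    print(bst_to_utc("04/08/16 15:50:14").isoformat())  # 2016-04-08T14:50:14+00:00
```

Feeding it the lines of a real log (for example `open("startd.log")`) should work the same way, as lines without a leading stamp are skipped. The sketch does not try to guess which timezone the VM's own startd.log uses, so comparing its times against the shadow log still needs that checked by hand.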
©2024 CERN