Message boards : CMS Application : response problem?
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2606 - Posted: 8 Apr 2016, 15:17:15 UTC
Last modified: 8 Apr 2016, 15:19:44 UTC

04/08/16 17:12:17 (pid:8067) Response problem from schedd <130.246.180.120:9818?noUDP&sock=50946_f1aa> on ALIVE job 1064599.0.
04/08/16 17:12:22 (pid:8067) Response problem from schedd <130.246.180.120:9818?noUDP&sock=50946_f1aa> on ALIVE job 1064599.0.

I am getting this message every 5 sec in startd.log
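
For anyone wanting to check how often the message repeats, here is a minimal sketch that pulls the "Response problem" lines out of a log and measures the gaps between them. It assumes the line layout shown in the two lines quoted above and the MM/DD/YY timestamp order; the function names are made up for illustration and are not part of HTCondor.

```python
import re
from datetime import datetime

# Pattern built from the two log lines quoted in this post (an assumption:
# other HTCondor versions or settings may format the line differently).
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(pid:(\d+)\) "
    r"Response problem from schedd <([^>]+)> on ALIVE job (\d+\.\d+)\.$"
)


def parse_response_problems(lines):
    """Return (timestamp, job_id) pairs for every matching log line."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
            events.append((stamp, match.group(4)))
    return events


def gaps_in_seconds(events):
    """Seconds elapsed between consecutive events."""
    times = [stamp for stamp, _ in events]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


sample = [
    "04/08/16 17:12:17 (pid:8067) Response problem from schedd "
    "<130.246.180.120:9818?noUDP&sock=50946_f1aa> on ALIVE job 1064599.0.",
    "04/08/16 17:12:22 (pid:8067) Response problem from schedd "
    "<130.246.180.120:9818?noUDP&sock=50946_f1aa> on ALIVE job 1064599.0.",
]
events = parse_response_problems(sample)
print(gaps_in_seconds(events))  # [5.0]
```

To run it on a real file, pass the open file object instead of `sample`, since the function only needs an iterable of lines.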
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,874,101
RAC: 116
Message 2607 - Posted: 8 Apr 2016, 15:35:58 UTC - in response to Message 2606.  

04/08/16 17:12:17 (pid:8067) Response problem from schedd <130.246.180.120:9818?noUDP&sock=50946_f1aa> on ALIVE job 1064599.0.
04/08/16 17:12:22 (pid:8067) Response problem from schedd <130.246.180.120:9818?noUDP&sock=50946_f1aa> on ALIVE job 1064599.0.

I am getting this message every 5 sec in startd.log

Hmm, mine is working OK, tho' the file is rather large. Could be a network or firewall problem. I'll let RAL know, in case there is something wrong at their end.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,874,101
RAC: 116
Message 2608 - Posted: 8 Apr 2016, 15:45:12 UTC - in response to Message 2606.  

As of 16:40 BST, job 1064599.0 didn't exist in the queue. Is the problem persisting?
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2610 - Posted: 8 Apr 2016, 15:54:17 UTC - in response to Message 2608.  
Last modified: 8 Apr 2016, 15:58:51 UTC

Yes, and it looks like another task is having problems as well.
Need to investigate.
If it is a firewall issue, why is it starting now, after several successful BOINC tasks?

EDIT: It uploaded the first job in this BOINC task successfully (job 6006) but can't get a new one. It has been attempting to connect for 45 min now.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,874,101
RAC: 116
Message 2612 - Posted: 8 Apr 2016, 16:14:17 UTC - in response to Message 2610.  

Yes, and it looks like another task is having problems as well.
Need to investigate.
If it is a firewall issue, why is it starting now, after several successful BOINC tasks?

EDIT: It uploaded the first job in this BOINC task successfully (job 6006) but can't get a new one. It has been attempting to connect for 45 min now.

I'd say abort it, as the job it's trying to connect to does not exist any more.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2613 - Posted: 8 Apr 2016, 16:19:16 UTC - in response to Message 2612.  

I reset the router: no change.
I disabled the firewall: no change.
Finally I disabled the network adapter and re-enabled it, and now it seems to work.

Very strange, I never had that before.

Sorry to have bothered you.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,874,101
RAC: 116
Message 2617 - Posted: 9 Apr 2016, 10:28:15 UTC
Last modified: 9 Apr 2016, 10:28:55 UTC

Reply from RAL:

The job seemed to complete successfully (job has exit status 0 and shadow
has status 100), although the shadow did lose contact with the startd:

04/08/16 15:50:14 (1064599.0) (23142): CCBClient: received failure message from CCB server collector 130.246.180.120:9623 in response to request for reversed connection to <10.0.2.15:38071>:
CCB server rejecting request for ccbid 3100 because no daemon is currently registered with that id (perhaps it recently disconnected).
04/08/16 15:50:14 (1064599.0) (23142): Failed to reverse connect to <10.0.2.15:38071> via CCB.
04/08/16 15:50:14 (1064599.0) (23142): RemoteResource::killStarter(): Could not send command to startd
04/08/16 15:50:15 (1064599.0) (23142): Job 1064599.0 terminated: exited with status 0
04/08/16 15:50:15 (1064599.0) (23142): **** condor_shadow (condor_SHADOW) pid 23142 EXITING WITH STATUS 100


Those times would be BST (GMT+1).
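
To line the RAL timestamps up with the UTC times shown on the forum posts, a small conversion sketch may help. It assumes a fixed +1 hour offset for BST, as stated above, and the MM/DD/YY order seen in the log; the helper name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# BST taken as a fixed UTC+1 offset, per the note above.
BST = timezone(timedelta(hours=1), "BST")


def condor_time_to_utc(stamp, tz):
    """Read an 'MM/DD/YY HH:MM:SS' log stamp as local time in tz, return UTC."""
    local = datetime.strptime(stamp, "%m/%d/%y %H:%M:%S").replace(tzinfo=tz)
    return local.astimezone(timezone.utc)


# Last line of the shadow log quoted above.
shadow_exit = condor_time_to_utc("04/08/16 15:50:15", BST)
print(shadow_exit.isoformat())  # 2016-04-08T14:50:15+00:00

# Forum timestamp of the first post in this thread (already in UTC).
first_post = datetime(2016, 4, 8, 15, 17, 15, tzinfo=timezone.utc)
print(first_post - shadow_exit)  # 0:27:00
```

So on the RAL side the shadow had already exited about 27 minutes before the first post in this thread was made.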
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2618 - Posted: 9 Apr 2016, 12:23:11 UTC - in response to Message 2617.  

Thanks, Ivan.


©2024 CERN