Message boards : CMS Application : Possible disruption in the next several hours

Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 5028 - Posted: 3 Jul 2017, 18:15:01 UTC

Mea culpa! I realised today that I had accidentally typed one zero too many in the WMAgent request for the current batch, and so launched ten times too many jobs! Alan tells me this could overload the agent, so I've submitted a "normal" batch and set this one to "force-complete". That will clear out its queue, but I don't know exactly what effect it will have on currently running jobs.
So some jobs may be reported as failed or otherwise faulty, but once the tasks start picking up jobs from the new batch, it should all clear up. My apologies; I hope it's not too traumatic.
Profile ivan
Message 5029 - Posted: 3 Jul 2017, 18:48:07 UTC - in response to Message 5028.  

OK, the new batch has started queueing. There was a hiatus of about 35 minutes with no jobs in the queue.
Profile ivan
Message 5034 - Posted: 4 Jul 2017, 12:46:22 UTC
Last modified: 4 Jul 2017, 13:25:45 UTC

There will be an upgrade to the HTCondor schedd this afternoon. I'm told it should cause no significant disruption, but be warned...
[Added] Done, with no problems seen. [/Added]
Profile ivan
Message 5038 - Posted: 4 Jul 2017, 17:13:25 UTC

Oops, something has gone wrong now. There is a failure in the WMAgent: it still says there are jobs available, but other monitors and the Dashboard say that they have run out.
I suggest you set No New Tasks until I round up the CERN posse.
Profile ivan
Message 5039 - Posted: 4 Jul 2017, 19:18:28 UTC - in response to Message 5038.  

Problem fixed, and jobs are back in the queue. Time to restart.



©2024 CERN