Message boards : News : Out Of Jobs
| Author | Message |
|---|---|
Laurence CERN Joined: 12 Sep 14 Posts: 1150 Credit: 342,328 RAC: 0 |
We are out of jobs and are fighting with a few other issues. I have stopped new tasks being sent for now. A feature to handle this situation more gracefully is on the work plan. I hope that we can be back up and running after the weekend. |
ivan Joined: 20 Jan 15 Posts: 1154 Credit: 8,341,466 RAC: 943 |
|
|
Joined: 13 Feb 15 Posts: 1257 Credit: 1,014,861 RAC: 129 |
BOINC tasks ready to send: 0. After a work request BOINC reports: "CMS-dev 26 Feb 12:54:46 Project has no tasks available". No jobs can be picked up by new requests. The 73 BOINC tasks in progress should be enough to drain your tiny well of 100 jobs. I have 1 job running in my standalone CMS-VM outside of BOINC. |
ivan Joined: 20 Jan 15 Posts: 1154 Credit: 8,341,466 RAC: 943 |
|
|
Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 7 |
What is the status? Are we "go" again? What was wrong and what happens to the other batches? |
ivan Joined: 20 Jan 15 Posts: 1154 Credit: 8,341,466 RAC: 943 |
"What is the status?" Both our CRAB3 Condor server and I myself seem to be back on our feet. "Are we 'go' again?" Well, we seem to have some users who have stuck around. Laurence has started tasks flowing again from this site; there are a couple of tasks still running from vLHC, but they don't seem to have started our jobs there again. "What was wrong and what happens to the other batches?" I'm not sure exactly what all the problems were. Time-outs were being adjusted to assist with suspend/resume, we did run out of disk space briefly, and it's likely that an older batch also ran out of proxy time. In the middle of all this we were all busy patching everything Linux for the glibc vulnerability, which CMS (and possibly the rest of CERN, WLCG, etc.) had set a deadline of February 24th to fix; that seems to have adversely affected some servers and VMs. We seem to have lost the spool directory for the 160219 batch; the 160216 batch is still there but not active -- I've given it a new proxy but it doesn't seem to be even twitching. Currently Condor only knows the batches I submitted today and a couple of zombies from last week that don't want to die properly. Usual problem of too many things going on at once, I suppose. Let's give it a few days and see if we keep recovering. |
|
Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 7 |
Thanks, Ivan. That is the kind of thing that we need to hear every once in a while, instead of dead silence. |
ivan Joined: 20 Jan 15 Posts: 1154 Credit: 8,341,466 RAC: 943 |
|
|
Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 7 |
"Unfortunately, I didn't take my modem into my sick-bed with me. :-)" I did not mean you. There are others, or not? |
|
Joined: 20 Mar 15 Posts: 243 Credit: 901,716 RAC: 0 |
"What is the status?" Both our CRAB3 Condor server and I myself seem to be back on our feet. "Are we 'go' again?" Started a box up, the "stuck" job resumed, the next glidein run picked up a job (159 from the last batch) and all is sweetness and light once again. |
ivan Joined: 20 Jan 15 Posts: 1154 Credit: 8,341,466 RAC: 943 |
|
©2025 CERN