Updated Job Agent
Author | Message |
---|---|
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
The CMS job agent has been updated to add some additional protections. The VM will now shut down if there are no more jobs, if no output has been produced, or if too many jobs fail. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
"too many jobs fail." Could you put a number to that? |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
The check triggers if all the jobs in a single glidein (run) fail. So for a typical glidein (run) that has five jobs, if all five fail, the VM will shut down. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks, Laurence. ...and error out the boinc-task, or just suspend it? |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
"The CMS job agent has been updated to add some additional protections. The VM will now shut down if there are no more jobs, if no output has been produced, or if too many jobs fail." Since the update, no single BOINC-task has run longer than about 6 hours. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
"Since the update no single BOINC-task is running longer than about 6 hours." Which equates roughly to one "run". |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Yes, so one of the checks is triggering but as the logging is broken we can't see which one. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
"Yes, so one of the checks is triggering but as the logging is broken we can't see which one." After the first run it just reports "09:15:01 +0100 2016-03-06 [INFO] CMS glidein Run 1 ended" followed by "09:15:02 +0100 2016-03-06 [INFO] No more jobs. Shutting down!" |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
The logging should now be working again, so all task logs should show the following output: "2016-03-08 08:50:02 (1847): Guest Log: [INFO] CMS glidein Run 1 ended", then "2016-03-08 08:50:09 (1847): Guest Log: Log extracts for Run 1 jobs", then "2016-03-08 08:50:10 (1847): Guest Log: [INFO] No more jobs. Shutting down!" As no log files are shown, it looks like the issue is with locating them. Will investigate in more detail. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 728 |
"The logging should now be working again so all tasks logs should show the following output." The log files look to have reappeared, but the tasks still seem to stop after just one glidein: http://boincai05.cern.ch/CMS-dev//result.php?resultid=116494 |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
So we are back to where we started, but with tasks running for 6 hours rather than 24 due to a bad check. I think I understand what is happening, but I want to take my time over the fix and get it right, as there is no urgency. |
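
For readers following the thread, the three protections described in the first post (no more jobs, no output produced, all jobs in a run failed) could be sketched roughly as below. This is only an illustration under stated assumptions: the `RunSummary` structure, the `shutdown_reason` function, and the order of the checks are all hypothetical and are not taken from the actual CMS job agent. Only the "No more jobs" wording comes from the logs quoted above; the other two messages are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunSummary:
    """Outcome of one glidein run (hypothetical structure, not the agent's own)."""
    jobs_started: int   # jobs picked up during this run
    jobs_failed: int    # jobs that ended in failure
    output_files: int   # output files produced during this run


def shutdown_reason(run: RunSummary) -> Optional[str]:
    """Return a reason to shut the VM down, or None to start another run.

    The order of the checks is an assumption made for this sketch.
    """
    if run.jobs_started == 0:
        return "No more jobs"            # wording seen in the thread's logs
    if run.jobs_failed == run.jobs_started:
        return "All jobs failed"         # hypothetical message
    if run.output_files == 0:
        return "No output produced"      # hypothetical message
    return None


# A typical run of five jobs where every job fails triggers a shutdown.
print(shutdown_reason(RunSummary(jobs_started=5, jobs_failed=5, output_files=0)))
```

Returning the reason rather than a plain boolean makes it easy to log which check fired, which is exactly the information that was missing while the logging was broken.
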
©2024 CERN