Message boards : News : Updated Job Agent

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 2215 - Posted: 4 Mar 2016, 11:14:35 UTC

The CMS job agent has been updated to add some additional protections. The VM will now shut down if there are no more jobs, if no output has been produced, or if too many jobs fail.

Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2218 - Posted: 4 Mar 2016, 12:32:02 UTC - in response to Message 2215.  

too many jobs fail.


Could you put a number to that?

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 2219 - Posted: 4 Mar 2016, 12:34:16 UTC - in response to Message 2218.  

When all the jobs in a single glidein (run) fail. For a typical glidein (run) of five jobs, the VM will shut down if all five fail.
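
For illustration, the three checks described in this thread can be sketched as below. This is a minimal sketch under stated assumptions, not the actual CMS job agent code: the function name `shutdown_reason`, its parameters, the returned strings, and the order in which the checks run are all hypothetical.

```python
from typing import Optional, Sequence


def shutdown_reason(
    jobs_queued: int,
    output_produced: bool,
    job_succeeded: Sequence[bool],
) -> Optional[str]:
    """Return why the VM should shut down, or None to keep running.

    job_succeeded holds one flag per job in the glidein (run) that just
    ended. All names and the check order are assumptions for illustration.
    """
    # "Too many jobs fail": every job in the finished run failed.
    if job_succeeded and not any(job_succeeded):
        return "all jobs in run failed"
    # The run finished without producing any output.
    if not output_produced:
        return "no output produced"
    # Nothing left to fetch for the next run.
    if jobs_queued == 0:
        return "no more jobs"
    return None


# A typical run of five jobs in which one job fails does not trigger a shutdown.
print(shutdown_reason(3, True, [True, False, True, True, True]))
# A run in which all five jobs fail does.
print(shutdown_reason(3, True, [False] * 5))
```

Returning a reason string rather than a bare boolean makes it possible to log which check fired.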

Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2224 - Posted: 4 Mar 2016, 13:59:52 UTC - in response to Message 2219.  
Last modified: 4 Mar 2016, 14:00:11 UTC

Thanks, Laurence.


...and does it error out the BOINC task, or just suspend it?

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 2253 - Posted: 5 Mar 2016, 21:53:21 UTC - in response to Message 2215.  
Last modified: 5 Mar 2016, 21:54:00 UTC

The CMS job agent has been updated to add some additional protections. The VM will now shut down if there are no more jobs, if no output has been produced, or if too many jobs fail.

Since the update, no BOINC task has run longer than about 6 hours.

Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2254 - Posted: 5 Mar 2016, 22:01:09 UTC - in response to Message 2253.  

Since the update, no BOINC task has run longer than about 6 hours.


Which equates roughly to one "run".

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 2255 - Posted: 5 Mar 2016, 22:29:43 UTC - in response to Message 2254.  

Yes, so one of the checks is triggering but as the logging is broken we can't see which one.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 2258 - Posted: 6 Mar 2016, 8:17:27 UTC - in response to Message 2255.  

Yes, so one of the checks is triggering but as the logging is broken we can't see which one.

After the first run it just says:

09:15:01 +0100 2016-03-06 [INFO] CMS glidein Run 1 ended
09:15:02 +0100 2016-03-06 [INFO] No more jobs. Shutting down!

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 2267 - Posted: 8 Mar 2016, 10:18:03 UTC - in response to Message 2258.  

The logging should now be working again, so all task logs should show the following output.

2016-03-08 08:50:02 (1847): Guest Log: [INFO] CMS glidein Run 1 ended
2016-03-08 08:50:09 (1847): Guest Log: Log extracts for Run 1 jobs
2016-03-08 08:50:10 (1847): Guest Log: [INFO] No more jobs. Shutting down!

As no log files are shown, it looks like the issue is with locating them. I will investigate in more detail.
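
To check a task log for the shutdown message, something like the following could be used. It is only a sketch that assumes the line layout shown in the excerpt above (timestamp, a number in parentheses, then `Guest Log:`); the pattern `GUEST_LOG` and the function `find_shutdown_message` are hypothetical names, not part of BOINC or the job agent.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches lines such as:
# 2016-03-08 08:50:10 (1847): Guest Log: [INFO] No more jobs. Shutting down!
# The layout is assumed from the excerpt in this thread.
GUEST_LOG = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): Guest Log: (?P<msg>.*)$"
)


def find_shutdown_message(lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (timestamp, message) of the first shutdown line, or None."""
    for line in lines:
        match = GUEST_LOG.match(line.strip())
        if match and "Shutting down" in match.group("msg"):
            return match.group("ts"), match.group("msg")
    return None


sample = [
    "2016-03-08 08:50:02 (1847): Guest Log: [INFO] CMS glidein Run 1 ended",
    "2016-03-08 08:50:09 (1847): Guest Log: Log extracts for Run 1 jobs",
    "2016-03-08 08:50:10 (1847): Guest Log: [INFO] No more jobs. Shutting down!",
]
print(find_shutdown_message(sample))
```

The sketch only handles this one layout; the earlier excerpt in the thread, where the time precedes the date, would need a different pattern.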

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1139
Credit: 8,310,612
RAC: 728
Message 2268 - Posted: 8 Mar 2016, 14:12:35 UTC - in response to Message 2267.  

The logging should now be working again, so all task logs should show the following output.

2016-03-08 08:50:02 (1847): Guest Log: [INFO] CMS glidein Run 1 ended
2016-03-08 08:50:09 (1847): Guest Log: Log extracts for Run 1 jobs
2016-03-08 08:50:10 (1847): Guest Log: [INFO] No more jobs. Shutting down!

As no log files are shown, it looks like the issue is with locating them. I will investigate in more detail.

The log files appear to have returned, but the tasks still seem to stop after just one glidein: http://boincai05.cern.ch/CMS-dev//result.php?resultid=116494

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 2271 - Posted: 8 Mar 2016, 14:39:36 UTC - in response to Message 2268.  

So we are back where we started, except that tasks now run for 6 hours rather than 24 because of a bad check. I think I understand what is happening, but as there is no urgency I want to take my time over the fix and get it right.