Message boards : CMS Application : Fast Computers

Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2470 - Posted: 21 Mar 2016, 11:48:47 UTC

I am repeating my post, as I did not get an answer.


I understand that if a job finishes in less than 20 minutes, wait or sleep cycles are introduced.
I suggest reducing that value, so as not to waste the computing time of fast computers and to accommodate the (hopefully) upcoming enabling of multi-core operation.

As it is working very well with the current job length, I would rather not change the length.
However, I think it is important to allow for faster computers and multi-threaded operation.
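As the mechanism is understood in the post above, a job that finishes early is padded out to a 20-minute minimum before the slot reports back and fetches new work. A minimal Python sketch of that idea follows; `run_job` is a hypothetical callable and the 1200-second figure is taken from the post, so this illustrates the policy rather than reproducing the actual CMS glidein or CRAB code.

```python
import time
from typing import Callable

MIN_JOB_SECONDS = 20 * 60  # assumed 20-minute minimum, as described in the post


def run_with_minimum_duration(run_job: Callable[[], bool],
                              min_seconds: float = MIN_JOB_SECONDS,
                              sleep: Callable[[float], None] = time.sleep,
                              clock: Callable[[], float] = time.monotonic) -> float:
    """Run a job, then sleep so the total wall time is at least min_seconds.

    run_job is a hypothetical callable returning True on success; its result
    is ignored here because the padding applies to success and failure alike.
    Returns the number of seconds spent waiting.
    """
    start = clock()
    run_job()
    elapsed = clock() - start
    # Only pad when the job finished early; long jobs report back immediately.
    wait = max(0.0, min_seconds - elapsed)
    if wait > 0:
        sleep(wait)
    return wait
```

Passing `sleep` and `clock` in as parameters keeps the sketch testable without real 20-minute waits; lowering `min_seconds` to 600 would correspond to the 10-minute value suggested later in the thread.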
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Jan 15
Posts: 1129
Credit: 7,875,337
RAC: 216
Message 2499 - Posted: 22 Mar 2016, 12:14:25 UTC - in response to Message 2470.  

Well, this is under some discussion in CMS groups at the moment. The rationale is to try to force real CRAB users to submit longer jobs, since longer jobs are better for the "real" GRID. Of course, we are atypical and our thrust is in a different direction: getting jobs that are short enough, and produce just enough output, to minimise transfer timeouts. Compromises need to be made. I'm not familiar with WMAgent, so I don't know whether it will do the same when it's running.
BTW, this did "save" us a bit the other day when a volunteer's host had a bad task that was erroring out immediately; the only indication of the error was "Memory Error" in the logs. These jobs then waited for the 20 minutes before reporting and getting a new job, so he got 17 or 18 jobs at 20-minute intervals until the glidein stopped. I'm not sure how many he would have burned through if the wait hadn't been there...
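The protective effect described above is simple arithmetic: with a minimum slot time, a host whose jobs fail instantly can fetch at most one job per interval, and 17 or 18 jobs at 20-minute intervals implies a glidein that ran for roughly 6 hours. The sketch below assumes a 6-hour glidein lifetime and a 5-second failure time purely for illustration (both hypothetical, not figures from the project), and ignores scheduling and transfer overhead, so it gives an upper bound.

```python
def jobs_consumed(lifetime_s: int, job_s: int, min_slot_s: int) -> int:
    """Upper bound on jobs a glidein fetches when every job occupies
    max(job_s, min_slot_s) seconds; ignores scheduling and transfer overhead."""
    slot_s = max(job_s, min_slot_s)
    if slot_s <= 0:
        raise ValueError("each job must take a positive amount of time")
    return lifetime_s // slot_s


GLIDEIN_LIFETIME_S = 6 * 60 * 60  # hypothetical: ~6 h, inferred from 18 jobs x 20 min
FAILING_JOB_S = 5                 # hypothetical: job errors out after 5 seconds

for wait_min in (20, 10, 0):
    n = jobs_consumed(GLIDEIN_LIFETIME_S, FAILING_JOB_S, wait_min * 60)
    print(f"minimum slot {wait_min:>2} min -> {n} jobs")
```

Under these assumptions a 20-minute minimum gives 18 jobs, a 10-minute minimum gives 36, and no minimum gives 4320, which is why halving the wait still limits a runaway host far more than removing it entirely.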
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2500 - Posted: 22 Mar 2016, 12:27:52 UTC - in response to Message 2499.  

Thanks, Ivan.
I am aware that it is always a balancing act.

I'm not sure how many he would have burned through if the wait wasn't there...


Agreed; however, I was not proposing to turn the limit off, just to reduce it from 20 min to maybe 10.
There are better ways of stopping "runaway" computers, but that is another discussion.
I was hoping for multi-threaded mode to be enabled at some point...
Volunteers are willing to throw more computing power at the project, but they can't. (Running multiple tasks instead is somewhat of a waste.)
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1064
Credit: 328,405
RAC: 184
Message 2501 - Posted: 22 Mar 2016, 14:51:29 UTC - in response to Message 2470.  

I think that sleep cycles are introduced if the job fails.
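If the delay applies only to failed jobs, as suggested here, fast successful jobs would not be held back at all while a failing host would still be throttled. A minimal sketch of that variant, assuming a hypothetical `run_job` callable that returns True on success and reusing the 20-minute figure from the thread; it is an illustration of the policy, not the actual CMS implementation.

```python
import time
from typing import Callable

FAILURE_BACKOFF_SECONDS = 20 * 60  # assumed delay after a failed job


def run_with_failure_backoff(run_job: Callable[[], bool],
                             backoff_seconds: float = FAILURE_BACKOFF_SECONDS,
                             sleep: Callable[[float], None] = time.sleep) -> float:
    """Run a job and sleep for backoff_seconds only if it failed.

    Returns the number of seconds spent sleeping.
    """
    succeeded = run_job()
    if succeeded:
        return 0.0  # successful jobs report back immediately, however fast
    sleep(backoff_seconds)  # throttle hosts whose jobs keep failing
    return float(backoff_seconds)
```

Compared with padding every job to a minimum duration, this keeps fast computers busy while still limiting how many jobs a broken host can burn through.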
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 266
Message 2502 - Posted: 22 Mar 2016, 15:07:24 UTC

I've seen this in Leonardo Christella's short 25-event jobs and reported it here:

http://lhcathomedev.cern.ch/vLHCathome-dev/forum_thread.php?id=100&postid=2273#2273



©2024 CERN