Fast Computers
Author | Message |
---|---|
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I repeat my post, as I did not get an answer. I understand that if a job is calculated faster than 20 min, wait or sleep cycles are introduced. |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,944,443 RAC: 3,276 |
"I repeat my post, as I did not get an answer." Well, this is under some discussion in CMS groups at the moment. The rationale is to try to force real CRAB users to submit longer jobs -- longer jobs are better for the "real" GRID. Of course we are atypical and our thrust is in a different direction: getting jobs that are short enough, and produce just enough output, to minimise transfer timeouts. Compromises need to be made; I'm not familiar with WMAgent, so I don't know if it will do the same when it's running. BTW, this did "save" us a bit the other day when a volunteer's host had a bad task that was erroring out immediately -- the only indication of the error was "Memory Error" in the logs. These jobs then waited for the 20 mins before reporting and getting a new job, so he got 17 or 18 jobs at 20-minute intervals until the glidein stopped. I'm not sure how many he would have burned through if the wait wasn't there... |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks, Ivan. I am aware that it is always a balancing act. "I'm not sure how many he would have burned through if the wait wasn't there..." Agreed; however, I was not proposing to turn the limit off, just to reduce it from 20 min to maybe 10. There are better ways of stopping "runaway" computers, but that is another discussion. I was hoping for multi-threaded mode to be enabled sometime... Volunteers are willing to throw more computing power at the project, but they can't. (Running multiple tasks instead is somewhat of a waste.) |
Joined: 12 Sep 14 Posts: 1067 Credit: 329,589 RAC: 107 |
I think that if the job fails, sleep cycles are introduced. |
Joined: 13 Feb 15 Posts: 1185 Credit: 849,977 RAC: 1,466 |
I've seen this in the short 25-event jobs of Leonardo Christella and reported it here: http://lhcathomedev.cern.ch/vLHCathome-dev/forum_thread.php?id=100&postid=2273#2273 |
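
For anyone wanting to put numbers on the 20-minute floor discussed in this thread, here is a rough sketch of the arithmetic. It is not the actual CRAB or glidein code. The function names, the 5-second failing job and the six-hour window are all assumptions made up for illustration; only the 20-minute and 10-minute figures come from the posts above.

```python
# Illustrative model only: not taken from CRAB, WMAgent or the glidein scripts.
MIN_WALL_SECONDS = 20 * 60  # the 20-minute floor mentioned in the thread


def padding_seconds(elapsed_seconds, min_wall_seconds=MIN_WALL_SECONDS):
    """Idle time added so that a job slot lasts at least the floor."""
    return max(0.0, min_wall_seconds - elapsed_seconds)


def jobs_per_window(job_seconds, window_seconds, min_wall_seconds=MIN_WALL_SECONDS):
    """Number of jobs a host gets through in a window when each slot is padded."""
    slot = max(job_seconds, min_wall_seconds)
    return int(window_seconds // slot)


if __name__ == "__main__":
    window = 6 * 3600  # hypothetical six-hour window
    crash = 5          # hypothetical job that errors out after 5 seconds
    print(padding_seconds(300))                                  # 5-minute job sleeps 900 s
    print(jobs_per_window(crash, window))                        # 20-minute floor: 18 jobs
    print(jobs_per_window(crash, window, min_wall_seconds=600))  # 10-minute floor: 36 jobs
    print(jobs_per_window(crash, window, min_wall_seconds=0))    # no floor: 4320 jobs
```

Under these assumed numbers, halving the floor to 10 minutes doubles the number of bad jobs a broken host consumes, while removing it entirely lets the host burn through thousands.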
©2024 CERN