New plot for the jobs stats
Author | Message |
---|---|
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
A new plot has been added to the CMS job stats page. It shows the wallclock consumption for successful and failed jobs. This is a better indicator than the number of jobs, as it is not affected by job length. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Is the wall clock not the length of time jobs take? How can this not be affected by job length? |
Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 25 |
"Is the wall clock not the length of time jobs take?" Every job reports its elapsed wallclock (and also CPU) time, e.g. "Time report complete in 3959.85 seconds". When a job runs longer, because of a different type of event or more events per job, the reported time increases, but it still comes from one job. The number of seconds used by the jobs returned within one hour is more informative, because the job types differ too much for a job count to show the crunching power per hour. |
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
If you plot the number of jobs done per hour, then with jobs that take 30 minutes instead of 2 hours the plot will show four times as many jobs for the 30-minute jobs. This gives the impression that resource capacity fluctuates with job length. However, if we show the walltime used per hour, it will not fluctuate with different job lengths. Also, in comparing failures with successes, we care more about walltime lost. A job failing at the start is better than one failing at the end after all the precious CPU time has been consumed. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
As an example: a job reports "1 job" and "3600 sec". A longer job reports "1 job" and "7200 sec". Yes, the second one reports later, but it is also larger in terms of walltime, while the job count still increases by just one. "Also in comparing failures with success, we care more about walltime lost. A job failing at the start is better than one failing at the end after all the precious CPU time has been consumed." That makes sense. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,181,211 RAC: 2,023 |
I've made a subtle change to the CMS job plots. Instead of sorting the "running jobs" plot by "site" (which didn't add much, since it only plots results for the T3_CH_Volunteer pseudo-site), I've selected sorting by "submission tool", because we're trying to continue Hassen's work and get WMAgent submission working, to replace the CRAB3 submissions, which are time-intensive (for me...). Unfortunately, Dashboard currently reports our WMAgent submissions as "unknown", but knowing that, you can now see the split between CRAB3 and WMAgent jobs. If the ratio changes (I believe it should ideally be 50-50 if both are working), it should give a better handle on where the problem, if any, lies. I believe, though I may be wrong, that what is currently reported as "unknown" is a backlog of jobs Hassen submitted, freed by changes made in the last two days. |
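The argument in the thread above, that counting finished jobs per hour swings with job length while summed wallclock does not, can be checked with a small Python sketch. It assumes a hypothetical pool of always-busy volunteer slots running jobs of one fixed length; `JobRecord`, `steady_state_records` and the other names are illustrative only and are not part of Dashboard or the CMS job stats page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JobRecord:
    """One finished job as a stats page might see it (hypothetical model)."""
    wallclock_s: float  # elapsed wallclock seconds reported by the job
    succeeded: bool


def steady_state_records(slots: int, job_length_s: float,
                         window_s: float = 3600.0) -> list[JobRecord]:
    """Jobs finishing in one window, assuming every slot runs
    back-to-back jobs of identical length and is never idle."""
    n_jobs = round(slots * window_s / job_length_s)
    return [JobRecord(job_length_s, True) for _ in range(n_jobs)]


def jobs_per_hour(records: list[JobRecord]) -> int:
    """Metric 1: plain job count, which depends on job length."""
    return len(records)


def walltime_per_hour(records: list[JobRecord]) -> float:
    """Metric 2: summed wallclock, which tracks the slots actually used."""
    return sum(r.wallclock_s for r in records)


def walltime_by_outcome(records: list[JobRecord]) -> tuple[float, float]:
    """Split wallclock into (successful, failed); the failed part is walltime lost."""
    ok = sum(r.wallclock_s for r in records if r.succeeded)
    failed = sum(r.wallclock_s for r in records if not r.succeeded)
    return ok, failed


if __name__ == "__main__":
    short = steady_state_records(slots=100, job_length_s=1800)  # 30-minute jobs
    long_ = steady_state_records(slots=100, job_length_s=7200)  # 2-hour jobs
    # Same capacity, but the job count differs by a factor of four.
    print(jobs_per_hour(short), jobs_per_hour(long_))
    # Summed wallclock is the same for both job lengths.
    print(walltime_per_hour(short), walltime_per_hour(long_))
    # Two failures count the same as jobs, but lose very different walltime.
    print(walltime_by_outcome([JobRecord(60.0, False)]))
    print(walltime_by_outcome([JobRecord(7000.0, False)]))
```

With 100 slots, the sketch counts 200 jobs per hour for 30-minute jobs and 50 for 2-hour jobs, while both cases sum to 100 × 3600 s of wallclock. The same per-job wallclock sum, split by outcome, shows why a late failure costs more than an early one.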
©2024 CERN