Message boards : News : New plot for the jobs stats

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 1834 - Posted: 3 Feb 2016, 15:30:56 UTC

A new plot has been added to the CMS job stats page. It shows the wallclock consumption for successful and failed jobs. This is a better indicator than using the number of jobs as it is not affected by job length.

Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1835 - Posted: 3 Feb 2016, 16:34:30 UTC - in response to Message 1834.  

Is the wall clock not the length of time jobs take?
How can this not be affected by job length?

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 25
Message 1836 - Posted: 3 Feb 2016, 16:58:36 UTC - in response to Message 1835.  

> Is the wall clock not the length of time jobs take?
> How can this not be affected by job length?

Every job reports its elapsed wallclock (and also CPU) time, e.g. "Time report complete in 3959.85 seconds".
When a job runs longer because it processes a different type of event or more events per job,
the reported time increases, but it still comes from 1 job.

The number of seconds used by the jobs returned within each hour is more informative, because the job types differ too much for a job count to show the crunching power per hour.
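As a rough illustration of the per-hour aggregation described above, the Python sketch below sums the reported wallclock of the jobs returned in each hour alongside the plain job count. The report format (hour returned, wallclock seconds), the function name and the sample values are assumptions for illustration; only the 3959.85 s figure comes from the example above.

```python
from collections import defaultdict

# Hypothetical job reports: (hour in which the job was returned, wallclock seconds).
# 3959.85 s echoes the example above; the other values are made up.
reports = [
    (0, 3959.85),
    (0, 1800.0),
    (0, 1800.0),
    (1, 7200.0),
]


def per_hour_stats(reports):
    """Return {hour: (job_count, total_wallclock_seconds)} for the returned jobs."""
    counts = defaultdict(int)
    seconds = defaultdict(float)
    for hour, wallclock in reports:
        counts[hour] += 1          # one more job returned in this hour
        seconds[hour] += wallclock  # add the wallclock it reported
    return {h: (counts[h], seconds[h]) for h in sorted(counts)}


for hour, (n_jobs, wall) in per_hour_stats(reports).items():
    print(f"hour {hour}: jobs={n_jobs}, {wall:.2f} s wallclock")
```

In this made-up data the job count drops from 3 to 1 between the two hours, while the wallclock consumed stays in the same range (about 7,560 s and 7,200 s).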

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 1837 - Posted: 3 Feb 2016, 17:05:47 UTC - in response to Message 1835.  

If you show the number of jobs done per hour, then with jobs taking either 30 minutes or 2 hours, the plot will show 4 times as many jobs for the 30-minute jobs. This gives the impression that resource capacity fluctuates with job length. If we instead show the walltime used per hour, it does not fluctuate with job length.
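The arithmetic in the paragraph above can be checked with a minimal steady-state model, sketched here in Python. It assumes a fixed number of job slots that are always busy and jobs of a single uniform length; `SLOTS` and both functions are hypothetical names chosen for illustration.

```python
# Minimal steady-state model. Assumptions (simplifications, not measured data):
# every slot is always busy and all jobs have the same length.
SLOTS = 100  # hypothetical number of concurrently running job slots


def jobs_per_hour(slots, job_minutes):
    """Jobs completed per hour: each slot finishes 60 / job_minutes jobs."""
    return slots * 60 / job_minutes


def walltime_per_hour(slots, job_minutes):
    """Wallclock seconds consumed per hour: jobs/h times seconds per job."""
    # The job length cancels out, leaving slots * 3600.
    return jobs_per_hour(slots, job_minutes) * job_minutes * 60


for minutes in (30, 120):
    print(f"{minutes:>3}-min jobs: {jobs_per_hour(SLOTS, minutes):6.1f} jobs/h, "
          f"{walltime_per_hour(SLOTS, minutes):.0f} s walltime/h")
```

With 100 slots the job count drops from 200 to 50 per hour (a factor of 4), while the walltime consumed stays at 360,000 s per hour, because the job length cancels out.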

Also in comparing failures with success, we care more about walltime lost. A job failing at the start is better than one failing at the end after all the precious CPU time has been consumed.
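To make the failure comparison concrete, the sketch below counts failed jobs and the wallclock they consumed for two hypothetical runs that differ only in when the single failure happens. The (succeeded, wallclock seconds) record format and all values are invented for illustration.

```python
def failure_summary(jobs):
    """Return (number of failed jobs, wallclock seconds consumed by failed jobs).

    Each job is a hypothetical (succeeded, wallclock_seconds) tuple.
    """
    failed = [wall for ok, wall in jobs if not ok]
    return len(failed), sum(failed)


# Two invented runs, each with three successes and one failure.
early_failure = [(True, 3600.0)] * 3 + [(False, 60.0)]   # fails at the start
late_failure = [(True, 3600.0)] * 3 + [(False, 3500.0)]  # fails near the end

for name, jobs in (("early", early_failure), ("late", late_failure)):
    n_failed, lost = failure_summary(jobs)
    print(f"{name}: {n_failed} failed job(s), {lost:.0f} s walltime lost")
```

Both runs show one failed job, but the late failure wastes 3500 s of walltime against 60 s for the early one, a difference that a plot of job counts alone does not reveal.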

Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1838 - Posted: 3 Feb 2016, 17:33:25 UTC

As an example:
A job reports "1 job" and "3600 s".
A longer job reports "1 job" and "7200 s".
Yes, the second one reports later, but it is also twice as large in terms of walltime, while the job count increases by just one.

> Also in comparing failures with success, we care more about walltime lost. A job failing at the start is better than one failing at the end after all the precious CPU time has been consumed.

That makes sense.

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1139
Credit: 8,181,211
RAC: 2,023
Message 3600 - Posted: 22 Jun 2016, 20:26:19 UTC

I've made a subtle change to the CMS job plots. Instead of the "running jobs" plot being sorted by "site" (which didn't add much, since it only plots results for the T3_CH_Volunteer pseudo-site), I've selected sort by "submission tool", because we're trying to continue Hassen's work and get WMAgent submission to work, to replace the time-intensive (to me...) CRAB3 submissions.
Unfortunately, at the moment Dashboard reports our WMAgent submissions as "unknown", but knowing that, you can now see the split between CRAB3 and WMAgent jobs. If the ratio changes (I believe that ideally it should be 50-50 if both are working), it should give a better handle on where the problem, if any, might lie.
I believe, but I may be wrong, that what is currently reporting as "unknown" is a backlog of jobs Hassen submitted, freed by changes made in the last two days.
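To illustrate what grouping by submission tool reveals, here is a minimal Python sketch that computes the share of running jobs per submission-tool label. The job list and the `submission_split` function are hypothetical; this is not how Dashboard itself builds the plot. Following the post, WMAgent jobs are assumed to appear under the "unknown" label.

```python
from collections import Counter

# Hypothetical snapshot of running jobs, one submission-tool label per job.
# As described above, WMAgent jobs currently show up as "unknown".
running = ["CRAB3"] * 6 + ["unknown"] * 4


def submission_split(labels):
    """Return {tool: fraction of running jobs} for each submission-tool label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {tool: n / total for tool, n in counts.items()}


for tool, share in sorted(submission_split(running).items()):
    print(f"{tool}: {share:.0%}")
```

With both submission routes working, the post's expectation would be shares near 50% each; a persistent skew points at the route that is underperforming.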

©2024 CERN