Message boards : CMS Application : Dip?
Author | Message |
---|---|
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
Small outage -- looks like my window onto WMAgent status was telling me lies and we ran out of jobs. New batch submitted so hopefully up again soon. |
Send message Joined: 28 Jul 16 Posts: 485 Credit: 394,839 RAC: 0 |
|
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
What an impressive boost. Yes, I think it's some kind of Dashboard artefact unfortunately. The spike in "unknown status" jobs about that time might be the 24-hour time-out "echo" from the problems yesterday afternoon. Nevertheless, it did give me the opportunity to report, at a CERN computing meeting this afternoon, that we had reached a record number of CMS@Home jobs. That was before it fell back again, of course... |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
No Jobs? The bad thing about this is that BOINC tasks error out, ruining the quota. So when jobs are available again, the quota is used up and no tasks can be run for another 24 hours. Maybe this is a good time to address the issue. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
"No Jobs?" Sorry, the WMAgent server has died. I've notified my CERN contacts. I'm well aware of the quota issue; it affects me too. I'll bring it up with CERN IT again. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks for the reply, Ivan. I just thought I'd mention it. If the BOINC task could be made not to produce an error when no jobs are available (for example, by sending a shutdown file to the "shared" folder), the problem would be solved. |
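The suggestion in the post above could be sketched roughly as follows. This is only an illustration, not the actual CMS@Home task code: the `shutdown` file name, the `shared_dir` path, and the `fetch_job` callback are all hypothetical names chosen for the example.

```python
from pathlib import Path
from typing import Callable, Optional

# Assumed marker name; the thread only says "shutdown file", nothing more.
SHUTDOWN_FILE = "shutdown"


def run_once(fetch_job: Callable[[], Optional[str]], shared_dir: Path) -> str:
    """Try to get one job; if none is available, ask for a clean shutdown.

    Instead of raising an error (which would count against the host's
    daily quota), the empty-queue case writes a marker file into the
    shared folder and reports a normal outcome.
    """
    job = fetch_job()
    if job is None:
        shared_dir.mkdir(parents=True, exist_ok=True)
        (shared_dir / SHUTDOWN_FILE).write_text("no jobs available\n")
        return "shutdown"
    # Placeholder for the real work; here we only report which job ran.
    return f"ran {job}"
```

Whether the wrapper on the host side would treat such a file as a graceful finish is exactly the open question raised in the reply that follows.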
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
"Thanks for the reply, Ivan." Indeed, I'm just not sure how hard it would be to implement, or what potential side effects need to be guarded against. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
Ah! The server is showing green again. It may take some minutes before jobs start again; we are just at the transition from one batch to another. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Good news. Is there a way to view the results? You posted a link a little while ago, but I could not get it to work. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
"Good news." I've not found the WMAgent jobs in Dashboard directly, like we could with the CRAB jobs. If you find one of the Dashboard portals that doesn't need CMS credentials, you can look for jobs at the Tier-3 site T3_CH_Volunteer. (Unfortunately my credentials are almost always loaded in my browser so it's hard to tell which portals are public.) The WMStatus and cluster monitoring tools that I use do need CMS credentials. Just for giggles, see if you can view this site. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 874,807 RAC: 1,175 |
"I've not found the WMAgent jobs in Dashboard directly, like we could with the CRAB jobs. If you find one of the Dashboard portals that doesn't need CMS credentials" I can find my running job 9ed10698-0bcf-11e7-94b5-02163e018309-380_0 from your batch wmagent_ireid_MonteCarlo_eff_IDR_CMS_Home_170317_140431_8234 on the dashboard. It doesn't show up instantly, only a few hours later. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
Here we go again -- something else in WMAgent has died. Set No New Tasks to protect your daily quota. :-( |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
The problem has been traced to an authentication certificate becoming invalid, for reasons as yet unknown. CERN IT are working on it. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
CMS@Home jobs are available again. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
"Here we go again -- something else in WMAgent has died. Set No New Tasks to protect your daily quota. :-(" This is a nice idea, if you catch it before the quota is used up. I really think that needs to be fixed for good. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
"Here we go again -- something else in WMAgent has died. Set No New Tasks to protect your daily quota. :-(" Yes, I've raised it again with CERN this week. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks, Ivan. By the way, the running jobs graph is getting more and more "spiky". |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
"Thanks, Ivan." Yes, I'm not sure exactly what's causing it. It's probably some kind of bottleneck causing a "relaxation oscillator" effect, if you've ever studied those in electronics or elsewhere. It's actually made to look worse on the Dashboard plots because of the time binning. With a CERN tool that uses finer binning (5 minutes instead of 1 hour), it's rather smoother. Our jobs run as user cmst1. There's a hint that the amplitude damps out over time. |
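For readers who have not met the term, here is a toy sketch of a relaxation oscillator and of how the bin width changes what a plot shows. It is not a model of WMAgent or of the Dashboard: the `fill_rate` and `threshold` parameters and the mean-based `rebin` helper are assumptions made purely for illustration, and how the Dashboard actually aggregates its bins is not stated in the thread.

```python
def relaxation_oscillator(steps, fill_rate=1.0, threshold=5.0):
    """Slow build-up followed by a sudden reset, giving a sawtooth trace."""
    level = 0.0
    trace = []
    for _ in range(steps):
        level += fill_rate
        if level >= threshold:
            level = 0.0  # abrupt "discharge" once the threshold is reached
        trace.append(level)
    return trace


def rebin(values, width):
    """Average consecutive groups of `width` samples, dropping any remainder."""
    full = len(values) - len(values) % width
    return [sum(values[i:i + width]) / width for i in range(0, full, width)]


trace = relaxation_oscillator(12)
print(trace)            # fine-grained sawtooth
print(rebin(trace, 5))  # bin width equal to the period
print(rebin(trace, 3))  # bin width that does not match the period
```

Comparing the last two lines shows how the same underlying series can look flat or jumpy depending only on the bin width chosen.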
Send message Joined: 8 Apr 15 Posts: 782 Credit: 12,478,477 RAC: 4,357 |
Yes, the relaxation oscillator has been around for about 100 years (the vacuum-tube Abraham-Bloch multivibrator relaxation oscillator, 1920). I remember, way back in my younger days, the old 555 timer IC. It has been used for many things over the years. Mad Scientist For Life |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
WMAgent has died again. Set no new tasks if you can... |