Message boards : CMS Application : Dip?

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 4787 - Posted: 7 Mar 2017, 10:40:54 UTC

Small outage -- looks like my window onto WMAgent status was telling me lies and we ran out of jobs. New batch submitted so hopefully up again soon.
ID: 4787 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 473
Credit: 389,411
RAC: 34
Message 4798 - Posted: 15 Mar 2017, 13:02:36 UTC

What an impressive boost.
Looks like an "anti DIP".
:-)
ID: 4798 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 4799 - Posted: 15 Mar 2017, 19:04:07 UTC - in response to Message 4798.  

What an impressive boost.
Looks like an "anti DIP".
:-)

Yes, I think it's some kind of Dashboard artefact unfortunately. The spike in "unknown status" jobs about that time might be the 24-hour time-out "echo" from the problems yesterday afternoon. Nevertheless, it did give me the opportunity to report, at a CERN computing meeting this afternoon, that we had reached a record number of CMS@Home jobs. That was before it fell back again, of course...
ID: 4799 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4801 - Posted: 18 Mar 2017, 7:17:05 UTC
Last modified: 18 Mar 2017, 7:23:58 UTC

No Jobs?


The bad thing about this is that BOINC tasks error out, ruining the quota.

Therefore, when jobs are available again, the quota is used up and no tasks can be run for another 24h.

Maybe it is a good time to address this issue.
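
To make the complaint concrete, here is a toy model in Python of how an outage can burn a daily quota. It is only a sketch: the `DailyQuota` class, its fixed `limit`, and the numbers are assumptions invented for illustration, and it does not reproduce how the BOINC scheduler actually computes quotas.

```python
from dataclasses import dataclass


@dataclass
class DailyQuota:
    """Toy per-host daily task quota (hypothetical, not BOINC's real logic)."""
    limit: int      # tasks the host may fetch per day (assumed fixed here)
    used: int = 0   # tasks fetched so far today

    def fetch_task(self) -> bool:
        """Try to fetch one task; every fetch counts, even if the task later fails."""
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def simulate_outage(quota: DailyQuota, attempts: int) -> int:
    """While no jobs exist, each fetched task errors out at once, so the
    client immediately asks for another. Returns how many tasks were burned."""
    burned = 0
    for _ in range(attempts):
        if not quota.fetch_task():
            break
        burned += 1
    return burned


quota = DailyQuota(limit=8)
burned = simulate_outage(quota, attempts=20)

# Jobs come back, but the host has nothing left to spend until the next day.
can_run_after_outage = quota.fetch_task()
print(burned, can_run_after_outage)
```

In this toy version the whole quota is gone after eight quick failures, which matches the situation described above: work is available again, but the host cannot fetch it.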
ID: 4801 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 4802 - Posted: 18 Mar 2017, 10:46:35 UTC - in response to Message 4801.  

No Jobs?


The bad thing about this is that BOINC tasks error out, ruining the quota.

Therefore, when jobs are available again, the quota is used up and no tasks can be run for another 24h.

Maybe it is a good time to address this issue.

Sorry, the WMAgent server has died. I've notified my CERN contacts.
I'm well aware of the quota issue; it affects me too. I'll bring it up with CERN IT again.
ID: 4802 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4803 - Posted: 18 Mar 2017, 11:06:23 UTC

Thanks for the reply, Ivan.
I just thought I'd mention it.
If the BOINC task could be made not to produce an error when no jobs are available (e.g. by sending a shutdown file to the "shared" folder), the problem would be solved.
ID: 4803 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 4804 - Posted: 18 Mar 2017, 11:42:28 UTC - in response to Message 4803.  

Thanks for the reply, Ivan.
I just thought I'd mention it.
If the BOINC task could be made not to produce an error when no jobs are available (e.g. by sending a shutdown file to the "shared" folder), the problem would be solved.

Indeed, I'm just not sure how hard it would be to implement it, or what potential side-effects need to be guarded against.
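
For what it's worth, the idea quoted above could look roughly like this in Python. This is a minimal sketch under stated assumptions: the file name `shutdown`, the function `request_clean_shutdown`, and the `jobs_available` flag are all hypothetical, and the file name the wrapper really watches for may well differ. It also says nothing about the side effects that would need guarding against.

```python
import tempfile
from pathlib import Path

# Hypothetical marker name; the real trigger file expected by the wrapper may differ.
SHUTDOWN_FILE = "shutdown"


def request_clean_shutdown(shared_dir: Path, jobs_available: bool) -> bool:
    """Write a shutdown marker into the shared folder when there is no work.

    Returns True if the marker was written, i.e. the task should end
    quietly instead of being reported as an error.
    """
    if jobs_available:
        return False
    shared_dir.mkdir(parents=True, exist_ok=True)
    (shared_dir / SHUTDOWN_FILE).write_text("no jobs available\n")
    return True


# Demonstration in a throwaway directory standing in for the "shared" folder.
with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp) / "shared"
    wrote_marker = request_clean_shutdown(shared, jobs_available=False)
    marker_exists = (shared / SHUTDOWN_FILE).exists()
    skipped_when_busy = request_clean_shutdown(shared, jobs_available=True)

print(wrote_marker, marker_exists, skipped_when_busy)
```

The point of the design is that "no work right now" becomes a normal way for a task to end rather than a failure, so it would no longer count against the quota.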
ID: 4804 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 4805 - Posted: 18 Mar 2017, 11:46:05 UTC

Ah! The server is showing green again. It may take some minutes before jobs start again, as we are just at the transition from one batch to another.
ID: 4805 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4806 - Posted: 18 Mar 2017, 11:56:40 UTC - in response to Message 4805.  
Last modified: 18 Mar 2017, 11:57:02 UTC

Good news.
Is there a way to view the results?
You posted a link a little while ago, but I could not get it to work.
ID: 4806 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 4807 - Posted: 18 Mar 2017, 12:08:20 UTC - in response to Message 4806.  

Good news.
Is there a way to view the results?
You posted a link a little while ago, but I could not get it to work.

I've not found the WMAgent jobs in Dashboard directly, like we could with the CRAB jobs. If you find one of the Dashboard portals that doesn't need CMS credentials, you can look for jobs at the Tier-3 site T3_CH_Volunteer. (Unfortunately my credentials are almost always loaded in my browser so it's hard to tell which portals are public.) The WMStatus and cluster monitoring tools that I use do need CMS credentials.
Just for giggles, see if you can view this site.
ID: 4807 · Rating: 0
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 238
Message 4808 - Posted: 18 Mar 2017, 14:26:53 UTC - in response to Message 4807.  

I've not found the WMAgent jobs in Dashboard directly, like we could with the CRAB jobs. If you find one of the Dashboard portals that doesn't need CMS credentials

I can find my running job 9ed10698-0bcf-11e7-94b5-02163e018309-380_0 from your
batch wmagent_ireid_MonteCarlo_eff_IDR_CMS_Home_170317_140431_8234 on the dashboard.
It doesn't show up instantly, but a few hours later.
ID: 4808 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 4811 - Posted: 22 Mar 2017, 21:13:14 UTC

Here we go again -- something else in WMAgent has died. Set No New Tasks to protect your daily quota. :-(
ID: 4811 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 4812 - Posted: 23 Mar 2017, 8:41:39 UTC

The problem has been traced to an authentication certificate becoming invalid, for reasons as yet unknown. CERN IT are working on it.
ID: 4812 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 4814 - Posted: 23 Mar 2017, 12:09:29 UTC - in response to Message 4812.  

CMS@Home jobs are available again.
ID: 4814 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4815 - Posted: 23 Mar 2017, 22:07:03 UTC

Here we go again -- something else in WMAgent has died. Set No New Tasks to protect your daily quota. :-(


This is a nice idea, if you catch it before the quota is used up.

I really think that needs to be fixed for good.
ID: 4815 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 4816 - Posted: 24 Mar 2017, 8:54:50 UTC - in response to Message 4815.  

Here we go again -- something else in WMAgent has died. Set No New Tasks to protect your daily quota. :-(


This is a nice idea, if you catch it before the quota is used up.

I really think that needs to be fixed for good.

Yes, I've raised it again with CERN this week.
ID: 4816 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4817 - Posted: 24 Mar 2017, 9:20:47 UTC - in response to Message 4816.  
Last modified: 24 Mar 2017, 9:23:36 UTC

Thanks, Ivan.

BTW, the running jobs graph is getting more and more "spiky".
ID: 4817 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 4818 - Posted: 24 Mar 2017, 14:24:37 UTC - in response to Message 4817.  

Thanks, Ivan.

BTW, the running jobs graph is getting more and more "spiky".

Yes, I'm not sure exactly what's causing it. It's probably some kind of bottleneck causing a "relaxation oscillator" effect, if you've ever studied those in electronics or elsewhere. It's actually made to look worse on the Dashboard plots because of the time binning. With a CERN tool that uses finer binning (5 minutes instead of 1 hour), it looks rather smoother. Our jobs run as user cmst1. There's a hint that the amplitude damps out over time.
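
For anyone who hasn't met the term, a relaxation oscillator can be sketched in a few lines of Python. This is a toy model only: the function name and the `fill_rate` and `threshold` parameters are made up for illustration, and nothing here claims to describe the actual bottleneck. A level builds up steadily and is dumped whenever it reaches a threshold, which gives the characteristic sawtooth.

```python
def relaxation_oscillator(steps: int, fill_rate: int, threshold: int) -> list[int]:
    """Toy relaxation oscillator: a level charges steadily and resets at a threshold.

    Think of `level` as work piling up behind a bottleneck (an assumption
    for this sketch), released in one burst once `threshold` is reached.
    """
    level = 0
    trace = []
    for _ in range(steps):
        level += fill_rate          # slow, steady build-up
        if level >= threshold:
            level = 0               # sudden release, then start again
        trace.append(level)
    return trace


trace = relaxation_oscillator(steps=10, fill_rate=1, threshold=5)
print(trace)
```

Plotted over time, the slow ramps and sudden drops in `trace` are the kind of shape that shows up as spikes in a job-count graph.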


ID: 4818 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 751
Credit: 11,609,314
RAC: 1,490
Message 4819 - Posted: 24 Mar 2017, 20:31:48 UTC - in response to Message 4818.  

Yes, the relaxation oscillator has been around for about 100 years.
(vacuum tube Abraham-Bloch multivibrator relaxation oscillator, 1920)

I remember, way back in my younger days, the old 555 timer IC.
It has been used for many things over the years.
Mad Scientist For Life
ID: 4819 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 4825 - Posted: 2 Apr 2017, 12:01:08 UTC

WMAgent has died again. Set no new tasks if you can...
ID: 4825 · Rating: 0