Message boards : CMS Application : Probable job interruption

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1152
Credit: 8,310,612
RAC: 0
Message 9001 - Posted: 18 Aug 2025, 10:41:59 UTC

WMCore want to update our WMAgent, but the current workflow won't end for over a week, so we'll have to force-stop the workflow, which means running tasks will be lost. I'll try to give at least a day's notice of this, so be prepared to set No New Tasks when I have a deadline to work with.
ivan
Message 9002 - Posted: 20 Aug 2025, 13:14:55 UTC - in response to Message 9001.  

OK, I've told WMCore that they can shut down our workflow for the upgrade anytime after 1200 tomorrow (CERN time). Please set No New Tasks ASAP.
I'll let you know when it's safe to go back in the water again.
ivan
Message 9004 - Posted: 22 Aug 2025, 14:51:24 UTC - in response to Message 9002.  

No action yet from WMCore, and many people are still running tasks, it seems. This workflow will probably finish over the weekend. I'll start injecting smaller workflows then, so as not to have too much interruption when the intervention does finally occur.
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 834
Credit: 15,340,685
RAC: 13,177
Message 9006 - Posted: 30 Aug 2025, 8:14:46 UTC - in response to Message 9004.  

It looks like CMS is taking the weekend off again, Ivan.
Mad Scientist For Life
Magic Quantum Mechanic
Message 9009 - Posted: 30 Aug 2025, 19:57:24 UTC

Back to work again at 30 Aug 2025, 14:12:25 UTC
ivan
Message 9010 - Posted: 3 Sep 2025, 13:12:39 UTC - in response to Message 9006.  

More like me... I'm submitting smaller workflows to minimise disruption when the update happens, but the jobs don't necessarily drain while I'm awake!
ivan
Message 9011 - Posted: 3 Sep 2025, 14:40:36 UTC - in response to Message 9004.  

In reply to ivan's message of 22 Aug 2025:
No action yet from WMCore, and many people are still running tasks, it seems. This workflow will probably finish over the weekend. I'll start injecting smaller workflows then, so as not to have too much interruption when the intervention does finally occur.

It looks like this will finally happen tomorrow night (CET). I'm letting the queues drain.
Magic Quantum Mechanic
Message 9012 - Posted: 4 Sep 2025, 0:04:02 UTC

Well, I take a day off from checking here and the first thing I see is 42 crashed CMS tasks, so I'm glad I only had one host running here.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3487946

The same thing happened over at production about a week ago.
Mad Scientist For Life
ivan
Message 9022 - Posted: 8 Sep 2025, 11:02:48 UTC - in response to Message 9012.  

There was a problem with the Certificate Authority (which checks to ensure certificates are valid and/or issues new certificates). The CA has been made aware of the problem -- and we have jobs running again, so someone fixed it!
For the avoidance of doubt, you can now set "allow new tasks" again, if you want.
Magic Quantum Mechanic
Message 9026 - Posted: 8 Sep 2025, 19:53:04 UTC - in response to Message 9022.  

Thanks, Ivan. I'm under the weather, so I'm just getting up, and I've started a new CMS host here.
ivan
Message 9027 - Posted: 8 Sep 2025, 20:53:33 UTC - in response to Message 9026.  

In reply to Magic Quantum Mechanic's message of 8 Sep 2025:
Thanks, Ivan. I'm under the weather, so I'm just getting up, and I've started a new CMS host here.

Aww, get well soon. Tasty little box; I'm tempted to blow some of my pension lump-sum on something more modern...
Magic Quantum Mechanic
Message 9028 - Posted: 8 Sep 2025, 23:43:56 UTC - in response to Message 9027.  

In reply to ivan's message of 8 Sep 2025:
Aww, get well soon. Tasty little box; I'm tempted to blow some of my pension lump-sum on something more modern...

Many thanks, Ivan.
Hope all is well.

I have several other hosts running CMS over at production again too.
Yes, these days we sure can get tempted by the newer processors and RAM, but I try to get the ones being replaced by the newest and most expensive versions.
Mad Scientist For Life
Magic Quantum Mechanic
Message 9146 - Posted: 25 Sep 2025, 3:55:08 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3530216

Looks like the weekend started early again.

There's no new work left, and the tasks we have are just doing the 30-minute run of doing nothing.

Hope we can get this back up and running along with the production CMS.
Mad Scientist For Life

©2025 CERN