Info Message
1) Message boards : CMS Application : Upcoming disruptions to CERN infrastructure
Message 8847
Posted 20 days ago by ivan
Intervention completed, new workflow submitted.
2) Message boards : CMS Application : Upcoming disruptions to CERN infrastructure
Message 8841
Posted 23 days ago by ivan
We're trying to schedule an intervention on the WMAgent for Monday morning, so expect job availability to drop off on Sunday night.
3) Message boards : CMS Application : Upcoming disruptions to CERN infrastructure
Message 8834
Posted 27 days ago by ivan
In reply to ivan's message of 14 Jun 2025:
Job shortage

Problem fixed!
4) Message boards : CMS Application : Upcoming disruptions to CERN infrastructure
Message 8833
Posted 29 days ago by ivan
Job shortage
5) Message boards : CMS Application : Upcoming disruptions to CERN infrastructure
Message 8798
Posted 8 May 2025 by ivan
Intervention is over and I've submitted a new batch of jobs. They should be available soon.
6) Message boards : CMS Application : Upcoming disruptions to CERN infrastructure
Message 8797
Posted 7 May 2025 by ivan
We need to upgrade the WMAgent, but we can't wait for the current workflow to finish. So, we will force-stop the workflow tomorrow (Thursday 08/05). Please set your machines to "No new tasks" to give current tasks a chance to finish before the stoppage.
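
For anyone who manages hosts from the command line, "No new tasks" can also be set with boinccmd's `nomorework` project operation. The following is only a sketch: it assumes `boinccmd` is on the PATH, that the project URL shown (inferred from the task links on this board) matches the one your client is attached to, and the helper names are hypothetical.

```python
import subprocess

# Assumption: the project URL as attached in your BOINC client; adjust as needed.
PROJECT_URL = "https://lhcathomedev.cern.ch/lhcathome-dev/"


def no_new_tasks_command(project_url, allow=False):
    """Build the boinccmd argument list for 'No new tasks' (or its reverse)."""
    operation = "allowmorework" if allow else "nomorework"
    return ["boinccmd", "--project", project_url, operation]


def set_no_new_tasks(project_url=PROJECT_URL, allow=False, dry_run=True):
    """Return the command; run it only when dry_run is False."""
    cmd = no_new_tasks_command(project_url, allow)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `set_no_new_tasks()` with the default `dry_run=True` only returns the command so it can be inspected first; pass `allow=True` after the intervention to accept work again.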
7) Message boards : CMS Application : Failures to contact CMS-Factory
Message 8496
Posted 10 Sep 2024 by ivan
[Once again, I can't post to the main project!]
[Hmm, maybe it doesn't like my emoticons!!]

Is anyone else having their tasks fail after just a few minutes due to a failure to contact the CMS-Factory? It's happening increasingly for me, but I've not seen it for a few other volunteers I've studied -- and no-one else is complaining at the moment!

2024-09-10 06:01:15 (12611): Guest Log: [INFO] Testing connection to EOSCMS
2024-09-10 06:01:15 (12611): Guest Log: [INFO] Testing connection to CMS-Factory
2024-09-10 06:01:30 (12611): Guest Log: [DEBUG] Status run 1 of up to 3: 1
2024-09-10 06:01:53 (12611): Guest Log: [DEBUG] Status run 2 of up to 3: 1
2024-09-10 06:02:23 (12611): Guest Log: [DEBUG] Status run 3 of up to 3: 1
...
2024-09-10 06:02:23 (12611): Guest Log: libnsock nsock_trace_handler_callback(): Callback: CONNECT ERROR [Network is unreachable (101)] for EID 16 [2001:1458:d00:17::13:80]
2024-09-10 06:02:23 (12611): Guest Log: Ncat: Network is unreachable.
2024-09-10 06:02:23 (12611): Guest Log: [ERROR] Could not connect to vocms0205.cern.ch on port 80

I'm having some real problems with a new security regime at my institution, so this could possibly be a manifestation of that.😒
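
The log above shows the failure on an IPv6 address, so one way to narrow this down from the host is to try every address the name resolves to and see which ones connect. This is a rough sketch, not the check the VM itself runs; the function names are hypothetical and the resolver and connector are parameters only so the logic can be exercised without a network.

```python
import socket


def _tcp_connect(sockaddr, family, timeout):
    """Open and immediately close a TCP connection to one resolved address."""
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(sockaddr)


def probe(host, port, timeout=5.0, resolve=socket.getaddrinfo, connect=_tcp_connect):
    """Return {address: 'ok' or error text} for each address the host resolves to."""
    results = {}
    for family, _type, _proto, _canon, sockaddr in resolve(
        host, port, type=socket.SOCK_STREAM
    ):
        try:
            connect(sockaddr, family, timeout)
            results[sockaddr[0]] = "ok"
        except OSError as exc:
            # e.g. "[Errno 101] Network is unreachable" when there is no IPv6 route
            results[sockaddr[0]] = str(exc)
    return results


# Example (needs network access):
#   for address, status in probe("vocms0205.cern.ch", 80).items():
#       print(address, status)
```

If the IPv4 address reports "ok" while the IPv6 one reports "Network is unreachable", that would point at local IPv6 routing or filtering rather than at the server.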
8) Message boards : CMS Application : Problems posting to CMS@Home
Message 8486
Posted 14 Jul 2024 by ivan
I've just had problems posting this message to the main CMS@Home message board. Maybe summer has truly arrived in Switzerland...

Unfortunately, I'm still having problems submitting jobs. The website I use to check progress, https://cmsweb-testbed.cern.ch/wmstats/index.html, is currently returning garbage. I submitted a new batch yesterday, which seemed to go well:
2024-07-13 15:39:06,179:INFO:reqmgr2: Assign succeeded.
2024-07-13 15:39:06,188:INFO:inject-test-wfs: TC_Backfill.json request successfully created!
2024-07-13 15:39:06,188:INFO:inject-test-wfs:
Injected 1 workflows out of 1 templates in 1.13 secs. Good job!
but looking at the condor queue, I don't see any new jobs:
 condor_q -name vocms267.cern.ch -pool vocms0840.cern.ch cmst1
-- Schedd: vocms267.cern.ch : <188.185.64.105:4080?... @ 07/14/24 01:56:04
OWNER BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
cmst1 ID: 81       7/12 13:56    199      1      _    200 81.172
cmst1 ID: 83       7/12 13:56    199      1      _    200 83.88
cmst1 ID: 84       7/12 13:56    195      5      _    200 84.55-138
cmst1 ID: 85       7/12 14:03    196      4      _    200 85.83-149
cmst1 ID: 86       7/12 14:03    192      8      _    200 86.3-166
cmst1 ID: 87       7/12 14:03    179     21      _    200 87.8-195
cmst1 ID: 88       7/12 14:03    135     65      _    200 88.4-197
cmst1 ID: 89       7/12 14:03     57    140      3    200 89.3-199
cmst1 ID: 90       7/12 14:12      _      _     25     25 90.0-24

Total for query: 273 jobs; 0 completed, 0 removed, 28 idle, 245 running, 0 held, 0 suspended
Total for all users: 273 jobs; 0 completed, 0 removed, 28 idle, 245 running, 0 held, 0 suspended
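
As a cross-check on the listing above, the per-batch columns can be summed and compared with the "Total for query" line. A small sketch, assuming the batch-mode column layout exactly as pasted here (the function name is hypothetical, and "_" is treated as zero):

```python
SAMPLE = """\
OWNER BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
cmst1 ID: 81       7/12 13:56    199      1      _    200 81.172
cmst1 ID: 83       7/12 13:56    199      1      _    200 83.88
cmst1 ID: 84       7/12 13:56    195      5      _    200 84.55-138
cmst1 ID: 85       7/12 14:03    196      4      _    200 85.83-149
cmst1 ID: 86       7/12 14:03    192      8      _    200 86.3-166
cmst1 ID: 87       7/12 14:03    179     21      _    200 87.8-195
cmst1 ID: 88       7/12 14:03    135     65      _    200 88.4-197
cmst1 ID: 89       7/12 14:03     57    140      3    200 89.3-199
cmst1 ID: 90       7/12 14:12      _      _     25     25 90.0-24
"""


def summarise_condor_q(text):
    """Sum the DONE, RUN and IDLE columns over all batch rows."""
    totals = {"done": 0, "run": 0, "idle": 0}
    for line in text.splitlines():
        fields = line.split()
        # Batch rows look like: cmst1 ID: 81 7/12 13:56 199 1 _ 200 81.172
        if len(fields) < 9 or fields[1] != "ID:":
            continue  # header, blank and summary lines
        for key, value in zip(("done", "run", "idle"), fields[5:8]):
            totals[key] += 0 if value == "_" else int(value)
    return totals


print(summarise_condor_q(SAMPLE))  # {'done': 1352, 'run': 245, 'idle': 28}
```

The 245 running and 28 idle agree with the summary line, and the only batch submitted after 7/12 14:03 is ID 90 with 25 idle jobs, dated 7/12, so nothing from the 13 July injection shows up.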

I put a query in to the WM Ops Mattermost channel at CERN, but I don't expect a response over the weekend. 😒
9) Message boards : CMS Application : Upcoming disruptions to CERN infrastructure
Message 8479
Posted 18 Jun 2024 by ivan
For some reason I couldn't post this to the CMS@Home message board:

There seems to be a problem at CERN. Several WMAgents, including ours, are showing error status and I don't think we are generating jobs. A polite e-mail has been sent.

Polite response:
The CMSWEB team have been upgrading cmsweb-testbed frontends to a new technology and the redirect rules are still being polished (i.e. it looks like WM is still not fully functional). This transition started last Thursday.

Sorry about that.

Further communication from WMCore: it looks like we might have disruptions for a little while yet (vocms0267 is our current WMAgent node). 😢
We have recently migrated CERN nodes to Alma9, as a result of the WMAgent containerization effort.
With that said, we would like to upgrade vocms0267 as well. We already have another node in Alma9, named vocms267, but I believe we will have to change its condor setup.
Once we have the new node (vocms267) properly configured, we would like to upgrade the agent as well and start adopting the docker container solution for CMS@Home project as well.
There is work in progress with the cmsweb(-testbed) frontends, and depending on how that investigation goes, it might be that we will have to make a speedy transition, as apparently CouchDB replication does not work with the new frontends replacing Apache.
10) Message boards : CMS Application : Problem with upgrade of BOINC server
Message 8478
Posted 9 Jun 2024 by ivan
Calm down.


+1

+1
11) Message boards : CMS Application : Problem with upgrade of BOINC server
Message 8471
Posted 6 Jun 2024 by ivan
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3335869

I'll have to think about finding your job logs on the data-bridge and looking for the cause of your bad luck. 😕

I found one log from you, for computer 4786 from Wed Jun 5 12:58:16 2024.local, so 20:58 UTC? Currently can't match that to any task in https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4786, but it's gone midnight now and I need some sleep...
12) Message boards : CMS Application : Problem with upgrade of BOINC server
Message 8470
Posted 6 Jun 2024 by ivan
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3335869

I'll have to think about finding your job logs on the data-bridge and looking for the cause of your bad luck. 😕
13) Message boards : CMS Application : Problem with upgrade of BOINC server
Message 8465
Posted 5 Jun 2024 by ivan
OK, an update. You'll have noticed tasks are flowing again -- Laurence fixed the OS upgrade problem.
We also seem to have finally cracked the new storage configuration files: all production and post-production instances are returning successful completions. For the time being, I am generating workflows for quad-core VMs while we verify that this is truly so. Therefore, in your computing preferences, you should set Max # CPUs = 4 (and Max # jobs <= actual CPUs/4). This applies to both CMS@Home and CMS@Home-dev. Apologies to those of you with hosts having fewer than 4 cores...
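
To make the Max # jobs rule of thumb concrete, here is the arithmetic as a tiny sketch (the function name is hypothetical; it simply applies "actual CPUs / 4", rounded down):

```python
def max_jobs(total_cores, cores_per_task=4):
    """Largest number of quad-core tasks that fit without oversubscribing the host."""
    if cores_per_task <= 0:
        raise ValueError("cores_per_task must be positive")
    return total_cores // cores_per_task


for cores in (2, 4, 6, 16):
    print(cores, "cores ->", max_jobs(cores), "job(s)")
# 2 cores -> 0 job(s)
# 4 cores -> 1 job(s)
# 6 cores -> 1 job(s)
# 16 cores -> 4 job(s)
```

A result of 0 is the case apologised for above: a host with fewer than 4 cores cannot fit a quad-core task.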
There's a discussion to be had soon as to how we proceed with multicore jobs vs. single core. There are difficulties in mixing the two, perhaps even some we haven't considered yet. I'm trying to gather my thoughts to produce an initial discussion paper.
14) Message boards : CMS Application : Problem with upgrade of BOINC server
Message 8464
Posted 5 Jun 2024 by ivan
Again
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3335623

You seem to be unlucky 😢.
I'm getting reasonable amounts of credit.
15) Message boards : CMS Application : Problem with upgrade of BOINC server
Message 8463
Posted 5 Jun 2024 by ivan
Apparently there is a problem with the BOINC server after an OS upgrade to RHEL9. The server status display shows zero CMS tasks available even though there are jobs pending. This is affecting creation of new tasks, even though we do have some jobs being run.
We are working on a fix.

@Ivan: The jobs you created yesterday afternoon are coming through now.
It seems all jobs are exactly the same. Is this on purpose for testing or is that a failure?

In which way the same? They are Monte Carlo simulations, so while the config file might be the same, they should give different results due to the pseudorandom-number generators.
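
As a toy illustration of that point (this is not CMSSW, just a stand-in Monte Carlo with hypothetical names), the same "config" run with different seeds gives different results, while the same seed reproduces its result exactly:

```python
import random

CONFIG = {"n_events": 10_000}  # identical for every job


def run_job(config, seed):
    """Toy Monte Carlo: estimate pi from random points in the unit square."""
    rng = random.Random(seed)  # each job gets its own pseudorandom stream
    hits = 0
    for _ in range(config["n_events"]):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / config["n_events"]


for seed in (1, 2, 3):
    print(seed, run_job(CONFIG, seed))
```

Each estimate comes out close to pi but not identical to the others, which is the sense in which jobs sharing one config file are still distinct pieces of work.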
16) Message boards : CMS Application : Problem with upgrade of BOINC server
Message 8454
Posted 31 May 2024 by ivan
Apparently there is a problem with the BOINC server after an OS upgrade to RHEL9. The server status display shows zero CMS tasks available even though there are jobs pending. This is affecting creation of new tasks, even though we do have some jobs being run.
We are working on a fix.
17) Message boards : CMS Application : CMS multi-core
Message 8375
Posted 28 Mar 2024 by ivan
How hard would it be to make a native version of CMS?

Not that easy. To my knowledge we have running versions of CMSSW on x64, NVidia (CUDA), Xeon Phi (Knights Landing), and Arm. The original idea to run in VirtualBox was because the x64 architecture existed in Windows, Linux, and MacOS, so we could deploy to those environments with minimal additional effort. Given that the effort for years has been mostly Laurence (on the BOINC side), me from CMS, and lately Federica from CMS, I don't think we can support a large number of environments. You also have to realise that there are a heck of a lot of other things running behind the scenes, CVMFS and containerisation to name but two, so that folding them into a VM rather than expecting Volunteers to maintain them as well makes participation rather easier for Joe Average.
18) Message boards : CMS Application : CMS multi-core
Message 8367
Posted 27 Mar 2024 by ivan
I requested a dual core task. That task did 3 jobs within 4.5 hours, but now I don't get a new sub-job, so VM almost idling.
I'm not so sure any longer that the VM is not processing events.
No process cmsRun with up to 200% cpu or any other process with high CPU usage is shown in Console ALT-F3 (top),
but the total CPU used by the VM since the beginning is ~184% (incl. init phase) and there is also data transferred.
At the start of the first three seen jobs at 27-mar-2024 05:38:10.10, 27-mar-2024 07:41:24.24 and 27-mar-2024 10:01:54.54
The last two jobs where I did not see a cmsRun data downloaded at 27-mar-2024 12:27:03.03 and 27-mar-2024 15:03:15.15

Update
19) Message boards : News : Multi-core jobs available for CMS@Home-dev
Message 8346
Posted 25 Mar 2024 by ivan
We are currently testing multi-core jobs for CMS@Home. Note that these will only run in -dev as the main project does not currently allow you to select multi-core VMs. We currently have 2-core and 4-core tasks in the queue, so please try selecting 4-core in your machine preferences, and let us know how it works.
20) Message boards : CMS Application : New Version 60.70
Message 7896
Posted 25 Nov 2022 by ivan
We've reverted the change that garbled our glidein script -- I'm running main and -dev jobs successfully now.


©2025 CERN