Info | Message |
---|---|
1) Message boards : CMS Application : Upcoming disruptions to CERN infrastructure
Message 8847 Posted 20 days ago by ![]() |
Intervention completed, new workflow submitted. |
2) Message boards : CMS Application : Upcoming disruptions to CERN infrastructure
Message 8841 Posted 23 days ago by ![]() |
We're trying to schedule an intervention on the WMAgent for Monday morning, so expect job availability to drop off on Sunday night. |
3) Message boards : CMS Application : Upcoming disruptions to CERN infrastructure
Message 8834 Posted 27 days ago by ![]() |
In reply to ivan's message of 14 Jun 2025: "Job shortage". Problem fixed! |
4) Message boards : CMS Application : Upcoming disruptions to CERN infrastructure
Message 8833 Posted 29 days ago by ![]() |
Job shortage |
5) Message boards : CMS Application : Upcoming disruptions to CERN infrastructure
Message 8798 Posted 8 May 2025 by ![]() |
Intervention is over and I've submitted a new batch of jobs. They should be available soon. |
6) Message boards : CMS Application : Upcoming disruptions to CERN infrastructure
Message 8797 Posted 7 May 2025 by ![]() |
We need to upgrade the WMAgent, but we can't wait for the current workflow to finish. So, we will force-stop the workflow tomorrow (Thursday 08/05). Please set your machines to "No new tasks" to give current tasks a chance to finish before the stoppage. |
7) Message boards : CMS Application : Failures to contact CMS-Factory
Message 8496 Posted 10 Sep 2024 by ![]() |
[Once again, I can't post to the main project!] [Hmm, maybe it doesn't like my emoticons!!] Is anyone else having their tasks fail after just a few minutes due to a failure to contact the CMS-Factory? It's happening increasingly for me, but I've not seen it for a few other volunteers I've studied -- and no-one else is complaining at the moment!
2024-09-10 06:01:15 (12611): Guest Log: [INFO] Testing connection to EOSCMS
2024-09-10 06:01:15 (12611): Guest Log: [INFO] Testing connection to CMS-Factory
2024-09-10 06:01:30 (12611): Guest Log: [DEBUG] Status run 1 of up to 3: 1
2024-09-10 06:01:53 (12611): Guest Log: [DEBUG] Status run 2 of up to 3: 1
2024-09-10 06:02:23 (12611): Guest Log: [DEBUG] Status run 3 of up to 3: 1
...
2024-09-10 06:02:23 (12611): Guest Log: libnsock nsock_trace_handler_callback(): Callback: CONNECT ERROR [Network is unreachable (101)] for EID 16 [2001:1458:d00:17::13:80]
2024-09-10 06:02:23 (12611): Guest Log: Ncat: Network is unreachable.
2024-09-10 06:02:23 (12611): Guest Log: [ERROR] Could not connect to vocms0205.cern.ch on port 80
I'm having some real problems with a new security regime at my institution, so this could possibly be a manifestation of that.😒 |
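The failing address in the log above is an IPv6 one, so it may be worth checking IPv4 and IPv6 reachability separately from the host. The following is only a rough sketch of such a check, not what the VM's own bootstrap script does; the function name `can_connect` and its parameters are made up for illustration.

```python
import socket


def can_connect(host, port, timeout=5.0, family=socket.AF_UNSPEC):
    """Try every address that host:port resolves to (hypothetical helper).

    Returns (ok, errors): ok is True as soon as one TCP connection
    succeeds; errors lists the failures seen before that point.
    """
    errors = []
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, [f"DNS lookup failed: {exc}"]
    for fam, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
            return True, errors
        except OSError as exc:
            # e.g. "Network is unreachable" when there is no IPv6 route
            errors.append(f"{addr[0]}: {exc}")
    return False, errors
```

Calling `can_connect("vocms0205.cern.ch", 80, family=socket.AF_INET)` and then the same with `socket.AF_INET6` should show whether only one address family is blocked, which would point towards a local network or firewall policy rather than a problem at CERN.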
8) Message boards : CMS Application : Problems posting to CMS@Home
Message 8486 Posted 14 Jul 2024 by ![]() |
I've just had problems posting this message to the main CMS@Home message board. Maybe summer has truly arrived in Switzerland... Unfortunately, I'm still having problems submitting jobs. The website I use to check progress, https://cmsweb-testbed.cern.ch/wmstats/index.html, is currently returning garbage. I submitted a new batch yesterday, which seemed to go well:
2024-07-13 15:39:06,179:INFO:reqmgr2: Assign succeeded.
2024-07-13 15:39:06,188:INFO:inject-test-wfs: TC_Backfill.json request successfully created!
2024-07-13 15:39:06,188:INFO:inject-test-wfs: Injected 1 workflows out of 1 templates in 1.13 secs. Good job!
but looking at the condor queue, I don't see any new jobs:
condor_q -name vocms267.cern.ch -pool vocms0840.cern.ch cmst1
-- Schedd: vocms267.cern.ch : <188.185.64.105:4080?... @ 07/14/24 01:56:04
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
cmst1 ID: 81 7/12 13:56 199 1 _ 200 81.172
cmst1 ID: 83 7/12 13:56 199 1 _ 200 83.88
cmst1 ID: 84 7/12 13:56 195 5 _ 200 84.55-138
cmst1 ID: 85 7/12 14:03 196 4 _ 200 85.83-149
cmst1 ID: 86 7/12 14:03 192 8 _ 200 86.3-166
cmst1 ID: 87 7/12 14:03 179 21 _ 200 87.8-195
cmst1 ID: 88 7/12 14:03 135 65 _ 200 88.4-197
cmst1 ID: 89 7/12 14:03 57 140 3 200 89.3-199
cmst1 ID: 90 7/12 14:12 _ _ 25 25 90.0-24
Total for query: 273 jobs; 0 completed, 0 removed, 28 idle, 245 running, 0 held, 0 suspended
Total for all users: 273 jobs; 0 completed, 0 removed, 28 idle, 245 running, 0 held, 0 suspended
I put a query in to the WM Ops Mattermost channel at CERN, but I don't expect a response over the weekend. 😒 |
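For anyone wanting to track the queue over time rather than eyeball it, the "Total for query" line in the condor_q output above is easy to pick apart. This is just a sketch that assumes the summary line keeps exactly the layout shown in this post; `parse_condor_summary` is a hypothetical helper, not part of HTCondor.

```python
import re


def parse_condor_summary(line):
    """Split a condor_q 'Total for ...' summary line into (scope, counts).

    Assumes the layout seen in the post, e.g.
    'Total for query: 273 jobs; 0 completed, 0 removed, 28 idle, ...'
    """
    match = re.match(
        r"Total for (?P<scope>[^:]+): (?P<jobs>\d+) jobs; (?P<rest>.*)",
        line.strip(),
    )
    if match is None:
        raise ValueError(f"not a condor_q summary line: {line!r}")
    counts = {"jobs": int(match["jobs"])}
    for part in match["rest"].split(","):
        number, state = part.split()  # e.g. '28', 'idle'
        counts[state] = int(number)
    return match["scope"], counts
```

Fed the summary line from the output above, this gives 28 idle and 245 running out of 273 jobs, i.e. only the jobs from the older batches, with nothing from the batch injected on the 13th.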
9) Message boards : CMS Application : Upcoming disruptions to CERN infrastructure
Message 8479 Posted 18 Jun 2024 by ![]() |
For some reason I couldn't post this to the CMS@Home message board: There seems to be a problem at CERN. Several WMAgents, including ours, are showing error status and I don't think we are generating jobs. A polite e-mail has been sent. Further communication from WMCore: it looks like we might have disruptions for a little while yet (vocms0267 is our current WMAgent node). 😢 We have recently migrated CERN nodes to Alma9, as a result of the WMAgent containerization effort. With that said, we would like to upgrade vocms0267 as well. We already have another node in Alma9, named vocms267, but I believe we will have to change its condor setup. Once we have the new node (vocms267) properly configured, we would like to upgrade the agent as well and start adopting the docker container solution for CMS@Home project as well. There is work in progress with the cmsweb(-testbed) frontends, and depending on how that investigation goes, it might be that we will have to make a speedy transition, as apparently CouchDB replication does not work with the new frontends replacing Apache. |
10) Message boards : CMS Application : Problem with upgrade of BOINC server
Message 8478 Posted 9 Jun 2024 by ![]() |
Calm down. +1 |
11) Message boards : CMS Application : Problem with upgrade of BOINC server
Message 8471 Posted 6 Jun 2024 by ![]() |
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3335869 I found one log from you, for computer 4786 from Wed Jun 5 12:58:16 2024.local, so 20:58 UTC? Currently can't match that to any task in https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4786, but it's gone midnight now and I need some sleep... |
12) Message boards : CMS Application : Problem with upgrade of BOINC server
Message 8470 Posted 6 Jun 2024 by ![]() |
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3335869 I'll have to think about finding your job logs on the data-bridge and looking for the cause of your bad luck. 😕 |
13) Message boards : CMS Application : Problem with upgrade of BOINC server
Message 8465 Posted 5 Jun 2024 by ![]() |
OK, an update. You'll have noticed tasks are flowing again -- Laurence fixed the OS upgrade problem. We seem also to have finally cracked the new storage configuration files; all production and post-production instances are returning successful completions. For the time being, I am generating workflows for quad-core VMs while we verify that this is truly so. Therefore, in your computing preferences, you should set Max # CPUs = 4 (and Max # jobs <= actual CPUs/4). This applies to both CMS@Home and CMS@Home-dev. Apologies to those of you with hosts having fewer than 4 cores... There's a discussion to be had soon as to how we proceed with multicore jobs vs. single core. There are difficulties in mixing the two, perhaps even some we haven't considered yet. I'm trying to gather my thoughts to produce an initial discussion paper. |
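As a quick illustration of the "Max # jobs <= actual CPUs/4" rule above, here is a minimal sketch. The function name is made up, and it simply takes the largest whole number of quad-core tasks that fits on the host.

```python
CORES_PER_TASK = 4  # quad-core VMs, as described in the post


def max_jobs(total_cpus, cores_per_task=CORES_PER_TASK):
    """Largest 'Max # jobs' value satisfying jobs <= total_cpus / cores_per_task."""
    if total_cpus < 0 or cores_per_task < 1:
        raise ValueError("need total_cpus >= 0 and cores_per_task >= 1")
    return total_cpus // cores_per_task
```

So a 16-core host could run up to 4 tasks and a 6-core host only 1, while a host with fewer than 4 cores comes out at 0, which is why the apology is needed.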
14) Message boards : CMS Application : Problem with upgrade of BOINC server
Message 8464 Posted 5 Jun 2024 by ![]() |
Again, you seem to be unlucky 😢. I'm getting reasonable amounts of credit. |
15) Message boards : CMS Application : Problem with upgrade of BOINC server
Message 8463 Posted 5 Jun 2024 by ![]() |
Apparently there is a problem with the BOINC server after an OS upgrade to RHEL9. The server status display shows zero CMS tasks available even though there are jobs pending. This is affecting creation of new tasks, even though we do have some jobs being run. In which way the same? They are Monte Carlo simulations, so while the config file might be the same, they should give different results due to the pseudorandom-number generators. |
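To illustrate the point about pseudorandom-number generators: the toy sketch below uses Python's standard `random` module, not the generators CMSSW actually uses, and `toy_events` is a made-up stand-in for a simulation job. The "config" (the function and its arguments) is identical for every job; only the seed differs.

```python
import random


def toy_events(seed, n_events=5):
    """Generate a few toy 'events' from a seeded pseudorandom generator."""
    rng = random.Random(seed)  # each job gets its own seed
    return [rng.random() for _ in range(n_events)]


job_a = toy_events(seed=1)
job_b = toy_events(seed=2)
job_a_again = toy_events(seed=1)
```

Here `job_a` and `job_b` contain different numbers even though the code is the same, while `job_a_again` reproduces `job_a` exactly, since the same seed gives the same sequence.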
16) Message boards : CMS Application : Problem with upgrade of BOINC server
Message 8454 Posted 31 May 2024 by ![]() |
Apparently there is a problem with the BOINC server after an OS upgrade to RHEL9. The server status display shows zero CMS tasks available even though there are jobs pending. This is affecting creation of new tasks, even though we do have some jobs being run. We are working on a fix. |
17) Message boards : CMS Application : CMS multi-core
Message 8375 Posted 28 Mar 2024 by ![]() |
How hard would it be to make native version of CMS? Not that easy. To my knowledge we have running versions of CMSSW on x64, NVidia (CUDA), Xeon Phi (Knights Landing), and Arm. The original idea to run in VirtualBox was because the x64 architecture existed in Windows, Linux, and MacOS, so we could deploy to those environments with minimal additional effort. Given that the effort for years has been mostly Laurence (on the BOINC side), me from CMS, and lately Federica from CMS, I don't think we can support a large number of environments. You also have to realise that there are a heck of a lot of other things running behind the scenes, CVMFS and containerisation to name but two, so that folding them into a VM rather than expecting Volunteers to maintain them as well makes participation rather easier for Joe Average. |
18) Message boards : CMS Application : CMS multi-core
Message 8367 Posted 27 Mar 2024 by ![]() |
I requested a dual core task. That task did 3 jobs within 4.5 hours, but now I don't get a new sub-job, so VM almost idling. I'm not so sure any longer that the VM is not processing events. Update |
19) Message boards : News : Multi-core jobs available for CMS@Home-dev
Message 8346 Posted 25 Mar 2024 by ![]() |
We are currently testing multi-core jobs for CMS@Home. Note that these will only run in -dev as the main project does not currently allow you to select multi-core VMs. We currently have 2-core and 4-core tasks in the queue, so please try selecting 4-core in your machine preferences, and let us know how it works. |
20) Message boards : CMS Application : New Version 60.70
Message 7896 Posted 25 Nov 2022 by ![]() |
We've reverted the change that garbled our glidein script -- I'm running main and -dev jobs successfully now. |
©2025 CERN