1) Message boards : CMS Application : CMS multi-core (Message 8375)
Posted 28 Mar 2024 by Profile ivan
Post:
How hard would it be to make a native version of CMS?

Not that easy. To my knowledge we have running versions of CMSSW on x64, NVIDIA (CUDA), Xeon Phi (Knights Landing), and Arm. The original idea to run in VirtualBox was because the x64 architecture existed in Windows, Linux, and macOS, so we could deploy to those environments with minimal additional effort. Given that the effort for years has been mostly Laurence (on the BOINC side), me from CMS, and lately Federica from CMS, I don't think we can support a large number of environments. You also have to realise that there are a heck of a lot of other things running behind the scenes, CVMFS and containerisation to name but two, so folding them into a VM rather than expecting Volunteers to maintain them as well makes participation rather easier for Joe Average.
2) Message boards : CMS Application : CMS multi-core (Message 8367)
Posted 27 Mar 2024 by Profile ivan
Post:
I requested a dual core task. That task did 3 jobs within 4.5 hours, but now I don't get a new sub-job, so VM almost idling.
I'm not so sure any longer that the VM is not processing events.
No process cmsRun with up to 200% cpu or any other process with high CPU usage is shown in Console ALT-F3 (top),
but the total CPU used by the VM since the beginning is ~184% (incl. init phase) and there is also data transferred.
At the start of the first three seen jobs at 27-mar-2024 05:38:10.10, 27-mar-2024 07:41:24.24 and 27-mar-2024 10:01:54.54
The last two jobs where I did not see a cmsRun data downloaded at 27-mar-2024 12:27:03.03 and 27-mar-2024 15:03:15.15

Update
3) Message boards : News : Multi-core jobs available for CMS@Home-dev (Message 8346)
Posted 25 Mar 2024 by Profile ivan
Post:
We are currently testing multi-core jobs for CMS@Home. Note that these will only run in -dev as the main project does not currently allow you to select multi-core VMs. We currently have 2-core and 4-core tasks in the queue, so please try selecting 4-core in your machine preferences, and let us know how it works.
4) Message boards : CMS Application : New Version 60.70 (Message 7896)
Posted 25 Nov 2022 by Profile ivan
Post:
We've reverted the change that garbled our glidein script -- I'm running main and -dev jobs successfully now.
5) Message boards : News : Server Release 1.4.0 (Message 7875)
Posted 10 Nov 2022 by Profile ivan
Post:
Strange, when I enabled -dev on my Windows 10 box it downloaded the vboxwrapper but not the .vdi. I had to do a project reset.
Rocky Linux machine has been running -dev nominally since 04/11.

10/11/2022 14:14:26 | lhcathome-dev | work fetch resumed by user
10/11/2022 14:14:26 | lhcathome-dev | Sending scheduler request: To fetch work.
10/11/2022 14:14:26 | lhcathome-dev | Requesting new tasks for CPU
10/11/2022 14:14:27 | lhcathome-dev | Scheduler request completed: got 2 new tasks
10/11/2022 14:14:27 | lhcathome-dev | Project requested delay of 61 seconds
10/11/2022 14:14:29 | lhcathome-dev | update requested by user
10/11/2022 14:14:29 | lhcathome-dev | Started download of vboxwrapper_26206_windows_x86_64.exe
10/11/2022 14:14:32 | lhcathome-dev | Finished download of vboxwrapper_26206_windows_x86_64.exe
10/11/2022 14:14:33 | lhcathome-dev | Starting task CMS_2966346_1667912538.876527_0
10/11/2022 14:14:33 | lhcathome-dev | Starting task CMS_2970324_1667914940.149287_0

2022-11-10 14:14:40 (7768): Adding virtual disk drive to VM. (CMS_2022_09_07.vdi)
2022-11-10 14:14:47 (7768): Error in check if parent hdd is registered.
Command:
VBoxManage -q showhdinfo "C:\ProgramData\BOINC/projects/lhcathomedev.cern.ch_lhcathome-dev/CMS_2022_09_07.vdi" 
Output:
VBoxManage.exe: error: Could not find file for the medium 'C:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\CMS_2022_09_07.vdi' (VERR_FILE_NOT_FOUND)
VBoxManage.exe: error: Details: code VBOX_E_FILE_ERROR (0x80bb0004), component MediumWrap, interface IMedium, callee IUnknown
VBoxManage.exe: error: Context: "OpenMedium(Bstr(pszFilenameOrUuid).raw(), enmDevType, enmAccessMode, fForceNewUuidOnOpen, pMedium.asOutParam())" at line 205 of file VBoxManageDisk.cpp
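The logs above show the wrapper being downloaded but no download of the disk image before VBoxManage reports VERR_FILE_NOT_FOUND. As a rough cross-check, the two logs can be compared automatically. This is only a sketch: the function names are hypothetical, and the regular expressions assume the log lines look exactly like the ones quoted in this post.

```python
import re

def downloaded_files(event_log: str) -> set[str]:
    """File names from 'Finished download of <file>' event-log lines."""
    return set(re.findall(r"Finished download of (\S+)", event_log))

def attached_images(wrapper_log: str) -> set[str]:
    """Image names from 'Adding virtual disk drive to VM. (<file>)' lines."""
    return set(re.findall(r"Adding virtual disk drive to VM\. \((\S+?)\)", wrapper_log))

def not_downloaded(event_log: str, wrapper_log: str) -> set[str]:
    """Images the wrapper tried to attach that this log excerpt never fetched."""
    return attached_images(wrapper_log) - downloaded_files(event_log)

# Lines copied from the logs quoted above.
EVENT_LOG = """\
10/11/2022 14:14:29 | lhcathome-dev | Started download of vboxwrapper_26206_windows_x86_64.exe
10/11/2022 14:14:32 | lhcathome-dev | Finished download of vboxwrapper_26206_windows_x86_64.exe
"""
WRAPPER_LOG = (
    "2022-11-10 14:14:40 (7768): Adding virtual disk drive to VM. (CMS_2022_09_07.vdi)\n"
)

print(not_downloaded(EVENT_LOG, WRAPPER_LOG))
```

An image reported here may simply have been fetched in an earlier session, so checking that the file actually exists in the project directory remains the more reliable test.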
6) Message boards : CMS Application : New Version 60.66 (Message 7814)
Posted 17 Sep 2022 by Profile ivan
Post:
Thought it was an interrupt at 13 UTC in the CMS-Servers (WM-Agent upgrade).
No, there were enough jobs.

Since 05.30 UTC this morning we ran out of CMS-jobs.

From the error messages, it seems a proxy certificate expired! Looking at graphs for "All" production sites, it may have affected them as well, but most have recovered. However, our Agent (and at least one other which is also affected) is on a different system (cmsweb-testbed) from the others (cmsweb) -- perhaps they fixed the others but forgot about us?
7) Message boards : CMS Application : New Version 60.66 (Message 7801)
Posted 15 Sep 2022 by Profile ivan
Post:
Jobs now available again. I believe the little glitch here in -dev has been repaired but I won't be able to confirm it myself until mid-morning tomorrow.

Not looking good so far:
cmsRun  -j FrameworkJobReport.xml PSet.py
warn  [frontier.c:1014]: Request 507 on chan 1 failed at Thu Sep 15 21:48:52 2022: -6 [fn-socket.c:239]: read from 172.64.206.32 timed out after 10 seconds
warn  [frontier.c:1136]: Trying next server cms-frontier.openhtc.io[172.64.207.32]
warn  [frontier.c:1014]: Request 701 on chan 1 failed at Thu Sep 15 21:49:07 2022: -6 [fn-socket.c:239]: read from 172.64.207.32 timed out after 10 seconds
warn  [frontier.c:1136]: Trying next server cms-frontier.openhtc.io[2606:4700:e6::ac40:cf20]
warn  [frontier.c:1014]: Request 702 on chan 1 failed at Thu Sep 15 21:49:07 2022: -9 [fn-socket.c:85]: network error on connect to 2606:4700:e6::ac40:cf20: Network is unreachable
warn  [frontier.c:1136]: Trying next server cms-frontier.openhtc.io[2606:4700:e6::ac40:ce20]
warn  [frontier.c:1014]: Request 703 on chan 1 failed at Thu Sep 15 21:49:07 2022: -9 [fn-socket.c:85]: network error on connect to 2606:4700:e6::ac40:ce20: Network is unreachable
warn  [frontier.c:1136]: Trying next server cms1-frontier.openhtc.io
warn  [frontier.c:1014]: Request 704 on chan 1 failed at Thu Sep 15 21:49:27 2022: -6 [fn-urlparse.c:178]: host name cms1-frontier.openhtc.io problem: Name or service not known
warn  [frontier.c:1136]: Trying next server cms2-frontier.openhtc.io
Hmm, that's not the glitch I was looking at. That's more reminiscent of IPv6 problems I've seen in the past. IPv4 fails (172.64.207.32) and then IPv6 is not available (2606:4700:e6::ac40:ce20).
... but finally:
Begin processing the 1st record. Run 1, Event 1920001, LumiSection 3841 on stream 0 at 15-Sep-2022 21:51:36.256 CEST
That certainly looks better. Transient network problems earlier?
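To see at a glance whether a run of Frontier warnings like the ones above is an IPv4 timeout, an IPv6 reachability problem, or a DNS failure, the lines can be tallied by failure type. A minimal sketch, assuming the warning text has the same shape as in the log quoted in this post; the function names and category labels are my own, not part of the Frontier client.

```python
import ipaddress
import re
from collections import Counter

def family(address: str) -> str:
    """Return 'IPv4' or 'IPv6' for a literal address, else 'unknown'."""
    try:
        return f"IPv{ipaddress.ip_address(address).version}"
    except ValueError:
        return "unknown"

def classify(line: str):
    """Map one frontier warning line to a failure category, or None."""
    m = re.search(r"read from (\S+) timed out", line)
    if m:
        return f"{family(m.group(1))} timeout"
    m = re.search(r"connect to (\S+): Network is unreachable", line)
    if m:
        return f"{family(m.group(1))} unreachable"
    if re.search(r"host name \S+ problem", line):
        return "DNS failure"
    return None  # e.g. the 'Trying next server' lines

def summarize(log: str) -> Counter:
    """Count failure categories over all lines of a log excerpt."""
    return Counter(kind for kind in map(classify, log.splitlines()) if kind)

# Failure lines copied from the log quoted above.
LOG = """\
warn  [frontier.c:1014]: Request 507 on chan 1 failed at Thu Sep 15 21:48:52 2022: -6 [fn-socket.c:239]: read from 172.64.206.32 timed out after 10 seconds
warn  [frontier.c:1014]: Request 701 on chan 1 failed at Thu Sep 15 21:49:07 2022: -6 [fn-socket.c:239]: read from 172.64.207.32 timed out after 10 seconds
warn  [frontier.c:1014]: Request 702 on chan 1 failed at Thu Sep 15 21:49:07 2022: -9 [fn-socket.c:85]: network error on connect to 2606:4700:e6::ac40:cf20: Network is unreachable
warn  [frontier.c:1014]: Request 703 on chan 1 failed at Thu Sep 15 21:49:07 2022: -9 [fn-socket.c:85]: network error on connect to 2606:4700:e6::ac40:ce20: Network is unreachable
warn  [frontier.c:1014]: Request 704 on chan 1 failed at Thu Sep 15 21:49:27 2022: -6 [fn-urlparse.c:178]: host name cms1-frontier.openhtc.io problem: Name or service not known
"""

for kind, count in summarize(LOG).items():
    print(f"{kind}: {count}")
```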
8) Message boards : CMS Application : New Version 60.66 (Message 7799)
Posted 15 Sep 2022 by Profile ivan
Post:
There has been a delay, I'm afraid. Still, tomorrow is for clean-up!

Jobs now available again. I believe the little glitch here in -dev has been repaired but I won't be able to confirm it myself until mid-morning tomorrow.
9) Message boards : CMS Application : New Version 60.66 (Message 7798)
Posted 15 Sep 2022 by Profile ivan
Post:
There has been a delay, I'm afraid. Still, tomorrow is for clean-up!
10) Message boards : CMS Application : New Version 60.66 (Message 7789)
Posted 14 Sep 2022 by Profile ivan
Post:
I have a problem on my Win10 box -- a task ran all night without actually running a job. Started a new VM just now and it seemed to try to run the glidein script in the wrong directory.

That glitch has been patched. I need to wait until tomorrow when I can submit new jobs, and check running BOINC on my work PC.
11) Message boards : CMS Application : New Version 60.66 (Message 7788)
Posted 14 Sep 2022 by Profile ivan
Post:
I have a problem on my Win10 box -- a task ran all night without actually running a job. Started a new VM just now and it seemed to try to run the glidein script in the wrong directory.

Note that we will have no jobs for a while tonight while the WMAgent is upgraded, but keep an eye on your VMs' top consoles to see if they are actually running cmsRun jobs.
12) Message boards : CMS Application : New Version 60.63 (Message 7688)
Posted 30 Jul 2022 by Profile ivan
Post:
I'm running a new CMS-task doing its first CMS-job inside the VM.
After 25% into this first job, I noticed a differencing image in the snapshot folder with the size of 4444913664 bytes, 4.13 GB.

Is this to be expected?

I can't comment, myself. Laurence is on holiday next week so he may not be able to reply.
13) Message boards : CMS Application : New Version 60.63 (Message 7681)
Posted 29 Jul 2022 by Profile ivan
Post:
The only current glitch is that the "Show VM Console" button doesn't appear (nor does the "Show graphics" option show the current logs) so I cannot check that it is actually running cmsRun jobs.

The reason for this is reported here, but you already found the suggested workaround.
2022-07-28 23:52:52 (212243): Required extension pack not installed, remote desktop not enabled.

Thanks, hadn't noticed that. I saw that VirtualBox was updated when I did a "yum update" yesterday, but didn't realise the extpak wasn't installed at the same time. That's bitten me once before on my "managed" Win10 box when central IT updated Vbox but not the extpak. Loaded pack, aborted task, new task shows VM console. I still have a problem with the OS intercepting Alt-Fn key sequences, tho'. The only reliable one is Alt-F3.
14) Message boards : CMS Application : New Version 60.63 (Message 7674)
Posted 29 Jul 2022 by Profile ivan
Post:
The only current glitch is that the "Show VM Console" button doesn't appear (nor does the "Show graphics" option show the current logs) so I cannot check that it is actually running cmsRun jobs.

OK, got around that by running VirtualBox and logging in to the VM. Output files are appearing and look OK.
15) Message boards : CMS Application : New Version 60.63 (Message 7673)
Posted 29 Jul 2022 by Profile ivan
Post:
I managed to get vboxwrapper to compile by using devtools-11; gcc 8.0 didn't have a 32-bit libstdc++.a.
Now waiting for my task backoff to time-out so I can try with the "new" vboxwrapper -- at least it passes an ldd test without problem.

OK, it seems to have worked. First task ran to completion and a new task has started up. I see both my computers (Win10 and Rocky Linux) running in the HTCondor pool:
[lxplus789:~] > condor_status -pool vocms0840.cern.ch|grep '@9-'
glidein_4804_350226202@9-4416-18346           LINUX      X86_64 Claimed   Busy          3.360 2500  0+00:23:36
glidein_4821_834398040@9-4599-1833            LINUX      X86_64 Claimed   Busy          1.350 2500  0+00:20:05

The only current glitch is that the "Show VM Console" button doesn't appear (nor does the "Show graphics" option show the current logs) so I cannot check that it is actually running cmsRun jobs.
16) Message boards : CMS Application : New Version 60.63 (Message 7672)
Posted 29 Jul 2022 by Profile ivan
Post:
In cc_config.xml add/set:
<dont_check_file_sizes>1</dont_check_file_sizes>
Then reload config files via boincmanager.

Even then the client will occasionally download a fresh copy from the server, e.g. after a crash.

OK, thanks for pointing that out.
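For anyone who would rather script the change than edit the file by hand, here is a minimal sketch that sets the flag in the text of a cc_config.xml. It assumes the flag belongs inside the `<options>` section of a `<cc_config>` root, which is my understanding of the BOINC client configuration and is not stated in this thread; the function name is hypothetical. The config files still need to be reloaded afterwards, as described above.

```python
import xml.etree.ElementTree as ET

def set_dont_check_file_sizes(xml_text: str, enabled: bool = True) -> str:
    """Return cc_config.xml text with dont_check_file_sizes set to 1 or 0."""
    root = ET.fromstring(xml_text)
    # Assumption: the flag lives under <options>; create the section if absent.
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    flag = options.find("dont_check_file_sizes")
    if flag is None:
        flag = ET.SubElement(options, "dont_check_file_sizes")
    flag.text = "1" if enabled else "0"
    return ET.tostring(root, encoding="unicode")

# Hypothetical minimal starting file; a real one may hold other settings,
# which are left untouched.
original = "<cc_config><options></options></cc_config>"
updated = set_dont_check_file_sizes(original)
print(updated)
```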
17) Message boards : CMS Application : New Version 60.63 (Message 7660)
Posted 28 Jul 2022 by Profile ivan
Post:
... I managed to get vboxwrapper to compile ...

The most recent changes are not yet merged to BOINC master.
You would have to download them from the links below, then recompile vboxwrapper.
Otherwise you would get a version that may run into the "VirtualBox 4.0" error.

https://github.com/BOINC/boinc/blob/5347c8068c5594cc008dacb80e97c4c85601a08c/samples/vboxwrapper/vbox_common.cpp

https://github.com/BOINC/boinc/blob/5347c8068c5594cc008dacb80e97c4c85601a08c/samples/vboxwrapper/vbox_vboxmanage.cpp

Hmm, OK, done that. BOINC did download the "official" wrapper when it first asked for work because the local copy was the wrong size. I copied the new version across after that; hopefully it won't overwrite again when it does eventually get a new task after my quota is increased.
18) Message boards : CMS Application : New Version 60.63 (Message 7657)
Posted 28 Jul 2022 by Profile ivan
Post:
... It seems vboxwrapper_26205_x86_64-pc-linux-gnu was built on a system with glibc 2.29 but I'm using Rocky Linux 8.6 (derived from RHEL 8) which still uses 2.28.

As an option you may want to upgrade to Rocky Linux 9.
It installs glibc-2.34-28.el9_0.x86_64.rpm

For details see:
https://download.rockylinux.org/pub/rocky/
https://download.rockylinux.org/pub/rocky/9/BaseOS/x86_64/os/Packages/g/

Hmm, that's only just been released... I managed to get vboxwrapper to compile by using devtools-11; gcc 8.0 didn't have a 32-bit libstdc++.a.
Now waiting for my task backoff to time-out so I can try with the "new" vboxwrapper -- at least it passes an ldd test without problem.
19) Message boards : CMS Application : New Version 60.63 (Message 7650)
Posted 27 Jul 2022 by Profile ivan
Post:
Has anyone else run into a problem with the vboxwrapper under Linux? I get the message:
../../projects/lhcathomedev.cern.ch_lhcathome-dev/vboxwrapper_26205_x86_64-pc-linux-gnu: /lib64/libm.so.6: version `GLIBC_2.29' not found (required by ../../projects/lhcathomedev.cern.ch_lhcathome-dev/vboxwrapper_26205_x86_64-pc-linux-gnu)
It seems vboxwrapper_26205_x86_64-pc-linux-gnu was built on a system with glibc 2.29 but I'm using Rocky Linux 8.6 (derived from RHEL 8) which still uses 2.28.
It's not immediately obvious to me how to build my own version of vboxwrapper (it seems you have to build the whole BOINC tree, but that always trips up over the wx-widgets version unless you are lucky). I tried copying the V26204 wrapper to my directory tree and renaming that to V26205, but have yet to find out if that works as BOINC won't serve me new tasks since I had so many failures last night before I realised something was wrong.
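The mismatch described above (wrapper wants GLIBC_2.29, system provides 2.28) can be checked mechanically before wasting tasks on it. The sketch below pulls the highest GLIBC version out of a loader error message and compares it with an installed version. The helper names are hypothetical; `platform.libc_ver()` is the standard-library call used to look up the local glibc, and it may return nothing useful on non-glibc systems.

```python
import platform
import re

def required_glibc(error_text: str):
    """Highest GLIBC_x.y version named in a loader error, as an int tuple."""
    versions = re.findall(r"GLIBC_(\d+(?:\.\d+)+)", error_text)
    return max((tuple(map(int, v.split("."))) for v in versions), default=None)

def installed_glibc():
    """Local glibc version as an int tuple, or None if it cannot be determined."""
    name, version = platform.libc_ver()
    if name != "glibc" or not version:
        return None
    return tuple(int(part) for part in re.findall(r"\d+", version))

def is_satisfied(required, installed) -> bool:
    """True when the installed version is at least the required one."""
    return installed >= required

# Error message copied from the post above.
ERROR = (
    "../../projects/lhcathomedev.cern.ch_lhcathome-dev/vboxwrapper_26205_x86_64-pc-linux-gnu: "
    "/lib64/libm.so.6: version `GLIBC_2.29' not found (required by "
    "../../projects/lhcathomedev.cern.ch_lhcathome-dev/vboxwrapper_26205_x86_64-pc-linux-gnu)"
)

needed = required_glibc(ERROR)
local = installed_glibc()
if local is None:
    print(f"wrapper needs glibc {needed}; local glibc version not detected")
else:
    print(f"wrapper needs glibc {needed}, local is {local}, ok={is_satisfied(needed, local)}")
```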
20) Message boards : CMS Application : New Version 60.63 (Message 7640)
Posted 26 Jul 2022 by Profile ivan
Post:
Task seems to be running fine on my Windows 10 box. I also see the Apache home page rather than logs in "Show graphics". In the console window, Ctrl-Alt-F1 brings up the console output, Ctrl-Alt-F3 brings up the "top" output and Ctrl-Alt-F6 shows the console login page. With F2, F4 and F5 I just get the dummy messages that job output/job wrapper/error messages may appear, but they don't.



©2024 CERN