Message boards :
CMS Application :
New Version 60.60
Author | Message |
---|---|
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
This update provides a new version of the VboxWrapper which supports multiattach mode. Please let me know if there are any issues. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,168,972 RAC: 1,763 |
"This update provides a new version of the VboxWrapper which supports multiattach mode. Please let me know if there are any issues." For what it's worth, this version dies immediately on both my Windows 10 machine and a Rocky Linux 8.6 box[1]. I didn't have much time to investigate this afternoon (too many meetings...). Please let us know whether or not your new tasks are running normally. [1] Tasks link if anyone has the time and inclination to take a look overnight UK time. |
Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0 |
I suspect it is caused by a vdi registration error (on Windows as well as on Linux): VBoxManage.exe: error: Cannot register the hard disk 'C:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\CMS_2021_07_07.vdi' {f888c51e-0503-4495-8794-fd67809dc4e8} because a hard disk 'C:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome\CMS_2021_07_07.vdi' with UUID {f888c51e-0503-4495-8794-fd67809dc4e8} already exists The old app version and the new app version both refer to the same vdi file, "CMS_2021_07_07.vdi". To allow both app versions to coexist, the new vdi file must have a different name and a different UUID. @Laurence Please clone the old vdi file using a command line like this: vboxmanage clonemedium CMS_2021_07_07.vdi CMS_2022_06_15.vdi Then create the new app version with CMS_2022_06_15.vdi instead of CMS_2021_07_07.vdi. BTW, BOINC first published an unstripped Linux version of vboxwrapper. They recently updated the executable with a stripped version, which is slightly smaller: https://boinc.berkeley.edu/dl/vboxwrapper_26204_x86_64-pc-linux-gnu |
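The long error line above is easy to misread, so here is a minimal Python sketch that pulls the two paths and the two UUIDs out of it. The regular expression is modelled only on the one message quoted in this thread; other VirtualBox versions may word the error differently, and the function name `parse_register_conflict` is made up for this illustration.

```python
import re

# Pattern modelled on the single error message quoted in the thread
# (assumption: other VirtualBox versions may phrase it differently).
CONFLICT_RE = re.compile(
    r"Cannot register the hard disk '(?P<new_path>[^']+)' "
    r"\{(?P<new_uuid>[0-9a-fA-F-]+)\} because a hard disk "
    r"'(?P<existing_path>[^']+)' with UUID "
    r"\{(?P<existing_uuid>[0-9a-fA-F-]+)\} already exists"
)


def parse_register_conflict(message):
    """Return the conflicting paths and UUIDs as a dict, or None if no match."""
    match = CONFLICT_RE.search(message)
    return match.groupdict() if match else None


# The error text exactly as posted above.
ERROR = (
    r"VBoxManage.exe: error: Cannot register the hard disk "
    r"'C:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\CMS_2021_07_07.vdi' "
    r"{f888c51e-0503-4495-8794-fd67809dc4e8} because a hard disk "
    r"'C:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome\CMS_2021_07_07.vdi' "
    r"with UUID {f888c51e-0503-4495-8794-fd67809dc4e8} already exists"
)

conflict = parse_register_conflict(ERROR)
same_uuid = conflict["new_uuid"] == conflict["existing_uuid"]
different_file = conflict["new_path"] != conflict["existing_path"]
print(same_uuid, different_file)  # prints: True True
```

Same UUID, different file: that combination is what the registration error is complaining about.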
Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 36 |
My CMS test with the new vboxwrapper is running fine. It is currently busy with its 4th cmsRun and will be ready in about 3 hours.
MasterLog 2022-06-15 07:44 245K
StartdLog 2022-06-15 07:41 286K
StarterLog 2022-06-14 21:00 215
finished_0.log 2022-06-14 21:03 39
finished_1.log 2022-06-15 00:43 1.4M
finished_2.log 2022-06-15 03:50 1.4M
finished_3.log 2022-06-15 07:04 1.4M
running.log 2022-06-15 07:46 239K
stderr.log 2022-06-15 07:42 23K
stdout.log 2022-06-15 07:01 24K
wmagentJob.log 2022-06-15 07:42 6.7K
wmagentJob_1.log 2022-06-15 00:43 23K
wmagentJob_2.log 2022-06-15 03:50 23K
wmagentJob_3.log 2022-06-15 07:04 23K
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3092830
The snapshot difference file is 1460 MB at the moment.
stderr.txt so far:
2022-06-14 20:52:55 (13408): Detected: vboxwrapper 26204
2022-06-14 20:52:55 (13408): Detected: BOINC client v7.19.0
2022-06-14 20:52:55 (13408): Detected: VirtualBox VboxManage Interface (Version: 6.1.34)
2022-06-14 20:52:56 (13408): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2022-06-14 20:52:56 (13408): Successfully copied 'init_data.xml' to the shared directory.
2022-06-14 20:52:56 (13408): Create VM. (boinc_0514318c2c914930, slot#0)
2022-06-14 20:52:57 (13408): Setting Memory Size for VM. (2048MB)
2022-06-14 20:52:57 (13408): Setting CPU Count for VM. (1)
2022-06-14 20:52:57 (13408): Setting Chipset Options for VM.
2022-06-14 20:52:58 (13408): Setting Graphics Controller Options for VM.
2022-06-14 20:52:58 (13408): Setting Boot Options for VM.
2022-06-14 20:52:58 (13408): Setting Network Configuration for NAT.
2022-06-14 20:52:58 (13408): Enabling VM Network Access.
2022-06-14 20:52:59 (13408): Disabling USB Support for VM.
2022-06-14 20:52:59 (13408): Disabling COM Port Support for VM.
2022-06-14 20:52:59 (13408): Disabling LPT Port Support for VM.
2022-06-14 20:53:00 (13408): Disabling Audio Support for VM.
2022-06-14 20:53:00 (13408): Disabling Clipboard Support for VM.
2022-06-14 20:53:00 (13408): Disabling Drag and Drop Support for VM.
2022-06-14 20:53:00 (13408): Adding storage controller(s) to VM.
2022-06-14 20:53:01 (13408): Adding virtual disk drive to VM. (CMS_2021_07_07.vdi)
2022-06-14 20:53:02 (13408): Adding VirtualBox Guest Additions to VM.
2022-06-14 20:53:02 (13408): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2022-06-14 20:53:02 (13408): forwarding host port 53846 to guest port 80
2022-06-14 20:53:03 (13408): Enabling remote desktop for VM.
2022-06-14 20:53:03 (13408): Enabling shared directory for VM.
2022-06-14 20:53:04 (13408): Starting VM using VBoxManage interface. (boinc_0514318c2c914930, slot#0)
2022-06-14 20:53:11 (13408): Successfully started VM. (PID = '9800')
2022-06-14 20:53:11 (13408): Reporting VM Process ID to BOINC.
2022-06-14 20:53:11 (13408): Guest Log: BIOS: VirtualBox 6.1.34
2022-06-14 20:53:11 (13408): Guest Log: CPUID EDX: 0x178bfbff
2022-06-14 20:53:11 (13408): Guest Log: BIOS: No PCI IDE controller, not probing IDE
2022-06-14 20:53:11 (13408): Guest Log: BIOS: AHCI 0-P#0: PCHS=16383/16/63 LCHS=1024/255/63 0x0000000002800000 sectors
2022-06-14 20:53:11 (13408): VM state change detected. (old = 'poweredoff', new = 'running')
2022-06-14 20:53:11 (13408): Detected: Web Application Enabled (http://localhost:53846)
2022-06-14 20:53:11 (13408): Detected: Remote Desktop Enabled (localhost:53847)
2022-06-14 20:53:11 (13408): Preference change detected
2022-06-14 20:53:11 (13408): Setting CPU throttle for VM. (100%)
2022-06-14 20:53:12 (13408): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 150 seconds) or (Vbox_job.xml: 600 seconds))
2022-06-14 20:53:13 (13408): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2022-06-14 20:53:13 (13408): Guest Log: BIOS: Booting from Hard Disk...
2022-06-14 20:53:16 (13408): Guest Log: BIOS: KBD: unsupported int 16h function 03
2022-06-14 20:53:16 (13408): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2022-06-14 20:53:44 (13408): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2022-06-14 20:53:44 (13408): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2022-06-14 20:53:46 (13408): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log
2022-06-14 20:53:46 (13408): Guest Log: 00:00:00.000193 main Log opened 2022-06-14T18:53:45.742080000Z
2022-06-14 20:53:46 (13408): Guest Log: 00:00:00.000423 main OS Product: Linux
2022-06-14 20:53:46 (13408): Guest Log: 00:00:00.000545 main OS Release: 4.14.232-19.cernvm.x86_64
2022-06-14 20:53:46 (13408): Guest Log: 00:00:00.000587 main OS Version: #1 SMP Fri Apr 30 17:12:25 CEST 2021
2022-06-14 20:53:46 (13408): Guest Log: 00:00:00.000639 main Executable: /usr/sbin/VBoxService
2022-06-14 20:53:46 (13408): Guest Log: 00:00:00.000642 main Process ID: 2169
2022-06-14 20:53:46 (13408): Guest Log: 00:00:00.000644 main Package type: LINUX_64BITS_GENERIC
2022-06-14 20:53:46 (13408): Guest Log: 00:00:00.005730 main 5.2.6 r120293 started. Verbose level = 0
2022-06-14 20:54:05 (13408): Guest Log: [INFO] Mounting the shared directory
2022-06-14 20:54:05 (13408): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2022-06-14 20:54:05 (13408): Guest Log: [INFO] Sourcing essential functions from /cvmfs/grid.cern.ch
2022-06-14 20:54:05 (13408): Guest Log: [INFO] Testing connection to cern.ch
2022-06-14 20:54:05 (13408): Guest Log: [INFO] Testing connection to VCCS
2022-06-14 20:54:06 (13408): Guest Log: [INFO] Testing connection to HTCondor
2022-06-14 20:54:06 (13408): Guest Log: [INFO] Testing connection to WMAgent
2022-06-14 20:54:06 (13408): Guest Log: [INFO] Testing connection to EOSCMS
2022-06-14 20:54:06 (13408): Guest Log: [INFO] Testing connection to CMS-Frontier
2022-06-14 20:54:06 (13408): Guest Log: [INFO] Testing connection to Frontier
2022-06-14 20:54:07 (13408): Guest Log: [INFO] Could not find a local HTTP proxy
2022-06-14 20:54:07 (13408): Guest Log: [INFO] CVMFS and Frontier will have to use DIRECT connections
2022-06-14 20:54:07 (13408): Guest Log: [INFO] This makes the application less efficient
2022-06-14 20:54:07 (13408): Guest Log: [INFO] It also puts higher load on the project servers
2022-06-14 20:54:07 (13408): Guest Log: [INFO] Setting up a local HTTP proxy is highly recommended
2022-06-14 20:54:07 (13408): Guest Log: [INFO] Advice can be found in the project forum
2022-06-14 20:54:07 (13408): Guest Log: [INFO] Reloading and probing the CVMFS configuration
2022-06-14 20:54:21 (13408): Guest Log: [INFO] Probing /cvmfs/cvmfs-config.cern.ch... OK
2022-06-14 20:54:21 (13408): Guest Log: [INFO] Probing /cvmfs/grid.cern.ch... OK
2022-06-14 20:54:30 (13408): Guest Log: [INFO] Probing /cvmfs/oasis.opensciencegrid.org... OK
2022-06-14 20:54:30 (13408): Guest Log: [INFO] Probing /cvmfs/singularity.opensciencegrid.org... OK
2022-06-14 20:54:31 (13408): Guest Log: [INFO] Probing /cvmfs/cms.cern.ch... OK
2022-06-14 20:54:31 (13408): Guest Log: [INFO] Probing /cvmfs/cms-ib.cern.ch... OK
2022-06-14 20:54:32 (13408): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
2022-06-14 20:54:32 (13408): Guest Log: [INFO] 2.7.2.0 http://s1fnal-cvmfs.openhtc.io:8080 DIRECT
2022-06-14 20:54:32 (13408): Guest Log: [INFO] Reading volunteer information
2022-06-14 20:54:46 (13408): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2022-06-14 20:54:47 (13408): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
2022-06-14 20:54:49 (13408): Guest Log: [INFO] CMS application starting. Check log files.
2022-06-14 21:40:17 (13408): Preference change detected
2022-06-14 21:40:17 (13408): Setting CPU throttle for VM. (100%)
2022-06-14 21:40:17 (13408): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 150 seconds) or (Vbox_job.xml: 600 seconds))
2022-06-14 21:42:36 (13408): Preference change detected
2022-06-14 21:42:36 (13408): Setting CPU throttle for VM. (100%)
2022-06-14 21:42:37 (13408): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 150 seconds) or (Vbox_job.xml: 600 seconds))
2022-06-14 22:04:54 (13408): Stopping VM.
2022-06-14 22:05:10 (13408): Successfully stopped VM.
2022-06-14 22:07:21 (2904): Detected: vboxwrapper 26204
2022-06-14 22:07:21 (2904): Detected: BOINC client v7.19.0
2022-06-14 22:07:21 (2904): Detected: VirtualBox VboxManage Interface (Version: 6.1.34)
2022-06-14 22:07:21 (2904): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2022-06-14 22:07:22 (2904): Guest Log: BIOS: VirtualBox 6.1.34
2022-06-14 22:07:22 (2904): Guest Log: CPUID EDX: 0x178bfbff
2022-06-14 22:07:22 (2904): Guest Log: BIOS: No PCI IDE controller, not probing IDE
2022-06-14 22:07:22 (2904): Guest Log: BIOS: AHCI 0-P#0: PCHS=16383/16/63 LCHS=1024/255/63 0x0000000002800000 sectors
2022-06-14 22:07:22 (2904): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2022-06-14 22:07:22 (2904): Guest Log: BIOS: Booting from Hard Disk...
2022-06-14 22:07:22 (2904): Guest Log: BIOS: KBD: unsupported int 16h function 03
2022-06-14 22:07:22 (2904): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2022-06-14 22:07:22 (2904): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2022-06-14 22:07:22 (2904): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2022-06-14 22:07:22 (2904): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log
2022-06-14 22:07:22 (2904): Guest Log: 00:00:00.000193 main Log opened 2022-06-14T18:53:45.742080000Z
2022-06-14 22:07:22 (2904): Guest Log: 00:00:00.000423 main OS Product: Linux
2022-06-14 22:07:22 (2904): Guest Log: 00:00:00.000545 main OS Release: 4.14.232-19.cernvm.x86_64
2022-06-14 22:07:22 (2904): Guest Log: 00:00:00.000587 main OS Version: #1 SMP Fri Apr 30 17:12:25 CEST 2021
2022-06-14 22:07:22 (2904): Guest Log: 00:00:00.000639 main Executable: /usr/sbin/VBoxService
2022-06-14 22:07:22 (2904): Guest Log: 00:00:00.000642 main Process ID: 2169
2022-06-14 22:07:22 (2904): Guest Log: 00:00:00.000644 main Package type: LINUX_64BITS_GENERIC
2022-06-14 22:07:22 (2904): Guest Log: 00:00:00.005730 main 5.2.6 r120293 started. Verbose level = 0
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Mounting the shared directory
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Sourcing essential functions from /cvmfs/grid.cern.ch
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Testing connection to cern.ch
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Testing connection to VCCS
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Testing connection to HTCondor
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Testing connection to WMAgent
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Testing connection to EOSCMS
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Testing connection to CMS-Frontier
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Testing connection to Frontier
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Could not find a local HTTP proxy
2022-06-14 22:07:22 (2904): Guest Log: [INFO] CVMFS and Frontier will have to use DIRECT connections
2022-06-14 22:07:22 (2904): Guest Log: [INFO] This makes the application less efficient
2022-06-14 22:07:22 (2904): Guest Log: [INFO] It also puts higher load on the project servers
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Setting up a local HTTP proxy is highly recommended
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Advice can be found in the project forum
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Reloading and probing the CVMFS configuration
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Probing /cvmfs/cvmfs-config.cern.ch... OK
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Probing /cvmfs/grid.cern.ch... OK
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Probing /cvmfs/oasis.opensciencegrid.org... OK
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Probing /cvmfs/singularity.opensciencegrid.org... OK
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Probing /cvmfs/cms.cern.ch... OK
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Probing /cvmfs/cms-ib.cern.ch... OK
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
2022-06-14 22:07:22 (2904): Guest Log: [INFO] 2.7.2.0 http://s1fnal-cvmfs.openhtc.io:8080 DIRECT
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Reading volunteer information
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2022-06-14 22:07:22 (2904): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
2022-06-14 22:07:22 (2904): Guest Log: [INFO] CMS application starting. Check log files.
2022-06-14 22:07:22 (2904): Starting VM using VBoxManage interface. (boinc_0514318c2c914930, slot#0)
2022-06-14 22:07:38 (2904): Successfully started VM. (PID = '2500')
2022-06-14 22:07:38 (2904): Reporting VM Process ID to BOINC.
2022-06-14 22:07:38 (2904): VM state change detected. (old = 'poweredoff', new = 'running')
2022-06-14 22:07:38 (2904): Detected: Web Application Enabled (http://localhost:53846)
2022-06-14 22:07:38 (2904): Detected: Remote Desktop Enabled (localhost:53847)
2022-06-14 22:07:38 (2904): Preference change detected
2022-06-14 22:07:38 (2904): Setting CPU throttle for VM. (100%)
2022-06-14 22:07:39 (2904): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 150 seconds) or (Vbox_job.xml: 600 seconds))
2022-06-14 22:37:53 (2904): Status Report: Job Duration: '64800.000000'
2022-06-14 22:37:53 (2904): Status Report: Elapsed Time: '6000.649441'
2022-06-14 22:37:53 (2904): Status Report: CPU Time: '5724.281250'
2022-06-15 00:18:12 (2904): Status Report: Job Duration: '64800.000000'
2022-06-15 00:18:12 (2904): Status Report: Elapsed Time: '12000.649441'
2022-06-15 00:18:12 (2904): Status Report: CPU Time: '11701.828125'
2022-06-15 01:58:15 (2904): Status Report: Job Duration: '64800.000000'
2022-06-15 01:58:15 (2904): Status Report: Elapsed Time: '18000.649441'
2022-06-15 01:58:15 (2904): Status Report: CPU Time: '17559.312500'
2022-06-15 03:38:32 (2904): Status Report: Job Duration: '64800.000000'
2022-06-15 03:38:32 (2904): Status Report: Elapsed Time: '24000.649441'
2022-06-15 03:38:32 (2904): Status Report: CPU Time: '23545.921875'
2022-06-15 05:18:36 (2904): Status Report: Job Duration: '64800.000000'
2022-06-15 05:18:36 (2904): Status Report: Elapsed Time: '30000.649441'
2022-06-15 05:18:36 (2904): Status Report: CPU Time: '29417.093750'
2022-06-15 06:59:36 (2904): Status Report: Job Duration: '64800.000000'
2022-06-15 06:59:36 (2904): Status Report: Elapsed Time: '36000.765282'
2022-06-15 06:59:36 (2904): Status Report: CPU Time: '35387.703125' |
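The "Status Report" lines at the end of that stderr output give a quick health check: the ratio of CPU time to elapsed time shows how busy the VM was. The Python sketch below computes it from two of the reports. The helper name `cpu_efficiency` and the regular expression are illustrative assumptions, not part of vboxwrapper, and the ratio is only easy to read this way because the log shows a single-CPU VM.

```python
import re

# Matches the two Status Report fields needed for the ratio.
STATUS_RE = re.compile(
    r"Status Report: (?P<key>Elapsed Time|CPU Time): '(?P<value>[0-9.]+)'"
)


def cpu_efficiency(log_lines):
    """Return CPU time / elapsed time for each Elapsed/CPU pair in the log."""
    ratios = []
    elapsed = None
    for line in log_lines:
        match = STATUS_RE.search(line)
        if not match:
            continue
        value = float(match.group("value"))
        if match.group("key") == "Elapsed Time":
            elapsed = value
        elif elapsed:
            # A CPU Time line closes the pair opened by Elapsed Time.
            ratios.append(value / elapsed)
            elapsed = None
    return ratios


# First and last reports copied from the stderr output above.
SAMPLE = [
    "2022-06-14 22:37:53 (2904): Status Report: Job Duration: '64800.000000'",
    "2022-06-14 22:37:53 (2904): Status Report: Elapsed Time: '6000.649441'",
    "2022-06-14 22:37:53 (2904): Status Report: CPU Time: '5724.281250'",
    "2022-06-15 06:59:36 (2904): Status Report: Job Duration: '64800.000000'",
    "2022-06-15 06:59:36 (2904): Status Report: Elapsed Time: '36000.765282'",
    "2022-06-15 06:59:36 (2904): Status Report: CPU Time: '35387.703125'",
]

ratios = cpu_efficiency(SAMPLE)
for ratio in ratios:
    print(f"{ratio:.1%}")
```

Both ratios come out above 95%, which fits the report that the task is running fine.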
Joined: 20 Jan 15 Posts: 1139 Credit: 8,168,972 RAC: 1,763 |
"I suspect it is caused by a vdi registration error (on Windows as well as on Linux):" Ah, thanks, I'd missed that difference because of the long line and my eyesight problems -- I thought it was the same file. |
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
It is working fine for me with the Theory app. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3092467 Maybe it is an issue with the app version upgrade. Does a project reset fix it? |
Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0 |
It's not caused directly by BOINC. The issue appears whenever the VirtualBox media manager tries to register a vdi with a UUID that is already in the list. If that vdi is attached to a VM, it can't be switched to multiattach mode. Once switched to multiattach mode, a vdi can be used by many VMs. The old method copies the original vdi to a slot and sets a new random UUID. That's why all of them can be named "vm_image.vdi". See the comments in the BOINC source code: https://github.com/BOINC/boinc/blob/master/samples/vboxwrapper/vbox_vboxmanage.cpp#L519-L593 |
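To make the difference between the two methods concrete, here is a toy Python model of a media registry keyed by disk UUID. It is only a sketch of the behaviour described in the post above: `ToyMediaRegistry`, both helper functions, the paths and the shortened UUID are invented for illustration and are neither the VirtualBox API nor the vboxwrapper code. It also does not model the rule that an already attached vdi can't be switched to multiattach mode.

```python
import uuid


class ToyMediaRegistry:
    """Toy stand-in for a media registry keyed by disk UUID (not VirtualBox)."""

    def __init__(self):
        self.path_by_uuid = {}  # disk UUID -> registered file path
        self.vms_by_uuid = {}   # disk UUID -> names of VMs using the disk

    def register(self, path, disk_uuid):
        known = self.path_by_uuid.get(disk_uuid)
        if known is not None and known != path:
            # Same UUID, different file: the clash reported in this thread.
            raise ValueError(f"UUID {disk_uuid} already registered for {known}")
        self.path_by_uuid[disk_uuid] = path
        self.vms_by_uuid.setdefault(disk_uuid, [])

    def attach(self, disk_uuid, vm_name):
        self.vms_by_uuid[disk_uuid].append(vm_name)


def old_method(registry, slots):
    """Copy per slot: every copy is registered under a fresh random UUID."""
    for slot in slots:
        disk_uuid = str(uuid.uuid4())
        registry.register(f"slots/{slot}/vm_image.vdi", disk_uuid)
        registry.attach(disk_uuid, f"vm_slot_{slot}")


def multiattach_method(registry, path, disk_uuid, slots):
    """Multiattach: one registered image shared by every VM."""
    registry.register(path, disk_uuid)
    for slot in slots:
        registry.attach(disk_uuid, f"vm_slot_{slot}")


registry = ToyMediaRegistry()
old_method(registry, [0, 1])
multiattach_method(registry, "projects/dev/CMS.vdi", "f888c51e", [2, 3])

try:
    # A second project ships the identical image, so it has the same UUID.
    registry.register("projects/prod/CMS.vdi", "f888c51e")
    clash = False
except ValueError:
    clash = True
print(clash)  # prints: True
```

In this model the old method never clashes because every copy gets its own UUID, while the shared image is registered once under its original UUID, so a second file carrying that UUID is rejected.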
Joined: 20 Jan 15 Posts: 1139 Credit: 8,168,972 RAC: 1,763 |
Hmm, yes, I got the new wrapper running on both my machines by making sure CMS@Home wasn't running (I had to manually remove the BOINC VMs on Windows as they hung around in VirtualBox after I did a pause and abort on the tasks). |
Joined: 8 Apr 15 Posts: 778 Credit: 12,140,404 RAC: 2,420 |
No problems here (Windows 10) |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 819 |
https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=2189965 Win10 Pro is downloading CMS_2021_07_07.vdi (3.7 GByte) |
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
What is the consensus on this? Is it ready for the prod server next week? |
Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0 |
I checked some logs from my computers as well as from other volunteers (Windows and Linux). It looks like the new vboxwrapper works as expected. A heavy-load scenario on computers with lots of cores has not been tested yet, but I'm sure the new vboxwrapper isn't less robust than the old one. I tested it with a self-compiled version but not with the version provided by BOINC. Please use a cloned CMS vdi file (new name + new UUID) when you prepare the app version for the production server. It's a few seconds of work but ensures that both app versions can coexist on the clients until older tasks are finished. |
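For anyone scripting the cloning step, here is a small Python sketch that builds the `clonemedium` command suggested earlier in the thread. The function name `clone_command` is made up for this example, the file names are the ones proposed in the thread, and actually running the command requires VirtualBox on the PATH, so that part is left commented out.

```python
import shlex


def clone_command(source_vdi, target_vdi, vboxmanage="VBoxManage"):
    """Build the argument list for cloning a vdi under a new file name."""
    if source_vdi == target_vdi:
        raise ValueError("the clone needs a different file name")
    return [vboxmanage, "clonemedium", source_vdi, target_vdi]


cmd = clone_command("CMS_2021_07_07.vdi", "CMS_2022_06_15.vdi")
print(shlex.join(cmd))

# To run it for real (assumes VirtualBox is installed and on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to check the command before it touches any image.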
Joined: 8 Apr 15 Posts: 778 Credit: 12,140,404 RAC: 2,420 |
"What is the consensus on this? Is it ready for the prod server next week?" I say YES, Ivan (at least for Windows 10 with VirtualBox 6.1.32). |
Joined: 31 Aug 21 Posts: 13 Credit: 1,118,469 RAC: 0 |
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3093277 https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3093276 https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3093290 https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3093339 My host had some sort of problem with all these CMS tasks. But it is running BOINC 7.20.0 (development version) + Windows 11 + VirtualBox 6.1.34, so maybe that has something to do with it. Or is it clearly something else? ATLAS tasks had "Outcome : Success"... https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3093565 ... but Run time 24 min 48 sec, CPU time 2 min 35 sec ... and "No HITS file was produced" for all three of them, with these lines in the Stderr output: 2022-06-17 02:07:07 (2204): Guest Log: *** Job finished *** 2022-06-17 02:07:07 (2204): Guest Log: *** The last 20 lines of the pilot log: *** 2022-06-17 02:07:07 (2204): Guest Log: *** Error codes and diagnostics *** 2022-06-17 02:07:07 (2204): Guest Log: "exeErrorCode": 65, 2022-06-17 02:07:07 (2204): Guest Log: "exeErrorDiag": "Non-zero return code from EVNTtoHITS (33); Logfile error in log.EVNTtoHITS: \"IOVDbSvc FATAL Conditions database connection COOLOFL_TRT/OFLP200 cannot be opened - STOP\"", 2022-06-17 02:07:07 (2204): Guest Log: "pilotErrorCode": 1165, 2022-06-17 02:07:07 (2204): Guest Log: "pilotErrorDiag": "Local output file is missing" The Theory task ran without problems. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3092782 |
Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0 |
The good news: Your CMS tasks correctly configure the differencing image and boot the VM. The bad news: There are lots of network errors when the bootstrap script inside the VM runs its network tests. It might be a firewall issue. Example (https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3093277): 2022-06-17 10:05:30 (1696): Guest Log: [INFO] Testing connection to HTCondor 2022-06-17 10:05:45 (1696): Guest Log: [DEBUG] Status run 1 of up to 3: 1 2022-06-17 10:06:06 (1696): Guest Log: [DEBUG] Status run 2 of up to 3: 1 2022-06-17 10:06:36 (1696): Guest Log: [DEBUG] Status run 3 of up to 3: 1 2022-06-17 10:06:36 (1696): Guest Log: [DEBUG] run 1 . . . <and many lines below> I haven't looked into the ATLAS example yet. |
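The "Status run N of up to 3" lines suggest a retry loop around each connection test. The Python sketch below shows that pattern under two assumptions: that the trailing number is a shell-style exit status (0 meaning success), and that the loop stops at the first success. The real logic lives in the project's basic_network_tests script and may differ; `run_with_retries` and the always-failing probe are hypothetical.

```python
import time


def run_with_retries(probe, attempts=3, delay_seconds=0.0, log=print):
    """Call probe() until it returns 0 or the attempts are used up.

    probe is any callable returning a shell-style status (0 = success).
    Returns the status of the last call.
    """
    status = 1
    for attempt in range(1, attempts + 1):
        status = probe()
        log(f"[DEBUG] Status run {attempt} of up to {attempts}: {status}")
        if status == 0:
            break
        if attempt < attempts:
            time.sleep(delay_seconds)  # wait before the next try
    return status


# Hypothetical probe that always fails, as a blocked connection would.
messages = []
final_status = run_with_retries(lambda: 1, log=messages.append)
print(len(messages), final_status)  # prints: 3 1
```

Three failed runs in a row, as in the quoted log, would mean the service was unreachable on every attempt rather than briefly unavailable.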
Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0 |
Your ATLAS VM boots fine and uses a differencing image. See: 2022-06-17 01:42:33 (2204): Adding virtual disk drive to VM. (ATLAS_vbox_0.84_image.vdi) The error happens much deeper inside the running VM in one of the ATLAS scripts: 2022-06-17 02:07:07 (2204): Guest Log: "exeErrorDiag": "Non-zero return code from EVNTtoHITS (33); Logfile error in log.EVNTtoHITS: \"IOVDbSvc FATAL Conditions database connection COOLOFL_TRT/OFLP200 cannot be opened - STOP\"", |
Joined: 31 Aug 21 Posts: 13 Credit: 1,118,469 RAC: 0 |
"The bad news:" Okay, I believe you are right. I fired up another host that runs Windows 10 + BOINC 7.20.0 + VirtualBox 6.1.34. This host produced the same errors too: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3093178 Then I downgraded BOINC from 7.20.0 to 7.16.20. Same errors again. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3093397 Then I downgraded VirtualBox from 6.1.34 to 6.1.32. Same errors again. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3093379 2022-06-17 20:05:13 (4272): VM Completion Message: Could not connect to all required network services I wish I knew what to change and where. But I think I'll just pause trying these CMS tasks for now so that I won't flood this board with my messages. This network thing seems to be a problem on my hosts only. |
Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0 |
It may help to know which script causes the errors. It can be found here: https://gitlab.cern.ch/vc/vm/-/blob/master/bin/basic_network_tests The test command is just 1 line in that script: https://gitlab.cern.ch/vc/vm/-/blob/master/bin/basic_network_tests#L20 The script is called a couple of times from inside the VM when bootstrap-cms is executed. https://gitlab.cern.ch/vc/vm/blob/master/sbin/bootstrap-cms#L52-L76 Hence, it's CMS only. |
Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0 |
Just checked the UUIDs of the CMS vdis from the dev server and the prod server. They are the same since the vdis are identical. This causes problems when a volunteer runs tasks from dev and prod concurrently. See: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=563&postid=7390 @Laurence Please ensure each vdi that is sent out has a unique UUID. |
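A check like the one requested here can be automated before release. The Python sketch below groups image files by UUID and reports any UUID carried by more than one file. The two BOINC paths and their UUID come from the error message earlier in the thread; the third entry is a hypothetical renamed clone with a made-up UUID, and `shared_uuids` is an illustrative helper, not an existing tool.

```python
from collections import defaultdict


def shared_uuids(uuid_by_path):
    """Return {uuid: [paths]} for every UUID used by more than one file."""
    groups = defaultdict(list)
    for path, disk_uuid in uuid_by_path.items():
        groups[disk_uuid.lower()].append(path)
    return {u: sorted(paths) for u, paths in groups.items() if len(paths) > 1}


images = {
    r"C:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\CMS_2021_07_07.vdi":
        "f888c51e-0503-4495-8794-fd67809dc4e8",
    r"C:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome\CMS_2021_07_07.vdi":
        "f888c51e-0503-4495-8794-fd67809dc4e8",
    # Hypothetical renamed clone with a made-up UUID, for contrast only.
    "CMS_2022_06_15.vdi": "00000000-0000-0000-0000-000000000001",
}

clashes = shared_uuids(images)
for disk_uuid, paths in clashes.items():
    print(disk_uuid, len(paths))
```

Only the dev and prod copies are reported, since they share one UUID; the clone with its own UUID passes.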
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
I will release a new version later with a name change. Hopefully that will be enough. |
©2024 CERN