Message boards :
LHCb Application :
New version v1.02
Author | Message |
---|---|
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
A new version has been requested by LHCb. |
Joined: 10 May 15 Posts: 4 Credit: 39,333 RAC: 0 |
I have run three of these and they all end in error code 5. This is an example:
LHCb Simulation v1.02 (vbox64_mt_mcore) x86_64-pc-linux-gnu
Name LHCb_25826_1513818422.243445_0
Workunit 360294
Created 21 Dec 2017, 1:07:05 UTC
Sent 21 Dec 2017, 12:33:30 UTC
Report deadline 28 Dec 2017, 12:33:30 UTC
Received 21 Dec 2017, 12:57:06 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 5 (0x00000005) Unknown error code
Computer ID 2332
Run time 30 sec
CPU time
Validate state Invalid
Credit 0.00
Device peak FLOPS 34.79 GFLOPS
Application version LHCb Simulation v1.02 (vbox64_mt_mcore) x86_64-pc-linux-gnu
Peak working set size 16.36 MB
Peak swap size 317.56 MB
Peak disk usage 2,382.03 MB
Stderr output
<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
process exited with code 5 (0x5, -251)
</message>
<stderr_txt>
2017-12-21 07:36:44 (11935): vboxwrapper (7.7.26196): starting
2017-12-21 07:36:44 (11935): Feature: Checkpoint interval offset (308 seconds)
2017-12-21 07:36:44 (11935): Detected: VirtualBox VboxManage Interface (Version: 5.2.4)
2017-12-21 07:36:44 (11935): Detected: Minimum checkpoint interval (600.000000 seconds)
2017-12-21 07:36:44 (11935): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2017-12-21 07:36:44 (11935): Successfully copied 'init_data.xml' to the shared directory.
2017-12-21 07:36:44 (11935): Create VM. (boinc_d9679f03a02df903, slot#35)
2017-12-21 07:37:00 (11935): Setting Memory Size for VM. (1830MB)
2017-12-21 07:37:00 (11935): Setting CPU Count for VM. (12)
2017-12-21 07:37:00 (11935): Setting Chipset Options for VM.
2017-12-21 07:37:00 (11935): Setting Boot Options for VM.
2017-12-21 07:37:00 (11935): Setting Network Configuration for NAT.
2017-12-21 07:37:00 (11935): Enabling VM Network Access.
2017-12-21 07:37:13 (11935): Disabling USB Support for VM.
2017-12-21 07:37:13 (11935): Disabling COM Port Support for VM.
2017-12-21 07:37:13 (11935): Disabling LPT Port Support for VM. 2017-12-21 07:37:13 (11935): Disabling Audio Support for VM. 2017-12-21 07:37:13 (11935): Disabling Clipboard Support for VM. 2017-12-21 07:37:14 (11935): Disabling Drag and Drop Support for VM. 2017-12-21 07:37:14 (11935): Adding storage controller(s) to VM. 2017-12-21 07:37:14 (11935): Adding virtual disk drive to VM. (vm_image.vdi) 2017-12-21 07:37:19 (11935): Error in storage attach (fixed disk) for VM: -2135228411 Command: VBoxManage -q storageattach "boinc_d9679f03a02df903" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --setuuid "" --medium "/usr/bin/slots/35/vm_image.vdi" Output: VBoxManage: error: Could not get the storage format of the medium '/usr/bin/slots/35/vm_image.vdi' (VERR_NOT_SUPPORTED) VBoxManage: error: Details: code VBOX_E_IPRT_ERROR (0x80bb0005), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "OpenMedium(Bstr(pszFilenameOrUuid).raw(), enmDevType, enmAccessMode, fForceNewUuidOnOpen, pMedium.asOutParam())" at line 179 of file VBoxManageDisk.cpp VBoxManage: error: Invalid UUID or filename "/usr/bin/slots/35/vm_image.vdi" 2017-12-21 07:37:19 (11935): Powering off VM. 2017-12-21 07:37:19 (11935): Deregistering VM. (boinc_d9679f03a02df903, slot#35) 2017-12-21 07:37:19 (11935): Removing network bandwidth throttle group from VM. 2017-12-21 07:37:19 (11935): Removing storage controller(s) from VM. 2017-12-21 07:37:19 (11935): Removing VM from VirtualBox. 2017-12-21 07:37:20 (11935): Removing virtual disk drive from VirtualBox. 
Hypervisor System Log: VM Execution Log: VM Startup Log: VM Trace Log: UUID: 24a70507-9078-46e2-abe8-80c3a9795403 Settings file: '/usr/bin/slots/35/boinc_d9679f03a02df903/boinc_d9679f03a02df903.vbox' 2017-12-21 07:37:00 (11935): Command: VBoxManage -q modifyvm "boinc_d9679f03a02df903" --description "LHCb_25826_1513818422.243445_0" Exit Code: 0 Output: 2017-12-21 07:37:00 (11935): Command: VBoxManage -q modifyvm "boinc_d9679f03a02df903" --memory 1830 Exit Code: 0 Output: 2017-12-21 07:37:00 (11935): Command: VBoxManage -q modifyvm "boinc_d9679f03a02df903" --cpus 12 Exit Code: 0 Output: 2017-12-21 07:37:00 (11935): Command: VBoxManage -q modifyvm "boinc_d9679f03a02df903" --acpi on --ioapic on Exit Code: 0 Output: 2017-12-21 07:37:00 (11935): Command: VBoxManage -q modifyvm "boinc_d9679f03a02df903" --boot1 disk --boot2 dvd --boot3 none --boot4 none Exit Code: 0 Output: 2017-12-21 07:37:00 (11935): Command: VBoxManage -q modifyvm "boinc_d9679f03a02df903" --nic1 nat --natdnsproxy1 on --cableconnected1 off Exit Code: 0 Output: 2017-12-21 07:37:13 (11935): Command: VBoxManage -q modifyvm "boinc_d9679f03a02df903" --cableconnected1 on Exit Code: 0 Output: 2017-12-21 07:37:13 (11935): Command: VBoxManage -q modifyvm "boinc_d9679f03a02df903" --usb off Exit Code: 0 Output: 2017-12-21 07:37:13 (11935): Command: VBoxManage -q modifyvm "boinc_d9679f03a02df903" --uart1 off --uart2 off Exit Code: 0 Output: 2017-12-21 07:37:13 (11935): Command: VBoxManage -q modifyvm "boinc_d9679f03a02df903" --lpt1 off --lpt2 off Exit Code: 0 Output: 2017-12-21 07:37:13 (11935): Command: VBoxManage -q modifyvm "boinc_d9679f03a02df903" --audio none Exit Code: 0 Output: 2017-12-21 07:37:14 (11935): Command: VBoxManage -q modifyvm "boinc_d9679f03a02df903" --clipboard disabled Exit Code: 0 Output: 2017-12-21 07:37:14 (11935): Command: VBoxManage -q modifyvm "boinc_d9679f03a02df903" --draganddrop disabled Exit Code: 0 Output: 2017-12-21 07:37:14 (11935): Command: VBoxManage -q storagectl 
"boinc_d9679f03a02df903" --name "Hard Disk Controller" --add "ide" --controller "PIIX4" Exit Code: 0 Output: 2017-12-21 07:37:14 (11935): Command: VBoxManage -q storageattach "boinc_d9679f03a02df903" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --setuuid "" --medium "/usr/bin/slots/35/vm_image.vdi" Exit Code: -2135228411 Output: VBoxManage: error: Could not get the storage format of the medium '/usr/bin/slots/35/vm_image.vdi' (VERR_NOT_SUPPORTED) VBoxManage: error: Details: code VBOX_E_IPRT_ERROR (0x80bb0005), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "OpenMedium(Bstr(pszFilenameOrUuid).raw(), enmDevType, enmAccessMode, fForceNewUuidOnOpen, pMedium.asOutParam())" at line 179 of file VBoxManageDisk.cpp VBoxManage: error: Invalid UUID or filename "/usr/bin/slots/35/vm_image.vdi" 2017-12-21 07:37:15 (11935): Command: VBoxManage -q storageattach "boinc_d9679f03a02df903" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --setuuid "" --medium "/usr/bin/slots/35/vm_image.vdi" Exit Code: -2135228411 Output: VBoxManage: error: Could not get the storage format of the medium '/usr/bin/slots/35/vm_image.vdi' (VERR_NOT_SUPPORTED) VBoxManage: error: Details: code VBOX_E_IPRT_ERROR (0x80bb0005), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "OpenMedium(Bstr(pszFilenameOrUuid).raw(), enmDevType, enmAccessMode, fForceNewUuidOnOpen, pMedium.asOutParam())" at line 179 of file VBoxManageDisk.cpp VBoxManage: error: Invalid UUID or filename "/usr/bin/slots/35/vm_image.vdi" 2017-12-21 07:37:16 (11935): Command: VBoxManage -q storageattach "boinc_d9679f03a02df903" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --setuuid "" --medium "/usr/bin/slots/35/vm_image.vdi" Exit Code: -2135228411 Output: VBoxManage: error: Could not get the storage format of the medium '/usr/bin/slots/35/vm_image.vdi' (VERR_NOT_SUPPORTED) VBoxManage: error: Details: code 
VBOX_E_IPRT_ERROR (0x80bb0005), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "OpenMedium(Bstr(pszFilenameOrUuid).raw(), enmDevType, enmAccessMode, fForceNewUuidOnOpen, pMedium.asOutParam())" at line 179 of file VBoxManageDisk.cpp VBoxManage: error: Invalid UUID or filename "/usr/bin/slots/35/vm_image.vdi" 2017-12-21 07:37:17 (11935): Command: VBoxManage -q storageattach "boinc_d9679f03a02df903" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --setuuid "" --medium "/usr/bin/slots/35/vm_image.vdi" Exit Code: -2135228411 Output: VBoxManage: error: Could not get the storage format of the medium '/usr/bin/slots/35/vm_image.vdi' (VERR_NOT_SUPPORTED) VBoxManage: error: Details: code VBOX_E_IPRT_ERROR (0x80bb0005), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "OpenMedium(Bstr(pszFilenameOrUuid).raw(), enmDevType, enmAccessMode, fForceNewUuidOnOpen, pMedium.asOutParam())" at line 179 of file VBoxManageDisk.cpp VBoxManage: error: Invalid UUID or filename "/usr/bin/slots/35/vm_image.vdi" 2017-12-21 07:37:18 (11935): Command: VBoxManage -q storageattach "boinc_d9679f03a02df903" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --setuuid "" --medium "/usr/bin/slots/35/vm_image.vdi" Exit Code: -2135228411 Output: VBoxManage: error: Could not get the storage format of the medium '/usr/bin/slots/35/vm_image.vdi' (VERR_NOT_SUPPORTED) VBoxManage: error: Details: code VBOX_E_IPRT_ERROR (0x80bb0005), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "OpenMedium(Bstr(pszFilenameOrUuid).raw(), enmDevType, enmAccessMode, fForceNewUuidOnOpen, pMedium.asOutParam())" at line 179 of file VBoxManageDisk.cpp VBoxManage: error: Invalid UUID or filename "/usr/bin/slots/35/vm_image.vdi" 2017-12-21 07:37:19 (11935): Command: VBoxManage -q storageattach "boinc_d9679f03a02df903" --storagectl "Hard Disk Controller" --port 0 --device 0 --type 
hdd --setuuid "" --medium "/usr/bin/slots/35/vm_image.vdi" Exit Code: -2135228411 Output: VBoxManage: error: Could not get the storage format of the medium '/usr/bin/slots/35/vm_image.vdi' (VERR_NOT_SUPPORTED) VBoxManage: error: Details: code VBOX_E_IPRT_ERROR (0x80bb0005), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "OpenMedium(Bstr(pszFilenameOrUuid).raw(), enmDevType, enmAccessMode, fForceNewUuidOnOpen, pMedium.asOutParam())" at line 179 of file VBoxManageDisk.cpp VBoxManage: error: Invalid UUID or filename "/usr/bin/slots/35/vm_image.vdi" 2017-12-21 07:37:19 (11935): Command: VBoxManage -q snapshot "boinc_d9679f03a02df903" list Exit Code: 0 Output: This machine does not have any snapshots 2017-12-21 07:37:19 (11935): Command: VBoxManage -q bandwidthctl "boinc_d9679f03a02df903" remove "boinc_d9679f03a02df903_net" Exit Code: -2135228415 Output: VBoxManage: error: Could not find a bandwidth group named 'boinc_d9679f03a02df903_net' VBoxManage: error: Details: code VBOX_E_OBJECT_NOT_FOUND (0x80bb0001), component BandwidthControlWrap, interface IBandwidthControl, callee nsISupports VBoxManage: error: Context: "DeleteBandwidthGroup(name.raw())" at line 259 of file VBoxManageBandwidthControl.cpp 2017-12-21 07:37:19 (11935): Command: VBoxManage -q storagectl "boinc_d9679f03a02df903" --name "Hard Disk Controller" --remove Exit Code: 0 Output: 2017-12-21 07:37:20 (11935): Command: VBoxManage -q unregistervm "boinc_d9679f03a02df903" --delete Exit Code: 0 Output: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100% 2017-12-21 07:37:20 (11935): Command: VBoxManage -q closemedium disk "/usr/bin/slots/35/vm_image.vdi" --delete Exit Code: -2135228411 Output: VBoxManage: error: Could not get the storage format of the medium '/usr/bin/slots/35/vm_image.vdi' (VERR_NOT_SUPPORTED) VBoxManage: error: Details: code VBOX_E_IPRT_ERROR (0x80bb0005), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: 
Context: "OpenMedium(Bstr(pszFilenameOrUuid).raw(), enmDevType, enmAccessMode, fForceNewUuidOnOpen, pMedium.asOutParam())" at line 179 of file VBoxManageDisk.cpp 07:37:25 (11935): called boinc_finish(-2135228411) </stderr_txt> ]]> |
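The different error numbers in this log appear to be the same value in different notations. As a minimal sketch (the helper names `to_unsigned32` and `low_byte` are made up for illustration), reinterpreting VBoxManage's signed exit code as an unsigned 32-bit value gives the hex code printed next to VBOX_E_IPRT_ERROR, and its low byte is 5. On Linux only the low 8 bits of an exit status reach the parent process, which would explain why `boinc_finish(-2135228411)` is reported as exit status 5.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit integer as its unsigned bit pattern."""
    return code & 0xFFFFFFFF


def low_byte(code: int) -> int:
    """Return the low 8 bits, i.e. what a POSIX parent sees as the exit status."""
    return code & 0xFF


# Exit codes copied from the log above.
for signed in (-2135228411, -2135228415):
    print(f"{signed} -> 0x{to_unsigned32(signed):08x}, low byte {low_byte(signed)}")
```

The first value maps to 0x80bb0005 (VBOX_E_IPRT_ERROR in the log) and the second to 0x80bb0001 (VBOX_E_OBJECT_NOT_FOUND).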
Joined: 10 Mar 17 Posts: 40 Credit: 108,345 RAC: 0 |
I crunched a couple of these over the last few days, and almost all of them failed with the error message "194 (0x000000C2) EXIT_ABORTED_BY_CLIENT". These are the WUs with the Stderr output: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=821859 https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=821868 https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=821861 https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=821737 https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=821875 |
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
LHCb_2017_05_05.xml configures a RAM size of 2048 MB via "<memory_size_mb>2048</memory_size_mb>". The VM (2-core) effectively runs with 830 MB, the value sent via sched_reply_lhcathomedev.cern.ch_lhcathome-dev.xml. 830 MB is much less than the 2048 MB of the normal 1-core VMs over at -prod, AND it is exactly the same value that is used for Theory. Is it a server-side configuration error? |
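To make the mismatch concrete, here is a minimal sketch that reads the configured value from a job file and compares it with the effective one. Only the `<memory_size_mb>` element is quoted in the post; the `<vbox_job>` root element, the sample string and both function names are assumptions for illustration, and the effective 830 MB is simply taken from the post rather than parsed from the scheduler reply.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt: only <memory_size_mb> is quoted in the post;
# the <vbox_job> root element is an assumption.
JOB_XML = "<vbox_job><memory_size_mb>2048</memory_size_mb></vbox_job>"


def configured_memory_mb(xml_text: str) -> int:
    """Return the <memory_size_mb> value from a job description."""
    root = ET.fromstring(xml_text)
    node = root.find(".//memory_size_mb")
    if node is None or node.text is None:
        raise ValueError("no <memory_size_mb> element found")
    return int(float(node.text))


def memory_shortfall_mb(configured: int, effective: int) -> int:
    """How many MB the running VM is below the configured size (0 if none)."""
    return max(configured - effective, 0)


configured = configured_memory_mb(JOB_XML)
effective = 830  # value reported in the post for the 2-core VM
print(configured, effective, memory_shortfall_mb(configured, effective))
```

With the numbers from the post, the VM runs 1218 MB below what the job file asks for.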
Joined: 8 Apr 15 Posts: 781 Credit: 12,378,545 RAC: 4,995 |
I decided to give these another test, and they seem to be validating again. I only have them running on one old 3-core host, so it runs just one 2-core task at a time, but the last two were valid, so I will let that host continue running these multi-core tasks. (It looks like the single-core version over at LHC is working, so I loaded a few on one of my 8-core hosts for a few days.) I can only tell whether any of these are running here by looking at the server status page, but that doesn't tell us how they are doing, and our stats pages haven't worked or been updated for a LONG time, so I can only go by what my hosts are doing here. Mad Scientist For Life |
Joined: 8 Apr 15 Posts: 781 Credit: 12,378,545 RAC: 4,995 |
Well, so far all 7 of my tasks are valid, but when I look at the valid 2-core and 3-core tasks, for several of them the CPU time is less than the run time. I also checked a Linux host running these: some tasks run for 10 hours, but most run for only around 15 minutes. https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=2206&offset=0&show_names=0&state=4&appid=3 |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
I have one task running with two CPUs: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2128880 Six jobs have finished so far. |
Joined: 8 Apr 15 Posts: 781 Credit: 12,378,545 RAC: 4,995 |
I just looked at yours, Axel, and it is a good valid: Run time 13 hours 14 min 31 sec, CPU time 16 hours 7 min 14 sec |
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Have one Task running with two Cpus: Did you limit the RAM setting somewhere or is it the project's default value? It seems to be much too low for LHCb, especially in a 2-core setup. 2018-07-11 14:05:29 (2580): Setting Memory Size for VM. (830MB) 2018-07-11 14:05:29 (2580): Setting CPU Count for VM. (2) It may be a result of that low RAM setting that your CPU efficiency is only 61 % and your log shows lines like this: 2018-07-12 03:19:44 (2580): VM did not power off when requested. 2018-07-12 03:19:44 (2580): VM was successfully terminated. |
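The 61 % figure can be reproduced from the run time and CPU time quoted earlier in the thread, assuming they belong to this same 2-core task and that efficiency is defined as CPU time divided by (run time × number of cores). The function names below are illustrative only.

```python
def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_seconds: float, run_seconds: float, cores: int) -> float:
    """Fraction of the available core time that was actually used."""
    return cpu_seconds / (run_seconds * cores)


run_time = to_seconds(13, 14, 31)  # "Run time 13 hours 14 min 31 sec"
cpu_time = to_seconds(16, 7, 14)   # "CPU time 16 hours 7 min 14 sec"

efficiency = cpu_efficiency(cpu_time, run_time, cores=2)
print(f"{efficiency:.0%}")
```

A task that kept both cores fully busy would score close to 100 %, so a value near 61 % means the cores were idle for a good part of the run.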
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
I have no settings of my own, only the defaults. It is running in combination with Atlas from Production on this Ryzen. The LHCb log contains the line: CPUID EDX: 0x178bfbff. According to Google, this means the hardware is not detected correctly. VirtualBox 5.2.14 including the extension pack. |
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Just a guess: A corrupt vdi? You might consider checking/cleaning your vbox environment and/or resetting the project to get a fresh vdi. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
Just a guess: LHCb is running for the first time on this PC (it is four weeks old), so the .vdi was downloaded yesterday. The current task is running jobs 29 and 30 with two CPUs at the moment. |
©2024 CERN