Vboxwrapper race mitigation
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0
Vboxwrapper used for CMS v61.01 and Theory 6.01 introduces a short-lived lock that protects multiattach disk operations against race conditions. Such races can occur when no task from a given subproject is running and the BOINC client then starts several of them concurrently. The root cause is a design decision in VirtualBox that does not allow a 'multiattach' virtual disk to be attached in a single step.

The vboxwrapper currently used here reports itself as v26207. In fact, it is a nightly build from the BOINC repository on GitHub that is based on v26207 but includes the relevant PRs 5571 and 5598. Once the final version is available it will report an updated version number, most likely 26208. During normal operation there is no difference between v26206 (used before) and v26207+.

The relevant information can be found in stderr.txt. It looks like this:

    2024-04-27 08:53:41 (15636): Adding virtual disk drive to VM. (Theory_2024_04_26_dev.vdi)
    2024-04-27 08:53:48 (15636): Attempts: 5

The 'Attempts' line is omitted if the lock is acquired on the first attempt. Otherwise it shows how many times a vboxwrapper instance had to go through the 'lock acquire' loop before it got the lock. Each vboxwrapper instance that cannot get the lock sleeps for a short period before the next attempt. A timeout (currently 90 s) prevents an endless loop.
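The post describes the behaviour of the lock but not its code. As an illustration only, here is a minimal Python sketch of an acquire loop with a sleep between attempts, a timeout and an 'Attempts: n' line that is printed only when more than one attempt was needed. It is not vboxwrapper's (C++) implementation. Assumptions: the lock is a file created exclusively (the post does not say which locking primitive vboxwrapper uses), the lock path and the 0.5 s retry delay are hypothetical, and only the 90 s timeout is taken from the post. The `sleep` and `log` parameters are injectable so the loop can be exercised without real waiting.

```python
import os
import sys
import time

LOCK_TIMEOUT_S = 90.0   # timeout quoted in the post ("currently 90 s")
RETRY_DELAY_S = 0.5     # hypothetical; the post only says "a short period"


class LockTimeout(Exception):
    """Raised when the lock is still held by another instance after the timeout."""


def acquire_lock(lock_path, timeout=LOCK_TIMEOUT_S, delay=RETRY_DELAY_S,
                 sleep=time.sleep, log=lambda msg: print(msg, file=sys.stderr)):
    """Acquire an exclusive lock by atomically creating lock_path.

    Returns (fd, attempts). Logs 'Attempts: n' only when n > 1, mirroring
    the stderr.txt behaviour described in the post.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            # O_CREAT | O_EXCL is atomic: only one process can create the file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise LockTimeout(f"lock not acquired after {attempts} attempts")
            sleep(delay)  # back off before the next attempt
            continue
        if attempts > 1:
            log(f"Attempts: {attempts}")
        return fd, attempts


def release_lock(fd, lock_path):
    """Close and delete the lock file so the next waiting instance can take it."""
    os.close(fd)
    os.remove(lock_path)
```

A caller would hold the lock only around the disk attach/detach calls and release it in a `finally` clause. Creating a file with `O_CREAT | O_EXCL` works on Linux, macOS and Windows, but a lock file left behind by a crashed process would make every waiter run into the timeout; an OS-level lock that is released automatically when the process exits avoids that, at the cost of platform-specific code.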
Messages like this one are still included in vbox_trace.txt; they indicate that the race condition has been identified:

    Command: VBoxManage -q storageattach "boinc_41406568c2ab1634" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "/home/boinc9/BOINC_TEST/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2024_04_26_dev.vdi"
    Exit Code: -2135228409
    Output:
    VBoxManage: error: Cannot attach medium '/home/boinc9/BOINC_TEST/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2024_04_26_dev.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later
    VBoxManage: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component SessionMachine, interface IMachine, callee nsISupports
    VBoxManage: error: Context: "AttachDevice(Bstr(pszCtl).raw(), port, device, DeviceType_HardDisk, pMedium2Mount)" at line 781 of file VBoxManageStorageController.cpp

The cleanup follows immediately and looks like this. Here, vboxwrapper also automatically removes 3 orphaned child disks:

    2024-04-27 08:53:41 (15633): Command: VBoxManage -q showhdinfo "/home/boinc9/BOINC_TEST/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2024_04_26_dev.vdi"
    Exit Code: 0
    Output:
    UUID: 09e7e89e-310f-45d3-b402-27d8c420e14e
    Parent UUID: base
    State: created
    Type: multiattach
    Location: /home/boinc9/BOINC_TEST/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2024_04_26_dev.vdi
    Storage format: VDI
    Format variant: dynamic default
    Capacity: 20480 MBytes
    Size on disk: 781 MBytes
    Encryption: disabled
    Property: AllocationBlockSize=1048576
    Child UUIDs: 4a8a5d30-405d-4666-89ee-fd139047c9c5
                 d6e4263f-9efb-40ce-9193-c5a2d39cf558
                 ea7a4528-2a03-48a8-9de4-280c6eb77d8c

    2024-04-27 08:53:41 (15633): Command: VBoxManage -q showhdinfo "4a8a5d30-405d-4666-89ee-fd139047c9c5"
    Exit Code: 0
    Output:
    UUID: 4a8a5d30-405d-4666-89ee-fd139047c9c5
    Parent UUID: 09e7e89e-310f-45d3-b402-27d8c420e14e
    State: inaccessible
    Access Error: Could not open the medium '/home/boinc9/BOINC_TEST/slots/1/boinc_foobar0816/Snapshots/{4a8a5d30-405d-4666-89ee-fd139047c9c5}.vdi'.
    VD: error VERR_FILE_NOT_FOUND opening image file '/home/boinc9/BOINC_TEST/slots/1/boinc_foobar0816/Snapshots/{4a8a5d30-405d-4666-89ee-fd139047c9c5}.vdi' (VERR_FILE_NOT_FOUND)
    Type: normal (differencing)
    Auto-Reset: off
    Location: /home/boinc9/BOINC_TEST/slots/1/boinc_foobar0816/Snapshots/{4a8a5d30-405d-4666-89ee-fd139047c9c5}.vdi
    Storage format: VDI
    Format variant: dynamic default
    Capacity: 0 MBytes
    Size on disk: 0 MBytes
    Encryption: disabled
    Property: AllocationBlockSize=

    2024-04-27 08:53:41 (15633): Command: VBoxManage -q closemedium disk "/home/boinc9/BOINC_TEST/slots/1/boinc_foobar0816/Snapshots/{4a8a5d30-405d-4666-89ee-fd139047c9c5}.vdi"
    Exit Code: 0
    Output:

    2024-04-27 08:53:41 (15633): Command: VBoxManage -q showhdinfo "d6e4263f-9efb-40ce-9193-c5a2d39cf558"
    Exit Code: 0
    Output:
    UUID: d6e4263f-9efb-40ce-9193-c5a2d39cf558
    Parent UUID: 09e7e89e-310f-45d3-b402-27d8c420e14e
    State: inaccessible
    Access Error: Could not open the medium '/home/boinc9/BOINC_TEST/slots/2/boinc_foobar0817/Snapshots/{d6e4263f-9efb-40ce-9193-c5a2d39cf558}.vdi'.
    VD: error VERR_FILE_NOT_FOUND opening image file '/home/boinc9/BOINC_TEST/slots/2/boinc_foobar0817/Snapshots/{d6e4263f-9efb-40ce-9193-c5a2d39cf558}.vdi' (VERR_FILE_NOT_FOUND)
    Type: normal (differencing)
    Auto-Reset: off
    Location: /home/boinc9/BOINC_TEST/slots/2/boinc_foobar0817/Snapshots/{d6e4263f-9efb-40ce-9193-c5a2d39cf558}.vdi
    Storage format: VDI
    Format variant: dynamic default
    Capacity: 0 MBytes
    Size on disk: 0 MBytes
    Encryption: disabled
    Property: AllocationBlockSize=

    2024-04-27 08:53:42 (15633): Command: VBoxManage -q closemedium disk "/home/boinc9/BOINC_TEST/slots/2/boinc_foobar0817/Snapshots/{d6e4263f-9efb-40ce-9193-c5a2d39cf558}.vdi"
    Exit Code: 0
    Output:

    2024-04-27 08:53:42 (15633): Command: VBoxManage -q showhdinfo "ea7a4528-2a03-48a8-9de4-280c6eb77d8c"
    Exit Code: 0
    Output:
    UUID: ea7a4528-2a03-48a8-9de4-280c6eb77d8c
    Parent UUID: 09e7e89e-310f-45d3-b402-27d8c420e14e
    State: inaccessible
    Access Error: Could not open the medium '/home/boinc9/BOINC_TEST/slots/3/boinc_foobar0818/Snapshots/{ea7a4528-2a03-48a8-9de4-280c6eb77d8c}.vdi'.
    VD: error VERR_FILE_NOT_FOUND opening image file '/home/boinc9/BOINC_TEST/slots/3/boinc_foobar0818/Snapshots/{ea7a4528-2a03-48a8-9de4-280c6eb77d8c}.vdi' (VERR_FILE_NOT_FOUND)
    Type: normal (differencing)
    Auto-Reset: off
    Location: /home/boinc9/BOINC_TEST/slots/3/boinc_foobar0818/Snapshots/{ea7a4528-2a03-48a8-9de4-280c6eb77d8c}.vdi
    Storage format: VDI
    Format variant: dynamic default
    Capacity: 0 MBytes
    Size on disk: 0 MBytes
    Encryption: disabled
    Property: AllocationBlockSize=

    2024-04-27 08:53:42 (15633): Command: VBoxManage -q closemedium disk "/home/boinc9/BOINC_TEST/slots/3/boinc_foobar0818/Snapshots/{ea7a4528-2a03-48a8-9de4-280c6eb77d8c}.vdi"
    Exit Code: 0
    Output:

    2024-04-27 08:53:42 (15633): Command: VBoxManage -q closemedium disk "/home/boinc9/BOINC_TEST/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2024_04_26_dev.vdi"
    Exit Code: 0
    Output:

After the cleanup, vboxwrapper can attach the VDI and set the 'multiattach' flag:

    2024-04-27 08:53:42 (15633): Command: VBoxManage -q storageattach "boinc_41406568c2ab1634" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --medium "/home/boinc9/BOINC_TEST/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2024_04_26_dev.vdi"
    Exit Code: 0
    Output:

    2024-04-27 08:53:43 (15633): Command: VBoxManage -q storageattach "boinc_41406568c2ab1634" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --medium none
    Exit Code: 0
    Output:

    2024-04-27 08:53:43 (15633): Command: VBoxManage -q storageattach "boinc_41406568c2ab1634" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "/home/boinc9/BOINC_TEST/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2024_04_26_dev.vdi"
    Exit Code: 0
    Output:

The same lock is also set when vboxwrapper deregisters a VM that uses a multiattach disk. Although concurrent operations are rare there, they may show up as 'Attempts: n' close to the end of stderr.txt.
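The trace shows the recovery as a fixed sequence of VBoxManage calls: inspect the base disk with `showhdinfo`, close each orphaned child, close the base disk, then attach it normally, detach it and re-attach it with `--mtype multiattach`. The Python sketch below rebuilds that sequence from parsed `showhdinfo` output; it only constructs argument lists and does not run anything. It is not vboxwrapper's code, all function names are hypothetical, and treating every child whose State is 'inaccessible' as an orphan is an assumption based on the three children shown. `exit_code_to_hex` illustrates that the logged exit code -2135228409 is the reported error code 0x80bb0007 (VBOX_E_INVALID_OBJECT_STATE) read as a signed 32-bit integer.

```python
def exit_code_to_hex(code):
    """Reinterpret a signed 32-bit exit code as an unsigned hex error code."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def parse_showhdinfo(text):
    """Parse 'VBoxManage showhdinfo' output into a dict of 'Key: value' fields.

    'Child UUIDs' may list one UUID per line (continuation lines are
    indented), so it is collected into a list.
    """
    info = {"Child UUIDs": []}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and key == "Child UUIDs":
            info[key].append(line.strip())  # another child UUID
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Key: value' line
        key, value = name.strip(), value.strip()
        if key == "Child UUIDs":
            if value:
                info[key].append(value)
        else:
            info[key] = value
    return info


def orphan_cleanup_commands(parent, children):
    """Close inaccessible children of the base disk, then the base disk itself.

    parent: parsed showhdinfo of the multiattach base disk.
    children: dict mapping child UUID -> parsed showhdinfo of that child.
    """
    cmds = []
    for uuid in parent["Child UUIDs"]:
        child = children.get(uuid)
        # Assumption: 'inaccessible' (snapshot file gone) marks an orphan.
        if child and child.get("State") == "inaccessible":
            cmds.append(["VBoxManage", "-q", "closemedium", "disk", child["Location"]])
    cmds.append(["VBoxManage", "-q", "closemedium", "disk", parent["Location"]])
    return cmds


def multiattach_commands(vm, vdi, controller="Hard Disk Controller"):
    """Attach as a normal disk, detach, then re-attach with --mtype multiattach."""
    base = ["VBoxManage", "-q", "storageattach", vm, "--storagectl", controller,
            "--port", "0", "--device", "0", "--type", "hdd"]
    return [
        base + ["--medium", vdi],
        base + ["--medium", "none"],
        base + ["--mtype", "multiattach", "--medium", vdi],
    ]
```

Fed with the records from the trace, `orphan_cleanup_commands` yields the four `closemedium` calls in the order shown, and `multiattach_commands` the final three `storageattach` calls. Each list could be passed to `subprocess.run(cmd, check=True)`; the lock, retries and error handling are deliberately left out.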
©2024 CERN