New Version 5.30
Author | Message |
---|---|
Send message Joined: 28 Jul 16 Posts: 478 Credit: 394,720 RAC: 79 |
"I'm obviously doing something wrong with the graceful shutdown thing." No, you are doing it right. You get computation errors because the task/VM shuts down before it has returned a result file to BOINC. This leads to: <message> upload failure: <file_xfer_error> <file_name>Theory_2390-1129919-264_0_r791023812_result</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> </message> "2 dev and 1 Production currently running happily." Sooner or later they will fail, at least one of dev or prod. Please be patient and don't run dev and prod concurrently until Laurence has fixed it (early next week, I guess). |
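For anyone collecting these failures from several results, the `<file_xfer_error>` fragment quoted above can be picked apart mechanically. The following is only a minimal sketch: the function name `parse_upload_error` is hypothetical, and it assumes the fragment is well-formed XML exactly as quoted in the post.

```python
import re
import xml.etree.ElementTree as ET


def parse_upload_error(fragment):
    """Return (file_name, numeric_code, description) from a <file_xfer_error> fragment.

    Hypothetical helper: the element names are taken from the message quoted
    in the post; nothing here is part of BOINC itself.
    """
    root = ET.fromstring(fragment)
    file_name = root.findtext("file_name")
    raw_code = root.findtext("error_code") or ""
    # The code looks like "-240 (stat() failed)": a number, then optional text in parentheses.
    match = re.match(r"\s*(-?\d+)\s*(?:\((.*)\))?\s*$", raw_code)
    if match is None:
        return file_name, None, raw_code.strip()
    return file_name, int(match.group(1)), match.group(2)


fragment = (
    "<file_xfer_error>"
    "<file_name>Theory_2390-1129919-264_0_r791023812_result</file_name>"
    "<error_code>-240 (stat() failed)</error_code>"
    "</file_xfer_error>"
)
print(parse_upload_error(fragment))
```

A code of -240 with the text "stat() failed" fits the explanation in the post: the result file was never written, so there was nothing to upload.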
Send message Joined: 13 Apr 15 Posts: 138 Credit: 2,945,852 RAC: 0 |
Ah, ok. I was hoping, perhaps too optimistically, that it would be more elegant, with a return of the partially completed work. |
Send message Joined: 13 Feb 15 Posts: 1185 Credit: 850,198 RAC: 510 |
"Ah, ok. I was hoping, perhaps too optimistically, that it would be more elegant, with a return of the partially completed work." It was graceful in the past, but since the task lifetime was extended to 10 days instead of the earlier 18 hours, the shutdown procedure is very rarely used by the wrapper itself. This means that if you have such a rarely useful long-running job, it will be killed after 10 days and error out: invalid, no credit, and thank you for your time and watts. I removed the 10-day limit (when not testing), so I decide for myself whether a task is useful and will not run too long. |
Send message Joined: 24 Oct 19 Posts: 155 Credit: 341,141 RAC: 81 |
Do you plan to pass this new app on to the LHC@home project? |
Send message Joined: 22 Apr 16 Posts: 673 Credit: 1,915,391 RAC: 2,317 |
"This update provides a new version of the VboxWrapper which supports the multiattach mode. Please let me know if there are any issues." I have started one Theory task and am seeing this in stderr.txt: 2022-06-29 05:04:38 (5020): Detected: vboxwrapper 26204 2022-06-29 05:04:38 (5020): Detected: BOINC client v7.16.20 2022-06-29 05:04:39 (5020): Detected: VirtualBox VboxManage Interface (Version: 6.1.34) 2022-06-29 05:04:39 (5020): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds) 2022-06-29 05:04:40 (5020): Guest Log: BIOS: VirtualBox 6.1.34 2022-06-29 05:04:40 (5020): Guest Log: CPUID EDX: 0x178bfbff 2022-06-29 05:04:40 (5020): Guest Log: BIOS: No PCI IDE controller, not probing IDE 2022-06-29 05:04:40 (5020): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032 2022-06-29 05:04:40 (5020): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=80 2022-06-29 05:04:40 (5020): Guest Log: BIOS: Boot from Hard Disk 0 failed 2022-06-29 05:04:40 (5020): Guest Log: BIOS: Boot : bseqnr=2, bootseq=0003 2022-06-29 05:04:40 (5020): Guest Log: BIOS: CDROM boot failure code : 0002 2022-06-29 05:04:40 (5020): Guest Log: BIOS: Boot from CD-ROM failed 2022-06-29 05:04:40 (5020): Guest Log: Could not read from the boot medium! System halted. 2022-06-29 05:04:40 (5020): Starting VM using VBoxManage interface. (boinc_9f8eee29257d3a7a, slot#9) 2022-06-29 05:04:51 (5020): Successfully started VM. (PID = '1916') 2022-06-29 05:04:51 (5020): Reporting VM Process ID to BOINC. 2022-06-29 05:04:51 (5020): VM state change detected. (old = 'poweredoff', new = 'running') 2022-06-29 05:04:51 (5020): Preference change detected 2022-06-29 05:04:51 (5020): Setting CPU throttle for VM. (100%) 2022-06-29 05:04:52 (5020): Setting checkpoint interval to 1200 seconds. (Higher value of (Preference: 1200 seconds) or (Vbox_job.xml: 600 seconds)) The running task has no RDP in BOINC: https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=2191880 |
Send message Joined: 13 Feb 15 Posts: 1185 Credit: 850,198 RAC: 510 |
From the result: "Another VirtualBox management application has locked the session for this VM. BOINC cannot properly monitor this VM and so this job will be aborted. 2022-06-28 20:52:50 (8028): Could not create VM 2022-06-28 20:52:50 (8028): ERROR: VM failed to start 2022-06-28 20:52:55 (8028): NOTE: VM session lock error encountered. BOINC will be notified that it needs to clean up the environment. This might be a temporary problem and so this job will be rescheduled for another time." I think you have to clean up yourself. Best to reset the dev project. If necessary, also clean up the project directory and the slots, and use the Virtual Media Manager to remove remnants of disks etc. |
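Before cleaning up by hand, it can help to see which BOINC-created VMs VirtualBox still has registered. The sketch below parses the output of `VBoxManage list vms`, which prints one `"name" {uuid}` pair per line. The function names are hypothetical, and the `boinc_` prefix is an assumption based on the VM names that appear in the logs in this thread (for example `boinc_9f8eee29257d3a7a`).

```python
import subprocess


def list_vms(vboxmanage="VBoxManage"):
    """Return the raw output of `VBoxManage list vms` (VBoxManage must be on PATH)."""
    result = subprocess.run(
        [vboxmanage, "list", "vms"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_vm_list(output):
    """Parse lines of the form: "name" {uuid} into (name, uuid) tuples."""
    vms = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith('"') or not line.endswith("}"):
            continue  # skip anything that is not a VM entry
        name, _, uuid = line.rpartition(" {")
        vms.append((name.strip('"'), uuid.rstrip("}")))
    return vms


def boinc_leftovers(vms):
    """Keep only VMs whose name starts with the assumed BOINC prefix."""
    return [(name, uuid) for name, uuid in vms if name.startswith("boinc_")]
```

With no BOINC VM tasks running, `boinc_leftovers(parse_vm_list(list_vms()))` should come back empty; anything it still lists is a candidate for manual removal. Leftover disks can be checked in the same spirit with `VBoxManage list hdds`.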
Send message Joined: 22 Apr 16 Posts: 673 Credit: 1,915,391 RAC: 2,317 |
Thanks, I have cleaned up the old Theory tasks from Production that were running with the SIGUSR1 error. Now four CMS tasks from -dev are running well. When they are finished, I will reboot the machine. I also got an optional Win10 Pro update this morning. I also have a CentOS8 VM in an emergency state at the moment, because of Hyper-V testing last week. This computer (8 cores) is used for all testing. After the reboot I will try a Theory task in -dev again AND an ATLAS task ;-). |
Send message Joined: 22 Apr 16 Posts: 673 Credit: 1,915,391 RAC: 2,317 |
CMS is running, ATLAS has no tasks, and Theory: Theory_2390-1115156-266 Status Postponed: VM environment needs to be cleaned up. I have a clean VirtualBox Manager. Theory shows an entry for multiattach mode (Theory_2020_05_08.vdi). FATAL: Could not read from the boot medium! System halted. |
Send message Joined: 22 Apr 16 Posts: 673 Credit: 1,915,391 RAC: 2,317 |
2022-07-08 06:59:44 (19180): Adding storage controller(s) to VM. 2022-07-08 06:59:44 (19180): Adding virtual disk drive to VM. (Theory_2020_05_08.vdi) 2022-07-08 07:00:17 (19180): Error in storage attach (fixed disk - multiattach mode) for VM: -2135228409 Command: VBoxManage -q storageattach "boinc_4c0d24e8ecb45ebb" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "C:/ProgramData/BOINC/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2020_05_08.vdi" Output: VBoxManage.exe: error: Cannot attach medium 'C:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\Theory_2020_05_08.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later https://www.virtualbox.org/ticket/18296 |
Send message Joined: 28 Jul 16 Posts: 478 Credit: 394,720 RAC: 79 |
It's already implemented that way: https://github.com/BOINC/boinc/blob/client_release/7/7.20/samples/vboxwrapper/vbox_vboxmanage.cpp#L542-L592 |
Send message Joined: 22 Apr 16 Posts: 673 Credit: 1,915,391 RAC: 2,317 |
From which volunteer does the use of multiattach come? Boinc 5.2.44 is running without being postponed! |
Send message Joined: 22 Apr 16 Posts: 673 Credit: 1,915,391 RAC: 2,317 |
Virtualbox 6.1.36 Theory postponed after 46 sec. Now 1 Atlas-Production AND 2 CMS from -dev! |
Send message Joined: 22 Apr 16 Posts: 673 Credit: 1,915,391 RAC: 2,317 |
The settings could not be saved. Cannot attach medium 'S:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\Theory_2020_05_08.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later. Error code: VBOX_E_INVALID_OBJECT_STATE (0x80BB0007) Component: SessionMachine Interface: IMachine {85632c68-b5bb-4316-a900-5eb28d3413df} |
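The negative number in the earlier wrapper log (-2135228409) and the hex code 0x80BB0007 shown here with VBOX_E_INVALID_OBJECT_STATE are the same 32-bit value, once printed as a signed integer and once as unsigned hex. A small sketch of the conversion (the function name `result_code_hex` is hypothetical):

```python
def result_code_hex(signed_code):
    """Reinterpret a signed 32-bit result code as unsigned and format it as hex."""
    return f"0x{signed_code & 0xFFFFFFFF:08X}"


print(result_code_hex(-2135228409))  # prints 0x80BB0007
```

So the wrapper and the VirtualBox Manager dialog are reporting the same underlying error, not two different ones.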
Send message Joined: 13 Feb 15 Posts: 1185 Credit: 850,198 RAC: 510 |
It could be that the two Theory_2020_05_08.vdi files from dev and production have the same UUID. VirtualBox doesn't like that, and it could be a reason for your problem. Find the path to vboxmanage.exe, go there in a cmd box, and check by using the commands: vboxmanage.exe S:\ProgramData\BOINC\projects\lhcathome.cern.ch_lhcathome\Theory_2020_05_08.vdi and vboxmanage.exe S:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\Theory_2020_05_08.vdi |
Send message Joined: 22 Apr 16 Posts: 673 Credit: 1,915,391 RAC: 2,317 |
This message appears when Theory_2020_05_08.vdi is connected manually in the settings of the VirtualBox Manager after the task was postponed (43 sec. after the start of the -dev task). There is no .vdi connected by the Theory task. 2022-07-21 21:14:07 (5288): Adding virtual disk drive to VM. (Theory_2020_05_08.vdi) 2022-07-21 21:14:40 (5288): Error in storage attach (fixed disk - multiattach mode) for VM: -2135228409 Command: VBoxManage -q storageattach "boinc_34e4e5af62a61d18" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "S:/ProgramData/BOINC/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2020_05_08.vdi" CMS has connected the multiattach file CMS_2022_06_22.vdi correctly, and the differencing .vdi is connected in the snapshot folder of the BOINC task folder. |
Send message Joined: 13 Feb 15 Posts: 1185 Credit: 850,198 RAC: 510 |
Sorry maeax, in the commands in my previous post I forgot the showmediuminfo subcommand, which you have to give after vboxmanage.exe and before the path to BOINC's virtual disk. But even when the UUIDs are different, the postponed error may occur: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3101226 Your bolded line about VirtualBox 4.0 is, in my opinion, a wrong interpretation of the error by VirtualBox. |
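The UUID check with `showmediuminfo` can also be scripted. This is a sketch only: the function names are hypothetical, and it assumes the command output contains a line starting with `UUID:` (next to lines such as `Parent UUID:`), which is how VirtualBox 6.1 prints it.

```python
import subprocess
from typing import Optional


def parse_medium_uuid(info: str) -> Optional[str]:
    """Return the value of the 'UUID:' line from showmediuminfo output, if present."""
    for line in info.splitlines():
        key, sep, value = line.partition(":")
        # Match the key exactly so that 'Parent UUID' is not picked up by mistake.
        if sep and key.strip() == "UUID":
            return value.strip()
    return None


def medium_uuid(path: str, vboxmanage: str = "VBoxManage") -> Optional[str]:
    """Run `VBoxManage showmediuminfo <path>` and extract the disk UUID."""
    result = subprocess.run(
        [vboxmanage, "showmediuminfo", path],
        capture_output=True, text=True, check=True,
    )
    return parse_medium_uuid(result.stdout)


def same_uuid(path_a: str, path_b: str) -> bool:
    """True if both disk images report the same UUID."""
    return medium_uuid(path_a) == medium_uuid(path_b)
```

Calling `same_uuid` with the production and dev paths of Theory_2020_05_08.vdi would answer the question raised above, although, as noted, different UUIDs do not rule out the postponed error.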
Send message Joined: 22 Apr 16 Posts: 673 Credit: 1,915,391 RAC: 2,317 |
OK Crystal, but the two of us seem to be alone with Windows and multiattach. CMS has no problems and can be transferred to Production. The other two (ATLAS and Theory) need more investigation. |
Send message Joined: 13 Feb 15 Posts: 1185 Credit: 850,198 RAC: 510 |
"CMS, no problems, can be transferred to Production." I'm not so sure. It's hard to reproduce the error exactly, so it's really not known what's causing it. Most of the time the tasks run OK, but not always. CMS runs between 12 and 18 hours, so a VM is created less often than with Theory and these ATLAS test tasks. The only thing we know now: the problem is at the start of a multiattach VM, where no HD can be attached although the hard disk controller has been added to the VM. If we can't find the cause of the problem, a solution could be to rewrite vboxwrapper to abort a task instead of postponing it. Last Theory error: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3101191 I'll try to reproduce the error with CMS, but for that I'll have to shorten the CMS tasks to force the creation of new CMS VMs more often. This will lead to CMS errors, but in the end this is a development system, so Ivan has to deal with it. |
Send message Joined: 22 Apr 16 Posts: 673 Credit: 1,915,391 RAC: 2,317 |
"the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later." To me it seems to be a timing problem (ATLAS and Theory). A wait of about 10 sec. is needed to mount the HD. |
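If the timing theory holds, one way to test it outside the wrapper would be to retry the attach command after a pause. The sketch below is purely illustrative and is not how vboxwrapper behaves: the function name `attach_with_retry`, the three attempts, and the use of the exit status to detect failure are all assumptions. Only the 10 second wait comes from the post.

```python
import subprocess
import time


def attach_with_retry(cmd, attempts=3, delay=10.0, run=subprocess.run, sleep=time.sleep):
    """Run `cmd` until it exits with status 0, waiting `delay` seconds between tries.

    Returns the number of the attempt that succeeded; raises RuntimeError if
    every attempt fails. `run` and `sleep` can be replaced for testing.
    """
    last_error = ""
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        last_error = result.stderr
        if attempt < attempts:
            sleep(delay)  # give VirtualBox time before the next try
    raise RuntimeError(f"command failed after {attempts} attempts: {last_error}")
```

Here `cmd` would be the `VBoxManage -q storageattach ...` argument list copied from the wrapper log. If a retry after 10 seconds succeeds where the first attempt failed, that would support the timing explanation; if it fails every time, the cause lies elsewhere.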
Send message Joined: 13 Feb 15 Posts: 1185 Credit: 850,198 RAC: 510 |
"I'll try to reproduce the error with CMS, but for that I'll have to shorten the CMS-tasks to force creation of new CMS VM's more often." That was fast: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3101384 and a second one, because this one started before I could clean up: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3101387 |
©2024 CERN