Message boards :
Theory Application :
New Version 5.30
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
This update provides a new version of the vboxwrapper which supports the multiattach mode. Please let me know if there are any issues.
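For reference, a hypothetical sketch of the storageattach call the new vboxwrapper issues in multiattach mode (modelled on a log quoted later in this thread); the VM name and vdi path are examples, not taken from any particular host:

```shell
# Build the multiattach storageattach command the wrapper would run.
# VM name and medium path are placeholders for illustration only.
VDI="C:/ProgramData/BOINC/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2020_05_08.vdi"
VM="boinc_b01690b3cb94eaa3"
CMD="VBoxManage -q storageattach \"$VM\" --storagectl \"Hard Disk Controller\" --port 0 --device 0 --type hdd --mtype multiattach --medium \"$VDI\""
echo "$CMD"
```

With `--mtype multiattach` the base vdi stays read-only and each VM gets its own differencing image, instead of each slot receiving a full copy of the master vdi.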
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70
The differencing file for the Theory tasks in the Snapshots folder grows very quickly after the start, up to 1.6 GB. The original Theory_2020_05_08.vdi file is < 800 MB. Maybe the vdi file should be updated, or this new method is not very useful for the Theory tasks.
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0
The reason for this is that Theory (unlike CMS and ATLAS) has many different types of scientific apps. Hence, its VDI file is much smaller and contains little CVMFS cache data. As a result the CVMFS data for each scientific app has to be downloaded at the beginning of a task. CMS and ATLAS VDIs are distributed with a large CVMFS cache.

<edit> If you run many Theory tasks they benefit from a well configured local HTTP proxy. </edit>
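Such a local proxy can be a stock Squid. A minimal squid.conf sketch, assuming a default Squid install; the subnet, cache sizes, and cache path are placeholders to adapt to your LAN:

```
# Minimal Squid sketch for caching CVMFS downloads from local BOINC hosts.
http_port 3128
acl localnet src 192.168.0.0/16          # adjust to your local subnet
http_access allow localnet
http_access deny all
cache_mem 256 MB
maximum_object_size 1024 MB              # allow large objects to be cached
cache_dir ufs /var/spool/squid 5000 16 256
```

The proxy address then goes into the BOINC client's proxy settings so all tasks on the LAN share one cache.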
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70
"The reason for this is that Theory (unlike CMS and ATLAS) has many different types of scientific apps."
I think that the advantage of not copying the original vdi into the slot folder is nullified by the overhead of using differencing vdi images. The only reason to use vboxwrapper_26204 for Theory would be that you have the same vboxwrapper for all VBox apps.
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0
The old method copies the master vdi file as "vm_image.vdi" to each slot's directory and adds CVMFS updates and result logs to that vm_image.vdi. The new method uses the same master vdi file for all tasks and adds CVMFS updates and result logs to the task's differencing image. Hence, Theory tasks using the new method avoid writing the master vdi file to each slot. This is an improvement.
Send message Joined: 24 Oct 19 Posts: 170 Credit: 543,238 RAC: 799
"Please let me know if there are any issues."
I reset all the BOINC projects on my PC because of space problems on the HD. Now, if I try to download the Theory app, I get this error: <message>
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0
Download works fine using wget:

wget --timeout=10 -4 -qdO- http://lhcathome-test.cern.ch/lhcathome-dev/download/vboxwrapper_26204_windows_x86_64.exe >/dev/null
Setting --output-document (outputdocument) to -
DEBUG output created by Wget 1.21.3 on linux-gnu.
Reading HSTS entries from /root/.wget-hsts
URI encoding = »UTF-8«
asking libproxy about url 'http://lhcathome-test.cern.ch/lhcathome-dev/download/vboxwrapper_26204_windows_x86_64.exe'
libproxy suggest to use 'direct://'
Caching lhcathome-test.cern.ch => 188.185.125.137
Releasing 0x000055621d2460d0 (new refcount 0).
Deleting unused 0x000055621d2460d0.
Created socket 3.
Releasing 0x000055621d249130 (new refcount 1).
---request begin---
GET /lhcathome-dev/download/vboxwrapper_26204_windows_x86_64.exe HTTP/1.1
Host: lhcathome-test.cern.ch
User-Agent: Wget/1.21.3
Accept: */*
Accept-Encoding: identity
Connection: Keep-Alive
---request end---
---response begin---
HTTP/1.1 200 OK
Date: Wed, 15 Jun 2022 10:49:05 GMT
Server: Apache
Last-Modified: Mon, 13 Jun 2022 10:49:44 GMT
ETag: "1cb160-5e1520bc8a265"
Accept-Ranges: bytes
Content-Length: 1880416
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/octet-stream
---response end---

You may restart your client, check your firewall settings and/or temporarily enable debug flags in cc_config.xml like <file_xfer_debug>, <http_debug>, <http_xfer_debug>.
See: https://boinc.berkeley.edu/wiki/Client_configuration
Be aware: some debug options may quickly swamp your logfile.
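For reference, the mentioned debug flags belong in the <log_flags> section of cc_config.xml in the BOINC data directory; a minimal sketch (remember to remove the flags again once the problem is found):

```xml
<cc_config>
  <log_flags>
    <file_xfer_debug>1</file_xfer_debug>
    <http_debug>1</http_debug>
    <http_xfer_debug>1</http_xfer_debug>
  </log_flags>
</cc_config>
```

The client re-reads cc_config.xml via "Options > Read config files" or on restart.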
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
We can always cache the other apps in the image.
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70
"We can always cache the other apps in the image."
Are you referring to CVMFS? At least you could load the most used apps into the image, like pythia6 and pythia8. I suppose rivetvm and the plotter are already in the vdi image?
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0
I agree. It would be good to know how much data that would be. A master vdi of, say, 5-6 GB (even a bit more) in connection with differencing images might be more efficient than a fresh download for each task.
Send message Joined: 24 Oct 19 Posts: 170 Credit: 543,238 RAC: 799
"You may restart your client, check your firewall settings and/or temporarily enable debug flags in cc_config.xml like <file_xfer_debug>, <http_debug>, <http_xfer_debug>."
Solved with a simple reboot of the PC. It re-downloaded the wrapper correctly and WUs are running.
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70
This task ran with snapshot creation enabled: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3092542
The task survived several actions like suspend, resume, and a shutdown and restart of the system.
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0
Before this can be considered for the production server, we should build a new image containing all the apps and test again.
Send message Joined: 13 Apr 15 Posts: 138 Credit: 2,969,210 RAC: 0
My tasks: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?userid=196

One success, then a few "Postponed: environment needs to be cleaned up" (or similar wording). A project reset got it working again for a while but, returning home this evening, I find two more Postponed. A BOINC restart lets them start again but, although running in BOINC, the VM shows "FATAL: could not read from the boot medium! System halted." I suspect they would do nothing useful until timeout, so I have aborted them, which leaves behind the powered-off VM, which has to be manually removed. The Theory_2020_05_08.vdi also needs to be removed (but kept) to allow the next task to start successfully. (Two instances are attached to it when they are running correctly.)

I have 3 cores allocated to BOINC with a maximum of 2 from either LHC or -dev allowed to run concurrently (2 LHC, 1 -dev or 1 LHC, 2 -dev), so 5 consecutive successes suggest that it does sometimes clean up on the way out, but the Postponed ones suggest this is not always the case. Maybe sometimes being multi-attached, sometimes singly, is confusing it?

1 -dev and 2 LHC running just now after manually doing the cleanup. I don't see others reporting similar issues, but I hope this input is helpful. Win 10.
Send message Joined: 13 Apr 15 Posts: 138 Credit: 2,969,210 RAC: 0
Conjecture awaiting further observation:

A single task registers the .vdi in the VBox Media Manager and the VM attaches and starts up successfully. Starting a 2nd task also successfully attaches to the image and runs (all good so far). If one of those tasks finishes while another is still running, the ending one detaches and a new one attaches (again, all good). If there is continuity of at least one VM attached to the image, then there is continued success. BUT if an ending task is not replaced and the last connected VM detaches, such that there is no VM attached to the image, the image remains in the Media Manager but subsequent tasks are unable to attach to it, resulting in the Postponed/cleanup error. Manual removal of the image in VBox before a new task starts allows normal service to resume.

Overnight, I have limited LHC to only one running task to test Part 1, so there should always be at least one -dev task attached, with rolling replacement, and I don't expect any problem. Part 2 will need closer observation to confirm, but I won't be able to do that until Friday evening after work, as my other host has died so I'm down to only this one.
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0
At first I thought the issue could be caused by a race condition when more than one vboxwrapper tries to register the same vdi. Hence, I tried to force one on my test client. No "success" - vboxmanage takes care of concurrent vdi registration requests.

One of your first logs contains the following lines:

2022-06-14 23:43:37 (14208): Error in storage attach (fixed disk - multiattach mode) for VM: -2135228409
Command: VBoxManage -q storageattach "boinc_b01690b3cb94eaa3" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "C:/ProgramData/BOINC/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2020_05_08.vdi"
Output: VBoxManage.exe: error: Cannot attach medium 'C:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\Theory_2020_05_08.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later
VBoxManage.exe: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component SessionMachine, interface IMachine, callee IUnknown
VBoxManage.exe: error: Context: "AttachDevice(Bstr(pszCtl).raw(), port, device, DeviceType_HardDisk, pMedium2Mount)" at line 776 of file VBoxManageStorageController.cpp

The "VBoxManage -q storageattach ..." command looks fine and should succeed, but this message is weird since the VM is created with v6.1.34: "the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later"

Please locate the VirtualBox.xml file in "$HOME/.VirtualBox" mentioned here (https://www.virtualbox.org/manual/ch10.html#vboxconfigdata-global) and post the line starting with:
<VirtualBox xmlns="http://www.virtualbox.org/" version=

Did you ever have a VirtualBox version <4.0 installed on that computer?
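A sketch of how to extract that line; the path follows the linked manual section (on Windows the file usually lives under %USERPROFILE%\.VirtualBox\VirtualBox.xml, which is an assumption here):

```shell
# Print the schema version line from the global VirtualBox configuration file.
CFG="$HOME/.VirtualBox/VirtualBox.xml"
if [ -f "$CFG" ]; then
    LINE=$(grep -m1 '<VirtualBox xmlns=' "$CFG")
else
    LINE="config not found at $CFG"   # e.g. VirtualBox not installed, or Windows path
fi
echo "$LINE"
```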
Send message Joined: 13 Apr 15 Posts: 138 Credit: 2,969,210 RAC: 0
Thanks for looking, I'll look for that when I get home. I don't know what my earliest version of VBox was, but it would have been from the time of the first Theory jobs. There have been many uninstalls and upgrades since then, so I wouldn't think there would be anything of an old version left over, unless some fragment is lurking somewhere in the registry.

The overnight test didn't work as well as expected, with another Postponed task. I aborted it and again manually removed the powered-off VM and the image to allow another to start. I'll report in again when I get home, c. 17:00 UTC.
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70
"BUT if an ending task is not replaced and the last connected vm detaches, such that there is no vm attached to the image, the image remains in Media Manager but subsequent tasks are unable to attach to it, resulting in the Postponed/cleanup error. Manual removal of the image in VBox before a new task starts allows normal service to resume."
I tried to reproduce your problem after a very long running Theory finally finished. In the Media Manager I had the Theory vdi (and the ATLAS vdi) visible and no running tasks. I resumed the not yet started Theory task. It started normally and ran successfully: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3092970

The line requested by computezrmle from my VirtualBox.xml:
<VirtualBox xmlns="http://www.virtualbox.org/" version="1.12-windows">

The single error I had so far was caused by a memory problem: 8 GB RAM - 1 ATLAS and 2 Theory besides several open programs and Chrome tabs ;)

btw: When I have no running tasks and I exit the VirtualBox Manager and all VBox processes are stopped, the vdi's in the Media Manager are removed automatically.
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0
Let's focus on Theory for now to get an idea whether this is a global issue (or maybe Windows-only). So the current situation to start with is that no Theory vdi is registered in the VirtualBox Media Manager. Then start 2 Theory tasks concurrently and after a short while check whether both differencing images are registered and the tasks work as expected.

Fine, version="1.12-windows" is >= v4.0.
©2024 CERN