Message boards : Theory Application : New Version 5.30

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 7329 - Posted: 14 Jun 2022, 13:52:15 UTC

This update provides a new version of the VboxWrapper which supports multiattach mode. Please let me know if there are any issues.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 70
Message 7331 - Posted: 14 Jun 2022, 20:13:50 UTC - in response to Message 7329.  

The differencing file for Theory tasks in the snapshot folder grows very quickly after the start, up to 1.6 GB.
The original Theory_2020_05_08.vdi file is < 800 MB.
Maybe the vdi file should be updated, or this new method is not very useful for Theory tasks.
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 0
Message 7332 - Posted: 14 Jun 2022, 20:55:07 UTC - in response to Message 7331.  
Last modified: 14 Jun 2022, 20:57:19 UTC

The reason for this is that Theory (unlike CMS and ATLAS) has many different types of scientific apps.
Hence, its VDI file is much smaller and does not contain much CVMFS cache data.
As a result, the CVMFS data for each scientific app has to be downloaded at the beginning of a task.

CMS and ATLAS VDIs are distributed with a large CVMFS cache.

<edit>
If you run many Theory tasks, they benefit from a well-configured local HTTP proxy.
</edit>
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 70
Message 7336 - Posted: 15 Jun 2022, 7:22:36 UTC - in response to Message 7332.  
Last modified: 15 Jun 2022, 7:23:12 UTC

Quoting Message 7332: "The reason for this is that Theory (unlike CMS and ATLAS) has many different types of scientific apps."
I think that the advantage of not copying the original vdi into the slot folder is nullified by the overhead of using differencing vdi images.
The only reason to use vboxwrapper_26204 for Theory would be to have the same vboxwrapper for all VBox apps.
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 0
Message 7337 - Posted: 15 Jun 2022, 8:34:53 UTC - in response to Message 7336.  

The old method copies the master vdi file as "vm_image.vdi" to each slot directory and adds CVMFS updates and result logs to that vm_image.vdi.
The new method uses the same master vdi file for all tasks and adds CVMFS updates and result logs to the task's differencing image.

Hence Theory tasks using the new method avoid writing the master vdi file to each slot.
This is an improvement.
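
To put rough numbers on the two methods, here is a minimal sketch that counts megabytes on disk for a given number of concurrent tasks. It is illustrative only: the sizes are the approximate figures mentioned earlier in this thread (a master vdi of just under 800 MB, per-task growth of up to 1.6 GB), the function names are made up for this example, and it assumes that the per-task growth is the same under both methods, which nobody here has measured.

```python
# Approximate sizes taken from this thread (MB); real values will vary per task.
MASTER_MB = 800    # Theory_2020_05_08.vdi, reported as "< 800 MB"
GROWTH_MB = 1600   # per-task growth, reported as "up to 1.6 GB"
                   # ASSUMPTION: identical growth under both methods


def old_method_mb(tasks):
    """Master vdi in the project directory plus a full copy and growth in every slot."""
    return MASTER_MB + tasks * (MASTER_MB + GROWTH_MB)


def multiattach_mb(tasks):
    """One shared master vdi plus one differencing image per task."""
    return MASTER_MB + tasks * GROWTH_MB


for n in (1, 2, 4):
    saved = old_method_mb(n) - multiattach_mb(n)
    print(f"{n} task(s): old {old_method_mb(n)} MB, multiattach {multiattach_mb(n)} MB, saved {saved} MB")
```

Under these assumptions the saving is simply one master copy per running task. The size of the differencing image itself does not change the comparison.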
boboviz

Joined: 24 Oct 19
Posts: 170
Credit: 543,238
RAC: 799
Message 7339 - Posted: 15 Jun 2022, 10:03:32 UTC - in response to Message 7329.  

Quoting Message 7329: "Please let me know if there are any issues."


I reset all the BOINC projects on my PC because of disk space problems.
Now, when I try to download the Theory app, I get this error:

<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>vboxwrapper_26204_windows_x86_64.exe</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 0
Message 7340 - Posted: 15 Jun 2022, 11:02:46 UTC - in response to Message 7339.  

Download works fine using wget:
wget --timeout=10 -4 -qdO- http://lhcathome-test.cern.ch/lhcathome-dev/download/vboxwrapper_26204_windows_x86_64.exe >/dev/null
Setting --output-document (outputdocument) to -
DEBUG output created by Wget 1.21.3 on linux-gnu.

Reading HSTS entries from /root/.wget-hsts
URI encoding = »UTF-8«
asking libproxy about url 'http://lhcathome-test.cern.ch/lhcathome-dev/download/vboxwrapper_26204_windows_x86_64.exe'
libproxy suggest to use 'direct://'
Caching lhcathome-test.cern.ch => 188.185.125.137
Releasing 0x000055621d2460d0 (new refcount 0).
Deleting unused 0x000055621d2460d0.
Created socket 3.
Releasing 0x000055621d249130 (new refcount 1).

---request begin---
GET /lhcathome-dev/download/vboxwrapper_26204_windows_x86_64.exe HTTP/1.1
Host: lhcathome-test.cern.ch
User-Agent: Wget/1.21.3
Accept: */*
Accept-Encoding: identity
Connection: Keep-Alive

---request end---

---response begin---
HTTP/1.1 200 OK
Date: Wed, 15 Jun 2022 10:49:05 GMT
Server: Apache
Last-Modified: Mon, 13 Jun 2022 10:49:44 GMT
ETag: "1cb160-5e1520bc8a265"
Accept-Ranges: bytes
Content-Length: 1880416
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/octet-stream

---response end---



You may restart your client, check your firewall settings and/or temporarily enable debug flags in cc_config.xml like <file_xfer_debug>, <http_debug>, <http_xfer_debug>.
See:
https://boinc.berkeley.edu/wiki/Client_configuration

Be aware:
Some debug options may quickly swamp your logfile.
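
For anyone unsure what that looks like in practice, here is a small sketch that prints a cc_config.xml with the three flags named above switched on. The flag names come from this post and the `<cc_config>`/`<log_flags>` layout from the BOINC client configuration page linked above. The helper name `build_cc_config` is made up for this example, and the script only prints the text, it does not touch any BOINC file.

```python
import xml.etree.ElementTree as ET

# Log flags named in the post above.
DEBUG_FLAGS = ("file_xfer_debug", "http_debug", "http_xfer_debug")


def build_cc_config(flags=DEBUG_FLAGS):
    """Return the text of a cc_config.xml that switches on the given log flags."""
    lines = ["<cc_config>", "  <log_flags>"]
    lines += [f"    <{name}>1</{name}>" for name in flags]
    lines += ["  </log_flags>", "</cc_config>"]
    return "\n".join(lines)


config = build_cc_config()
ET.fromstring(config)  # raises ParseError if the text is not well-formed XML
print(config)
```

If a cc_config.xml already exists, merge the `<log_flags>` entries into it rather than overwriting the file, and set the flags back to 0 once the download problem has been diagnosed.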
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 7343 - Posted: 15 Jun 2022, 14:13:21 UTC - in response to Message 7336.  

We can always cache the other apps in the image.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 70
Message 7344 - Posted: 15 Jun 2022, 14:43:35 UTC - in response to Message 7343.  

Quoting Message 7343: "We can always cache the other apps in the image."

Are you referring to CVMFS?
At least you could load the most used apps in the image like pythia6 and pythia8.
I suppose rivetvm and plotter are already in the vdi-image?
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 0
Message 7345 - Posted: 15 Jun 2022, 15:26:32 UTC - in response to Message 7344.  

I agree.

It would be good to know how much data it would be.
A master vdi of, say, 5-6 GB (even a bit more) combined with differencing images might be more efficient than a fresh download for each task.
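
As a back-of-the-envelope check, the sketch below estimates after how many tasks a larger preloaded master vdi would pay for itself in transferred data. All inputs are placeholders: 6000 MB stands for the "5-6 GB" guess above, 800 MB for the current vdi, and 1600 MB per task assumes that the whole growth of the differencing image is downloaded CVMFS data, which has not been measured. The function name is made up for this example.

```python
import math


def break_even_tasks(big_master_mb, small_master_mb, per_task_download_mb):
    """Smallest number of tasks whose combined per-task downloads are at least
    as large as the extra one-off download of the bigger master image."""
    extra_mb = big_master_mb - small_master_mb
    return math.ceil(extra_mb / per_task_download_mb)


# Placeholder figures, see the assumptions above.
print(break_even_tasks(6000, 800, 1600))
```

A local HTTP proxy changes the picture, since repeated CVMFS downloads are then served from the local cache, so the real per-task figure would have to be measured first.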
boboviz

Joined: 24 Oct 19
Posts: 170
Credit: 543,238
RAC: 799
Message 7347 - Posted: 15 Jun 2022, 15:59:39 UTC - in response to Message 7340.  

Quoting Message 7340: "You may restart your client, check your firewall settings and/or temporarily enable debug flags in cc_config.xml like <file_xfer_debug>, <http_debug>, <http_xfer_debug>."


Solved with a simple reboot of the PC.
It re-downloaded the wrapper correctly and the WUs are running.
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 1
Message 7349 - Posted: 15 Jun 2022, 19:01:10 UTC - in response to Message 7347.  

Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 70
Message 7351 - Posted: 16 Jun 2022, 6:03:14 UTC

This task ran with snapshot creation enabled: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3092542

The task survived several actions such as suspend, resume, and shutting down and restarting the system.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 7356 - Posted: 16 Jun 2022, 12:51:41 UTC - in response to Message 7351.  

Before this can be considered for the production server, we should build a new image containing all the apps and test again.
Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,969,210
RAC: 0
Message 7358 - Posted: 16 Jun 2022, 17:57:11 UTC
Last modified: 16 Jun 2022, 18:05:16 UTC

My tasks: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?userid=196
1 success, then a few "Postponed: environment needs to be cleaned up" (or similar wording).
A project reset got it working again for a while but, returning home this evening, I found 2 more Postponed. A BOINC restart lets them start again but, although they show as running in BOINC, the VM shows "FATAL: could not read from the boot medium! System halted." I suspect they would do nothing useful until timeout, so I aborted them, which leaves behind the powered-off VM, which has to be removed manually. The Theory_2020_05_08.vdi also needs to be removed from VirtualBox (but the file kept) to allow the next task to start successfully. (2 instances are attached to it when they are running correctly.)
I have 3 cores allocated to BOINC, with a maximum of 2 tasks from either LHC or -dev allowed to run concurrently (2 LHC, 1 -dev or 1 LHC, 2 -dev), so 5 consecutive successes suggest that it does sometimes clean up on the way out, but the Postponed ones suggest this is not always the case. Maybe sometimes being multi-attached, sometimes singly attached, is confusing it?
1 -dev & 2 LHC running just now after manually doing the cleanup.

I don't see others reporting similar issues but I hope this input is helpful.

Win 10
Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,969,210
RAC: 0
Message 7360 - Posted: 16 Jun 2022, 21:53:00 UTC - in response to Message 7358.  
Last modified: 16 Jun 2022, 21:55:55 UTC

Conjecture awaiting further observation:
A single task registers the .vdi in VBox Media Manager and the vm attaches and starts up successfully. Starting a 2nd task also successfully attaches to the image and runs (all good so far). If one of those tasks finishes while another is still running, the ending one detaches and a new one attaches (again, all good). If there is continuity of at least one vm attached to the image, then there is continued success.

BUT if an ending task is not replaced and the last connected vm detaches, such that there is no vm attached to the image, the image remains in Media Manager but subsequent tasks are unable to attach to it, resulting in the Postponed/cleanup error. Manual removal of the image in VBox before a new task starts allows normal service to resume.

Overnight, I have limited LHC to only one running task to test Part 1 so there should always be at least one -dev task attached, with rolling replacement, and I don't expect any problem.
Part 2 will need closer observation to confirm but I won't be able to do that until Friday evening after work as my other host has died so I'm down to only this one.
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 0
Message 7362 - Posted: 17 Jun 2022, 8:26:31 UTC - in response to Message 7360.  

At first I thought the issue could be caused by a race condition when more than one vboxwrapper tries to register the same vdi.
Hence, I tried to force one on my test client.
No "success" - vboxmanage takes care of concurrent vdi registration requests.


One of your first logs contains the following lines:
2022-06-14 23:43:37 (14208): Error in storage attach (fixed disk - multiattach mode) for VM: -2135228409
Command:
VBoxManage -q storageattach "boinc_b01690b3cb94eaa3" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "C:/ProgramData/BOINC/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2020_05_08.vdi" 
Output:
VBoxManage.exe: error: Cannot attach medium 'C:\ProgramData\BOINC\projects\lhcathomedev.cern.ch_lhcathome-dev\Theory_2020_05_08.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later
VBoxManage.exe: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component SessionMachine, interface IMachine, callee IUnknown
VBoxManage.exe: error: Context: "AttachDevice(Bstr(pszCtl).raw(), port, device, DeviceType_HardDisk, pMedium2Mount)" at line 776 of file VBoxManageStorageController.cpp
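
For readers who want to retry the attach step by hand, the command from the log above can be rebuilt as an argument list. This is only a sketch: every option is copied from the logged command line, the helper name is made up for this example, and nothing is executed here because that would need VirtualBox and an existing VM of that name.

```python
import shlex


def storageattach_multiattach_args(vm_name, medium):
    """Argument list for the 'storageattach' call shown in the log above."""
    return [
        "VBoxManage", "-q", "storageattach", vm_name,
        "--storagectl", "Hard Disk Controller",
        "--port", "0", "--device", "0",
        "--type", "hdd", "--mtype", "multiattach",
        "--medium", medium,
    ]


args = storageattach_multiattach_args(
    "boinc_b01690b3cb94eaa3",
    "C:/ProgramData/BOINC/projects/lhcathomedev.cern.ch_lhcathome-dev/Theory_2020_05_08.vdi",
)
# Printed for inspection only; pass the list to subprocess.run(args) to execute it.
print(shlex.join(args))
```

Keeping the arguments as a list avoids quoting problems with the space in "Hard Disk Controller".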

"VBoxManage -q storageattach ..." looks fine and should succeed but this is weird since the VM is created with v6.1.34:
"the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later"

Please locate the file "VirtualBox.xml" in "$HOME/.VirtualBox" mentioned here (https://www.virtualbox.org/manual/ch10.html#vboxconfigdata-global) and post the line starting with "<VirtualBox xmlns="http://www.virtualbox.org/" version="


Did you ever have a VirtualBox version <4.0 installed on that computer?
Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,969,210
RAC: 0
Message 7364 - Posted: 17 Jun 2022, 10:11:37 UTC - in response to Message 7362.  
Last modified: 17 Jun 2022, 10:34:59 UTC

Thanks for looking,
I'll look for that when I get home. I don't know what my earliest version of VBox was but it would have been from the time of the first Theory jobs. There have been many uninstalls and upgrades since then so I wouldn't think there would be any of an old version left over, unless there is some fragment lurking somewhere in registry.

The overnight test didn't work as well as expected, with another Postponed task. I Aborted it and again manually removed the powered-off vm and the image to allow another to start.
I'll report in again when I get home, c. 17:00 UTC.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 70
Message 7365 - Posted: 17 Jun 2022, 11:02:24 UTC - in response to Message 7360.  

Quoting Message 7360: "BUT if an ending task is not replaced and the last connected vm detaches, such that there is no vm attached to the image, the image remains in Media Manager but subsequent tasks are unable to attach to it, resulting in the Postponed/cleanup error. Manual removal of the image in VBox before a new task starts allows normal service to resume."
I tried to reproduce your problem after a very long-running Theory task finally finished.
In Media Manager I had the Theory vdi (and the ATLAS vdi) visible and no running tasks.
I resumed the not yet started Theory task. It started normally and ran successfully: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3092970

The line requested by computezrmle, from my VirtualBox.xml:
<VirtualBox xmlns="http://www.virtualbox.org/" version="1.12-windows">

The single error I had so far was caused by a memory problem.
8 GB RAM - 1 ATLAS and 2 Theory besides several open programs and Chrome-tabs ;)

btw: When I have no running tasks and I exit VirtualBox Manager and all VBox-processes are stopped, the vdi's in Media Manager are removed automatically.
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 0
Message 7367 - Posted: 17 Jun 2022, 11:24:58 UTC - in response to Message 7365.  

Let's focus on Theory for now to get an idea if this is a global issue (maybe only on Windows).

So the current situation to start with is that no Theory vdi is registered in VirtualBox media manager.
Then start 2 Theory tasks concurrently and after a short while check whether both differencing images are registered and the tasks work as expected.



Fine, version="1.12-windows" is >= v4.0
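
The same check can be scripted. The sketch below reads the version attribute from the root element posted in this thread and compares it with a threshold. Two things are assumptions rather than facts from this thread: that the attribute always has the form "major.minor-platform", and that settings format 1.11 is the one introduced with VirtualBox 4.0. The names `settings_version` and `MIN_FORMAT_FOR_VBOX_4_0` are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Root element as posted in this thread, closed here so that it parses on its own.
SAMPLE = '<VirtualBox xmlns="http://www.virtualbox.org/" version="1.12-windows"/>'

# ASSUMPTION: settings format 1.11 is the one introduced with VirtualBox 4.0.
MIN_FORMAT_FOR_VBOX_4_0 = (1, 11)


def settings_version(xml_text):
    """Return ((major, minor), platform) from the root element's version attribute."""
    root = ET.fromstring(xml_text)
    raw = root.get("version", "")  # unprefixed attributes carry no namespace
    match = re.fullmatch(r"(\d+)\.(\d+)-(\w+)", raw)
    if match is None:
        raise ValueError(f"unexpected version attribute: {raw!r}")
    major, minor, platform = match.groups()
    return (int(major), int(minor)), platform


fmt, platform = settings_version(SAMPLE)
print(fmt, platform, fmt >= MIN_FORMAT_FOR_VBOX_4_0)
```

For a real file, replace `ET.fromstring(xml_text)` with `ET.parse(path).getroot()`. Comparing tuples rather than strings keeps "1.9" from sorting after "1.12".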