ATLAS vbox v.1.15
Author | Message |
---|---|
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
We just released v1.15 which uses a new vboxwrapper version 26205. |
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
I got a task and forced a "vbox 4.0" condition to test whether vboxwrapper can resolve it automatically. It can: the condition was resolved and the task started as usual. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
29.07.2022 11:44:29 | lhcathome-dev | No tasks are available for ATLAS Simulation Windows |
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3103156
The first task had only 1 event and finished with:
2022-07-29 11:43:05 (59171): Guest Log: No HITS file was produced
Was this intentional? |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
In production, an ATLAS task that ends with a confirm-Error stops by itself after 7-8 minutes. But when there is no input and less than 1 minute of CPU time is used, the task keeps running for hours and only the volunteer can stop it. I see this for a handful of tasks every day. Is it possible to correct this in the new version? Here is an example from last night (5 hours! Two of them started at the same time): https://lhcathome.cern.ch/lhcathome/result.php?resultid=361685586 |
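The pattern described in this post (hours of elapsed time with under a minute of CPU time) could be flagged on the volunteer side with a simple check. This is only a minimal sketch: the function name `looks_stalled` and both thresholds are assumptions made for illustration, and nothing here is part of BOINC or vboxwrapper.

```python
def looks_stalled(elapsed_s: float, cpu_s: float,
                  min_elapsed_s: float = 3600.0,
                  max_cpu_s: float = 60.0) -> bool:
    """Return True when a task has run for a long time but used almost no CPU.

    Both thresholds are illustrative assumptions, not project settings.
    """
    return elapsed_s >= min_elapsed_s and cpu_s < max_cpu_s


# Roughly the case from the post: about 5 hours elapsed, under 1 minute of CPU.
# The exact CPU value (45 s) is a made-up example.
print(looks_stalled(elapsed_s=5 * 3600, cpu_s=45))

# A task that fails on its own after about 8 minutes is not flagged.
print(looks_stalled(elapsed_s=8 * 60, cpu_s=30))
```

The elapsed-time floor keeps the check from firing on tasks that are still booting their VM.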
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
It was not intentional on my part, but a new kind of task with updated ATLAS simulation software was recently added to the set of tasks automatically submitted here. I've asked the experts to look into why these tasks fail. I have manually submitted a batch of 20 event tasks to keep the queue full. |
Joined: 13 Feb 15 Posts: 1188 Credit: 861,827 RAC: 35 |
I got an ATLAS task with a 132MB pool.root input file, so I expected more events to process. However, it reports a total of 1 event, and after a short time the job finished without my seeing that single event being processed. |
Joined: 13 Feb 15 Posts: 1188 Credit: 861,827 RAC: 35 |
"I have submitted manually a batch of 20 event tasks to keep the queue full."
Sorry, I fetched 16 of them... BOINC's estimated runtime is 13 minutes 34 seconds, but in fact they will need almost 2 hours each. |
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
This task finished with a HITS file: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3103188
Before the task started, I prepared VirtualBox with a dummy vdi having the same UUID as the ATLAS vdi. The objective was to test whether the new vboxwrapper can deal with those very rare issues. It can:
2022-07-29 13:14:11 (13060): Disk UUID conflicts with an already existing disk.
Will set a new UUID for 'ATLAS_vbox_1.15_image.vdi'.
The project admin should be informed to do this server side running:
vboxmanage clonemedium <inputfile> <outputfile>
@David You did it right! The error was intentionally forced! |
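The log message above suggests that a project admin run `vboxmanage clonemedium <inputfile> <outputfile>` on the server. A minimal sketch of how that call might be scripted in Python follows. The helper names and the output file name are hypothetical, and it assumes `VBoxManage` is on the PATH; to my understanding the cloned image receives a fresh UUID, but verify this against the VirtualBox documentation before relying on it.

```python
import subprocess
from pathlib import Path
from typing import List


def clonemedium_command(src: Path, dst: Path,
                        vboxmanage: str = "VBoxManage") -> List[str]:
    """Build the argument list for `VBoxManage clonemedium <inputfile> <outputfile>`."""
    return [vboxmanage, "clonemedium", str(src), str(dst)]


def clone_image(src: Path, dst: Path) -> None:
    """Run the clone; raises CalledProcessError if VBoxManage reports a failure."""
    subprocess.run(clonemedium_command(src, dst), check=True)


# Only build and show the command here; the output name is a made-up example.
cmd = clonemedium_command(Path("ATLAS_vbox_1.15_image.vdi"),
                          Path("ATLAS_vbox_1.15_image_new.vdi"))
print(" ".join(cmd))
```

Building the argument list separately from running it makes the command easy to inspect before anything touches the image.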
Joined: 31 Aug 21 Posts: 13 Credit: 1,118,469 RAC: 0 |
Not too many tasks available ... Activity is low ... I would like to see more of them. |
Joined: 13 Feb 15 Posts: 1188 Credit: 861,827 RAC: 35 |
David Cameron wrote: We just released v1.15 which uses a new vboxwrapper version 26205.
In the file reference of your ATLAS app version description, you still use the tag <open_name> for the vdi file.
<file_ref>
    <file_name>ATLAS_vbox_1.15_image.vdi</file_name>
    <open_name>vm_image.vdi</open_name>
</file_ref>
This open_name is not needed for the multi-attach tasks and can be left out. |
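The suggested change to the file reference can be illustrated with a short Python sketch that drops `<open_name>` from the `<file_ref>` element quoted above. The function name `drop_open_name` is an assumption for illustration; how the app version description is actually generated on the project server is not shown in this thread.

```python
import xml.etree.ElementTree as ET

# The file reference exactly as quoted in the post.
FILE_REF = """<file_ref>
    <file_name>ATLAS_vbox_1.15_image.vdi</file_name>
    <open_name>vm_image.vdi</open_name>
</file_ref>"""


def drop_open_name(xml_text: str) -> str:
    """Return the <file_ref> element with any direct <open_name> child removed."""
    root = ET.fromstring(xml_text)
    # findall returns a list, so removing children while looping is safe.
    for node in root.findall("open_name"):
        root.remove(node)
    return ET.tostring(root, encoding="unicode")


cleaned = drop_open_name(FILE_REF)
print(cleaned)
```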
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
This is from a production ATLAS task under Win11 Pro that ended after 7-8 minutes with a confirm-Error. stderr.txt:
2022-07-30 11:20:42 (17436): Guest Log: *** The last 20 lines of the pilot log: ***
2022-07-30 11:20:42 (17436): Guest Log: ---- Retrieve pilot code ----
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,676 [wrapper] Using piloturl: local
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,676 [wrapper] Only supporting pilot3 so pilotbase directory: pilot3
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,677 [wrapper] piloturl=local so download not needed
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,678 [wrapper] local tarball pilot3.tar.gz exists OK
2022-07-30 11:20:42 (17436): Guest Log: tar: Skipping to next header
2022-07-30 11:20:42 (17436): Guest Log: gzip: stdin: unexpected end of file
2022-07-30 11:20:42 (17436): Guest Log: tar: Child returned status 1
2022-07-30 11:20:42 (17436): Guest Log: tar: Error is not recoverable: exiting now
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,688 [wrapper] ERROR: pilot extraction failed for pilot3.tar.gz
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,689 [wrapper] ERROR: pilot extraction failed for pilot3.tar.gz
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,690 [wrapper] FATAL: failed to get pilot code
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,691 [wrapper] FATAL: failed to get pilot code
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,692 [wrapper] apfmon messages muted
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,693 [wrapper] ==== wrapper stdout END ====
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,694 [wrapper] ==== wrapper stderr END ====
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,695 [wrapper] wrapperfault ec=1, duration=0
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,696 [wrapper] apfmon messages muted
2022-07-30 11:20:42 (17436): Guest Log: 
*** Listing of results directory *** |
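When reporting a failure like this one, it can help to pull just the ERROR and FATAL messages out of a long stderr.txt. The sketch below does that for the "Guest Log:" line layout shown in the post. The regular expressions are inferred from these sample lines only, and the function name `failures` is hypothetical; other log formats would need adjusting.

```python
import re

# Host-side prefix as seen in the post: "<timestamp> (<number>): Guest Log: <message>"
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \(\d+\): Guest Log: (?P<msg>.*)$"
)
# Guest-side prefix: "<timestamp>,<millis> [wrapper] "
GUEST_PREFIX_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+ \[wrapper\] "
)


def failures(stderr_text):
    """Return de-duplicated guest messages that start with ERROR: or FATAL:."""
    seen, out = set(), []
    for line in stderr_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        msg = GUEST_PREFIX_RE.sub("", match.group("msg"))
        if msg.startswith(("ERROR:", "FATAL:")) and msg not in seen:
            seen.add(msg)
            out.append(msg)
    return out


SAMPLE = """\
2022-07-30 11:20:42 (17436): Guest Log: gzip: stdin: unexpected end of file
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,688 [wrapper] ERROR: pilot extraction failed for pilot3.tar.gz
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,689 [wrapper] ERROR: pilot extraction failed for pilot3.tar.gz
2022-07-30 11:20:42 (17436): Guest Log: 2022-07-30 09:20:42,690 [wrapper] FATAL: failed to get pilot code
"""

for message in failures(SAMPLE):
    print(message)
```

On this sample the duplicated ERROR line collapses to one entry, leaving the extraction failure followed by the fatal "failed to get pilot code" message.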
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
This has nothing to do with version 1.15 or vboxwrapper 26205. The word "wrapper" from the logfile refers to another wrapper deeper in the scripts. Since it was a task from -prod it should be reported there. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
There are not enough ATLAS tasks in -dev to see such a problem. Why is there no chance for the ATLAS team to take a deeper look? |
Joined: 24 Oct 19 Posts: 170 Credit: 543,238 RAC: 799 |
"We just released v1.15 which uses a new vboxwrapper version 26205."
What are the differences between 26204 (the official version on the BOINC site) and 26205? |
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
v26205 includes a workaround to avoid errors like this:
VBoxManage.exe: error: Cannot attach medium 'D:\Boinc1\projects\lhcathomedev.cern.ch_lhcathome-dev\ATLAS_vbox_1.14_image.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later
VBoxManage.exe: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component SessionMachine, interface IMachine, callee IUnknown
VBoxManage.exe: error: Context: "AttachDevice(Bstr(pszCtl).raw(), port, device, DeviceType_HardDisk, pMedium2Mount)" at line 776 of file VBoxManageStorageController.cpp
See: https://github.com/BOINC/boinc/pull/4843
We are testing the vboxwrapper pre-release from GitHub with ATLAS/Theory/CMS, and if they remain stable over the weekend we may get new app_versions on -prod shortly. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
"In Production Atlas-Task stopping for confirm-Error after 7-8 min itsself."
This happens when two ATLAS tasks start in the same second on the same PC!
<stderr_txt>
2022-07-30 18:59:49 (16048): Detected: vboxwrapper 26197
2022-07-30 18:59:49 (16048): Detected: BOINC client v7.7
2022-07-30 18:59:49 (16048): Detected: VirtualBox VboxManage Interface (Version: 6.1.36)
2022-07-30 18:59:50 (16048): Successfully copied 'init_data.xml' to the shared directory.
2022-07-30 18:59:51 (16048): Create VM. (boinc_95a7fe58546dd873, slot#5)
2022-07-30 18:59:51 (16048): Setting Memory Size for VM. (10250MB)
2022-07-30 18:59:49 (24000): Detected: vboxwrapper 26197
2022-07-30 18:59:49 (24000): Detected: BOINC client v7.7
2022-07-30 18:59:49 (24000): Detected: VirtualBox VboxManage Interface (Version: 6.1.36)
2022-07-30 18:59:50 (24000): Successfully copied 'init_data.xml' to the shared directory.
2022-07-30 18:59:52 (24000): Create VM. (boinc_e094d3f0813a1289, slot#6)
2022-07-30 18:59:52 (24000): Setting Memory Size for VM. (10250MB)
I discovered this after 2 hours of runtime with less than 1 minute of CPU time for both (using 10 CPUs). |
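To check whether other stuck tasks show the same pattern, the two logs can be scanned for wrappers that report "Detected: vboxwrapper" in the same second. The sketch below assumes the number in parentheses identifies the wrapper process, which the thread does not state explicitly; the function name `simultaneous_starts` is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as "2022-07-30 18:59:49 (16048): Detected: vboxwrapper 26197"
START_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): Detected: vboxwrapper"
)


def simultaneous_starts(stderr_text):
    """Map each start second to the wrapper ids that started in it (only if > 1)."""
    by_second = defaultdict(set)
    for timestamp, wrapper_id in START_RE.findall(stderr_text):
        by_second[timestamp].add(int(wrapper_id))
    return {ts: sorted(ids) for ts, ids in by_second.items() if len(ids) > 1}


# A shortened excerpt of the two logs quoted in the post.
SAMPLE = """\
2022-07-30 18:59:49 (16048): Detected: vboxwrapper 26197
2022-07-30 18:59:51 (16048): Create VM. (boinc_95a7fe58546dd873, slot#5)
2022-07-30 18:59:49 (24000): Detected: vboxwrapper 26197
2022-07-30 18:59:52 (24000): Create VM. (boinc_e094d3f0813a1289, slot#6)
"""

print(simultaneous_starts(SAMPLE))
```

For the excerpt above, both wrappers (16048 and 24000) are grouped under 2022-07-30 18:59:49, matching the observation in the post.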
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
There are not enough ATLAS tasks in -dev to see such a problem. Is it OK to test this once the new vboxwrapper 26205 is in production? |
Joined: 13 Feb 15 Posts: 1188 Credit: 861,827 RAC: 35 |
It's always OK to test things, especially if something uncommon happens.
"There are not so many Atlas-Tasks in -dev to see such a problem."
You have to provide as much information as possible. What else is running on the machine? Do you use a second (Linux) VM on a (Windows) machine and run BOINC from there? Of special interest: what do you see in the VM console with ALT-F1? I very often see "Checking CVMFS ....", but without a response. Do you start several VMs at once? |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
These two PCs are AMD Ryzen Threadripper PRO 3995WX 64-core machines. Only 6 ATLAS tasks from production run there, with 10 CPUs per task. One of the two PCs also runs two ATLAS tasks from -dev when available (none in the last four days!). They have Squid available from a Win10 workstation. That is 100 ATLAS tasks per day per PC. The never-ending tasks are only a handful per day. You can see this in production for these two PCs. ALT+F1, ALT+F2 and ALT+F3 are not available in VirtualBox. https://lhcathome.cern.ch/lhcathome/top_hosts.php |
©2024 CERN