Message boards :
ATLAS Application :
ATLAS vbox v.1.13
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0
Version 1.13 is now available for Windows and Linux. It contains a fix in the bootstrapping that may help with the CVMFS problems some people were having (see https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=569).
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1
At the moment:
30.06.2022 13:32:55 | lhcathome-dev | No tasks are available for ATLAS Simulation
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 61
I got 1 task on my laptop: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4600&offset=0&show_names=0&state=0&appid=5
It started OK:
2022-06-30 13:28:43 (11344): Detected: vboxwrapper 26204
2022-06-30 13:28:43 (11344): Detected: BOINC client v7.20.0
2022-06-30 13:28:43 (11344): Detected: VirtualBox VboxManage Interface (Version: 6.1.34)
2022-06-30 13:28:44 (11344): Successfully copied 'init_data.xml' to the shared directory.
2022-06-30 13:28:44 (11344): Create VM. (boinc_448a99ec88da2bca, slot#1)
2022-06-30 13:28:45 (11344): Setting Memory Size for VM. (4800MB)
2022-06-30 13:28:45 (11344): Setting CPU Count for VM. (2)
2022-06-30 13:28:45 (11344): Setting Chipset Options for VM.
2022-06-30 13:28:45 (11344): Setting Graphics Controller Options for VM.
2022-06-30 13:28:46 (11344): Setting Boot Options for VM.
2022-06-30 13:28:46 (11344): Setting Network Configuration for NAT.
2022-06-30 13:28:46 (11344): Enabling VM Network Access.
2022-06-30 13:28:46 (11344): Disabling USB Support for VM.
2022-06-30 13:28:47 (11344): Disabling COM Port Support for VM.
2022-06-30 13:28:47 (11344): Disabling LPT Port Support for VM.
2022-06-30 13:28:47 (11344): Disabling Audio Support for VM.
2022-06-30 13:28:47 (11344): Disabling Clipboard Support for VM.
2022-06-30 13:28:48 (11344): Disabling Drag and Drop Support for VM.
2022-06-30 13:28:48 (11344): Adding storage controller(s) to VM.
2022-06-30 13:28:48 (11344): Adding virtual disk drive to VM. (ATLAS_vbox_1.13_image.vdi)
2022-06-30 13:28:49 (11344): Adding VirtualBox Guest Additions to VM.
2022-06-30 13:28:50 (11344): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2022-06-30 13:28:50 (11344): forwarding host port 51150 to guest port 80
2022-06-30 13:28:50 (11344): Enabling remote desktop for VM.
2022-06-30 13:28:51 (11344): Enabling shared directory for VM.
2022-06-30 13:28:51 (11344): Starting VM using VBoxManage interface. (boinc_448a99ec88da2bca, slot#1)
2022-06-30 13:29:02 (11344): Successfully started VM. (PID = '10080')
2022-06-30 13:29:02 (11344): Reporting VM Process ID to BOINC.
2022-06-30 13:29:02 (11344): Guest Log: BIOS: VirtualBox 6.1.34
2022-06-30 13:29:02 (11344): Guest Log: CPUID EDX: 0x178bfbff
2022-06-30 13:29:02 (11344): Guest Log: BIOS: No PCI IDE controller, not probing IDE
2022-06-30 13:29:02 (11344): Guest Log: BIOS: AHCI 0-P#0: PCHS=16383/16/63 LCHS=1024/255/63 0x0000000002800000 sectors
2022-06-30 13:29:02 (11344): VM state change detected. (old = 'poweredoff', new = 'running')
2022-06-30 13:29:02 (11344): Detected: Web Application Enabled (http://localhost:51150)
2022-06-30 13:29:02 (11344): Detected: Remote Desktop Enabled (localhost:51151)
2022-06-30 13:29:02 (11344): Preference change detected
2022-06-30 13:29:02 (11344): Setting CPU throttle for VM. (100%)
2022-06-30 13:29:03 (11344): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 120 seconds) or (Vbox_job.xml: 900 seconds))
2022-06-30 13:29:04 (11344): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2022-06-30 13:29:04 (11344): Guest Log: BIOS: Booting from Hard Disk...
2022-06-30 13:29:07 (11344): Guest Log: BIOS: KBD: unsupported int 16h function 03
2022-06-30 13:29:07 (11344): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=81
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=81
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=82
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=82
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=83
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=83
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=84
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=84
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=85
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=85
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=86
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=86
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=87
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=87
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=88
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=88
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=89
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=89
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8a
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8a
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8b
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8b
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8c
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8c
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8d
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8d
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8e
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8e
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8f
2022-06-30 13:29:07 (11344): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8f
2022-06-30 13:29:17 (11344): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2022-06-30 13:29:17 (11344): Guest Log: vboxguest: misc device minor 58, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2022-06-30 13:29:31 (11344): Guest Log: Checking CVMFS...
2022-06-30 13:29:41 (11344): Guest Log: CVMFS is ok
2022-06-30 13:29:41 (11344): Guest Log: Mounting shared directory
2022-06-30 13:29:42 (11344): Guest Log: Copying input files
2022-06-30 13:29:44 (11344): Guest Log: Copied input files into RunAtlas.
2022-06-30 13:29:49 (11344): Guest Log: copied the webapp to /var/www
2022-06-30 13:29:49 (11344): Guest Log: This VM did not configure a local http proxy via BOINC.
2022-06-30 13:29:49 (11344): Guest Log: Small home clusters do not require a local http proxy but it is suggested if
2022-06-30 13:29:49 (11344): Guest Log: more than 10 cores throughout the same LAN segment are regularly running ATLAS like tasks.
2022-06-30 13:29:49 (11344): Guest Log: Further information can be found at the LHC@home message board.
2022-06-30 13:29:49 (11344): Guest Log: Running cvmfs_config stat atlas.cern.ch
2022-06-30 13:29:50 (11344): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2022-06-30 13:29:50 (11344): Guest Log: 2.6.3.0 1458 0 29368 106212 3 1 1499279 4096000 0 65024 0 102 99.0196 639 1226 http://s1cern-cvmfs.openhtc.io/cvmfs/atlas.cern.ch DIRECT 1
2022-06-30 13:29:50 (11344): Guest Log: ATHENA_PROC_NUMBER=2
2022-06-30 13:29:50 (11344): Guest Log: *** Starting ATLAS job. (PandaID=5510419938 taskID=NULL&cor) ***
2022-06-30 13:29:51 (11344): Guest Log: VBoxService 5.2.32 r132073 (verbosity: 0) linux.amd64 (Jul 12 2019 10:32:28) release log
2022-06-30 13:29:51 (11344): Guest Log: 00:00:00.000268 main Log opened 2022-06-30T13:29:48.773140000Z
2022-06-30 13:29:51 (11344): Guest Log: 00:00:00.000432 main OS Product: Linux
2022-06-30 13:29:51 (11344): Guest Log: 00:00:00.000493 main OS Release: 3.10.0-957.27.2.el7.x86_64
2022-06-30 13:29:51 (11344): Guest Log: 00:00:00.000548 main OS Version: #1 SMP Mon Jul 29 17:46:05 UTC 2019
2022-06-30 13:29:51 (11344): Guest Log: 00:00:00.000603 main Executable: /opt/VBoxGuestAdditions-5.2.32/sbin/VBoxService
2022-06-30 13:29:51 (11344): Guest Log: 00:00:00.000605 main Process ID: 2187
2022-06-30 13:29:51 (11344): Guest Log: 00:00:00.000606 main Package type: LINUX_64BITS_GENERIC
2022-06-30 13:29:51 (11344): Guest Log: 00:00:00.002586 main 5.2.32 r132073 started. Verbose level = 0
2022-06-30 13:30:01 (11344): Guest Log: 00:00:10.008691 timesync vgsvcTimeSyncWorker: Radical guest time change: -7 188 330 661 000ns (GuestNow=1 656 588 600 449 933 000 ns GuestLast=1 656 595 788 780 594 000 ns fSetTimeLastLoop=true )
Fishing for tasks for my main cruncher without success so far: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4547
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 61
I tested the new application by running 9 tasks. I did not see any strange CVMFS behaviour; only the times between "Checking CVMFS..." and "CVMFS is ok" vary a lot, from 4 seconds up to 62 seconds, maybe because of a busy system and/or network load. The only peculiarity I noticed already yesterday and dug into further: when you have only 2 events and e.g. ATHENA_PROC_NUMBER=4 (the VM has 4 processors), there seems to be only 1 worker doing the 2 events in a row. Also, no worker info is displayed on console ALT-F2. I reduced the #cpu to 2 in app_config.xml and all was 'normal' (last 4 ATLAS tasks): ALT-F2 displays worker-1 and -2 processing, or 'event took ... sec.', and 'Processing HITS file...' was also shown.
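For anyone who wants to check their own logs, the gap can be read straight out of the vboxwrapper stderr output. A minimal sketch, assuming the "YYYY-MM-DD HH:MM:SS (pid): Guest Log: ..." format seen in the task output above; the helper is illustrative, not part of any LHC@home tooling:

```python
from datetime import datetime

def cvmfs_check_seconds(log_text):
    """Seconds between 'Checking CVMFS...' and 'CVMFS is ok' in a task log."""
    start = end = None
    for line in log_text.splitlines():
        stamp = line[:19]  # leading "YYYY-MM-DD HH:MM:SS"
        if "Checking CVMFS" in line:
            start = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        elif "CVMFS is ok" in line:
            end = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    if start is None or end is None:
        return None  # one of the markers is missing from the log
    return (end - start).total_seconds()

sample = """\
2022-06-30 13:29:31 (11344): Guest Log: Checking CVMFS...
2022-06-30 13:29:41 (11344): Guest Log: CVMFS is ok"""
print(cvmfs_check_seconds(sample))  # 10.0 for the two lines above
```

Running it over several task logs makes the 4-to-62-second spread easy to tabulate.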
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0
I think this is because the split into parallel processes only happens after the first event is processed. At that point most of the necessary data and libraries are loaded into memory, so the sub-processes can all share that memory. So if the task has only 2 events, there is only 1 event left to process in the parallel stage.
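A toy model of that hypothesis (not ATLAS or AthenaMP code; the function names are made up for illustration): if forking happens only after event 1, the parallel stage has N-1 events to spread over the workers, which caps how many workers ever get anything to do.

```python
def parallel_stage_events(total_events):
    """Events left after the first one is processed serially, pre-fork."""
    return max(total_events - 1, 0)

def busy_workers(total_events, athena_proc_number):
    """Workers that actually receive an event in the parallel stage."""
    return min(parallel_stage_events(total_events), athena_proc_number)

print(busy_workers(2, 4))    # 2-event task on 4 cores -> only 1 busy worker
print(busy_workers(200, 4))  # normal 200-event task -> all 4 workers busy
```

This reproduces the "only 1 worker" observation for a 2-event, 4-core task; whether it matches all the observed cases is discussed in the replies below.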
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0
The logfiles seem to be fine. @David: can we have more tasks, to test the bigger machines under load?
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 61
I think this is because the split into parallel processes only happens after the first event is processed. At that point most of the necessary data and libraries are loaded into memory, so the sub-processes can all share that memory. So if the task has only 2 events, there is only 1 event left to process in the parallel stage.
This does not explain why 2 events are processed in parallel when ATHENA_PROC_NUMBER=2, nor why there is no output on console F2. Moreover, with the normal tasks (200 events) I always see 4 workers starting at the same time.
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1
Crystal, here is your task for a restart: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3096418
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 61
Crystal,
Successful for you, I see. I had the same problem you had with some Theory tasks: out of the blue, tasks waiting to run (postponed) and wanting a clean-up of the VM environment. In the Virtual Media Manager there was still a 1.11 vdi; removing that did not help. After BOINC's restart I got the same again: "Couldn't find a boot medium." I stopped BOINC, removed the 1.13 media from the Virtual Media Manager and restarted BOINC. That was enough for a proper restart, but I had to destroy the old tasks. I've 5 ATLAS running at the moment.
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0
If anybody currently has a running ATLAS vbox task: please check the slot directory of the task for whether there are 2 vdi softlinks:
1. ATLAS_vbox_1.13_image.vdi
2. vm_image.vdi
If both are there, do both point to projects/lhcathomedev.cern.ch_lhcathome-dev/ATLAS_vbox_1.13_image.vdi?
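That check can be scripted. A minimal sketch in Python (the throwaway directory below only mimics a slot dir for the demo; on a real host, point `vdi_links` at the task's BOINC slots/N directory instead):

```python
import os
import tempfile

def vdi_links(slot_dir):
    """Map each expected vdi softlink in slot_dir to its target, if it is a link."""
    names = ("ATLAS_vbox_1.13_image.vdi", "vm_image.vdi")
    return {n: os.readlink(os.path.join(slot_dir, n))
            for n in names
            if os.path.islink(os.path.join(slot_dir, n))}

# Demo against a fake slot directory (symlink targets need not exist):
tmp = tempfile.mkdtemp()
target = "projects/lhcathomedev.cern.ch_lhcathome-dev/ATLAS_vbox_1.13_image.vdi"
os.symlink(target, os.path.join(tmp, "ATLAS_vbox_1.13_image.vdi"))
os.symlink(target, os.path.join(tmp, "vm_image.vdi"))
links = vdi_links(tmp)
print(all(t == target for t in links.values()))  # True: both point at the project vdi
```

If the dict comes back with only one key, the slot has only one softlink, which is what the question above is trying to pin down.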
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1
I've 5 ATLAS running at the moment.
My VirtualBox Manager shows as absolutely clean. Running one ATLAS (4 CPU) from production at the moment; CMS had finished without problems. Theory always gets postponed, as documented in the Theory thread. ATLAS also finished correctly; the upload stopped because of my squid 5.2, but that's no problem for me, it's -dev. BTW, the next test is Hyper-V again, because my CentOS8 VM is back (emergency mode).
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0
I cannot reproduce the error on my systems, even if I try to force it. To test whether a small change in the app_version definition solves it we would need a new app_version, but I'd like to save David's time. Hence, @CP, if you don't mind patching client_state.xml, please:
1. When all ATLAS work is reported, stop BOINC.
2. Insert the "open_name" line in the "app_version" section as shown below:
<file_ref>
  <file_name>ATLAS_vbox_1.13_image.vdi</file_name>
  <open_name>vm_image.vdi</open_name>
</file_ref>
3. Restart BOINC and run (if available) ATLAS tasks.
@David Cameron: please provide more ATLAS work.
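For anyone who prefers scripting the edit over doing it by hand, here is a minimal sketch using Python's ElementTree on an isolated file_ref fragment. It deliberately works on a string rather than the live file; always stop BOINC and back up client_state.xml before touching the real thing:

```python
import xml.etree.ElementTree as ET

# Isolated fragment mimicking the file_ref inside the app_version section.
fragment = """<file_ref>
    <file_name>ATLAS_vbox_1.13_image.vdi</file_name>
</file_ref>"""

ref = ET.fromstring(fragment)
if ref.find("open_name") is None:        # only add the element if it is missing
    open_name = ET.SubElement(ref, "open_name")
    open_name.text = "vm_image.vdi"

patched = ET.tostring(ref, encoding="unicode")
print(patched)
```

The same find-then-append logic would apply when iterating over the real client_state.xml tree, but the manual edit above is just as quick for a one-off test.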
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 61
Hence, @CP, if you don't mind patching client_state.xml, please ...
I have only 1 softlink file in the slot directory (ATLAS_vbox_1.13_image.vdi), with the softlink pointing to the file in the project directory. I have 1 task running and 1 task in the queue (suspended now). I'll try your 'edit' after the running task is reported.
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 61
I resumed my test task after I:
1. stopped VirtualBox Manager, to be sure that VBoxService's memory is cleaned and no files are present in the Media Manager
2. added the open_name line
3. restarted BOINC
As expected, the softlink file is now named vm_image.vdi, and there are no other links to the vdi.
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0
Thanks. Now we need tasks to test this configuration.
Joined: 24 Oct 19 Posts: 170 Credit: 543,238 RAC: 655
Thanks.
+1. No tasks, no test.
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 61
I have tested enough; no problems seen: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4600&offset=0&show_names=0&state=0&appid=5
Although these were all 2-core tasks running one at a time. Tomorrow I'll try to run some tasks concurrently on my other host, which has more RAM.
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1
ATLAS_vbox_job_1.13.xml:
<vbox_job>
<os_name>Linux26_64</os_name>
<memory_size_mb>2241</memory_size_mb>
<enable_network/>
<enable_remotedesktop/>
<enable_shared_directory/>
<copy_to_shared>init_data.xml</copy_to_shared>
<completion_trigger_file>atlas_done</completion_trigger_file>
<disable_automatic_checkpoints/>
<enable_vm_savestate_usage/>
<minimum_checkpoint_interval>900</minimum_checkpoint_interval>
<pf_guest_port>80</pf_guest_port>
<multiattach_vdi_file>ATLAS_vbox_1.13_image.vdi</multiattach_vdi_file>
</vbox_job>
Theory_2022_06_14.xml:
<vbox_job>
<os_name>Linux26_64</os_name>
<memory_size_mb>630</memory_size_mb>
<enable_network/>
<enable_remotedesktop/>
<copy_to_shared>init_data.xml</copy_to_shared>
<copy_to_shared>input</copy_to_shared>
<completion_trigger_file>shutdown</completion_trigger_file>
<enable_shared_directory/>
<pf_host_port>7859</pf_host_port>
<pf_guest_port>80</pf_guest_port>
<job_duration>864000</job_duration>
<enable_vm_savestate_usage/>
<disable_automatic_checkpoints/>
<heartbeat_filename>heartbeat</heartbeat_filename>
<minimum_heartbeat_interval>1200</minimum_heartbeat_interval>
<enable_screenshots_on_error/>
<multiattach_vdi_file>Theory_2020_05_08.vdi</multiattach_vdi_file>
</vbox_job>
CMS_2022_06_22.xml:
<vbox_job>
<os_name>Linux26_64</os_name>
<memory_size_mb>2048</memory_size_mb>
<enable_network/>
<enable_remotedesktop/>
<copy_to_shared>init_data.xml</copy_to_shared>
<completion_trigger_file>shutdown</completion_trigger_file>
<enable_shared_directory/>
<pf_host_port>7859</pf_host_port>
<pf_guest_port>80</pf_guest_port>
<job_duration>64800</job_duration>
<enable_vm_savestate_usage/>
<disable_automatic_checkpoints/>
<heartbeat_filename>heartbeat</heartbeat_filename>
<minimum_heartbeat_interval>1200</minimum_heartbeat_interval>
<multiattach_vdi_file>CMS_2022_06_22.vdi</multiattach_vdi_file>
</vbox_job>
Have those .xml's the correct entries for all?
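Since these job descriptions are plain XML, a quick sanity check can be scripted too. A sketch that verifies each file parses and names exactly one multiattach vdi (the file contents are inlined here for illustration, trimmed to the relevant elements; on a real host you would read the .xml files from the project directory):

```python
import xml.etree.ElementTree as ET

jobs = {
    "ATLAS_vbox_job_1.13.xml": """<vbox_job>
        <memory_size_mb>2241</memory_size_mb>
        <multiattach_vdi_file>ATLAS_vbox_1.13_image.vdi</multiattach_vdi_file>
    </vbox_job>""",
    "Theory_2022_06_14.xml": """<vbox_job>
        <memory_size_mb>630</memory_size_mb>
        <multiattach_vdi_file>Theory_2020_05_08.vdi</multiattach_vdi_file>
    </vbox_job>""",
    "CMS_2022_06_22.xml": """<vbox_job>
        <memory_size_mb>2048</memory_size_mb>
        <multiattach_vdi_file>CMS_2022_06_22.vdi</multiattach_vdi_file>
    </vbox_job>""",
}

vdis = {}
for name, xml_text in jobs.items():
    root = ET.fromstring(xml_text)          # raises if the XML is malformed
    vdis[name] = root.findtext("multiattach_vdi_file")
    print(name, "->", vdis[name])           # each job should name exactly one vdi
```

Parsing catches typos in the element names that eyeballing three similar files easily misses.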
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0
Have those .xml's the correct entries for all?
Yes. It's all fine for tasks from -dev. Be aware that some options represent default values that may be changed elsewhere, e.g. <memory_size_mb>.
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0
Rosetta has enough tasks. You already left a comment there (https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14930&postid=106465): "But i prefer an OFFICIAL app from the project." My guess is that will happen there in weeks, months, or maybe never.
©2024 CERN