Thread 'Testing CentOS 7 vbox image'

Author	Message
maeax Send message Joined: 22 Apr 16 Posts: 782 Credit: 4,057,880 RAC: 2,313	Message 6596 - Posted: 4 Sep 2019, 14:41:12 UTC I have to wait with the CentOS 7 -dev task until the two ATLAS production tasks, which always use 6 CPUs, are finished. Then I can save BOINC and hopefully start the CentOS 7 task again. I will not test a suspend because of your experience, Crystal. ID: 6596 · Rating: 0 · rate: / Reply Quote

maeax Send message Joined: 22 Apr 16 Posts: 782 Credit: 4,057,880 RAC: 2,313	Message 6597 - Posted: 4 Sep 2019, 17:45:00 UTC - in response to Message 6594. Edit: RAM 4800 MB. The RDP console is not usable in BOINC. The error is "VM Hypervisor failed to enter an online state in a timely fashion." I have installed an app_config. I restarted 4 times and the hypervisor failed every time. ID: 6597 · Rating: 0 · rate: / Reply Quote

maeax Send message Joined: 22 Apr 16 Posts: 782 Credit: 4,057,880 RAC: 2,313	Message 6598 - Posted: 5 Sep 2019, 15:36:06 UTC - in response to Message 6597. Is this VirtualBox error code the fault behind the hypervisor failure? VERR_SVM_IN_USE (for AMD). ID: 6598 · Rating: 0 · rate: / Reply Quote

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Send message Joined: 28 Jul 16 Posts: 527 Credit: 400,710 RAC: 0	Message 6599 - Posted: 5 Sep 2019, 18:30:27 UTC https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2813576 This ATLAS VM started until the login screen appeared on all consoles. Then it remained idle. ID: 6599 · Rating: 0 · rate: / Reply Quote

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Send message Joined: 28 Jul 16 Posts: 527 Credit: 400,710 RAC: 0	Message 6601 - Posted: 6 Sep 2019, 16:03:35 UTC - in response to Message 6590. [b]Compacting the ATLAS vdi file[/b]
David Cameron wrote: I've released version 0.81 which is a little smaller (3.4GB). The problem is that VirtualBox has a feature where if you write to disk it doesn't actually free space when files are deleted. So while "df" inside the VM reports 2.2GB used, the vdi is still 3.4GB, even after compacting.
To check the partition layout of the vdi file I attached it as /dev/sdb to a self-made VM and found the following partitions:
/dev/sdb1: xfs, 1.00 GiB
/dev/sdb2: lvm2 pv (centos), 18.05 GiB
/dev/sdb3: extended
/dev/sdb5: linux-swap, 972.00 MiB
The reason why this vdi can't be compacted might be that /dev/sdb2 is part of a logical volume. Hence I tried to reformat the partition with ext4.
Step 1: Clone ATLAS_vbox_0.81_image.vdi as ATLAS_vbox_0.81_image2.vdi and attach the clone as /dev/sdc to the VM.
Step 2: Reformat /dev/sdc2 with ext4 and reboot the VM.
Step 3: Copy all files from /dev/sdb2 to /dev/sdc2 (rsync).
Step 4: Zero the unused space on /dev/sdc1 and /dev/sdc2 (required for "VBoxManage ... --compact"!). To do this, use the temporary mountpoints /sdc/sdc1 and /sdc/sdc2, then run:
[pre]cat /dev/zero >/sdc/sdc1/tmpzero
rm /sdc/sdc1/tmpzero
cat /dev/zero >/sdc/sdc2/tmpzero
rm /sdc/sdc2/tmpzero[/pre]
Step 5: Compact ATLAS_vbox_0.81_image2.vdi
[pre]VBoxManage modifyhd ATLAS_vbox_0.81_image2.vdi --compact[/pre]
Results:
Original size: 3.1 GiB
Compacted (including the CVMFS cache): 2.3 GiB
Compacted (CVMFS cache removed): 897 MiB
Conclusion
A vdi partition formatted as part of an LVM volume seems to be overkill for a small standalone VM, especially as the vdi can't be compacted any more. Instead, a reliable filesystem like ext4 (or XFS) should be used as default.
The method above is not intended to be used by the volunteers, but (starting with step 4) it should be used by the vdi maintainers before the vdi is released for production. This would result in faster downloads as well as a faster VM setup on the client computer, since the vdi has to be copied from the project folder to the slots folder every time a new task starts. ID: 6601 · Rating: 0 · rate: / Reply Quote
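Steps 4 and 5 above could be wrapped in a small script along the following lines. This is only a sketch: the function names `zero_free_space` and `compact_vdi`, the `DRY_RUN` switch, and the extra `sync` are assumptions added for illustration and are not part of the procedure as posted. By default it only prints the commands, because filling a filesystem with zeros is not something to run by accident.

```shell
#!/bin/sh
# Sketch of steps 4 and 5. DRY_RUN=1 (the default) prints each command
# instead of executing it; set DRY_RUN=0 to really run them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 4, inside the helper VM: fill the free space of a mounted partition
# with zeros, then delete the filler file again.
zero_free_space() {
    mountpoint=$1
    # cat ends with "No space left on device"; that is the expected outcome.
    run sh -c "cat /dev/zero > '$mountpoint/tmpzero' || true"
    run sync                      # assumption: flush before deleting
    run rm -f "$mountpoint/tmpzero"
}

# Step 5, on the host: compact the cloned image.
compact_vdi() {
    run VBoxManage modifyhd "$1" --compact
}
```

With the mountpoints from the post, the calls would be `zero_free_space /sdc/sdc1`, `zero_free_space /sdc/sdc2` and then `compact_vdi ATLAS_vbox_0.81_image2.vdi`, each prefixed with `DRY_RUN=0` once the printed commands look right.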

maeax Send message Joined: 22 Apr 16 Posts: 782 Credit: 4,057,880 RAC: 2,313	Message 6602 - Posted: 8 Sep 2019, 8:11:40 UTC - in response to Message 6598.
maeax wrote: Is this variable from Virtualbox fault, when Hypervisor failed? VERR_SVM_IN_USE (For AMD).
The deadline was reached yesterday, so I canceled it. The hypervisor failed every time: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2812272 ID: 6602 · Rating: 0 · rate: / Reply Quote

David Cameron Project administrator Project developer Project tester Project scientist Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0	Message 6603 - Posted: 10 Sep 2019, 10:51:00 UTC - in response to Message 6601.
computezrmle wrote: The method above is not intended to be used by the volunteers but (starting with step 4) should be used by the vdi maintainers before the vdi is released for production. This would result in faster downloads as well as in a faster VM setup on the client computer since the vdi has to be copied from the project folder to the slots folder every time a new task starts.
I always do these last steps just before releasing a new vdi, but it could be as you say that the logical volume doesn't allow compacting. I will try to create a new VM with ext4 formatting to see if it helps. Thanks for the tips.
In the meantime I tried creating the image with Vbox version 6, and released 0.82 which has a 2.8GB image. This one should also fix the strange error you mentioned above where the CVMFS check failed and the WU got stuck doing nothing. ID: 6603 · Rating: 0 · rate: / Reply Quote

David Cameron Project administrator Project developer Project tester Project scientist Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0	Message 6604 - Posted: 10 Sep 2019, 14:54:12 UTC - in response to Message 6603. I made a new image with an xfs file system for version 0.83. It seems it saves a little space but not that much (~100MB). ID: 6604 · Rating: 0 · rate: / Reply Quote

Crystal Pellet Volunteer tester Send message Joined: 13 Feb 15 Posts: 1252 Credit: 996,478 RAC: 78	Message 6605 - Posted: 10 Sep 2019, 19:27:20 UTC - in response to Message 6604. David, is it possible to add the size of the HITS-file when moving to the shared folder? ID: 6605 · Rating: 0 · rate: / Reply Quote

David Cameron Project administrator Project developer Project tester Project scientist Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0	Message 6608 - Posted: 11 Sep 2019, 9:26:15 UTC - in response to Message 6605.
Crystal Pellet wrote: David, is it possible to add the size of the HITS-file when moving to the shared folder?
You have the size in the log messages:
2019-09-10 17:12:54 (166330): Guest Log: Looking for outputfile HITS.000649-2086812-31151._078090.pool.root.1
2019-09-10 17:12:54 (166330): Guest Log: HITS file was successfully produced
2019-09-10 17:12:54 (166330): Guest Log: -rw-------. 1 atlas atlas 9091177 Sep 10 15:12 /home/atlas/RunAtlas/HITS.000649-2086812-31151._078090.pool.root.1
"9091177" is the size of the HITS file in bytes. These test WUs only process 10 events, so it's much smaller than the real WU results. ID: 6608 · Rating: 0 · rate: / Reply Quote
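To pull that number out of a saved stderr.txt without reading it by eye, something like the following could work. It is a sketch that assumes the "ls -l" style Guest Log line keeps the layout shown above, with the size four fields after the permission string; the function name `hits_size` is made up for this example.

```shell
#!/bin/sh
# Print the size in bytes of the HITS file found in a BOINC stderr log
# read from stdin. Assumes an "ls -l" style Guest Log line as quoted above.
hits_size() {
    awk '/Guest Log: -/ && /HITS\./ {
        for (i = 1; i <= NF; i++)
            # the permission string (e.g. "-rw-------.") marks the start of
            # the ls -l output; the size is four fields further on
            if ($i ~ /^-[rwx-]+\.?$/) { print $(i + 4); exit }
    }'
}
```

It would be called as `hits_size < stderr.txt`, where stderr.txt is a local copy of the task output.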

Crystal Pellet Volunteer tester Send message Joined: 13 Feb 15 Posts: 1252 Credit: 996,478 RAC: 78	Message 6610 - Posted: 11 Sep 2019, 11:24:10 UTC - in response to Message 6608. Sorry David, I must have mixed up the results of vbox versus native version. Thus wrong thread. I don't see the size of the HITS-file in the native version. One of your results: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2819384 Your test workunits for v0.82 and v0.83 were quickly distributed, I suppose. ID: 6610 · Rating: 0 · rate: / Reply Quote

maeax Send message Joined: 22 Apr 16 Posts: 782 Credit: 4,057,880 RAC: 2,313	Message 6613 - Posted: 12 Sep 2019, 8:02:44 UTC - in response to Message 6602. Last modified: 12 Sep 2019, 8:21:03 UTC The hypervisor always fails: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2821665 David, is it possible to show a notice when the task starts that only VirtualBox 6.0.x should be used? With 5.2 the hypervisor fails after 10 minutes and you need to delete the BOINC VM in VirtualBox manually. ID: 6613 · Rating: 0 · rate: / Reply Quote

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Send message Joined: 28 Jul 16 Posts: 527 Credit: 400,710 RAC: 0	Message 6614 - Posted: 12 Sep 2019, 9:47:16 UTC I manually downloaded ATLAS_vbox_0.83_image.vdi and, as the dev server has no tasks available, I used the vdi to replace the original ATLAS vdi on a test client that is attached to the production server. That way the test client got a task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=245678240
The task is processing hits, but as it is a 1-core setup it will take a while until it is finished. The stderr.txt already shows some lines that should be investigated:
[pre]2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=81
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=81
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=82
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=82
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=83
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=83
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=84
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=84
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=85
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=85
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=86
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=86
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=87
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=87
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=88
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=88
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=89
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=89
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8a
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8a
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8b
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8b
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8c
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8c
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8d
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8d
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8e
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8e
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8f
2019-09-12 10:50:24 (15611): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8f[/pre]
[pre]2019-09-12 10:51:03 (15611): Guest Log: 10:51:00.680177 main Executable: /opt/VBoxGuestAdditions-6.0.12/sbin/VBoxService
2019-09-12 10:51:03 (15611): Guest Log: 10:51:00.680183 main Process ID: 2151
2019-09-12 10:51:03 (15611): Guest Log: 10:51:00.680183 main Package type: LINUX_64BITS_GENERIC
2019-09-12 10:51:03 (15611): Guest Log: 10:51:00.779481 main 6.0.12 r133076 started. Verbose level = 0[/pre]
The latter section points out that the VM uses vbox extensions 6.0.12. While this seems to work on my hosts (currently version 6.0.10), it might cause problems if the host runs older vbox extensions, as maeax mentioned.
BTW: The vdi size is 2.4 GiB (uncompressed) and since the CVMFS cache is prefilled with roughly 1.5 GiB I couldn't compact it to a smaller size. ID: 6614 · Rating: 0 · rate: / Reply Quote
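A quick way to compare the version reported inside the VM with the one installed on the host might look like the sketch below. The helper names `guest_additions_version`, `major_minor` and `same_series` are hypothetical, and treating "same major.minor series" as compatible is only an assumption drawn from the observation above that 6.0.12 in the guest runs on a 6.0.10 host.

```shell
#!/bin/sh
# Extract the Guest Additions version from stderr.txt lines on stdin,
# using the "/opt/VBoxGuestAdditions-<version>/" path shown in the log.
guest_additions_version() {
    sed -n 's|.*VBoxGuestAdditions-\([0-9][0-9.]*\)/.*|\1|p' | head -n 1
}

# Reduce a version such as "6.0.12" or "6.0.10r132072" to "6.0".
major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed when both versions belong to the same major.minor series.
# Assumption: this is a rough heuristic, not an official compatibility rule.
same_series() {
    [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}
```

On a host with VirtualBox installed, `same_series "$(guest_additions_version < stderr.txt)" "$(VBoxManage --version)"` would then flag a 5.2.x host running a 6.0.x image.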

David Cameron Project administrator Project developer Project tester Project scientist Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0	Message 6615 - Posted: 12 Sep 2019, 11:43:14 UTC - in response to Message 6614. I have put a lot more tasks in here now, it seems that some hosts take a lot of tasks at the same time which means there are none left for others. I wonder if I should go back to VBox 5.2 if there are issues with version 6. I see some results look ok with 5.2 though. Those int13_harddisk lines in the stderr have been there in every WU since I started this test, I tried to find out what causes it but couldn't find any information anywhere. I don't think it affects the tasks though. Do the consoles work for you? With the images produced by VBox 6 I couldn't get them to work (using rdesktop-vrdp on linux). ID: 6615 · Rating: 0 · rate: / Reply Quote

maeax Send message Joined: 22 Apr 16 Posts: 782 Credit: 4,057,880 RAC: 2,313	Message 6616 - Posted: 12 Sep 2019, 12:08:21 UTC - in response to Message 6615. Last modified: 12 Sep 2019, 12:15:09 UTC No David, don't go back to 5.2.x; just add a note for those who have version 5.2. Hmm, I will take a look at why others are running 5.2.x without problems. I have other ATLAS tasks from production running in parallel and saw this line: Is this variable from Virtualbox fault, when Hypervisor failed? VERR_SVM_IN_USE (For AMD). ID: 6616 · Rating: 0 · rate: / Reply Quote

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Send message Joined: 28 Jul 16 Posts: 527 Credit: 400,710 RAC: 0	Message 6617 - Posted: 12 Sep 2019, 12:24:34 UTC - in response to Message 6615.
David Cameron wrote: Those int13_harddisk lines in the stderr have been there in every WU since I started this test, I tried to find out what causes it but couldn't find any information anywhere.
It usually indicates a missing device (harddisk), so I first thought it was caused by the logical volume layout. Since the partitions are now XFS, there must be another reason. Did you (or CentOS) configure some kind of RAID when you initially started to set up the VM?
David Cameron wrote: I don't think it affects the tasks though.
Right.
David Cameron wrote: Do the consoles work for you? With the images produced by VBox 6 I couldn't get them to work (using rdesktop-vrdp on linux).
The consoles look very unfamiliar, especially the top output. It looks like a tailed logfile, and since it updates every few seconds it is hard to read. The process output at console 2 is much better since it updates less frequently. Part of the problem might be that I monitor the tasks from a remote machine, which always makes the consoles a bit sluggish.
The task has now been running for 3.5 h and has just finished Event nr. 15. ID: 6617 · Rating: 0 · rate: / Reply Quote

David Cameron Project administrator Project developer Project tester Project scientist Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0	Message 6618 - Posted: 12 Sep 2019, 14:39:47 UTC - in response to Message 6617. Ok, I went back to VirtualBox 5.2.32 and made a new vdi for version 0.84. I hope it fixes the issues with Vbox version 5. It's even slightly smaller than the previous version but maybe that's because I'm getting more efficient at creating these images :) I think the "top" output is messed up because the default console size is larger than before, so I've fixed it to fit the larger window. As for the device errors, the disk setup is just the default from CentOS 7 except I now use xfs instead of LVM. ID: 6618 · Rating: 0 · rate: / Reply Quote

maeax Send message Joined: 22 Apr 16 Posts: 782 Credit: 4,057,880 RAC: 2,313	Message 6619 - Posted: 12 Sep 2019, 15:44:39 UTC Last modified: 12 Sep 2019, 16:35:39 UTC Console F3 scrolls through the screen every 5 seconds. The console is the one from VirtualBox (Show VM); in BOINC Manager, Graphics and RDP are not shown. The hypervisor failed again after 10 minutes. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2821809 ID: 6619 · Rating: 0 · rate: / Reply Quote

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Send message Joined: 28 Jul 16 Posts: 527 Credit: 400,710 RAC: 0	Message 6620 - Posted: 12 Sep 2019, 15:47:04 UTC - in response to Message 6618. This time I got a task from the dev server. I guess this URL will change to an official host when you move it to the production environment:
[pre]<hostname censored> 3180 - - [12/Sep/2019:17:16:58 +0200] "GET http://dcameron.web.cern.ch/dcameron/dev/ATLASJobWrapper-test.sh HTTP/1.1" 200 7280 "-" "curl/7.29.0" TCP_REFRESH_MODIFIED:HIER_DIRECT[/pre]
Port 25085 is closed at my main firewall. This causes lots of failed requests like this:
[pre]<hostname censored> 3128 - - [12/Sep/2019:17:20:00 +0200] "GET http://pandaserver.cern.ch:25085/cache/schedconfig/BOINC-TEST.all.json HTTP/1.1" 503 4346 "-" "Python-urllib/2.7" TCP_MISS:HIER_DIRECT[/pre]
Somehow the task got a job from another source and is now processing events. Top output at console 3 is a bit better than before but still hard to read. ID: 6620 · Rating: 0 · rate: / Reply Quote
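To see how many requests a closed port is costing, the proxy log could be filtered as in the sketch below. It assumes the access log keeps the layout of the two excerpts above (request in the first quoted field, HTTP status right after it); the function name `count_failed_for_port` is invented for this example.

```shell
#!/bin/sh
# Count requests in a proxy access log (stdin) whose URL targets the given
# port and whose HTTP status is 5xx. Assumes the quoted-request layout of
# the log excerpts above.
count_failed_for_port() {
    awk -F'"' -v port="$1" '
        {
            split($2, req, " ")     # req[2] is the requested URL
            split($3, rest, " ")    # rest[1] is the HTTP status code
            if (req[2] ~ ("^[a-z]+://[^/]*:" port "/") && rest[1] ~ /^5[0-9][0-9]$/)
                n++
        }
        END { print n + 0 }'
}
```

For the case described here the call would be `count_failed_for_port 25085 < access.log`, with access.log standing in for wherever the local proxy writes its log.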

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Send message Joined: 28 Jul 16 Posts: 527 Credit: 400,710 RAC: 0	Message 6621 - Posted: 12 Sep 2019, 16:29:53 UTC - in response to Message 6620.
computezrmle wrote: Top output at console 3 is a bit better than before but still hard to read.
@David: To work out a method that makes the top output less sluggish, would you be so kind as to post which method you have currently implemented and why? ID: 6621 · Rating: 0 · rate: / Reply Quote

Development for LHC@home