Thread 'vboxwrapper issue'

Author	Message
Rasputin42 Volunteer tester Send message Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 1055 - Posted: 8 Sep 2015, 21:29:32 UTC I just noticed, that the vboxwrapper is constantly, without any interruption, writing to the boinc/slots/ x folder. ATLAS and CMS are doing the same thing. It is writing to the VBox.log file at a rate of 100 to 150 KByte/sec. However, the file is not updated and is only 65 KByte in size. Also, when when looking at the slot folder the time of last change of the VBox.log file jumps between the current time and about half an hour earlier back and forth. I moved the boinc data directory to a usb flash drive. Everything else seems to be working. Anybody have any ideas, what is going on? ID: 1055 · Rating: 0 · rate: / Reply Quote

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 20 Jan 15 Posts: 1155 Credit: 8,388,869 RAC: 1,080	Message 1057 - Posted: 8 Sep 2015, 22:54:55 UTC - in response to Message 1055. That's disturbing! Anyone else seeing this? Can't try myself as I'm at home at the moment (where my bandwidth has crept up to 2.2 Mbps lately, but still not enough to run CMS@Home). ID: 1057 · Rating: 0 · rate: / Reply Quote

Rasputin42 Volunteer tester Send message Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 1059 - Posted: 9 Sep 2015, 0:13:05 UTC - in response to Message 1057. I have checked it in more detail. I moved the boinc data directory back to the hard disk, just in case. I used process monitor to view it. It seems, vbox.log is written to about 3 times a second, without actually updating anything(size remains exactly the same, about 65KB). This applies to ATLAS as well. I understand, that logs need to be kept, but this is excessive. -------vboxwrapper_26169_windows_x86_64.exe---- Is the offender. ID: 1059 · Rating: 0 · rate: / Reply Quote

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 20 Jan 15 Posts: 1155 Credit: 8,388,869 RAC: 1,080	Message 1061 - Posted: 9 Sep 2015, 9:06:26 UTC - in response to Message 1059. I have checked it in more detail. I moved the boinc data directory back to the hard disk, just in case. I used process monitor to view it. It seems, vbox.log is written to about 3 times a second, without actually updating anything(size remains exactly the same, about 65KB). This applies to ATLAS as well. I understand, that logs need to be kept, but this is excessive. -------vboxwrapper_26169_windows_x86_64.exe---- Is the offender. It's a text file, right? Have you tried tail -f -- oh, you're not using Linux... It's just that 65536 is a convenient binary number, leading to thoughts of a limited buffer. Since it's not limited to CMS it sounds like something to be researched on Oracle's forums, if they have any. ID: 1061 · Rating: 0 · rate: / Reply Quote

PDW Send message Joined: 20 May 15 Posts: 217 Credit: 6,294,052 RAC: 795	Message 1063 - Posted: 9 Sep 2015, 9:18:50 UTC - in response to Message 1061. On the single job I still have running my VBox.log is only 58Kb in size and was last updated over an hour ago just after the job (and VM) got started. But my job is never getting a credential to get access to download any work. ID: 1063 · Rating: 0 · rate: / Reply Quote

Crystal Pellet Volunteer tester Send message Joined: 13 Feb 15 Posts: 1267 Credit: 1,027,748 RAC: 113	Message 1064 - Posted: 9 Sep 2015, 9:28:43 UTC First: It's not a vboxwrapper issue, but VirtualBox. I've seen too, that VBox.log is updated frequently - not always, but every now and then. The filesize not updating could be caused by the initial 64k filesize. As long as the written lines fits within that 64k, you don't see a difference. The same log is kept in slots/x/boinc_xxxxx/logs. Maybe the reason of frequent writing to that file is time outsyncing of the host and the guest. ID: 1064 · Rating: 0 · rate: / Reply Quote

Rasputin42 Volunteer tester Send message Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 1065 - Posted: 9 Sep 2015, 10:10:04 UTC I have deleted the Vbox.log and it reappears instantly. It says it has been created and updated 11 hours ago. I have renamed it, that works, but the original file reappears and still claims to have been created and last changed 11 hours ago. It is not updating but rewriting the file every time. They are absolutely identical. ID: 1065 · Rating: 0 · rate: / Reply Quote

PDW Send message Joined: 20 May 15 Posts: 217 Credit: 6,294,052 RAC: 795	Message 1066 - Posted: 9 Sep 2015, 10:50:19 UTC - in response to Message 1065. If I add a line of text to the file it saves it but then when I open it my line has gone. It does this even if the task is suspended, the VM needs to go to Saved state (by shutting down BOINC) before the constant updating stops and my line of text is preserved. Haven't got any Atlas tasks running to check that does the same. Not looking good for SSD's ID: 1066 · Rating: 0 · rate: / Reply Quote

PDW Send message Joined: 20 May 15 Posts: 217 Credit: 6,294,052 RAC: 795	Message 1067 - Posted: 9 Sep 2015, 12:21:27 UTC - in response to Message 1064. First: It's not a vboxwrapper issue, but VirtualBox. If the vboxwrapper is setting up how virtualbox runs is that not a vboxwrapper problem ? Perhaps the flush interval for the vbox logging is working to the same time scale as the 1 hour x509 credential request delay that lasts only 1 minute ! ID: 1067 · Rating: 0 · rate: / Reply Quote

Richard Haselgrove Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0	Message 1068 - Posted: 9 Sep 2015, 13:25:48 UTC There was a discussion about this on the boinc_alpha email last week. Selected extracts: David Anderson wrote A glance at the source code shows that vboxwrapper copies the entire log file, then looks at the last 16KB of the copy. It does this every second. Rom Walton wrote Ah, that makes sense now. I thought I had converted the log extraction over to using the COM API when running on Windows. My initial intent was to prevent interfering with VirtualBox if it was attempting to do something with its log file. ----- Rom Rom again - Can we improve this? Yes. Just switching over to the API should help a bit. Or at the minimum switches the disk IO over to the vboxheadless process. A couple of more changes will be required to really reduce IO though. IMachine::readLog() accepts offset and size parameters, so we could get the majority case (no new log messages have been written by VirtualBox) down to zero disk IO. ----- Rom So, maybe the next version of the wrapper? ID: 1068 · Rating: 0 · rate: / Reply Quote

Crystal Pellet Volunteer tester Send message Joined: 13 Feb 15 Posts: 1267 Credit: 1,027,748 RAC: 113	Message 1069 - Posted: 9 Sep 2015, 13:36:09 UTC - in response to Message 1067. Last modified: 9 Sep 2015, 13:39:02 UTC First: It's not a vboxwrapper issue, but VirtualBox. If the vboxwrapper is setting up how virtualbox runs is that not a vboxwrapper problem ? Maybe you're right. The vbox.log from the Logs-directory (controlled by VirtualBox) is copied to the slots directory. Probably while the wrapper add Guest log information from there to the stderr.txt. Edit: Meanwhile Richard jumped onto this. Thanks Richard! ID: 1069 · Rating: 0 · rate: / Reply Quote

PDW Send message Joined: 20 May 15 Posts: 217 Credit: 6,294,052 RAC: 795	Message 1070 - Posted: 9 Sep 2015, 14:08:41 UTC - in response to Message 1068. There was a discussion about this on the boinc_alpha email last week. So, maybe the next version of the wrapper? Do we have a rough ETA for that ? How long does it take to get a new wrapper to be used on a project ? ID: 1070 · Rating: 0 · rate: / Reply Quote

Rasputin42 Volunteer tester Send message Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 1072 - Posted: 9 Sep 2015, 14:26:58 UTC Not looking good for SSD's That is my concern.If it is writing the same file with the same length every second or more often, it is going to wear down that sector on the SSD in no time flat. A minimum of 86400 writes per day. I wonder, if the write buffer would catch(minimize) that. ID: 1072 · Rating: 0 · rate: / Reply Quote

PDW Send message Joined: 20 May 15 Posts: 217 Credit: 6,294,052 RAC: 795	Message 1073 - Posted: 9 Sep 2015, 14:31:01 UTC - in response to Message 1068. I thought I had converted the log extraction over to using the COM API when running on Windows. Richard, does this mean it only affects Windows ? ID: 1073 · Rating: 0 · rate: / Reply Quote

Richard Haselgrove Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0	Message 1074 - Posted: 9 Sep 2015, 14:40:35 UTC - in response to Message 1070. There was a discussion about this on the boinc_alpha email last week. So, maybe the next version of the wrapper? Do we have a rough ETA for that ? How long does it take to get a new wrapper to be used on a project ? Jacob Klein started that conversation last Wednesday (2nd September), because he saw the same heavy disk usage at RNA World. First of all, somebody - probably Rom Walton - would have to code the change, and there's no sign of that yet. Rom is no longer a paid employee of BOINC (since the BOINC project failed to get renewal of their NSF grant funding in the summer), and is coding as a volunteer in his spare time. With last weekend being the Labor Day holiday in the USA, I don't blame him taking a break. Compiled wrappers are usually available for downloading soon after coding, but I imagine projects using them would want a period of in-house testing before deploying them for general use. There is a wrapper v26170 already - one newer than we use here - but that was compiled on 25 August, so it won't have this fix. ID: 1074 · Rating: 0 · rate: / Reply Quote

Richard Haselgrove Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0	Message 1075 - Posted: 9 Sep 2015, 14:41:51 UTC - in response to Message 1073. I thought I had converted the log extraction over to using the COM API when running on Windows. Richard, does this mean it only affects Windows ? Someone else will have to answer that one - any 'nixers out there able to comment? ID: 1075 · Rating: 0 · rate: / Reply Quote

Jim1348 Send message Joined: 17 Aug 15 Posts: 17 Credit: 228,358 RAC: 0	Message 1076 - Posted: 9 Sep 2015, 15:34:58 UTC - in response to Message 1072. Not looking good for SSD's That is my concern.If it is writing the same file with the same length every second or more often, it is going to wear down that sector on the SSD in no time flat. A minimum of 86400 writes per day. I wonder, if the write buffer would catch(minimize) that. The SSD wear-leveling algorithm (implemented via the firmware) will spread out the writes to different cells, even if the logical address is the same. I doubt that the small number of writes to a log will bother it at all. On the other hand, for projects where writes are a real problems (WCG/CEP2 most of all, but also CPDN and ATLAS), I use a write cache (PrimoCache) or a ramdisk. That is mainly to save the SSDs I use, but it also reduces the error rates. Apparently a high write-rate causes contention that some of the programs don't handle very well. ID: 1076 · Rating: 0 · rate: / Reply Quote

Rasputin42 Volunteer tester Send message Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 1096 - Posted: 16 Sep 2015, 19:26:19 UTC Last modified: 16 Sep 2015, 19:41:12 UTC Writes Write Bytes Path 3.215 152.925.964 <Total> 1.707 115.524.888 \slots\7\VBox.log 1.502 37.398.904 \slots\7\vm_image.vdi 3 1.521 \slots\7\boinc_task_state.xml 3 651 \slots\7\vbox_checkpoint.xml This is a list of all file writes during a 15min period to the active CMS slot folder. Please notice the excessive write activity to the vbox.log file. It writes 3 times the amount of data a single log file than to the ENTIRE VM_IMAGE. I really think, someone should address that sooner than later ID: 1096 · Rating: 0 · rate: / Reply Quote

Crystal Pellet Volunteer tester Send message Joined: 13 Feb 15 Posts: 1267 Credit: 1,027,748 RAC: 113	Message 1223 - Posted: 12 Oct 2015, 6:34:13 UTC Hello Ivan/Laurence, I'm testing vboxwrapper v26175. In that version the huge number of disk i/o's of slot's vbox.log should be reduced. I had an issue overnight with this version reading the Volunteer Info: . . . 08:00:01 +0200 2015-10-12 [INFO] Reading the BOINC volunteer's information 08:00:02 +0200 2015-10-12 [INFO] Volunteer: () Host: 08:00:02 +0200 2015-10-12 [INFO] BOINC_USERID is not an integer 08:00:02 +0200 2015-10-12 [INFO] Going to sleep for 1 hour 08:01:02 +0200 2015-10-12 [INFO] Starting CMS Application - Run 8 08:01:02 +0200 2015-10-12 [INFO] Reading the BOINC volunteer's information 08:01:03 +0200 2015-10-12 [INFO] Volunteer: () Host: 08:01:03 +0200 2015-10-12 [INFO] BOINC_USERID is not an integer 08:01:03 +0200 2015-10-12 [INFO] Going to sleep for 1 hour until I rebooted the VM this morning. From where do you read that info? Is it from init_data.xml in the slot directory? 08:13:01 +0200 2015-10-12 [INFO] Starting CMS Application - Run 9 08:13:01 +0200 2015-10-12 [INFO] Reading the BOINC volunteer's information 08:13:02 +0200 2015-10-12 [INFO] Volunteer: Crystal Pellet (38) Host: 37 08:13:02 +0200 2015-10-12 [INFO] VMID: a248a608-bb13-4ecc-8fba-70015f0a4b90 08:13:02 +0200 2015-10-12 [INFO] Requesting an X509 credential subject : /O=Volunteer Computing/O=CERN/CN=CrystalPellet 38/CN=58327480 issuer : /O=Volunteer Computing/O=CERN/CN=CrystalPellet 38 identity : /O=Volunteer Computing/O=CERN/CN=CrystalPellet 38 type : RFC 3820 compliant impersonation proxy strength : 1024 bits path : /tmp/x509up_u500 timeleft : 129:59:57 (5.4 days) 08:13:08 +0200 2015-10-12 [INFO] Downloading glidein 08:13:15 +0200 2015-10-12 [INFO] Running glidein (check logs) Btw: sleep 1 hour seems to be 1 minute. ID: 1223 · Rating: 0 · rate: / Reply Quote

Laurence CERN Project administrator Project developer Project tester Send message Joined: 12 Sep 14 Posts: 1156 Credit: 342,328 RAC: 0	Message 1227 - Posted: 12 Oct 2015, 11:52:41 UTC - in response to Message 1223. In the slot directory you will find a floppy image file. This is attached to the VM and the user information can then be read from /dev/fd0. Although the cron sleeps for 1 hour, the next cron job is run. I guess this is a bug :( ID: 1227 · Rating: 0 · rate: / Reply Quote

Development for LHC@home