Message boards :
Number crunching :
vboxwrapper issue
Message board moderation
Author | Message |
---|---|
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I just noticed, that the vboxwrapper is constantly, without any interruption, writing to the boinc/slots/ x folder. ATLAS and CMS are doing the same thing. It is writing to the VBox.log file at a rate of 100 to 150 KByte/sec. However, the file is not updated and is only 65 KByte in size. Also, when when looking at the slot folder the time of last change of the VBox.log file jumps between the current time and about half an hour earlier back and forth. I moved the boinc data directory to a usb flash drive. Everything else seems to be working. Anybody have any ideas, what is going on? |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 37 |
That's disturbing! Anyone else seeing this? Can't try myself as I'm at home at the moment (where my bandwidth has crept up to 2.2 Mbps lately, but still not enough to run CMS@Home). |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I have checked it in more detail. I moved the boinc data directory back to the hard disk, just in case. I used process monitor to view it. It seems, vbox.log is written to about 3 times a second, without actually updating anything(size remains exactly the same, about 65KB). This applies to ATLAS as well. I understand, that logs need to be kept, but this is excessive. -------vboxwrapper_26169_windows_x86_64.exe---- Is the offender. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 37 |
I have checked it in more detail. I moved the boinc data directory back to the hard disk, just in case. It's a text file, right? Have you tried tail -f -- oh, you're not using Linux... It's just that 65536 is a convenient binary number, leading to thoughts of a limited buffer. Since it's not limited to CMS it sounds like something to be researched on Oracle's forums, if they have any. |
Send message Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 487 |
On the single job I still have running my VBox.log is only 58Kb in size and was last updated over an hour ago just after the job (and VM) got started. But my job is never getting a credential to get access to download any work. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 8 |
First: It's not a vboxwrapper issue, but VirtualBox. I've seen too, that VBox.log is updated frequently - not always, but every now and then. The filesize not updating could be caused by the initial 64k filesize. As long as the written lines fits within that 64k, you don't see a difference. The same log is kept in slots/x/boinc_xxxxx/logs. Maybe the reason of frequent writing to that file is time outsyncing of the host and the guest. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I have deleted the Vbox.log and it reappears instantly. It says it has been created and updated 11 hours ago. I have renamed it, that works, but the original file reappears and still claims to have been created and last changed 11 hours ago. It is not updating but rewriting the file every time. They are absolutely identical. |
Send message Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 487 |
If I add a line of text to the file it saves it but then when I open it my line has gone. It does this even if the task is suspended, the VM needs to go to Saved state (by shutting down BOINC) before the constant updating stops and my line of text is preserved. Haven't got any Atlas tasks running to check that does the same. Not looking good for SSD's |
Send message Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 487 |
First: It's not a vboxwrapper issue, but VirtualBox. If the vboxwrapper is setting up how virtualbox runs is that not a vboxwrapper problem ? Perhaps the flush interval for the vbox logging is working to the same time scale as the 1 hour x509 credential request delay that lasts only 1 minute ! |
Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
There was a discussion about this on the boinc_alpha email last week. Selected extracts: David Anderson wrote Rom Walton wrote Rom again So, maybe the next version of the wrapper? |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 8 |
First: It's not a vboxwrapper issue, but VirtualBox. Maybe you're right. The vbox.log from the Logs-directory (controlled by VirtualBox) is copied to the slots directory. Probably while the wrapper add Guest log information from there to the stderr.txt. Edit: Meanwhile Richard jumped onto this. Thanks Richard! |
Send message Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 487 |
There was a discussion about this on the boinc_alpha email last week. Do we have a rough ETA for that ? How long does it take to get a new wrapper to be used on a project ? |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Not looking good for SSD's That is my concern.If it is writing the same file with the same length every second or more often, it is going to wear down that sector on the SSD in no time flat. A minimum of 86400 writes per day. I wonder, if the write buffer would catch(minimize) that. |
Send message Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 487 |
I thought I had converted the log extraction over to using the COM API when running on Windows. Richard, does this mean it only affects Windows ? |
Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
There was a discussion about this on the boinc_alpha email last week. Jacob Klein started that conversation last Wednesday (2nd September), because he saw the same heavy disk usage at RNA World. First of all, somebody - probably Rom Walton - would have to code the change, and there's no sign of that yet. Rom is no longer a paid employee of BOINC (since the BOINC project failed to get renewal of their NSF grant funding in the summer), and is coding as a volunteer in his spare time. With last weekend being the Labor Day holiday in the USA, I don't blame him taking a break. Compiled wrappers are usually available for downloading soon after coding, but I imagine projects using them would want a period of in-house testing before deploying them for general use. There is a wrapper v26170 already - one newer than we use here - but that was compiled on 25 August, so it won't have this fix. |
Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
I thought I had converted the log extraction over to using the COM API when running on Windows. Someone else will have to answer that one - any 'nixers out there able to comment? |
Send message Joined: 17 Aug 15 Posts: 17 Credit: 228,358 RAC: 0 |
Not looking good for SSD's The SSD wear-leveling algorithm (implemented via the firmware) will spread out the writes to different cells, even if the logical address is the same. I doubt that the small number of writes to a log will bother it at all. On the other hand, for projects where writes are a real problems (WCG/CEP2 most of all, but also CPDN and ATLAS), I use a write cache (PrimoCache) or a ramdisk. That is mainly to save the SSDs I use, but it also reduces the error rates. Apparently a high write-rate causes contention that some of the programs don't handle very well. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Writes Write Bytes Path 3.215 152.925.964 <Total> 1.707 115.524.888 \slots\7\VBox.log 1.502 37.398.904 \slots\7\vm_image.vdi 3 1.521 \slots\7\boinc_task_state.xml 3 651 \slots\7\vbox_checkpoint.xml This is a list of all file writes during a 15min period to the active CMS slot folder. Please notice the excessive write activity to the vbox.log file. It writes 3 times the amount of data a single log file than to the ENTIRE VM_IMAGE. I really think, someone should address that sooner than later |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 8 |
Hello Ivan/Laurence, I'm testing vboxwrapper v26175. In that version the huge number of disk i/o's of slot's vbox.log should be reduced. I had an issue overnight with this version reading the Volunteer Info: . . . 08:00:01 +0200 2015-10-12 [INFO] Reading the BOINC volunteer's information 08:00:02 +0200 2015-10-12 [INFO] Volunteer: () Host: 08:00:02 +0200 2015-10-12 [INFO] BOINC_USERID is not an integer 08:00:02 +0200 2015-10-12 [INFO] Going to sleep for 1 hour 08:01:02 +0200 2015-10-12 [INFO] Starting CMS Application - Run 8 08:01:02 +0200 2015-10-12 [INFO] Reading the BOINC volunteer's information 08:01:03 +0200 2015-10-12 [INFO] Volunteer: () Host: 08:01:03 +0200 2015-10-12 [INFO] BOINC_USERID is not an integer 08:01:03 +0200 2015-10-12 [INFO] Going to sleep for 1 hour until I rebooted the VM this morning. From where do you read that info? Is it from init_data.xml in the slot directory? 08:13:01 +0200 2015-10-12 [INFO] Starting CMS Application - Run 9 08:13:01 +0200 2015-10-12 [INFO] Reading the BOINC volunteer's information 08:13:02 +0200 2015-10-12 [INFO] Volunteer: Crystal Pellet (38) Host: 37 08:13:02 +0200 2015-10-12 [INFO] VMID: a248a608-bb13-4ecc-8fba-70015f0a4b90 08:13:02 +0200 2015-10-12 [INFO] Requesting an X509 credential subject : /O=Volunteer Computing/O=CERN/CN=CrystalPellet 38/CN=58327480 issuer : /O=Volunteer Computing/O=CERN/CN=CrystalPellet 38 identity : /O=Volunteer Computing/O=CERN/CN=CrystalPellet 38 type : RFC 3820 compliant impersonation proxy strength : 1024 bits path : /tmp/x509up_u500 timeleft : 129:59:57 (5.4 days) 08:13:08 +0200 2015-10-12 [INFO] Downloading glidein 08:13:15 +0200 2015-10-12 [INFO] Running glidein (check logs) Btw: sleep 1 hour seems to be 1 minute. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
In the slot directory you will find a floppy image file. This is attached to the VM and the user information can then be read from /dev/fd0. Although the cron sleeps for 1 hour, the next cron job is run. I guess this is a bug :( |
©2025 CERN