Message boards : Number crunching : CMS VBox cannot access vm_image.vdi
Tuna Ertemalp
Joined: 21 Apr 15
Posts: 15
Credit: 106,597
RAC: 0
Message 290 - Posted: 25 Apr 2015, 6:41:15 UTC
Last modified: 25 Apr 2015, 6:50:04 UTC

So, I have many projects crunching, including CMS/ATLAS/vLHC. As such, sometimes they run at the same time. That is what is happening right now, and CMS is in the "postponed" state: http://1drv.ms/1QrWc4H

I looked into Oracle VM VBox, and saw that both of the vLHC (http://1drv.ms/1QrWF71 and http://1drv.ms/1QrWGrz) and CMS (http://1drv.ms/1QrWHM5) are trying to mount vm_image.vdi, with vLHC succeeding and CMS failing. Also, ATLAS was running, with no problems: http://1drv.ms/1QrWIzS

At first, I had thought they were mounting the same file, but upon further investigation in Oracle VM VBox, it seems vLHC VBoxes were using D:\BOINC_DATA\slots\31\vm_image.vdi and D:\BOINC_DATA\slots\25\vm_image.vdi, while CMS was attaching to D:\BOINC_DATA\slots\11\vm_image.vdi. In other words, different files.
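For anyone who wants to make the same comparison without clicking through the VirtualBox GUI, here is a minimal Python sketch that lists every vm_image.vdi under a BOINC data directory. The `slots/<n>/vm_image.vdi` layout is taken from the paths above; the function name `find_vm_images` is made up for illustration, and the data directory will differ from machine to machine.

```python
from pathlib import Path


def find_vm_images(data_dir, image_name="vm_image.vdi"):
    """Return {slot_name: image_path} for each slot directory that holds the image.

    Assumes the layout <data_dir>/slots/<slot>/<image_name>, as in the paths
    quoted in this thread. Returns an empty dict if there is no slots directory.
    """
    slots = Path(data_dir) / "slots"
    if not slots.is_dir():
        return {}
    found = {}
    for image in sorted(slots.glob(f"*/{image_name}")):
        # The parent directory name is the slot number, e.g. "11" or "25".
        found[image.parent.name] = image
    return found
```

Calling `find_vm_images(r"D:\BOINC_DATA")` on this machine should report one entry per slot that currently holds an image, which makes it easy to confirm that each VM has its own file.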

Then, when I hovered my mouse pointer over the filename, a descriptive message popped up: http://1drv.ms/1ITChqd

The XML file it is referring to is at http://1drv.ms/1E2j9VH.

This machine has already processed one CMS job and returned it two days ago for 749.55 credit, and this would be the second CMS job it is working on: http://boincai05.cern.ch/CMS-dev/results.php?hostid=315

So, I don't know why the first one worked and this one does not (at least for now), and I don't know whether it will eventually leave this Postponed state, continue, and finish.

Just thought I'd let folk know in case this means something.

Thanks
Tuna
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 431
Message 291 - Posted: 25 Apr 2015, 7:38:31 UTC - in response to Message 290.  

So, I don't know why the first one worked and this one does not (at least for now), and I don't know whether it will eventually leave this Postponed state, continue, and finish.

The postponed state of the VM (BOINC task waiting to run) often occurs when the VirtualBox process VBoxSVC.exe was not able to react in a timely fashion.
Maybe it was too busy, or too many VMs were running.
After 86400 seconds (24 hours), BOINC will try to restart the task.
You could also restart BOINC to speed up the resumption of the task.
Tuna Ertemalp
Joined: 21 Apr 15
Posts: 15
Credit: 106,597
RAC: 0
Message 296 - Posted: 26 Apr 2015, 22:06:42 UTC - in response to Message 291.  

So, I don't know why the first one worked and this one does not (at least for now), and I don't know whether it will eventually leave this Postponed state, continue, and finish.

The postponed state of the VM (BOINC task waiting to run) often occurs when the VirtualBox process VBoxSVC.exe was not able to react in a timely fashion.
Maybe it was too busy, or too many VMs were running.
After 86400 seconds (24 hours), BOINC will try to restart the task.
You could also restart BOINC to speed up the resumption of the task.


Hmmm... That doesn't seem to be the case. To test the situation, I suspended all projects, including CMS, rebooted the machine, waited until everything settled down after boot, and then un-suspended only CMS, so that it was the only task running. Still, after about 15 minutes, it entered the same postponed state at 61.480%, with this in the log: "4/26/2015 2:46:48 PM | CMS-dev | task postponed 86400.000000 sec: VM Hypervisor failed to enter an online state in a timely fashion."
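To watch for this across several machines, a small Python sketch like the one below could pull the delay and the reason out of such a log line. The regular expression is based only on the single log line quoted above, so treat the pattern as an assumption about the message format; `parse_postponed` is a hypothetical helper name, not part of BOINC.

```python
import re

# Pattern derived from the one log line quoted in this thread; other BOINC
# versions may word the message differently.
POSTPONED_RE = re.compile(
    r"task postponed (?P<seconds>\d+(?:\.\d+)?) sec: (?P<reason>.+)$"
)


def parse_postponed(line):
    """Return (delay_seconds, reason) for a 'task postponed' line, else None."""
    match = POSTPONED_RE.search(line.strip())
    if match is None:
        return None
    return float(match.group("seconds")), match.group("reason").strip()


if __name__ == "__main__":
    sample = (
        "4/26/2015 2:46:48 PM | CMS-dev | task postponed 86400.000000 sec: "
        "VM Hypervisor failed to enter an online state in a timely fashion."
    )
    seconds, reason = parse_postponed(sample)
    # 86400 seconds is 24 hours, the retry delay mentioned earlier in the thread.
    print(f"postponed for {seconds / 3600:.0f} h: {reason}")
```

Feeding each line of the event log through `parse_postponed` and keeping the non-`None` results would give a quick list of postponed tasks and their stated reasons.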

Booting the Oracle VM VBox, I first see http://1drv.ms/1JtEdJl, then http://1drv.ms/1JtEnjK after clicking Check.

So, somehow the vm_image.vdi seems to be bad. Since nobody other than CMS is using it, there might be a problem with whatever CMS is doing.

If anybody wants any data from my machine, please let me know. I'll keep this task around for a while as-is.

Tuna
David Duvall
Joined: 22 Apr 15
Posts: 3
Credit: 2,912,737
RAC: 0
Message 298 - Posted: 27 Apr 2015, 2:32:57 UTC - in response to Message 296.  

Have you checked the VB Virtual Media Manager to see if you have any flagged files that need to be deleted? In the BOINC manager do you have the "leave application in memory when suspended" option checked? I'm asking because it can gobble up a lot of memory and the time for swapping the virtual memory with physical memory can cause the error you are seeing. Just a thought from a foolish old man that hasn't programmed since the 1980s.
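The same check can also be done from the command line with `VBoxManage list hdds`, which prints the registered disk images. Below is a rough Python sketch that picks out the ones marked inaccessible. It assumes the output consists of `Key: value` lines grouped into blocks separated by blank lines, with `State` and `Location` fields; that format may vary between VirtualBox versions, and the helper names are made up for illustration.

```python
import subprocess


def parse_hdd_list(text):
    """Split 'Key: value' blocks (separated by blank lines) into dicts."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so Windows paths like D:\... survive.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def inaccessible_media(text):
    """Return the Location of every record whose State is 'inaccessible'."""
    return [
        record.get("Location", "")
        for record in parse_hdd_list(text)
        if record.get("State") == "inaccessible"
    ]


def list_hdds():
    """Run VBoxManage (must be on PATH) and return its raw text output."""
    result = subprocess.run(
        ["VBoxManage", "list", "hdds"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a machine with VirtualBox installed, `inaccessible_media(list_hdds())` should return the paths of any flagged images, which can then be compared with the BOINC slot directories.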
Tuna Ertemalp
Joined: 21 Apr 15
Posts: 15
Credit: 106,597
RAC: 0
Message 302 - Posted: 27 Apr 2015, 5:18:38 UTC - in response to Message 298.  

Have you checked the VB Virtual Media Manager to see if you have any flagged files that need to be deleted? In the BOINC manager do you have the "leave application in memory when suspended" option checked? I'm asking because it can gobble up a lot of memory and the time for swapping the virtual memory with physical memory can cause the error you are seeing. Just a thought from a foolish old man that hasn't programmed since the 1980s.


Nobody who was in programming during the 80s was a fool... :)

I believe your question about VB/VMM is answered in the 2nd screenshot I linked to. There is this vm_image.vdi that seems to be in trouble, but I don't know what put it into that troublesome state. There don't seem to be any other alerts about anything else.

I had "leave apps in mem" turned on a few weeks ago, before I had even heard of CMS, and quickly realized that my 12G of RAM was filling up under the weight of suspended tasks from a subset of the 40+ projects running on this machine, with apps set to switch every 15 minutes. So, within a few days, I regretted that decision and turned it off. I guess my next machine will need 256G of RAM!

Any other ideas?

Tuna
Tuna Ertemalp
Joined: 21 Apr 15
Posts: 15
Credit: 106,597
RAC: 0
Message 306 - Posted: 30 Apr 2015, 21:22:57 UTC

So, I cancelled that task, since nobody asked me for more info from that machine. But since then, I have noticed the same thing happening on a lot of my machines, always because vm_image.vdi became inaccessible, with data similar to what I already linked to in the earlier screenshots.

Is no CMS admin/programmer really interested in this??

Tuna

