Message boards : News : VBox wrapper problems

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 278
Message 107 - Posted: 20 Mar 2015, 9:19:24 UTC

After upgrading to version 26155 of the VBox wrappers, we have experienced some problems. Rather than reverting to a working state, we are going to push forward and help debug them. We hope that this way our development project can then help those in production.


Cheers,

Laurence
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 108 - Posted: 20 Mar 2015, 12:18:07 UTC - in response to Message 107.  

Hi Laurence,

Who absorbed all the tasks?
This morning there were more than 4,000; now there are 0 (zero).

Without tasks, no testing.
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 109 - Posted: 20 Mar 2015, 12:46:46 UTC - in response to Message 108.  

Who absorbed all the tasks?

I think Zombie did ;)

All tasks went through in 14-40 seconds and were validated OK ??
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 278
Message 111 - Posted: 20 Mar 2015, 13:31:49 UTC - in response to Message 109.  

Can Zombie reduce the number of machines? We really appreciate the support but at the moment we don't need the scale as the jobs are just sample jobs. The cycles would be better used in other projects for now.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 278
Message 112 - Posted: 20 Mar 2015, 13:33:21 UTC - in response to Message 111.  

I have just sent out 20K work units.
zombie67 [MM]
Joined: 26 Feb 15
Posts: 26
Credit: 4,101,356
RAC: 0
Message 113 - Posted: 20 Mar 2015, 14:10:48 UTC

I just checked. The tasks I crunched were only across a handful of machines. The number of machines is not the problem, it's the extremely short run time. Any of us could have burned through them. I just had the luck of the BOINC back-off algorithm.
Reno, NV
Team: SETI.USA
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 114 - Posted: 20 Mar 2015, 15:13:27 UTC - in response to Message 113.  

Sorry, this time I burned 82 tasks into errors. I forgot to change vm_cache to vm_image in app_info.

My current task is running fine, and I set its duration to 3 hours: http://boincai05.cern.ch/CMS-dev/results.php?hostid=37

I'm running with vboxwrapper_26155_windows_x86_64.pdb to capture possible debug information.
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 115 - Posted: 20 Mar 2015, 15:26:12 UTC - in response to Message 113.  

I just checked. The tasks I crunched were only across a handful of machines. The number of machines is not the problem, it's the extremely short run time. Any of us could have burned through them. I just had the luck of the BOINC back-off algorithm.

The question is: why are the tasks running so briefly on those machines?
As far as I can see, only your machines are affected.
The results contain the line "VM Completion File Detected", which ends the task.

Your machines are still burning through the tasks. Please set No New Work on all of your machines except one and try to find out what's going on.
That would help the project.
zombie67 [MM]
Joined: 26 Feb 15
Posts: 26
Credit: 4,101,356
RAC: 0
Message 116 - Posted: 20 Mar 2015, 15:32:26 UTC

They are Macs. Could that have something to do with the short run time?
Reno, NV
Team: SETI.USA
Ben Segal
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 12 Sep 14
Posts: 65
Credit: 544
RAC: 0
Message 118 - Posted: 20 Mar 2015, 16:04:32 UTC - in response to Message 116.  

They are Macs. Could that have something to do with the short run time?

Yes, there are known bugs with the Mac and Linux 26155 vboxwrapper which we are currently debugging. This causes rapid task termination and of course eats these tasks so please hold off for now, OK?
zombie67 [MM]
Joined: 26 Feb 15
Posts: 26
Credit: 4,101,356
RAC: 0
Message 119 - Posted: 20 Mar 2015, 16:05:55 UTC - in response to Message 118.  
Last modified: 20 Mar 2015, 16:08:32 UTC

They are Macs. Could that have something to do with the short run time?

Yes, there are known bugs with the Mac and Linux 26155 vboxwrapper which we are currently debugging. This causes rapid task termination and of course eats these tasks so please hold off for now, OK?


I have turned them off for now. Let us know when you want to test the fixes.

Edit: This doesn't address any other Macs attached, though. A better solution is to remove the Mac app until you are ready to have people run it again.
Reno, NV
Team: SETI.USA
newman
Joined: 15 Feb 15
Posts: 10
Credit: 16,387
RAC: 0
Message 120 - Posted: 20 Mar 2015, 16:37:01 UTC

I get the message "VM Hypervisor failed to enter an online state in a timely fashion".

Should I let it run, or can I abort it?
Steve Hawker*
Joined: 6 Mar 15
Posts: 19
Credit: 142,109
RAC: 0
Message 121 - Posted: 20 Mar 2015, 17:57:26 UTC - in response to Message 119.  

This doesn't address any other Macs attached, though. A better solution is to remove the Mac app until you are ready to have people run it again.


I'm only running two Macs, but I agree with Z67 that removing the app is the cleaner solution. I only burned through 150 WUs before I saw this.

S.
zombie67 [MM]
Joined: 26 Feb 15
Posts: 26
Credit: 4,101,356
RAC: 0
Message 122 - Posted: 20 Mar 2015, 19:15:58 UTC - in response to Message 121.  

This doesn't address any other Macs attached, though. A better solution is to remove the Mac app until you are ready to have people run it again.


I'm only running two Macs, but I agree with Z67 that removing the app is the cleaner solution. I only burned through 150 WUs before I saw this.

S.


If this also affects Linux, then that app should also be deprecated.
Reno, NV
Team: SETI.USA



©2024 CERN