New App Version For Linux and Windows
Author | Message |
---|---|
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
A new app version (CMS v46.23) for Linux and Windows has been provided. It contains vboxwrapper v26183, which provides a heartbeat mechanism that can detect if the VM fails to boot or freezes. It should prevent VMs from just sitting idle when something like a kernel panic occurs at boot. A Mac version will be released once a build is available. As usual, please let us know if there are any issues with this release. |
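For readers unfamiliar with the idea, a heartbeat check of this kind can be sketched as follows. This is a minimal illustration and not vboxwrapper's actual code: the function name `heartbeat_is_alive` and its parameters are hypothetical, and it assumes the guest refreshes a file in a shared directory whose modification time the host can inspect.

```python
import os
import time


def heartbeat_is_alive(path, max_age_seconds, now=None):
    """Return True if the file at `path` exists and was modified recently.

    Hypothetical helper: `path` is the heartbeat file the guest is assumed
    to refresh, `max_age_seconds` is the longest silence we tolerate.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # errno 2 (ENOENT): the file was never written or has been removed.
        return False
    # A stale modification time suggests the VM has frozen or never booted.
    return (now - mtime) <= max_age_seconds
```

A host-side loop would call this periodically and stop the VM once it returns False.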
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
The Windows version crashed after 11 minutes of run time because of: VM Heartbeat file specified, but missing. http://boincai05.cern.ch/CMS-dev/result.php?resultid=112887 The 2nd task errored too: http://boincai05.cern.ch/CMS-dev/result.php?resultid=112869 BTW: the Mac version has also been available for 2 days. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Great, it looks like the mechanism is working. The heartbeat file is missing as CVMFS has still not updated. Should be there shortly. I will let Rom know and release the Mac version. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
The Mac version has been released |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
"The heartbeat file is missing as CVMFS has still not updated. Should be there shortly." Will you inform us when the filesystem is updated? For the time being I'll run the new wrapper without the heartbeat check. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Yes, it should be there soon. I will be offline for an hour or so and will check when I can. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 541 |
The updated file has arrived at CERN's /cvmfs so it's in the pipeline... ...and now it's on my VM (1643 UTC). |
Send message Joined: 15 Apr 15 Posts: 38 Credit: 227,251 RAC: 0 |
Using "Show VM Console" starts OK, but after a few seconds of switching between screens it faults, and then the app crashes with a Computation Error. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
Hi Laurence, the heartbeat file is now created (copied to the shared directory) ~2 minutes after boot time and seems to be attached at least every minute, but after 11 minutes of run time the VM is killed: VM Heartbeat file specified, but missing file system status. (errno = '2') |
Send message Joined: 20 Mar 15 Posts: 14 Credit: 5,132 RAC: 0 |
Hi Laurence, How is the heartbeat file specified in the vbox_job.xml file? ----- Rom |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
"How is the heartbeat file specified in the vbox_job.xml file?" <heartbeat_filename>heartbeat</heartbeat_filename> and <minimum_heartbeat_interval>1200</minimum_heartbeat_interval> |
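To make the two settings above concrete, here is a small sketch that reads them from an XML fragment. Only the two heartbeat elements come from the post; the `<vbox_job>` root element, the `SAMPLE` string and the function name `read_heartbeat_settings` are assumptions made so the example parses on its own. The unit of the interval is not stated in the thread, so the value is returned as a plain integer.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the root element is assumed, the two child
# elements and their values are the ones quoted in the post.
SAMPLE = """
<vbox_job>
  <heartbeat_filename>heartbeat</heartbeat_filename>
  <minimum_heartbeat_interval>1200</minimum_heartbeat_interval>
</vbox_job>
"""


def read_heartbeat_settings(xml_text):
    """Return (filename, interval) from the XML text; None where missing."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("heartbeat_filename")
    interval_text = root.findtext("minimum_heartbeat_interval")
    interval = int(interval_text) if interval_text is not None else None
    return filename, interval


settings = read_heartbeat_settings(SAMPLE)
```

Here `settings` holds the file name and the interval as a tuple, which a checker could compare against the age of the heartbeat file.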
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
A new app version (CMS v46.24) for Linux and Windows has been provided containing vboxwrapper v26184 which will hopefully resolve the issue. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
"A new app version (CMS v46.24) for Linux and Windows has been provided containing vboxwrapper v26184 which will hopefully resolve the issue." Already testing v26184. VM is running now 23 minutes and still alive. Too early for Hurray! Typo: heatbeat instead of heartbeat |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 541 |
Fingers crossed. I didn't get to make my presentation on wish-lists and improvements today. Hopefully Laurence et al. will have squashed a few bugs and introduced new features before the rescheduled presentation next week. :-) |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
"Already testing v26184. VM is running now 23 minutes and still alive. Too early for Hurray!" I tried suspending the task, with and without "leave applications in memory", for about 5 minutes. After the first CMS job finished, I ended the task gracefully, detached the project (no longer test-running anonymously), attached again and started a new task with the newest stock application. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
The Mac version is now available |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
Not sure whether this is the heartbeat not ticking right. It seems the task was ended because of a failure inside the VM, which seems to have pushed the shutdown file, but the BOINC task was reported as a computation error. http://boincai05.cern.ch/CMS-dev/result.php?resultid=113055 |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
There was a temporary issue with the credential service. It should be fine again now. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
"There was a temporary issue with the credential service. It should be fine again now." No reason to shut down the VM, I think, and moreover boinc_finish gives exit code 1 instead of exit code 0.
2016-02-12 22:20:00 (5416): Guest Log: [ERROR] message
2016-02-12 22:20:00 (5416): VM Completion File Detected.
2016-02-12 22:20:00 (5416): VM Completion Message: Cloud not get an x509 credential .
2016-02-12 22:20:00 (5416): Powering off VM.
2016-02-12 22:20:01 (5416): Successfully stopped VM.
2016-02-12 22:20:06 (5416): Deregistering VM. (boinc_3600229bcc7d43ca, slot#0)
2016-02-12 22:20:06 (5416): Removing virtual disk drive(s) from VM.
2016-02-12 22:20:06 (5416): Removing network bandwidth throttle group from VM.
2016-02-12 22:20:06 (5416): Removing storage controller(s) from VM.
2016-02-12 22:20:06 (5416): Removing VM from VirtualBox.
22:20:11 (5416): called boinc_finish(1) |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
It tried for 2 mins before shutting down. I have increased this to 10 minutes and changed the exit code to 0. |
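The change described above (keep trying for longer before giving up) is a retry loop with a deadline. The sketch below is only an illustration of that pattern, not the script running in the VM: `fetch_with_deadline`, its parameters and the 30-second retry delay in the usage note are hypothetical, and `fetch` stands in for whatever call obtains the credential.

```python
import time


def fetch_with_deadline(fetch, timeout_seconds, retry_delay_seconds,
                        clock=time.monotonic, sleep=time.sleep):
    """Call fetch() until it returns a non-None value or time runs out.

    `clock` and `sleep` are injectable so the loop can be exercised
    without real waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        result = fetch()
        if result is not None:
            return result
        # Give up if another wait would take us past the deadline.
        if clock() + retry_delay_seconds > deadline:
            return None
        sleep(retry_delay_seconds)
```

With `timeout_seconds=120` a service outage of three minutes ends in a shutdown, while `timeout_seconds=600` rides it out, for example `fetch_with_deadline(get_credential, 600, 30)` where `get_credential` is a placeholder for the real request.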
©2024 CERN