Message boards : News : New App Version For Linux and Windows
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 1937 - Posted: 10 Feb 2016, 14:25:29 UTC

A new app version (CMS v46.23) for Linux and Windows has been provided. It contains vboxwrapper v26183, which provides a heartbeat mechanism that can detect if the VM fails to boot or freezes. It should prevent VMs from just sitting idle in scenarios such as a kernel panic at boot. A Mac version will be released once a build is available. As usual, please let us know if there are any issues with this release.
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 15
Message 1938 - Posted: 10 Feb 2016, 15:02:40 UTC - in response to Message 1937.  
Last modified: 10 Feb 2016, 15:12:23 UTC

The Windows version crashed after 11 minutes of run time with the error: VM Heartbeat file specified, but missing.

http://boincai05.cern.ch/CMS-dev/result.php?resultid=112887

2nd task error too: http://boincai05.cern.ch/CMS-dev/result.php?resultid=112869

btw: the Mac version has also been available for 2 days.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 1939 - Posted: 10 Feb 2016, 15:19:28 UTC - in response to Message 1938.  

Great, it looks like the mechanism is working. The heartbeat file is missing as CVMFS has still not updated. Should be there shortly. I will let Rom know and release the Mac version.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 1940 - Posted: 10 Feb 2016, 15:29:16 UTC - in response to Message 1939.  

The Mac version has been released
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 15
Message 1941 - Posted: 10 Feb 2016, 15:34:00 UTC - in response to Message 1939.  

Will you inform us when the filesystem is updated? For the time being I'll run the new wrapper without the heartbeat check.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 1944 - Posted: 10 Feb 2016, 16:21:23 UTC - in response to Message 1941.  

Yes, it should be there soon. I will be offline for an hour or so and will check when I can.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1139
Credit: 8,310,612
RAC: 75
Message 1946 - Posted: 10 Feb 2016, 16:34:53 UTC - in response to Message 1941.  
Last modified: 10 Feb 2016, 16:43:55 UTC

The updated file has arrived at CERN's /cvmfs so it's in the pipeline...
...and now it's on my VM (1643 UTC).
rbpeake
Joined: 15 Apr 15
Posts: 38
Credit: 227,251
RAC: 0
Message 1947 - Posted: 10 Feb 2016, 17:26:35 UTC

Using "Show VM Console" starts OK, but after some seconds, switching between screens leads to a fault, then a crash of the app and a Computation Error.
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 15
Message 1948 - Posted: 10 Feb 2016, 18:28:13 UTC

Hi Laurence,

Now the heartbeat file is created (copied to the shared directory) ~2 minutes after boottime and seems to be attached at least every minute,
but after 11 minutes runtime the VM is killed.

VM Heartbeat file specified, but missing file system status. (errno = '2')
Rom Walton (BOINC)
Joined: 20 Mar 15
Posts: 14
Credit: 5,132
RAC: 0
Message 1949 - Posted: 10 Feb 2016, 18:31:31 UTC - in response to Message 1948.  

How is the heartbeat file specified in the vbox_job.xml file?

----- Rom
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 15
Message 1950 - Posted: 10 Feb 2016, 18:38:41 UTC - in response to Message 1949.  
Last modified: 10 Feb 2016, 18:39:18 UTC

<heartbeat_filename>heartbeat</heartbeat_filename>

and

<minimum_heartbeat_interval>1200</minimum_heartbeat_interval>
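
For illustration only, here is a minimal Python sketch of how a file-based heartbeat check using these two settings could work. It is not vboxwrapper's code: the `<vbox_job>` root element, the reading of the interval as seconds, the use of the file's modification time, and both function names are assumptions made for this example.

```python
import os
import time
import xml.etree.ElementTree as ET

# The two elements quoted above, wrapped in a root element so they parse.
# The <vbox_job> root name is an assumption for this sketch.
JOB_XML = """
<vbox_job>
  <heartbeat_filename>heartbeat</heartbeat_filename>
  <minimum_heartbeat_interval>1200</minimum_heartbeat_interval>
</vbox_job>
"""


def read_heartbeat_config(xml_text):
    """Return (filename, interval) from the job description."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("heartbeat_filename")
    interval = int(root.findtext("minimum_heartbeat_interval"))
    return filename, interval


def heartbeat_status(path, interval, now=None):
    """Classify the heartbeat file as "missing", "stale" or "alive".

    Assumes the interval is in seconds and that freshness is judged
    by the file's modification time.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:  # errno 2 (ENOENT)
        return "missing"
    return "stale" if now - mtime > interval else "alive"


if __name__ == "__main__":
    name, interval = read_heartbeat_config(JOB_XML)
    print(name, interval, heartbeat_status(name, interval))
```

If the value is in seconds, an interval of 1200 means a check like this would only report a stale heartbeat after 20 minutes without an update.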
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 1951 - Posted: 10 Feb 2016, 20:49:45 UTC - in response to Message 1950.  

A new app version (CMS v46.24) for Linux and Windows has been provided containing vboxwrapper v26184 which will hopefully resolve the issue.
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 15
Message 1952 - Posted: 10 Feb 2016, 20:53:13 UTC - in response to Message 1951.  
Last modified: 10 Feb 2016, 21:02:16 UTC

Already testing v26184. The VM has now been running for 23 minutes and is still alive. Too early for Hurray!

Typo: heatbeat instead of heartbeat
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1139
Credit: 8,310,612
RAC: 75
Message 1954 - Posted: 10 Feb 2016, 21:14:15 UTC - in response to Message 1952.  

Fingers crossed. I didn't get to make my presentation on wish-lists and improvements today. Hopefully Laurence et al. will have squashed a few bugs and introduced new features before the rescheduled presentation next week. :-)
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 15
Message 1955 - Posted: 10 Feb 2016, 22:02:37 UTC - in response to Message 1952.  
Last modified: 10 Feb 2016, 22:14:05 UTC

I tried suspending the task, with and without "leave applications in memory", for about 5 minutes.

After the first CMS-job finished, I ended the task gracefully, detached the project (no longer test-running anonymously), attached again and started a new task with the newest stock application.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 1956 - Posted: 11 Feb 2016, 8:45:49 UTC - in response to Message 1951.  

The Mac version is now available
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 15
Message 1971 - Posted: 12 Feb 2016, 21:36:47 UTC

Not sure whether this is the heartbeat not ticking right.
It seems the task was ended in advance of a failure inside the VM and seems to have pushed the shutdown file, but the BOINC task was reported as a computation error.

http://boincai05.cern.ch/CMS-dev/result.php?resultid=113055
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 1972 - Posted: 12 Feb 2016, 21:45:25 UTC - in response to Message 1971.  

There was a temporary issue with the credential service. It should be fine again now.
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 862,257
RAC: 15
Message 1973 - Posted: 12 Feb 2016, 21:58:13 UTC - in response to Message 1972.  
Last modified: 12 Feb 2016, 21:58:57 UTC

No reason to shut down the VM, I think, and moreover boinc_finish gives exit code 1 instead of exit code 0.

2016-02-12 22:20:00 (5416): Guest Log: [ERROR] message
2016-02-12 22:20:00 (5416): VM Completion File Detected.
2016-02-12 22:20:00 (5416): VM Completion Message: Cloud not get an x509 credential
.
2016-02-12 22:20:00 (5416): Powering off VM.
2016-02-12 22:20:01 (5416): Successfully stopped VM.
2016-02-12 22:20:06 (5416): Deregistering VM. (boinc_3600229bcc7d43ca, slot#0)
2016-02-12 22:20:06 (5416): Removing virtual disk drive(s) from VM.
2016-02-12 22:20:06 (5416): Removing network bandwidth throttle group from VM.
2016-02-12 22:20:06 (5416): Removing storage controller(s) from VM.
2016-02-12 22:20:06 (5416): Removing VM from VirtualBox.
22:20:11 (5416): called boinc_finish(1)
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 1974 - Posted: 12 Feb 2016, 22:08:03 UTC - in response to Message 1973.  
Last modified: 12 Feb 2016, 22:21:08 UTC

It tried for 2 mins before shutting down. I have increased this to 10 minutes and changed the exit code to 0.
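
A rough Python sketch of the retry-until-deadline pattern described here, for readers who want to see the idea spelled out. It is not the project's actual script: the function name, the `fetch` callable, and the 30-second retry interval are hypothetical, and only the 10-minute limit comes from the post.

```python
import time


def wait_for_credential(fetch, timeout_s=600, retry_every_s=30,
                        clock=time.monotonic, sleep=time.sleep):
    """Retry fetch() until it returns a credential or the deadline passes.

    fetch is a hypothetical callable returning a credential, or None on
    failure. Returns the credential, or None once timeout_s has elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        credential = fetch()
        if credential is not None:
            return credential
        if clock() >= deadline:
            return None  # give up; the caller decides how to exit
        sleep(retry_every_s)
```

Following the change described above, a caller that gets `None` back would shut the VM down and exit with code 0 rather than 1, so that a temporary service outage is not reported as a computation error.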

©2024 CERN