1) Message boards : Theory Application : Task startup issue (Message 3834)
Posted 27 Jul 2016 by Rom Walton (BOINC)
Post:
Thanks, Rom.
How is the number of cores to use determined now?
By the project preference file? What if it is undefined?



I believe g_req->effective_ncpus is calculated from the global_prefs.

See:
http://lhcathomedev.cern.ch/vLHCathome-dev/prefs.php?subset=global

I would need to hunt down exactly where. I don't have a Visual Studio project for the server code, so I can't easily cheat by having Visual Studio do all the hunting.
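As a rough illustration of what "calculated from the global_prefs" could look like, here is a minimal Python sketch. It is not taken from the scheduler source: the function name, the max_ncpus_pct parameter, the "0 means undefined, so no limit" rule, and the rounding are all assumptions made for the example.

```python
def effective_ncpus(host_ncpus: int, max_ncpus_pct: float = 0.0) -> int:
    """Hypothetical: number of CPUs the scheduler may plan for on a host.

    host_ncpus    -- CPU count reported by the host
    max_ncpus_pct -- assumed global preference, percent of CPUs to use;
                     0 is treated here as "undefined", meaning no limit
    """
    ncpus = max(1, host_ncpus)
    if 0 < max_ncpus_pct < 100:
        # Scale down by the preference, but never go below one CPU.
        ncpus = max(1, int(ncpus * max_ncpus_pct / 100))
    return ncpus


if __name__ == "__main__":
    print(effective_ncpus(8, 50))  # preference set to 50%
    print(effective_ncpus(8))      # preference undefined
```

Under these assumptions an undefined preference simply leaves the host's full CPU count in place.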

----- Rom
2) Message boards : Theory Application : Task startup issue (Message 3832)
Posted 27 Jul 2016 by Rom Walton (BOINC)
Post:
The new attributes in the project preferences are just cosmetic for now. Nothing acts upon them. Some of the possibilities are being discussed in this thread.

EDIT: I don't understand why BOINC assigns 1.5 cores.


If you haven't been using the XML plan class stuff, it is probably falling back to some hard-coded values in the compiled server scheduler that date back years at this point.

See:
https://github.com/BOINC/boinc/blob/master/sched/sched_customize.cpp#L888

If you all want to switch over to the XML-based plan class, we can go ahead and remove the hard-coded stuff put in for the theory app.

I guess we always assumed the institutional knowledge about what was added for the various CERN projects would be passed on from one generation to the next.

----- Rom


Just to be clear, removing line:
https://github.com/BOINC/boinc/blob/master/sched/sched_customize.cpp#L894

should set up a multi-threaded VM to use as many cores as preferences allow and not cap it at 1.5 CPU(s).
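To make the effect of that cap concrete, here is a small Python sketch. It only illustrates the idea of a hard-coded ceiling versus no ceiling; the function and parameter names are hypothetical and this is not the code in sched_customize.cpp.

```python
from typing import Optional


def planned_ncpus(allowed_ncpus: float, hard_cap: Optional[float] = None) -> float:
    """Hypothetical: CPUs assigned to a multi-threaded VM task.

    allowed_ncpus -- what the volunteer's preferences allow
    hard_cap      -- optional ceiling baked into the plan class (None = no cap)
    """
    if hard_cap is None:
        return allowed_ncpus
    return min(allowed_ncpus, hard_cap)


if __name__ == "__main__":
    print(planned_ncpus(4, hard_cap=1.5))  # with the old cap
    print(planned_ncpus(4))                # with the cap removed
```

With the cap in place a 4-core allowance still yields 1.5; without it the task gets whatever the preferences allow.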

----- Rom


I've gone ahead and removed the old CERN Theory policy from the stock source code.

See:
https://github.com/BOINC/boinc/commit/eac15d6982fc34ca3c781e0580d45b8f32039c22

----- Rom
3) Message boards : Theory Application : Task startup issue (Message 3831)
Posted 27 Jul 2016 by Rom Walton (BOINC)
Post:
The new attributes in the project preferences are just cosmetic for now. Nothing acts upon them. Some of the possibilities are being discussed in this thread.

EDIT: I don't understand why BOINC assigns 1.5 cores.


If you haven't been using the XML plan class stuff, it is probably falling back to some hard-coded values in the compiled server scheduler that date back years at this point.

See:
https://github.com/BOINC/boinc/blob/master/sched/sched_customize.cpp#L888

If you all want to switch over to the XML-based plan class, we can go ahead and remove the hard-coded stuff put in for the theory app.

I guess we always assumed the institutional knowledge about what was added for the various CERN projects would be passed on from one generation to the next.

----- Rom


Just to be clear, removing line:
https://github.com/BOINC/boinc/blob/master/sched/sched_customize.cpp#L894

should set up a multi-threaded VM to use as many cores as preferences allow and not cap it at 1.5 CPU(s).

----- Rom
4) Message boards : Theory Application : Task startup issue (Message 3827)
Posted 27 Jul 2016 by Rom Walton (BOINC)
Post:
The new attributes in the project preferences are just cosmetic for now. Nothing acts upon them. Some of the possibilities are being discussed in this thread.

EDIT: I don't understand why BOINC assigns 1.5 cores.


If you haven't been using the XML plan class stuff, it is probably falling back to some hard-coded values in the compiled server scheduler that date back years at this point.

See:
https://github.com/BOINC/boinc/blob/master/sched/sched_customize.cpp#L888

If you all want to switch over to the XML-based plan class, we can go ahead and remove the hard-coded stuff put in for the theory app.

I guess we always assumed the institutional knowledge about what was added for the various CERN projects would be passed on from one generation to the next.

----- Rom
5) Message boards : News : New CMS App v46.26 (Message 2208)
Posted 4 Mar 2016 by Rom Walton (BOINC)
Post:
This is an issue with the BOINC client. When the computer is restarted, the VM goes into the power-off state. Ideally, the BOINC client should intercept the shutdown request and save the VM. Does the same thing happen with the Theory Simulations in vLHC@home?


I just verified that the save state stuff is working on CMS (at least as well as it can). There are situations where Windows will not allow applications to take longer than 10-15 seconds to shut down. In those cases Windows will just terminate the application. One specific case is during the process of installing patches. Another would be if the volunteer has Windows configured to shut down when a notebook screen is closed.

In those cases Windows terminates the processes. Nothing is spared. While BOINC itself is waiting on vboxwrapper to shutdown, it is terminated by the OS. Vboxwrapper is waiting on VirtualBox to write whatever the configured memory size is for the VM to disk.

All the while every other application on the system is saving state and shutting down.

That is basically why using the save state option is not the default configuration. Ideally using VM snapshots as regular checkpoints means that Vboxwrapper can restore the VM state to a condition where it can resume from a known stable state. It can even survive a power failure.

Basically, if you see "Stopping VM." instead of "Powering off VM.", Vboxwrapper has issued the command to save state. Whether the OS lets BOINC/Vboxwrapper complete the task is anybody's guess.
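If you want to check a task's stderr text for this automatically, something like the following Python sketch would do. The two marker strings are the ones quoted above; the function name, the return labels, and the sample log lines are made up for the example.

```python
def last_shutdown_command(stderr_text: str) -> str:
    """Report which shutdown path Vboxwrapper last logged.

    Returns "save_state" for "Stopping VM.", "power_off" for
    "Powering off VM.", or "unknown" if neither marker appears.
    """
    result = "unknown"
    for line in stderr_text.splitlines():
        if "Stopping VM." in line:
            result = "save_state"
        elif "Powering off VM." in line:
            result = "power_off"
    return result


if __name__ == "__main__":
    # Made-up sample log, for illustration only.
    sample = "Starting VM.\nStopping VM.\n"
    print(last_shutdown_command(sample))
```

Keep in mind this only tells you which command was issued, not whether the OS let the save finish.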

----- Rom
6) Message boards : News : New App Version For Linux and Windows (Message 1949)
Posted 10 Feb 2016 by Rom Walton (BOINC)
Post:
Hi Laurence,

Now the heartbeat file is created (copied to the shared directory) ~2 minutes after boottime and seems to be attached at least every minute,
but after 11 minutes runtime the VM is killed.

VM Heartbeat file specified, but missing file system status. (errno = '2')


How is the heartbeat file specified in the vbox_job.xml file?

----- Rom
7) Message boards : Number crunching : Current issues (Message 1915)
Posted 9 Feb 2016 by Rom Walton (BOINC)
Post:
Out of curiosity, what does the sched_request_boincai05.cern.ch_CMS-dev.xml file show as far as free disk space?

My sched_request_boincai05.cern.ch_CMS-dev.xml:
<host_info>
    ...
    <d_total>1999871410176.000000</d_total>
    <d_free>1583612952576.000000</d_free>
    ...
</host_info>


I'm curious if the client is misreporting the amount of free disk space to the project.

----- Rom


Never mind, I just received the error message too.

2/8/2016 9:23:26 PM | CMS-dev | Message from server: CMS Simulation needs 3007.27MB more disk space.  You currently have 6529.48 MB available and it needs 9536.74 MB.
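For anyone who wants to compare the two sets of numbers, here is a small Python sketch that pulls d_total and d_free out of a host_info fragment and redoes the server's arithmetic. It assumes the values in the request file are bytes and that the server's "MB" means 1024 * 1024 bytes; both are assumptions, as are the function names. The sample fragment is the one above with the elided lines left out.

```python
import xml.etree.ElementTree as ET

BYTES_PER_MB = 1024 * 1024  # assumption: the server's "MB" is 2**20 bytes


def disk_report_mb(host_info_xml: str) -> dict:
    """Return d_total and d_free from a <host_info> fragment, in MB."""
    root = ET.fromstring(host_info_xml)
    return {
        tag: float(root.findtext(tag)) / BYTES_PER_MB
        for tag in ("d_total", "d_free")
    }


def shortfall_mb(needed_mb: float, available_mb: float) -> float:
    """How much more disk space is needed, never negative."""
    return max(0.0, needed_mb - available_mb)


if __name__ == "__main__":
    sample = (
        "<host_info>"
        "<d_total>1999871410176.000000</d_total>"
        "<d_free>1583612952576.000000</d_free>"
        "</host_info>"
    )
    print(disk_report_mb(sample))
    print(round(shortfall_mb(9536.74, 6529.48), 2))
```

The shortfall from the rounded figures in the message agrees with the server's 3007.27 MB to within rounding, while the free space reported by the client is far larger than the 6529.48 MB the server says is available.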


----- Rom
8) Message boards : Number crunching : Current issues (Message 1914)
Posted 9 Feb 2016 by Rom Walton (BOINC)
Post:
Out of curiosity, what does the sched_request_boincai05.cern.ch_CMS-dev.xml file show as far as free disk space?

My sched_request_boincai05.cern.ch_CMS-dev.xml:
<host_info>
    ...
    <d_total>1999871410176.000000</d_total>
    <d_free>1583612952576.000000</d_free>
    ...
</host_info>


I'm curious if the client is misreporting the amount of free disk space to the project.

----- Rom
9) Message boards : Number crunching : Expect errors eventually (Message 1465)
Posted 16 Nov 2015 by Rom Walton (BOINC)
Post:
If nothing else, can somebody change the Hypervisor timeout value from 24 hours to something more reasonable??? Or do I need to reset the project on all hosts to get the new wrapper, or what?

A simple restart of BOINC client will retry to start the CMS-task immediately.

Vboxwrapper version 26178 is already active.


Ah, okay, then http://boincai05.cern.ch/CMS-dev/result.php?resultid=66754 was just an old task.

----- Rom
10) Message boards : Number crunching : Expect errors eventually (Message 1462)
Posted 15 Nov 2015 by Rom Walton (BOINC)
Post:
All resumed! :-)

edit: Except half of them say "Hypervisor failed to enter an online state in a timely manner" and are sleeping for a day, and of course the Mac STILL says "Virtualbox not installed"...


The latest Vboxwrapper should resolve that issue.

See: https://github.com/BOINC/boinc/releases/tag/vboxwrapper%2F26178

----- Rom
11) Message boards : Number crunching : BOINC_USERID is not an integer (Message 1453)
Posted 13 Nov 2015 by Rom Walton (BOINC)
Post:
I'm not sure how things are handled in the VM these days, so a lot might have changed.

There used to be a bug in the readFloppy.pl script that would attempt to open the floppy device in read/write mode and would fail. This led to an error on line 55 or 58; I don't remember specifically. It has been a few years since I first looked into it.

For one reason or another, when a VM booted under VirtualBox it would flag the floppy device as read-only. This would lead to perl complaining about the device being read-only and not being able to open it for read/write. That in turn led to the host and user id detection failing.
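The failure mode is easy to reproduce outside the VM. Here is a Python sketch (not the actual readFloppy.pl, which is Perl) of one way to tolerate a read-only device: try read/write first and fall back to read-only if the open is refused. The function name is hypothetical.

```python
def read_device(path: str) -> bytes:
    """Read the whole device or file, tolerating a read-only target."""
    try:
        # Read/write open, as the script reportedly did.
        with open(path, "r+b") as dev:
            return dev.read()
    except OSError:
        # Covers a permission error or a read-only device:
        # retry without asking for write access.
        with open(path, "rb") as dev:
            return dev.read()


if __name__ == "__main__":
    import os
    import stat
    import tempfile

    fd, tmp = tempfile.mkstemp()
    os.write(fd, b"<host_info/>")
    os.close(fd)
    os.chmod(tmp, stat.S_IREAD)  # make it read-only
    print(read_device(tmp))
    os.chmod(tmp, stat.S_IREAD | stat.S_IWRITE)
    os.remove(tmp)
```

Of course, if the script really needs to modify the data, it would have to do that on a copy in memory rather than in place on the device.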

Does the readFloppy.pl script still attempt to convert XML to JSON in place?

----- Rom
12) Message boards : Number crunching : vboxwrapper issue (Message 1244)
Posted 13 Oct 2015 by Rom Walton (BOINC)
Post:
In the slot directory you will find a floppy image file. This is attached to the VM and the user information can then be read from /dev/fd0.

Although the cron sleeps for 1 hour, the next cron job is run. I guess this is a bug :(


I don't know if this is still the case or not. At one time there was a bug in the readFloppy.pl script run within the guest VM where it would fail if /dev/fd0 was read-only.

IIRC, it was line 58, where it attempted to open the device and perform some in-place memory modifications.

I don't know what was triggering /dev/fd0 to become read-only. It happened on a previous dev box of mine. I haven't checked against my current dev box.
13) Message boards : News : VBox Wrappers Updated to 26158 (Message 194)
Posted 26 Mar 2015 by Rom Walton (BOINC)
Post:
I'm curious what happens during random power failures, reboots, and vboxsvc going nuts.

----- Rom
14) Message boards : News : VBox Wrappers Updated to 26158 (Message 186)
Posted 26 Mar 2015 by Rom Walton (BOINC)
Post:
Hi Ivan (and Rom),

Just tested the new feature on my Mac but it didn't work. The boot of CMS code worked fine. After a BOINC suspend, the VM went into Paused state (and could be resumed OK), but after an exit of the BOINC manager the VM was left in PowerOff state, (not "Saved" state) and hence rebooted when BOINC was restarted.

Ben


Can you abort the task so I can look at the stderr text?

----- Rom


