21) Message boards : Number crunching : Job queue empty!! (Message 699)
Posted 19 Aug 2015 by Richard Haselgrove
Post:
We're about to run out of jobs again. I need to check a few things on the condor queues so we might sit empty for an hour or two. I'll try to get something running again later this afternoon (London time). Thanks for your patience.

I'm holding off on this overnight -- it looks like there are enough jobs being resubmitted by the Condor queue to keep things humming.

My Host 380 is looking decidedly yellow!
22) Message boards : Number crunching : Monitoring CMS job activity remotely (Message 697)
Posted 19 Aug 2015 by Richard Haselgrove
Post:
For Windows you could start with the command:

wmic cpu get loadpercentage

Sadly, I'm not getting very far with that.

wmic cpu get loadpercentage
LoadPercentage
93

doesn't tell me much about how many of the four or eight cores on the machine are running flat out - or how many of them are running CMS-dev.

wmic process where name="vboxheadless.exe" get UserModeTime
UserModeTime
312002
156001
3996277617

is more promising - I could call it twice, with a timed pause between, and work out the diff. But I'm sure there must be a better way. Anyone?
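
A rough sketch of the "call it twice and work out the diff" idea, in Python. It assumes that UserModeTime is reported in 100-nanosecond units (as documented for Win32_Process), that wmic is on the PATH, and that the same set of vboxheadless.exe processes is alive for both samples. The function names are made up for illustration.

```python
import subprocess
import time

TICKS_PER_SECOND = 10_000_000  # UserModeTime is counted in 100 ns units


def parse_user_mode_times(wmic_output):
    """Return the integer values found in 'wmic ... get UserModeTime' output."""
    values = []
    for line in wmic_output.splitlines():
        line = line.strip()
        if line.isdigit():  # skips the 'UserModeTime' header and blank lines
            values.append(int(line))
    return values


def busy_cores(before, after, interval_seconds):
    """Average number of cores kept busy in user mode between two samples."""
    ticks = sum(after) - sum(before)
    return ticks / TICKS_PER_SECOND / interval_seconds


def sample(process_name="vboxheadless.exe"):
    """Run wmic once (Windows only) and return the per-process times."""
    result = subprocess.run(
        ["wmic", "process", "where", f'name="{process_name}"',
         "get", "UserModeTime"],
        capture_output=True, text=True, check=True,
    )
    return parse_user_mode_times(result.stdout)


def measure(pause_seconds=30):
    """Take two samples with a timed pause between and report the difference."""
    first = sample()
    time.sleep(pause_seconds)
    second = sample()
    return busy_cores(first, second, pause_seconds)
```

Calling measure(30) on the host should return something close to 1.0 while one VM is running a job flat out, and close to 0 while it sits idle. Kernel-mode time is ignored here, so treat the figure as an approximation.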

Depending on the outcome and on your needs, use a higher value of ncpus in cc_config.xml in combination with an alternative global_prefs_override.xml for the number of processors to use.

Yes - I was thinking along those lines too. boinccmd can't control the percentage directly, but we could have

global_prefs_override_CMSruns.xml
global_prefs_override_CMSstop.xml

and copy the right one to plain global_prefs_override.xml and re-read it.
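
A minimal sketch of that file swap, in Python. It assumes both variant files already exist in the BOINC data directory and that boinccmd is on the PATH; the function names and the data_dir argument are illustrative only.

```python
import shutil
import subprocess
from pathlib import Path


def select_override(data_dir, cms_active):
    """Copy the prepared variant over global_prefs_override.xml."""
    data_dir = Path(data_dir)
    if cms_active:
        variant = "global_prefs_override_CMSruns.xml"
    else:
        variant = "global_prefs_override_CMSstop.xml"
    target = data_dir / "global_prefs_override.xml"
    shutil.copyfile(data_dir / variant, target)
    return target


def reread_override(boinccmd="boinccmd"):
    """Ask the running client to re-read global_prefs_override.xml."""
    subprocess.run([boinccmd, "--read_global_prefs_override"], check=True)
```

A watchdog would call select_override(...) followed by reread_override() each time it decides that CMS has started or stopped.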
23) Message boards : Number crunching : Monitoring CMS job activity remotely (Message 686)
Posted 19 Aug 2015 by Richard Haselgrove
Post:
On BoincView, if you look in its program folder, there's a file boincview.ini in which is a SlowProgress item. Changing this alters the colour change threshold but the changes don't stick... not on mine anyway which sometimes stays green between jobs, too.

You can also configure both the colour used and the trigger threshold from the BoincView GUI itself - toolbox button, tooltip "Open program preferences" (or simply F11). Colour is in the 'Look' section, threshold in 'Tasks'. That seems to stick. (I'm using v1.4.2 - I think that was the final non-beta)

I'll have a poke around the other suggestions for remote viewing this afternoon, and maybe write up a BOINC enhancement request. And maybe I'll ask Efmer how he gets the efficiency value for BoincTasks, and see if I can work out a way to script it (the rest should be easy). As Raistmer says, we wouldn't want fine-grained control - it wouldn't be the end of the earth if BOINC ran over-committed or under-committed for five minutes, or even for an hour.
24) Message boards : Number crunching : Monitoring CMS job activity remotely (Message 677)
Posted 19 Aug 2015 by Richard Haselgrove
Post:
I have a couple of machines running CMS-dev today for Ivan's challenge, but neither of them is in the room where I'm typing this - one is on the floor above, the other on the floor below.

BOINC's own Manager can connect to both remote machines (one at a time), and manage everything to do with standard BOINC jobs. But neither 'Show graphics' nor 'VM console' is available, except when the Manager on the local machine is used. Perhaps that's one the CERN developers and the BOINC developers could think about together.

For Windows users, there are at least a couple of enhanced management tools with network management capability for BOINC: BoincView (very ancient, and no longer maintained, but it still works), and BoincTasks, which is still under active support. Both of these tools calculate and display a value for "CPU efficiency" (a concept which was dropped from BOINC itself several years ago), and I can tell that both VMs have CMS jobs - well, something - running, because CPU efficiency is currently in the high 90s on both machines. Periodically, I notice that one or other of them has 'gone yellow' (BoincView's colour coding for low efficiency jobs), which I presume implies that CMS has finished one internal job and may (or may not) be starting another.

Which leads me to another thought. Surely some helpful person could write a watchdog script which periodically tested CPU efficiency for CMS jobs and, if it fell below threshold, used boinccmd to tweak <max_ncpus_pct> to allow an additional task to use the idle core, and stopped it again when CMS was active.

Thoughts?
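
For what it's worth, the decision half of such a watchdog could look something like this Python sketch. It covers only the "is CMS busy or idle" logic; how the efficiency figure is obtained and how the change is applied to BOINC are left open. The class name, the 50% threshold and the "patience" count are all assumptions, chosen so that a single odd reading doesn't flip the setting.

```python
class Watchdog:
    """Decide when to let an extra task use the core that CMS leaves idle."""

    def __init__(self, threshold=50.0, patience=3):
        self.threshold = threshold  # efficiency (%) below which CMS counts as idle
        self.patience = patience    # consecutive samples needed before switching
        self.cms_active = True      # assume CMS is busy at start-up
        self._streak = 0            # samples in a row that disagree with the state

    def update(self, efficiency):
        """Feed one sample; return the new state on a change, else None."""
        looks_active = efficiency >= self.threshold
        if looks_active == self.cms_active:
            self._streak = 0
            return None
        self._streak += 1
        if self._streak < self.patience:
            return None
        self.cms_active = looks_active
        self._streak = 0
        return self.cms_active
```

The caller would feed update() one reading per polling interval and act only when it returns True (CMS active again, stop the extra task) or False (CMS idle, allow the extra task).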
25) Message boards : Number crunching : Heads up! Looking to make a major challenge next Wednesday; a call for volunteers (Message 610)
Posted 17 Aug 2015 by Richard Haselgrove
Post:
I can't get any work

Your PC has only 3 GB of memory; this may be too low

And I'm not sure if CMS can run on a 32-bit system

CMS only offer 64-bit versions for automatic download, but I managed to get it running on a 32-bit machine (with a virtualisation-supporting CPU) by downloading the 32-bit version of the BOINC wrapper and running it through an app_info.xml file.

Interesting experiment, but hardly worth it in the long term.
26) Message boards : Number crunching : Some questions (Message 598)
Posted 16 Aug 2015 by Richard Haselgrove
Post:
I've seen reports that VBox 5.0 isn't Win10 compatible. That's as far as my knowledge goes...

Oracle itself reports that VBox 5.0 isn't (yet) Win10 compatible. From the Oracle VBox download page:

Please be aware that Windows 10 is not yet officially supported! There are known problems with VirtualBox 5.0.2 on Windows 10 hosts and with Windows 10 guests. Some of the problems are fixed in the most recent test build which can be found here.
27) Message boards : News : Agent Update (Message 531)
Posted 6 Aug 2015 by Richard Haselgrove
Post:
Wrapper 26169 seems to be working fine with VBox 4.3.26 here.
28) Message boards : News : Agent Update (Message 504)
Posted 4 Aug 2015 by Richard Haselgrove
Post:
Not (yet) a problem but a related point. More volunteers may now participate in CMS. Presumably those using Windows still either need to use the "CMS patch" (BOINC 7.5.1) or later BOINC version to avoid interference with other projects. The link to the patch given on the CMS main page has not been reinstated nor is the patch now available from the download page. It would be a good idea to either provide a working link to the patch or alter the wording to point to appropriate BOINC versions for Windows.

That would, currently, be BOINC v7.6.6 - available from http://boinc.berkeley.edu/download_all.php

Except we found a new bug in that one yesterday (won't affect CMS-dev, unless members also run multiple GPUs). For that, there's a new patch:

http://www.romwnet.org/files/boinc.030815.x64.zip

That's downloading from Rom Walton's personal webspace - there are currently technical problems restricting access to the BOINC server.
29) Message boards : News : Real CMS Jobs (Message 491)
Posted 24 Jul 2015 by Richard Haselgrove
Post:
As Ivan mentioned, efforts are already underway to run HEP code on GPUs. However, it is not clear whether you can do this from within a VM. This is something that we may consider investigating in the future but first we have to get this project up and running! The advantage of using a VM is that we can easily run anywhere that supports virtualization, so if a suitable hypervisor exists for Android, we could in theory run there.

Using GPUs in a VM is problematic. You need a virtualisation product which supports a feature called "GPU passthrough". NVidia has a page on the subject: http://www.nvidia.com/object/dedicated-gpus.html, which says that NVidia GPUs can be used with Citrix XenDesktop or VMware Horizon View. I expect AMD will have a similar list for their GPUs.

Unfortunately, the BOINC virtual machine interface - originally developed at CERN, I believe - only supports Oracle VirtualBox, which doesn't currently support GPU passthrough. So, unless either Oracle upgrades VB, or BOINC interfaces with a second VM technology, this doesn't look like a starter.
30) Message boards : Number crunching : Not getting tasks (Message 483)
Posted 18 Jul 2015 by Richard Haselgrove
Post:
Your computer has a 32-bit operating system. This project only has 64-bit applications.
31) Message boards : Science : L H C (Message 474)
Posted 12 Jul 2015 by Richard Haselgrove
Post:
A quick troll through the daily reports shows no significant CMS problems. Guess I'd better start reading them more carefully again, now we're up and running -- I'd gotten out of the habit during the shutdown.

Okay, so how many mb is one cup of coffee, and what beam current at 14 TeV is required to brew it in 60s?

Not sure what your "mb" abbreviation is... But assuming a cup of coffee is 200 ml, raising it to 100 C from 20 C is 16,000 calories, or about 67,000 joules; in 60 secs that's about 1,100 J/s or watts. At 14e12 V, that'd require 8e-11 amps or 0.5e9 particles/sec; last night we were running at ~3e13 pps -- what's that, 60,000 cups of coffee/minute?
I probably lost a zero or two here and there... :-)

Starbucks eat your heart out, here comes the LHC ;)
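
As a check on the coffee arithmetic, here is a small Python sketch. It takes the specific heat of water as 4.186 J per gram per kelvin, treats 1 ml of coffee as 1 g of water, and keeps the quote's simplification that every proton delivers the full 14 TeV. The function and parameter names are illustrative only.

```python
SPECIFIC_HEAT_WATER = 4.186    # joules per gram per kelvin
ELEMENTARY_CHARGE = 1.602e-19  # coulombs per proton


def coffee_numbers(volume_ml=200.0, t_start=20.0, t_end=100.0,
                   brew_seconds=60.0, beam_volts=14e12,
                   protons_per_second=3e13):
    """Work through the cup-of-coffee estimate step by step."""
    energy_j = volume_ml * SPECIFIC_HEAT_WATER * (t_end - t_start)
    power_w = energy_j / brew_seconds          # watts needed for one cup
    current_a = power_w / beam_volts           # amps at the given voltage
    particles_per_s = current_a / ELEMENTARY_CHARGE
    cups_at_once = protons_per_second / particles_per_s
    cups_per_minute = cups_at_once * 60.0 / brew_seconds
    return {
        "energy_j": energy_j,
        "power_w": power_w,
        "current_a": current_a,
        "particles_per_s": particles_per_s,
        "cups_per_minute": cups_per_minute,
    }
```

With the default values this comes out at roughly 67 kJ per cup, a little over 1.1 kW, about 8e-11 amps, and something like 60,000 cups a minute.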
32) Message boards : Number crunching : CPID (Message 466)
Posted 9 Jul 2015 by Richard Haselgrove
Post:
Hello,
How come the CPID is not "synchronizing" with my usual CPID, although
CMS is running on several machines running several BOINC projects ...
Thank You

Hmm, you got me there! That's way out of my area of expertise. I hope someone else who understands the nuances can chime in!

The requirements for CPID to align are

1) You register with the same email address at each project.
AND
2) you have one machine connected to every project, or a chain of machines each connected to two or more projects, which between them span your entire project list.
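
Condition 2 amounts to saying that the machines must link all the projects into one connected group. A small Python sketch of that check follows; it only models the condition as stated here, not whatever BOINC does internally, and it leaves condition 1 (the email address) aside. The function name and the input format are made up for illustration.

```python
def cpid_can_align(hosts):
    """hosts maps a machine name to the set of projects attached on it.

    Returns True when the machines chain every project into a single group.
    """
    groups = []  # disjoint sets of projects already linked together
    for projects in hosts.values():
        merged = set(projects)
        if not merged:
            continue  # a machine with no projects links nothing
        remaining = []
        for group in groups:
            if group & merged:
                merged |= group          # this machine bridges into the group
            else:
                remaining.append(group)  # still separate
        groups = remaining + [merged]
    return len(groups) <= 1
```

For example, one machine on CMS-dev and project A plus another on project A and project B form a chain, whereas a machine attached to CMS-dev alone leaves it cut off from the rest.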
33) Message boards : Number crunching : Job queue empty!! (Message 458)
Posted 3 Jul 2015 by Richard Haselgrove
Post:
There seem to be some characters in it that can't be shown correctly:

As you can see the first character and the one after the '!' are the ones I'm talking about.
Overall the beginning and the end of the line seem a little bit odd. Is that intended?

Those look like some old VT-series terminal escape characters, perhaps designed to set some screen attribute and turn it off again.
34) Message boards : Number crunching : Job queue empty!! (Message 457)
Posted 3 Jul 2015 by Richard Haselgrove
Post:
That's why you should always consult with your users before finalising the design!
35) Message boards : Number crunching : Job queue empty!! (Message 455)
Posted 3 Jul 2015 by Richard Haselgrove
Post:
We should get some interesting data from the latest LHC design features



(scanned from tomorrow's edition of the UK magazine "New Scientist". Work that one out if you can!)
36) Message boards : Number crunching : Multiple Jobs In A Single Host (Message 417)
Posted 28 May 2015 by Richard Haselgrove
Post:
I think the operative word in that note might be 'remove' - either the whole file, or some entry types within it.

I suspect some entries, notably the thread count settings for MT applications, may have a delayed impact - they might only come into effect when a new task is started, or when new work is fetched from the project concerned. And the GPU count is slow to display, even though it comes into effect immediately. Altogether, it's a slightly clumsy and perhaps unfinished (though incredibly useful) mechanism.

I do have an editing account for that Wiki, so if I can think of a better form of words (or if anyone else can suggest one), I can post it.
37) Message boards : News : Urgent Update for Windows Users (Message 413)
Posted 28 May 2015 by Richard Haselgrove
Post:
v7.6.1 has been withdrawn because of Manager problems with Windows 10, and replaced with v7.6.2

Now available via http://boinc.berkeley.edu/download_all.php
38) Message boards : Number crunching : Multiple Jobs In A Single Host (Message 412)
Posted 27 May 2015 by Richard Haselgrove
Post:
Tja, the .xml file I posted above does seem to work for the vLHC tasks. It's a little bit chicken-and-egg as to what order you need to do i) creating the file; ii) resetting the project; and iii) stopping and restarting BOINC -- but it works itself out in short order.
Now the question is, can we ship a one-job-at-a-time default file, and then raise the per-PC job limit so that people who want, and have the capacity, to run more than one can easily edit the file to that end?

Where are you seeing anything about resetting the project?

My recipe for doing that would be
i) Create the file
ii) Issue the 'Read config files' command from BOINC Manager

If you have two tasks running at once, and set <max_concurrent> to 1, one of them will stop. Simple as that.
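
For step i), a minimal Python sketch that generates such a file. It assumes the file in question is an app_config.xml placed in the project's folder; the application name used in the usage note is a placeholder and has to be replaced with the project's real short application name.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent=1):
    """Return app_config.xml text limiting one application's running tasks."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # pretty-print; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")
```

Writing the result of build_app_config("example_app", 1) to the file and then issuing 'Read config files' from BOINC Manager completes the recipe.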
39) Message boards : News : Urgent Update for Windows Users (Message 407)
Posted 26 May 2015 by Richard Haselgrove
Post:
For anyone who wants it, there's now a full Windows BOINC installer which incorporates the various hotfixes.

It's still very much a beta test build (v7.6.1), so be ready for unexpected surprises - but the BOINC developers want to fast-track the v7.6 line for official release, and I think they feel there's not much more work to do on it - famous last words.

One thing to watch out for - with the full installer, you get the new version of the Manager, as well as the client we need to use here. There's been a major re-organisation of the menus in Advanced view, which is disconcerting at first.

Find your preferred version in the download directory - it hasn't been added to the standard download page yet.

http://boinc.berkeley.edu/dl/?C=M;O=D
40) Message boards : News : Urgent Update for Windows Users (Message 403)
Posted 25 May 2015 by Richard Haselgrove
Post:
The disk error has not happened here or at Atlas since then.

I've not really been looking for errors here or at Atlas - I reckon CERN can look after its own ;)

But before urging Ivan to post this 'urgent update' news, I did spot-check a number of the top hosts here - from a number of users, not just you - and found a number of 'disk usage' errors at the other projects they run. Collatz seems to be a popular 'also runs' project for the users here, and several hosts there show the characteristic errors in their task lists.



©2024 CERN