1) Message boards : CMS Application : No Tasks (Message 5479)
Posted 11 Aug 2018 by Ben Segal
Post:
So, is the CMS-project now completely dead? (dev and production)

No, the CMS people are still trying to get their production job submission working. Don't know how hard it is or how hard they are working - vacation time...
2) Message boards : Number crunching : New Applications (Message 5470)
Posted 18 Jul 2018 by Ben Segal
Post:
These test apps are for now just placeholders for a summer student project which is looking at some problems involving Machine Learning.

Will post if/when we have some real work to crunch. For now please don't waste time wondering, OK?

Ben and Laurence
3) Message boards : CMS Application : Dip? (Message 4315)
Posted 14 Nov 2016 by Ben Segal
Post:
BOINC's feeder is not running: https://lhcathome.cern.ch/vLHCathome-dev/server_status.php

At least when using the new secure project URL.

Thanks for the heads up!
It's been restarted.

Ben and Nils
4) Message boards : Theory Application : Authentication errors - error 206. (Message 4249)
Posted 28 Oct 2016 by Ben Segal
Post:
Yes, we had a firewall issue with the Condor job feeder. Now being fixed so things should recover slowly.
5) Message boards : News : CMS Servers up again (Message 4238)
Posted 26 Oct 2016 by Ben Segal
Post:
My hosts got a couple of WUs from the non-dev project although it was clear they would run into an error.
Why can't you stop sending out WUs until the patches are installed?

The VMs are also Linux machines.
If they use a COW filesystem they are also affected by that bug.
I'm sure I am the very first thinking about that :-))

The patch for this bug was issued yesterday and will be applied automagically when your current task expires and your CernVM reboots.

As I stated a couple of times in this message board my hosts still do not get the most recent application versions (CMS, CMS-dev).
The older apps download/boot older VM images, e.g. CMS_2016_08_08.vdi in case of CMS-dev, which are not patched.

Resetting the projects or rebooting the hosts does not solve the problem.

Well actually you do get the security patches whatever .vdi version gets loaded. CernVM is connected to its file system CVMFS and it is this which does automagical kernel and library updates right after booting the vdi image.
6) Message boards : CMS Application : New Version v47.60 (Message 4237)
Posted 26 Oct 2016 by Ben Segal
Post:
Do you mean vLHC is hosted by an "external" laboratory ? I would have thought this was "inside" LHC infrastructure, somehow...

...

The vLHC BOINC servers are at CERN but the Condor servers which supply jobs can be elsewhere. In this case, CMS jobs are being sent from RAL where Ivan is partly based.
7) Message boards : News : CMS Servers up again (Message 4232)
Posted 25 Oct 2016 by Ben Segal
Post:
My hosts got a couple of WUs from the non-dev project although it was clear they would run into an error.
Why can't you stop sending out WUs until the patches are installed?

The VMs are also Linux machines.
If they use a COW filesystem they are also affected by that bug.
I'm sure I am the very first thinking about that :-))

The patch for this bug was issued yesterday and will be applied automagically when your current task expires and your CernVM reboots.
8) Message boards : CMS Application : New Version v47.60 (Message 4188)
Posted 17 Oct 2016 by Ben Segal
Post:
This new version enables Web Proxy Auto-Discovery (WPAD), which means the CVMFS traffic should be directed to either CERN or FNAL depending on where you are.

By the way, FNAL means Fermi National Accelerator Lab, which is near Chicago, USA.
9) Message boards : Theory Application : Multicore settings for 20 cor machine (Message 3892)
Posted 30 Jul 2016 by Ben Segal
Post:
At least without app_config and no tasks, an empty cache and asking 0 days of work, I got 8 tasks for my 8 threaded (4 core HT) machine.
1 task started, created a VM with 8 processors, running 8 jobs with far too low memory. At least the swap is now used ;)

Yes, looks like Work In Progress (:-))
10) Message boards : Theory Application : Multicore settings for 20 cor machine (Message 3890)
Posted 30 Jul 2016 by Ben Segal
Post:
- the new project preference options set to 1 task max, 3 CPUs max

Is this setting only for (ex-)CERN-employees??

I only see a new "Limit the number of tasks per host?" with a tick-box

I got at least 3 tasks with this option set to yes.

Aaaargh!!! Looks like Laurence is doing development under our feet. Those preferences were set by me yesterday, before today's testing. The page with them displayed was still on my screen but I'd not refreshed it today. Now, as you say, they aren't shown any more. But hopefully they are still active somewhere..

Laurence, what goes ???
11) Message boards : Number crunching : Performance Tuning (Message 3887)
Posted 30 Jul 2016 by Ben Segal
Post:
Is there ANY way I could run 3 jobs with 4 cores?

The load average running 4 jobs is about 7.
That means that there is much more work per job than a single core can process.
Before the multi-core version I could do it; now I can only use 1 core per job, which is substantially overloaded and therefore slow.

If the aim of the game is to run as fast as possible, this should be implemented.
Alternatively, there could be 2 versions.
One as is and the other as it was before (through plan classes).

EDIT: Theory jobs

I am currently doing this on my i7 MacBook Pro (4 CPUs and 8 threads), just setting the new project preference options to 1 task max and 3 CPUs max. Try it!

See my post http://lhcathomedev.cern.ch/vLHCathome-dev/forum_thread.php?id=280&postid=3885#3885
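
For what it's worth, the load-average arithmetic quoted above can be sketched roughly as follows. This is only a back-of-envelope estimate: the helper names are made up for illustration, and it assumes the load scales linearly with the number of jobs, which may well not hold in practice (on Linux the load average also counts processes waiting on I/O).

```python
def load_per_job(load_average: float, jobs: int) -> float:
    """Average number of runnable threads contributed by each job."""
    if jobs <= 0:
        raise ValueError("jobs must be positive")
    return load_average / jobs


def estimated_load(load_average: float, jobs: int, new_jobs: int) -> float:
    """Rough load estimate for a different job count (assumes linear scaling)."""
    return load_per_job(load_average, jobs) * new_jobs


# Figures from the post: a load average of about 7 while running 4 jobs.
per_job = load_per_job(7.0, 4)
three_jobs = estimated_load(7.0, 4, 3)
print(f"per job: {per_job:.2f}, estimate for 3 jobs: {three_jobs:.2f}")
```

Under that assumption each job wants a bit under two cores, so even 3 jobs would still ask for somewhat more than 4 cores can give.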
12) Message boards : Theory Application : Multicore settings for 20 cor machine (Message 3885)
Posted 30 Jul 2016 by Ben Segal
Post:
Just FYI, with my MacBook Pro (i7, 4 CPUs, 8 threads) and:

- no app_config.xml file
- the new project preference options set to 1 task max, 3 CPUs max
- only the Theory app checked

and with the latest updated server code, I get (YES!):

1 task (Theory) and a 3 CPU VM

and it works correctly so far… I will check for progress today, including some suspends/resumes.

Ben

EDIT: So far all goes well:
4 jobs run to completion, all 3 Condor job threads active. Suspend for now (15:50 CET)
13) Message boards : Number crunching : Respect My Limits! (Message 3865)
Posted 29 Jul 2016 by Ben Segal
Post:
Hi Bryan, it's only very recently that Theory has been able to use multiple cores sensibly, and we are slowly gaining experience with that. The code itself is not multi-threaded to any serious extent, so one is forced to use either multiple BOINC tasks with a VM in each (ugly), or multiple VMs per task, or multiple jobs per VM (which may be tricky), or a mixture of all that. If the code were designed for multicore and/or multiple threads, life would be simpler (like ATLAS, I believe). In any case, going over about 8 threads at a time is application-dependent, and in general high energy physics code doesn't lend itself to that sort of thing. In the old days we tried and failed to use vector hardware on supercomputers like the Cray and so on. Sorry not to be able to ace out your super installation!

Ben
14) Message boards : Theory Application : Task startup issue (Message 3850)
Posted 29 Jul 2016 by Ben Segal
Post:
Thanks, Rom.
How is it now determined how many cores are to be used?
By the project preference file? What if it is undefined?

...

Anyway be careful as Laurence hasn't yet fully implemented the new project preferences for number of cores and tasks… You may not get what you ask for until he's finished coding it.
15) Message boards : Theory Application : Task startup issue (Message 3829)
Posted 27 Jul 2016 by Ben Segal
Post:
I have a case where I am running a 4-core task. For whatever reason, it decided after 3h or so to discontinue 3 of the job streams and now runs only one.
(As if it had hit the 12h mark, which it has not.)

I had the same thing yesterday with a 2 core task when one of the two job streams went quiet while the other continued to run jobs.
16) Message boards : Theory Application : Theory Application job errors (Message 3779)
Posted 22 Jul 2016 by Ben Segal
Post:
The Theory Application looks like it is getting a batch of bad input jobs from the MCPlots server. This is causing many task errors.
We are investigating… Please be patient.
17) Message boards : Theory Application : New Version v47.22 (Message 3773)
Posted 22 Jul 2016 by Ben Segal
Post:
Since about 05:44 UTC there seem to be no jobs available.

BTW, BOINC tasks should not error out if no work is available.

EDIT: And this message shows at the top of stderr.

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
The ring 2 stack is in use.
(0xcf) - exit code 207 (0xcf)
</message>

Yes, there was a problem over the last few hours with both job supply and result processing on the Theory Application. This has now been fixed so things should clear up steadily.

@CP, this also explains your lack of jobs after resuming...
18) Message boards : Theory Application : Task shutting down prematurly (Message 3604)
Posted 23 Jun 2016 by Ben Segal
Post:
As before, a task is now running, maybe 1 job, and then starting a new task.

ARE WE OUT OF JOBS?

Thanks for the detailed reply to my first question.


Hi Rasputin, Laurence is busy for about 10 days so may not reply to you for a while…

Thanks for helping us test this!
19) Message boards : News : Scheduler and vbox update to detect 64-bit enabled computers (Message 3603)
Posted 23 Jun 2016 by Ben Segal
Post:
Hi maeax, Laurence is off for about 10 days so may not reply to you for a while…

Thanks for helping us test this!
20) Message boards : Number crunching : VBox issues (Message 3548)
Posted 8 Jun 2016 by Ben Segal
Post:
Rom just released vboxwrapper 26186 at the weekend to support VirtualBox 5.1 on Windows. The Theory app was updated so that the release on Monday would go with that version but the CMS app wasn't updated. It has been done now so please try again.


Does this apply as well to the production work at vLHC?

Yes it does.



