Message boards : News : No new jobs
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 817 - Posted: 22 Aug 2015, 9:10:13 UTC

We've run out of jobs on the Condor server. Until I can sort out the glitch that's preventing me from submitting new jobs, you can all take a rest for the weekend or switch to backup projects.
Cheers, ivan
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 819 - Posted: 22 Aug 2015, 10:10:09 UTC - in response to Message 817.  

I've just had confirmation from CERN developers that changes they have made are preventing job submission. We'll have to wait until they find time to allow our jobs again.
newman

Joined: 15 Feb 15
Posts: 10
Credit: 16,387
RAC: 0
Message 820 - Posted: 22 Aug 2015, 10:57:10 UTC - in response to Message 819.  

What does this mean for running WUs? Put them on hold? Or let them run?
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 821 - Posted: 22 Aug 2015, 14:46:33 UTC - in response to Message 820.  

What does this mean for running WUs? Put them on hold? Or let them run?

It's up to you. The tasks aren't doing real work at the moment.

- You could suspend the BOINC task (deadline = 7 days) and wait for a signal from the developers that new jobs for the VM are available.
- Or let the BOINC task run for the whole 24 hours, setting No New Tasks for the CMS-dev project until further notice.
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 822 - Posted: 22 Aug 2015, 15:43:58 UTC - in response to Message 820.  

What does this mean for running WUs? Put them on hold? Or let them run?

Probably best to put them on hold (set no new work), unless you're chasing credits rather than science. :-)
[Oops, I got distracted and didn't post this when I wrote it several hours ago...]
Profile Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 1
Message 823 - Posted: 22 Aug 2015, 17:43:55 UTC

Or you could set No New Work and then "end the task gracefully" in the same manner as for vLHC:

In BOINC Manager open the Options tab, then Computing Preferences.
In the Disk and Memory tab, uncheck the "Leave applications in memory" box. Click OK.
Exit BOINC.
In C:\ProgramData\BOINC\slots\ find the slot for the CMS VM (check its name in VBox).
Open vbox_checkpoint with a text editor, e.g. Notepad.
Edit the elapsed time, e.g. change
<elapsed_time>77551.758089</elapsed_time> to
<elapsed_time>86400</elapsed_time>
Save.
Restart BOINC.
This will trick BOINC into thinking that the task has reached its 24-hour timeout, and it will send the task home after a minute or so.
You'll only get credit for the actual work done, but you won't lose the credit altogether as you would with an Abort.
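
The edit described above could be scripted roughly as follows. This is only a sketch: the helper name `force_timeout` is made up for illustration and is not part of BOINC, it assumes the checkpoint text contains exactly one `<elapsed_time>` element as in the example above, and it works on the file contents as a string, so nothing on disk is touched. BOINC should be stopped before any real file is changed.

```python
import re

# Hypothetical helper, not part of BOINC: rewrite the elapsed time in the
# text of a vbox_checkpoint file so the task appears to have run 24 hours.
ELAPSED_RE = re.compile(r"<elapsed_time>[^<]*</elapsed_time>")

def force_timeout(checkpoint_text: str, seconds: int = 86400) -> str:
    """Return checkpoint_text with its <elapsed_time> value set to `seconds`."""
    new_text, count = ELAPSED_RE.subn(
        f"<elapsed_time>{seconds}</elapsed_time>", checkpoint_text
    )
    if count != 1:
        # Refuse to guess if the text does not look as expected.
        raise ValueError(f"expected one <elapsed_time> element, found {count}")
    return new_text

# Example using the value quoted in the post.
print(force_timeout("<elapsed_time>77551.758089</elapsed_time>"))
```

Raising an error when the element is missing or duplicated is a deliberate choice here: it is safer to stop than to write back a file that BOINC may not understand.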
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 734
Credit: 11,558,055
RAC: 2,030
Message 824 - Posted: 22 Aug 2015, 18:51:57 UTC

Why do I still get new tasks if it isn't working?

What am I missing here?

Not that I am complaining
Mad Scientist For Life
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 825 - Posted: 22 Aug 2015, 19:28:26 UTC - in response to Message 824.  
Last modified: 22 Aug 2015, 19:43:55 UTC

You may get a new task and BOINC might show progress.
However, it is not doing any work, as it needs to download tasks while running.
This does not work right now.
You can tell because your CPU utilization will be low (<40%).

Check the Task Manager (Windows) and see if any of the "vboxheadless.exe" processes has more than 40% utilization continuously.
If none of them does, then it is not working.

Or open VBox, double-click the virtual machine at the top left and press F3.

If there is no cmsRun process at the top of the list with close to 100% utilization, then it is not working.
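
The 40% rule of thumb above can be written down as a small check. This sketch assumes you have already collected CPU readings (in percent) for each vboxheadless.exe process, for example by noting them from Task Manager. The function names and the sample readings are illustrative only, not anything BOINC or VirtualBox provides.

```python
from typing import Sequence

def looks_busy(cpu_samples: Sequence[float], threshold: float = 40.0) -> bool:
    """True if every reading is above the threshold, i.e. continuously busy."""
    # An empty list gives no evidence of work, so treat it as not busy.
    return len(cpu_samples) > 0 and all(s > threshold for s in cpu_samples)

def any_vm_working(per_process_samples: Sequence[Sequence[float]]) -> bool:
    """True if at least one VM process stays above the threshold."""
    return any(looks_busy(samples) for samples in per_process_samples)

# Made-up readings: one idling VM and one VM that is running a job.
print(any_vm_working([[18.0, 17.5, 19.2], [96.0, 98.5, 97.1]]))
# Made-up readings: a single idling VM.
print(any_vm_working([[18.0, 17.5, 19.2]]))
```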
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 827 - Posted: 23 Aug 2015, 9:07:10 UTC - in response to Message 825.  

Correct. You run "tasks", which have a 24-hour lifetime. Tasks download "jobs" to run -- lately these have been about 20 mins in duration (depending on your processor capability of course) with something like 10 mins between them to upload results (again, lately about 30 MB -- this is a limiting factor in how long jobs last as we don't want large data transfers) and download a new job.
Currently I'm unable to create jobs so your tasks just spin their wheels for 24 hours. AFAIK we give credit for each task, regardless of how many jobs it ran -- I'm just taking the opportunity to test that empirically now. :-)
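
As a back-of-the-envelope illustration of those figures (about 20 minutes per job, about 10 minutes between jobs, about 30 MB uploaded per job, 24 hours per task), the sketch below estimates how many jobs fit into one task. The numbers are the approximate ones quoted above and will vary with processor and network; the function name is invented for the example.

```python
def jobs_per_task(task_hours=24, job_minutes=20, gap_minutes=10):
    """Estimate complete job cycles (run + upload/download gap) in one task."""
    cycle = job_minutes + gap_minutes
    return (task_hours * 60) // cycle

jobs = jobs_per_task()
upload_mb = jobs * 30  # ~30 MB of results per job, as quoted above
print(jobs, "jobs and about", upload_mb, "MB uploaded per task")
```

With the quoted figures this comes to 48 jobs and roughly 1440 MB of uploads per 24-hour task.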
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 828 - Posted: 23 Aug 2015, 10:00:57 UTC - in response to Message 827.  

AFAIK we give credit for each task, regardless of how many jobs it ran -- I'm just taking the opportunity to test that empirically now. :-)

Hi ivan,

For the amount of BOINC credits, it doesn't matter how many jobs you've done.
It doesn't even matter how many CPU-seconds the task has used/wasted.

Example:

86,100.64 seconds elapsed - 60,907.70 cpu-seconds used - Credits 676.54
85,750.98 seconds elapsed - 18,084.14 cpu-seconds used - Credits 672.23

Remark: The current 'idling' still means a cpu usage of ~18%.
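
To make the comparison explicit, the two tasks above can be reduced to credits per elapsed hour and CPU efficiency (CPU seconds divided by elapsed seconds). This is just arithmetic on the figures quoted in this post; the variable and function names are chosen for the example.

```python
# (elapsed seconds, CPU seconds, credits) for the two tasks quoted above
tasks = [
    (86100.64, 60907.70, 676.54),
    (85750.98, 18084.14, 672.23),
]

def summarize(elapsed, cpu, credits):
    """Return (credits per elapsed hour, CPU efficiency as a fraction)."""
    return credits / elapsed * 3600, cpu / elapsed

for elapsed, cpu, credits in tasks:
    per_hour, efficiency = summarize(elapsed, cpu, credits)
    print(f"{per_hour:.2f} credits/hour at {efficiency:.0%} CPU efficiency")
```

Both tasks come out at roughly 28 credits per elapsed hour, although one used about 71% of the CPU time available and the other about 21%, which fits the observation that credit follows elapsed time rather than CPU time.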
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 829 - Posted: 23 Aug 2015, 10:44:51 UTC - in response to Message 828.  

The whole credit system is totally useless.
Some projects grant huge credits, others next to none.
For similar runtimes you can get very different amounts of credit (SETI, for example).
As a means of determining the contribution to projects, or the relative contribution within a project, it is extremely inaccurate and inconsistent.

I suggest implementing another way of measuring the contribution to this project.
In this project a count of units processed would be in order.
Not BOINC tasks, but the internal unit: records, elements, or whatever you want to call it.

This would be far more meaningful (within the project) than BOINC credits, which tell you nothing.
Jim1348

Joined: 17 Aug 15
Posts: 17
Credit: 228,358
RAC: 0
Message 830 - Posted: 23 Aug 2015, 13:18:42 UTC - in response to Message 829.  

I quite agree that the credit system does not serve much of a purpose, except to ensure that your PC is working. Isn't the Credit New system supposed to make it more equal between projects? That would suit me fine. The absolute number of the points is useless, as long as they are roughly comparable.

The numbers handed out on a lot of projects should be reduced by two (or three) orders of magnitude anyway, to make the numbers less cumbersome to deal with. If some people leave, you might gain others with a better appreciation of what the project is really all about.
Profile tullio

Joined: 17 Aug 15
Posts: 62
Credit: 296,695
RAC: 0
Message 831 - Posted: 23 Aug 2015, 13:59:31 UTC - in response to Message 830.  

In vLHC@home there is MCPLOTS which counts the number of jobs you have run and the number of events (collisions) you have crunched. When you reach 1 billion events you are admitted to the Billionaires club.
Tullio
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 832 - Posted: 23 Aug 2015, 17:35:17 UTC - in response to Message 831.  

In vLHC@home there is MCPLOTS which counts the number of jobs you have run and the number of events (collisions) you have crunched. When you reach 1 billion events you are admitted to the Billionaires club.
Tullio

I agree we should do something like that. I agree Credits should be meaningful. However, I think we're a long way from having the luxury of the time to sit down and work out something equitable. :-(
By all means think about it, but the time for the discussion is not now, IMHO.
Yes, I deliberately raised the point, but only as one of awareness (to forestall cries of, "I didn't know that!"). Now that we're aware, let's have mainly private cogitations until the time comes when it really matters. Cheers.
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 278
Message 833 - Posted: 23 Aug 2015, 21:27:48 UTC - in response to Message 832.  

This is a topic that one message in a thread cannot do justice to. We spend a lot of time thinking about this and have done so since at least the 70s. The question is how you measure work done or, conversely, the potential to do work. What we care about is normalised wall-clock time. Measuring the time is easy; normalisation is not so easy. Credit should be assigned whether the VM is running or not, as we are consuming that potential to do work. How well we use that potential is a question of efficiency, i.e. how much of that potential we translate into work done. The can of worms here is what metric and method are used for the normalisation so, as Ivan said, let's keep it closed for now. Efforts in related areas on this topic will most probably filter down to the area of Volunteer Computing, so watch this space ...
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 836 - Posted: 25 Aug 2015, 15:04:25 UTC

OK, you might notice some small jobs coming through if you've left a task idling. We can submit now, with the new version of CRAB, but we're testing the new stage-out chain.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 837 - Posted: 25 Aug 2015, 16:04:14 UTC

I don't think this is the output you expect:

cmsRun -j FrameworkJobReport.xml PSet.py
%MSG-e Specific: OscarProducer:g4SimHits@beginRun 25-Aug-2015 18:01:52 CEST Run: 1
Could not process q-name of a DDLogicalPart, reason:
No regex-match for namespace= name=HFFibre.*

SpecPar selection is:
//HFFibre.*

%MSG
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 838 - Posted: 25 Aug 2015, 16:12:00 UTC

Don't know if this shows a problem or not:

08/25/15 17:59:25 (pid:18348) FILETRANSFER: "/home/boinc/CMSRun/glide_GDtCDO/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
08/25/15 17:59:25 (pid:18348) FILETRANSFER: failed to add plugin "/home/boinc/CMSRun/glide_GDtCDO/main/condor/libexec/curl_plugin" because: FILETRANSFER:1:"/home/boinc/CMSRun/glide_GDtCDO/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
08/25/15 17:59:25 (pid:18348) WARNING: Initializing plugins returned: FILETRANSFER:1:"/home/boinc/CMSRun/glide_GDtCDO/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 841 - Posted: 25 Aug 2015, 17:01:56 UTC - in response to Message 837.  

I don't think this is the output you expect:

cmsRun -j FrameworkJobReport.xml PSet.py
%MSG-e Specific: OscarProducer:g4SimHits@beginRun 25-Aug-2015 18:01:52 CEST Run: 1
Could not process q-name of a DDLogicalPart, reason:
No regex-match for namespace= name=HFFibre.*

SpecPar selection is:
//HFFibre.*

%MSG

I see the same message (on the Alt+F5 screen).
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 844 - Posted: 25 Aug 2015, 20:16:35 UTC - in response to Message 837.  

I don't think this is the output you expect:

cmsRun -j FrameworkJobReport.xml PSet.py
%MSG-e Specific: OscarProducer:g4SimHits@beginRun 25-Aug-2015 18:01:52 CEST Run: 1
Could not process q-name of a DDLogicalPart, reason:
No regex-match for namespace= name=HFFibre.*

SpecPar selection is:
//HFFibre.*

%MSG

Surprisingly, that is an output I'd expect. This batch of jobs is simulating t/t-bar events in the tracker geometry we expect to install about 2025. There are obviously physical differences between the current geometry and what we'll have then, so this is saying there's a mismatch between what currently is there and what will be later. I've seen it many times before -- it's in a different part of the detector to what I'm involved with and will have no effect on the analysis I hope eventually to perform on these results. But thanks for noticing!
©2024 CERN