No new jobs
Author | Message |
---|---|
Send message Joined: 20 Jan 15 Posts: 1129 Credit: 7,876,541 RAC: 270 |
We've run out of jobs on the Condor server. Until I can sort out the glitch that's preventing me from submitting new jobs, you can all take a rest for the weekend or switch to backup projects. Cheers, ivan |
Send message Joined: 20 Jan 15 Posts: 1129 Credit: 7,876,541 RAC: 270 |
I've just had confirmation from CERN developers that changes they have made are preventing job submission. We'll have to wait until they find time to allow our jobs again. |
Send message Joined: 15 Feb 15 Posts: 10 Credit: 16,387 RAC: 0 |
What does this mean for running WUs? Put them on hold? Or let them run? |
Send message Joined: 13 Feb 15 Posts: 1180 Credit: 815,336 RAC: 238 |
What does this mean for running WUs? Put them on hold? Or let them run? It's up to you. The tasks aren't doing real work at the moment. - You could suspend the BOINC task (deadline = 7 days) and wait for a signal from the developers that new jobs for the VM are available. - Or set No New Tasks for the CMS-dev project until further notice and let the BOINC task run its whole 24 hours. |
Send message Joined: 20 Jan 15 Posts: 1129 Credit: 7,876,541 RAC: 270 |
What does this mean for running WUs? Put them on hold? Or let them run? Probably best to put them on hold (set no new work), unless you're chasing credits rather than science. :-) [Oops, I got distracted and didn't post this when I wrote it several hours ago...] |
Send message Joined: 13 Apr 15 Posts: 138 Credit: 2,945,852 RAC: 0 |
Or you could set No New Work, then "end the task gracefully" in the same manner as for vLHC: In BOINC Manager, open the Options tab, then Computing Preferences. In the Disk and Memory tab, uncheck the "Leave applications in memory" box. Click OK. Exit BOINC. In Computer/ C:/ Program Data/ Boinc/ slots/, find the slot for the CMS VM (check its name in VBox). Open vbox_checkpoint with a text editor, e.g. Notepad. Edit the elapsed time, e.g. change <elapsed_time>77551.758089</elapsed_time> to <elapsed_time>86400</elapsed_time>. Save. Restart BOINC. This will trick BOINC into thinking that the task has reached its 24-hour timeout, and it will send the task home after a minute or so. You'll only get credit for the actual work done, but you won't lose the credit as you would with an Abort. |
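The manual edit described above could also be scripted. This is only a minimal sketch, assuming the checkpoint file contains an `<elapsed_time>` element exactly as quoted in the post; the function name `set_elapsed_time` is made up for illustration, and BOINC should be shut down before the real file is rewritten.

```python
import re

# Matches the <elapsed_time> element in the form quoted in the post above.
ELAPSED_RE = re.compile(r"<elapsed_time>[^<]*</elapsed_time>")


def set_elapsed_time(checkpoint_text: str, seconds: int = 86400) -> str:
    """Return the checkpoint text with its elapsed time replaced by `seconds`.

    86400 seconds is the 24-hour task lifetime mentioned in the thread.
    """
    new_text, count = ELAPSED_RE.subn(
        f"<elapsed_time>{seconds}</elapsed_time>", checkpoint_text
    )
    if count == 0:
        # Refuse to guess if the file does not look the way we assumed.
        raise ValueError("no <elapsed_time> element found")
    return new_text


if __name__ == "__main__":
    sample = "<elapsed_time>77551.758089</elapsed_time>"
    print(set_elapsed_time(sample))
```

To apply it, you would read the vbox_checkpoint file from the CMS slot directory into a string, pass it through the function, and write the result back, keeping a backup copy of the original.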
Send message Joined: 8 Apr 15 Posts: 751 Credit: 11,610,299 RAC: 1,436 |
Why do I still get new tasks if it isn't working? What am I missing here? Not that I am complaining. Mad Scientist For Life |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
You may get a new task and BOINC might show progress. However, it is not doing any work, as it needs to download tasks while running, and that does not work right now. One way to tell is if your CPU utilization is low (<40%). Check Task Manager (Windows) and see if any of the "vboxheadless.exe" processes stays above 40% utilization continuously. If none of them does, then it is not working. Or open VBox, double-click the virtual machine at the top left, and press F3. If there is no cmsRun process at the top of the list with close to 100% utilization, then it is not working. |
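The 40% rule of thumb above can be written down as a small check. This is a sketch only: the CPU readings are assumed to be percentages you have noted by hand from Task Manager (or collected with a monitoring tool of your choice), and the name `looks_busy` is hypothetical.

```python
def looks_busy(cpu_samples, threshold=40.0):
    """Return True if every CPU reading is above the threshold.

    "Continuously above 40%" is taken to mean that all samples exceed it;
    an empty list of samples is treated as "not working".
    """
    samples = list(cpu_samples)
    return bool(samples) and all(s > threshold for s in samples)


if __name__ == "__main__":
    print(looks_busy([95, 98, 97]))   # a VM that is crunching a job
    print(looks_busy([18, 20, 45]))   # mostly idle with one short spike
```

Requiring every sample to be above the threshold, rather than the average, keeps a single short spike from being mistaken for real work.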
Send message Joined: 20 Jan 15 Posts: 1129 Credit: 7,876,541 RAC: 270 |
Correct. You run "tasks", which have a 24-hour lifetime. Tasks download "jobs" to run -- lately these have been about 20 mins in duration (depending on your processor capability of course) with something like 10 mins between them to upload results (again, lately about 30 MB -- this is a limiting factor in how long jobs last as we don't want large data transfers) and download a new job. Currently I'm unable to create jobs so your tasks just spin their wheels for 24 hours. AFAIK we give credit for each task, regardless of how many jobs it ran -- I'm just taking the opportunity to test that empirically now. :-) |
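As a rough back-of-the-envelope illustration of the figures above (about 20 minutes per job, about 10 minutes between jobs, about 30 MB of results per job, 24 hours per task), here is a small sketch. The numbers are the approximate ones from the post and will vary with processor speed and network conditions; the function names are made up.

```python
def jobs_per_task(task_hours=24, job_minutes=20, gap_minutes=10):
    """Number of complete job cycles (run + upload/download) in one task."""
    cycle_minutes = job_minutes + gap_minutes
    return (task_hours * 60) // cycle_minutes


def upload_per_task_mb(result_mb=30, **kwargs):
    """Approximate result data uploaded over one task's lifetime, in MB."""
    return jobs_per_task(**kwargs) * result_mb


if __name__ == "__main__":
    print(jobs_per_task())        # 48 job cycles in 24 hours
    print(upload_per_task_mb())   # 1440 MB uploaded per task
```

With these assumptions a task fits 48 jobs and uploads roughly 1.4 GB in total, which gives a feel for why result size limits how long individual jobs are allowed to run.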
Send message Joined: 13 Feb 15 Posts: 1180 Credit: 815,336 RAC: 238 |
AFAIK we give credit for each task, regardless of how many jobs it ran -- I'm just taking the opportunity to test that empirically now. :-) Hi ivan, For the amount of BOINC credits, it doesn't matter how many jobs you've done. It doesn't even matter how many CPU-seconds the task has used/wasted. Example: 86,100.64 seconds elapsed - 60,907.70 CPU-seconds used - Credits 676.54; 85,750.98 seconds elapsed - 18,084.14 CPU-seconds used - Credits 672.23. Remark: The current 'idling' still means a CPU usage of ~18%. |
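To make the comparison explicit, the two example tasks can be reduced to CPU efficiency (CPU-seconds divided by elapsed seconds) and credit per elapsed hour. This is just an illustrative calculation on the figures quoted above; the helper name `summarise` is not part of any BOINC tool.

```python
def summarise(elapsed_s, cpu_s, credit):
    """Return (CPU efficiency, credit per elapsed hour) for one task."""
    efficiency = cpu_s / elapsed_s
    credit_per_hour = credit / (elapsed_s / 3600)
    return efficiency, credit_per_hour


# (elapsed seconds, CPU-seconds, credits) as quoted in the post above
tasks = [
    (86100.64, 60907.70, 676.54),
    (85750.98, 18084.14, 672.23),
]

if __name__ == "__main__":
    for elapsed_s, cpu_s, credit in tasks:
        eff, per_hour = summarise(elapsed_s, cpu_s, credit)
        print(f"efficiency {eff:.1%}, credit per hour {per_hour:.1f}")
```

The first task used roughly 71% of its wall-clock time on the CPU and the second roughly 21%, yet both come out at about 28 credits per elapsed hour, which is the point being made: credit here tracks elapsed time, not CPU time.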
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
The whole credit system is totally useless. Some projects grant huge credits, others next to none. For similar runtimes you can get very different amounts of credit (SETI, for example). As a means of determining the contribution to projects, or the relative contribution within a project, it is extremely inaccurate and inconsistent. I suggest implementing another way of measuring the contribution to this project. Here, a count of units processed would be in order: not BOINC tasks, but the internal unit of records, elements, or whatever you want to call it. This would be far more meaningful (within the project) than BOINC credits, which tell you nothing. |
Send message Joined: 17 Aug 15 Posts: 17 Credit: 228,358 RAC: 0 |
I quite agree that the credit system does not serve much of a purpose, except to ensure that your PC is working. Isn't the CreditNew system supposed to make it more equal between projects? That would suit me fine. The absolute number of the points is useless, as long as they are roughly comparable. The numbers handed out on a lot of projects should be reduced by two (or three) orders of magnitude anyway, to make the numbers less cumbersome to deal with. If some people leave, you might gain others with a better appreciation of what the project is really all about. |
Send message Joined: 17 Aug 15 Posts: 62 Credit: 296,695 RAC: 0 |
In vLHC@home there is MCPLOTS which counts the number of jobs you have run and the number of events (collisions) you have crunched. When you reach 1 billion events you are admitted to the Billionaires club. Tullio |
Send message Joined: 20 Jan 15 Posts: 1129 Credit: 7,876,541 RAC: 270 |
In vLHC@home there is MCPLOTS which counts the number of jobs you have run and the number of events (collisions) you have crunched. When you reach 1 billion events you are admitted to the Billionaires club. I agree we should do something like that. I agree Credits should be meaningful. However, I think we're a long way from having the luxury of the time to sit down and work out something equitable. :-( By all means think about it, but the time for the discussion is not now, IMHO. Yes, I deliberately raised the point, but only as one of awareness (to forestall cries of, "I didn't know that!"). Now that we're aware, let's have mainly private cogitations until the time comes when it really matters. Cheers. |
Send message Joined: 12 Sep 14 Posts: 1064 Credit: 328,405 RAC: 158 |
This is a topic that one message in a thread cannot do justice to. We spend a lot of time thinking about this and have done since at least the 70s. The question is how you measure work done or, conversely, the potential to do work. What we care about is normalised wall clock time. Measuring the time is easy; normalisation is not so easy. Credit should be assigned whether the VM is running or not, as we are consuming that potential to do work. How well we use that potential is a question of efficiency, i.e. how much of that potential we translate into work done. The can of worms here is what metric and method are used for the normalisation so, as Ivan said, let's keep it closed for now. Efforts in related areas on this topic will most probably filter down to the area of Volunteer Computing, so watch this space ... |
Send message Joined: 20 Jan 15 Posts: 1129 Credit: 7,876,541 RAC: 270 |
OK, you might notice some small jobs coming through if you've left a task idling. We can submit now, with the new version of CRAB, but we're testing the new stage-out chain. |
Send message Joined: 13 Feb 15 Posts: 1180 Credit: 815,336 RAC: 238 |
I don't think this is the output you expect: cmsRun -j FrameworkJobReport.xml PSet.py %MSG-e Specific: OscarProducer:g4SimHits@beginRun 25-Aug-2015 18:01:52 CEST Run: 1 Could not process q-name of a DDLogicalPart, reason: No regex-match for namespace= name=HFFibre.* SpecPar selection is: //HFFibre.* %MSG |
Send message Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
Don't know if this shows a problem or not: 08/25/15 17:59:25 (pid:18348) FILETRANSFER: "/home/boinc/CMSRun/glide_GDtCDO/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring 08/25/15 17:59:25 (pid:18348) FILETRANSFER: failed to add plugin "/home/boinc/CMSRun/glide_GDtCDO/main/condor/libexec/curl_plugin" because: FILETRANSFER:1:"/home/boinc/CMSRun/glide_GDtCDO/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring 08/25/15 17:59:25 (pid:18348) WARNING: Initializing plugins returned: FILETRANSFER:1:"/home/boinc/CMSRun/glide_GDtCDO/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring |
Send message Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
I don't think this is the output you expect: I see the same message (on the ALT F5 screen). |
Send message Joined: 20 Jan 15 Posts: 1129 Credit: 7,876,541 RAC: 270 |
I don't think this is the output you expect: Surprisingly, that is an output I'd expect. This batch of jobs is simulating t/t-bar events in the tracker geometry we expect to install about 2025. There are obviously physical differences between the current geometry and what we'll have then, so this is saying there's a mismatch between what currently is there and what will be later. I've seen it many times before -- it's in a different part of the detector to what I'm involved with and will have no effect on the analysis I hope eventually to perform on these results. But thanks for noticing! |
©2024 CERN