Thread 'No new jobs'

Author	Message
ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 298	Message 817 - Posted: 22 Aug 2015, 9:10:13 UTC We've run out of jobs on the Condor server. Until I can sort out the glitch that's preventing me from submitting new jobs, you can all take a rest for the weekend or switch to backup projects. Cheers, ivan

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 298	Message 819 - Posted: 22 Aug 2015, 10:10:09 UTC - in response to Message 817. I've just had confirmation from CERN developers that changes they have made are preventing job submission. We'll have to wait until they find time to allow our jobs again.

newman Joined: 15 Feb 15 Posts: 10 Credit: 16,387 RAC: 0	Message 820 - Posted: 22 Aug 2015, 10:57:10 UTC - in response to Message 819. What does this mean for running WUs? Put them on hold? Or let them run?

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 821 - Posted: 22 Aug 2015, 14:46:33 UTC - in response to Message 820. "What does this mean for running WUs? Put them on hold? Or let them run?" It's up to you. The tasks aren't doing real work at the moment.
- You could suspend the BOINC-task (deadline = 7 days) and wait for a signal from the developers that new jobs for the VM are available.
- Or let the BOINC-task run the whole 24 hours, setting No New Tasks for the CMS-dev project until further notice.

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 298	Message 822 - Posted: 22 Aug 2015, 15:43:58 UTC - in response to Message 820. "What does this mean for running WUs? Put them on hold? Or let them run?" Probably best to put them on hold (set no new work), unless you're chasing credits rather than science. :-) [Oops, I got distracted and didn't post this when I wrote it several hours ago...]

Ray Murray Joined: 13 Apr 15 Posts: 138 Credit: 3,015,630 RAC: 0	Message 823 - Posted: 22 Aug 2015, 17:43:55 UTC Or you could set No New Work and then "end the task gracefully" in the same manner as vLHC:
- In Boinc Manager open the Options tab, then Computing Preferences.
- In the Disk and Memory tab, uncheck the "Leave applications in memory" box. Click OK.
- Exit Boinc.
- In Computer/ C:/ Program Data/ Boinc/ slots/ find the slot for the CMS VM (check its name in VBox).
- Open vbox_checkpoint with a text editor, e.g. Notepad.
- Edit the elapsed time, e.g. change <elapsed_time>77551.758089</elapsed_time> to <elapsed_time>86400</elapsed_time>. Save.
- Restart Boinc.
This will trick Boinc into thinking that the task has reached its 24-hour timeout, and Boinc will send it home after a minute or so. You'll only get credit for the actual work done, but you won't lose the credits as you would with an Abort.
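The edit to vbox_checkpoint described above can be sketched in Python. This is only an illustration under stated assumptions: the helper name set_elapsed_time is hypothetical, and the code assumes the file contains exactly one <elapsed_time> element in the form quoted in the post. It works on text you have already read from the file, so nothing here touches BOINC itself.

```python
import re

# Assumption: the checkpoint text holds one <elapsed_time>...</elapsed_time>
# element, as in the example quoted in the post.
ELAPSED_RE = re.compile(r"<elapsed_time>[^<]*</elapsed_time>")


def set_elapsed_time(checkpoint_text: str, seconds: int = 86400) -> str:
    """Return checkpoint_text with the elapsed_time value replaced by `seconds`."""
    new_text, count = ELAPSED_RE.subn(
        f"<elapsed_time>{seconds}</elapsed_time>", checkpoint_text
    )
    if count != 1:
        # Refuse to guess if the file does not look like the example.
        raise ValueError(f"expected one <elapsed_time> element, found {count}")
    return new_text


example = "<elapsed_time>77551.758089</elapsed_time>"
print(set_elapsed_time(example))  # <elapsed_time>86400</elapsed_time>
```

As in the manual procedure, this would only make sense with Boinc fully exited, and keeping a backup copy of the file first seems prudent.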

Magic Quantum Mechanic Joined: 8 Apr 15 Posts: 990 Credit: 17,733,839 RAC: 19,544	Message 824 - Posted: 22 Aug 2015, 18:51:57 UTC Why do I still get new tasks if it isn't working? What am I missing here? Not that I am complaining. Mad Scientist For Life

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 825 - Posted: 22 Aug 2015, 19:28:26 UTC - in response to Message 824. Last modified: 22 Aug 2015, 19:43:55 UTC You may get a new task and boinc might show progress. However, it is not doing any work, as it needs to download tasks while running, and this does not work right now. The way to tell is whether your CPU utilization is low (<40%). Check the Task Manager (Windows) and see if any of the "vboxheadless.exe" processes stay above 40% utilization continuously. If none of them do, then it is not working. Or open vbox, double-click the virtual engine at the top left and press F3. If there is no CMSrun process at the top of the list with close to 100% utilization, then it is not working.
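The 40% rule of thumb above can be written down as a small check. This is a minimal sketch, not a monitoring tool: the function names are hypothetical, and the CPU readings are assumed to have been noted by hand (for example from Task Manager) as percentages taken a few seconds apart.

```python
def looks_busy(cpu_samples, threshold=40.0):
    """True if every CPU reading (in percent) is above the threshold.

    "Continuously" in the post is modelled as "in all samples".
    """
    samples = list(cpu_samples)
    return bool(samples) and all(s > threshold for s in samples)


def any_vm_working(samples_by_process, threshold=40.0):
    """True if at least one process stays above the threshold throughout."""
    return any(looks_busy(s, threshold) for s in samples_by_process.values())


# Hypothetical readings for two vboxheadless.exe processes.
readings = {
    "vboxheadless.exe (1)": [97.0, 99.5, 98.2],
    "vboxheadless.exe (2)": [17.5, 18.0, 19.1],
}
print(any_vm_working(readings))  # True: the first process is clearly busy
```

Requiring every sample to exceed the threshold avoids counting a short spike during start-up or upload as real work.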

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 298	Message 827 - Posted: 23 Aug 2015, 9:07:10 UTC - in response to Message 825. Correct. You run "tasks", which have a 24-hour lifetime. Tasks download "jobs" to run -- lately these have been about 20 mins in duration (depending on your processor capability of course) with something like 10 mins between them to upload results (again, lately about 30 MB -- this is a limiting factor in how long jobs last as we don't want large data transfers) and download a new job. Currently I'm unable to create jobs so your tasks just spin their wheels for 24 hours. AFAIK we give credit for each task, regardless of how many jobs it ran -- I'm just taking the opportunity to test that empirically now. :-)
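As a back-of-the-envelope illustration of the figures above, the sketch below estimates how many jobs fit into one task and how much result data that implies. The numbers are the approximate ones from the post; the function names are hypothetical, and VM start-up time and variation between hosts are ignored.

```python
def jobs_per_task(lifetime_min=24 * 60, run_min=20, gap_min=10):
    """Whole job cycles (run + upload/download gap) that fit in one task."""
    return lifetime_min // (run_min + gap_min)


def upload_mb_per_task(result_mb=30, **kwargs):
    """Approximate result data uploaded over the lifetime of one task."""
    return jobs_per_task(**kwargs) * result_mb


print(jobs_per_task())       # 48
print(upload_mb_per_task())  # 1440
```

So, under these assumptions, a task that is kept fed would run on the order of 48 jobs and upload roughly 1.4 GB over its 24 hours.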

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 828 - Posted: 23 Aug 2015, 10:00:57 UTC - in response to Message 827. "AFAIK we give credit for each task, regardless of how many jobs it ran -- I'm just taking the opportunity to test that empirically now. :-)" Hi ivan, For the amount of BOINC-credits, it doesn't matter how many jobs you've done. It doesn't even matter how many CPU-seconds the task has used/wasted. Example:
86,100.64 seconds elapsed - 60,907.70 cpu-seconds used - Credits 676.54
85,750.98 seconds elapsed - 18,084.14 cpu-seconds used - Credits 672.23
Remark: The current 'idling' still means a cpu usage of ~18%.
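The two example tasks can be compared directly with a few lines of Python. The figures are copied from the post; the field and function names are only illustrative.

```python
# Figures taken from the two example tasks in the post.
tasks = [
    {"elapsed_s": 86100.64, "cpu_s": 60907.70, "credits": 676.54},
    {"elapsed_s": 85750.98, "cpu_s": 18084.14, "credits": 672.23},
]


def summarise(task):
    """Derive CPU fraction and credit rates for one task."""
    return {
        "cpu_fraction": task["cpu_s"] / task["elapsed_s"],
        "credits_per_elapsed_hour": task["credits"] / task["elapsed_s"] * 3600,
        "credits_per_cpu_hour": task["credits"] / task["cpu_s"] * 3600,
    }


for task in tasks:
    s = summarise(task)
    print(
        f"CPU fraction {s['cpu_fraction']:.2f}, "
        f"{s['credits_per_elapsed_hour']:.1f} credits/elapsed hour, "
        f"{s['credits_per_cpu_hour']:.1f} credits/CPU hour"
    )
```

Both tasks come out at roughly 28 credits per elapsed hour even though one used about 71% of a core and the other about 21%, which is consistent with credit tracking elapsed time rather than CPU time.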

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 829 - Posted: 23 Aug 2015, 10:44:51 UTC - in response to Message 828. The whole credit system is totally useless. Some projects grant huge credits, others next to none. For similar runtimes you can get very different amounts of credit (seti, for example). As a means of determining the contribution to projects, or the relative contribution within a project, it is extremely inaccurate and inconsistent. I suggest implementing another way of measuring the contribution to this project. Here, a count of units processed would be in order: not Boinc tasks, but the internal unit of records, elements or whatever you want to call it. This would be far more meaningful (within the project) than boinc credits, which tell you nothing.

Jim1348 Joined: 17 Aug 15 Posts: 17 Credit: 228,358 RAC: 0	Message 830 - Posted: 23 Aug 2015, 13:18:42 UTC - in response to Message 829. I quite agree that the credit system does not serve much of a purpose, except to ensure that your PC is working. Isn't the Credit New system supposed to make it more equal between projects? That would suit me fine. The absolute number of the points is useless, as long as they are roughly comparable. The numbers handed out on a lot of projects should be reduced by two (or three) orders of magnitude anyway, to make the numbers less cumbersome to deal with. If some people leave, you might gain others with a better appreciation of what the project is really all about.

tullio Joined: 17 Aug 15 Posts: 62 Credit: 296,695 RAC: 0	Message 831 - Posted: 23 Aug 2015, 13:59:31 UTC - in response to Message 830. In vLHC@home there is MCPLOTS which counts the number of jobs you have run and the number of events (collisions) you have crunched. When you reach 1 billion events you are admitted to the Billionaires club. Tullio

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 298	Message 832 - Posted: 23 Aug 2015, 17:35:17 UTC - in response to Message 831. "In vLHC@home there is MCPLOTS which counts the number of jobs you have run and the number of events (collisions) you have crunched. When you reach 1 billion events you are admitted to the Billionaires club. Tullio" I agree we should do something like that. I agree Credits should be meaningful. However, I think we're a long way from having the luxury of the time to sit down and work out something equitable. :-( By all means think about it, but the time for the discussion is not now, IMHO. Yes, I deliberately raised the point, but only as one of awareness (to forestall cries of, "I didn't know that!"). Now that we're aware, let's have mainly private cogitations until the time comes when it really matters. Cheers.

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1159 Credit: 342,328 RAC: 0	Message 833 - Posted: 23 Aug 2015, 21:27:48 UTC - in response to Message 832. This is a topic that one message in a thread cannot do justice to. We spend a lot of time thinking about this and have done since at least the 70s. The question is how you measure work done or, conversely, the potential to do work. What we care about is normalised wall clock time. Measuring the time is easy; normalisation is not so easy. Credit should be assigned whether the VM is running or not, as we are consuming that potential to do work. How well we use that potential is a question of efficiency, i.e. how much of that potential we translate into work done. The can of worms here is what metric and method are used for the normalisation so, as Ivan said, let's keep it closed for now. Efforts in related areas on this topic will most probably filter down to the area of Volunteer Computing, so watch this space ...

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 298	Message 836 - Posted: 25 Aug 2015, 15:04:25 UTC OK, you might notice some small jobs coming through if you've left a task idling. We can submit now, with the new version of CRAB, but we're testing the new stage-out chain.

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 837 - Posted: 25 Aug 2015, 16:04:14 UTC I don't think this is the output you expect:
cmsRun -j FrameworkJobReport.xml PSet.py
%MSG-e Specific: OscarProducer:g4SimHits@beginRun 25-Aug-2015 18:01:52 CEST Run: 1
Could not process q-name of a DDLogicalPart, reason: No regex-match for namespace= name=HFFibre.* SpecPar selection is: //HFFibre.*
%MSG

Yeti Joined: 29 May 15 Posts: 162 Credit: 3,378,875 RAC: 10,628	Message 838 - Posted: 25 Aug 2015, 16:12:00 UTC Don't know if this shows a problem or not:
08/25/15 17:59:25 (pid:18348) FILETRANSFER: "/home/boinc/CMSRun/glide_GDtCDO/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
08/25/15 17:59:25 (pid:18348) FILETRANSFER: failed to add plugin "/home/boinc/CMSRun/glide_GDtCDO/main/condor/libexec/curl_plugin" because: FILETRANSFER:1:"/home/boinc/CMSRun/glide_GDtCDO/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
08/25/15 17:59:25 (pid:18348) WARNING: Initializing plugins returned: FILETRANSFER:1:"/home/boinc/CMSRun/glide_GDtCDO/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring

Yeti Joined: 29 May 15 Posts: 162 Credit: 3,378,875 RAC: 10,628	Message 841 - Posted: 25 Aug 2015, 17:01:56 UTC - in response to Message 837. "I don't think this is the output you expect:
cmsRun -j FrameworkJobReport.xml PSet.py
%MSG-e Specific: OscarProducer:g4SimHits@beginRun 25-Aug-2015 18:01:52 CEST Run: 1
Could not process q-name of a DDLogicalPart, reason: No regex-match for namespace= name=HFFibre.* SpecPar selection is: //HFFibre.*
%MSG"
I see the same message (on the ALT F5 screen).

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 298	Message 844 - Posted: 25 Aug 2015, 20:16:35 UTC - in response to Message 837. "I don't think this is the output you expect:
cmsRun -j FrameworkJobReport.xml PSet.py
%MSG-e Specific: OscarProducer:g4SimHits@beginRun 25-Aug-2015 18:01:52 CEST Run: 1
Could not process q-name of a DDLogicalPart, reason: No regex-match for namespace= name=HFFibre.* SpecPar selection is: //HFFibre.*
%MSG"
Surprisingly, that is an output I'd expect. This batch of jobs is simulating t/t-bar events in the tracker geometry we expect to install about 2025. There are obviously physical differences between the current geometry and what we'll have then, so this is saying there's a mismatch between what currently is there and what will be later. I've seen it many times before -- it's in a different part of the detector to what I'm involved with and will have no effect on the analysis I hope eventually to perform on these results. But thanks for noticing!

Development for LHC@home