Thread 'No new jobs'

Author	Message
Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 966 Credit: 1,211,852 RAC: 0	Message 3439 - Posted: 21 May 2016, 8:25:08 UTC "I think that I have just fixed the problem but it might take some time to recover and offer up-to-date information." Apparently, it did not work.

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1142 Credit: 8,310,612 RAC: 0	Message 3462 - Posted: 21 May 2016, 21:50:35 UTC - in response to Message 3439. Last modified: 21 May 2016, 21:56:00 UTC
"I think that I have just fixed the problem but it might take some time to recover and offer up-to-date information. Apparently, it did not work."
OK, let's give the Dashboard team (I won't mention names...) the benefit of the doubt.
a) I just submitted a 10-job test batch, and all ten are seen by Dashboard.
b) The graph of running jobs now matches the number I see on the Condor server. The headline number on our "Jobs page" covers only my CRAB3 jobs; the first graph includes Hassen's WMAgent jobs as well. Combined, they seem to match current truth.
c) There hasn't been an "unknown" job reported for several hours. These are the jobs that time out after 24 hours, and each one adds 86,400 seconds of failure walltime to our third graph. In most cases these jobs have finished successfully and delivered their results file to the data bridge, but Dashboard has lost track of them in the meantime.
d) Dashboard has lost track of many other jobs during the kerfuffle; for example, I count 3185 successful jobs in the current batch[*], whereas DB says 2563 and only counts 2992 of the 10,000 jobs submitted.
It remains to be seen how well DB recovers as time goes on, but I stress that as far as I can tell the problem is only that DB reporting went AWOL for some time; the project itself isn't losing jobs.
[*] And incidentally, 62 failures, most of which will be masked by Condor's "retry another two times" mechanism before being reported as Dashboard failures.
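For anyone wanting to check the arithmetic in the post above, here is a minimal sketch using only the figures quoted there. The variable and function names are made up for illustration, and the count of 10 "unknown" jobs in the last line is a hypothetical input, not a reported figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s charged per timed-out "unknown" job

# Figures quoted in the post above
submitted = 10_000
condor_successes = 3185      # successful jobs counted on the Condor side
dashboard_successes = 2563   # successful jobs according to Dashboard
dashboard_seen = 2992        # jobs Dashboard counts out of those submitted


def unknown_walltime(n_unknown_jobs: int) -> int:
    """Failure walltime in seconds added by jobs that time out after 24 hours."""
    return n_unknown_jobs * SECONDS_PER_DAY


missing_successes = condor_successes - dashboard_successes
seen_fraction = dashboard_seen / submitted

print(f"Successes missing from Dashboard: {missing_successes}")
print(f"Share of submitted jobs Dashboard counts: {seen_fraction:.1%}")
# Hypothetical example: ten jobs lost track of and timed out
print(f"Failure walltime from 10 unknown jobs: {unknown_walltime(10)} s")
```

So on these numbers Dashboard is short by several hundred successful jobs, and every job it loses track of shows up as a full day of failure walltime even if the job itself finished fine.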

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 966 Credit: 1,211,852 RAC: 0	Message 3463 - Posted: 21 May 2016, 22:18:00 UTC Thanks, Ivan. However, as Dashboard is really only a very crude indicator, would you please post the correct figures at the end of a batch?

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1142 Credit: 8,310,612 RAC: 0	Message 3464 - Posted: 21 May 2016, 22:42:05 UTC - in response to Message 3463. Last modified: 21 May 2016, 22:42:52 UTC "Thanks, Ivan. However, as dashboard really is only a very crude indicator, would you please post the correct figures at the end of a batch?" OK, I have a script that archives the logs when I think a batch is finished. I can add a report that parses out the above statistics. This may become moot soon, however, as we are nearly at the point of moving to production jobs with WMAgent, at which point responsibility shifts elsewhere (not before time as far as my sleep patterns go...), but perhaps reporting will become more transparent, too. Anyway, remind me next Thu/Fri (at current rates) to start upgrading my archive scripts. :-)
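As a rough idea of what such an end-of-batch report could look like, here is a sketch under stated assumptions. It is not ivan's archive script: the input format (one `(job_id, succeeded)` pair per attempt) and the name `summarise_batch` are hypothetical, and the only thing taken from the thread is the rule that a job gets its first run plus "another two" retries before counting as a failure.

```python
from collections import defaultdict

MAX_ATTEMPTS = 3  # first run plus "retry another two times"


def summarise_batch(attempts):
    """Tally final job outcomes from (job_id, succeeded) pairs, one per attempt.

    The record format is an assumption for illustration only.
    """
    outcomes = defaultdict(list)
    for job_id, succeeded in attempts:
        outcomes[job_id].append(succeeded)

    report = {"succeeded": 0, "failed": 0, "pending": 0, "failed_attempts": 0}
    for results in outcomes.values():
        report["failed_attempts"] += results.count(False)
        if any(results):
            report["succeeded"] += 1   # earlier failures are masked by a later success
        elif len(results) >= MAX_ATTEMPTS:
            report["failed"] += 1      # all attempts used up
        else:
            report["pending"] += 1     # failed so far, retries remain
    return report


# Made-up sample data, not real batch results
sample = [
    ("job-1", True),
    ("job-2", False), ("job-2", True),
    ("job-3", False), ("job-3", False), ("job-3", False),
    ("job-4", False),
]
print(summarise_batch(sample))
```

Keeping failed attempts separate from failed jobs is the point of the sketch: it makes visible how many failures the retry mechanism hides before anything reaches Dashboard.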

Nimrod Joined: 24 Apr 17 Posts: 6 Credit: 0 RAC: 0	Message 4896 - Posted: 6 May 2017, 11:53:03 UTC I am running BOINC Manager version 7.6.33, and the project states: lhcathome-dev: Notice from server: "VirtualBox is not installed". Is there anything wrong with my settings? LHC@Home runs properly, but I get no tasks for LHCdev. I tried to find a solution on the forum but had no luck (maybe I just overlooked it).

Development for LHC@home