Message boards : News : No new jobs
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3439 - Posted: 21 May 2016, 8:25:08 UTC

I think that I have just fixed the problem but it might take some time to
recover and offer up to date information.


Apparently, it did not work.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Jan 15
Posts: 1139
Credit: 8,181,211
RAC: 2,023
Message 3462 - Posted: 21 May 2016, 21:50:35 UTC - in response to Message 3439.  
Last modified: 21 May 2016, 21:56:00 UTC

I think that I have just fixed the problem but it might take some time to
recover and offer up to date information.


Apparently, it did not work.

OK, let's give the Dashboard team (I won't mention names...) the benefit of the doubt.

a) I just submitted a 10-job test batch -- all ten are seen by Dashboard.
b) The graph of running jobs now matches the number I see on the Condor server. The headline number on our "Jobs page" is only for my CRAB3 jobs; the first graph includes Hassen's WMAgent jobs as well. Combined, they seem to match current truth.
c) There hasn't been an "unknown" job reported for several hours. These are the ones that time out after 24 hours, and each one adds 86,400 seconds of failure walltime to our third graph. In most cases these jobs have finished successfully and delivered their results file to the data bridge, but Dashboard has lost track of them in the meantime.
d) Dashboard has lost track of many other jobs during the kerfuffle; for example, I count 3185 successful jobs in the current batch[*], while DB says 2563 and only counts 2992 of the 10,000 jobs submitted. It remains to be seen how well DB recovers as time goes on, but I stress that as far as I can tell the problem is only that DB reporting went AWOL for some time; the project itself isn't losing jobs.

[*] And incidentally, 62 failures, most of which will be masked by Condor's "retry another two times" mechanism before being reported as Dashboard failures.
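
To put the figures quoted above in proportion, here is a minimal sketch of the arithmetic in Python. The numbers are the ones given in this post; the variable and function names are made up for illustration and do not come from any project script.

```python
# Figures quoted in the post above; all names here are illustrative only.
SECONDS_PER_DAY = 24 * 60 * 60  # the 24-hour timeout, i.e. 86,400 s


def failure_walltime(unknown_jobs: int) -> int:
    """Failure walltime (seconds) added by jobs that time out as 'unknown'."""
    return unknown_jobs * SECONDS_PER_DAY


submitted = 10_000             # jobs submitted in the current batch
counted_by_dashboard = 2_992   # jobs Dashboard knows about
successes_local = 3_185        # successful jobs counted from the local logs
successes_dashboard = 2_563    # successful jobs according to Dashboard
failures_local = 62            # failures counted from the local logs

# Successful jobs that Dashboard has lost track of.
missing_successes = successes_local - successes_dashboard

# Share of the locally counted successes that Dashboard reports, in percent.
reported_share = round(100 * successes_dashboard / successes_local, 1)

# Failure rate among the jobs finished so far, according to the local count.
local_failure_rate = round(
    100 * failures_local / (successes_local + failures_local), 1
)

print(f"Successes missing from Dashboard: {missing_successes}")
print(f"Share of successes Dashboard reports: {reported_share}%")
print(f"Local failure rate so far: {local_failure_rate}%")
print(f"Walltime added by one timed-out job: {failure_walltime(1)} s")
```

On these numbers Dashboard is missing several hundred successful jobs, which is consistent with a reporting gap rather than lost work.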
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3463 - Posted: 21 May 2016, 22:18:00 UTC

Thanks, Ivan.

However, as Dashboard is really only a very crude indicator, could you please post the correct figures at the end of each batch?
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Jan 15
Posts: 1139
Credit: 8,181,211
RAC: 2,023
Message 3464 - Posted: 21 May 2016, 22:42:05 UTC - in response to Message 3463.  
Last modified: 21 May 2016, 22:42:52 UTC

Thanks, Ivan.

However, as Dashboard is really only a very crude indicator, could you please post the correct figures at the end of each batch?

OK, I have a script that archives the logs when I think a batch is finished. I can add a report that parses out the above statistics. This may become moot soon, however, as we are nearly at the point of moving to production jobs with WMAgent, at which point responsibility shifts elsewhere (not before time as far as my sleep patterns go...), but perhaps reporting will become more transparent, too.
Anyway, remind me next Thu/Fri (at current rates) to start upgrading my archive scripts. :-)
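
For illustration only, an end-of-batch report of the kind described above might look like the following Python sketch. This is not Ivan's actual archive script. The log format (one `<job_id> <status>` pair per line, with a retried job appearing more than once) and the name `summarise_batch` are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable


def summarise_batch(log_lines: Iterable[str]) -> Counter:
    """Tally the final status of each job in a batch.

    Assumed (hypothetical) format: "<job_id> <status>" per line. A job that
    was retried appears several times; only its last status is counted.
    """
    final_status = {}
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        job_id, status = parts
        final_status[job_id] = status  # later attempts overwrite earlier ones
    return Counter(final_status.values())


# Made-up sample data: job 1 fails once and then succeeds on retry.
sample_log = [
    "1 failed",
    "1 success",
    "2 success",
    "3 failed",
]

report = summarise_batch(sample_log)
for status, count in sorted(report.items()):
    print(f"{status}: {count}")
```

Counting only the last status per job mirrors the point made earlier in the thread that most failures are masked by retries before they are reported.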
Nimrod

Joined: 24 Apr 17
Posts: 6
Credit: 0
RAC: 0
Message 4896 - Posted: 6 May 2017, 11:53:03 UTC

I'm running BOINC Manager version 7.6.33, and for lhcathome-dev the project shows this notice from the server:
VirtualBox is not installed

Is there anything wrong with my settings? LHC@Home runs properly, but I get no tasks for LHCdev. I tried to find a solution on the forum but had no luck (maybe I just overlooked it).

©2024 CERN