No new jobs
Author | Message |
---|---|
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
"I think that I have just fixed the problem but it might take some time to ..." Apparently, it did not work. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,181,211 RAC: 2,023 |
"I think that I have just fixed the problem but it might take some time to ..." OK, let's give the Dashboard team (I won't mention names...) the benefit of the doubt. a) I just submitted a 10-job test batch; all ten are seen by Dashboard. b) The graph of running jobs now matches the number I see on the Condor server. The headline number on our "Jobs page" is only for my CRAB3 jobs, while the first graph includes Hassen's WMAgent jobs as well. Combined, they seem to match current truth. c) There hasn't been an "unknown" job reported for several hours. These are the ones that time out after 24 hours, and each one adds 86,400 seconds of failure walltime to our third graph. In most cases these jobs have finished successfully and delivered their results file to the data bridge, but Dashboard has lost track of them in the meantime. d) Dashboard has lost track of many other jobs during the kerfuffle; for example, I count 3185 successful jobs in the current batch[*], whereas Dashboard (DB) says 2563 and counts only 2992 of the 10,000 jobs submitted. It remains to be seen how well DB recovers as time goes on, but I stress that, as far as I can tell, the problem is only that DB reporting went AWOL for some time; the project itself isn't losing jobs. [*] And incidentally, 62 failures, most of which will be masked by Condor's "retry another two times" mechanism before being reported as Dashboard failures. |
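To put the figures in point d) side by side, here is a minimal sketch of the arithmetic. The function and key names are made up for illustration and are not part of any Dashboard, CRAB3 or Condor tooling; the only inputs are the numbers quoted in the post above.

```python
# Illustrative only: names below are hypothetical, not a real Dashboard or Condor API.
SECONDS_PER_DAY = 24 * 60 * 60  # walltime charged to each job that times out after 24 hours


def dashboard_gap(local_success, dashboard_success, dashboard_seen, submitted):
    """Compare a local count of one batch with what Dashboard reports for it."""
    return {
        # successful jobs counted locally that Dashboard does not show as successful
        "missing_successes": local_success - dashboard_success,
        # submitted jobs that Dashboard does not count at all
        "untracked_jobs": submitted - dashboard_seen,
        # share of the batch that Dashboard is tracking
        "tracked_fraction": dashboard_seen / submitted,
    }


def unknown_walltime(n_unknown):
    """Failure walltime in seconds added by n_unknown timed-out 'unknown' jobs."""
    return n_unknown * SECONDS_PER_DAY


gap = dashboard_gap(local_success=3185, dashboard_success=2563,
                    dashboard_seen=2992, submitted=10_000)
print(gap)
print(unknown_walltime(1))
```

On these numbers Dashboard is short by 622 successful jobs and is tracking a little under 30% of the batch, which is a gap in reporting rather than in the work actually completed.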
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks, Ivan. However, as Dashboard is really only a very crude indicator, would you please post the correct figures at the end of each batch? |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,181,211 RAC: 2,023 |
"Thanks, Ivan. ..." OK, I have a script that archives the logs when I think a batch is finished. I can add a report that parses out the above statistics. This may become moot soon, however, as we are nearly at the point of moving to production jobs with WMAgent, at which point responsibility shifts elsewhere (not before time, as far as my sleep patterns go...), but perhaps reporting will become more transparent, too. Anyway, remind me next Thu/Fri (at current rates) to start upgrading my archive scripts. :-) |
Joined: 24 Apr 17 Posts: 6 Credit: 0 RAC: 0 |
I am running BOINC Manager version 7.6.33, and for the project lhcathome-dev it states: "Notice from server: VirtualBox is not installed". Is there anything wrong with my settings? LHC@Home runs properly, but I get no tasks for LHCdev. I tried to find a solution on the forum but had no luck (maybe I just overlooked it). |
©2024 CERN