No new jobs
Author | Message |
---|---|
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks, Ivan! |
Joined: 12 Sep 14 Posts: 65 Credit: 544 RAC: 0 |
That appears to have been fixed. I sent Laurence et al. an email early this morning -- I'd had a late night... Actually, some CMS tasks are going out on vLHCathome, but we have a server configuration problem over there which we will look at after the weekend. For example, I get only CMS tasks and no Theory Simulation tasks myself... Patience please! |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,939,683 RAC: 3,233 |
Just a gentle warning that I might not be as responsive as usual for the next three days. I'm off to Scotland for a meeting of the UK LHC Grid Computing community. It's sponsored by Dell, so the dinner on Tuesday night should be memorable. :-) I'll also be providing drinks on your behalf to colleagues at RAL and Imperial College who have been very helpful to our project in the last few weeks. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Have a good time. One (two) last questions: why are the last 5 remaining jobs from the previous batch not finished? And why are there always jobs left running forever when a batch ends? |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,939,683 RAC: 3,233 |
[quote]Have a good time.[/quote] Best I can say is that they are jobs that have got "lost" somehow. Perhaps they have fallen into some "corner case" bug in the software that hasn't been recognised yet due to its rarity. Understand that a 99.5% success rate is pretty d*** good for this type of computing. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
To keep the good run going, it would actually be nice if T4T put some more CMS tasks on. I would really suggest allowing a second (or third) CMS task on this project, as T4T is not reliable. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
We are nearly out of ALL types of BOINC tasks. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Would someone please post here when the credentials are working again? |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,939,683 RAC: 3,233 |
[quote]Would someone please post here when the credentials are working again?[/quote] OK, it looks like we are going again. I'll post an amusing screenshot later... |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,939,683 RAC: 3,233 |
[quote]Would someone please post here when the credentials are working again?[/quote] OK, proof we're up again, and proof that there are dangers of fixed-pixel console windows on modern 4K monitors! |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,939,683 RAC: 3,233 |
DON'T PANIC! (With apologies to The Hitch-hiker's Guide to the Galaxy...) I've submitted a new batch of CMS jobs (plus a small test job in between) which are at the Condor server at RAL, but aren't yet showing up under my name on Dashboard. In fact, Dashboard is showing more pending jobs than there actually are (293 vs. 85), so it looks like something's stuck. So I don't think we'll run out of CRAB jobs, but until Dashboard (or communication from RAL to Dashboard) gets its act together, we may not be able to monitor progress. [Edit@2241] I now see gaps in the "finished jobs" plots, so there's definitely a problem in Dashboard somewhere. [/Edit] |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
There is a batch of 316. Yet there are jobs running with job numbers > 316 (like 350). Edit: now it is 323. It is still strange that it processes jobs that have not been submitted. How is that possible? Dashboard must be screwed up big time. |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,939,683 RAC: 3,233 |
[quote]Dashboard must be screwed up big time.[/quote] Yup! There are 10,000 jobs in the latest batch, and 10 in an intermediate test batch. `[cms005@lcggwms02:~] > condor_q -const 'CRAB_ReqName=?="160509_200134:ireid_crab_CMS_at_Home_TTbar_50ev_prodA"'` → `2 jobs; 0 completed, 0 removed, 0 idle, 2 running, 0 held, 0 suspended`; `[cms005@lcggwms02:~] > condor_q -const 'CRAB_ReqName=?="160518_201921:ireid_crab_CMS_at_Home_TTbar_50ev_testA"'` → `6 jobs; 0 completed, 0 removed, 5 idle, 1 running, 0 held, 0 suspended`; `[cms005@lcggwms02:~] > condor_q -const 'CRAB_ReqName=?="160518_203523:ireid_crab_CMS_at_Home_TTbar_50ev_prodB"'` → `1092 jobs; 0 completed, 0 removed, 1002 idle, 90 running, 0 held, 0 suspended`. Those are just the live queues; they count neither anything that has finished nor anything in the "pre-"queue. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Any progress on fixing Dashboard? Or, at least, has someone in charge actually been informed about the malfunction? |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,939,683 RAC: 3,233 |
[quote]Any progress on fixing Dashboard?[/quote] I've sent a message to a former student who I think is still working on Dashboard. I'll try to see if anyone else is affected -- I can't find a discussion group devoted to it. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks, Ivan. |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,939,683 RAC: 3,233 |
I did find a discussion group -- it seems the search facility in HyperNews doesn't work very well. No-one has made any mention of problems, so I'll pop in a query there too. |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,939,683 RAC: 3,233 |
Looks like it's probably not just us. I got this reply back: [quote]We are experiencing some issues lately with the MonALISA collectors that collect the job monitoring updates and we are working on them. Sorry for the inconvenience this might have caused.[/quote] |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks for the info. Can you tell if the jobs still appear to be going through properly? |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,939,683 RAC: 3,233 |
[quote]Thanks for the info.[/quote] It gets better: [quote]Hi Ivan, I think that I have just fixed the problem but it might take some time to recover and offer up to date information.[/quote] [quote]Can you tell if the jobs still appear to be going through properly?[/quote] Yes, job logs are coming in and output files are in the data bridge. There are ~175 jobs active on the Condor server: ~90 from CRAB3 and the rest from WMAgent. |
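Ivan's condor_q session earlier in the thread queries each CRAB request separately with a `CRAB_ReqName=?=...` constraint and reads the one-line summary that follows each query. The function below is a minimal sketch that totals those summary lines across batches. It assumes the summary format shown in that session ("N jobs; N completed, N removed, N idle, N running, N held, N suspended"); other HTCondor versions may word the summary differently. `sum_condor_summaries` is a hypothetical helper and is not part of HTCondor or CRAB. It is written for POSIX sh, not the tcsh-style prompt shown in the session.

```shell
#!/bin/sh
# Hypothetical helper (not part of HTCondor or CRAB): totals the one-line
# summaries that condor_q printed in the session above, e.g.
#   "1092 jobs; 0 completed, 0 removed, 1002 idle, 90 running, 0 held, 0 suspended"
# Lines that do not have "jobs;" as their second field are ignored.
sum_condor_summaries() {
  awk '
    $2 == "jobs;" {
      total += $1
      # After "<N> jobs;" the fields come in "<count> <state>," pairs.
      for (i = 3; i < NF; i += 2) {
        state = $(i + 1)
        sub(/,$/, "", state)   # drop the trailing comma
        count[state] += $i
      }
    }
    END {
      printf "%d jobs; %d idle, %d running, %d held\n",
             total, count["idle"], count["running"], count["held"]
    }'
}

# Example usage (assumes condor_q is on the PATH and the request names are known):
#   for req in "160509_200134:ireid_crab_CMS_at_Home_TTbar_50ev_prodA" \
#              "160518_203523:ireid_crab_CMS_at_Home_TTbar_50ev_prodB"; do
#     condor_q -const "CRAB_ReqName=?=\"$req\"" | grep " jobs; "
#   done | sum_condor_summaries
```

Given the three summary lines from that session, it prints `1100 jobs; 1007 idle, 93 running, 0 held`. Like the original queries, it sees only the live queue, so finished jobs and anything still in the "pre-"queue are not counted.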
©2024 CERN