Message boards : News : No new jobs

Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2627 - Posted: 10 Apr 2016, 11:02:16 UTC - in response to Message 2626.  

Thanks, Ivan!

Ben Segal
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 12 Sep 14
Posts: 65
Credit: 544
RAC: 0
Message 2629 - Posted: 10 Apr 2016, 14:42:50 UTC - in response to Message 2626.  

That appears to have been fixed. I sent Laurence et al. an email early this morning -- I'd had a late night...

Actually, some CMS tasks are going out on vLHCathome, but we have a server configuration problem over there, which we will look at after the weekend. For example, I get only CMS tasks and no Theory Simulation tasks myself...

Patience please!

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,939,683
RAC: 3,233
Message 2630 - Posted: 10 Apr 2016, 19:04:56 UTC - in response to Message 2627.  

Just a gentle warning that I might not be as responsive as usual for the next three days. I'm off to Scotland for a meeting of the UK LHC Grid Computing community. Sponsored by Dell, so the Dinner on Tuesday night should be memorable. :-) I'll also be providing drinks on your behalf to colleagues at RAL and Imperial College who have been very helpful to our project in the last weeks.

Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2631 - Posted: 10 Apr 2016, 19:41:53 UTC

Have a good time.
One (two) last questions: Why are the last 5 remaining jobs from the previous batch not finished?
And why are there always jobs left running forever when a batch ends?

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,939,683
RAC: 3,233
Message 2632 - Posted: 11 Apr 2016, 7:06:06 UTC - in response to Message 2631.  

Have a good time.
One (two) last questions: Why are the last 5 remaining jobs from the previous batch not finished?
And why are there always jobs left running forever when a batch ends?

Best I can say is that they are jobs that have got "lost" somehow. Perhaps they have fallen into some "corner case" bug in the software that hasn't been recognised yet due to its rarity. Understand that a 99.5% success rate is pretty d*** good for this type of computing.

Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2633 - Posted: 11 Apr 2016, 11:32:07 UTC - in response to Message 2632.  
Last modified: 11 Apr 2016, 11:32:33 UTC

To keep the good run going, it would actually be nice if T4T put some more CMS tasks on.

I would really suggest allowing a second (or third) CMS task on this project, as T4T is not reliable.

Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2881 - Posted: 21 Apr 2016, 9:32:22 UTC

We are nearly out of ALL types of BOINC tasks.

Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3303 - Posted: 10 May 2016, 10:17:55 UTC

Would someone please post here when the credentials are working again?

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,939,683
RAC: 3,233
Message 3305 - Posted: 10 May 2016, 21:32:37 UTC - in response to Message 3303.  

Would someone please post here when the credentials are working again?

OK, it looks like we are going again. I'll post an amusing screenshot later...

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,939,683
RAC: 3,233
Message 3307 - Posted: 10 May 2016, 21:54:24 UTC - in response to Message 3305.  

Would someone please post here when the credentials are working again?

OK, it looks like we are going again. I'll post an amusing screenshot later...

OK, proof we're up again, and proof that there are dangers of fixed-pixel console windows on modern 4K monitors!

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,939,683
RAC: 3,233
Message 3388 - Posted: 18 May 2016, 22:30:15 UTC
Last modified: 18 May 2016, 22:42:04 UTC

DON'T PANIC!
(With apologies to The Hitch-hiker's Guide to the Galaxy...)
I've submitted a new batch of CMS jobs (plus a small test job in-between) which are at the Condor server at RAL, but aren't yet showing up under my name on Dashboard. In fact, Dashboard is showing more pending jobs than there are in actuality (293 vs. 85) so it looks like something's stuck. So, I don't think we'll run out of CRAB jobs, but until Dashboard (or communications from RAL to Dashboard) gets its act together, we may not be able to monitor progress.

[Edit@2241] I see gaps now in the "finished jobs" plots, so there's definitely a problem in Dashboard somewhere. [/Edit]

Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3389 - Posted: 19 May 2016, 8:15:59 UTC - in response to Message 3388.  
Last modified: 19 May 2016, 8:20:02 UTC

There is a batch of 316. Yet there are jobs running with job numbers >316 (like 350).
Edit: now it is 323. It is still strange that it processes jobs that have not been submitted.

How is that possible?
Dashboard must be screwed up big time.

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,939,683
RAC: 3,233
Message 3392 - Posted: 19 May 2016, 10:01:42 UTC - in response to Message 3389.  
Last modified: 19 May 2016, 10:11:52 UTC

Dashboard must be screwed up big time.

Yup! There are 10,000 jobs in the latest batch, and 10 in an intermediate test batch.

[cms005@lcggwms02:~] > condor_q -const 'CRAB_ReqName=?="160509_200134:ireid_crab_CMS_at_Home_TTbar_50ev_prodA"'
2 jobs; 0 completed, 0 removed, 0 idle, 2 running, 0 held, 0 suspended

[cms005@lcggwms02:~] > condor_q -const 'CRAB_ReqName=?="160518_201921:ireid_crab_CMS_at_Home_TTbar_50ev_testA"'
6 jobs; 0 completed, 0 removed, 5 idle, 1 running, 0 held, 0 suspended

[cms005@lcggwms02:~] > condor_q -const 'CRAB_ReqName=?="160518_203523:ireid_crab_CMS_at_Home_TTbar_50ev_prodB"'
1092 jobs; 0 completed, 0 removed, 1002 idle, 90 running, 0 held, 0 suspended


Those are just the live queues; no count of anything finished nor anything in the "pre-"queue.
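
For anyone who wants to add up the three queues, here is a minimal Python sketch that parses the one-line summaries shown above and totals them. It is only an illustration: the regular expression and the helper name parse_summary are assumptions made for this example, it works on text copied from the output above rather than calling Condor itself, and the summary format may differ in other Condor versions.

```python
import re

# Matches the one-line summary printed under each query above.
SUMMARY_RE = re.compile(
    r"(?P<total>\d+) jobs; (?P<completed>\d+) completed, "
    r"(?P<removed>\d+) removed, (?P<idle>\d+) idle, "
    r"(?P<running>\d+) running, (?P<held>\d+) held, "
    r"(?P<suspended>\d+) suspended"
)


def parse_summary(line):
    """Return the counts from one summary line as a dict of ints, or None."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


# Summary lines copied verbatim from the three queries above.
lines = [
    "2 jobs; 0 completed, 0 removed, 0 idle, 2 running, 0 held, 0 suspended",
    "6 jobs; 0 completed, 0 removed, 5 idle, 1 running, 0 held, 0 suspended",
    "1092 jobs; 0 completed, 0 removed, 1002 idle, 90 running, 0 held, 0 suspended",
]

# Sum each field across the three request names.
totals = {}
for line in lines:
    counts = parse_summary(line)
    if counts is None:
        continue  # skip anything that is not a summary line
    for name, value in counts.items():
        totals[name] = totals.get(name, 0) + value

print(totals["total"], totals["idle"], totals["running"])  # prints: 1100 1007 93
```

So the live queues hold 1100 jobs in total, 1007 idle and 93 running, across the three request names.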

Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3422 - Posted: 20 May 2016, 11:50:51 UTC - in response to Message 3392.  

Any progress on fixing dashboard?
Or, at least, is someone in charge actually informed about the malfunctioning?

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,939,683
RAC: 3,233
Message 3426 - Posted: 20 May 2016, 14:57:01 UTC - in response to Message 3422.  

Any progress on fixing dashboard?
Or, at least, is someone in charge actually informed about the malfunctioning?

I've sent a message to a former student who I think is still working on Dashboard. I'll try to see if anyone else is affected -- I can't find a discussion group devoted to it.

Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3427 - Posted: 20 May 2016, 15:06:39 UTC - in response to Message 3426.  

Thanks, Ivan.

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,939,683
RAC: 3,233
Message 3428 - Posted: 20 May 2016, 15:08:52 UTC - in response to Message 3427.  

I did find a discussion group -- seems the search facility in Hypernews doesn't work very well. No-one has made any mention of problems, so I'll pop in a query there too.

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,939,683
RAC: 3,233
Message 3429 - Posted: 20 May 2016, 15:32:20 UTC - in response to Message 3428.  

Looks like it's probably not just us. I got this reply back:

We are experiencing some issues lately with the MonALISA collectors that collect the job monitoring updates and we are working on them.
Sorry for the inconvenience this might have caused.


Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3430 - Posted: 20 May 2016, 15:38:05 UTC - in response to Message 3429.  

Thanks for the info.
Can you tell if the jobs still appear to go through properly?

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,939,683
RAC: 3,233
Message 3431 - Posted: 20 May 2016, 16:25:20 UTC - in response to Message 3430.  

Thanks for the info.

It gets better:

Hi Ivan,

I think that I have just fixed the problem but it might take some time to
recover and offer up to date information.


Can you tell if the jobs still appear to go through properly?


Yes, job logs are coming in & output files are in the data bridge. There are ~175 jobs active on the Condor server, ~90 CRAB3 and the rest WMAgent.