Message boards : Theory Application : Problem started

Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 753
Credit: 11,653,219
RAC: 3,828
Message 5002 - Posted: 15 Jun 2017, 16:49:28 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=343488

Early this morning this started happening.

They were working perfectly for 24+ hours and then started doing this with the credentials.

The ones that started hours ago are running but newer ones keep doing this so I suspended them all.

No problem like this with the LHC Theory tasks......just the Theory tasks here.
Mad Scientist For Life
Profile Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 5003 - Posted: 15 Jun 2017, 17:13:01 UTC
Last modified: 15 Jun 2017, 17:15:34 UTC

Magic beat me to the post button 8¬)

Similarly, tasks that had already made the credentials connection before the problem started are getting new jobs but all new tasks are failing to initialise.
Good thing is that their all tidying up on their way out and not leaving any Vbox ghosts so looks like the ab7 wrapper has solved that problem.

Lots of Sixtrack from the main site to keep my machines busy.
Profile Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 5004 - Posted: 15 Jun 2017, 18:30:49 UTC - in response to Message 5003.  
Last modified: 15 Jun 2017, 18:32:23 UTC

Aargh.
Send for the Grammar Police.
Just spotted an unforgiveable error in my post, just after the edit time limit expired.
"their" should obviously be "they're".
Somehow this one slipped past my OCD punctuation pedantry filter. Apologies to anyone else who might be offended by this.
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 186
Message 5007 - Posted: 16 Jun 2017, 8:29:41 UTC - in response to Message 5002.  

Just your luck :) The credential service which has been running reliably for quite a while just fell over. Back up and running now.
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 753
Credit: 11,653,219
RAC: 3,828
Message 5011 - Posted: 16 Jun 2017, 19:44:08 UTC - in response to Message 5007.  

Just your luck :) The credential service which has been running reliably for quite a while just fell over. Back up and running now.


That's for sure!!

Well, it did get all of my cores over at LHC for about 24 hours (maybe a few more, since I didn't get up until after noon today).

So when all those tasks are finished, I will get the vLHC-dev crew back to work here again.
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 753
Credit: 11,653,219
RAC: 3,828
Message 5204 - Posted: 16 Oct 2017, 22:42:14 UTC
Last modified: 16 Oct 2017, 22:43:23 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=365415

For some reason one of mine will run Valids and then several of these.

It runs the newest VB and BOINC.

And of course I always check the VB Manager for anything there.

(AND I am not aborting these either)
Mad Scientist For Life
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 753
Credit: 11,653,219
RAC: 3,828
Message 5208 - Posted: 20 Oct 2017, 3:04:30 UTC

Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 753
Credit: 11,653,219
RAC: 3,828
Message 5344 - Posted: 22 Jan 2018, 15:32:42 UTC
Last modified: 22 Jan 2018, 15:57:47 UTC

I started 6 Theory multi tasks (2-core) a few hours ago and decided to take a look, since my high-speed internet slows down in about 30 minutes, and sure enough ALL 6 failed to start running, with "DC_NOP failed!" among other things.

I have to hurry and try to start up 6 new ones while I still have the speed to get them running, but I think I saw the same problem over at LHC earlier. I think it was all the VB tasks, or maybe Theory and CMS, so I will go back and take a look as soon as I try to get these 6 tasks running again.

EDIT: I guess it is still having problems. It now gets beyond the HTCondor Ping and then fails with "VM Completion Message: Condor exited after 760s without running a job."
So it looks like these are not going to work right now and I will have to switch to Sixtracks for a while.
Mad Scientist For Life
