Message boards : Number crunching : issue of the day

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 655
Credit: 10,929,747
RAC: 1,604
Message 5118 - Posted: 10 Sep 2017, 3:49:20 UTC

For some reason, today one of my 8-core PCs decided to keep running tasks that claim to be 8-core multi-core tasks (I ran some of those when multi-core tasks first started), so it will only run one task at a time, but when a task finishes the stderr says it was just a 2-core task.

I tried rebooting to get it to start another task, but it still says the task is 8-core.

The other two matching 8-core computers next to it are running 2-core tasks, so they are able to run several tasks at the same time as they are supposed to (and the server says there are no more Theory tasks available).

So I was hoping to use all the extra cores to run Theory tasks over at LHC too.
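
A rough sketch of why the claimed core count matters here, assuming the client simply reserves the claimed number of cores per task and fits as many tasks as it can (a simplification for illustration, not BOINC's actual scheduler; `concurrent_tasks` is a made-up helper):

```python
def concurrent_tasks(host_cores: int, cores_per_task: int) -> int:
    """Number of tasks that fit at once if each one reserves cores_per_task cores."""
    if host_cores < 0 or cores_per_task <= 0:
        raise ValueError("host_cores must be >= 0 and cores_per_task must be > 0")
    return host_cores // cores_per_task


# An 8-core host with a task that claims 8 cores vs. tasks that claim 2 cores.
print(concurrent_tasks(8, 8))
print(concurrent_tasks(8, 2))
```

Under that assumption, a task that claims 8 cores blocks the whole machine even if the stderr later shows it only used 2.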
ID: 5118 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,452,157
RAC: 1,353
Message 5119 - Posted: 10 Sep 2017, 5:45:38 UTC
Last modified: 10 Sep 2017, 5:47:59 UTC

Hi Magic,

In your LHC-dev preferences, what values do you have for max # jobs and max # CPUs?

For me, max # jobs = 2 and max # CPUs = 1.
ID: 5119 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 655
Credit: 10,929,747
RAC: 1,604
Message 5120 - Posted: 10 Sep 2017, 23:21:48 UTC - in response to Message 5119.  

Hi Axel, I'll send you a PM later. Busy day watching NFL games.
ID: 5120 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 655
Credit: 10,929,747
RAC: 1,604
Message 5195 - Posted: 12 Oct 2017, 14:20:35 UTC

New Problems today

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=364887

Guest Log: [DEBUG] DC_NOP failed!

And https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=364887

Long list of Credential problems
ID: 5195 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 655
Credit: 10,929,747
RAC: 1,604
Message 5213 - Posted: 23 Oct 2017, 6:14:25 UTC

DAMN IT

No matter how many times I check my 9 computers and try to keep them all running new tasks and THEN set them to NOT allow new tasks (and I have a nice long list of Valids), I then go to check and THREE of the ones I run here are, for some reason, NOT set to stop getting new tasks until I actually fetch them myself, when I know the internet speed is fast enough to start these VB tasks.

As usual, I see 12 "Error while computing" results because they finished today's tasks and then kept getting new ones over and over, running each of them for 25 minutes until they gave up and did it again.

All of them did eventually get 6 more to start running, but at 2am I will have to just suspend them until 7am and hope they will start back up (and I have had plenty of bad luck trying that).

I am not a morning person, so I will start them all up again, along with all the other ones I run here and at LHC, and go back to bed for a few hours. But if these don't start back up, that will just mean 6 more Errors to add to that 12, and then I have to wait until the next morning at 7am to actually start them up the best way possible.

(I'm not asking for any suggestions but just figured it would be better to type than to toss a couple computers out the window)

BUT they are all set to not get any new tasks for sure right now and they better start back up in the morning since I have to suspend them in about 2 hours 45 minutes.

https://youtu.be/8CtjhWhw2I8
ID: 5213 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 655
Credit: 10,929,747
RAC: 1,604
Message 5214 - Posted: 23 Oct 2017, 22:22:42 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/results.php?userid=192

Well, it looks like those 6 restarted tasks are going to finish Valid, with somewhat shorter runs than I usually get.

And for the first time since I started doing this the way I do (to get the fastest speed, I get up at 7am), well, I didn't wake up like I usually do, so all I could do was restart those 6 tasks I had already started, but not start the other ones or the 30 tasks for LHC.

So the computers get to sit and wait until 7am tomorrow. Hope the wife will wake me up if I don't wake up myself.

I tend to stay awake until 3 or 4am, so that isn't easy, but I am used to doing that.

BUT I guarantee EVERY core will be running tasks by 7:30am.

(As always, my Einstein GPUs run non-stop and only use the internet to return the finished tasks.)
ID: 5214 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 655
Credit: 10,929,747
RAC: 1,604
Message 5236 - Posted: 3 Nov 2017, 7:48:33 UTC

ID: 5236 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 655
Credit: 10,929,747
RAC: 1,604
Message 5260 - Posted: 3 Dec 2017, 7:19:57 UTC
Last modified: 3 Dec 2017, 7:21:01 UTC

This is always the problem with VB tasks.



Getting the CERN server to allow me to just start running the tasks.

Nothing to do with d/ling data blocks (I always watch it happen).

You would think CERN would realize by now it is me every time and just hand over the credentials. (security)

(yeah that will never happen)
ID: 5260 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 655
Credit: 10,929,747
RAC: 1,604
Message 5261 - Posted: 8 Dec 2017, 18:20:07 UTC

Welcome back to life LHC-dev

Just in time for me to see how many of my computers can't run VB tasks after doing the Windows 10 updates KB4041994 and KB4051963, after hours of trying to get this Fall version 1709 to work.

Three of my 8-cores worked the first time, and one refuses to so far (the one with the most memory), as do a couple of others I use here (quad and 3-core).

The only problem is that when it doesn't work, it gives me a 5-second computer error, so I have to abort them.

Here is an example on the one I am on right now
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=372085

And it is taking its time Initializing...

Guess I'll go check that 8-core and hope for the best, and then the old 3-core. Funny how VB is always the program that refuses to work, but there is no problem if I fire up the GPU cards for Einstein.
ID: 5261 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 655
Credit: 10,929,747
RAC: 1,604
Message 5328 - Posted: 13 Jan 2018, 19:19:16 UTC

Well the server is back to running my d/l's like a snail on a cold day climbing up a flag pole.

I do speed tests on my end and can upload and download at full speed, unless it is to get tasks here. It is 50Kbps right now just for a single task, which has taken an hour so far to get to 75%.

Then the tasks (Atlas and Theory) run to Valid in less time than it takes to d/l them and are sent back in a couple of seconds.

I decided to give up on the Atlas since I have run plenty of them here and over there ----------->

Right now I wanted to just switch back to Theory on this 8-core, but I also added CMS and, as usual, I have to d/l the newest .vdi for that, and the d/l speed is about 45Kbps. THAT will take hours.

I could d/l 200 Einstein GPU tasks or 200 Sixtracks in about 3 minutes, but not these here (just like the alphas).

This is still the first 12 hours of my new monthly ISP contract, which means full speed until I use it all up, and only VB tasks and .vdi files use it all up, after a week if I run them on all 9 computers.

Of course, Sixtrack and GPU tasks don't even need to be connected to the internet once I have the tasks. I just let them run and turn them in when I have lots of completed ones.

For my last Atlas task, it has almost finished d/ling ONE task, and it will be about 1 hour 15 minutes just to do that, and then it will run to a Valid task faster than that total time.
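
A quick back-of-the-envelope check of those download times, assuming the rate stays roughly constant for the whole transfer (an assumption; `estimate_total_minutes` is a made-up helper, not part of any BOINC tool):

```python
def estimate_total_minutes(elapsed_minutes: float, fraction_done: float) -> float:
    """Project the total download time from progress so far, assuming a constant rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_minutes / fraction_done


# One hour to reach 75%, as reported earlier in this post.
total = estimate_total_minutes(60, 0.75)
remaining = total - 60
print(f"projected total: {total:.0f} min, remaining: {remaining:.0f} min")
```

That projects roughly 80 minutes for a single task, which is in the same range as the 1 hour 15 minutes quoted above.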

So I will finally get to start an Atlas task and then wait 5 hours to get this CMS vdi and then the tasks.

I use a satellite dish for everything here but I have 4 of my computers off just so I can get more done with the 32 cores I have running Sixtracks and GPU's at Einstein without using the internet to run those.

Can't get much done here lately and in the past I did quite a lot every day.
ID: 5328 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 655
Credit: 10,929,747
RAC: 1,604
Message 5335 - Posted: 18 Jan 2018, 1:50:38 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=380436

Well, I just got 2 of these, and the good thing is they only ran 16 minutes before the server fell asleep again.

DC_NOP failed!

2018-01-17 17:24:45 (7632): Guest Log: AUTHENTICATE:1006:exceeded 1516238684 deadline during authentication

2018-01-17 17:24:45 (7632): Guest Log: AUTHENTICATE:1004:Failed to authenticate using GSI

2018-01-17 17:24:45 (7632): Guest Log: GSI:5002:Failed to authenticate because the remote (server) side was not able to acquire its credentials.
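
For anyone collecting these errors across several results, here is a minimal sketch that pulls the subsystem, code and message out of the `Guest Log:` lines. The `SUBSYSTEM:code:message` layout is inferred only from the three lines above, not from any documented log format, and treating `1516238684` as a Unix timestamp is also an assumption. The function names are made up for illustration.

```python
import re
from datetime import datetime, timezone

# The three lines quoted above, copied verbatim.
GUEST_LOG = """\
2018-01-17 17:24:45 (7632): Guest Log: AUTHENTICATE:1006:exceeded 1516238684 deadline during authentication
2018-01-17 17:24:45 (7632): Guest Log: AUTHENTICATE:1004:Failed to authenticate using GSI
2018-01-17 17:24:45 (7632): Guest Log: GSI:5002:Failed to authenticate because the remote (server) side was not able to acquire its credentials.
"""

# Assumed layout: "Guest Log: SUBSYSTEM:code:message".
ERROR_RE = re.compile(r"Guest Log: (?P<subsystem>[A-Z_]+):(?P<code>\d+):(?P<message>.+)$")
DEADLINE_RE = re.compile(r"exceeded (\d+) deadline")


def parse_guest_errors(text):
    """Return (subsystem, code, message) tuples for lines matching the assumed layout."""
    errors = []
    for line in text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            errors.append((match["subsystem"], int(match["code"]), match["message"].strip()))
    return errors


def deadline_utc(message):
    """Interpret the number in 'exceeded N deadline' as a Unix timestamp (assumption)."""
    match = DEADLINE_RE.search(message)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


for subsystem, code, message in parse_guest_errors(GUEST_LOG):
    print(subsystem, code, message)
    when = deadline_utc(message)
    if when is not None:
        print("  deadline in UTC, if it is a Unix timestamp:", when.isoformat())
```

Lines that do not follow that layout, such as `DC_NOP failed!`, are simply skipped.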
ID: 5335 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 655
Credit: 10,929,747
RAC: 1,604
Message 5356 - Posted: 23 Feb 2018, 1:10:06 UTC


ID: 5356 · Rating: 0
marmot

Joined: 29 Apr 19
Posts: 10
Credit: 109,352
RAC: 0
Message 6317 - Posted: 2 May 2019, 4:18:32 UTC

Did the 2nd one return after the due date but within a grace period?
Also, are mine fine because they are using an older version of VBox?

This result is fine, with a completed result returned on the 3rd attempt at being sent out:
--------------------------------------------------------------------------------------------------------------------






While this one is marked as failed with too many errors, despite receiving a completed, good result:
-------------------------------------------------------------------------------------------------------------------


ID: 6317 · Rating: 0

©2023 CERN