Message boards : Theory Application : Theory v.5.20

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 6972 - Posted: 4 Feb 2020, 20:27:15 UTC



https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2867832

Thought I would give these a try instead of just hoping a lucky CMS task would start running, and as usual I can never trust a VB task without more than 1.5 Mbps.
No luck again... I need a Quantum ISP before Quantum Computing.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 6991 - Posted: 13 Feb 2020, 16:35:02 UTC


Once again VB can't be trusted.

My connection is running at 25 Mbps right now, so I started up 8 of these. Then on my PC with the most RAM I started one task, and the next 3 failed one at a time with *Getting time from pool*.

In the case of Theory tasks that usually means the task will run for hours but end up Invalid (since I watch each one start, I just abort it and try again).

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 6992 - Posted: 15 Feb 2020, 11:30:02 UTC - in response to Message 6991.  


Once again I had to watch each task start (26 tonight at 3am).

Several did that same thing: Failed *Getting time from pool*.

That is seen in the first 5 seconds on the VM Console, yet the tasks continue loading and then run for hours and hours, only to end up as a computer error (server error is more like it).
And even if this problem doesn't happen, a task can still get a FAILED on the last page of the VM Console, just before it starts running, like this


Instead of the way it has to start


And members always wonder why they can't just set the computer to run these on auto, and then wonder why the tasks run for hours and yet crash instead of just being a longer Valid task (which I never have a problem with).

They need to be set at the server to abort if they ever show that Failed at the start from *Getting time from pool*, or at the last page of the VM Console startup where that other FAILED can appear (that one is not in red text).

I already know the CVMFS config is the problem there, but it mainly happens if your internet speed is slowed down by running a room full of computers (I only have 9 running now, since I had a power problem with an 8-core I had used here for 7 years for LHC, T4T and Atlas-alpha testing).
BUT I finally have all 26 running and will check again at 8am. The internet speed will be throttled then, so I will have to suspend the new work and do something else (I have been lucky getting Sixtracks lately).

(can't use WAN on the Dish)

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 6994 - Posted: 27 Feb 2020, 19:23:47 UTC

Well, I think I set a record for the longest-running pair of Theory tasks that finished for no credits.

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2865631

I let them run for 10 days since there is nothing else to do... the server must have figured another UOTD would be better than credits.

maeax
Joined: 22 Apr 16
Posts: 674
Credit: 1,931,571
RAC: 298
Message 6995 - Posted: 27 Feb 2020, 23:23:47 UTC

[boinc ee zhad 206 - - sherpa 1.2.3 default 100000 16]
2020-02-27 09:31:02 (3756): Status Report: Elapsed Time: '864000.751564'
2020-02-27 09:31:02 (3756): Status Report: CPU Time: '858299.671875'
This could be a world record in -dev ;-)
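For scale, the two figures in that status report can be converted with simple arithmetic. A small Python sketch (the variable names are illustrative and not anything from the BOINC log):

```python
# Values copied from the two status report lines above.
elapsed_seconds = 864000.751564
cpu_seconds = 858299.671875

SECONDS_PER_DAY = 24 * 60 * 60  # 86400

elapsed_days = elapsed_seconds / SECONDS_PER_DAY
cpu_fraction = cpu_seconds / elapsed_seconds

print(f"elapsed: {elapsed_days:.2f} days")
print(f"CPU time is {cpu_fraction:.1%} of elapsed time")
```

So the task sat at almost exactly 10 days of wall-clock time, with the CPU busy for nearly all of it.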

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 6996 - Posted: 27 Feb 2020, 23:39:47 UTC - in response to Message 6995.  

Yes Axel, I have had long tasks over the years, but I think that is my longest one!
(I will send you a PM reply when I get a chance... just got home from the shopping trip the wife made me go on)

I have run quite a lot of Theory tasks over the years.
Mad Scientist For Life

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 854,498
RAC: 83
Message 6997 - Posted: 28 Feb 2020, 10:40:55 UTC - in response to Message 6996.  

[ee zhad 206 - - sherpa 1.2.3 default 100000 16]
It's on my list to abort, when such a job arrives on my 2 hosts where I'm monitoring the incoming Theory-jobs.
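
An abort list like this could be checked mechanically against the bracketed job line. A rough Python sketch follows. It assumes the generator name is the fifth field from the end of the line (as in the `sherpa 1.2.3 default 100000 16` example), and the contents of the abort list are hypothetical.

```python
def generator_of(job_line: str) -> str:
    """Return the generator name from a bracketed Theory job line.

    Assumption (not confirmed by the project): the last five fields are
    generator, version, tune, number of events, seed.
    """
    fields = job_line.strip().strip("[]").split()
    if len(fields) < 5:
        raise ValueError(f"unexpected job line: {job_line!r}")
    return fields[-5].lower()


# Hypothetical abort list; adjust to taste.
ABORT_GENERATORS = {"sherpa"}


def should_abort(job_line: str) -> bool:
    """True if the job's generator is on the abort list."""
    return generator_of(job_line) in ABORT_GENERATORS


print(should_abort("[boinc ee zhad 206 - - sherpa 1.2.3 default 100000 16]"))
```

Counting fields from the end means the same function works whether or not the line starts with the `boinc` prefix.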

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 7001 - Posted: 22 Mar 2020, 3:47:36 UTC - in response to Message 6997.  
Last modified: 22 Mar 2020, 4:00:42 UTC

As I mentioned over at LHC, I always abort the sherpa tasks. I monitor EVERY Theory task right at the start, and the generator is seen in the VM Console in the first 3 minutes or less, so there is no reason to let a task run longer than that if it is a sherpa.

I only ran that long one to see how long it would run before crashing, since I was running Sixtracks while waiting for my new month of high-speed (like I am running here right now on 20 cores) and aborting all of the sherpas. I always hope I don't get them, since every 3 minutes spent is still a waste of time to me.

Of course the average VB runner over at LHC would rather let them run on their own without watching each one start via the VM Console.

I guess I should add that I do know sherpas can run Valid after more than 50 hours on a Linux OS, and there are many 5-hour Valids too. I guess I could run them on a Windows OS every time just to see if I get any Valids, but I'm not wasting any high-speed internet on that and will leave it to somebody who never has ISP throttle-downs.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 7002 - Posted: 27 Mar 2020, 0:56:14 UTC

The ^%# Cern server is misbehaving right now when these Theory-Alice tasks get started.

I am running at 3 Mbps on this end but I keep getting these.


So I got a couple to start running on time (any longer than 3 minutes and these will fail, which is ridiculous).

I guess I will just wait until 2am when I have full speed, and maybe the server will do its job.
Mad Scientist For Life

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 7003 - Posted: 28 Mar 2020, 10:16:45 UTC
Last modified: 28 Mar 2020, 10:20:18 UTC

What a disaster tonight has been.

https://lhcathomedev.cern.ch/lhcathome-dev/results.php?userid=192 Look at all the FAILING Theory tasks.

It is between 2am and 3am, when I have high-speed on the Dish, and yet these fail over and over... I guess I will fire up the Sixtracks and get off the___ computers, since computer errors are something I do not like, even when just testing. (I see the ones on Linux are running right now, and I don't know of any running on Windows that can be checked.)

I did waste an hour talking to Hughes Satellite... they spend more time asking for all my info than doing anything.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 7004 - Posted: 31 Mar 2020, 8:54:52 UTC

Well, last night they all worked perfectly, BUT once again tonight, running at high-speed, I get some to start running, or right at the end I have got several of these again.


These cannot be trusted, or should I say the Cern server can't be trusted.

I did get a *vincia* for the first time, and it is one that is actually running right now.

Helicity antenna showers for hadron colliders

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 7006 - Posted: 1 Apr 2020, 5:40:07 UTC

Today I had many of them finish as Valid. Some others had been running normally for several hours, but when I took a look at them just now, after more than 9 hours of running, the VM Console said this.....



Five of them, still running 9.5 hours later, and these were pythia.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 7010 - Posted: 11 Apr 2020, 2:32:58 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2891032
Well imagine that....a *Sherpa* that was Valid.

A short one too.

maeax
Joined: 22 Apr 16
Posts: 674
Credit: 1,931,571
RAC: 298
Message 7011 - Posted: 13 Apr 2020, 9:57:50 UTC
Last modified: 13 Apr 2020, 9:59:42 UTC

You have an error rate above 70%.
http://mcplots-dev.cern.ch/production.php?view=user&system=2&userid=192
So this Sherpa 2.2.7 was a run with status code for crunky = 1.
I don't know whether it is a correct Sherpa.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 7012 - Posted: 13 Apr 2020, 19:01:27 UTC - in response to Message 7011.  

Those are not actually "errors" but Aborts.

Check my account tasks here instead of that page, since it doesn't know the difference.
Mad Scientist For Life

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 7013 - Posted: 14 Apr 2020, 1:05:25 UTC
Last modified: 14 Apr 2020, 1:05:40 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2892376

One thing for sure is I can NEVER trust these to get new tasks and start without me watching them.

So far on just this PC I find 6 cores still running worthless tasks after almost 5 hours (wth that even happens is my question).
They should be stopped and aborted on their own after about 4 minutes max, since that is how long it takes to see this......

....not hours and hours later, when I have time to check them, abort them, and try more while watching them make it all the way to this, as it should start....


Today is day one of my new ISP month, and it is running as fast as 30 Mbps.
(OK, off to see wth happened on the other PCs)

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 7015 - Posted: 14 Apr 2020, 18:18:37 UTC

BTW, since we only need one Valid result each, why are they sent out twice?
I got several *EXIT_ABORTED_VIA_GUI* results for tasks I had started and had ready to run today, and when I started all of mine, about 10 not so magically disappeared.

They didn't use any run time, but the problem is that in order to set up a new batch to run the following morning I have to get each one running (I let them run 4.5 minutes or more), then suspend it, and do the same thing until all tasks are set up and ready. This is during the time after 2am when I have 30 Mbps, and I keep them all suspended until after 8am.

Then I start them all up. Of course I know I am the only one here running these via microwave communications, and I know how that works when we have 325 million on the internet 24/7, so there is no way possible to get high-speed 24/7/365. That is why I don't let these run between 2am and 8am, since that would use up, in about 3 days every month, all the high-speed I use to get them started... which means not getting much done.

So I do it this way just to test them. I know no other public member would ever do this, since they expect the tasks to run on their own without having to watch them all the time, and as I also mentioned, a FAILED task will just keep on running for hours and finally end up Invalid/Error... and you know that will start a new thread over there.

I'm not looking for tips or help, just saying that VB needs high-speed for the start of every task, and that has nothing to do with the initial d/l of the project and vdi.

Btw these Theory tasks sure are short ones lately. Some days have the longer ones, which run fine, and then you don't have to worry about starting new ones all the time... and what that can involve.
The good thing is I will have 62 Valids today.

You don't have to read my long story, it is just me thinking about this right now.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 7018 - Posted: 20 Apr 2020, 10:40:32 UTC

#501

Well, I'm running at high-speed internet right now, but do you think that would work with Cern?



I got several of these

maeax
Joined: 22 Apr 16
Posts: 674
Credit: 1,931,571
RAC: 298
Message 7019 - Posted: 22 Apr 2020, 9:22:59 UTC

Can you see these lines from Theory in RDP?
This task stopped directly after 2 min.
2020-04-11 16:13:40 (7888): Guest Log: 01:13:39 CEST +02:00 2020-04-12: cranky: [INFO] Detected Theory App
2020-04-11 16:13:40 (7888): Guest Log: 01:13:39 CEST +02:00 2020-04-12: cranky: [INFO] Checking CVMFS.
2020-04-11 16:13:43 (7888): Guest Log: 01:13:43 CEST +02:00 2020-04-12: cranky: [ERROR] 'cvmfs_config probe grid.cern.ch' failed.
2020-04-11 16:13:50 (7888): Capturing screenshot.
2020-04-11 16:13:51 (7888): Screenshot completed.
2020-04-11 16:13:51 (7888): Powering off VM.

This task was running for hours.
2020-04-13 13:35:26 (9100): Guest Log: 22:35:23 CEST +02:00 2020-04-13: cranky: [INFO] Checking CVMFS.
2020-04-13 13:35:27 (9100): Guest Log: 22:35:24 CEST +02:00 2020-04-13: cranky: [ERROR] 'cvmfs_config probe sft.cern.ch' failed.

I have no idea how you can check cvmfs_config yourself in Windows before you start the task.
These cvmfs_config OKs are important for the task to finish well.
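
The failure is at least easy to spot after the fact in the task's log output. A minimal Python sketch that scans log text for the cranky error line; the regular expression is based only on the two log excerpts above, so other wordings would be missed, and the function name is made up for illustration:

```python
import re

# Matches lines such as:
#   cranky: [ERROR] 'cvmfs_config probe grid.cern.ch' failed.
PROBE_FAILED = re.compile(
    r"cranky: \[ERROR\] 'cvmfs_config probe (?P<repo>[\w.\-]+)' failed"
)


def failed_cvmfs_repos(log_text: str) -> list[str]:
    """Return the CVMFS repositories whose probe failed, in log order."""
    return [m.group("repo") for m in PROBE_FAILED.finditer(log_text)]


sample = """\
2020-04-11 16:13:40 (7888): Guest Log: 01:13:39 CEST +02:00 2020-04-12: cranky: [INFO] Checking CVMFS.
2020-04-11 16:13:43 (7888): Guest Log: 01:13:43 CEST +02:00 2020-04-12: cranky: [ERROR] 'cvmfs_config probe grid.cern.ch' failed.
"""
print(failed_cvmfs_repos(sample))
```

This only reads a log that already exists; it does not test CVMFS before a task starts.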

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 771
Credit: 11,914,895
RAC: 5,203
Message 7020 - Posted: 23 Apr 2020, 1:35:25 UTC - in response to Message 7019.  

Yes, I can see that on the VM Console. The problem is that even if the tasks do start running, if the internet has a problem while they are running the console changes to what you see there, AND they will keep running as Failed for hours until you see it and abort them.

As usual I can't trust the Hughes Satellite ISP. I had 60 tasks running, and yesterday my ISP would not ul/dl, so those good running tasks changed to Failed, as you can see. I did get lots of Valids sent in before that happened, but when I checked I saw all those Failed tasks, so I aborted them and just disconnected all my ethernets. I might try some on this one 8-core later tonight, get 12 running, and then suspend them until the next morning, since I only have 25% of my high-speed left. You only need high-speed to start these, and slow will work after that... BUT if the internet stops for more than a few minutes these VB tasks will Fail.

I watch EVERY task start running (60 or more), so I know within the first 3 minutes whether they are Failed or not.

You can see exactly what I mean in this other post here on this thread.


FAILED


RUNNING


THAT can change from Running to Failed if it loses the internet connection... nothing I can do about that, especially now that 325 million sit at home all day watching internet videos while I am just trying to run these Cern tasks (and Einsteins, but those GPU tasks are NOT internet dependent).


AND I see which Monte Carlo event generator happens to be running every time too (pythia 6 and 8, sherpa, herwig++, etc.).