Message boards : Theory Application : New version 5.00

Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 6866 - Posted: 29 Nov 2019, 13:14:57 UTC - in response to Message 6864.  

That means the VirtualBox COM server (VBoxSVC.exe) can't communicate with the wrapper fast enough, usually because the system is too busy.

That's strange. I crunch with my PC during the night, when I'm not working...

The VM periodically 'touches' a heartbeat file to let vboxwrapper know that the VM is still alive.
If the wrapper does not see the file's modification time change within the specified interval, it assumes the worst and aborts the job :-(
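To illustrate the heartbeat idea, here is a minimal sketch of a file-based watchdog in Python. It is not vboxwrapper's actual code: the class name `HeartbeatMonitor`, the `interval` parameter and the choice to treat a missing file as "no heartbeat" are all assumptions made for illustration. It only shows the principle: remember the file's last modification time and declare the VM dead when that time has not changed within the interval.

```python
import os
import time
from typing import Optional


class HeartbeatMonitor:
    """Hypothetical watchdog: the VM counts as alive while the heartbeat
    file's modification time keeps changing within `interval` seconds."""

    def __init__(self, path: str, interval: float) -> None:
        self.path = path
        self.interval = interval
        self.last_mtime = self._mtime()
        self.last_change = time.monotonic()

    def _mtime(self) -> Optional[float]:
        try:
            return os.path.getmtime(self.path)
        except FileNotFoundError:
            return None  # assumption: a missing file is treated as "no heartbeat"

    def is_alive(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        mtime = self._mtime()
        if mtime is not None and mtime != self.last_mtime:
            # The VM touched the file: record when we noticed the change.
            self.last_mtime = mtime
            self.last_change = now
        # Alive only if the last observed change is recent enough.
        return now - self.last_change <= self.interval
```

A caller would poll `is_alive()` periodically and abort the job the first time it returns `False`. That is also why a heavily loaded host can kill a healthy task: if the VM cannot touch the file in time, the monitor cannot tell "slow" from "dead".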
Luigi R.

Joined: 29 Sep 15
Posts: 5
Credit: 454,762
RAC: 0
Message 6869 - Posted: 1 Dec 2019, 19:17:20 UTC
Last modified: 1 Dec 2019, 19:25:10 UTC

2 tasks are using 0.632 CPUs. What is that?

I have 5 running tasks for 4 threads now.


Edit: I aborted and reported the tasks and reset the project. It looks fine now.
Luigi R.

Joined: 29 Sep 15
Posts: 5
Credit: 454,762
RAC: 0
Message 6870 - Posted: 2 Dec 2019, 19:42:59 UTC

Again. :|

Profile Magic Quantum Mechanic

Joined: 8 Apr 15
Posts: 734
Credit: 11,558,055
RAC: 2,030
Message 6890 - Posted: 8 Dec 2019, 16:39:26 UTC

I got a couple of short Valids, but now I'm on the 2nd batch of this.



I aborted the 1st batch after 12+ hours and just noticed the next batch after 8+ hours (just got out of

https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=1816
Profile Magic Quantum Mechanic

Joined: 8 Apr 15
Posts: 734
Credit: 11,558,055
RAC: 2,030
Message 6905 - Posted: 15 Dec 2019, 18:12:00 UTC
Last modified: 15 Dec 2019, 18:40:18 UTC

We need to do something to have the server stop these Cranky-failed tasks within the first 10 minutes, instead of letting them run until we check and find out that they are running for no reason.


This


That happens in the first 3 minutes of booting up, and there is no reason to keep them running.
First thing this morning I had to check all of the tasks I have running to see whether they were actually running. I found that half of them had been running for 8 hours and were Cranky-failed, so I had to abort them and start new ones while watching, so that I know for sure and can either let them run or abort them.

That is a waste of time, and I know this will not work over at LHC, since nobody over there likes wasting computer time.

I got most of mine running again except 3 cores that refused to get Cranky to run... and, well, it is time to watch some Sunday football. But since 2 of those cores are here watching the game with me, I will see if I can talk them into running before it gets Cranky and suspends for a few hours.

THIS is what we want to see in the first 5 minutes (actually less)
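The request above (stop Cranky-failed tasks early instead of letting them run for hours) could be approached by scanning the guest log during start-up. Below is a minimal Python sketch under clear assumptions: the failure markers are copied from the guest-log lines quoted later in this thread (`cranky: [ERROR] Container 'runc' terminated with status code 1` and `[ERROR] Job Failed`), the 10-minute window comes from the post, and the function names `first_failure` and `check_startup` are hypothetical. It is not an existing BOINC, vboxwrapper or Cranky feature.

```python
import re
from typing import Iterable, Optional

# Hypothetical failure markers, based on guest-log lines quoted in this
# thread. The real log format may differ or change between versions.
FAILURE_MARKERS = re.compile(r"cranky: \[ERROR\]|\[ERROR\] Job Failed")


def first_failure(log_lines: Iterable[str]) -> Optional[str]:
    """Return the first log line that signals a Cranky failure, or None."""
    for line in log_lines:
        if FAILURE_MARKERS.search(line):
            return line.strip()
    return None


def check_startup(log_lines: Iterable[str], elapsed_s: float,
                  window_s: float = 600.0) -> str:
    """Classify a task during its start-up window (default: first 10 minutes).

    "abort": a failure marker is already in the log.
    "ok":    the window has passed without any failure marker.
    "wait":  still inside the window and nothing has failed yet.
    """
    if first_failure(log_lines) is not None:
        return "abort"
    return "ok" if elapsed_s >= window_s else "wait"
```

A watcher would re-read the log every minute or so and stop the task as soon as `check_startup` returns `"abort"`, so a failure that shows up in the first 3 minutes never turns into 8 wasted hours.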
Profile Magic Quantum Mechanic

Joined: 8 Apr 15
Posts: 734
Credit: 11,558,055
RAC: 2,030
Message 6912 - Posted: 22 Dec 2019, 8:53:44 UTC



These can be a real problem if you don't have an internet speed of at least 1 Mbps, because they have to start running in less than 3 minutes or this happens. It gets even worse, since they will keep running these Invalid tasks for many, many hours unless you happen to check a particular task and see this in the log or in the VM Console, where you can see whether it is running a Valid task or not (as I have shown in snapshots already).

I don't know why these have to start running in less than 3 minutes in order to work, and it seems to me you must have a way to get around this.

I saw it mentioned over on the LHC board, and I know some of them live a few miles from CERN with faster-than-lightning ISPs, so they only see it once in a while and wonder why they get these Invalids. Well, I watch this happen every day, and just to make sure they will all start I have to do it after 2am, when my ISP speed is back to the fastest I get, and then I can start up 24 of these.

BUT if I try this any other time (other than 2am-8am), it is pure luck to get any of them to start. Just now, for example, I ran a speed test first and it was right on the line at around 1 Mbps, so I got one to start up and tried a second task. It seemed to still be running fast enough, BUT as I watched the VM Console I saw it took longer than the 1 min 45 s, so it jumped to the next pages and then told me it Failed.
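The thread never says how much data a Theory VM has to fetch while it boots, so the sketch below does not claim a figure. It only shows, as plain arithmetic, how much a link can move inside the start-up windows mentioned above (1 min 45 s and 3 min), assuming start-up is limited purely by download bandwidth. The function names are hypothetical.

```python
def megabytes_in_window(rate_mbps: float, seconds: float) -> float:
    """Decimal megabytes a link at `rate_mbps` megabits/s can move in `seconds`."""
    return rate_mbps * seconds / 8  # 8 bits per byte


def required_rate_mbps(megabytes: float, seconds: float) -> float:
    """Minimum sustained rate (megabits/s) needed to fetch `megabytes` in `seconds`."""
    return megabytes * 8 / seconds


if __name__ == "__main__":
    # Start-up windows mentioned in the thread: 1 min 45 s and 3 min.
    for window in (105, 180):
        print(f"{window:>3} s at 1 Mbps -> {megabytes_in_window(1.0, window):.3f} MB")
```

At 1 Mbps that works out to 13.125 MB in 105 s and 22.5 MB in 180 s. Whatever the VM actually needs during boot, a link hovering "right on the line" leaves no margin, which fits the hit-or-miss behaviour described above.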

So since it is 1am now I will just wait until 2am and get them all running again.

This kind of thing has been the main problem with VB tasks since day one, 9 years ago.
Profile Magic Quantum Mechanic

Joined: 8 Apr 15
Posts: 734
Credit: 11,558,055
RAC: 2,030
Message 6923 - Posted: 31 Dec 2019, 23:30:53 UTC

Well, here I go again: after 2 weeks and 1200 Valid tasks here, my satellite ISP throttles me down to dial-up speed for the next 2 weeks, so I can't run any of these Theory VB tasks.

If they don't start *running* in less than 3 minutes, they will just run for hours and give you a Failed task, which you can already see in the VM Console within the first 3 minutes. So there is no reason to let an Invalid run for hours and hours.

It sure would be nice if the VB tasks didn't demand high-speed internet just to start (a problem since 2011); after they do start, the slow speed does not matter at all.

So as UOTD this is pretty much all I will be doing.

(and it would be nice if CMS were fixed so it runs on a Windows OS, even though, since it is VB, it does the same thing as the rest)

HAPPY NEW YEAR
Profile Magic Quantum Mechanic

Joined: 8 Apr 15
Posts: 734
Credit: 11,558,055
RAC: 2,030
Message 6934 - Posted: 8 Jan 2020, 23:33:51 UTC

Something is wrong with these tasks today.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2857294

I have about 15 in a row, and the strange part is that they start as if they are going to run Valids but end up failing.


They start that way and end up with:
Guest Log: 00:08:32 CET +01:00 2020-01-09: cranky: [ERROR] Container 'runc' terminated with status code 1
.[ERROR] Job Failed
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 6936 - Posted: 9 Jan 2020, 9:57:56 UTC - in response to Message 6934.  

Something is wrong with these tasks today.
...
[ERROR] Job Failed

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5265

©2024 CERN