Message boards : Theory Application : New Version 3.08
Magic Quantum Mechanic
Message 5460 - Posted: 4 Jul 2018, 19:34:02 UTC - in response to Message 5459.  

I was having problems here and over at LHC a few days ago, but for the last 3 days everything has been Valid both here and over there. With the multi-core over there, I finally don't have to *remove* those old vdi's twice a day on all the hosts.

The only strange thing at the end of these Valids is, as usual, VB saying this:

VM did not power off when requested.
VM was successfully terminated.

Both are logged in the exact same second, so that is OK.

I'm just checking these tasks on my laptop and haven't made the climb back upstairs yet to look at the hosts I have here and over at LHC, but I do know that since that server incident that deleted all the history, all I have is Valids and no Errors since July 1st (I think it is about 40 Valid multi-cores over there so far).
Mad Scientist For Life
maeax

Message 5504 - Posted: 7 Sep 2018, 9:05:31 UTC
Last modified: 7 Sep 2018, 9:06:08 UTC

OK, my first answer was in the LHCb folder:
By the way, I am testing Theory in -dev. openhtc.io is NOT available so far.
2018-09-06 21:50:23 (11276): Guest Log: Probing /cvmfs/grid.cern.ch... OK

2018-09-06 21:50:24 (11276): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE

2018-09-06 21:50:24 (11276): Guest Log: 2.4.4.0 3513 0 24900 7115 3 1 183743 10240001 2 65024 0 20 90 17 50 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.33.31:3125 1
Crystal Pellet
Volunteer tester

Message 5506 - Posted: 8 Sep 2018, 7:27:33 UTC

@Laurence:

Any chance the suspend/resume problem reported in https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=418&postid=5459 will be fixed in a next version?

I tested this again here at dev and noticed that after the resume the events are happily processed until the processes are killed after 40 minutes, following this:

09/08/18 07:40:57 (pid:636) Lost connection to shadow, waiting 2400 secs for reconnect
09/08/18 07:41:58 (pid:636) WARNING: forward resolution of 167.142.142.128.in-addr.arpa doesn't match 128.142.142.167!
09/08/18 07:41:58 (pid:636) CCBListener: registered with CCB server vccondor01.cern.ch as ccbid 128.142.142.167:9618?addrs=128.142.142.167-9618+[2001-1458-301-98--100-99]-9618#42776305
09/08/18 08:20:57 (pid:636) No reconnect from shadow for 2400 seconds, aborting job execution!
09/08/18 08:20:57 (pid:636) ShutdownFast all jobs.
09/08/18 08:20:57 (pid:636) Process exited, pid=640, signal=9
09/08/18 08:20:57 (pid:636) About to exec Post script: /var/lib/condor/execute/dir_636/post.sh 2110-628514-304
09/08/18 08:20:57 (pid:636) Create_Process succeeded, pid=23756
09/08/18 08:20:58 (pid:636) Process exited, pid=23756, status=0
09/08/18 08:20:58 (pid:636) Failed to send job exit status to shadow

Thereafter nothing happens: no reconnections, no new job, no proper VM shutdown, no VM kill. The VM just stays idle, probably until BOINC's 18-hour limit kicks in.
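
The 40-minute gap can be checked against the timestamps in the log above. This is only a minimal sketch in Python, assuming the log dates are in MM/DD/YY order and that the 2400 seconds in the "Lost connection to shadow" line is the reconnect window. The variable names are illustrative and not part of HTCondor or BOINC.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout of the log lines quoted above: MM/DD/YY HH:MM:SS
LOG_TIME_FORMAT = "%m/%d/%y %H:%M:%S"

# Timestamps copied from the "Lost connection" and "No reconnect" lines
lost_connection = datetime.strptime("09/08/18 07:40:57", LOG_TIME_FORMAT)
job_aborted = datetime.strptime("09/08/18 08:20:57", LOG_TIME_FORMAT)

# Reconnect window stated in the log ("waiting 2400 secs for reconnect")
reconnect_window = timedelta(seconds=2400)

# Minutes that actually passed between the two log lines
elapsed_minutes = (job_aborted - lost_connection).total_seconds() / 60

print(elapsed_minutes)                                  # 40.0
print(lost_connection + reconnect_window == job_aborted)  # True
```

So the abort comes exactly when the 2400-second window runs out, which matches the 40 minutes observed after the resume.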
maeax

Message 5512 - Posted: 13 Sep 2018, 5:29:50 UTC - in response to Message 5440.  

Updates the CernVM cache and now uses OpenHTC.io, for CVMFS.

Laurence, you wrote this at the beginning of this thread for Version 3.08.
Theory shows this info in the stderr:
2018-09-12 17:55:21 (12920): Guest Log: Probing /cvmfs/grid.cern.ch... OK

2018-09-12 17:55:21 (12920): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE

2018-09-12 17:55:21 (12920): Guest Log: 2.4.4.0 3516 0 26936 7161 3 1 183743 10240001 2 65024 0 20 90 17 56 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.168.202:3125 1
Laurence
Project administrator
Project developer
Project tester
Message 5515 - Posted: 13 Sep 2018, 13:32:28 UTC - in response to Message 5512.  

Is this working ok on the production server?
Laurence
Project administrator
Project developer
Project tester
Message 5516 - Posted: 13 Sep 2018, 13:49:16 UTC - in response to Message 5506.  

This keeps coming back to bite us. I am wondering if we can simplify things.
maeax

Message 5517 - Posted: 13 Sep 2018, 14:40:32 UTC - in response to Message 5515.  
Last modified: 13 Sep 2018, 14:42:43 UTC

Is this working ok on the production server?

Yes, it works in production!

This is from a Theory task in production:
2018-09-13 03:22:50 (6300): Guest Log: Probing /cvmfs/grid.cern.ch... OK

2018-09-13 03:22:51 (6300): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE

2018-09-13 03:22:51 (6300): Guest Log: 2.4.4.0 3551 0 24904 7165 3 1 183743 10240001 2 65024 0 20 90 17 61 http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 1
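
To make the HOST and PROXY columns easier to compare, the header line can be paired with each data row. This is only a minimal sketch in Python, assuming the columns are whitespace-separated and line up one-to-one with the header. `parse_stat` and `uses_openhtc` are hypothetical helper names, not part of CVMFS or BOINC. The rows are copied from Message 5512 (dev) and from this message (production).

```python
# Header line exactly as printed in the Guest Log
HEADER = (
    "VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS "
    "CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) "
    "RX(K) SPEED(K/S) HOST PROXY ONLINE"
)


def parse_stat(row: str) -> dict:
    """Pair each header name with the matching value of one data row."""
    names = HEADER.split()
    values = row.split()
    if len(names) != len(values):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    return dict(zip(names, values))


def uses_openhtc(stat: dict) -> bool:
    """True if the HOST column points at an openhtc.io server."""
    return "openhtc.io" in stat["HOST"]


# Data row from the dev task (Message 5512)
dev = parse_stat(
    "2.4.4.0 3516 0 26936 7161 3 1 183743 10240001 2 65024 0 20 90 17 56 "
    "http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch "
    "http://128.142.168.202:3125 1"
)

# Data row from the production task (this message)
prod = parse_stat(
    "2.4.4.0 3551 0 24904 7165 3 1 183743 10240001 2 65024 0 20 90 17 61 "
    "http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 1"
)

for label, stat in (("dev", dev), ("production", prod)):
    print(label, stat["HOST"], stat["PROXY"], uses_openhtc(stat))
```

With these two rows, the dev task reports cvmfs-stratum-one.cern.ch through a proxy, while the production task reports s1cern-cvmfs.openhtc.io with PROXY set to DIRECT.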
Laurence
Project administrator
Project developer
Project tester
Message 5518 - Posted: 14 Sep 2018, 7:52:24 UTC - in response to Message 5517.  

I think a small change was made to the image before it went to production. I can put that image here.