Message boards :
Theory Application :
New Version 3.08
Joined: 8 Apr 15 · Posts: 759 · Credit: 11,782,220 · RAC: 2,576
I was having problems here and over at LHC a few days ago, but for the last 3 days everything has been Valids here and over there, and with the multi-core over there I finally don't have to *remove* those old vdi's twice a day on all the hosts. The only strange thing at the end of these Valids is, as usual, VirtualBox saying this:

VM did not power off when requested.
VM was successfully terminated.

Both in the exact same second, so that is OK. I'm just checking these tasks on my laptop and haven't made the climb back upstairs yet to look at the hosts I have here and over at LHC, but I do know that since that server incident that deleted all the history, all I have is Valids and no Errors since July 1st (I think it is about 40 Valid multi-cores over there so far).

Mad Scientist For Life
Joined: 22 Apr 16 · Posts: 673 · Credit: 1,921,933 · RAC: 1,541
OK, the first answer was in the LHCb folder. By the way, I am testing Theory in -dev; openhtc.io is NOT available there so far:

2018-09-06 21:50:23 (11276): Guest Log: Probing /cvmfs/grid.cern.ch... OK
2018-09-06 21:50:24 (11276): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-09-06 21:50:24 (11276): Guest Log: 2.4.4.0 3513 0 24900 7115 3 1 183743 10240001 2 65024 0 20 90 17 50 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.33.31:3125 1
Joined: 13 Feb 15 · Posts: 1185 · Credit: 850,198 · RAC: 209
@Laurence: Any chance the suspend/resume issue reported in https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=418&postid=5459 will be fixed in a next version? I tested this again here at dev and noticed that after the resume the events are happily processed until the processes are killed after 40 minutes, specifically after this:

09/08/18 07:40:57 (pid:636) Lost connection to shadow, waiting 2400 secs for reconnect
09/08/18 07:41:58 (pid:636) WARNING: forward resolution of 167.142.142.128.in-addr.arpa doesn't match 128.142.142.167!
09/08/18 07:41:58 (pid:636) CCBListener: registered with CCB server vccondor01.cern.ch as ccbid 128.142.142.167:9618?addrs=128.142.142.167-9618+[2001-1458-301-98--100-99]-9618#42776305
09/08/18 08:20:57 (pid:636) No reconnect from shadow for 2400 seconds, aborting job execution!
09/08/18 08:20:57 (pid:636) ShutdownFast all jobs.
09/08/18 08:20:57 (pid:636) Process exited, pid=640, signal=9
09/08/18 08:20:57 (pid:636) About to exec Post script: /var/lib/condor/execute/dir_636/post.sh 2110-628514-304
09/08/18 08:20:57 (pid:636) Create_Process succeeded, pid=23756
09/08/18 08:20:58 (pid:636) Process exited, pid=23756, status=0
09/08/18 08:20:58 (pid:636) Failed to send job exit status to shadow

Thereafter nothing is happening: no reconnections, no new job, no proper VM shutdown, no VM kill, only an idle VM kept around, probably until BOINC's 18-hour limit kicks in.
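For context, the "2400 secs" in that log is HTCondor's job lease: the starter waits JobLeaseDuration seconds for the shadow to reconnect before aborting the job. Surviving a longer suspend would therefore need a larger lease. A hypothetical condor_config fragment, assuming the HTCondor configuration inside the VM image could be changed server-side:

```
# Lease used when a job does not set JobLeaseDuration in its own ClassAd;
# HTCondor's stock default is 2400 s (40 min), matching the log above.
JOB_DEFAULT_LEASE_DURATION = 7200
```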
Joined: 22 Apr 16 · Posts: 673 · Credit: 1,921,933 · RAC: 1,541
"Updates the CernVM cache and now uses OpenHTC.io for CVMFS." Laurence, you wrote this at the beginning of this thread for version 3.08, yet Theory shows this info in the stderr:

2018-09-12 17:55:21 (12920): Guest Log: Probing /cvmfs/grid.cern.ch... OK
2018-09-12 17:55:21 (12920): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-09-12 17:55:21 (12920): Guest Log: 2.4.4.0 3516 0 26936 7161 3 1 183743 10240001 2 65024 0 20 90 17 56 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.168.202:3125 1
Joined: 12 Sep 14 · Posts: 1067 · Credit: 332,243 · RAC: 194
Is this working ok on the production server?
Joined: 12 Sep 14 · Posts: 1067 · Credit: 332,243 · RAC: 194
This keeps coming back to bite us. I am wondering if we can simplify things.
Joined: 22 Apr 16 · Posts: 673 · Credit: 1,921,933 · RAC: 1,541
"Is this working ok on the production server?" Yes, it works in production! This is from a Theory task in production:

2018-09-13 03:22:50 (6300): Guest Log: Probing /cvmfs/grid.cern.ch... OK
2018-09-13 03:22:51 (6300): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-09-13 03:22:51 (6300): Guest Log: 2.4.4.0 3551 0 24904 7165 3 1 183743 10240001 2 65024 0 20 90 17 61 http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 1
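A quick way to check from a task's stderr whether it really went through openhtc.io is to pull the HOST and PROXY columns out of the cvmfs stats line. A minimal Python sketch; the field positions are assumed from the header line quoted in the logs above:

```python
# Parse a whitespace-separated cvmfs stats line (the "Guest Log" payload)
# and report which stratum-1 host and proxy the VM actually used.
def cvmfs_endpoint(stat_line: str) -> tuple[str, str, bool]:
    fields = stat_line.split()
    # Per the header, the last three columns are HOST, PROXY, ONLINE.
    host, proxy, online = fields[-3], fields[-2], fields[-1]
    return host, proxy, online == "1"

# Sample values taken from the production log quoted above.
prod = ("2.4.4.0 3551 0 24904 7165 3 1 183743 10240001 2 65024 0 20 90 17 61 "
        "http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 1")
host, proxy, online = cvmfs_endpoint(prod)
print(host)                   # stratum-1 actually contacted
print("openhtc.io" in host)   # prints: True (Cloudflare caching in use)
```

Running the same check on the -dev lines earlier in the thread shows a plain cern.ch stratum-1 behind a site proxy, which is exactly the discrepancy being discussed.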
Joined: 12 Sep 14 · Posts: 1067 · Credit: 332,243 · RAC: 194
I think a small change was made to the image before it went to production. I can put that image here.
©2024 CERN