Message boards :
Theory Application :
New Version 3.08
Joined: 8 Apr 15 · Posts: 759 · Credit: 11,782,220 · RAC: 2,576
I was having problems here and over at LHC a few days ago, but for the last 3 days everything has been Valids here and over there, and with the multi-core over there I finally don't have to *remove* those old vdi's twice a day on all the hosts. The only strange thing at the end of these Valids is, as usual, VirtualBox saying this:

VM did not power off when requested.
VM was successfully terminated.

Both in the exact same second, so that is OK. I'm just checking these tasks on my laptop and haven't made the climb back upstairs yet to look at the hosts I have here and over at LHC, but I do know that since that server incident that deleted all the history, all I have is Valids and no Errors since July 1st (I think it is about 40 Valid multi-cores over there so far).

Mad Scientist For Life
Joined: 22 Apr 16 · Posts: 673 · Credit: 1,921,933 · RAC: 1,541
OK, the first answer was in the LHCb folder. By the way, I am testing Theory in -dev; openhtc.io is NOT available there so far:

2018-09-06 21:50:23 (11276): Guest Log: Probing /cvmfs/grid.cern.ch... OK
2018-09-06 21:50:24 (11276): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-09-06 21:50:24 (11276): Guest Log: 2.4.4.0 3513 0 24900 7115 3 1 183743 10240001 2 65024 0 20 90 17 50 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.33.31:3125 1
Joined: 13 Feb 15 · Posts: 1185 · Credit: 850,198 · RAC: 209
@Laurence: Any chance the suspend/resume issue reported in https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=418&postid=5459 will be fixed in a next version? I tested this again here at dev and noticed that after the resume the events are happily processed until the processes are killed after 40 minutes, specifically after this:

09/08/18 07:40:57 (pid:636) Lost connection to shadow, waiting 2400 secs for reconnect
09/08/18 07:41:58 (pid:636) WARNING: forward resolution of 167.142.142.128.in-addr.arpa doesn't match 128.142.142.167!
09/08/18 07:41:58 (pid:636) CCBListener: registered with CCB server vccondor01.cern.ch as ccbid 128.142.142.167:9618?addrs=128.142.142.167-9618+[2001-1458-301-98--100-99]-9618#42776305
09/08/18 08:20:57 (pid:636) No reconnect from shadow for 2400 seconds, aborting job execution!
09/08/18 08:20:57 (pid:636) ShutdownFast all jobs.
09/08/18 08:20:57 (pid:636) Process exited, pid=640, signal=9
09/08/18 08:20:57 (pid:636) About to exec Post script: /var/lib/condor/execute/dir_636/post.sh 2110-628514-304
09/08/18 08:20:57 (pid:636) Create_Process succeeded, pid=23756
09/08/18 08:20:58 (pid:636) Process exited, pid=23756, status=0
09/08/18 08:20:58 (pid:636) Failed to send job exit status to shadow

Thereafter nothing is happening: no reconnections, no new job, no proper VM shutdown, no VM kill, only an idle VM kept around, probably until BOINC's 18-hour limit kicks in.
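For context, the "2400 secs" in that log is HTCondor's job lease: the starter waits JobLeaseDuration seconds for the shadow to reconnect before aborting the job. Surviving a longer suspend would therefore need a larger lease. A hypothetical condor_config fragment, assuming the HTCondor configuration inside the VM image could be changed server-side:

```
# Lease used when a job does not set JobLeaseDuration in its own ClassAd;
# HTCondor's stock default is 2400 s (40 min), matching the log above.
JOB_DEFAULT_LEASE_DURATION = 7200
```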
Joined: 22 Apr 16 · Posts: 673 · Credit: 1,921,933 · RAC: 1,541
"Updates the CernVM cache and now uses OpenHTC.io for CVMFS." Laurence, you wrote this at the beginning of this thread for version 3.08, yet Theory shows this info in the stderr:

2018-09-12 17:55:21 (12920): Guest Log: Probing /cvmfs/grid.cern.ch... OK
2018-09-12 17:55:21 (12920): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-09-12 17:55:21 (12920): Guest Log: 2.4.4.0 3516 0 26936 7161 3 1 183743 10240001 2 65024 0 20 90 17 56 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.168.202:3125 1
Joined: 12 Sep 14 · Posts: 1067 · Credit: 332,243 · RAC: 194
Is this working ok on the production server?
Joined: 12 Sep 14 · Posts: 1067 · Credit: 332,243 · RAC: 194
This keeps coming back to bite us. I am wondering if we can simplify things.
Joined: 22 Apr 16 · Posts: 673 · Credit: 1,921,933 · RAC: 1,541
"Is this working ok on the production server?" Yes, it works in production! This is from a Theory task in production:

2018-09-13 03:22:50 (6300): Guest Log: Probing /cvmfs/grid.cern.ch... OK
2018-09-13 03:22:51 (6300): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-09-13 03:22:51 (6300): Guest Log: 2.4.4.0 3551 0 24904 7165 3 1 183743 10240001 2 65024 0 20 90 17 61 http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 1
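A quick way to check from a task's stderr whether it really went through openhtc.io is to pull the HOST and PROXY columns out of the cvmfs stats line. A minimal Python sketch; the field positions are assumed from the header line quoted in the logs above:

```python
# Parse a whitespace-separated cvmfs stats line (the "Guest Log" payload)
# and report which stratum-1 host and proxy the VM actually used.
def cvmfs_endpoint(stat_line: str) -> tuple[str, str, bool]:
    fields = stat_line.split()
    # Per the header, the last three columns are HOST, PROXY, ONLINE.
    host, proxy, online = fields[-3], fields[-2], fields[-1]
    return host, proxy, online == "1"

# Sample values taken from the production log quoted above.
prod = ("2.4.4.0 3551 0 24904 7165 3 1 183743 10240001 2 65024 0 20 90 17 61 "
        "http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 1")
host, proxy, online = cvmfs_endpoint(prod)
print(host)                   # stratum-1 actually contacted
print("openhtc.io" in host)   # prints: True (Cloudflare caching in use)
```

Running the same check on the -dev lines earlier in the thread shows a plain cern.ch stratum-1 behind a site proxy, which is exactly the discrepancy being discussed.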
Joined: 12 Sep 14 · Posts: 1067 · Credit: 332,243 · RAC: 194
I think a small change was made to the image before it went to production. I can put that image here.
©2024 CERN