Message boards : CMS Application : Early shutdown of the VM

Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 1,800
Message 2531 - Posted: 23 Mar 2016, 17:24:29 UTC
Last modified: 23 Mar 2016, 17:41:33 UTC

It looks like all VMs are shut down early with the remark "No output", before the normal run duration is over.

2016-03-23 17:49:39 (4224): Guest Log: [INFO] No output. Shutting down!
2016-03-23 17:49:39 (4224): VM Completion File Detected.


Running jobs down to zero.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,870,629
RAC: 576
Message 2532 - Posted: 23 Mar 2016, 17:43:09 UTC - in response to Message 2531.  
Last modified: 23 Mar 2016, 18:01:04 UTC

It looks like all VMs are shut down early with the remark "No output", before the normal run duration is over.

2016-03-23 17:49:39 (4224): Guest Log: [INFO] No output. Shutting down!
2016-03-23 17:49:39 (4224): VM Completion File Detected.

Thanks for spotting that. I've been poring over the server trying to find a clue as to why the number of Condor jobs is falling. I wonder if there was an error/typo in the last change?
[Edit] It does look like a problem with the glide-ins; they seem to be returning without having fetched any jobs. [/Edit]
[Edit^2] Yes, we have a problem:
Setting X509_USER_PROXY to canonical path /tmp/x509up_u500
Wed Mar 23 17:31:06 GMT 2016 Failed to load file 'description.g2ifBf.cfg' from 'http://lcggwms01.gridpp.rl.ac.uk:8319/factory/stage/glidein_v3_2_7'.
Wed Mar 23 17:31:08 GMT 2016 Sleeping 308
Wed Mar 23 17:36:16 GMT 2016 Sleeping 335
Wed Mar 23 17:41:51 GMT 2016 Sleeping 267
Wed Mar 23 17:46:19 GMT 2016 Sleeping 287

That's a file we changed today. I've sent the appropriate e-mails; expect at least several hours before things recover, as any fix will have to percolate through CVMFS once it is made. [/Edit^2]
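For anyone who wants to check their own VM logs for the same symptom, here is a minimal sketch that scans a glidein log excerpt for "Failed to load file" lines and adds up the "Sleeping" intervals. The function name `summarize` and the two regular expressions are assumptions based only on the log lines quoted above; they are not part of the glidein software.

```python
import re

# Patterns modelled on the log lines quoted in this thread (an assumption,
# not an official log format).
FAILED = re.compile(r"Failed to load file '([^']+)' from '([^']+)'")
SLEEP = re.compile(r"Sleeping (\d+)$")


def summarize(log_text):
    """Return (failures, total_sleep_seconds) for a glidein log excerpt.

    failures is a list of (file_name, url) tuples.
    """
    failures = []
    total_sleep = 0
    for line in log_text.splitlines():
        line = line.strip()
        match = FAILED.search(line)
        if match:
            failures.append((match.group(1), match.group(2)))
            continue
        match = SLEEP.search(line)
        if match:
            total_sleep += int(match.group(1))
    return failures, total_sleep


SAMPLE = """\
Wed Mar 23 17:31:06 GMT 2016 Failed to load file 'description.g2ifBf.cfg' from 'http://lcggwms01.gridpp.rl.ac.uk:8319/factory/stage/glidein_v3_2_7'.
Wed Mar 23 17:31:08 GMT 2016 Sleeping 308
Wed Mar 23 17:36:16 GMT 2016 Sleeping 335
Wed Mar 23 17:41:51 GMT 2016 Sleeping 267
Wed Mar 23 17:46:19 GMT 2016 Sleeping 287
"""

failures, total_sleep = summarize(SAMPLE)
print(failures)
print(total_sleep)
```

A non-empty failure list together with a large sleep total suggests the VM spent its run waiting rather than fetching jobs.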
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2533 - Posted: 23 Mar 2016, 17:57:26 UTC - in response to Message 2532.  

Wed Mar 23 18:40:09 CET 2016 Failed to load file 'description.g2ifBf.cfg' from 'http://lcggwms01.gridpp.rl.ac.uk:8319/factory/stage/glidein_v3_2_7'.
Wed Mar 23 18:40:10 CET 2016 Sleeping 294
Wed Mar 23 18:45:04 CET 2016 Sleeping 250
Wed Mar 23 18:49:14 CET 2016 Sleeping 257
Wed Mar 23 18:53:31 CET 2016 Sleeping 301


Glidein not working
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 249
Message 2534 - Posted: 23 Mar 2016, 18:07:31 UTC - in response to Message 2533.  

Sorry, my mistake. It should be fixed in CVMFS now; we just have to wait for the caches to update.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 249
Message 2535 - Posted: 23 Mar 2016, 19:13:59 UTC - in response to Message 2534.  

Looks good now. Things are recovering.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2536 - Posted: 23 Mar 2016, 19:21:48 UTC - in response to Message 2535.  

Please be more careful next time!
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 1,800
Message 2537 - Posted: 23 Mar 2016, 20:13:03 UTC - in response to Message 2535.  

Looks good now. Things are recovering.

Thanks and have a nice and quiet evening.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 1,800
Message 2540 - Posted: 23 Mar 2016, 21:46:32 UTC

A new task ran only one job, and after the glidein run ended the shutdown was initiated with the message 'No more jobs. Shutting down!'
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2542 - Posted: 23 Mar 2016, 23:05:28 UTC - in response to Message 2540.  
Last modified: 23 Mar 2016, 23:10:15 UTC

The same happened here.

Another task actually continued to start a second job in the same run.

Looks like the exit criteria need to be looked at.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 1,800
Message 2555 - Posted: 24 Mar 2016, 15:36:44 UTC - in response to Message 2533.  

Glidein not working

Same here with 2nd Run:

------- Initial environment ---------------
MANPATH=/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr/share/man
SHELL=/bin/sh
GFAL_PLUGIN_DIR=/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr/lib64/gfal2-plugins/
VOMS_USERCONF=/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/etc/vomses
GLOBUS_LOCATION=/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr
PERL5LIB=/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr/lib64/perl5/vendor_perl:/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr/lib/perl5/vendor_perl
GT_PROXY_MODE=old
X509_CERT_DIR=/etc/grid-security/certificates
USER=boinc
LD_LIBRARY_PATH=/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/lib64:/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/lib:/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr/lib64:/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr/lib
EMI_TARBALL_BASE=/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1
LCG_LOCATION=/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr
PATH=/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/bin:/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/sbin:/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr/bin:/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr/sbin:/usr/bin:/bin
PWD=/home/boinc/CMSRun
JAVA_HOME=/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr/lib/jvm/jre-1.6.0-openjdk.x86_64
LANG=C
X509_VOMS_DIR=/cvmfs/grid.cern.ch/etc/grid-security/vomsdir
BDII_LIST=lcg-bdii.cern.ch:2170
MYPROXY_SERVER=myproxy.cern.ch
SHLVL=3
HOME=/home/boinc
GLITE_LOCATION_VAR=/var
X509_USER_PROXY=/tmp/x509up_u500
GFAL_CONFIG_DIR=/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/etc/gfal2.d/
LOGNAME=boinc
PYTHONPATH=/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr/lib64/python2.6/site-packages:/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr/lib/python2.6/site-packages
CERNVM_UUID=e9a40930-863c-4e95-b27a-44abf7940b9c
GLITE_LOCATION=/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr
SRM_PATH=/cvmfs/grid.cern.ch/emi-wn-3.15.3-1_sl6v1/usr/share/srm
_=/usr/bin/env
------- =================== ---------------
Setting X509_USER_PROXY to canonical path /tmp/x509up_u500
signature.ebsf2L.sha1: OK
Thu Mar 24 16:16:05 CET 2016 Failed to load file 'description.g2ifBz.cfg.cfg' from 'http://lcggwms01.gridpp.rl.ac.uk:8319/factory/stage/glidein_v3_2_7/entry_volunteer'.
Thu Mar 24 16:16:05 CET 2016 Sleeping 263
Thu Mar 24 16:20:28 CET 2016 Sleeping 333
Thu Mar 24 16:26:02 CET 2016 Sleeping 345
Thu Mar 24 16:31:47 CET 2016 Sleeping 257
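As a rough check that the glidein is only idling between these lines, the sketch below compares each "Sleeping N" value with the time that actually passed before the next "Sleeping" line. The helper name `sleep_gaps` is hypothetical, and the code assumes all lines fall on the same day, so only the time of day is parsed.

```python
import re

# Matches the time of day and the sleep length in lines such as
# "Thu Mar 24 16:16:05 CET 2016 Sleeping 263" (format assumed from the log above).
LINE = re.compile(r"(\d\d):(\d\d):(\d\d) \w+ \d{4} Sleeping (\d+)$")


def sleep_gaps(log_text):
    """Return (announced_sleep, observed_gap) pairs in seconds.

    One pair is produced for each consecutive pair of "Sleeping" lines.
    """
    entries = []
    for line in log_text.splitlines():
        match = LINE.search(line.strip())
        if match:
            hours, minutes, seconds, duration = map(int, match.groups())
            entries.append((hours * 3600 + minutes * 60 + seconds, duration))
    return [
        (duration, next_start - start)
        for (start, duration), (next_start, _) in zip(entries, entries[1:])
    ]


SAMPLE = """\
Thu Mar 24 16:16:05 CET 2016 Sleeping 263
Thu Mar 24 16:20:28 CET 2016 Sleeping 333
Thu Mar 24 16:26:02 CET 2016 Sleeping 345
Thu Mar 24 16:31:47 CET 2016 Sleeping 257
"""

for announced, observed in sleep_gaps(SAMPLE):
    print(announced, observed)
```

When the observed gap matches the announced sleep to within a second or so, nothing else ran in between, which fits the picture of a glidein that never received a job.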