Message boards : Theory Application : Task not starting and not shutting down !

PDW
Joined: 20 May 15
Posts: 217
Credit: 5,193,337
RAC: 8,222
Message 2903 - Posted: 21 Apr 2016, 15:49:22 UTC

This is the StartLog from a task that hasn't done any real work and won't exit:

04/11/16 16:18:12 ******************************************************
04/11/16 16:18:12 ** condor_startd (CONDOR_STARTD) STARTING UP
04/11/16 16:18:12 ** /usr/sbin/condor_startd
04/11/16 16:18:12 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
04/11/16 16:18:12 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
04/11/16 16:18:12 ** $CondorVersion: 8.0.6 Feb 01 2014 BuildID: 225363 $
04/11/16 16:18:12 ** $CondorPlatform: x86_64_RedHat6 $
04/11/16 16:18:12 ** PID = 4464
04/11/16 16:18:12 ** Log last touched time unavailable (No such file or directory)
04/11/16 16:18:12 ******************************************************
04/11/16 16:18:12 Using config source: /etc/condor/condor_config
04/11/16 16:18:12 Using local config sources:
04/11/16 16:18:12 /etc/condor/config.d/10_security.config
04/11/16 16:18:12 /etc/condor/config.d/14_network.config
04/11/16 16:18:12 /etc/condor/config.d/20_workernode.config
04/11/16 16:18:12 /etc/condor/config.d/30_lease.config
04/11/16 16:18:12 /etc/condor/config.d/35_theory.config
04/11/16 16:18:12 /etc/condor/config.d/40_ccb.config
04/11/16 16:18:12 /etc/condor/condor_config.local
04/11/16 16:18:12 Daemon Log is logging: D_ALWAYS D_ERROR
04/11/16 16:18:12 DaemonCore: command socket at <10.0.2.15:33147?noUDP>
04/11/16 16:18:12 DaemonCore: private command socket at <10.0.2.15:33147>
04/11/16 16:18:12 ERROR: Could not open canonicalization file '/etc/condor/certificate_mapfile' (No such file or directory)
04/11/16 16:18:13 CCBListener: heartbeat disabled because interval is configured to be 0
04/11/16 16:18:13 CCBListener: registered with CCB server alicondor01.cern.ch as ccbid 188.184.129.127:9618?addrs=188.184.129.127-9618&noUDP&sock=collector#497
04/11/16 16:18:13 HibernationSupportedStates invalid '' in ad from hibernation plugin /usr/libexec/condor/condor_power_state
04/11/16 16:18:26 VM-gahp server reported an internal error
04/11/16 16:18:26 VM universe will be tested to check if it is available
04/11/16 16:18:26 History file rotation is enabled.
04/11/16 16:18:26 Maximum history file size is: 20971520 bytes
04/11/16 16:18:26 Number of rotated history files is: 2
slot type 0: Cpus: 1, Memory: auto, Swap: auto, Disk: auto
slot type 0: Cpus: 1, Memory: 4500, Swap: 100.00%, Disk: 100.00%
04/11/16 16:18:26 New machine resource allocated
04/11/16 16:18:26 CronJobList: Adding job 'mips'
04/11/16 16:18:26 CronJobList: Adding job 'kflops'
04/11/16 16:18:26 CronJob: Initializing job 'mips' (/usr/libexec/condor/condor_mips)
04/11/16 16:18:26 CronJob: Initializing job 'kflops' (/usr/libexec/condor/condor_kflops)
04/11/16 16:18:26 State change: IS_OWNER is false
04/11/16 16:18:26 Changing state: Owner -> Unclaimed
04/11/16 16:18:26 State change: RunBenchmarks is TRUE
04/11/16 16:18:26 Changing activity: Idle -> Benchmarking
04/11/16 16:18:26 BenchMgr:StartBenchmarks()
04/11/16 16:18:39 Request accepted.
04/11/16 16:18:39 Remote owner is test4theory@cern.ch
04/11/16 16:18:39 State change: claiming protocol successful
04/11/16 16:18:39 Changing state and activity: Unclaimed/Benchmarking -> Claimed/Idle
04/11/16 16:18:40 Got activate_claim request from shadow (188.184.187.167)
04/11/16 16:18:40 Remote job ID is 260339.0
04/11/16 16:18:40 Got universe "VANILLA" (5) from request classad
04/11/16 16:18:40 State change: claim-activation protocol successful
04/11/16 16:18:40 Changing activity: Idle -> Busy
04/11/16 16:18:41 PERMISSION DENIED to condor@localhost from host 10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15,10.0.2.15, hostname size = 1, original ip address = 10.0.2.15
04/11/16 16:18:53 State change: benchmarks completed
04/11/16 16:21:46 Called deactivate_claim_forcibly()
04/11/16 16:21:46 Starter pid 4557 exited with status 0
04/11/16 16:21:46 State change: starter exited
04/11/16 16:21:46 Changing activity: Busy -> Idle
04/11/16 16:21:47 Got activate_claim request from shadow (188.184.187.167)
04/11/16 16:21:47 Remote job ID is 260340.0
04/11/16 16:21:47 Got universe "VANILLA" (5) from request classad
04/11/16 16:21:47 State change: claim-activation protocol successful
04/11/16 16:21:47 Changing activity: Idle -> Busy
04/11/16 16:21:48 PERMISSION DENIED to condor@localhost from host 10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
04/11/16 16:24:15 Called deactivate_claim_forcibly()
04/11/16 16:24:15 Starter pid 5159 exited with status 0
04/11/16 16:24:15 State change: starter exited
04/11/16 16:24:15 Changing activity: Busy -> Idle
04/11/16 16:24:16 Got activate_claim request from shadow (188.184.187.167)
04/11/16 16:24:16 Remote job ID is 260341.0
04/11/16 16:24:16 Got universe "VANILLA" (5) from request classad
04/11/16 16:24:16 State change: claim-activation protocol successful
04/11/16 16:24:16 Changing activity: Idle -> Busy
04/11/16 16:24:17 PERMISSION DENIED to condor@localhost from host 10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
04/11/16 16:24:17 PERMISSION DENIED to condor@localhost from host 10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
04/11/16 16:24:17 PERMISSION DENIED to condor@localhost from host 10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
04/11/16 16:24:17 Starter pid 5731 exited with status 4
04/11/16 16:24:17 State change: starter exited
04/11/16 16:24:17 Changing activity: Busy -> Idle
04/11/16 16:24:18 Got activate_claim request from shadow (188.184.187.167)
04/11/16 16:24:18 Remote job ID is 260341.0
04/11/16 16:24:18 Got universe "VANILLA" (5) from request classad
04/11/16 16:24:18 State change: claim-activation protocol successful
04/11/16 16:24:18 Changing activity: Idle -> Busy
04/11/16 16:24:19 PERMISSION DENIED to condor@localhost from host 10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
04/11/16 16:26:44 Called deactivate_claim_forcibly()
04/11/16 16:26:44 Starter pid 5739 exited with status 0
04/11/16 16:26:44 State change: starter exited
04/11/16 16:26:44 Changing activity: Busy -> Idle
04/11/16 16:26:44 State change: received RELEASE_CLAIM command
04/11/16 16:26:44 Changing state and activity: Claimed/Idle -> Preempting/Vacating
04/11/16 16:26:44 State change: No preempting claim, returning to owner
04/11/16 16:26:44 Changing state and activity: Preempting/Vacating -> Owner/Idle
04/11/16 16:26:44 State change: IS_OWNER is false
04/11/16 16:26:44 Changing state: Owner -> Unclaimed
04/21/16 16:24:03 ******************************************************
04/21/16 16:24:03 ** condor_startd (CONDOR_STARTD) STARTING UP
04/21/16 16:24:03 ** /usr/sbin/condor_startd
04/21/16 16:24:03 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
04/21/16 16:24:03 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
04/21/16 16:24:03 ** $CondorVersion: 8.0.6 Feb 01 2014 BuildID: 225363 $
04/21/16 16:24:03 ** $CondorPlatform: x86_64_RedHat6 $
04/21/16 16:24:03 ** PID = 3550
04/21/16 16:24:03 ** Log last touched 4/21 16:23:51
04/21/16 16:24:03 ******************************************************
04/21/16 16:24:03 Using config source: /etc/condor/condor_config
04/21/16 16:24:03 Using local config sources:
04/21/16 16:24:03 /etc/condor/config.d/10_security.config
04/21/16 16:24:03 /etc/condor/config.d/14_network.config
04/21/16 16:24:03 /etc/condor/config.d/20_workernode.config
04/21/16 16:24:03 /etc/condor/config.d/30_lease.config
04/21/16 16:24:03 /etc/condor/config.d/35_theory.config
04/21/16 16:24:03 /etc/condor/config.d/40_ccb.config
04/21/16 16:24:03 /etc/condor/condor_config.local
04/21/16 16:24:03 Daemon Log is logging: D_ALWAYS D_ERROR
04/21/16 16:24:03 DaemonCore: command socket at <10.0.2.15:42749?noUDP>
04/21/16 16:24:03 DaemonCore: private command socket at <10.0.2.15:42749>
04/21/16 16:24:03 ERROR: Could not open canonicalization file '/etc/condor/certificate_mapfile' (No such file or directory)
04/21/16 16:24:16 CCBListener: heartbeat disabled because interval is configured to be 0
04/21/16 16:24:16 CCBListener: registered with CCB server alicondor01.cern.ch as ccbid 188.184.129.127:9618?addrs=188.184.129.127-9618&noUDP&sock=collector#13882
04/21/16 16:24:16 HibernationSupportedStates invalid '' in ad from hibernation plugin /usr/libexec/condor/condor_power_state
04/21/16 16:24:23 VM-gahp server reported an internal error
04/21/16 16:24:23 VM universe will be tested to check if it is available
04/21/16 16:24:23 History file rotation is enabled.
04/21/16 16:24:23 Maximum history file size is: 20971520 bytes
04/21/16 16:24:23 Number of rotated history files is: 2
slot type 0: Cpus: 1, Memory: auto, Swap: auto, Disk: auto
slot type 0: Cpus: 1, Memory: 4500, Swap: 100.00%, Disk: 100.00%
04/21/16 16:24:23 New machine resource allocated
04/21/16 16:24:23 CronJobList: Adding job 'mips'
04/21/16 16:24:23 CronJobList: Adding job 'kflops'
04/21/16 16:24:23 CronJob: Initializing job 'mips' (/usr/libexec/condor/condor_mips)
04/21/16 16:24:23 CronJob: Initializing job 'kflops' (/usr/libexec/condor/condor_kflops)
04/21/16 16:24:23 State change: IS_OWNER is false
04/21/16 16:24:23 Changing state: Owner -> Unclaimed
04/21/16 16:24:23 State change: RunBenchmarks is TRUE
04/21/16 16:24:23 Changing activity: Idle -> Benchmarking
04/21/16 16:24:23 BenchMgr:StartBenchmarks()
04/21/16 16:24:46 State change: benchmarks completed
04/21/16 16:24:46 Changing activity: Benchmarking -> Idle
04/21/16 16:25:18 Request accepted.
04/21/16 16:25:18 Remote owner is test4theory@cern.ch
04/21/16 16:25:18 State change: claiming protocol successful
04/21/16 16:25:18 Changing state: Unclaimed -> Claimed
04/21/16 16:25:20 Got activate_claim request from shadow (188.184.187.167)
04/21/16 16:25:20 Remote job ID is 271551.0
04/21/16 16:25:20 Got universe "VANILLA" (5) from request classad
04/21/16 16:25:20 State change: claim-activation protocol successful
04/21/16 16:25:20 Changing activity: Idle -> Busy
04/21/16 16:25:30 PERMISSION DENIED to condor@246-776-24187 from host 10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15,10.0.2.15, hostname size = 1, original ip address = 10.0.2.15
04/21/16 16:39:11 Called deactivate_claim_forcibly()
04/21/16 16:39:13 Got activate claim while starter is still alive.
04/21/16 16:39:13 Telling shadow to try again later.
04/21/16 16:39:14 Got activate claim while starter is still alive.
04/21/16 16:39:14 Telling shadow to try again later.
04/21/16 16:39:15 Got activate claim while starter is still alive.
04/21/16 16:39:15 Telling shadow to try again later.
04/21/16 16:39:16 Got activate claim while starter is still alive.
04/21/16 16:39:16 Telling shadow to try again later.
04/21/16 16:39:17 Got activate claim while starter is still alive.
04/21/16 16:39:17 Telling shadow to try again later.
04/21/16 16:39:19 Got activate claim while starter is still alive.
04/21/16 16:39:19 Telling shadow to try again later.
04/21/16 16:39:20 Got activate claim while starter is still alive.
04/21/16 16:39:20 Telling shadow to try again later.
04/21/16 16:39:21 Got activate claim while starter is still alive.
04/21/16 16:39:21 Telling shadow to try again later.
04/21/16 16:39:22 Got activate claim while starter is still alive.
04/21/16 16:39:22 Telling shadow to try again later.
04/21/16 16:39:23 Got activate claim while starter is still alive.
04/21/16 16:39:23 Telling shadow to try again later.
04/21/16 16:39:24 Got activate claim while starter is still alive.
04/21/16 16:39:24 Telling shadow to try again later.
04/21/16 16:39:25 Got activate claim while starter is still alive.
04/21/16 16:39:25 Telling shadow to try again later.
04/21/16 16:39:26 Got activate claim while starter is still alive.
04/21/16 16:39:26 Telling shadow to try again later.
04/21/16 16:39:29 Got activate claim while starter is still alive.
04/21/16 16:39:29 Telling shadow to try again later.
04/21/16 16:39:30 Got activate claim while starter is still alive.
04/21/16 16:39:30 Telling shadow to try again later.
04/21/16 16:39:31 Got activate claim while starter is still alive.
04/21/16 16:39:31 Telling shadow to try again later.
04/21/16 16:39:32 Got activate claim while starter is still alive.
04/21/16 16:39:32 Telling shadow to try again later.
04/21/16 16:39:33 Got activate claim while starter is still alive.
04/21/16 16:39:33 Telling shadow to try again later.
04/21/16 16:39:34 Got activate claim while starter is still alive.
04/21/16 16:39:34 Telling shadow to try again later.
04/21/16 16:39:36 Got activate claim while starter is still alive.
04/21/16 16:39:36 Telling shadow to try again later.
04/21/16 16:39:37 Got activate claim while starter is still alive.
04/21/16 16:39:37 Telling shadow to try again later.
04/21/16 16:39:37 Called deactivate_claim()
04/21/16 16:39:39 State change: received RELEASE_CLAIM command
04/21/16 16:39:39 Changing state and activity: Claimed/Busy -> Preempting/Vacating
04/21/16 16:39:41 starter (pid 3590) is not responding to the request to hardkill its job. The startd will now directly hard kill the starter and all its decendents.
04/21/16 16:39:41 Starter pid 3590 died on signal 9 (signal 9 (Killed))
04/21/16 16:39:41 State change: starter exited
04/21/16 16:39:41 State change: No preempting claim, returning to owner
04/21/16 16:39:41 Changing state and activity: Preempting/Vacating -> Owner/Idle
04/21/16 16:39:41 State change: IS_OWNER is false
04/21/16 16:39:41 Changing state: Owner -> Unclaimed
04/21/16 16:40:19 Request accepted.
04/21/16 16:40:19 Remote owner is test4theory@cern.ch
04/21/16 16:40:19 State change: claiming protocol successful
04/21/16 16:40:19 Changing state: Unclaimed -> Claimed
04/21/16 16:40:21 Got activate_claim request from shadow (188.184.187.167)
04/21/16 16:40:21 Remote job ID is 271557.0
04/21/16 16:40:21 Got universe "VANILLA" (5) from request classad
04/21/16 16:40:21 State change: claim-activation protocol successful
04/21/16 16:40:21 Changing activity: Idle -> Busy
04/21/16 16:40:28 PERMISSION DENIED to condor@246-776-24187 from host 10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
04/21/16 16:40:28 PERMISSION DENIED to condor@246-776-24187 from host 10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
04/21/16 16:40:47 Called deactivate_claim_forcibly()
04/21/16 16:40:48 Got activate claim while starter is still alive.
04/21/16 16:40:48 Telling shadow to try again later.
04/21/16 16:40:49 Got activate claim while starter is still alive.
04/21/16 16:40:49 Telling shadow to try again later.
04/21/16 16:40:50 Got activate claim while starter is still alive.
04/21/16 16:40:50 Telling shadow to try again later.
04/21/16 16:40:51 Got activate claim while starter is still alive.
04/21/16 16:40:51 Telling shadow to try again later.
04/21/16 16:40:52 Got activate claim while starter is still alive.
04/21/16 16:40:52 Telling shadow to try again later.
04/21/16 16:40:54 Got activate claim while starter is still alive.
04/21/16 16:40:54 Telling shadow to try again later.
04/21/16 16:40:55 Got activate claim while starter is still alive.
04/21/16 16:40:55 Telling shadow to try again later.
04/21/16 16:40:56 Got activate claim while starter is still alive.
04/21/16 16:40:56 Telling shadow to try again later.
04/21/16 16:40:57 Got activate claim while starter is still alive.
04/21/16 16:40:57 Telling shadow to try again later.
04/21/16 16:40:58 Got activate claim while starter is still alive.
04/21/16 16:40:58 Telling shadow to try again later.
04/21/16 16:40:59 Got activate claim while starter is still alive.
04/21/16 16:40:59 Telling shadow to try again later.
04/21/16 16:41:00 Got activate claim while starter is still alive.
04/21/16 16:41:00 Telling shadow to try again later.
04/21/16 16:41:01 Got activate claim while starter is still alive.
04/21/16 16:41:01 Telling shadow to try again later.
04/21/16 16:41:02 Got activate claim while starter is still alive.
04/21/16 16:41:02 Telling shadow to try again later.
04/21/16 16:41:04 Got activate claim while starter is still alive.
04/21/16 16:41:04 Telling shadow to try again later.
04/21/16 16:41:05 Got activate claim while starter is still alive.
04/21/16 16:41:05 Telling shadow to try again later.
04/21/16 16:41:06 Got activate claim while starter is still alive.
04/21/16 16:41:06 Telling shadow to try again later.
04/21/16 16:41:07 Got activate claim while starter is still alive.
04/21/16 16:41:07 Telling shadow to try again later.
04/21/16 16:41:08 Got activate claim while starter is still alive.
04/21/16 16:41:08 Telling shadow to try again later.
04/21/16 16:41:09 Got activate claim while starter is still alive.
04/21/16 16:41:09 Telling shadow to try again later.
04/21/16 16:41:10 Got activate claim while starter is still alive.
04/21/16 16:41:10 Telling shadow to try again later.
04/21/16 16:41:10 Called deactivate_claim()
04/21/16 16:41:11 State change: received RELEASE_CLAIM command
04/21/16 16:41:11 Changing state and activity: Claimed/Busy -> Preempting/Vacating
04/21/16 16:41:17 starter (pid 6986) is not responding to the request to hardkill its job. The startd will now directly hard kill the starter and all its decendents.
04/21/16 16:41:17 Starter pid 6986 died on signal 9 (signal 9 (Killed))
04/21/16 16:41:17 State change: starter exited
04/21/16 16:41:17 State change: No preempting claim, returning to owner
04/21/16 16:41:17 Changing state and activity: Preempting/Vacating -> Owner/Idle
04/21/16 16:41:17 State change: IS_OWNER is false
04/21/16 16:41:17 Changing state: Owner -> Unclaimed
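
For anyone who wants to skim a StartLog like the one above without reading every retry line, here is a minimal Python sketch. It only counts the "Got activate claim while starter is still alive." and "PERMISSION DENIED" lines and lists how each starter ended. The message strings are copied from the log above; the function name `summarize` and the tuple layout are my own assumptions, not part of HTCondor.

```python
import re
from collections import Counter

# Message fragments copied verbatim from the StartLog above.
# summarize() and its return layout are hypothetical, for illustration only.
STARTER_END = re.compile(
    r"Starter pid (\d+) (?:exited with status (\d+)|died on signal (\d+))"
)
RETRY = "Got activate claim while starter is still alive."
DENIED = "PERMISSION DENIED"


def summarize(lines):
    """Return (counts, ends) for an iterable of StartLog lines.

    counts: Counter with keys "retries" and "denied".
    ends:   list of (pid, kind, code), kind being "exit" or "signal".
    """
    counts = Counter()
    ends = []
    for line in lines:
        if RETRY in line:
            counts["retries"] += 1
        elif DENIED in line:
            counts["denied"] += 1
        match = STARTER_END.search(line)
        if match:
            pid = int(match.group(1))
            if match.group(2) is not None:
                ends.append((pid, "exit", int(match.group(2))))
            else:
                ends.append((pid, "signal", int(match.group(3))))
    return counts, ends
```

Called as `summarize(open(path))`, where `path` points at wherever your copy of the StartLog lives, it should make it easy to see which starters exited normally and which had to be killed with signal 9.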


I killed an earlier task with a shutdown file, and its completed log shows:

2016-04-21 15:46:06 (16476): Guest Log: [INFO] VMID: db552956-770a-4b03-9ed9-316e25ec1573
2016-04-21 15:46:06 (16476): Guest Log: [INFO] Requesting an X509 credential from vLHC@home
2016-04-21 15:46:06 (16476): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
2016-04-21 15:46:06 (16476): Guest Log: [INFO] Theory application starting. Check log files.
2016-04-21 16:22:33 (16476): VM Completion File Detected.
2016-04-21 16:22:33 (16476): Powering off VM.
2016-04-21 16:22:35 (16476): Successfully stopped VM.
2016-04-21 16:22:40 (16476): Deregistering VM. (boinc_c915505983d72e43, slot#8)