CMS 46.27 on vLHC looping but doing nothing useful
Author | Message |
---|---|
Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
In the last few days I have had the feeling that my CMS task at vLHC is doing nothing useful. Every time I checked console ALT/F5 inside the VM, I saw nothing. ALT/F1 to ALT/F4 show "normal" content, but ALT/F5 is always empty. I checked the logs, but I'm not sure where to look. I see run-1 up to run-30; the WU has been running for 4 hours and 20 minutes. Run-1 has the latest timestamp: http://localhost:55538/logs/run-1/glide_GJSlZn/

StartdLog:
04/01/16 11:46:22 (pid:10314) ******************************************************
04/01/16 11:46:22 (pid:10314) ** condor_startd (CONDOR_STARTD) STARTING UP
04/01/16 11:46:22 (pid:10314) ** /home/boinc/CMSRun/glide_GJSlZn/main/condor/sbin/condor_startd
04/01/16 11:46:22 (pid:10314) ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
04/01/16 11:46:22 (pid:10314) ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
04/01/16 11:46:22 (pid:10314) ** $CondorVersion: 8.2.3 Sep 30 2014 BuildID: 274619 $
04/01/16 11:46:22 (pid:10314) ** $CondorPlatform: x86_64_RedHat5 $
04/01/16 11:46:22 (pid:10314) ** PID = 10314
04/01/16 11:46:22 (pid:10314) ** Log last touched time unavailable (No such file or directory)
04/01/16 11:46:22 (pid:10314) ******************************************************
04/01/16 11:46:22 (pid:10314) Using config source: /home/boinc/CMSRun/glide_GJSlZn/condor_config
04/01/16 11:46:22 (pid:10314) config Macros = 213, Sorted = 213, StringBytes = 10679, TablesBytes = 7708
04/01/16 11:46:22 (pid:10314) CLASSAD_CACHING is ENABLED
04/01/16 11:46:22 (pid:10314) Daemon Log is logging: D_ALWAYS D_ERROR D_JOB
04/01/16 11:46:22 (pid:10314) DaemonCore: command socket at <10.0.2.15:34546?noUDP>
04/01/16 11:46:22 (pid:10314) DaemonCore: private command socket at <10.0.2.15:34546>
04/01/16 11:46:22 (pid:10314) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9621 as ccbid 130.246.180.120:9621#152909
04/01/16 11:46:22 (pid:10314) my_popenv failed
04/01/16 11:46:22 (pid:10314) Failed to run hibernation plugin '/home/boinc/CMSRun/glide_GJSlZn/main/condor/libexec/condor_power_state ad'
04/01/16 11:46:22 (pid:10314) VM-gahp server reported an internal error
04/01/16 11:46:22 (pid:10314) VM universe will be tested to check if it is available
04/01/16 11:46:22 (pid:10314) History file rotation is enabled.
04/01/16 11:46:22 (pid:10314) Maximum history file size is: 20971520 bytes
04/01/16 11:46:22 (pid:10314) Number of rotated history files is: 2
04/01/16 11:46:22 (pid:10314) Allocating auto shares for slot type 1: Cpus: 1.000000, Memory: auto, Swap: auto, Disk: auto
slot type 1: Cpus: 1.000000, Memory: 2002, Swap: 100.00%, Disk: 100.00%
04/01/16 11:46:22 (pid:10314) New machine resource of type 1 allocated
04/01/16 11:46:22 (pid:10314) Setting up slot pairings
04/01/16 11:46:22 (pid:10314) my_popenv failed
04/01/16 11:46:22 (pid:10314) Adding 'mips' to the Supplimental ClassAd list
04/01/16 11:46:22 (pid:10314) CronJobList: Adding job 'mips'
04/01/16 11:46:22 (pid:10314) Adding 'kflops' to the Supplimental ClassAd list
04/01/16 11:46:22 (pid:10314) CronJobList: Adding job 'kflops'
04/01/16 11:46:22 (pid:10314) CronJob: Initializing job 'mips' (/home/boinc/CMSRun/glide_GJSlZn/main/condor/libexec/condor_mips)
04/01/16 11:46:22 (pid:10314) CronJob: Initializing job 'kflops' (/home/boinc/CMSRun/glide_GJSlZn/main/condor/libexec/condor_kflops)
04/01/16 11:46:22 (pid:10314) State change: IS_OWNER is false
04/01/16 11:46:22 (pid:10314) Changing state: Owner -> Unclaimed
04/01/16 11:46:22 (pid:10314) State change: RunBenchmarks is TRUE
04/01/16 11:46:22 (pid:10314) Changing activity: Idle -> Benchmarking
04/01/16 11:46:22 (pid:10314) BenchMgr:StartBenchmarks()
04/01/16 11:46:40 (pid:10314) State change: benchmarks completed
04/01/16 11:46:40 (pid:10314) Changing activity: Benchmarking -> Idle
04/01/16 11:52:17 (pid:10314) No resources have been claimed for 30 seconds
04/01/16 11:52:17 (pid:10314) Shutting down Condor on this machine.
04/01/16 11:52:17 (pid:10314) Got SIGTERM. Performing graceful shutdown.
04/01/16 11:52:17 (pid:10314) shutdown graceful
04/01/16 11:52:17 (pid:10314) Cron: Killing all jobs
04/01/16 11:52:17 (pid:10314) Cron: Killing all jobs
04/01/16 11:52:17 (pid:10314) Killing job mips
04/01/16 11:52:17 (pid:10314) Killing job kflops
04/01/16 11:52:17 (pid:10314) Deleting cron job manager
04/01/16 11:52:17 (pid:10314) Cron: Killing all jobs
04/01/16 11:52:17 (pid:10314) Cron: Killing all jobs
04/01/16 11:52:17 (pid:10314) CronJobList: Deleting all jobs
04/01/16 11:52:17 (pid:10314) Cron: Killing all jobs
04/01/16 11:52:17 (pid:10314) CronJobList: Deleting all jobs
04/01/16 11:52:17 (pid:10314) Deleting benchmark job mgr
04/01/16 11:52:17 (pid:10314) Cron: Killing all jobs
04/01/16 11:52:17 (pid:10314) Killing job mips
04/01/16 11:52:17 (pid:10314) Killing job kflops
04/01/16 11:52:17 (pid:10314) Cron: Killing all jobs
04/01/16 11:52:17 (pid:10314) Killing job mips
04/01/16 11:52:17 (pid:10314) Killing job kflops
04/01/16 11:52:17 (pid:10314) CronJobList: Deleting all jobs
04/01/16 11:52:17 (pid:10314) CronJobList: Deleting job 'mips'
04/01/16 11:52:17 (pid:10314) CronJob: Deleting job 'mips' (/home/boinc/CMSRun/glide_GJSlZn/main/condor/libexec/condor_mips), timer -1
04/01/16 11:52:17 (pid:10314) CronJobList: Deleting job 'kflops'
04/01/16 11:52:17 (pid:10314) CronJob: Deleting job 'kflops' (/home/boinc/CMSRun/glide_GJSlZn/main/condor/libexec/condor_kflops), timer -1
04/01/16 11:52:17 (pid:10314) Cron: Killing all jobs
04/01/16 11:52:17 (pid:10314) CronJobList: Deleting all jobs
04/01/16 11:52:17 (pid:10314) All resources are free, exiting.
04/01/16 11:52:17 (pid:10314) **** condor_startd (condor_STARTD) pid 10314 EXITING WITH STATUS 0

StarterLog:
04/01/16 11:46:22 (pid:10318) FILETRANSFER: "/home/boinc/CMSRun/glide_GJSlZn/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
04/01/16 11:46:22 (pid:10318) FILETRANSFER: failed to add plugin "/home/boinc/CMSRun/glide_GJSlZn/main/condor/libexec/curl_plugin" because: FILETRANSFER:1:"/home/boinc/CMSRun/glide_GJSlZn/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring
04/01/16 11:46:22 (pid:10318) WARNING: Initializing plugins returned: FILETRANSFER:1:"/home/boinc/CMSRun/glide_GJSlZn/main/condor/libexec/curl_plugin -classad" did not produce any output, ignoring

MasterLog:
04/01/16 11:46:21 (pid:10311) ******************************************************
04/01/16 11:46:21 (pid:10311) ** condor_master (CONDOR_MASTER) STARTING UP
04/01/16 11:46:21 (pid:10311) ** /home/boinc/CMSRun/glide_GJSlZn/main/condor/sbin/condor_master
04/01/16 11:46:21 (pid:10311) ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
04/01/16 11:46:21 (pid:10311) ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
04/01/16 11:46:21 (pid:10311) ** $CondorVersion: 8.2.3 Sep 30 2014 BuildID: 274619 $
04/01/16 11:46:21 (pid:10311) ** $CondorPlatform: x86_64_RedHat5 $
04/01/16 11:46:21 (pid:10311) ** PID = 10311
04/01/16 11:46:21 (pid:10311) ** Log last touched time unavailable (No such file or directory)
04/01/16 11:46:21 (pid:10311) ******************************************************
04/01/16 11:46:21 (pid:10311) Using config source: /home/boinc/CMSRun/glide_GJSlZn/condor_config
04/01/16 11:46:21 (pid:10311) config Macros = 212, Sorted = 212, StringBytes = 10635, TablesBytes = 7672
04/01/16 11:46:21 (pid:10311) CLASSAD_CACHING is OFF
04/01/16 11:46:21 (pid:10311) Daemon Log is logging: D_ALWAYS D_ERROR
04/01/16 11:46:21 (pid:10311) DaemonCore: command socket at <10.0.2.15:32779?noUDP>
04/01/16 11:46:21 (pid:10311) DaemonCore: private command socket at <10.0.2.15:32779>
04/01/16 11:46:22 (pid:10311) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9621 as ccbid 130.246.180.120:9621#152908
04/01/16 11:46:22 (pid:10311) Master restart (GRACEFUL) is watching /home/boinc/CMSRun/glide_GJSlZn/main/condor/sbin/condor_master (mtime:1459503969)
04/01/16 11:46:22 (pid:10311) Started DaemonCore process "/home/boinc/CMSRun/glide_GJSlZn/main/condor/sbin/condor_startd", pid and pgroup = 10314
04/01/16 11:52:17 (pid:10311) Got SIGTERM. Performing graceful shutdown.
04/01/16 11:52:17 (pid:10311) condor_write(): Socket closed when trying to write 409 bytes to collector lcggwms02.gridpp.rl.ac.uk:9621, fd is 10
04/01/16 11:52:17 (pid:10311) Buf::write(): condor_write() failed
04/01/16 11:52:17 (pid:10311) Sent SIGTERM to STARTD (pid 10314)
04/01/16 11:52:17 (pid:10311) AllReaper unexpectedly called on pid 10314, status 0.
04/01/16 11:52:17 (pid:10311) The STARTD (pid 10314) exited with status 0
04/01/16 11:52:17 (pid:10311) All daemons are gone. Exiting.
04/01/16 11:52:17 (pid:10311) **** condor_master (condor_MASTER) pid 10311 EXITING WITH STATUS 0 |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
There has not been any work for CMS tasks in over a week. Tasks are supposed to shut down when there is no work. The fact that this one was running for over 4 hours is concerning and needs to be looked at. You can see whether any work is available here: http://dashb-cms-job-task.cern.ch/dashboard/templates/task-analysis/#user=ivan+reid&refresh=0&table=Mains&p=1&records=25&activemenu=1&pattern=&task=&from=&till=&timerange=lastMonth |
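For anyone who wants to check many run-N directories without reading each log by hand, here is a rough Python sketch that summarises the text of one startd log. It is only an illustration, not a project tool: the function names are made up, the timestamp is assumed to be month/day/year as in the log quoted in this thread, and it assumes that a slot which actually receives work logs a "Changing state: ... -> Claimed" line (no such line appears in the log above, so that part is a guess).

```python
import re
from datetime import datetime

# Matches entries such as "04/01/16 11:46:22 (pid:10314) some message".
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(pid:(\d+)\) (.*)$")
# Matches "Changing state: Owner -> Unclaimed" (but not "Changing activity").
STATE_RE = re.compile(r"Changing state: (\S+) -> (\S+)")


def parse_entries(text):
    """Return (timestamp, pid, message) tuples; lines without a timestamp are skipped."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        # Assumption: dates are month/day/two-digit year.
        stamp = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
        entries.append((stamp, int(match.group(2)), match.group(3)))
    return entries


def summarize_startd_log(text):
    """Summarise one startd log: lifetime, states reached, idle shutdown or not."""
    entries = parse_entries(text)
    states = []
    idle_shutdown = False
    for _stamp, _pid, message in entries:
        state = STATE_RE.search(message)
        if state:
            states.append(state.group(2))
        if "No resources have been claimed" in message:
            idle_shutdown = True
    lifetime = 0.0
    if entries:
        lifetime = (entries[-1][0] - entries[0][0]).total_seconds()
    return {
        "entries": len(entries),
        "lifetime_seconds": lifetime,
        "states": states,
        # Assumption: a slot that got work logs a transition into "Claimed".
        "claimed": "Claimed" in states,
        "idle_shutdown": idle_shutdown,
    }
```

Fed the startd log from the first post, this should report an idle shutdown with no claim, which matches the explanation that no work was available. If every run from run-1 to run-30 looks like that, the task is just cycling through empty glideins.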