Message boards : Number crunching : Current issues
Author | Message |
---|---|
Joined: 20 Jan 15 Posts: 1138 Credit: 8,132,416 RAC: 876 |
Thanks, Ivan, I saw it. I think we just need to be patient; there may be operational reasons they can't do it at the moment. I know the WMAgent submissions were still not working the last time they tried, and I think they are only going to vLHC, not here. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
There appears to be a problem. Neither vLHC CMS jobs nor CMS jobs are running. I started a new task and it keeps running glidein again and again (run-1, run-2, etc.):

=== Condor starting Sun Feb 14 14:43:29 CET 2016 (1455457409) ===
=== Condor started in background, now waiting on process 7628 ===
=== Condor ended Sun Feb 14 14:49:35 CET 2016 (1455457775) after 366 ===

Condor appears to be starting, but then exits straight away, as if there were no jobs available.

STARTD LOG:

02/14/16 14:58:22 (pid:16669) ******************************************************
02/14/16 14:58:22 (pid:16669) ** condor_startd (CONDOR_STARTD) STARTING UP
02/14/16 14:58:22 (pid:16669) ** /home/boinc/CMSRun/glide_E0i2vG/main/condor/sbin/condor_startd
02/14/16 14:58:22 (pid:16669) ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
02/14/16 14:58:22 (pid:16669) ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
02/14/16 14:58:22 (pid:16669) ** $CondorVersion: 8.2.3 Sep 30 2014 BuildID: 274619 $
02/14/16 14:58:22 (pid:16669) ** $CondorPlatform: x86_64_RedHat5 $
02/14/16 14:58:22 (pid:16669) ** PID = 16669
02/14/16 14:58:22 (pid:16669) ** Log last touched time unavailable (No such file or directory)
02/14/16 14:58:22 (pid:16669) ******************************************************
02/14/16 14:58:22 (pid:16669) Using config source: /home/boinc/CMSRun/glide_E0i2vG/condor_config
02/14/16 14:58:22 (pid:16669) config Macros = 211, Sorted = 211, StringBytes = 10616, TablesBytes = 7636
02/14/16 14:58:22 (pid:16669) CLASSAD_CACHING is ENABLED
02/14/16 14:58:22 (pid:16669) Daemon Log is logging: D_ALWAYS D_ERROR D_JOB
02/14/16 14:58:22 (pid:16669) DaemonCore: command socket at <10.0.2.15:49109?noUDP>
02/14/16 14:58:22 (pid:16669) DaemonCore: private command socket at <10.0.2.15:49109>
02/14/16 14:58:23 (pid:16669) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9619 as ccbid 130.246.180.120:9619#109149
02/14/16 14:58:23 (pid:16669) my_popenv failed
02/14/16 14:58:23 (pid:16669) Failed to run hibernation plugin '/home/boinc/CMSRun/glide_E0i2vG/main/condor/libexec/condor_power_state ad'
02/14/16 14:58:23 (pid:16669) VM-gahp server reported an internal error
02/14/16 14:58:23 (pid:16669) VM universe will be tested to check if it is available
02/14/16 14:58:23 (pid:16669) History file rotation is enabled.
02/14/16 14:58:23 (pid:16669) Maximum history file size is: 20971520 bytes
02/14/16 14:58:23 (pid:16669) Number of rotated history files is: 2
02/14/16 14:58:23 (pid:16669) Allocating auto shares for slot type 1: Cpus: 1.000000, Memory: auto, Swap: auto, Disk: auto
slot type 1: Cpus: 1.000000, Memory: 2048, Swap: 100.00%, Disk: 100.00%
02/14/16 14:58:23 (pid:16669) New machine resource of type 1 allocated
02/14/16 14:58:23 (pid:16669) Setting up slot pairings
02/14/16 14:58:23 (pid:16669) my_popenv failed
02/14/16 14:58:23 (pid:16669) Adding 'mips' to the Supplimental ClassAd list
02/14/16 14:58:23 (pid:16669) CronJobList: Adding job 'mips'
02/14/16 14:58:23 (pid:16669) Adding 'kflops' to the Supplimental ClassAd list
02/14/16 14:58:23 (pid:16669) CronJobList: Adding job 'kflops'
02/14/16 14:58:23 (pid:16669) CronJob: Initializing job 'mips' (/home/boinc/CMSRun/glide_E0i2vG/main/condor/libexec/condor_mips)
02/14/16 14:58:23 (pid:16669) CronJob: Initializing job 'kflops' (/home/boinc/CMSRun/glide_E0i2vG/main/condor/libexec/condor_kflops)
02/14/16 14:58:23 (pid:16669) State change: IS_OWNER is false
02/14/16 14:58:23 (pid:16669) Changing state: Owner -> Unclaimed
02/14/16 14:58:23 (pid:16669) State change: RunBenchmarks is TRUE
02/14/16 14:58:23 (pid:16669) Changing activity: Idle -> Benchmarking
02/14/16 14:58:23 (pid:16669) BenchMgr:StartBenchmarks()
02/14/16 14:58:49 (pid:16669) State change: benchmarks completed
02/14/16 14:58:49 (pid:16669) Changing activity: Benchmarking -> Idle
02/14/16 15:04:16 (pid:16669) No resources have been claimed for 30 seconds
02/14/16 15:04:16 (pid:16669) Shutting down Condor on this machine.

EDIT: Is there a limit on how many users can be connected? About 300 jobs are currently running. |
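For anyone who wants to check their own StartLog for the same pattern, here is a rough Python sketch that measures how long the slot sat idle before the "No resources have been claimed" shutdown. The line format in the regex is only inferred from the lines quoted in the post above, and the function names (`parse_startd_log`, `idle_before_shutdown`) are made up for illustration; this is not a Condor tool.

```python
import re
from datetime import datetime

# Assumed line shape, inferred from the quoted log: "MM/DD/YY HH:MM:SS (pid:N) message"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(pid:(\d+)\) (.*)$")


def parse_startd_log(text):
    """Return (timestamp, pid, message) tuples; lines that do not match are skipped."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
            entries.append((stamp, int(match.group(2)), match.group(3)))
    return entries


def idle_before_shutdown(entries):
    """Seconds between the last switch to Idle and the unclaimed-shutdown message.

    Returns None if the log never reports the shutdown message.
    """
    idle_since = None
    for stamp, _pid, message in entries:
        if message.startswith("Changing activity:") and message.endswith("-> Idle"):
            idle_since = stamp
        elif message.startswith("No resources have been claimed") and idle_since is not None:
            return (stamp - idle_since).total_seconds()
    return None


# A few lines copied from the log in the post above.
SAMPLE = """\
02/14/16 14:58:23 (pid:16669) Changing state: Owner -> Unclaimed
02/14/16 14:58:23 (pid:16669) Changing activity: Idle -> Benchmarking
02/14/16 14:58:49 (pid:16669) Changing activity: Benchmarking -> Idle
02/14/16 15:04:16 (pid:16669) No resources have been claimed for 30 seconds
02/14/16 15:04:16 (pid:16669) Shutting down Condor on this machine.
"""

if __name__ == "__main__":
    print(idle_before_shutdown(parse_startd_log(SAMPLE)))
```

On the sample lines this reports the gap between 14:58:49 and 15:04:16, during which the slot was idle and never claimed by a job.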