Message boards : Number crunching : Current issues

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,945,813
RAC: 2,949
Message 1953 - Posted: 10 Feb 2016, 21:10:35 UTC - in response to Message 1943.  

Thanks, Ivan, I saw it.

Is there any way to get the vLHC guys to put on CMS tasks?

They have been asked over and over again, with no answer.

I think we just need to be patient; there may be operational reasons they can't do it at the moment. I know the WMAgent submissions were still not working the last time they tried, and I think those submissions are only going to vLHC, not here.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2002 - Posted: 14 Feb 2016, 14:00:11 UTC
Last modified: 14 Feb 2016, 14:10:03 UTC

There appears to be a problem.
Neither vLHC CMS jobs nor CMS jobs are running.
I started a new task and it keeps running the glidein again and again
(run-1, run-2, etc.).

=== Condor starting Sun Feb 14 14:43:29 CET 2016 (1455457409) ===
=== Condor started in background, now waiting on process 7628 ===
=== Condor ended Sun Feb 14 14:49:35 CET 2016 (1455457775) after 366 ===
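The numbers in parentheses on the start and end lines look like Unix epoch timestamps (1455457409 is 13:43:29 UTC, i.e. 14:43:29 CET), and the trailing "after 366" matches their difference in seconds, so each glidein run lasts roughly six minutes. A minimal Python sketch that checks this, assuming that reading of the wrapper format; `condor_runtime` and `EPOCH_RE` are hypothetical names, not part of Condor or glideinWMS:

```python
import re
from datetime import datetime, timezone

# Wrapper lines copied from the post above. The value in parentheses is
# ASSUMED to be a Unix epoch timestamp.
WRAPPER_LOG = """\
=== Condor starting Sun Feb 14 14:43:29 CET 2016 (1455457409) ===
=== Condor started in background, now waiting on process 7628 ===
=== Condor ended Sun Feb 14 14:49:35 CET 2016 (1455457775) after 366 ===
"""

# Hypothetical pattern: matches only the "starting" and "ended" lines
# ("started in background" does not match) and captures the epoch value.
EPOCH_RE = re.compile(r"=== Condor (starting|ended) .*\((\d+)\)")


def condor_runtime(log_text: str) -> int:
    """Return the seconds between the 'starting' and 'ended' wrapper lines."""
    stamps = {}
    for line in log_text.splitlines():
        m = EPOCH_RE.search(line)
        if m:
            stamps[m.group(1)] = int(m.group(2))
    return stamps["ended"] - stamps["starting"]


if __name__ == "__main__":
    print(condor_runtime(WRAPPER_LOG))  # 366
    # Convert the start epoch to UTC to cross-check the CET wall-clock time.
    print(datetime.fromtimestamp(1455457409, tz=timezone.utc).isoformat())
```

Pasting the wrapper output from each run (run-1, run-2, ...) into `condor_runtime` would show whether every restart ends after a similar interval.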

Condor appears to start, but it exits straight away, as if there were no jobs available.

STARTD LOG:

02/14/16 14:58:22 (pid:16669) ******************************************************
02/14/16 14:58:22 (pid:16669) ** condor_startd (CONDOR_STARTD) STARTING UP
02/14/16 14:58:22 (pid:16669) ** /home/boinc/CMSRun/glide_E0i2vG/main/condor/sbin/condor_startd
02/14/16 14:58:22 (pid:16669) ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
02/14/16 14:58:22 (pid:16669) ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
02/14/16 14:58:22 (pid:16669) ** $CondorVersion: 8.2.3 Sep 30 2014 BuildID: 274619 $
02/14/16 14:58:22 (pid:16669) ** $CondorPlatform: x86_64_RedHat5 $
02/14/16 14:58:22 (pid:16669) ** PID = 16669
02/14/16 14:58:22 (pid:16669) ** Log last touched time unavailable (No such file or directory)
02/14/16 14:58:22 (pid:16669) ******************************************************
02/14/16 14:58:22 (pid:16669) Using config source: /home/boinc/CMSRun/glide_E0i2vG/condor_config
02/14/16 14:58:22 (pid:16669) config Macros = 211, Sorted = 211, StringBytes = 10616, TablesBytes = 7636
02/14/16 14:58:22 (pid:16669) CLASSAD_CACHING is ENABLED
02/14/16 14:58:22 (pid:16669) Daemon Log is logging: D_ALWAYS D_ERROR D_JOB
02/14/16 14:58:22 (pid:16669) DaemonCore: command socket at <10.0.2.15:49109?noUDP>
02/14/16 14:58:22 (pid:16669) DaemonCore: private command socket at <10.0.2.15:49109>
02/14/16 14:58:23 (pid:16669) CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9619 as ccbid 130.246.180.120:9619#109149
02/14/16 14:58:23 (pid:16669) my_popenv failed
02/14/16 14:58:23 (pid:16669) Failed to run hibernation plugin '/home/boinc/CMSRun/glide_E0i2vG/main/condor/libexec/condor_power_state ad'
02/14/16 14:58:23 (pid:16669) VM-gahp server reported an internal error
02/14/16 14:58:23 (pid:16669) VM universe will be tested to check if it is available
02/14/16 14:58:23 (pid:16669) History file rotation is enabled.
02/14/16 14:58:23 (pid:16669) Maximum history file size is: 20971520 bytes
02/14/16 14:58:23 (pid:16669) Number of rotated history files is: 2
02/14/16 14:58:23 (pid:16669) Allocating auto shares for slot type 1: Cpus: 1.000000, Memory: auto, Swap: auto, Disk: auto
slot type 1: Cpus: 1.000000, Memory: 2048, Swap: 100.00%, Disk: 100.00%
02/14/16 14:58:23 (pid:16669) New machine resource of type 1 allocated
02/14/16 14:58:23 (pid:16669) Setting up slot pairings
02/14/16 14:58:23 (pid:16669) my_popenv failed
02/14/16 14:58:23 (pid:16669) Adding 'mips' to the Supplimental ClassAd list
02/14/16 14:58:23 (pid:16669) CronJobList: Adding job 'mips'
02/14/16 14:58:23 (pid:16669) Adding 'kflops' to the Supplimental ClassAd list
02/14/16 14:58:23 (pid:16669) CronJobList: Adding job 'kflops'
02/14/16 14:58:23 (pid:16669) CronJob: Initializing job 'mips' (/home/boinc/CMSRun/glide_E0i2vG/main/condor/libexec/condor_mips)
02/14/16 14:58:23 (pid:16669) CronJob: Initializing job 'kflops' (/home/boinc/CMSRun/glide_E0i2vG/main/condor/libexec/condor_kflops)
02/14/16 14:58:23 (pid:16669) State change: IS_OWNER is false
02/14/16 14:58:23 (pid:16669) Changing state: Owner -> Unclaimed
02/14/16 14:58:23 (pid:16669) State change: RunBenchmarks is TRUE
02/14/16 14:58:23 (pid:16669) Changing activity: Idle -> Benchmarking
02/14/16 14:58:23 (pid:16669) BenchMgr:StartBenchmarks()
02/14/16 14:58:49 (pid:16669) State change: benchmarks completed
02/14/16 14:58:49 (pid:16669) Changing activity: Benchmarking -> Idle
02/14/16 15:04:16 (pid:16669) No resources have been claimed for 30 seconds
02/14/16 15:04:16 (pid:16669) Shutting down Condor on this machine.
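In the startd log, the slot changes from Owner to Unclaimed at 14:58:23, runs its benchmarks, goes back to Idle at 14:58:49, and is never claimed before the startd shuts down at 15:04:16. The sketch below measures that interval directly from the log lines. It is a hypothetical helper (`parse_line` and `unclaimed_seconds` are not Condor tools) and assumes the `MM/DD/YY HH:MM:SS` timestamp layout seen in this particular log:

```python
from datetime import datetime

# Four lines copied from the startd log above. The whole log can be pasted
# instead, because lines without a leading timestamp are skipped.
STARTD_EXCERPT = """\
02/14/16 14:58:23 (pid:16669) Changing state: Owner -> Unclaimed
02/14/16 14:58:49 (pid:16669) Changing activity: Benchmarking -> Idle
02/14/16 15:04:16 (pid:16669) No resources have been claimed for 30 seconds
02/14/16 15:04:16 (pid:16669) Shutting down Condor on this machine.
"""

# Timestamp layout as it appears in this log (ASSUMED, not configured here).
TS_FORMAT = "%m/%d/%y %H:%M:%S"


def parse_line(line):
    """Return (datetime, message) for a timestamped line, else None."""
    try:
        stamp = datetime.strptime(line[:17], TS_FORMAT)
    except ValueError:
        return None  # e.g. the wrapped "slot type 1: ..." continuation line
    # Drop the "(pid:NNNN) " tag that follows the timestamp.
    _, _, message = line.partition(") ")
    return stamp, message


def unclaimed_seconds(log_text):
    """Seconds from 'Owner -> Unclaimed' to 'Shutting down Condor'.

    Returns None if either marker line is missing."""
    became_unclaimed = shutdown = None
    for line in log_text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, message = parsed
        if message.startswith("Changing state: Owner -> Unclaimed"):
            became_unclaimed = stamp
        elif message.startswith("Shutting down Condor"):
            shutdown = stamp
    if became_unclaimed is None or shutdown is None:
        return None
    return int((shutdown - became_unclaimed).total_seconds())


if __name__ == "__main__":
    print(unclaimed_seconds(STARTD_EXCERPT))  # 353
```

On this excerpt it reports 353 seconds of the slot sitting Unclaimed, which fits the observation above that nothing was matched to the slot before the startd gave up.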

EDIT: Is there a limit on how many users can be connected?
There are currently about 300 running jobs.


©2024 CERN