Message boards : CMS Application : New Refactored Version (47.01)

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,944,443
RAC: 3,276
Message 2915 - Posted: 21 Apr 2016, 21:00:52 UTC - in response to Message 2913.  

Are you sure that was in the CMS app? "theory-pilot" seems an unlikely name... But maybe Laurence is re-using existing scripts.
Looking at it on a CERN machine, it's a 65-line shell script, the last lines of which are:
pid_file="/var/run/condor/condor_master.pid"

/usr/sbin/condor_master -f -pidfile ${pid_file} &
sleep 5
ln /var/log/condor/MasterLog /var/www/html/logs/MasterLog
ln /var/log/condor/StartLog /var/www/html/logs/StartLog
ln /var/log/condor/StarterLog /var/www/html/logs/StarterLog

if [ -e ${pid_file} ]; then
    wait $(cat ${pid_file})
fi

return_value=$?

if [ ${return_value} -eq 99 ]; then
    message="Normal DAEMON_SHUTDOWN encountered" 1>&2
else
    message="Condor exited with ${return_value}"
fi

shut_down 0 "${message}"


So it looks like it's trying to get and report the end status of a Condor process. Something must have been garbled, resulting in a malformed string.
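
To make the idea concrete, here is a minimal sketch of that pattern in plain sh. It is only an illustration: the background subshell is a hypothetical stand-in for condor_master, and shut_down is left out because its definition is not shown here.

```shell
#!/bin/sh
# Hypothetical stand-in for condor_master: a background job that exits with 99.
( exit 99 ) &
child_pid=$!

# wait returns the exit status of the child it waited for.
# Capturing it via "||" keeps the sketch safe even under "set -e".
return_value=0
wait "${child_pid}" || return_value=$?

if [ "${return_value}" -eq 99 ]; then
    message="Normal DAEMON_SHUTDOWN encountered"
else
    message="Condor exited with ${return_value}"
fi

echo "${message}"
```

With the stand-in exiting 99, this takes the first branch; any other status, including 0, falls through to the "Condor exited with" message.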
ID: 2915 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 2916 - Posted: 21 Apr 2016, 21:14:16 UTC - in response to Message 2915.  
Last modified: 21 Apr 2016, 21:23:10 UTC

Are you sure that was in the CMS app? "theory-pilot" seems an unlikely name... But maybe Laurence is re-using existing scripts.

The last CMS job was 5395. (A 2nd VM seems to have the same idling end.)

The only logs available are MasterLog, stderr.log and stdout.log.
Earlier today there were more logs present. Ghost purge?
I'll kill those 2 VMs now. Last lines from stdout.log:

INFO Event triggered: BOTH http_plugin TRANSFER:EXIT file:///var/lib/condor/execute/dir_7429/step1.root => davs://data-bridge-test.cern.ch/myfed/cms-boinc/output/user/ireid/CMS_at_Home/CRAB3_MinBias/160419_185901/0005/step1_5395.root
INFO Event triggered: DESTINATION http_plugin CHECKSUM:ENTER
INFO Event triggered: DESTINATION http_plugin CHECKSUM:EXIT
DEBUG <- Gfal::Transfer::FileCopy
DEBUG gridftp session cache garbage collection ...
2016-04-21T21:25:50 : Command exited with status: 0

<----- Stageout implementation log finish
Setting storage_site = 'T3_CH_Volunteer', direct_stageout = True for file step1.root in job report.
<----- Thu Apr 21 19:25:50 2016: Finished remote stageout of step1.root (status 0).
====== Thu Apr 21 19:25:50 2016: Finished remote stageout of user output files (status 0).
Will not inject transfer requests to ASO for the user output files, because they were staged out directly to the permanent storage.
====== Thu Apr 21 19:25:50 2016: cmscp.py FINISHING (status 0).
======== Stageout at Thu Apr 21 19:25:50 GMT 2016 FINISHING (short status 0) ========
======== gWMS-CMSRunAnalysis.sh FINISHING at Thu Apr 21 19:25:50 GMT 2016 on 38-37-22634 with (short) status 0 ========
Local time: Thu Apr 21 21:25:50 CEST 2016
Short exit status: 0
Cannot find condor_chirp to set up the job exit code (ignoring, that's for monitoring purposes)
Job Running time in seconds: 7583
ID: 2916 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2917 - Posted: 21 Apr 2016, 21:17:22 UTC

It says: cms application starting.

I checked: Theory tasks say "Theory application starting".

So, it is a CMS application.
ID: 2917 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 107
Message 2919 - Posted: 21 Apr 2016, 21:40:46 UTC - in response to Message 2915.  
Last modified: 21 Apr 2016, 21:41:16 UTC

Sorry for the confusion. Yes, it was first written for Theory, but then we also used it for CMS. I have just renamed it 'instant-glidein'.

The issue is that shut_down is a function that is defined earlier in the script and was copied from somewhere else. The script starts with /bin/sh rather than /bin/bash, and the way functions are defined differs slightly between the two.
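
As a rough illustration of the kind of difference meant here (which difference actually applied is not stated in this thread, so treat this as an assumption): the `name() { ...; }` form is the one POSIX sh specifies, whereas the `function name { ...; }` form is a bash/ksh extension that stricter shells such as dash reject. The function body below is hypothetical, since the real shut_down is not shown.

```shell
#!/bin/sh
# Portable POSIX form: name() { ...; }
# The "function name { ...; }" spelling is a bash/ksh extension and is
# not accepted by strict POSIX shells such as dash.
shut_down() {
    # Hypothetical body: the real shut_down is not shown in this thread.
    exit_code="$1"
    message="$2"
    echo "[INFO] ${message} (exit code ${exit_code})"
}

# Call it the same way the quoted script does.
output=$(shut_down 0 "Normal DAEMON_SHUTDOWN encountered")
echo "${output}"
```

Sticking to the POSIX form means the script behaves the same whichever shell /bin/sh happens to point at.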
ID: 2919 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2925 - Posted: 22 Apr 2016, 7:59:45 UTC
Last modified: 22 Apr 2016, 8:04:49 UTC

All starting tasks are failing after about 100 sec.

2016-04-22 09:52:34 (2380): Guest Log: [ERROR] Condor exited with 0
ID: 2925 · Rating: 0
Keiken

Joined: 2 Jul 15
Posts: 15
Credit: 140,962
RAC: 0
Message 2934 - Posted: 22 Apr 2016, 12:55:12 UTC - in response to Message 2925.  

All starting tasks are failing after about 100 sec.

2016-04-22 09:52:34 (2380): Guest Log: [ERROR] Condor exited with 0


Same error here
ID: 2934 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 107
Message 2935 - Posted: 22 Apr 2016, 13:10:43 UTC - in response to Message 2934.  

Am investigating ...
ID: 2935 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2936 - Posted: 22 Apr 2016, 13:26:03 UTC - in response to Message 2935.  

Same with Theory tasks.
ID: 2936 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 107
Message 2937 - Posted: 22 Apr 2016, 13:51:46 UTC - in response to Message 2935.  

The fix has been pushed and will be available soon.
ID: 2937 · Rating: 0
Phil

Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 2938 - Posted: 22 Apr 2016, 13:54:51 UTC - in response to Message 2935.  

Am investigating ...

It is a surprise to wake up and discover that 100+ jobs have gone by...
ID: 2938 · Rating: 0
Keiken

Joined: 2 Jul 15
Posts: 15
Credit: 140,962
RAC: 0
Message 2939 - Posted: 22 Apr 2016, 14:03:46 UTC

The error seems to be fixed.
ID: 2939 · Rating: 0
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 90
Message 2949 - Posted: 22 Apr 2016, 17:48:43 UTC

Running OK here, too. Time to start event 1; 11 mins.

But no F5 output.
ID: 2949 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,944,443
RAC: 3,276
Message 2964 - Posted: 22 Apr 2016, 22:45:54 UTC - in response to Message 2949.  

Running OK here, too. Time to start event 1; 11 mins.

But no F5 output.

Yes, I noticed that too. I'm about to call it a day; I seem to be having trouble with Condor variables that are supposed to be case-insensitive, but somehow aren't... Laurence is an hour ahead of me. You'll probably see both of us active again sometime tomorrow.
ID: 2964 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,944,443
RAC: 3,276
Message 2973 - Posted: 23 Apr 2016, 17:13:50 UTC

Figured out the magic Condor incantation to get the -beta project rolling again. -dev here still has a lot of jobs to get through, so I probably won't have to bother about harmonising the two until the next working week.
ID: 2973 · Rating: 0