Message boards : CMS Application : New Refactored Version (47.01)
Author | Message |
---|---|
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
Are you sure that was in the CMS app? "theory-pilot" seems an unlikely name... But maybe Laurence is re-using existing scripts. Looking at it on a CERN machine, it's a 65-line shell script, the last lines of which are: pid_file="/var/run/condor/condor_master.pid" /usr/sbin/condor_master -f -pidfile ${pid_file} & sleep 5 ln /var/log/condor/MasterLog /var/www/html/logs/MasterLog ln /var/log/condor/StartLog /var/www/html/logs/StartLog ln /var/log/condor/StarterLog /var/www/html/logs/StarterLog if [ -e ${pid_file} ]; then wait $(cat ${pid_file}) fi return_value=$? if [ ${return_value} -eq 99 ]; then message="Normal DAEMON_SHUTDOWN encountered" 1>&2 else message="Condor exited with ${return_value}" fi shut_down 0 "${message}" So it looks like it's trying to get and report the end status of a Condor process. Something must have been garbled, resulting in a malformed string. |
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 27 |
Are you sure that was in the CMS app? "theory-pilot" seems an unlikely name... But maybe Laurence is re-using existing scripts. The last CMS job was 5395. (A 2nd VM seems to be idling at the end.) The only logs available are MasterLog, stderr.log and stdout.log. Earlier today there were more logs present. Ghost purge? I'll kill those 2 VMs now. Last lines from stdout.log: INFO Event triggered: BOTH http_plugin TRANSFER:EXIT file:///var/lib/condor/execute/dir_7429/step1.root => davs://data-bridge-test.cern.ch/myfed/cms-boinc/output/user/ireid/CMS_at_Home/CRAB3_MinBias/160419_185901/0005/step1_5395.root INFO Event triggered: DESTINATION http_plugin CHECKSUM:ENTER INFO Event triggered: DESTINATION http_plugin CHECKSUM:EXIT DEBUG <- Gfal::Transfer::FileCopy DEBUG gridftp session cache garbage collection ... 2016-04-21T21:25:50 : Command exited with status: 0 <----- Stageout implementation log finish Setting storage_site = 'T3_CH_Volunteer', direct_stageout = True for file step1.root in job report. <----- Thu Apr 21 19:25:50 2016: Finished remote stageout of step1.root (status 0). ====== Thu Apr 21 19:25:50 2016: Finished remote stageout of user output files (status 0). Will not inject transfer requests to ASO for the user output files, because they were staged out directly to the permanent storage. ====== Thu Apr 21 19:25:50 2016: cmscp.py FINISHING (status 0). ======== Stageout at Thu Apr 21 19:25:50 GMT 2016 FINISHING (short status 0) ======== ======== gWMS-CMSRunAnalysis.sh FINISHING at Thu Apr 21 19:25:50 GMT 2016 on 38-37-22634 with (short) status 0 ======== Local time: Thu Apr 21 21:25:50 CEST 2016 Short exit status: 0 Cannot find condor_chirp to set up the job exit code (ignoring, that's for monitoring purposes) Job Running time in seconds: 7583 |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
It says: "cms application starting". I checked: Theory tasks say "Theory application starting". So, it is a CMS application. |
Joined: 12 Sep 14 Posts: 1072 Credit: 335,251 RAC: 30 |
Sorry for the confusion. Yes, it was first written for Theory, but then we also used it for CMS. I have just renamed it 'instant-glidein'. The issue is that shut_down is a function which is defined earlier in the script and was copied from somewhere else. The script runs under /bin/sh rather than /bin/bash, and the way functions are defined differs slightly. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
All starting tasks are failing after about 100 sec. 2016-04-22 09:52:34 (2380): Guest Log: [ERROR] Condor exited with 0 |
Joined: 2 Jul 15 Posts: 15 Credit: 140,962 RAC: 0 |
All starting tasks are failing after about 100 sec. Same error here |
Joined: 12 Sep 14 Posts: 1072 Credit: 335,251 RAC: 30 |
Am investigating ... |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Same with Theory tasks. |
Joined: 12 Sep 14 Posts: 1072 Credit: 335,251 RAC: 30 |
The fix has been pushed and will be available soon. |
Joined: 9 Apr 15 Posts: 57 Credit: 230,221 RAC: 0 |
Am investigating ... It is a surprise to wake up and discover that 100+ jobs have gone by... |
Joined: 2 Jul 15 Posts: 15 Credit: 140,962 RAC: 0 |
The error seems to be fixed. |
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
Running OK here, too. Time to start event 1; 11 mins. But no F5 output. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
Running OK here, too. Time to start event 1; 11 mins. Yes, I noticed that too. I'm about to call it a day; I seem to be having trouble with Condor variables that are supposed to be case-insensitive, but somehow aren't... Laurence is an hour ahead of me. You'll probably see both of us active again sometime tomorrow. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
Figured out the magic Condor incantation to get the -beta project rolling again. -dev here still has a lot of jobs to get through, so I probably won't have to bother about harmonising the two until next working week. |
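
For readers following the /bin/sh versus /bin/bash point in the thread, here is a minimal sketch of the two ideas discussed: a function written in the POSIX `name() { ...; }` form (the `function name { ...; }` form is a bash/ksh extension that a strict /bin/sh may reject), and capturing a background child's exit status with `wait`. This is not the actual script: `describe_exit` is a hypothetical helper standing in for the real `shut_down` function, whose body is not shown in the thread, and a trivial subshell stands in for `condor_master`.

```shell
#!/bin/sh
# POSIX-portable function definition: name() { ...; }
# describe_exit is a hypothetical helper, not the shut_down function
# from the thread; it only maps an exit status to a message.
describe_exit() {
    if [ "$1" -eq 99 ]; then
        echo "Normal DAEMON_SHUTDOWN encountered"
    else
        echo "Condor exited with $1"
    fi
}

# Stand-in for the daemon (assumption): a background subshell
# that exits with status 99.
( exit 99 ) &
child_pid=$!

# wait returns the exit status of the child it waited for.
# The "|| return_value=$?" form also works when "set -e" is active.
return_value=0
wait "$child_pid" || return_value=$?

message=$(describe_exit "$return_value")
echo "$message"
```

Note that `wait` can only report the status of a process that is a child of the current shell, which is why the sketch waits on `$!` directly.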