Message boards : Number crunching : Job information in Task report

Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,946,836
RAC: 2,933
Message 1970 - Posted: 12 Feb 2016, 16:31:25 UTC

From now on you should see some job information, including the exit status, in the task reports on your account pages. E.g. in http://lhcathome2.cern.ch/vLHCathome/result.php?resultid=5428296 there is:

2016-02-12 14:07:07 (15624): Guest Log: ======== gWMS-CMSRunAnalysis.sh STARTING at Fri Feb 12 07:12:20 GMT 2016 on 32157-79553-26164 ========
2016-02-12 14:07:07 (15624): Guest Log: Local time : Fri Feb 12 07:12:20 GMT 2016
2016-02-12 14:07:07 (15624): Guest Log: Current system : Linux 32157-79553-26164 3.10.64-85.cernvm.x86_64 #1 SMP Fri Jan 9 09:53:29 CET 2015 x86_64 x86_64 x86_64 GNU/Linux
2016-02-12 14:07:07 (15624): Guest Log: .....
2016-02-12 14:07:07 (15624): Guest Log: ====== Fri Feb 12 08:29:34 2016: Finished remote stageout of user output files (status 0).
2016-02-12 14:07:07 (15624): Guest Log: Will not inject transfer requests to ASO for the user output files, because they were staged out directly to the permanent storage.
2016-02-12 14:07:07 (15624): Guest Log: ====== Fri Feb 12 08:29:34 2016: cmscp.py FINISHING (status 0).
2016-02-12 14:07:07 (15624): Guest Log: ======== Stageout at Fri Feb 12 08:29:38 GMT 2016 FINISHING (short status 0) ========
2016-02-12 14:07:07 (15624): Guest Log: ======== gWMS-CMSRunAnalysis.sh FINISHING at Fri Feb 12 08:29:38 GMT 2016 on 32157-79553-26164 with (short) status 0 ========
2016-02-12 14:07:07 (15624): Guest Log: Local time: Fri Feb 12 08:29:38 GMT 2016
2016-02-12 14:07:07 (15624): Guest Log: Short exit status: 0
2016-02-12 14:07:07 (15624): Guest Log: Job Running time in seconds: 4638


i.e. the first three and last eight lines from the _condor_stdout file. We hope this will go some way towards providing the information you have been asking for.
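For anyone wanting to reproduce such an extract from a _condor_stdout file locally, the selection can be sketched as a small shell function. This is a minimal sketch, not the project's actual script: only the "head -3" / "tail -8" choice comes from this thread, and the function name `extract_job_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the project's script): print the first three and
# the last eight lines of a job's _condor_stdout file.
extract_job_summary() {
  local stdout_file="$1"
  # In the example above: STARTING banner, local time, system line.
  head -n 3 "$stdout_file"
  # In the example above: stage-out messages, exit status, running time.
  tail -n 8 "$stdout_file"
}
```

For example, `extract_job_summary ./run-4/glide_iCLO8R/dir_11492/_condor_stdout`. A file shorter than eleven lines will have some lines printed twice, since head and tail then overlap.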
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,946,836
RAC: 2,933
Message 1991 - Posted: 13 Feb 2016, 16:21:09 UTC - in response to Message 1970.  

Feedback?
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1992 - Posted: 13 Feb 2016, 17:09:03 UTC
Last modified: 13 Feb 2016, 17:22:33 UTC

Hi Ivan,
Generally it is very good: all the information a volunteer might want.

However, I would remove the following lines per job:

    ======== gWMS-CMSRunAnalysis.sh STARTING at Sat Feb 13 01:04:13 GMT 2016 on 277-617-13516 ========

    Current system : Linux 277-617-13516 3.10.64-85.cernvm.x86_64 #1 SMP Fri Jan 9 09:53:29 CET 2015 x86_64 x86_64 x86_64 GNU/Linux



I also think mentioning the exit status of a job once is enough.
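If the trimming were done on the extract itself, the two lines above could be filtered out with grep. A minimal sketch, assuming the extract arrives on standard input; the function name `trim_job_summary` is hypothetical, and the patterns are fixed strings taken from the quoted lines.

```shell
#!/usr/bin/env bash
# Hypothetical filter: drop the STARTING banner and the "Current system" line
# from a job extract read on stdin. -F treats the patterns as fixed strings,
# -v keeps only the lines that match neither pattern.
trim_job_summary() {
  # grep exits 1 when every line is filtered out; that is not an error here.
  grep -v -F \
    -e 'gWMS-CMSRunAnalysis.sh STARTING' \
    -e 'Current system :' || true
}
```

Usage would be along the lines of `trim_job_summary < extract.txt`, where extract.txt is a saved extract.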

EDIT: We should also agree on one time system.
Having local time and GMT (UTC) mixed all over is not good.
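One way to put everything on a single time system is to normalise the guest's timestamps to UTC before they are reported. A minimal sketch, assuming GNU coreutils `date`, whose `-d` option parses strings in the format the job log prints (e.g. "Fri Feb 12 07:12:20 GMT 2016"); the function names `to_utc` and `elapsed_seconds` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU coreutils date, whose -d option parses free-form
# date strings such as "Fri Feb 12 07:12:20 GMT 2016".

# Re-print a timestamp string as UTC in a sortable form.
to_utc() {
  date -u -d "$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# Print the number of seconds between two timestamp strings.
elapsed_seconds() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( end - start ))
}
```

With the start and finish times from Ivan's first example, `elapsed_seconds 'Fri Feb 12 07:12:20 GMT 2016' 'Fri Feb 12 08:29:38 GMT 2016'` prints 4638, which agrees with the reported "Job Running time in seconds: 4638".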

I'll have to wait for a job to fail to see what an error looks like.

Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,946,836
RAC: 2,933
Message 1998 - Posted: 13 Feb 2016, 22:48:59 UTC - in response to Message 1992.  

Thanks for the comment. We're trying to keep it simple, hence just using "head -3" and "tail -8". Now that we have it working, refinements are, as Laurence said to me, just a SMOP (small matter of programming). :-)
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,116
Message 2006 - Posted: 15 Feb 2016, 10:23:10 UTC

Hello Ivan,

Towards the end of a task (one running longer than 24 hours), the info is not always consistent.
On the console I saw that the run had ended, and also the INFO message "Time exceeded. Shutting down!", but neither that message nor the extracted lines from the jobs of the last run were in the stderr.

My last result:

2016-02-14 20:32:37 (5716): Status Report: Elapsed Time: '96666.741228'
2016-02-14 20:32:37 (5716): Status Report: CPU Time: '87505.734531'
2016-02-14 21:55:38 (5716): VM Completion File Detected.
2016-02-14 21:55:38 (5716): Powering off VM.


and a correct result from another cruncher:

2016-02-14 09:08:01 (70714): Guest Log: [INFO] CMS glidein Run 8 ended
2016-02-14 09:08:01 (70714): Guest Log: Log extracts for Run 8 jobs
.
. Job extracts
.
2016-02-14 09:09:02 (70714): Guest Log: [INFO] Time exceeded. Shutting down!
2016-02-14 09:09:02 (70714): VM Completion File Detected.
2016-02-14 09:09:02 (70714): Powering off VM.


The line "Guest Log: [INFO] CMS glidein Run XX ended" was also missing from my result, which is probably why the other info is missing too. XX should be 17.
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2007 - Posted: 15 Feb 2016, 11:22:07 UTC

It would also be good to include the job number, as one could look it up in dashboard.
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2019 - Posted: 16 Feb 2016, 18:40:09 UTC
Last modified: 16 Feb 2016, 18:44:20 UTC

I have noticed that the info for the last run is not listed.
EDIT: So, everything past the 24h mark is missing.
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 87
Message 2021 - Posted: 16 Feb 2016, 19:18:48 UTC - in response to Message 2019.  

Something doesn't make sense. In your task I see:

2016-02-15 10:44:09 (4340): Guest Log: [INFO] Starting CMS Application - Run 4

But then I see:

2016-02-15 13:41:04 (4340): Guest Log: [INFO] CMS glidein Run 1 ended

After that message it should immediately print "Log extracts for Run 1 jobs", but it doesn't. Why is your version "Anonymous platform (CPU)"?


http://boincai05.cern.ch/CMS-dev/result.php?resultid=114497
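Laurence's point, that every "CMS glidein Run N ended" line should be followed immediately by "Log extracts for Run N jobs", lends itself to a quick check of a saved task stderr. A minimal sketch using POSIX awk; the function name `check_run_extracts` and its messages are hypothetical, and the two marker strings are copied from the logs in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical check: for each "CMS glidein Run N ended" line in a saved task
# stderr, require the very next line to contain "Log extracts for Run N jobs".
# Prints one message per problem and returns 1 if any were found.
check_run_extracts() {
  awk '
    BEGIN { bad = 0 }
    # The previous line was an end marker: test this line against it.
    pending != "" {
      if ($0 !~ ("Log extracts for Run " pending " jobs")) {
        print "Run " pending ": no log extracts after end marker"
        bad = 1
      }
      pending = ""
    }
    # Remember the run number N from "Run N ended" ("Run " is 4 chars,
    # " ended" is 6, so the number is RLENGTH - 10 chars long).
    /CMS glidein Run [0-9]+ ended/ {
      match($0, /Run [0-9]+ ended/)
      pending = substr($0, RSTART + 4, RLENGTH - 10)
    }
    END {
      if (pending != "") {
        print "Run " pending ": stderr ends after end marker"
        bad = 1
      }
      exit bad
    }
  ' "$1"
}
```

For the task linked above, this would be expected to flag Run 1. It cannot catch Crystal Pellet's case, though, where the end marker itself never appears in the stderr.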
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2023 - Posted: 16 Feb 2016, 19:38:23 UTC - in response to Message 2021.  

Because I have an app_info.xml to test 2-core operation.
This only helps a little, as it only reduces the Linux overhead a bit.
In this case it is set to single core.
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,946,836
RAC: 2,933
Message 2024 - Posted: 16 Feb 2016, 19:56:20 UTC - in response to Message 2007.  

It would also be good to include the job number, as one could look it up in dashboard.

We could do that if we included the next line from the head of _condor_stdout, though it's quite a long line.
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 87
Message 2026 - Posted: 16 Feb 2016, 20:18:37 UTC - in response to Message 2024.  

or this?
grep ^jobNumber ./run-4/glide_iCLO8R/dir_11492/_condor_stdout
jobNumber: 5519
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2027 - Posted: 16 Feb 2016, 20:19:15 UTC

or:
Output files: step1.root=step1_9573.root
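The two suggestions could be combined: use the "jobNumber:" line Laurence greps for, and fall back to the number in the output file name. A minimal sketch; the function name `job_number` is hypothetical, and treating the suffix of step1_9573.root as the job number is an assumption, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a job number found in a _condor_stdout file.
job_number() {
  local f="$1" n
  # First choice: a line such as "jobNumber: 5519" (print field 2, stop).
  n=$(awk '/^jobNumber:/ { print $2; exit }' "$f")
  if [ -z "$n" ]; then
    # ASSUMPTION: in "Output files: step1.root=step1_9573.root" the numeric
    # suffix of the file name is the job number.
    n=$(sed -n 's/.*Output files: .*_\([0-9][0-9]*\)\.root$/\1/p' "$f" | head -n 1)
  fi
  # Print nothing and return non-zero if neither line was found.
  [ -n "$n" ] && printf '%s\n' "$n"
}
```

For the file in Laurence's example, `job_number ./run-4/glide_iCLO8R/dir_11492/_condor_stdout` would print 5519.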
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 87
Message 2028 - Posted: 16 Feb 2016, 20:21:46 UTC - in response to Message 2026.  
Last modified: 16 Feb 2016, 20:21:57 UTC

Update should be in CVMFS in a few hours :)



©2024 CERN