Message boards : CMS Application : Dashboard again

Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,202
RAC: 2,083
Message 3282 - Posted: 6 May 2016, 8:33:00 UTC

In the current batch (ireid:crab:CMS:at:Home:MinBias:250ev10Kl) I see several finished jobs with SiteName 'unknown'.
I know that I ran the job with ID 6217, so its SiteName should be T3_CH_Volunteer, and the other fields in the job's detail information should be filled in as well.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3283 - Posted: 6 May 2016, 8:36:23 UTC - in response to Message 3282.  
Last modified: 6 May 2016, 8:36:44 UTC

If a job is "abandoned", it retains the details of its first use.
It is then re-assigned to someone else, but the details (the IP address, for example) are not updated.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,202
RAC: 2,083
Message 3284 - Posted: 6 May 2016, 8:52:27 UTC

The same with Job 6216 (me too) and Job 6243 (not me).

The jobs were sent to me first, and several fields of the job info are unknown (including the IP). Also, the start is "-" and, although the status is finished, the wall time is 00:00:00.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 3285 - Posted: 6 May 2016, 14:35:55 UTC - in response to Message 3284.  

Yes, you got the only try for 6216 and 6217 as far as my logs go. I don't know how many times the server connects to Dashboard with info, but obviously some of those updates get lost.

The json file for 6217 is:
{"task": null, "skippedFiles": [], "phedex_node": "T3_CH_Volunteer", "exitMsg": "OK", "executed_site": "T3_CH_Volunteer",
"exitAcronym": "OK", "fallbackFiles": [], "postjob": {"exitMsg": "", "exitCode": 0},
"steps": {"cmsRun": {"status": 1, "errors": [], "logs": {}, "parameters": {}, "performance": {"multicore": {}, "storage": {"writeTotalMB": 78.738, "readPercentageOps": 1.0, "readAveragekB": 543.59905635648749, "readTotalMB": 2430.27, "readNumOps": 4578.0, "readCachePercentageOps": 0.0, "readMBSec": 0.62409670112941251, "readMaxMSec": 102.10299999999999, "readTotalSecs": 0, "writeTotalSecs": 196339.0},
"cpu": {"TotalJobCPU": "6013.63", "AvgEventCPU": "23.5348", "MaxEventCPU": "112.745", "AvgEventTime": "24.7244", "MinEventCPU": "0.006", "TotalEventCPU": "5883.69", "TotalJobTime": "6181.1", "MinEventTime": "0.00633287", "MaxEventTime": "115.273"},
"memory": {"PeakValueRss": "831.477", "PeakValueVsize": "1187.82"}}, "stop": null, "site": {}, "analysis": {}, "start": null, "cleanup": {}, "input": {},
"output": {"FEVTDEBUGoutput": [{"runs": {"1": [18649, 18650, 18651]}, "guid": "A2A45DF4-4D13-E611-B286-080027BFF040", "ouput_module_class": "PoolOutputModule", "direct_stageout": true, "branch_hash": "8e72aba58af78268e2527cecd474b04d", "pset_hash": "a06ed74de6d8882ee0dc49bc90fa2715", "lfn": "", "pfn": "step1.root", "catalog": "", "module_label": "FEVTDEBUGoutput",
"checksums": {"adler32": "2c65fb17", "cksum": "2230350667"}, "storage_site": "T3_CH_Volunteer", "events": 250, "size": 82562652}], "analysis": []}}}, "jobExitCode": 0, "exitCode": 0}


Looking for Dashboard info in the log file:
Local time : Fri May 6 07:45:05 2016
Dashboard early startup params: {'MonitorID': '160501_095735:ireid_crab_CMS_at_Home_MinBias_250ev10Kl', 'MonitorJobID': '6217_https://glidein.cern.ch/6217/160501:095735:ireid:crab:CMS:at:Home:MinBias:250ev10Kl_0', 'SyncCE': 'boinc.cern.ch', 'OverflowFlag': 0, 'SyncSite': 'T3_CH_Volunteer', 'SyncGridJobId': 'https://glidein.cern.ch/6217/160501:095735:ireid:crab:CMS:at:Home:MinBias:250ev10Kl', 'WNHostName': 'localhost.localdomain'}

==== WMCore filesystem preparation FINISHING at Fri May 6 05:45:07 2016 ====
Dashboard startup parameters: {'MonitorID': '160501_095735:ireid_crab_CMS_at_Home_MinBias_250ev10Kl', 'MonitorJobID': '6217_https://glidein.cern.ch/6217/160501:095735:ireid:crab:CMS:at:Home:MinBias:250ev10Kl_0', 'WNHostName': 'localhost.localdomain', 'ExeStart': 'cmsRun'}

Dashboard end parameters: {'MonitorID': '160501_095735:ireid_crab_CMS_at_Home_MinBias_250ev10Kl', 'CRABUserReadMB': 2430.27, 'MonitorJobID': '6217_https://glidein.cern.ch/6217/160501:095735:ireid:crab:CMS:at:Home:MinBias:250ev10Kl_0', 'CrabCpuPercentage': 0.97292185730464331, 'CRABUserWriteMB': 78.738, 'CrabUserCpuTime': 6013.6300000000001, 'NEventsProcessed': 0, 'ExeTime': 6181, 'JobExitCode': 0, 'CRABUserPeakRss': 831.47699999999998, 'ExeExitCode': 0}
Not sending data to popularity service because no input sources found.
Dashboard popularity report: {'Basename': '', 'inputFiles': '', 'BasenameParent': '', 'inputBlocks': 'MCFakeBlock', 'parentFiles': ''}
==== Report file creation FINISHING at Fri May 6 07:30:06 2016 ====
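
As a rough cross-check of the two excerpts above, here is a minimal Python sketch that pulls out the fields relevant to the missing Dashboard details. It is only an illustration: the names `summarize`, `REPORT_JSON` and `END_PARAMS` are made up for this example, and the two strings are trimmed copies of the json file and the "Dashboard end parameters" line for job 6217, not the full records.

```python
import ast
import json

# Trimmed copy of the job 6217 report quoted above (most fields omitted).
REPORT_JSON = """
{"executed_site": "T3_CH_Volunteer", "exitCode": 0,
 "steps": {"cmsRun": {"start": null, "stop": null,
   "output": {"FEVTDEBUGoutput": [{"events": 250}]}}}}
"""

# Trimmed copy of the "Dashboard end parameters" line from the log above.
END_PARAMS = (
    "{'CrabCpuPercentage': 0.97292185730464331, "
    "'CrabUserCpuTime': 6013.6300000000001, "
    "'NEventsProcessed': 0, 'ExeTime': 6181, 'JobExitCode': 0}"
)


def summarize(report_text, end_params_text):
    """Collect the fields that matter for the missing Dashboard details."""
    report = json.loads(report_text)
    step = report["steps"]["cmsRun"]
    # The log prints a Python dict literal, so literal_eval can read it safely.
    params = ast.literal_eval(end_params_text)
    # Sum the event counts over every output file listed in the report.
    events_written = sum(
        f["events"] for files in step["output"].values() for f in files
    )
    return {
        "site": report["executed_site"],
        "start_missing": step["start"] is None,
        "events_written": events_written,
        "events_reported": params["NEventsProcessed"],
        "cpu_fraction": params["CrabUserCpuTime"] / params["ExeTime"],
    }


print(summarize(REPORT_JSON, END_PARAMS))
```

With these excerpts the summary shows the site as T3_CH_Volunteer, a start time that was never filled in, 250 events written against 0 reported in the end parameters, and a CPU fraction that agrees with the logged CrabCpuPercentage.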

Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,202
RAC: 2,083
Message 4212 - Posted: 21 Oct 2016, 11:12:44 UTC

I can't find in Dashboard a job from Federica Fanzago that I just got: wmagent_fanzago_TESTFF_RALCMSHOME_181016-T3_CH_VolunteerBackfill_161020_191735_8115

No output to running.log, but cmsRun has been using a full core for 20 minutes now.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 4213 - Posted: 21 Oct 2016, 11:32:49 UTC - in response to Message 4212.  
Last modified: 21 Oct 2016, 11:34:27 UTC

Crystal Pellet wrote:
I can't find in Dashboard a job from Federica Fanzago that I just got: wmagent_fanzago_TESTFF_RALCMSHOME_181016-T3_CH_VolunteerBackfill_161020_191735_8115

No output to running.log, but cmsRun has been using a full core for 20 minutes now.

Neither can Federica! She has just managed to get WMAgent submission running, but there are probably still bugs -- I see strange messages in StartLog suggesting that there are attempts to access her submission machine rather than our Condor server. I've a query out to the experts.
These jobs don't appear to register successfully with Dashboard, though we used to see Hassen's jobs in Interactive View; I didn't see Federica's when I looked this morning. IIRC, I discussed this with the Dashboard people; they knew about it, but it wasn't high up their TODO list.
Output files are appearing on the DataBridge, as are log files, so some jobs must be running successfully. Note that these jobs don't show in the jobs-in-progress chart either, which is why that chart is taking a dip.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,202
RAC: 2,083
Message 4214 - Posted: 21 Oct 2016, 12:05:54 UTC

Thanks Ivan for looking into it.

It's nice to know that other scientists are also trying to use the capacity of the Tier3 volunteers ;)

An easier (working) way to find the batches we (BOINC) are involved in would be helpful.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,202
RAC: 2,083
Message 4553 - Posted: 20 Dec 2016, 15:26:27 UTC

Faster than light at CERN. Started before the submission time. That's fast:

Jobid:         312938f2-c68f-11e6-bebc-02163e018309-132_0
Site:          T3_CH_Volunteer
Grid Status:   SUCCEEDED
Submitted:     2016-12-20T09:02:27
Started:       1970-01-01T00:00:00
Finished:      2016-12-20T13:39:13
Task:          wmagent_ireid_MonteCarlo_eff_IDR_CMS_Home_161219_233044_8212
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 4554 - Posted: 20 Dec 2016, 16:05:19 UTC - in response to Message 4553.  

Yes, we get some of those. 1970-01-01T00:00:00 is, of course, the epoch base in Linux (and many other operating systems); times are measured in seconds from then. Ergo, that's what an unfilled timestamp of 0.0 gets reported as.
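
A minimal Python sketch of the effect, not taken from the Dashboard code: formatting a timestamp of 0.0 as UTC gives exactly the string shown in the Started field. The helper names `format_timestamp` and `format_or_placeholder` are hypothetical, and the guard in the second one is just one possible way a display layer could avoid showing the epoch for an unset value.

```python
from datetime import datetime, timezone


def format_timestamp(seconds):
    """Render seconds since the Unix epoch as an ISO-style UTC string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


def format_or_placeholder(seconds):
    """Hypothetical guard: show '-' when the timestamp was never filled in."""
    if not seconds:  # covers both None and 0 / 0.0
        return "-"
    return format_timestamp(seconds)


print(format_timestamp(0.0))       # 1970-01-01T00:00:00
print(format_or_placeholder(0.0))  # -
```

The guard treats 0 as "not recorded", which is safe here only because no real job can have started at the epoch itself.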
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,202
RAC: 2,083
Message 4557 - Posted: 20 Dec 2016, 16:16:19 UTC - in response to Message 4554.  

ivan wrote:
Yes, we get some of those. 1970-01-01T00:00:00 is, of course, the epoch base in Linux (and many other operating systems); times are measured in seconds from then. Ergo, that's what an unfilled timestamp of 0.0 gets reported as.

Yeah, I knew.

A lot of credit to Linus Torvalds, but it is the UNIX epoch timestamp ;)
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 4560 - Posted: 20 Dec 2016, 16:24:26 UTC - in response to Message 4557.  

Oops. YKWIM!



©2024 CERN