Dashboard again
Author | Message |
---|---|
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 36 |
I see in the current batch (ireid:crab:CMS:at:Home:MinBias:250ev10Kl) several finished jobs with SiteName 'unknown'. I know that I ran the job with ID 6217, so SiteName should be T3_CH_Volunteer, and the other info should have been added to the job's detail information. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
If a job is "abandoned" it retains the details of the first use. It is then re-assigned to someone else, but the details (IP address, for example) are not updated. |
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 36 |
The same with Job 6216 (me too) and Job 6243 (not me). The jobs were sent for the first time to me, and several Job info fields are unknown (incl. IP). Also, the start is "-" and, although the status is finished, the wall time is 00:00:00. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
Yes, you got the only try for 6216 and 6217 as far as my logs go. I don't know how many times the server connects to Dashboard with info, but obviously some get lost. The JSON file for 6217 is:

{"task": null, "skippedFiles": [], "phedex_node": "T3_CH_Volunteer", "exitMsg": "OK", "executed_site": "T3_CH_Volunteer", "exitAcronym": "OK", "fallbackFiles": [], "postjob": {"exitMsg": "", "exitCode": 0}, "steps": {"cmsRun": {"status": 1, "errors": [], "logs": {}, "parameters": {}, "performance": {"multicore": {}, "storage": {"writeTotalMB": 78.738, "readPercentageOps": 1.0, "readAveragekB": 543.59905635648749, "readTotalMB": 2430.27, "readNumOps": 4578.0, "readCachePercentageOps": 0.0, "readMBSec": 0.62409670112941251, "readMaxMSec": 102.10299999999999, "readTotalSecs": 0, "writeTotalSecs": 196339.0}, "cpu": {"TotalJobCPU": "6013.63", "AvgEventCPU": "23.5348", "MaxEventCPU": "112.745", "AvgEventTime": "24.7244", "MinEventCPU": "0.006", "TotalEventCPU": "5883.69", "TotalJobTime": "6181.1", "MinEventTime": "0.00633287", "MaxEventTime": "115.273"}, "memory": {"PeakValueRss": "831.477", "PeakValueVsize": "1187.82"}}, "stop": null, "site": {}, "analysis": {}, "start": null, "cleanup": {}, "input": {}, "output": {"FEVTDEBUGoutput": [{"runs": {"1": [18649, 18650, 18651]}, "guid": "A2A45DF4-4D13-E611-B286-080027BFF040", "ouput_module_class": "PoolOutputModule", "direct_stageout": true, "branch_hash": "8e72aba58af78268e2527cecd474b04d", "pset_hash": "a06ed74de6d8882ee0dc49bc90fa2715", "lfn": "", "pfn": "step1.root", "catalog": "", "module_label": "FEVTDEBUGoutput", "checksums": {"adler32": "2c65fb17", "cksum": "2230350667"}, "storage_site": "T3_CH_Volunteer", "events": 250, "size": 82562652}], "analysis": []}}}, "jobExitCode": 0, "exitCode": 0}

Looking for Dashboard info in the log file:

Local time : Fri May 6 07:45:05 2016

Dashboard early startup params: {'MonitorID': '160501_095735:ireid_crab_CMS_at_Home_MinBias_250ev10Kl', 'MonitorJobID': '6217_https://glidein.cern.ch/6217/160501:095735:ireid:crab:CMS:at:Home:MinBias:250ev10Kl_0', 'SyncCE': 'boinc.cern.ch', 'OverflowFlag': 0, 'SyncSite': 'T3_CH_Volunteer', 'SyncGridJobId': 'https://glidein.cern.ch/6217/160501:095735:ireid:crab:CMS:at:Home:MinBias:250ev10Kl', 'WNHostName': 'localhost.localdomain'}

==== WMCore filesystem preparation FINISHING at Fri May 6 05:45:07 2016 ====

Dashboard startup parameters: {'MonitorID': '160501_095735:ireid_crab_CMS_at_Home_MinBias_250ev10Kl', 'MonitorJobID': '6217_https://glidein.cern.ch/6217/160501:095735:ireid:crab:CMS:at:Home:MinBias:250ev10Kl_0', 'WNHostName': 'localhost.localdomain', 'ExeStart': 'cmsRun'}

Dashboard end parameters: {'MonitorID': '160501_095735:ireid_crab_CMS_at_Home_MinBias_250ev10Kl', 'CRABUserReadMB': 2430.27, 'MonitorJobID': '6217_https://glidein.cern.ch/6217/160501:095735:ireid:crab:CMS:at:Home:MinBias:250ev10Kl_0', 'CrabCpuPercentage': 0.97292185730464331, 'CRABUserWriteMB': 78.738, 'CrabUserCpuTime': 6013.6300000000001, 'NEventsProcessed': 0, 'ExeTime': 6181, 'JobExitCode': 0, 'CRABUserPeakRss': 831.47699999999998, 'ExeExitCode': 0}

Not sending data to popularity service because no input sources found.

Dashboard popularity report: {'Basename': '', 'inputFiles': '', 'BasenameParent': '', 'inputBlocks': 'MCFakeBlock', 'parentFiles': ''}

==== Report file creation FINISHING at Fri May 6 07:30:06 2016 ==== |
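For anyone wanting to cross-check a job report like the one for 6217 against the Dashboard log lines, here is a minimal Python sketch. It works on a trimmed copy of the report quoted above (most keys omitted). The function name `summarise` and the choice of fields are my own, and the idea that `CrabCpuPercentage` is simply `CrabUserCpuTime` divided by `ExeTime` is only inferred from the numbers in the log, not taken from any WMCore or CRAB documentation.

```python
import json

# Trimmed excerpt of the job report for 6217 quoted above (most keys omitted).
REPORT = """
{"executed_site": "T3_CH_Volunteer",
 "exitCode": 0,
 "steps": {"cmsRun": {"start": null, "stop": null,
   "performance": {"cpu": {"TotalJobCPU": "6013.63", "TotalJobTime": "6181.1"}},
   "output": {"FEVTDEBUGoutput": [{"events": 250}]}}}}
"""


def summarise(report_text):
    """Pull out the fields relevant to the Dashboard discussion (hypothetical helper)."""
    report = json.loads(report_text)
    step = report["steps"]["cmsRun"]
    cpu = step["performance"]["cpu"]
    return {
        "site": report["executed_site"],
        "exit_code": report["exitCode"],
        "start_missing": step["start"] is None,
        "stop_missing": step["stop"] is None,
        "events_written": sum(f["events"] for f in step["output"]["FEVTDEBUGoutput"]),
        "cpu_seconds": float(cpu["TotalJobCPU"]),
    }


summary = summarise(REPORT)

# Values copied from the "Dashboard end parameters" line of the log.
EXE_TIME = 6181
LOGGED_CPU_PERCENTAGE = 0.97292185730464331

# Assumption: the logged ratio is CPU seconds divided by ExeTime.
cpu_ratio = summary["cpu_seconds"] / EXE_TIME

print(summary)
print(cpu_ratio, abs(cpu_ratio - LOGGED_CPU_PERCENTAGE) < 1e-9)
```

On this excerpt the report itself names the site and an output file with 250 events, while the step's `start` and `stop` are null. Whether those nulls have anything to do with the "-" start time shown in Dashboard is a guess on my part.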
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 36 |
I can't find in Dashboard a job from Federica Fanzago I just got: wmagent_fanzago_TESTFF_RALCMSHOME_181016-T3_CH_VolunteerBackfill_161020_191735_8115

There is no output to running.log, but cmsRun has been using a full core for 20 minutes now. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
> I can't find in Dashboard a job from Federica Fanzago I just got: wmagent_fanzago_TESTFF_RALCMSHOME_181016-T3_CH_VolunteerBackfill_161020_191735_8115

Neither can Federica! She has just managed to get WMAgent submission running, but there are probably still bugs -- I see strange messages in StartLog suggesting that there are attempts to access her submission machine rather than our Condor server. I've a query out to the experts.

These jobs don't appear to register successfully with Dashboard, tho' we used to see Hassen's jobs in Interactive View; I didn't see Federica's when I looked this morning. IIRC, I discussed this with Dashboard people, they knew about it, but it wasn't high up their TODO list. Output files are appearing on the DataBridge, as are log files, so some jobs must be running successfully. Note these jobs don't show in the jobs-in-progress chart either, which is why that is taking a dip. |
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 36 |
Thanks, Ivan, for looking into it. It's nice to know that other scientists also try to use the capacity of the Tier3 volunteers ;) An easier (working) way to find the batches we (BOINC) are involved in would be helpful. |
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 36 |
Faster than light at CERN. Started before submitting time. That's fast:

Jobid: 312938f2-c68f-11e6-bebc-02163e018309-132_0

Site: T3_CH_Volunteer

Grid Status: SUCCEEDED

Submitted: 2016-12-20T09:02:27

Started: 1970-01-01T00:00:00

Finished: 2016-12-20T13:39:13

Task: wmagent_ireid_MonteCarlo_eff_IDR_CMS_Home_161219_233044_8212 |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
Yes, we get some of those. 1970-01-01T00:00:00 is, of course, the epoch base in Linux (and many other operating systems); times are measured in seconds from then. Ergo, that's what an unfilled timestamp of 0.0 gets reported as. |
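To see the effect for yourself, here is a small Python sketch of how a timestamp of 0.0 turns into the "Started" value shown above. The helper names `format_timestamp` and `looks_unset` are made up for illustration, and nothing here is taken from the Dashboard code; it only assumes the times are stored as seconds since the epoch and rendered in UTC.

```python
from datetime import datetime, timezone


def format_timestamp(seconds):
    """Render seconds since the epoch as an ISO-style UTC string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


def looks_unset(timestamp_text):
    """Hypothetical check: treat the epoch itself as 'never filled in'."""
    return timestamp_text == format_timestamp(0.0)


started = format_timestamp(0.0)
print(started)  # 1970-01-01T00:00:00
print(looks_unset(started), looks_unset("2016-12-20T09:02:27"))
```

A check like `looks_unset` is one way a display could show "-" instead of a 1970 date, though whether Dashboard does anything of the sort is not something this thread establishes.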
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 36 |
> Yes, we get some of those. 1970-01-01T00:00:00 is, of course, the epoch base in Linux (and many other operating systems); times are measured in seconds from then. Ergo, that's what an unfilled timestamp of 0.0 gets reported as.

Yeah, I knew. A lot of credit to Linus Torvalds, but it is the UNIX epoch time stamp ;) |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
Oops. YKWIM! |