Thread 'Dashboard again'

Author	Message
Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1269 Credit: 1,028,011 RAC: 29	Message 3282 - Posted: 6 May 2016, 8:33:00 UTC In the current batch (ireid:crab:CMS:at:Home:MinBias:250ev10Kl) I see several finished jobs with SiteName 'unknown'. I know that I ran the job with ID 6217, so its SiteName should be T3_CH_Volunteer, and the other info should have been added to the job's detail information.

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 3283 - Posted: 6 May 2016, 8:36:23 UTC - in response to Message 3282. Last modified: 6 May 2016, 8:36:44 UTC If a job is "abandoned", it retains the details of its first use. It is then re-assigned to someone else, but the details (the IP address, for example) are not updated.

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1269 Credit: 1,028,011 RAC: 29	Message 3284 - Posted: 6 May 2016, 8:52:27 UTC The same goes for Job 6216 (me too) and Job 6243 (not me). The jobs were sent to me for the first time, and the job info for several fields is unknown (incl. IP). Also, the start is "-" and, although the status is finished, the wall time is 00:00:00.

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1155 Credit: 8,388,869 RAC: 155	Message 3285 - Posted: 6 May 2016, 14:35:55 UTC - in response to Message 3284. Yes, you got the only try for 6216 and 6217 as far as my logs go. I don't know how many times the server connects to Dashboard with info, but obviously some get lost. The json file for 6217 is:
{"task": null, "skippedFiles": [], "phedex_node": "T3_CH_Volunteer", "exitMsg": "OK", "executed_site": "T3_CH_Volunteer", "exitAcronym": "OK", "fallbackFiles": [], "postjob": {"exitMsg": "", "exitCode": 0}, "steps": {"cmsRun": {"status": 1, "errors": [], "logs": {}, "parameters": {}, "performance": {"multicore": {}, "storage": {"writeTotalMB": 78.738, "readPercentageOps": 1.0, "readAveragekB": 543.59905635648749, "readTotalMB": 2430.27, "readNumOps": 4578.0, "readCachePercentageOps": 0.0, "readMBSec": 0.62409670112941251, "readMaxMSec": 102.10299999999999, "readTotalSecs": 0, "writeTotalSecs": 196339.0}, "cpu": {"TotalJobCPU": "6013.63", "AvgEventCPU": "23.5348", "MaxEventCPU": "112.745", "AvgEventTime": "24.7244", "MinEventCPU": "0.006", "TotalEventCPU": "5883.69", "TotalJobTime": "6181.1", "MinEventTime": "0.00633287", "MaxEventTime": "115.273"}, "memory": {"PeakValueRss": "831.477", "PeakValueVsize": "1187.82"}}, "stop": null, "site": {}, "analysis": {}, "start": null, "cleanup": {}, "input": {}, "output": {"FEVTDEBUGoutput": [{"runs": {"1": [18649, 18650, 18651]}, "guid": "A2A45DF4-4D13-E611-B286-080027BFF040", "ouput_module_class": "PoolOutputModule", "direct_stageout": true, "branch_hash": "8e72aba58af78268e2527cecd474b04d", "pset_hash": "a06ed74de6d8882ee0dc49bc90fa2715", "lfn": "", "pfn": "step1.root", "catalog": "", "module_label": "FEVTDEBUGoutput", "checksums": {"adler32": "2c65fb17", "cksum": "2230350667"}, "storage_site": "T3_CH_Volunteer", "events": 250, "size": 82562652}], "analysis": []}}}, "jobExitCode": 0, "exitCode": 0}
Looking for Dashboard info in the log file:
Local time : Fri May 6 07:45:05 2016
Dashboard early startup params: {'MonitorID': '160501_095735:ireid_crab_CMS_at_Home_MinBias_250ev10Kl', 'MonitorJobID': '6217_https://glidein.cern.ch/6217/160501:095735:ireid:crab:CMS:at:Home:MinBias:250ev10Kl_0', 'SyncCE': 'boinc.cern.ch', 'OverflowFlag': 0, 'SyncSite': 'T3_CH_Volunteer', 'SyncGridJobId': 'https://glidein.cern.ch/6217/160501:095735:ireid:crab:CMS:at:Home:MinBias:250ev10Kl', 'WNHostName': 'localhost.localdomain'}
==== WMCore filesystem preparation FINISHING at Fri May 6 05:45:07 2016 ====
Dashboard startup parameters: {'MonitorID': '160501_095735:ireid_crab_CMS_at_Home_MinBias_250ev10Kl', 'MonitorJobID': '6217_https://glidein.cern.ch/6217/160501:095735:ireid:crab:CMS:at:Home:MinBias:250ev10Kl_0', 'WNHostName': 'localhost.localdomain', 'ExeStart': 'cmsRun'}
Dashboard end parameters: {'MonitorID': '160501_095735:ireid_crab_CMS_at_Home_MinBias_250ev10Kl', 'CRABUserReadMB': 2430.27, 'MonitorJobID': '6217_https://glidein.cern.ch/6217/160501:095735:ireid:crab:CMS:at:Home:MinBias:250ev10Kl_0', 'CrabCpuPercentage': 0.97292185730464331, 'CRABUserWriteMB': 78.738, 'CrabUserCpuTime': 6013.6300000000001, 'NEventsProcessed': 0, 'ExeTime': 6181, 'JobExitCode': 0, 'CRABUserPeakRss': 831.47699999999998, 'ExeExitCode': 0}
Not sending data to popularity service because no input sources found.
Dashboard popularity report: {'Basename': '', 'inputFiles': '', 'BasenameParent': '', 'inputBlocks': 'MCFakeBlock', 'parentFiles': ''}
==== Report file creation FINISHING at Fri May 6 07:30:06 2016 ====

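A minimal sketch of how the job report for 6217 could be cross-checked against the "Dashboard end parameters" from the same log. The excerpt keeps only a few keys, with values copied from Message 3285; the helper name `summarise` and the variable names are hypothetical and not part of CRAB or WMCore.

```python
import json

# Excerpt of the job report for job 6217. The values are copied from
# Message 3285; most keys are omitted for brevity.
REPORT_EXCERPT = """
{"executed_site": "T3_CH_Volunteer",
 "exitCode": 0,
 "steps": {"cmsRun": {
     "performance": {"cpu": {"TotalJobCPU": "6013.63", "TotalJobTime": "6181.1"}},
     "output": {"FEVTDEBUGoutput": [{"storage_site": "T3_CH_Volunteer", "events": 250}],
                "analysis": []}}}}
"""

# Subset of the "Dashboard end parameters" logged for the same job.
DASHBOARD_END = {
    "CrabUserCpuTime": 6013.63,
    "ExeTime": 6181,
    "NEventsProcessed": 0,
    "CrabCpuPercentage": 0.97292185730464331,
}


def summarise(report):
    """Collect the fields needed for a site/event check (hypothetical helper)."""
    step = report["steps"]["cmsRun"]
    # Sum the event counts over every output module's file list.
    events = sum(f["events"] for files in step["output"].values() for f in files)
    cpu = step["performance"]["cpu"]
    return {
        "site": report["executed_site"],
        "exit_code": report["exitCode"],
        "events": events,
        "job_cpu_s": float(cpu["TotalJobCPU"]),  # stored as a string in the report
    }


summary = summarise(json.loads(REPORT_EXCERPT))
cpu_fraction = DASHBOARD_END["CrabUserCpuTime"] / DASHBOARD_END["ExeTime"]

print(summary)
print("CPU fraction:", cpu_fraction, "logged:", DASHBOARD_END["CrabCpuPercentage"])
print("event counts agree:", summary["events"] == DASHBOARD_END["NEventsProcessed"])
```

With the posted values, the job report itself names T3_CH_Volunteer as the executing site, and the logged CrabCpuPercentage is consistent with CrabUserCpuTime divided by ExeTime. The job report lists 250 events while the Dashboard end parameters carry NEventsProcessed 0; the thread does not establish whether that is related to the 'unknown' fields shown in Dashboard.
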
Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1269 Credit: 1,028,011 RAC: 29	Message 4212 - Posted: 21 Oct 2016, 11:12:44 UTC I can't find in Dashboard a job from Federica Fanzago that I just got: wmagent_fanzago_TESTFF_RALCMSHOME_181016-T3_CH_VolunteerBackfill_161020_191735_8115 There is no output to running.log, but cmsRun has been using a full core for 20 minutes now.

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1155 Credit: 8,388,869 RAC: 155	Message 4213 - Posted: 21 Oct 2016, 11:32:49 UTC - in response to Message 4212. Last modified: 21 Oct 2016, 11:34:27 UTC Crystal Pellet wrote: "I can't find in Dashboard a job from Federica Fanzago that I just got: wmagent_fanzago_TESTFF_RALCMSHOME_181016-T3_CH_VolunteerBackfill_161020_191735_8115 There is no output to running.log, but cmsRun has been using a full core for 20 minutes now." Neither can Federica! She has just managed to get WMAgent submission running, but there are probably still bugs: I see strange messages in StartLog suggesting that there are attempts to access her submission machine rather than our Condor server. I've a query out to the experts. These jobs don't appear to register successfully with Dashboard, tho' we used to see Hassen's jobs in Interactive View; I didn't see Federica's when I looked this morning. IIRC, I discussed this with the Dashboard people; they knew about it, but it wasn't high up their TODO list. Output files are appearing on the DataBridge, as are log files, so some jobs must be running successfully. Note that these jobs don't show in the jobs-in-progress chart either, which is why that chart is taking a dip.

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1269 Credit: 1,028,011 RAC: 29	Message 4214 - Posted: 21 Oct 2016, 12:05:54 UTC Thanks, Ivan, for looking into it. It's nice to know that other scientists are also trying to use the capacity of the Tier3 volunteers ;) An easier (working) way to find the batches that we (BOINC) are involved in would be helpful.

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1269 Credit: 1,028,011 RAC: 29	Message 4553 - Posted: 20 Dec 2016, 15:26:27 UTC than light at CERN. Started before submitting time. That's fast:
[pre]Jobid: 312938f2-c68f-11e6-bebc-02163e018309-132_0
Site: T3_CH_Volunteer
Grid Status: SUCCEEDED
Submitted: 2016-12-20T09:02:27
Started: 1970-01-01T00:00:00
Finished: 2016-12-20T13:39:13
Task: wmagent_ireid_MonteCarlo_eff_IDR_CMS_Home_161219_233044_8212[/pre]

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1155 Credit: 8,388,869 RAC: 155	Message 4554 - Posted: 20 Dec 2016, 16:05:19 UTC - in response to Message 4553. Yes, we get some of those. 1970-01-01T00:00:00 is, of course, the epoch base in Linux (and many other operating systems); times are measured in seconds from then. Ergo, that's what an unfilled timestamp of 0.0 gets reported as.
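
The point about the unfilled timestamp can be illustrated with a short Python sketch. It assumes the "Started" field is stored as seconds since the epoch and rendered in UTC; the function names are hypothetical, and the guard in `format_started` is only one possible way a display could avoid showing 1970 for a value that was never filled in.

```python
from datetime import datetime, timezone


def format_timestamp(seconds):
    """Render seconds since the epoch as an ISO-8601 string in UTC."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


def format_started(seconds):
    """Hypothetical guard: show '-' when the start time was never filled in."""
    if not seconds:  # covers both None and 0.0
        return "-"
    return format_timestamp(seconds)


# An unfilled timestamp of 0.0 is rendered as the epoch itself.
print(format_timestamp(0.0))

# A real time survives the round trip through epoch seconds.
submitted = datetime(2016, 12, 20, 9, 2, 27, tzinfo=timezone.utc).timestamp()
print(format_timestamp(submitted))

# With the guard, the missing value is no longer shown as a date in 1970.
print(format_started(0.0))
```
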

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1269 Credit: 1,028,011 RAC: 29	Message 4557 - Posted: 20 Dec 2016, 16:16:19 UTC - in response to Message 4554. ivan wrote: "Yes, we get some of those. 1970-01-01T00:00:00 is, of course, the epoch base in Linux (and many other operating systems); times are measured in seconds from then. Ergo, that's what an unfilled timestamp of 0.0 gets reported as." Yeah, I knew. A lot of credit to Linus Torvalds, but it is the UNIX epoch timestamp ;)

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1155 Credit: 8,388,869 RAC: 155	Message 4560 - Posted: 20 Dec 2016, 16:24:26 UTC - in response to Message 4557. Oops. YKWIM!

Development for LHC@home