cmsRun Fatal Exception
Author | Message |
---|---|
Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 161 |
First job, Run 1 in fresh VM

== CMSSW: Executing CMSSW
== CMSSW: cmsRun -j FrameworkJobReport.xml PSet.py
== CMSSW: ----- Begin Fatal Exception 02-Mar-2016 19:29:23 CET-----------------------
== CMSSW: An exception of category 'Incomplete configuration' occurred while
== CMSSW: [0] Constructing the EventProcessor
== CMSSW: [1] Constructing ESSource: class=PoolDBESSource label='GlobalTag'
== CMSSW: Exception Message:
== CMSSW: Valid site-local-config not found at /cvmfs/cms.cern.ch/SITECONF/local/JobConfig/site-local-config.xml
== CMSSW: ----- End Fatal Exception -------------------------------------------------
== CMSSW: Complete
== CMSSW: process id is 7889 status is 65
======== CMSSW OUTPUT FINSHING ========
ERROR: Caught WMExecutionFailure - code = 65 - name = CmsRunFailure - detail = Error running cmsRun
{'arguments': ['/bin/bash', '/home/boinc/CMSRun/glide_yaapii/execute/dir_7495/cmsRun-main.sh', '', 'slc6_amd64_gcc472', 'scramv1', 'CMSSW', 'CMSSW_6_2_0_SLHC26_patch3', 'FrameworkJobReport.xml', 'cmsRun', 'PSet.py', 'sandbox.tar.gz', '', '']}
Return code: 65
NOTE: FJR has exit code 8001 and WMCore reports 65; preferring the FJR one.
ERROR: Exceptional exit at Wed Mar 2 18:29:23 2016 (8001): CmsRunFailure
CMSSW error message follows.
Fatal Exception
An exception of category 'Incomplete configuration' occurred while
[0] Constructing the EventProcessor
[1] Constructing ESSource: class=PoolDBESSource label='GlobalTag'
Exception Message:
Valid site-local-config not found at /cvmfs/cms.cern.ch/SITECONF/local/JobConfig/site-local-config.xml
ERROR: Traceback follows:
Traceback (most recent call last):
  File "CMSRunAnalysis.py", line 890, in <module>
    cmssw = executeCMSSWStack(opts, scram)
  File "CMSRunAnalysis.py", line 688, in executeCMSSWStack
    cmssw.execute()
  File "/home/boinc/CMSRun/glide_yaapii/execute/dir_7495/WMCore.zip/WMCore/WMSpec/Steps/Executors/CMSSW.py", line 233, in execute
    raise WMExecutionFailure(returncode, "CmsRunFailure", msg)
WMExecutionFailure: CmsRunFailure
Message: Error running cmsRun
{'arguments': ['/bin/bash', '/home/boinc/CMSRun/glide_yaapii/execute/dir_7495/cmsRun-main.sh', '', 'slc6_amd64_gcc472', 'scramv1', 'CMSSW', 'CMSSW_6_2_0_SLHC26_patch3', 'FrameworkJobReport.xml', 'cmsRun', 'PSet.py', 'sandbox.tar.gz', '', '']}
Return code: 65
ModuleName : WMCore.WMSpec.Steps.WMExecutionFailure
MethodName : __init__
ClassInstance : None
FileName : /home/boinc/CMSRun/glide_yaapii/execute/dir_7495/WMCore.zip/WMCore/WMSpec/Steps/WMExecutionFailure.py
ClassName : None
LineNumber : 18
ErrorNr : 65
Traceback:
ERROR: Failed to record execution site name in the FJR from the site-local-config.xml
Traceback (most recent call last):
  File "CMSRunAnalysis.py", line 386, in handleException
    slc = SiteLocalConfig.loadSiteLocalConfig()
  File "/home/boinc/CMSRun/glide_yaapii/execute/dir_7495/WMCore.zip/WMCore/Storage/SiteLocalConfig.py", line 51, in loadSiteLocalConfig
    raise SiteConfigError(msg)
SiteConfigError: Unable to find site local config file: /cvmfs/cms.cern.ch/SITECONF/local/JobConfig/site-local-config.xml
Dashboard end parameters: {'MonitorID': '160226_150549:ireid_crab_CMS_at_Home_MinBias_250evE', 'MonitorJobID': '2936_https://glidein.cern.ch/2936/160226:150549:ireid:crab:CMS:at:Home:MinBias:250evE_0', 'NEventsProcessed': 0, 'JobExitCode': 8001, 'ExeExitCode': 8001}
Not sending data to popularity service because no input sources found.
Dashboard popularity report: {'Basename': '', 'inputFiles': '', 'BasenameParent': '', 'inputBlocks': 'MCFakeBlock', 'parentFiles': ''}
==== Failure sleep STARTING at Wed Mar 2 18:29:24 2016 ====
Sleeping for 1097 seconds due to failure. |
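For anyone comparing logs from several hosts, the useful part is the block between "Begin Fatal Exception" and "End Fatal Exception". Here is a minimal Python sketch that pulls the exception category and message out of such a log. It assumes the log has one entry per line, each with the `== CMSSW:` prefix seen in the output above; `parse_fatal_exception` is a hypothetical helper name, not something from CMSSW or WMCore.

```python
import re

# Excerpt of the log posted above, assumed to be one entry per line.
LOG = """\
== CMSSW: ----- Begin Fatal Exception 02-Mar-2016 19:29:23 CET-----------------------
== CMSSW: An exception of category 'Incomplete configuration' occurred while
== CMSSW: [0] Constructing the EventProcessor
== CMSSW: [1] Constructing ESSource: class=PoolDBESSource label='GlobalTag'
== CMSSW: Exception Message:
== CMSSW: Valid site-local-config not found at /cvmfs/cms.cern.ch/SITECONF/local/JobConfig/site-local-config.xml
== CMSSW: ----- End Fatal Exception -------------------------------------------------
"""


def parse_fatal_exception(log_text):
    """Return (category, message) for the first Fatal Exception block, or None."""
    category = None
    message_lines = []
    in_block = False
    in_message = False
    for raw in log_text.splitlines():
        # Drop the "== CMSSW:" prefix (hypothetical format assumption).
        line = re.sub(r"^== CMSSW:\s*", "", raw).strip()
        if "Begin Fatal Exception" in line:
            in_block = True
            continue
        if not in_block:
            continue
        if "End Fatal Exception" in line:
            break
        match = re.search(r"category '([^']+)'", line)
        if match and category is None:
            category = match.group(1)
            continue
        if line.startswith("Exception Message:"):
            in_message = True
            continue
        if in_message:
            message_lines.append(line)
    if not in_block:
        return None
    return category, " ".join(message_lines)


print(parse_fatal_exception(LOG))
```

Run against a saved job log, this gives one (category, message) pair per failed job, which makes it easier to see whether different hosts are hitting the same missing-file error.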
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
Investigating ... |
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
We removed a temporary fix to try to speed up the boot time but it looks like it may still be needed. Have reverted back and we should see the result in a few hours. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I suggest making a "Change-log" thread. It should list each change made, with the date and time it was applied. That way we could watch for changes in behavior and not be surprised by them. |
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
Done. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks, Laurence. |
Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 161 |
"We removed a temporary fix to try to speed up the boot time but it looks like it may still be needed. Have reverted back and we should see the result in a few hours."
A few hours are over, rebooted the VM and it's working again. Thanks |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,136,075 RAC: 494 |
"We removed a temporary fix to try to speed up the boot time but it looks like it may still be needed. Have reverted back and we should see the result in a few hours."
Sorry, mea culpa, I suggested it. :-{( |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,136,075 RAC: 494 |
"We removed a temporary fix to try to speed up the boot time but it looks like it may still be needed. Have reverted back and we should see the result in a few hours."
Strike (:-) that, I didn't think it through far enough. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,136,075 RAC: 494 |
It's not getting better. Was I right the first time? If it's not picking siteinfo up properly from cvmfs, then it is a per-task problem, and the VM needs rebooting to pick up what we originally kludged into cvmfs to get around the problem. If it is picking it up from cvmfs, and the change should have percolated through by now, why is the failure rate still so high? My Brian hurts! |
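One way to tell the two cases apart on a given host might be to check, from inside the VM, whether the file named in the error message is actually visible. The following is only a rough sketch: the path is taken from the log earlier in this thread, `check_site_local_config` is a hypothetical helper, and "ok" here only means the file exists and is well-formed XML, which is an assumption and may be weaker than whatever CMSSW means by a "valid" site-local-config.

```python
import os
import xml.etree.ElementTree as ET

# Path copied from the error message in the log earlier in the thread.
SITE_LOCAL_CONFIG = "/cvmfs/cms.cern.ch/SITECONF/local/JobConfig/site-local-config.xml"


def check_site_local_config(path=SITE_LOCAL_CONFIG):
    """Return 'missing', 'unreadable', or 'ok' for the given config path."""
    # isfile() follows symlinks, so a dangling link also counts as missing.
    if not os.path.isfile(path):
        return "missing"
    try:
        ET.parse(path)
    except (ET.ParseError, OSError):
        # Present but not parseable as XML, or not readable at all.
        return "unreadable"
    return "ok"


print(check_site_local_config())
```

If this prints "missing" before a reboot and "ok" afterwards on the same host, that would point towards the VM needing a reboot to pick up the change rather than the change not having propagated.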
©2024 CERN