Message boards : CMS Application : Failure to get X509 credential

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 300
Message 2983 - Posted: 24 Apr 2016, 0:41:12 UTC

Hosts here are failing. Nothing to do but set NNW for the rest of the night.
ID: 2983 · Rating: 0
tullio
Joined: 17 Aug 15
Posts: 62
Credit: 296,695
RAC: 0
Message 2984 - Posted: 24 Apr 2016, 0:59:18 UTC

All tasks failing after 7 minutes on the Windows 10 PC. Two still running on this Linux host.
Tullio
ID: 2984 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 328,251
RAC: 171
Message 2985 - Posted: 24 Apr 2016, 5:34:20 UTC - in response to Message 2983.  
Last modified: 24 Apr 2016, 5:35:38 UTC

Thanks. The disk was full on the server. Note that this did not affect the beta app as it uses a different server. Over the next week or so all the services that are needed to support the Theory and CMS apps in the production project will be reviewed to ensure that they are production quality.
ID: 2985 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,874,101
RAC: 103
Message 2991 - Posted: 24 Apr 2016, 10:15:53 UTC - in response to Message 2983.  
Last modified: 24 Apr 2016, 10:32:55 UTC

Sorry, I didn't notice that last night; I was concentrating on the -beta. Is it OK now? My last failure was only 20 minutes ago...

[Edit] Spoke too soon, and now I'm "over quota" too. [/Edit]
ID: 2991 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,874,101
RAC: 103
Message 2992 - Posted: 24 Apr 2016, 12:24:06 UTC

Final (?) hurdle overcome, jobs are starting to run again.
ID: 2992 · Rating: 0
m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 300
Message 2994 - Posted: 24 Apr 2016, 13:22:50 UTC - in response to Message 2992.  
Last modified: 24 Apr 2016, 13:44:15 UTC

Started a host up by hand to check. Running OK now, thanks.
Laurence posted here what each console should eventually show, but the logs seem somewhat confused:

From the web server:-
MasterLog is OK
StartLog is OK
Starter.Log is OK
stderr.log shows cmsRun-stdout.log: No such file or directory.
stdout.log is OK.

No other logs are listed.

From the consoles:-
F1 is OK (boot & init)
F2 no output (should be running.log??)
F3 is OK (top)
F4 not sure, definitely a log of some sort. (Could be wrapper stdout, lots of CMSSW messages)
F5 shows cmsRun.stdout: No such file or directory. (should be wrapper stderr??)
F6 is OK (login)

cmsRun is using 80-90% CPU so presumably it's running OK.
ID: 2994 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,874,101
RAC: 103
Message 2996 - Posted: 24 Apr 2016, 14:08:53 UTC - in response to Message 2994.  

You have a Condor job running on HostID 1033, currently showing 66 mins of "activity time" (whatever that means exactly...).
ID: 2996 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 328,251
RAC: 171
Message 2997 - Posted: 24 Apr 2016, 14:20:39 UTC - in response to Message 2994.  

I have (hopefully) added some messages to the consoles to indicate what should be there even if they are blank.
ID: 2997 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 328,251
RAC: 171
Message 2998 - Posted: 24 Apr 2016, 14:24:31 UTC - in response to Message 2994.  

Will hopefully now find the cmsRun-stdout.log.
ID: 2998 · Rating: 0
m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 300
Message 3000 - Posted: 24 Apr 2016, 15:03:59 UTC - in response to Message 2996.  
Last modified: 24 Apr 2016, 15:04:22 UTC

Thanks, Ivan, but I've turned the host off again now. It will start itself up tonight. Not sure what the fate of that job will be but I don't expect it to resume successfully.
ID: 3000 · Rating: 0
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 266
Message 3001 - Posted: 24 Apr 2016, 16:51:22 UTC - in response to Message 2998.  

Laurence wrote: "Will hopefully now find the cmsRun-stdout.log."

Yes, it is displayed in Console 2 and, in the logs, under the name running.log.
ID: 3001 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,874,101
RAC: 103
Message 3002 - Posted: 24 Apr 2016, 17:25:56 UTC - in response to Message 3000.  

As you will.
ID: 3002 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3018 - Posted: 25 Apr 2016, 18:21:20 UTC
Last modified: 25 Apr 2016, 18:22:10 UTC

What is the expected upload size for a job from the 250ev10ke batch?
(Logs+results)
ID: 3018 · Rating: 0
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 266
Message 3020 - Posted: 25 Apr 2016, 19:41:22 UTC - in response to Message 3018.  

Rasputin42 wrote: "What is the expected upload size for a job from the 250ev10ke batch? (Logs+results)"

The major root-result file for 250 events is about 66MB.
ID: 3020 · Rating: 0
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3021 - Posted: 25 Apr 2016, 19:52:57 UTC - in response to Message 3020.  

Thanks, CP.
I got about 80 MB total, so with logs it is in the ballpark.
Just a sanity check, as ATLAS is very different (for me).
ID: 3021 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,874,101
RAC: 103
Message 3023 - Posted: 25 Apr 2016, 20:12:56 UTC - in response to Message 3020.  

Rasputin42 wrote: "What is the expected upload size for a job from the 250ev10ke batch? (Logs+results)"

Crystal Pellet wrote: "The major root-result file for 250 events is about 66MB."

Yes, it varies +/- 10-20%. The logfile upload to the Condor server is the _condor_stdout, 130KB or so for a good job; the stderr that you see on the -dev website for your tasks is relatively small. I don't think anything else goes anywhere else, but I could be wrong.
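
As a rough cross-check of the figures quoted in this thread (about 66 MB for the root result file, varying by up to 20%, plus roughly 130 KB of _condor_stdout), the expected total can be sketched in Python. The function name, the decimal KB-to-MB conversion, and the assumption that these two files make up the whole upload are illustrative only and are not stated by the project.

```python
def expected_upload_range_mb(result_mb=66.0, spread=0.20, log_kb=130.0):
    """Return (low, high) for the expected upload size in MB.

    result_mb: nominal size of the root result file (figure from the thread)
    spread:    fractional variation, using the upper 20% bound quoted above
    log_kb:    approximate size of _condor_stdout for a good job
    """
    log_mb = log_kb / 1000.0  # assumption: decimal units (1 MB = 1000 KB)
    low = result_mb * (1.0 - spread) + log_mb
    high = result_mb * (1.0 + spread) + log_mb
    return low, high


low, high = expected_upload_range_mb()
print(f"expected upload: {low:.1f} to {high:.1f} MB")
```

Under these assumptions, an observed total of about 80 MB sits close to the upper end of the range, which fits the "ball-park" remark earlier in the thread.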
ID: 3023 · Rating: 0



©2024 CERN