Issue of the day - 5th September 2015
Author | Message |
---|---|
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 975 |
I'm getting the following error in my cron-stdout logs... 11:13:05 +0100 2015-09-05 [INFO] Requesting an X509 credential 11:13:12 +0100 2015-09-05 [ERROR] Proxy error 11:13:12 +0100 2015-09-05 [INFO] Going to sleep for 1 hour Me or you ? |
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 975 |
In the cron-stderr it says... ERROR: Couldn't read proxy from: /tmp/x509up_u500 globus_credential: Error reading proxy credential globus_credential: Error reading proxy credential: Couldn't read PEM from bio OpenSSL Error: pem_lib.c:703: in library: PEM routines, function PEM_read_bio: no start line Use -debug for further information. Is no-one else (who has read the thread) getting these errors ? It was working fine up to about 8:30am; then a run would finish and I would get these errors. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Sorry, PDW, all fine here. |
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 975 |
"Sorry, PDW, all fine here." Okay, thanks for the reply. Must be me then, all the machines I've checked so far have this problem but I haven't touched anything ! |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Do they have anything in common (router/modem)? |
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 975 |
Yes, they all share router and modem. |
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 975 |
I have started new (24 hour) jobs, and the only output to Graphics is the boot.log, which gets down to Activating Fuse module. No other files/directories have been created. On the VM Console, ALT-F1 shows the Proxy error I listed in my first post, but although it says Going to sleep for 1 hour it only lasts for a minute before the next entry, and then it tries to start another Run. ALT-F2 is a list of PIDs for sh, CMSJobAgent.sh and sleep; these 3 CMDs repeat down the screen. ALT-F3 shows that not much is doing anything. ALT-F4 and F5 are blank. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Try resetting the router and modem. Pull the power for 20 seconds and reconnect each one. Try the router first and, if that does not help, the modem. Wait for 10 to 20 minutes and see if it works. |
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 975 |
I'm going to try turning the router off for a bit longer than that but am waiting for an Atlas task to finish first. Can appreciate this might make it work again but what made it go wrong ? |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
There are a lot of constantly active network connections with Atlas/CMS. You might want to check for a firmware update for your router. If it is old, there might be issues with its power supply (they tend to go bad after a few years, with increasing noise and voltage variations). I can only guess. Once in a while my router screws up and I have to reset it, but with "normal" BOINC tasks it does not show up so easily, as there is nowhere near as much network traffic as with Atlas/CMS. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 75 |
Nothing leaps out at me. Certificate stuff (X509) is more Laurence's domain, and he's off for a week. (Actually, I'll be off for the last 60% of the week too, but I'll try to engineer more jobs before I have to head off to Liverpool.) |
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 975 |
After turning the router back on, contact has been re-established, files/directories are being created and the console windows are working again. It's a brand new router and modem that were installed on Thursday and had been working fine since they were put in. Everything else was okay, just CMS ! Will see if it happens again. Would suggest Laurence checks his 1 hour sleep command though. I waited a minute while a minute passed, and then, what seemed like an hour but was only a minute passed. It was a minute past. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I am glad you got it working again. Please check for a firmware update. Sometimes the manufacturer discovers a bug after they have launched the device. (I did a firmware update on mine, which was less than 6 months old at the time.) |
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 975 |
It's gone again :-( Went this morning but after I had checked everything was fine ! Have restarted the router but that hasn't fixed running jobs or new jobs (24 hour ones) so I am going to turn off the router for a few minutes (again once an Atlas task completes). Laurence, can you give me any more info about what is happening when requesting the X509 credential please ? 08:07:05 +0100 2015-09-07 [INFO] Requesting an X509 credential 08:07:06 +0100 2015-09-07 [ERROR] Proxy error 08:07:06 +0100 2015-09-07 [INFO] Going to sleep for 1 hour Looks like I need to talk to my ISP and any info on what might be going on would be useful to avoid their 'Have you tried turning it off and on again' response. Thanks. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Maybe you can borrow a different router, or get a new one, to see if that is causing the problem? Please check for a firmware update (it does not matter whether the router is new or not). The fact that it works for a while suggests that it is a router issue. Just trying to help. |
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 975 |
I appreciate the suggestions but I have not found a newer version of software to update the router. It's a TG582n currently running 10.2.5.2 FO. I also think it is likely to be the router but when everything else seems to be working fine I need something specific to challenge the ISP with. [I'll put this bit in brackets so ivan won't read it and get upset two weeks in a row, but I've upgraded to 40Mb fibre with the same ISP. I don't have a spare router that I could try for this. A restart of the router didn't fix it, a 10 minute switch off hasn't cured it. When I 'fixed' it last time the router was off for 2 hours or more but I can't do that at the moment as I am running in the CSG challenge and that needs fairly frequent access as it limits the number of tasks I can download.] I think I read that Laurence is away this week, but if there is some command or site to access that can instantly confirm that access is blocked I can use that to check with my ISP. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
One last thing to try. There is usually a reset button on the router. This is a bit different from a re-power, as it is supposed to reload the default values (if you forgot the password, for example). You need to hold it down for about 5 to 10 seconds. If you have done this already, ignore this message. Good luck! |
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 975 |
The last thing in boot.log is Activating Fuse module. When viewing ALT-F3 console as the job sets itself up, just after the above message and before it goes into the loop of trying to start a Run, failing to get an x509 credential and waiting for a 1 minute hour it says this... Starting httpd: httpd: Could not reliably determine the server domain name, using 127.0.0.1 for ServerName Starting vmcontext_epilog ... bootlogd: no process killed Is it trying to use 127.0.0.1 as the place to get the x509 credential from ? Can't compare it with a working one as I still don't have that ability yet ! |
Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
"The last thing in boot.log is Activating Fuse module." That's the last line in my boot.log too. Tue Sep 8 21:38:23 2015: cms.cern.ch: Activating Fuse module The next step seems to be at the start of cron-stdout 21:39:01 +0100 2015-09-08 [INFO] Starting CMS Application - Run 1 and so on - this task went on to create four run-x folders and processed an uncounted number of events. |
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 975 |
I'm afraid you will need to look at the ALT-F3 console at the very start of the job setting itself up. Once the messages about Activating Fuse module and the domain name being set to 127.0.0.1 have been displayed it clears the screen and there is nothing for me to see [though you will see the Starting CMS Application - Run x appear]. After the first time of seeing the gist of what it was saying I used the video on my phone to record it :-) Edit: I keep saying ALT-F3 but I'm not sure I mean that console. Whatever the default screen is that is put up when you click the Show VM Console button. I can't actually change to another screen, the ALT-Fx buttons don't do anything now. |
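
Two things in this thread lend themselves to a quick script: the cron-stderr message "PEM_read_bio: no start line", which OpenSSL reports when the file it was given contains no `-----BEGIN ...` line, and the request for some command that can confirm whether access is blocked. What follows is only a rough Python sketch of both checks, not anything taken from the CMS job scripts. The function names `proxy_file_status` and `can_connect` are made up for illustration, and the thread never says which server the credential is requested from, so the host and port to test are left as parameters you would have to find out.

```python
import socket
from pathlib import Path

# Every PEM block starts with a line such as "-----BEGIN CERTIFICATE-----".
PEM_MARKER = b"-----BEGIN "


def proxy_file_status(path):
    """Roughly classify a proxy file (hypothetical helper).

    Returns 'missing', 'empty', 'no-pem-header' or 'looks-like-pem'.
    It does not validate the certificate, it only looks for a PEM start line.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    data = p.read_bytes()
    if not data.strip():
        return "empty"
    if PEM_MARKER not in data:
        # For example an error page or partial download saved in place
        # of the credential.
        return "no-pem-header"
    return "looks-like-pem"


def can_connect(host, port, timeout=10.0):
    """Try to open a TCP connection (hypothetical helper).

    Returns (True, '') on success, otherwise (False, reason).
    The reason distinguishes e.g. refused connections, timeouts
    and DNS failures, all of which are subclasses of OSError.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"
```

Inside the VM, `proxy_file_status("/tmp/x509up_u500")` would say whether the file named in the error is absent, empty, or filled with something that is not PEM. On any machine behind the same router, `can_connect(host, port)` with the real credential server's address (an assumption you would need to fill in) gives a timestamped yes/no plus a reason, which is more specific to quote to an ISP than "CMS stopped working".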
©2024 CERN