Message boards : Number crunching : Issue of the day - 5th September 2015

PDW
Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 13
Message 1004 - Posted: 5 Sep 2015, 16:53:36 UTC

I'm getting the following error in my cron-stdout logs...

11:13:05 +0100 2015-09-05 [INFO] Requesting an X509 credential
11:13:12 +0100 2015-09-05 [ERROR] Proxy error
11:13:12 +0100 2015-09-05 [INFO] Going to sleep for 1 hour

Is it me or you?
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 13
Message 1005 - Posted: 5 Sep 2015, 17:34:39 UTC - in response to Message 1004.  

In the cron-stderr it says...


ERROR: Couldn't read proxy from: /tmp/x509up_u500
globus_credential: Error reading proxy credential
globus_credential: Error reading proxy credential: Couldn't read PEM from bio
OpenSSL Error: pem_lib.c:703: in library: PEM routines, function PEM_read_bio: no start line

Use -debug for further information.

Is no one else (who has read the thread) getting these errors?

It was working fine up to about 8:30am; after that, a run would finish and then these errors would appear.
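For reference, OpenSSL's "no start line" error means it found no line beginning with "-----BEGIN " in the data it was asked to parse, which would fit a proxy file that is empty or truncated. Below is a minimal Python sketch of that check. The function name has_pem_start_line is made up for illustration, and the path is simply the one from the error above. It only looks for the marker; it does not validate the credential.

```python
from pathlib import Path

# PEM blocks open with a line such as "-----BEGIN CERTIFICATE-----".
PEM_MARKER = b"-----BEGIN "


def has_pem_start_line(path):
    """Return True if some line of the file starts with the PEM marker.

    A missing or unreadable file counts as "no start line".
    """
    try:
        data = Path(path).read_bytes()
    except OSError:
        return False
    return any(line.startswith(PEM_MARKER) for line in data.splitlines())


if __name__ == "__main__":
    proxy = "/tmp/x509up_u500"  # path taken from the cron-stderr message
    if has_pem_start_line(proxy):
        print(proxy, "contains a PEM start line")
    else:
        print(proxy, "has no PEM start line (missing, empty or truncated?)")
```

If this reports no start line, the file was probably never filled in, which would point at the credential request itself failing rather than at the file being misread.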
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1006 - Posted: 5 Sep 2015, 17:49:59 UTC - in response to Message 1005.  

Sorry, PDW, all fine here.
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 13
Message 1007 - Posted: 5 Sep 2015, 18:00:54 UTC - in response to Message 1006.  

Rasputin42 wrote: "Sorry, PDW, all fine here."

Okay, thanks for the reply.

Must be me then. All the machines I've checked so far have this problem, but I haven't touched anything!
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1008 - Posted: 5 Sep 2015, 18:14:01 UTC - in response to Message 1007.  

Do they have anything in common (router/modem)?
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 13
Message 1009 - Posted: 5 Sep 2015, 18:18:28 UTC - in response to Message 1008.  

Yes, they all share router and modem.
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 13
Message 1012 - Posted: 5 Sep 2015, 21:14:50 UTC

I have started new (24-hour) jobs, and the only output shown under Graphics is the boot.log, which gets as far as "Activating Fuse module".

No other files/directories have been created.

On the VM Console, ALT-F1 shows the Proxy error I listed in my first post. Although it says "Going to sleep for 1 hour", the sleep only lasts about a minute before the next entry appears and it tries to start another Run.

ALT-F2 shows a list of PIDs for sh, CMSJobAgent.sh and sleep.
These 3 CMDs repeat down the screen.

ALT-F3 shows that not much is happening.

ALT-F4 and F5 are blank
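The "1 hour" sleep that only lasts about a minute can be measured directly from the timestamps in cron-stdout. Here is a rough Python sketch, assuming every line follows the "HH:MM:SS +ZZZZ YYYY-MM-DD [LEVEL] message" layout seen in the first post. The function names are illustrative, and the fourth sample line is hypothetical, added only to stand in for the next entry a minute later.

```python
from datetime import datetime

# Layout of the timestamp at the start of each cron-stdout line,
# e.g. "11:13:12 +0100 2015-09-05".
STAMP_FORMAT = "%H:%M:%S %z %Y-%m-%d"


def parse_entry(line):
    """Split one log line into (timestamp, rest of the line)."""
    parts = line.split(" ", 3)
    stamp = datetime.strptime(" ".join(parts[:3]), STAMP_FORMAT)
    return stamp, parts[3]


def sleep_gaps(lines):
    """Seconds between each 'Going to sleep' entry and the entry after it."""
    entries = [parse_entry(l.strip()) for l in lines if l.strip()]
    gaps = []
    for (t0, msg), (t1, _) in zip(entries, entries[1:]):
        if "Going to sleep" in msg:
            gaps.append((t1 - t0).total_seconds())
    return gaps


SAMPLE = [
    "11:13:05 +0100 2015-09-05 [INFO] Requesting an X509 credential",
    "11:13:12 +0100 2015-09-05 [ERROR] Proxy error",
    "11:13:12 +0100 2015-09-05 [INFO] Going to sleep for 1 hour",
    # Hypothetical follow-up line, one minute later:
    "11:14:12 +0100 2015-09-05 [INFO] Requesting an X509 credential",
]

if __name__ == "__main__":
    print(sleep_gaps(SAMPLE))  # [60.0]
```

A gap near 3600 seconds would match the log message; a gap near 60 would match what the console shows.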
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1013 - Posted: 5 Sep 2015, 21:45:47 UTC - in response to Message 1012.  

Try resetting the router and modem.
Pull the power on each one for 20 seconds, then reconnect it.

Try the router first and, if that does not help, the modem.
Wait for 10 to 20 minutes and see if it works.
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 13
Message 1014 - Posted: 5 Sep 2015, 21:52:31 UTC - in response to Message 1013.  

I'm going to try turning the router off for a bit longer than that but am waiting for an Atlas task to finish first.

I can appreciate that this might make it work again, but what made it go wrong?
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1015 - Posted: 5 Sep 2015, 22:13:19 UTC - in response to Message 1014.  
Last modified: 5 Sep 2015, 22:14:22 UTC

There are a lot of constantly active network connections with ATLAS/CMS.
You might want to check for a firmware update for your router.
If it is old, there might be issues with its power supply (they tend to go bad after a few years, with increasing noise and voltage variations).

I can only guess. Once in a while my router screws up and I have to reset it, but with "normal" BOINC tasks it does not show up so easily, as there is nowhere near as much network traffic as with ATLAS/CMS.
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1138
Credit: 8,067,191
RAC: 2,972
Message 1016 - Posted: 5 Sep 2015, 23:48:36 UTC

Nothing leaps out at me. Certificate stuff (X509) is more Laurence's domain, and he's off for a week. (Actually, I'll be off for the last 60% of the week too, but I'll try to engineer more jobs before I have to head off to Liverpool.)
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 13
Message 1017 - Posted: 6 Sep 2015, 0:43:14 UTC - in response to Message 1016.  

After turning the router back on, contact has been re-established: files/directories are being created and the console windows are working again.

It's a brand new router and modem that were installed on Thursday and had been working fine since they were put in. Everything else was okay, just CMS !

Will see if it happens again.

Would suggest Laurence checks his 1 hour sleep command though. I waited a minute while a minute passed, and then, what seemed like an hour but was only a minute passed. It was a minute past.
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1018 - Posted: 6 Sep 2015, 7:33:25 UTC - in response to Message 1017.  

I am glad you got it working again.

Please check for a firmware update.

Sometimes the manufacturer discovers a bug after they have launched the device.
(I did a firmware update on mine, which was less than 6 months old at the time.)
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 13
Message 1033 - Posted: 7 Sep 2015, 18:25:05 UTC - in response to Message 1018.  

It's gone again :-(

It went this morning, but only after I had checked that everything was fine!

I have restarted the router, but that hasn't fixed running jobs or new (24-hour) jobs, so I am going to turn the router off for a few minutes (again, once an Atlas task completes).

Laurence, can you give me any more info about what is happening when requesting the X509 credential please ?

08:07:05 +0100 2015-09-07 [INFO] Requesting an X509 credential
08:07:06 +0100 2015-09-07 [ERROR] Proxy error
08:07:06 +0100 2015-09-07 [INFO] Going to sleep for 1 hour

Looks like I need to talk to my ISP and any info on what might be going on would be useful to avoid their 'Have you tried turning it off and on again' response.

Thanks.
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1034 - Posted: 7 Sep 2015, 20:15:41 UTC

Maybe you can borrow a different router, or get a new one, to see if that is causing the problem?
Please check for a firmware update (it does not matter whether the router is new or not).

The fact that it works for a while indicates that it is a router issue.

Just trying to help.
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 13
Message 1035 - Posted: 7 Sep 2015, 21:10:01 UTC - in response to Message 1034.  

I appreciate the suggestions but I have not found a newer version of software to update the router. It's a TG582n currently running 10.2.5.2 FO.

I also think it is likely to be the router but when everything else seems to be working fine I need something specific to challenge the ISP with.

[I'll put this bit in brackets so ivan won't read it and get upset two weeks in a row, but I've upgraded to 40Mb fibre with the same ISP. I don't have a spare router that I could try for this. A restart of the router didn't fix it, a 10 minute switch off hasn't cured it. When I 'fixed' it last time the router was off for 2 hours or more but I can't do that at the moment as I am running in the CSG challenge and that needs fairly frequent access as it limits the number of tasks I can download.]

I think I read that Laurence is away this week, but if there is some command or site I can access that instantly confirms that access is blocked, I can use that to check with my ISP.
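One generic way to get something specific for an ISP is to test whether a plain TCP connection to a given host and port succeeds from the affected network. Below is a minimal Python sketch of that idea. The name can_connect is illustrative, and the host and port of the service that issues the X509 credential are not given in this thread, so they would have to come from the project team.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    DNS failures, refused connections and timeouts all count as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling can_connect(host, port) with the same values from a working network and from the affected one gives a simple yes/no comparison. It does not show why a connection fails, only that it does.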
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 1036 - Posted: 7 Sep 2015, 21:23:46 UTC - in response to Message 1035.  

One last thing to try. There is usually a reset button on the router.
This is a bit different from a re-power, as it is supposed to reload the default values (if you forgot the password, for example).
You need to hold it down for about 5 to 10 seconds.

If you have done this already, ignore this message.
Good luck!
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 13
Message 1071 - Posted: 9 Sep 2015, 14:10:19 UTC - in response to Message 1036.  

The last thing in boot.log is Activating Fuse module.
When viewing the ALT-F3 console as the job sets itself up, just after the above message and before it goes into the loop (trying to start a Run, failing to get an X509 credential and waiting for a 1-minute "hour"), it says this...

Starting httpd: httpd: Could not reliably determine the server domain name, using 127.0.0.1 for ServerName

Starting vmcontext_epilog ...
bootlogd: no process killed


Is it trying to use 127.0.0.1 as the place to get the X509 credential from?

Can't compare it with a working one as I still don't have that ability yet !
Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 1077 - Posted: 9 Sep 2015, 15:43:57 UTC - in response to Message 1071.  

PDW wrote: "The last thing in boot.log is Activating Fuse module."

That's the last line in my boot.log too.

Tue Sep 8 21:38:23 2015: cms.cern.ch: Activating Fuse module

The next step seems to be at the start of cron-stdout

21:39:01 +0100 2015-09-08 [INFO] Starting CMS Application - Run 1
21:39:01 +0100 2015-09-08 [INFO] Reading the BOINC volunteer's information
21:39:02 +0100 2015-09-08 [INFO] Volunteer: Richard Haselgrove (229) Host: 379

and so on - this task went on to create four run-x folders and processed an uncounted number of events.
PDW
Joined: 20 May 15
Posts: 217
Credit: 5,968,227
RAC: 13
Message 1078 - Posted: 9 Sep 2015, 16:16:47 UTC - in response to Message 1077.  
Last modified: 9 Sep 2015, 16:27:58 UTC

I'm afraid you will need to look at the ALT-F3 console at the very start of the job setting itself up. Once the messages about Activating Fuse module and the domain name being set to 127.0.0.1 have been displayed, it clears the screen and there is nothing for me to see [though you will see the Starting CMS Application - Run x appear].

After the first time of seeing the gist of what it was saying I used the video on my phone to record it :-)

Edit: I keep saying ALT-F3 but I'm not sure I mean that console. I mean whatever the default screen is that comes up when you click the Show VM Console button. I can't actually change to another screen; the ALT-Fx keys don't do anything now.

