Message boards : CMS Application : Error rate going up
Author | Message |
---|---|
Joined: 20 Jan 15 Posts: 1148 Credit: 8,310,612 RAC: 0 |
|
Joined: 20 Jan 15 Posts: 1148 Credit: 8,310,612 RAC: 0 |
Reply from RAL: There was a network "glitch" last night affecting everything in the RAL Tier-1. I don't know yet what caused it. ...and the official announcement: Date: Sat, 16 Apr 2016 09:53:44 +0100 From: EGI BROADCAST Subject: [ EGI BROADCAST ] Network problems at RAL Tier1 We have experienced a series of network outages overnight, affecting the Tier 1 and other RAL based services. A member of the network team is on-site. There is no time to fix yet. link to this broadcast : https://operations-portal.egi.eu/broadcast/archive/id/1357 ...so that may explain some of our ongoing problems (both CRAB and WMAgent submissions are via servers at RAL). |
Joined: 16 Aug 15 Posts: 966 Credit: 1,215,383 RAC: 257 |
Thanks, Ivan. https://operations-portal.egi.eu/broadcast/archive/id/1357 does not work (permissions). Any news on the large number of WNPostproc and unknown state jobs of the previous batch? It is not improving. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,215,383 RAC: 257 |
CMS tasks on the t4t site are shutting down after 5 min. They do 3 runs and quit. EDIT: NO CMS TASK STARTS UP. |
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
Thanks, Ivan. Go to the home page and look under Latest News. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,215,383 RAC: 257 |
Thanks, m. I always thought we were Tier 3. |
Joined: 20 Jan 15 Posts: 1148 Credit: 8,310,612 RAC: 0 |
"Thanks, Ivan." I wasn't sure if it would or not; I included it in case it did. "Any news on the large number of WNPostproc and unknown state jobs of the previous batch? It is not improving." I don't think it will, overmuch. I just renewed the proxy for another 8 days, as it was coming up to the seven-day lifetime, so the black sheep have that long to come home. :-) |
Joined: 20 Jan 15 Posts: 1148 Credit: 8,310,612 RAC: 0 |
|
Joined: 20 Jan 15 Posts: 1148 Credit: 8,310,612 RAC: 0 |
"Any news on the large number of WNPostproc and unknown state jobs of the previous batch? It is not improving." [Edit] We've definitely had communications issues over the life of the last batch; Dashboard says there were 9193 successes to date, but I only count 8843 apparently-good result files on the Data Bridge. [/Edit] |
Joined: 20 Jan 15 Posts: 1148 Credit: 8,310,612 RAC: 0 |
"Any news on the large number of WNPostproc and unknown state jobs of the previous batch? It is not improving." # But did they ever return? # No they never returned, # And their fate is still unlearnt. # They may glide forever # 'Round the Internet fibres. # They're the jobs that never returned! https://www.youtube.com/watch?v=Dh994JcEfkI |
Joined: 16 Aug 15 Posts: 966 Credit: 1,215,383 RAC: 257 |
|
Joined: 8 Apr 15 Posts: 807 Credit: 14,897,702 RAC: 13,777 |
Kingston Trio. About the same age as NASA........and me........watching NASA on black and white TV |
Joined: 20 Jan 15 Posts: 1148 Credit: 8,310,612 RAC: 0 |
You must be the same age as me -- conscripted to fight in Vietnam, reprieved by a change in government. Another famous Kingston Trio song. Coincidentally, that song earwormed me for a day or two recently, with no trigger whatsoever that I recall! |
Joined: 20 Jan 15 Posts: 1148 Credit: 8,310,612 RAC: 0 |
BTW, you might have noticed that job rates have fallen in the past few hours. I've not found a reason for that, and I'm about to slope off for a good night's sleep. Investigation will continue about 1000Z tomorrow! [Edit] Ah, I had to look one more time before bed... WMAgent jobs are rising, pre-empting CRAB3 jobs. (I'll go to sleep now, promise!) [/Edit] |
Joined: 8 Apr 15 Posts: 807 Credit: 14,897,702 RAC: 13,777 |
"You must be the same age as me -- conscripted to fight in Vietnam, reprieved by a change in government." THAT is a Kingston Trio classic......and I know if I play that song I will still have it in my head when I wake up tomorrow. I think that is one of those things that happen to us *oldtimers*, Ivan. These days I can get them stuck in my head for days, and some are ones I would rather not have in my head at all! I even get those from old TV shows I watch these days, since I have a channel that shows the old classics from the 50's, 60's, and 70's. The only way I can get rid of them is to play another one from the past that I don't mind.....even after a day or two. OK, now as far as our tasks here.....as usual, if I don't watch my hosts 24/7 those "Error while computing" will run that streak over and over every 10 minutes. So when I checked I had a Valid task and then 14 *Error while computing* in a row, so I stopped them, did a reboot, got a new task running, and set them to NOT get new ones just in case that started again. So far one host got another Valid, and once again it gave me an ERROR on the CMS in 10 minutes, so since I was watching I took a few minutes to do a Windows 10 update and reboot and tried again, and this time got an LHCb ERROR. Since it is set to not get new tasks it won't start another streak (it just happened on that desktop upstairs and I am back in the living room on the laptop), and I tried one on here just to see if it would run one since I suspended my Atlas tasks......no luck, so it is back off my host list here. My other 2 hosts still have tasks running that I started yesterday, so maybe they will finish Valid. They all have looked OK when I watched them start on the VM Console, BUT what I have been getting on those ERRORS when I look at the stderr is: VM Heartbeat file specified, but missing. VM Heartbeat file specified, but missing file system status. (errno = '2') Not sure if it is just the server end having trouble connecting to the opposite side of the planet or my DSL snail. But then ALL 7 of my hosts run GPUs for Einstein 24/7 and never have a problem, and the same with my vLHC tasks and Atlas tasks. I will watch the two I have running, since I tend to stay up until 3am and they must be getting close, and see if I can get the other two hosts to get back to work without the errors after 10 minutes. |
Joined: 13 Feb 15 Posts: 1251 Credit: 994,625 RAC: 441 |
"BUT what I have been getting on those ERRORS when I look at the stderr is" The bolded kind of message is a local problem from your VM and host, not from a connection going outside your home. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,215,383 RAC: 257 |
I guess the previous batch has been abandoned? There has been next to no change in the past few days and there are still 100+ pending. |
Joined: 20 Jan 15 Posts: 1148 Credit: 8,310,612 RAC: 0 |
|
Joined: 16 Aug 15 Posts: 966 Credit: 1,215,383 RAC: 257 |
"unknown" and "WNPostproc" state jobs are rising rapidly. |
Joined: 20 Jan 15 Posts: 1148 Credit: 8,310,612 RAC: 0 |
|
©2025 CERN