Message boards : Number crunching : issue of the day
Send message Joined: 20 May 15 Posts: 217 Credit: 6,190,193 RAC: 3,145 |
I'll go and have a check, but I think it's happening all the time, i.e. once an hour on every machine. I can't check all of them, as the console doesn't work on some and the boot log is the only file showing through the 'Graphics' button. I can check CPU usage, though. I'll be back... |
Send message Joined: 20 May 15 Posts: 217 Credit: 6,190,193 RAC: 3,145 |
Yes, all machines running CMS are showing this error. They are all happily running vLHC, though. I haven't received a CMS job through vLHC yet. Any reason the LHC@home check doesn't say vLHC@home? |
Send message Joined: 20 May 15 Posts: 217 Credit: 6,190,193 RAC: 3,145 |
I tried a re-boot but it's still the same. The CMS request takes a few seconds (5-10?) before it tries LHC; that takes no time at all before it puts out a curl command and then the 'Cloud not get' message. |
Send message Joined: 20 May 15 Posts: 217 Credit: 6,190,193 RAC: 3,145 |
Still getting the 'Cloud not get' error. After the last line in the boot.log file, 'Activating Fuse module', the console display quickly flashes this up:
Starting httpd: httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName [OK]
Starting vmcontext_epilog ... bootlogd: no process killed
It says OK, so is it? |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 270 |
Still getting the 'Cloud not get' error. It says OK, so is it? I believe so; certainly I've seen those messages with no obvious detrimental effect. |
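For context, that httpd line is Apache's standard warning that it couldn't determine a fully qualified hostname for itself; it's cosmetic. On an ordinary machine you'd silence it by setting ServerName explicitly -- a minimal sketch, assuming a stock Red Hat-style config path (an assumption about the VM's layout; there's no need to touch the project's VM):
# append an explicit ServerName so httpd stops guessing (config path assumed)
echo "ServerName localhost" | sudo tee -a /etc/httpd/conf/httpd.conf
# restart the service so the change takes effect
sudo service httpd restart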
Send message Joined: 20 May 15 Posts: 217 Credit: 6,190,193 RAC: 3,145 |
That's what I thought I'd seen as well, but after suffering withdrawal symptoms for so long you start to grasp at straws! |
Send message Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
Yes, a normally running WU throws up so many errors that no ordinary cruncher has a chance of finding the real problem. |
Send message Joined: 21 Sep 15 Posts: 89 Credit: 383,017 RAC: 0 |
Linux host (Ubuntu) - after today's OS update and restart: CMS "Waiting to run (Scheduler wait: Please update/recompile VirtualBox Kernel Drivers.)" Downloading VBox 5.0.10 now (was on 5.0.8). Sigh. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 270 |
Linux host (Ubuntu) - after today's OS update and restart: Well, that's quite usual for Linux. I have to re-run the Nvidia driver installer on my Linux machines with GPUs every time there's a kernel update -- you're supposed to be able to set it up to do it automagically, but I've had problems with that. I also have to recompile the Xeon Phi drivers after a kernel update, and that's a whole other kettle of fish! |
Send message Joined: 17 Aug 15 Posts: 62 Credit: 296,695 RAC: 0 |
I've installed Leap 42.1, the latest SuSE release, as a virtual machine on this Windows host. I installed the BOINC client and manager from SuSE, and also VirtualBox. Nothing works. I've now downloaded gcc, make and the kernel sources from SuSE and am trying to rebuild everything that already works on openSUSE 13.1 and 13.2 on my Linux boxes. Tullio |
Send message Joined: 18 Aug 15 Posts: 14 Credit: 125,335 RAC: 0 |
Bill Michael said: CMS "Waiting to run (Scheduler wait: Please update/recompile VirtualBox Kernel Drivers.)" If you install DKMS (Dynamic Kernel Module Support) before you install VirtualBox, you shouldn't need to recompile the VirtualBox modules after a kernel update. Ivan, the same principle applies to the Nvidia drivers: if you install DKMS before you install them, you shouldn't have to re-install them after a kernel update. The only time I have to re-install the Nvidia drivers after a kernel update is when I'm running a pre-release (alpha or beta) version of Ubuntu. I don't have any experience with the Xeon Phi drivers, but the same principle might apply; it would certainly be worth a test. Hope that helps. |
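A minimal sketch of that setup on an Ubuntu host, assuming the distribution's packaged VirtualBox (the package names below are an assumption; Oracle's own .deb registers its DKMS hook differently):
# install DKMS first so later kernel-module packages register with it
sudo apt-get install dkms
# the distro build of VirtualBox ships its kernel modules as a DKMS package
sudo apt-get install virtualbox virtualbox-dkms
# if the modules ever need a manual rebuild anyway (VirtualBox 5.0.x era command):
sudo /etc/init.d/vboxdrv setup
With DKMS in place, the vboxdrv/vboxnetflt modules are rebuilt automatically whenever a new kernel is installed.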
Send message Joined: 17 Aug 15 Posts: 62 Credit: 296,695 RAC: 0 |
I have DKMS, but I need gcc, make and the kernel sources to recompile the VirtualBox kernel modules. Tullio |
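In case it helps, a sketch of what usually covers a module rebuild on openSUSE (package names are an assumption for Leap 42.1 and can differ between releases):
# compiler, make, and the headers/sources matching the running kernel
sudo zypper install gcc make kernel-devel kernel-default-devel
# then rebuild the VirtualBox kernel modules (command for the 5.0.x series)
sudo /etc/init.d/vboxdrv setup
Checking uname -r first tells you which kernel flavour is running, so the matching -devel package can be picked.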
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 270 |
Ivan, the same principle applies to the Nvidia drivers: if you install DKMS before you install them, you shouldn't have to re-install them after a kernel update. The only time I have to re-install the Nvidia drivers after a kernel update is when I'm running a pre-release (alpha or beta) version of Ubuntu. Could well be so; I relied on the Nvidia installer the one or two times I tried it, and it may not have got everything right. Not sure about the Xeon Phi -- see section 2.7 of this manual. You weren't a Eurodance star in the 1990s, were you? (Not really SFW; the Germans never really grasped the offensiveness of some English swear-words.) |
Send message Joined: 20 May 15 Posts: 217 Credit: 6,190,193 RAC: 3,145 |
I tried a re-boot but it's still the same. I see Yeti is getting this error now (he has reported it on the vLHC forum); any updates on it being fixed? |
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
"Cloud not get proxy..." (sic) I'm sure I've seen a post about this before, but can't find it. All my machines running CMS are stuck with this.:- It looks as though they've done nothing useful for a couple of days, so I've set NNW for now. Is this something I can do anything about? |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 270 |
Only thing I can suggest at the moment is to reset the project on that machine, to get a fresh VM image. Sometimes things get corrupted, mainly I guess by network glitches as the cvmfs file-system is being updated. FWIW, in the latest batch (since 13/12/15) that machine has returned 7 success exits and one 151 (stage-out error). We're trying to chase the stage-out errors; there is some correlation with distance from CERN apparently. I've just got the exit status from 3300-odd jobs in the current batch and counted the different statuses for each machine -- I might try to tie that in with IP but it involves a laborious manual look-up. |
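For anyone who prefers the command line, the reset can also be done with boinccmd -- a minimal sketch, with the project URL left as a placeholder since it depends on which CMS/vLHC project the machine is attached to:
# reset the project: clears its files (including the VM image) and re-downloads them
boinccmd --project http://project-url-goes-here/ reset
The same operation is available in the BOINC Manager's Projects tab as "Reset project".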
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
All of a sudden, all jobs that have not been run have turned to status "unknown" on the Dashboard. Previously they were listed as status "Pending". Any idea why? Shouldn't they have had the status "Unknown" to begin with? |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 270 |
All of a sudden, all jobs that have not been run have turned to status "unknown" on the Dashboard. Previously they were listed as status "Pending". Well, if a job hasn't been run it should be Pending. Note that some of the "unknown" jobs have been run already. However, I've given up expecting Dashboard to give more than an approximation to reality while a batch is "live"; there seem to be too many uncertainties for it to interpret all the return codes accurately. Those that have run but are marked as unknown appear to have timed out or some such; there is no job log, just the placeholder "Job output has not been processed by post-job." -- the Dashboard details give "N/A / Error return without specification". |
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
Thanks, Ivan. Only thing I can suggest at the moment is to reset the project on that machine, to get a fresh VM image. Sometimes things get corrupted, mainly I guess by network glitches as the cvmfs file-system is being updated. They've all got work from other projects at the moment, but I forced one to get a new CMS BOINC task. This started OK without my having to reset the project. The others should do the same on their own eventually; presumably they would have recovered anyway when the 24-hour task time expired and they started afresh, though that could take up to 4 days here. We're trying to chase the stage-out errors; there is some correlation with distance from CERN apparently. I've just got the exit status from 3300-odd jobs in the current batch and counted the different statuses for each machine -- I might try to tie that in with IP but it involves a laborious manual look-up. Maybe treat Dashboard-reported IPs for "jobs with retries" and IP-to-location data with some suspicion, too. My ISP once bought some v4 IPs from an outfit in the USA; as I remember, it took quite a while before users could access "UK only" content. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 270 |
We're trying to chase the stage-out errors; there is some correlation with distance from CERN apparently. I've just got the exit status from 3300-odd jobs in the current batch and counted the different statuses for each machine -- I might try to tie that in with IP but it involves a laborious manual look-up. I have actually dug the user and machine names from the end of the log-file (the line that says "FINISHING on user-machine-pid with status X") but for each "interesting" one I have to use my BOINC admin account to find what IP that machine last used, to report to the crew chasing down stage-out problems. |
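For what it's worth, a rough sketch of how that tallying can be scripted over a directory of job logs, assuming each log ends with a line of the form "FINISHING on <user>-<machine>-<pid> with status <N>" and that the logs are plain .log files (both assumptions about the exact format):
# count (machine, exit status) pairs across all logs in the current directory
awk '/FINISHING on/ {
    for (i = 1; i <= NF; i++) if ($i == "on") id = $(i + 1)  # grab the user-machine-pid token
    sub(/-[0-9]+$/, "", id)                                  # strip the trailing process id
    print id, $NF                                            # last field is the exit status
}' *.log | sort | uniq -c | sort -rn
That gives a per-machine count of each status, which makes the 151s easy to spot.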