41)
Message boards :
CMS Application :
New version 49.00
(Message 6566)
Posted 19 Aug 2019 by ivan Post: @#%$@%^&&^%$$#@* Sorry, there was a huge increase in CMS jobs being run last night, so the queue drained before I could replenish it. New batch sent, should be OK in a few minutes. |
42)
Message boards :
CMS Application :
New version 49.00
(Message 6563)
Posted 17 Aug 2019 by ivan Post: Each of your failed VMs requested 4584MB RAM which is close to 60% of the computer's total RAM. Hah! Didn't think about that. It didn't like 3 cores either -- come to think of it, the error at one point was "waiting for memory"... Yes, it was set to 50/90%, I changed to 90/90%. |
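The memory arithmetic in this post can be checked with a minimal sketch. It assumes that the two percentages are the BOINC memory limits for "computer in use" and "not in use", and that the host has about 7640 MB of RAM (inferred from 4584 MB being "close to 60%"; the exact figure is not given in the post). The function name is hypothetical, and BOINC's real scheduling logic is more involved than a single comparison.

```python
def fits_memory_limit(vm_request_mb: float, total_ram_mb: float, limit_percent: float) -> bool:
    """Return True if the VM's RAM request fits within the allowed share of total RAM."""
    allowed_mb = total_ram_mb * limit_percent / 100.0
    return vm_request_mb <= allowed_mb


VM_REQUEST_MB = 4584   # figure quoted in the post
TOTAL_RAM_MB = 7640    # assumption: 4584 MB / 0.60, since the post says "close to 60%"

for limit in (50, 90):
    allowed = TOTAL_RAM_MB * limit / 100
    ok = fits_memory_limit(VM_REQUEST_MB, TOTAL_RAM_MB, limit)
    print(f"limit {limit}%: allowed {allowed:.0f} MB, VM fits: {ok}")
```

Under these assumptions the 50% limit allows roughly 3820 MB, less than the VM asks for, which would be consistent with the "waiting for memory" message; at 90% the request fits.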
43)
Message boards :
CMS Application :
New version 49.00
(Message 6553)
Posted 15 Aug 2019 by ivan Post: Well, my little £130 Celeron J1900 certainly didn't like trying to run a 4-core VM! Continually timed out "Waiting for memory"! I've dropped down to 3 cores to see if that runs. |
44)
Message boards :
CMS Application :
New version 49.00
(Message 6548)
Posted 14 Aug 2019 by ivan Post: Ok I just ran all of those and they are all open (TRUE). I think my one Win10 failure was after I upgraded the memory to 8 GB because 4 GB wasn't quite enough and I guess the VM got confused when it restarted. I'm now running 1x 2-core VM on it with no apparent problem. Googling that error message turns up some interesting things. At the moment your i7-3770 seems to be running 4x 2-core VMs (so you must have hyperthreading enabled); is that right? One of the comments I saw was that it's best to keep one core free to run the VM. Others suggested time-outs due to slow peripheral storage, a mismatch with Guest Additions modules, and a few other more exotic things. |
45)
Message boards :
News :
CMS@Home: Disruption to our condor server next Monday
(Message 6519)
Posted 7 Aug 2019 by ivan Post: There's been a slight change in plans. "Given that we do not need to redeploy the agent, but only kill jobs in condor and let them get recreated with the JobSubmitter/schedd changes, I think you can go ahead and submit another workflow to [keep] volunteers happy." So, I'll continue to submit smaller batches and you can resume new tasks. |
46)
Message boards :
News :
CMS@Home: Disruption to our condor server next Monday
(Message 6518)
Posted 7 Aug 2019 by ivan Post: We've run down the CMS job queue to make some changes to the submission environment. Please set No New Tasks so that you don't have excessive churning waiting for jobs that are not available. |
47)
Message boards :
News :
CMS@Home: Disruption to our condor server next Monday
(Message 6509)
Posted 1 Aug 2019 by ivan Post: OK, we eventually found the problem and jobs are flowing again. |
48)
Message boards :
CMS Application :
New version 49.00
(Message 6503)
Posted 25 Jul 2019 by ivan Post: Tried another test this morning and still these will not start because of the Cern server problems. Yes, we're having condor problems. WMStats shows jobs pending, but the condor schedd isn't sending any out. Must be a ClassAd mismatch that arose since the reboots. |
49)
Message boards :
Cafe :
LHC asleep all day
(Message 6494)
Posted 23 Jul 2019 by ivan Post: Good luck with your house projects Ivan and hope your ISP gives you a new modem without making you pay too. Thanks. It's what they call a "semi-detached" built by the Council in 1967. I should have had it renovated when I moved in 15 years ago, but as you say, over 60 you start losing enthusiasm for hard work (it's not the bending down that's the problem, it's the straightening back up that hurts). I've never owned a cell/mobile/handy either. I leave the ringer on the landline switched off to avoid chimney-sweeps touting for business... |
50)
Message boards :
Cafe :
LHC asleep all day
(Message 6492)
Posted 23 Jul 2019 by ivan Post: At least you're connected. My broadband modem died in last Saturday's flood, so no news, no BBC iPlayer, no crossword, limited Android games. That's why I'm in my office before 7am. I also had to cancel a 2-day CMS-UK meeting in Oxfordshire while I try to work out a way out of the mess. |
51)
Message boards :
News :
CMS@Home: Disruption to our condor server next Monday
(Message 6489)
Posted 22 Jul 2019 by ivan Post: I have jobs coming in from -dev now, but none from the main project yet -- it still has a lot of services, including the feeder, showing as not running. Hmm, but some of them are dying, VirtualBox is reporting inaccessible vdis. |
52)
Message boards :
News :
CMS@Home: Disruption to our condor server next Monday
(Message 6442)
Posted 17 Jul 2019 by ivan Post: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5087#39376 |
53)
Message boards :
News :
Using a local proxy to reduce network traffic for CMS
(Message 6397)
Posted 7 Jun 2019 by ivan Post: Thanks to computezrmle, with additional work from Laurence and a couple of CMS experts (and my adding one line to the site-local-config file), there is now a way to set up a local caching proxy to greatly reduce your network traffic. Each job instance that runs within a CMS BOINC task must retrieve a lot of set-up data from our database. This data doesn't change very often, so if you keep a local copy the job can access that rather than going over the network every time. Instructions on how to do this are available at https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=475&postid=6396 or https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5052&postid=39072 |
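As a rough illustration of why a local copy reduces traffic, here is a toy sketch of the caching idea only. It is not the proxy set-up described in the linked instructions; the class, the `fake_remote` function, and the `"conditions"` key are all hypothetical stand-ins.

```python
from typing import Callable, Dict


class LocalCache:
    """Toy cache: fetch each item from the remote source once, then serve it locally."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote
        self._store: Dict[str, bytes] = {}
        self.remote_fetches = 0  # counts trips over the network

    def get(self, key: str) -> bytes:
        if key not in self._store:
            # Cache miss: go to the remote source and keep a local copy.
            self._store[key] = self._fetch_remote(key)
            self.remote_fetches += 1
        return self._store[key]


def fake_remote(key: str) -> bytes:
    """Hypothetical stand-in for the remote database holding set-up data."""
    return f"setup-data-for-{key}".encode()


cache = LocalCache(fake_remote)
for _job in range(5):       # five job instances asking for the same set-up data
    cache.get("conditions")
print(cache.remote_fetches)
```

In this sketch five jobs cause a single remote fetch. A real caching proxy also has to handle expiry and size limits, which are left out here.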
54)
Message boards :
News :
CMS -- Please set "no new tasks"
(Message 6368)
Posted 15 May 2019 by ivan Post: Intervention over, CMS jobs are available again |
55)
Message boards :
News :
CMS -- Please set "no new tasks"
(Message 6367)
Posted 15 May 2019 by ivan Post: Yes, it was OK to let running tasks finish (sorry I only saw your message just now). I let the queue run down to less than 30 "running" tasks before aborting the workflow and giving Alan free rein to update the WMAgent. As yet it hasn't re-appeared and he hasn't let me know the update is over. Once I know that I'll submit a new workflow and when it appears on WMStats I'll issue the go-ahead to resume tasks. |
56)
Message boards :
News :
CMS -- Please set "no new tasks"
(Message 6365)
Posted 14 May 2019 by ivan Post: Hi to all CMS-ers. We need to drain the job queue so that a new version of the WMAgent can be installed. Can you please set No New Tasks so that your current tasks can run out and no new jobs start? If you have any tasks waiting to run, please suspend or abort them. Thanks, I'll let you know as soon as the change is done. |
57)
Message boards :
CMS Application :
New version 49.00
(Message 6312)
Posted 30 Apr 2019 by ivan Post: It's probably not a problem, but in the vdi coming from the project, the log-files MasterLog, StartLog and StarterLog exist and aren't empty. Ah, so that's what that is! I had two tasks fail tonight and I noticed the strange time-stamps in the output. |
58)
Message boards :
CMS Application :
New version 49.00
(Message 6309)
Posted 29 Apr 2019 by ivan Post: https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1891827 Looking at some of the timings suggests that something is hanging for quite a while, then after Condor is contacted it times out at ~20 minutes without returning a job. |
59)
Message boards :
CMS Application :
New version 49.00
(Message 6307)
Posted 28 Apr 2019 by ivan Post: https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1891941 That looks like (a small) progress to me. Did you change anything? I'm still perplexed as to why the job plots are showing so many "unknown" status jobs. For the record, WMAgent monitoring suggests no failures in Volunteer MonteCarlo jobs (i.e. no job has failed more than 3 (5?) attempts at running). OTOH, operations change the core submission software quite often, so they may have changed the criteria... |
60)
Message boards :
CMS Application :
New version 49.00
(Message 6302)
Posted 27 Apr 2019 by ivan Post: You *may* have been unlucky with your timing. Since the resolution of the Easter problem ("quota exceeded", I've yet to get a full explanation) there have been a number of "unknown" entries showing up in the job plots. So, I submitted a new batch last night, waited until it showed up in my monitor, then aborted the previous batch -- this is also why there is a large "cancelled" peak in the plots. Let's wait a few more hours to see if things settle down, though I'm concerned that I'm still seeing unknowns in the plots. |
©2024 CERN