Thread 'CMS job shortage Wednesday 13th November'

Author	Message
ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 13	Message 6812 - Posted: 11 Nov 2019, 15:50:55 UTC
CMS IT will be installing a new version of WMAgent on Wednesday. This will impact job availability for the duration of the intervention. We might be able to eliminate the little gremlin that's been plaguing us for the last few weeks, too.
So, please set your CMS processors to No New Tasks sometime tomorrow, Tuesday 12th, so that current tasks will stop requesting new jobs before the queues get cut. I'll let you know when jobs are available again. Thanks.

Magic Quantum Mechanic Joined: 8 Apr 15 Posts: 1006 Credit: 18,363,068 RAC: 17,163	Message 6813 - Posted: 12 Nov 2019, 4:02:29 UTC - in response to Message 6812.
Thanks, Ivan. I will just finish the ones I have running.

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1156 Credit: 8,453,729 RAC: 13	Message 6821 - Posted: 14 Nov 2019, 9:47:07 UTC
OK, jobs are available now, so you can start running CMS tasks again.

Magic Quantum Mechanic Joined: 8 Apr 15 Posts: 1006 Credit: 18,363,068 RAC: 17,163	Message 6826 - Posted: 15 Nov 2019, 0:40:59 UTC
I have several running and a few finished and returned with no problems.
Mad Scientist For Life

Magic Quantum Mechanic Joined: 8 Apr 15 Posts: 1006 Credit: 18,363,068 RAC: 17,163	Message 6827 - Posted: 15 Nov 2019, 1:50:37 UTC Last modified: 15 Nov 2019, 2:25:00 UTC
Edit: well, I may have run out of luck. After turning in the finished tasks I started up new ones. On this 8-core host I run two of the 2-core tasks, and the first one was really slow getting connected with CERN and failed at the VM. Watching the Console as usual, it was really slow getting beyond daemon and then froze up soon after page two started... then the VM killed it.
I just started a new one. It was slow getting started too, but it looks like it might make it to the HTCondor ping before too long (though I wouldn't bet on it). As far as my end goes, my ISP is running at full speed.
(Oh, and I do NOT like the time limit on editing here either: I had just typed all this out and then it would not let me save it, so I had to copy and paste it here.)
-152 (0xFFFFFF68) ERR_NETOPEN
[ERROR] Could not connect to lhchomeproxy.cern.ch on port 3125
Mad Scientist For Life
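For anyone wondering how the two forms of the exit status in the post above relate: -152 and 0xFFFFFF68 appear to be the same 32-bit value, shown once as a signed integer and once as unsigned hex. A minimal sketch of that arithmetic follows; the helper names are made up for illustration and are not part of BOINC.

```python
def to_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned (two's complement)."""
    return value & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as signed (two's complement)."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


print(f"0x{to_unsigned32(-152):08X}")  # prints 0xFFFFFF68
print(to_signed32(0xFFFFFF68))         # prints -152
```

So the hex form carries no extra information; it is just the other rendering of the ERR_NETOPEN status.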

Magic Quantum Mechanic Joined: 8 Apr 15 Posts: 1006 Credit: 18,363,068 RAC: 17,163	Message 6830 - Posted: 15 Nov 2019, 11:40:42 UTC
I just got another one of those errors on a different host:
Testing CVMFS connection to lhchomeproxy.cern.ch on port 3125
10:47:19.272426 VMMDev: Guest Log: [DEBUG] nc: connect to lhchomeproxy.cern.ch port 3125 (tcp) timed out: Operation now in progress
10:47:19.273231 VMMDev: Guest Log: nc: connect to lhchomeproxy.cern.ch port 3125 (tcp) timed out: Operation now in progress
10:47:19.326805 VMMDev: Guest Log: [DEBUG] 1
10:47:19.398225 VMMDev: Guest Log: [ERROR] Could not connect to lhchomeproxy.cern.ch on port 3125
10:47:19.456350 VMMDev: Guest Log: [INFO] Shutting Down.
Both times the host already had one of these tasks running (it is 3:30 am right now), and both times the new task got hung up at starting daemon. Both times I also sent the failed task back and started a new one; both of those are up and running and will most likely be Valids.
.....goodnight
Mad Scientist For Life
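The log above shows the VM using nc to test a TCP connection to lhchomeproxy.cern.ch on port 3125 and giving up when it times out. A rough equivalent can be run from the host itself to see whether the same connection attempt succeeds outside the VM. This is only a sketch: the function name and the 5-second default timeout are assumptions, not what the CMS VM actually uses.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS lookup failures.
        return False
```

Calling `can_connect("lhchomeproxy.cern.ch", 3125)` a few times while a task is stuck at starting daemon would show whether the timeout is also visible from the host, which would point at the network path rather than at the VM.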

Magic Quantum Mechanic Joined: 8 Apr 15 Posts: 1006 Credit: 18,363,068 RAC: 17,163	Message 6831 - Posted: 15 Nov 2019, 16:38:54 UTC - in response to Message 6830.
Ivan, could you try your Windows OS PC again and see if it has the same problem I keep having? Either the Linux server (CentOS) does not want to connect with Windows, or I just need to move my computers 5000 miles closer to get more reliable communication between servers.
Many of mine are working, and I have 4 different hosts (3 on Win 10 and one on Win7), but I keep watching this sloooooow connection between CERN and my end.
-152 (0xFFFFFF68) ERR_NETOPEN
This host had 3 Valids and now one of these.
[ERROR] Could not connect to lhchomeproxy.cern.ch on port 3125
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2835807
This one was really slow on page one of the VM Console right from the start. It did make it to page 2 and started to look like it would run, but it froze up again where it tests the connection to Condor. I just started another one, and once again it looks like it will run this time.
I haven't checked over at LHC yet, so I don't know if anyone else has had this happen. And I am still running at full speed on the ISP (4Mbps).
One thing is for sure: I am probably the only human who would spend this much time just to get them to run, watching the thousands of VM Console pages and VB Manager logs.

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1281 Credit: 1,047,960 RAC: 57	Message 6832 - Posted: 15 Nov 2019, 18:11:58 UTC - in response to Message 6831.
".... you try your Windows OS pc again and see if it has this same problem I keep having?"
I loaded 1 CMS task on my rather old Windows host. With the host heavily loaded with other tasks, cmsRun started 13 minutes after boot and was running at full load after 22 minutes of uptime.

Magic Quantum Mechanic Joined: 8 Apr 15 Posts: 1006 Credit: 18,363,068 RAC: 17,163	Message 6833 - Posted: 16 Nov 2019, 0:35:40 UTC - in response to Message 6832.
".... you try your Windows OS pc again and see if it has this same problem I keep having?"
"I loaded 1 CMS on my rather old Windows host. Host heavy loaded with other tasks, cmsRun started 13 minutes after the boot and started running full load after 22 minutes up-time."
Yes, 12-13 minutes or sooner is the best chance of getting these (or any VB task) to run. They always run if that happens. Like I said, I have Valids on all of mine, yet they can't be trusted: when the next one starts it may just sit there for 12 minutes or longer at daemon, which means it won't make it to the HTCondor ping or cmsRun.
It would be nice if I didn't have to watch all of them. They are all running now, but not on their own, and just now I used up my daytime high-speed allowance (in less than 2 days, just because I run these). When all of the currently running tasks are finished and ready it will be late at night, and I get this goofy bonus 50MB of data use at 4-15Mbps from 2am to 8am, so they will all run that way until I use all of that up (87% left).
(Mine are also using all the rest of the cores to run all the versions of Sixtracks, and some of those run over 2 days.)

Development for LHC@home