Message boards : News : CMS job shortage Wednesday 13th November

Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1093
Credit: 6,893,316
RAC: 0
Message 6812 - Posted: 11 Nov 2019, 15:50:55 UTC

CMS IT will be installing a new version of WMAgent on Wednesday. This will impact job availability for the duration of the intervention. We might be able to eliminate the little gremlin that's been plaguing us for the last few weeks, too.
So, please set your CMS processors to No New Tasks sometime tomorrow, Tuesday 12th, so that current tasks will stop requesting new jobs before the queues get cut. I'll let you know when jobs are available again.
Thanks.
ID: 6812 · Rating: 0
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 536
Credit: 7,500,539
RAC: 4,901
Message 6813 - Posted: 12 Nov 2019, 4:02:29 UTC - in response to Message 6812.  

Thanks Ivan

I will just finish the ones I have running.
ID: 6813 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1093
Credit: 6,893,316
RAC: 0
Message 6821 - Posted: 14 Nov 2019, 9:47:07 UTC

OK, jobs are available now, so you can start running CMS tasks again.
ID: 6821 · Rating: 0
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 536
Credit: 7,500,539
RAC: 4,901
Message 6826 - Posted: 15 Nov 2019, 0:40:59 UTC

I have several running and a few finished and returned with no problems.
ID: 6826 · Rating: 0
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 536
Credit: 7,500,539
RAC: 4,901
Message 6827 - Posted: 15 Nov 2019, 1:50:37 UTC
Last modified: 15 Nov 2019, 2:25:00 UTC

Edit: Well, I may have run out of luck. After turning in the finished tasks I started up new ones. On this 8-core host I run two of the 2-core tasks, and the first one was really slow getting connected with CERN and failed at the VM. Watching the Console as usual, it was really slow getting beyond *daemon* and then froze up soon after page two started... then the VM killed it.

I just started a new one and it was slow getting started too, but it looks like it might make it to the HTCondor ping before too long (but I wouldn't bet on it).

And as far as my end goes, my ISP is running at full speed.

(Oh, and I do NOT like the time limit on editing here either: I had just typed all this out and then it would not let me post it, so I had to copy and paste it here.)

-152 (0xFFFFFF68) ERR_NETOPEN [ERROR] Could not connect to lhchomeproxy.cern.ch on port 3125
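
For reference, the hex value in parentheses is the same error code, -152, written as an unsigned 32-bit number (two's complement). A minimal Python sketch of that conversion; the helper name `as_unsigned32_hex` is made up for illustration and is not part of BOINC:

```python
def as_unsigned32_hex(code: int) -> str:
    """Format a signed 32-bit error code as unsigned hexadecimal."""
    # Masking with 0xFFFFFFFF maps a negative value to its
    # 32-bit two's-complement representation.
    return f"0x{code & 0xFFFFFFFF:08X}"


print(as_unsigned32_hex(-152))  # prints 0xFFFFFF68
```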
ID: 6827 · Rating: 0
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 536
Credit: 7,500,539
RAC: 4,901
Message 6830 - Posted: 15 Nov 2019, 11:40:42 UTC

I just got another one of those errors on a different host:

Testing CVMFS connection to lhchomeproxy.cern.ch on port 3125
10:47:19.272426 VMMDev: Guest Log: [DEBUG] nc: connect to lhchomeproxy.cern.ch port 3125 (tcp) timed out: Operation now in progress
10:47:19.273231 VMMDev: Guest Log: nc: connect to lhchomeproxy.cern.ch port 3125 (tcp) timed out: Operation now in progress
10:47:19.326805 VMMDev: Guest Log: [DEBUG] 1
10:47:19.398225 VMMDev: Guest Log: [ERROR] Could not connect to lhchomeproxy.cern.ch on port 3125
10:47:19.456350 VMMDev: Guest Log: [INFO] Shutting Down.
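
The check in the log above amounts to opening a TCP connection to the proxy with a timeout. Below is a rough Python sketch of a similar test that could be run on the host itself, outside the VM. The function name `can_connect` and the 10-second timeout are assumptions for illustration; the log shows the VM's own script uses `nc`, and this is not that script.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS lookup failures.
        return False
```

Calling `can_connect("lhchomeproxy.cern.ch", 3125)` on the affected host a few times while a task is stuck would show whether the timeout also happens outside the VM, which might help tell a network problem from a VM problem.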


Both times they already had one of these tasks running (3:30am right now).

And both times they got hung up at starting daemon.

Also, both times I sent those back and started a new one, and they are both up and running and will most likely be Valids.

.....goodnight
ID: 6830 · Rating: 0
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 536
Credit: 7,500,539
RAC: 4,901
Message 6831 - Posted: 15 Nov 2019, 16:38:54 UTC - in response to Message 6830.  

Ivan, could you try your Windows OS PC again and see if it has the same problem I keep having?

Either the Linux server (CentOS) does not want to connect with Windows, or I just need to move my computers 5000 miles closer to get more reliable communication with the servers.

Many of mine are working, and I have 4 different hosts (3 on Win 10 and one on Win7), but I keep watching this sloooooow connection between CERN and my end.

-152 (0xFFFFFF68) ERR_NETOPEN: this host had 3 Valids and now one of these.
[ERROR] Could not connect to lhchomeproxy.cern.ch on port 3125

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2835807

This one was really slow on page one of the VM Console right from the start. It did make it to page 2 and started to look like it would run, but it froze up again where it tests the connection to Condor.

I just started another one and once again it looks like it will run this time.

I haven't checked over at LHC yet, so I don't know if anyone else had this happen.

And I am still running at full speed on the ISP (4Mbps).

One thing is for sure: I am probably the only human who would spend this much time just getting them to run, watching thousands of VM Console pages and VB Manager logs.
ID: 6831 · Rating: 0
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1010
Credit: 591,548
RAC: 2
Message 6832 - Posted: 15 Nov 2019, 18:11:58 UTC - in response to Message 6831.  

.... you try your Windows OS pc again and see if it has this same problem I keep having?

I loaded 1 CMS task on my rather old Windows host.
The host is heavily loaded with other tasks; cmsRun started 13 minutes after boot and reached full load after 22 minutes of uptime.
ID: 6832 · Rating: 0
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 536
Credit: 7,500,539
RAC: 4,901
Message 6833 - Posted: 16 Nov 2019, 0:35:40 UTC - in response to Message 6832.  

.... you try your Windows OS pc again and see if it has this same problem I keep having?

I loaded 1 CMS task on my rather old Windows host.
The host is heavily loaded with other tasks; cmsRun started 13 minutes after boot and reached full load after 22 minutes of uptime.



Yes, 12-13 mins or sooner is the best chance of getting these to run (or any VB task).

They always run if that happens.

Like I said, I have Valids on all of mine, yet they can't be trusted: when the next one starts it may just sit there for 12 mins or longer at daemon, which means it won't make it to the HTCondor ping or cmsRun.

It would be nice if I didn't have to watch all of them.

They are all running now, but not on their own, and just now I used up my daytime high-speed data (in less than 2 days, just because I run these).

But when all of the currently running tasks are finished it will be late at night, and I get this goofy *bonus* 50MB data use at 4-15Mbps from 2am to 8am, so they will all run that way until I use all of that up (87% left).

(Mine are also using all the rest of the cores running all the versions of SixTrack, and some of them run over 2 days.)
ID: 6833 · Rating: 0

©2020 CERN