Message boards : CMS Application : CMS network tests are getting more strict

computezrmle
Joined: 28 Jul 16
Posts: 382
Credit: 369,221
RAC: 32
Message 7220 - Posted: 6 Jul 2021, 14:00:43 UTC

The new CMS bootstrap does a couple of basic network connection tests to the following target systems:
cern.ch port 80
vccs.cern.ch port 443
vocms0840.cern.ch port 9618 (HTCondor)
vocms0267.cern.ch port 4080 (WMAgent)

I recently noticed a couple of volunteer computers that don't pass these tests, which causes an EXIT_INIT_FAILURE and shuts down the VM.

The tests are done using ncat and each test is repeated up to 3 times (=runs).
The timeout is currently set to 15 s for each run, which should be enough to send packets around the world a couple of times.
Setting a higher timeout would not make much sense, especially since CMS contacts the HTCondor server every minute.
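
For anyone who wants to try a comparable check from the host side, here is a minimal Python sketch of the same idea: a plain TCP connect to each target, repeated up to 3 times with a 15 s timeout. It is not the bootstrap's actual code (that uses ncat inside the VM); the function name probe and the injectable connect parameter are made up for illustration.

```python
import socket

# Targets as listed in the post above.
TARGETS = [
    ("cern.ch", 80),
    ("vccs.cern.ch", 443),
    ("vocms0840.cern.ch", 9618),  # HTCondor
    ("vocms0267.cern.ch", 4080),  # WMAgent
]


def probe(host, port, runs=3, timeout=15.0, connect=socket.create_connection):
    """Return the 1-based run on which a TCP connect succeeded, or None.

    `connect` is a hypothetical hook so the retry logic can be exercised
    without touching the network; by default it opens a real socket.
    """
    for run in range(1, runs + 1):
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError as exc:
            # Timeouts, refused connections and DNS failures are all OSError.
            print(f"[DEBUG] run {run} of up to {runs} failed: {exc}")
            continue
        conn.close()
        return run
    return None
```

Calling probe(host, port) for each entry in TARGETS and treating a None result as a failure mirrors the pass/fail logic described above. Running it on the host only tests the host's network path, which may differ from what the VM sees.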


This example shows a computer that connects to cern.ch and VCCS on run 1 and to HTCondor on run 3, but fails all 3 runs to WMAgent:
2021-07-06 08:20:53 (7517): Guest Log: [INFO] Testing connection to cern.ch
2021-07-06 08:20:53 (7517): Guest Log: [INFO] Testing connection to VCCS
2021-07-06 08:20:53 (7517): Guest Log: [INFO] Testing connection to HTCondor
2021-07-06 08:21:05 (7517): Guest Log: [DEBUG] Status run 1 of up to 3: 1
2021-07-06 08:21:26 (7517): Guest Log: [DEBUG] Status run 2 of up to 3: 1
2021-07-06 08:21:31 (7517): Guest Log: [INFO] Testing connection to WMAgent
2021-07-06 08:21:44 (7517): Guest Log: [DEBUG] Status run 1 of up to 3: 1
2021-07-06 08:22:05 (7517): Guest Log: [DEBUG] Status run 2 of up to 3: 1
2021-07-06 08:22:26 (7517): Guest Log: [DEBUG] Status run 3 of up to 3: 1
2021-07-06 08:22:26 (7517): Guest Log: [DEBUG] Ncat: Version 7.50 ( https://nmap.org/ncat )
2021-07-06 08:22:26 (7517): Guest Log: Ncat: Connection timed out.
2021-07-06 08:22:26 (7517): Guest Log: [ERROR] Could not connect to vocms0267.cern.ch on port 4080
2021-07-06 08:22:26 (7517): Guest Log: [INFO] Shutting Down.
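
To check your own task logs for this pattern, a rough Python sketch like the following could count the failed runs per target. It assumes the line formats shown in the excerpt above and that a non-zero value in a "Status run" line marks a failed run; the function name summarize is hypothetical.

```python
import re

TARGET_RE = re.compile(r"\[INFO\] Testing connection to (\S+)")
STATUS_RE = re.compile(r"\[DEBUG\] Status run (\d+) of up to (\d+): (\d+)")
ERROR_RE = re.compile(r"\[ERROR\] Could not connect to (\S+) on port (\d+)")


def summarize(log_text):
    """Map each tested target to its failed-run count and final outcome."""
    summary = {}
    current = None
    for line in log_text.splitlines():
        if m := TARGET_RE.search(line):
            # A new "Testing connection to ..." line starts the next target.
            current = m.group(1)
            summary[current] = {"failed_runs": 0, "ok": True}
        elif (m := STATUS_RE.search(line)) and current is not None:
            # Assumption: a non-zero status marks a failed run.
            if m.group(3) != "0":
                summary[current]["failed_runs"] += 1
        elif ERROR_RE.search(line) and current is not None:
            # The [ERROR] line means all runs for the current target failed.
            summary[current]["ok"] = False
    return summary
```

Fed the excerpt above, this should report two failed runs for HTCondor (which still passed) and three for WMAgent (which did not).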



I suspect the affected computers may be located behind a heavily loaded router.
I'd like to ask affected testers to describe the local conditions under which the ncat tests fail.
Magic Quantum Mechanic

Joined: 8 Apr 15
Posts: 647
Credit: 10,234,351
RAC: 11,074
Message 7222 - Posted: 6 Jul 2021, 19:46:10 UTC - in response to Message 7220.  

Yes, you know how that goes here for some reason, so I always have to check by looking here:
https://lhcathomedev.cern.ch/lhcathome-dev/top_hosts.php
That of course only tells which computer it is and not who owns it, and several owners never seem to check whether they get credit for those failed tasks, and never seem to look here either.

We shouldn't have to do that here since this is for testing and hiding a computer is ..........
maeax

Joined: 22 Apr 16
Posts: 586
Credit: 1,286,811
RAC: 143
Message 7225 - Posted: 7 Jul 2021, 6:56:17 UTC - in response to Message 7222.  

for example:
Volunteer: mmonnin (451)
mmonnin

Joined: 20 Jun 17
Posts: 22
Credit: 1,067,855
RAC: 11
Message 7226 - Posted: 7 Jul 2021, 16:07:47 UTC - in response to Message 7225.  

Most of those are not mine so go bark at someone else.

Why doesn't LHC just remove the stats export option required by GDPR as well then?


©2022 CERN