Message boards : CMS Application : New Version 60.70

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 278
Message 7852 - Posted: 7 Nov 2022, 7:59:29 UTC

Synchronizing the vboxwrapper with the latest official version.
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 467
Credit: 389,411
RAC: 503
Message 7854 - Posted: 7 Nov 2022, 9:03:25 UTC

As for vboxwrapper 26206 this task is running fine:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3141537

I'm still missing the modifications of the vdi's boot partition that should make CVMFS fail-over and load balancing more robust.
Are there plans to implement them before this vdi is used on prod?
Laurence
Message 7855 - Posted: 7 Nov 2022, 9:50:41 UTC - in response to Message 7854.  

No, I spoke to Jakob about this. He said that the values in the contextualization should overwrite the kernel values soon after the VM boots. They should get fixed in a new CernVM release.
kotenok2000
Joined: 22 Aug 22
Posts: 19
Credit: 42,516
RAC: 443
Message 7858 - Posted: 7 Nov 2022, 14:58:01 UTC - in response to Message 7855.  
Last modified: 7 Nov 2022, 14:59:51 UTC

I get this when I try to start cms_mt:


My ISP doesn't support IPv6.
computezrmle
Message 7859 - Posted: 7 Nov 2022, 15:03:56 UTC

@Laurence
The CMS vdi currently in use sets this link in /usr/sbin/bootstrap:
/cvmfs/grid.cern.ch/vc/vm-qa/sbin/bootstrap-idtoken

bootstrap-idtoken sets this variable:
branch=qa

If the intention was to implement a dev/prod switch, this will not work, since the branch is fixed to qa.


I recently implemented a switch for ATLAS that solves the same objective:
https://github.com/davidgcameron/boinc-scripts/blob/master/vbox/ATLASbootstrap.sh#L41-L45

It could easily be rewritten and tested for CMS.
Just give me a "go".
computezrmle
Message 7860 - Posted: 7 Nov 2022, 15:16:50 UTC - in response to Message 7858.  

*.openhtc.io responds to both IPv4 and IPv6.

Your screenshot shows 188.114.96.1, which is an IPv4 address.
At least this one should have worked.

It is clearly reported by the frontier client.
Was it a transient error (1 task only) or do all tasks report it?

Could you check whether Cloudflare is blocked in your firewall (maybe only for this box)?
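
One way to start the check suggested above is to see which address families the hostname resolves to on the affected box. This is only a minimal sketch using the Python standard library; the function names are made up for illustration, and a successful lookup only shows that DNS works, not that the firewall lets the connection through.

```python
import ipaddress
import socket


def split_by_family(addresses):
    """Sort address strings into an IPv4 list and an IPv6 list."""
    v4, v6 = [], []
    for text in addresses:
        ip = ipaddress.ip_address(text)
        (v4 if ip.version == 4 else v6).append(text)
    return v4, v6


def resolve(host, port=80):
    """Return the unique addresses the local resolver reports for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


# Example (needs network access; hostname taken from this thread):
#   v4, v6 = split_by_family(resolve("cms4-frontier.openhtc.io"))
#   print("IPv4:", v4)
#   print("IPv6:", v6)
```

If the IPv4 list is not empty but requests still fail, that would point towards something between the box and Cloudflare rather than towards missing IPv6 support.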
kotenok2000
Message 7861 - Posted: 7 Nov 2022, 15:43:56 UTC - in response to Message 7860.  

When I try to open cms4-frontier.openhtc.io in Chrome I get this.

Probably problems on their side.
Laurence
Message 7862 - Posted: 7 Nov 2022, 16:19:05 UTC - in response to Message 7859.  

This was the plan. Ideally the switch would be done before the bootstrap script, by something like updating the /sbin to /cvmfs link. I was going to think about it later, so any ideas would be welcome.
computezrmle
Message 7863 - Posted: 7 Nov 2022, 16:30:29 UTC - in response to Message 7861.  

Your CVMFS requests from Theory tasks are also sent to *.openhtc.io.
Surely to the same Cloudflare datacenter.
Could even be that CVMFS and frontier requests are processed by the very same Cloudflare Squid instance there.

Those Squids get their data from backend systems at CERN, RAL, Fermilab ... and do an automatic fail-over.
Very unlikely that all backend systems are down at the same moment.
Especially since this would crash nearly all CMS tasks worldwide.
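
To illustrate the fail-over idea only (this is not how the Cloudflare Squids are actually configured, which is not shown in this thread), here is a minimal Python sketch. The function name and the `fetch` callable are hypothetical; the backend names are just the sites mentioned above.

```python
def fetch_with_failover(backends, fetch):
    """Try each backend in order and return the first successful result.

    `backends` is an ordered list of names; `fetch` is any callable that
    takes a backend name and either returns data or raises OSError.
    """
    errors = {}
    for backend in backends:
        try:
            return backend, fetch(backend)
        except OSError as exc:
            errors[backend] = exc  # remember the failure, try the next one
    raise RuntimeError(f"all backends failed: {sorted(errors)}")


def demo_fetch(backend):
    """Stand-in for a real request: pretend the first site is down."""
    if backend == "CERN":
        raise ConnectionError("simulated outage")
    return "payload"


used, data = fetch_with_failover(["CERN", "RAL", "Fermilab"], demo_fetch)
print(used, data)
```

A request only fails completely when every backend in the list fails, which is why a simultaneous outage of all of them is the unlikely case.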


Please upgrade VirtualBox to the recent v6.1.
The new vboxwrapper 26206 no longer uses the COM interface, which was responsible for problems in the past.
kotenok2000
Message 7864 - Posted: 7 Nov 2022, 16:32:56 UTC - in response to Message 7863.  

What if I install VirtualBox 7.0.2?
computezrmle
Message 7865 - Posted: 7 Nov 2022, 16:45:57 UTC - in response to Message 7862.  

Some kind of a "bootstrap preloader" that does this:
1. mount the shared folder
2. parse init_data.xml from there (it tells you whether you are in dev or prod)
3. modify the link to the main bootstrap script in /sbin according to step 2
4. mount grid.cern.ch (the link points to this repo, unlike ATLAS, which gets its boot script from atlas.cern.ch)
5. execute the main bootstrap script on CVMFS
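
Steps 2 and 3 above could be sketched roughly like this. It is only an illustration in Python, not the ATLAS script and not a finished CMS preloader: the marker string `lhcathome-dev`, the idea that it shows up somewhere in init_data.xml on the dev project, and both function names are assumptions. Mounting the shared folder and the CVMFS repo and running the bootstrap (steps 1, 4 and 5) are left out.

```python
import os
import xml.etree.ElementTree as ET

# Assumption: on the dev project this string appears in some element of
# init_data.xml (for example in a project directory or URL).
DEV_MARKER = "lhcathome-dev"


def detect_branch(init_data_xml):
    """Return "dev" if any element text contains DEV_MARKER, else "prod"."""
    root = ET.fromstring(init_data_xml)
    for element in root.iter():
        if element.text and DEV_MARKER in element.text:
            return "dev"
    return "prod"


def repoint_link(link_path, target):
    """Replace link_path with a symlink to target via a temporary link."""
    tmp = link_path + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link_path)  # renames the link itself, not its target
```

The caller would choose the target on CVMFS from the detected branch. Only the vm-qa path is quoted in this thread, so no other path is assumed here.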


This is very close to the ATLAS script.
I'll prepare a suggestion.
computezrmle
Message 7866 - Posted: 7 Nov 2022, 16:51:43 UTC - in response to Message 7864.  

7.x might work, but since it is rather new you may stumble over unexpected issues.

I'm already aware of a modification that affects the media manager.
So far it's not a show stopper here but it needs a closer look.
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 734
Credit: 11,558,055
RAC: 2,030
Message 7895 - Posted: 25 Nov 2022, 12:19:19 UTC

Well, it looks like suspend time again, and it is Friday, so who knows when... Good thing I was up at 4 am watching this happen.
I must have forgotten to set the one laptop to "no more work", so I got 9 of those on that one, but the main 3 hosts only got one.

I checked other members running CMS and they have the same thing.
It happens fast too (4 min 46 sec).

I see hundreds of these around here but I think we have them all suspended now.
https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4252

goodnight
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 7896 - Posted: 25 Nov 2022, 15:26:52 UTC - in response to Message 7895.  

We've reverted the change that garbled our glidein script -- I'm running main and -dev jobs successfully now.
Magic Quantum Mechanic
Message 7897 - Posted: 26 Nov 2022, 1:19:23 UTC

Thanks Ivan
Magic Quantum Mechanic
Message 8157 - Posted: 1 Sep 2023, 19:10:17 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/server_status.php

It looks like we have CMS work here again, and over at production too.
Magic Quantum Mechanic
Message 8216 - Posted: 8 Nov 2023, 12:53:56 UTC

Magic Quantum Mechanic
Message 8247 - Posted: 10 Dec 2023, 1:57:18 UTC

Well, this is the first time I have seen a CMS multi do this.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3280035

Run time: 17 hours 47 min 11 sec
CPU time: 2 days 16 hours 5 min 45 sec
Validate state: Valid
Credit: 579.47
Application version: 60.70 (vbox64_mt_mcore_cms)
I had another one running at the same time but it was just the way they have been since we started these a year ago.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3280034
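
For reference, dividing CPU time by run time gives a rough average of how many cores the task kept busy, assuming the reported CPU time is summed over all cores. A quick check in Python with the figures quoted above:

```python
from datetime import timedelta

run_time = timedelta(hours=17, minutes=47, seconds=11)
cpu_time = timedelta(days=2, hours=16, minutes=5, seconds=45)

# Dividing one timedelta by another gives a plain float:
# the average number of cores kept busy over the whole run.
avg_cores = cpu_time / run_time
print(f"{avg_cores:.2f}")
```

That works out to roughly 3.6 cores on average, whereas a task that effectively runs on a single core stays at or below 1.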
Magic Quantum Mechanic
Message 8249 - Posted: 11 Dec 2023, 1:09:33 UTC

Well, I got 3 of those that actually had more CPU time than run time, but my next 3 went back to how they have been for the entire year. I see Ivan also had several of these using 4 cores (I had 8 cores running per task), and I saw a couple more from other members.

I will continue running them and see what happens.
Magic Quantum Mechanic
Message 8250 - Posted: 11 Dec 2023, 22:45:37 UTC - in response to Message 8249.  
Last modified: 11 Dec 2023, 22:57:43 UTC

I see Ivan has about 5 in a row of these running multi on the same Win 10 as I am running, but mine have stopped getting them so far, so the last 6 of mine are back to single core while running with all 8 cores (I have not changed that setting since this time last year).
Since that is what I run here, I will just load up more and let them run.
https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=3907

I see Mikey got one on Linux:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3280141
Toby got a couple on another Linux version with a Threadripper:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3280481
I didn't see any on those "hidden computers".
Mad Scientist For Life