New Version 60.70
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0
Synchronizing the vboxwrapper with the latest official version.
Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0
As for vboxwrapper 26206, this task is running fine: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3141537
I'm still missing the modifications to the vdi's boot partition that should make CVMFS fail-over and load balancing more robust. Are there plans to implement them before this vdi is used on prod?
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0
No, I spoke to Jakob about this. He said that the values in the contextualization should overwrite the kernel values soon after the VM boots. They should get fixed in a new CernVM release.
Joined: 22 Aug 22 Posts: 22 Credit: 63,680 RAC: 166
I get this when I try to start cms_mt. My ISP doesn't support IPv6.
Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0
@Laurence
The CMS vdi currently in use sets this link in /usr/sbin/bootstrap:
/cvmfs/grid.cern.ch/vc/vm-qa/sbin/bootstrap-idtoken
bootstrap-idtoken sets this variable: branch=qa
If the intention was to implement a dev/prod switch, this will not work.
I recently implemented a switch for ATLAS that addresses the same goal:
https://github.com/davidgcameron/boinc-scripts/blob/master/vbox/ATLASbootstrap.sh#L41-L45
It could easily be rewritten and tested for CMS. Just give me a "go".
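For illustration, a minimal sketch of such a switch, modeled on the ATLAS approach linked above. The init_data.xml path, the "lhcathome-dev" URL test, and the existence of a vm-prod tree next to vm-qa are all assumptions, not taken from the current vdi:

    #!/bin/bash
    # Hypothetical branch switch; assumes init_data.xml is readable at this
    # path and contains the project's master URL, and that a vm-prod tree
    # exists alongside vm-qa on CVMFS.
    INIT_DATA="/media/shared/init_data.xml"
    if grep -q "lhcathome-dev" "${INIT_DATA}"; then
        branch="qa"      # -dev project -> test branch
    else
        branch="prod"    # production project -> production branch
    fi
    exec "/cvmfs/grid.cern.ch/vc/vm-${branch}/sbin/bootstrap-idtoken"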
Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0
*.openhtc.io responds to both IPv4 and IPv6.
From your screenshot -> 188.114.96.1
At least this one should have worked; it is clearly reported by the frontier client.
Was it a transient error (1 task only) or do all tasks report it?
Could you check whether Cloudflare is blocked in your firewall (maybe only for this box)?
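To rule out a local block, a quick check from that box could look like this (standard tools only; any HTTP response at all, even an error page, proves the Cloudflare edge is reachable):

    # Does the name resolve at all?
    nslookup cms4-frontier.openhtc.io
    # Is the Cloudflare edge reachable over IPv4?
    curl -4 -sv http://cms4-frontier.openhtc.io/ -o /dev/null
    # The same test over IPv6 is expected to fail on an IPv4-only ISP.
    curl -6 -sv http://cms4-frontier.openhtc.io/ -o /dev/null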
Joined: 22 Aug 22 Posts: 22 Credit: 63,680 RAC: 166
When I try to open cms4-frontier.openhtc.io in Chrome I get this. Probably problems on their side.
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0
This was the plan. Ideally the switch would be done before the bootstrap script runs, something like updating the /sbin to /cvmfs link. I was going to think about it later, so any ideas would be welcome.
Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0
Your CVMFS requests from Theory tasks are also sent to *.openhtc.io, surely to the same Cloudflare datacenter. It could even be that CVMFS and frontier requests are processed by the very same Cloudflare Squid instance there. Those Squids get their data from backend systems at CERN, RAL, Fermilab ... and do an automatic fail-over. It is very unlikely that all backend systems are down at the same moment, especially since that would crash nearly all CMS tasks worldwide.
Please upgrade VirtualBox to the recent v6.1. The new vboxwrapper 26206 no longer uses the COM interface, which was responsible for problems in the past.
Joined: 22 Aug 22 Posts: 22 Credit: 63,680 RAC: 166
What if I install VirtualBox 7.0.2?
Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0
Some kind of a "bootstrap preloader" that does this:
1. mount the shared folder
2. parse init_data.xml from there (it tells you whether you are in dev or prod)
3. modify the link to the main bootstrap script in /sbin according to (2.)
4. mount grid.cern.ch (the link points to this repo; unlike ATLAS, which gets its boot script from atlas.cern.ch)
5. execute the main bootstrap script on CVMFS
This is very close to the ATLAS script. I'll prepare a suggestion.
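A first sketch of that preloader, following the five steps above. The share name, the mount point, and the init_data.xml test are assumptions and would have to be checked against the real vdi:

    #!/bin/bash
    # Hypothetical bootstrap preloader (all paths assumed, not confirmed).
    mkdir -p /media/shared
    mount -t vboxsf shared /media/shared          # 1. mount the shared folder
    if grep -q "lhcathome-dev" /media/shared/init_data.xml; then  # 2. dev or prod?
        branch="qa"
    else
        branch="prod"
    fi
    # 3. point the link in /sbin at the right branch (vm-prod tree assumed)
    ln -sf "/cvmfs/grid.cern.ch/vc/vm-${branch}/sbin/bootstrap-idtoken" /usr/sbin/bootstrap
    cvmfs_config probe grid.cern.ch               # 4. make sure the repo is mounted
    exec /usr/sbin/bootstrap                      # 5. run the main bootstrap script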
Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0
7.x might work, but since it is rather new you may stumble over unexpected issues. I'm already aware of a modification that affects the media manager. So far it's not a show-stopper here, but it needs a closer look.
Joined: 8 Apr 15 Posts: 780 Credit: 12,151,937 RAC: 2,140
Well, it looks like suspend time again, and it is Friday, so who knows when... Good thing I was up at 4am watching this happen. I must have forgotten to set the one laptop to "no more work", so I got 9 of those on that one, but the main 3 hosts only got one. I checked other members running CMS and they have the same thing. It happens fast too (4 min 46 sec). I see hundreds of these around here, but I think we have them all suspended now.
https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4252
goodnight
Joined: 20 Jan 15 Posts: 1139 Credit: 8,182,521 RAC: 2,043
We've reverted the change that garbled our glidein script -- I'm running main and -dev jobs successfully now.
Joined: 8 Apr 15 Posts: 780 Credit: 12,151,937 RAC: 2,140
Thanks Ivan
Joined: 8 Apr 15 Posts: 780 Credit: 12,151,937 RAC: 2,140
https://lhcathomedev.cern.ch/lhcathome-dev/server_status.php
It looks like we have CMS work here again and over at production.
Joined: 8 Apr 15 Posts: 780 Credit: 12,151,937 RAC: 2,140
Well, this is the first time I saw a CMS multi do this:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3280035
Run time: 17 hours 47 min 11 sec
CPU time: 2 days 16 hours 5 min 45 sec
Validate state: Valid
Credit: 579.47
Application version: 60.70 (vbox64_mt_mcore_cms)
I had another one running at the same time, but it was just the way they have been since we started these a year ago.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3280034
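(For scale: 2 days 16 h 5 min is roughly 64.1 CPU-hours, and 64.1 divided by the 17.79 hours of run time is about 3.6 cores busy on average, so that task really was using several cores at once instead of the usual one.)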
Joined: 8 Apr 15 Posts: 780 Credit: 12,151,937 RAC: 2,140
Well, I got 3 of those that actually had more CPU time than run time, but my next 3 went back to how they have been for the entire year. I see Ivan also had several of these using 4 cores, while I had 8 cores running per task, and I saw a couple more by other members. I will continue running them and see what happens.
Joined: 8 Apr 15 Posts: 780 Credit: 12,151,937 RAC: 2,140
I see Ivan has about 5 in a row of these running multi on the same Win 10 as I am running, but mine have so far stopped: my last 6 are back to single core while running all 8 cores (I have never changed that setting since last year at this time). Since that is what I run here, I will just load up more and let them run.
https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=3907
I see Mikey got one on Linux:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3280141
Toby got a couple on another Linux version with a Threadripper:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3280481
Didn't see any on those "hidden computers".
Mad Scientist For Life