Message boards : News : Agent Update

Phil

Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 521 - Posted: 5 Aug 2015, 16:57:30 UTC
Last modified: 5 Aug 2015, 16:59:14 UTC

I get a bunch of condor_mips and condor_kflops (benchmarking?) and now some cmsRun jobs running for about an hour each.
ID: 521 · Rating: 0
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 111
Message 522 - Posted: 5 Aug 2015, 18:59:26 UTC

It took ~40 minutes for cmsRun to first appear and about 70 minutes before it seemed to settle down to running continuously at ~80% CPU and 30% MEM.

Hopefully the startup sequence can be made a lot faster than this.
ID: 522 · Rating: 0
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 523 - Posted: 5 Aug 2015, 20:15:01 UTC - in response to Message 522.  

Thanks for the feedback. It is good to see that it is working for most. We can always optimize later once we know where the issues are.
ID: 523 · Rating: 0
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 111
Message 524 - Posted: 5 Aug 2015, 21:03:30 UTC - in response to Message 523.  

It is good to see that it is working for most.

Well, so far it's had an easy run here. It has had the PC nearly to itself and has been running undisturbed (cmsRun now shows nearly two hours). It has yet to live with vLHC and a couple of others. Also, tomorrow morning a cron job will turn the PC off. The CMS task will have to survive all this.
ID: 524 · Rating: 0
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 111
Message 525 - Posted: 5 Aug 2015, 22:07:23 UTC

The first cmsRun "job" seems to have finished after ~174min and another has started. After only a few minutes it was up to 80% in contrast to the first job which took a long time to get going. Now to see how well task switching works.
ID: 525 · Rating: 0
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 526 - Posted: 6 Aug 2015, 7:22:26 UTC - in response to Message 525.  

This suggests that the first time it was downloading files into the CVMFS cache. Providing a new image that has these files baked in should solve the problem.
ID: 526 · Rating: 0
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 527 - Posted: 6 Aug 2015, 7:45:09 UTC - in response to Message 512.  

I have created a new app version (46.17) that upgrades the vboxwrapper to version 26169.
ID: 527 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Jan 15
Posts: 1129
Credit: 7,934,535
RAC: 3,078
Message 528 - Posted: 6 Aug 2015, 9:51:29 UTC - in response to Message 527.  
Last modified: 6 Aug 2015, 10:09:25 UTC

I have created a new app version (46.17) that upgrades the vboxwrapper to version 26169.

The bad news is, this still doesn't run on my Win7 box with VirtualBox 4.3.30. :-(

The good news is -- it does run under VirtualBox 5.0! :-)
Still have no idea what the problem was, tho'but...

Usual problem, though, it's stalled after startup.

getProxystderr:
$ cat getProxystderr

curl: (60) Peer certificate cannot be authenticated with known CA certificates
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
Error opening input file cert.p12
cert.p12: No such file or directory
Error opening input file cert.p12
cert.p12: No such file or directory
chmod: cannot access `userkey.pem': No such file or directory
chmod: cannot access `usercert.pem': No such file or directory
get_proxy.sh: line 27: grid-proxy-init: command not found
get_proxy.sh: line 32: grid-proxy-info: command not found


...and -- now it's run further.
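
For anyone wanting to check the same thing on their own setup: the log above fails on a missing cert.p12 and on grid-proxy-init / grid-proxy-info not being found. A minimal sketch of a pre-flight check, assuming Python is available where the script runs (the helper name `missing_prerequisites` is made up for illustration and is not part of get_proxy.sh):

```python
import os
import shutil

# File and command names are taken from the error log above;
# the check itself is a hypothetical helper, not project code.
REQUIRED_FILES = ["cert.p12"]
REQUIRED_COMMANDS = ["grid-proxy-init", "grid-proxy-info"]


def missing_prerequisites(workdir=".", files=REQUIRED_FILES, commands=REQUIRED_COMMANDS):
    """Return a list of human-readable problems; empty if everything is present."""
    problems = []
    for name in files:
        # The certificate bundle must exist as a regular file in the working directory.
        if not os.path.isfile(os.path.join(workdir, name)):
            problems.append(f"missing file: {name}")
    for cmd in commands:
        # shutil.which returns None when the command is not on PATH.
        if shutil.which(cmd) is None:
            problems.append(f"command not on PATH: {cmd}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

This only reports what is absent. It does not explain why the certificate download failed in the first place (the curl error 60 above).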
ID: 528 · Rating: 0
PDW

Joined: 20 May 15
Posts: 217
Credit: 5,812,831
RAC: 17,251
Message 529 - Posted: 6 Aug 2015, 10:40:24 UTC - in response to Message 528.  

So CMS-dev needs Vbox 5 but Atlas won't run with that version yet?
ID: 529 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,329
RAC: 1,481
Message 530 - Posted: 6 Aug 2015, 10:48:53 UTC - in response to Message 527.  

I have created a new app version (46.17) that upgrades the vboxwrapper to version 26169.

I had v26169 already running with VBox5.0, but will reset the project to get the stock versions of the files, when current cmsRun has finished.

Providing a new image that has these files baked in should solve the problem.

Did you also change the VM?
ID: 530 · Rating: 0
Richard Haselgrove

Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 531 - Posted: 6 Aug 2015, 10:54:41 UTC

Wrapper 26169 seems to be working fine with VBox 4.3.26 here.
ID: 531 · Rating: 0
PDW

Joined: 20 May 15
Posts: 217
Credit: 5,812,831
RAC: 17,251
Message 532 - Posted: 6 Aug 2015, 11:09:58 UTC - in response to Message 531.  

Okay, thanks. My 46.16 jobs are due to start finishing in about an hour's time; I will watch and see how the 46.17 jobs get on without changing anything.
ID: 532 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Jan 15
Posts: 1129
Credit: 7,934,535
RAC: 3,078
Message 533 - Posted: 6 Aug 2015, 11:30:46 UTC - in response to Message 529.  

So CMS-dev needs Vbox 5 but Atlas won't run with that version yet?

No, my machine wouldn't run with later versions of Vbox 4 (it ran OK with 4.3.12); others had no such problem.
ID: 533 · Rating: 0
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 534 - Posted: 6 Aug 2015, 12:23:42 UTC - in response to Message 529.  

We don't need Vbox 5 but the new wrapper should support its use.
ID: 534 · Rating: 0
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 535 - Posted: 6 Aug 2015, 12:26:34 UTC - in response to Message 530.  

No, not yet; I want to clean a few things up first. Once CVMFS has updated you should see lots of new log files in the graphics. Also, console 2 should show a clean ps output that should help you to see when cmsRun is running.
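
For anyone who would rather check from a script than read the console: a minimal sketch of spotting cmsRun in ps output, assuming a Linux environment where `ps -eo pid,etime,comm` is available (the function names here are made up for illustration, and this is not how console 2 is produced):

```python
import subprocess


def find_process(ps_text, name="cmsRun"):
    """Return (pid, elapsed) pairs for rows of `ps -eo pid,etime,comm`
    output whose command name equals `name`."""
    matches = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)       # pid, elapsed time, command
        if len(fields) == 3 and fields[2].strip() == name:
            matches.append((int(fields[0]), fields[1]))
    return matches


def running_jobs(name="cmsRun"):
    """Run ps and report matching processes (Linux only)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,etime,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_process(out, name)
```

Calling `running_jobs()` returns an empty list while cmsRun has not started yet, which is the situation several posts in this thread describe during the first 40 minutes or so.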
ID: 535 · Rating: 0
Phil

Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 536 - Posted: 6 Aug 2015, 13:30:20 UTC

Working fine for me.
So far done two cmsRun jobs at around 50 mins each.
ID: 536 · Rating: 0
PDW

Joined: 20 May 15
Posts: 217
Credit: 5,812,831
RAC: 17,251
Message 537 - Posted: 6 Aug 2015, 13:37:09 UTC - in response to Message 535.  

Three Win 7 boxes have completed their 46.16 jobs and started 46.17. All seem to be running okay, and after 40 minutes cmsRun is running, though not at full power yet.

Out of interest, the credit scoring seems all over the place. A slower box got a credit of 467 for 66,672 seconds of CPU time; two similar, faster boxes got 872 points for 61,537 seconds and 933 points for 52,396 seconds of CPU time. All 3 took the usual elapsed time (for Windows) of just over 24 hours.
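
To make the comparison easier to see, the figures above can be turned into credit per CPU-hour. A quick sketch (the box labels are only illustrative; the numbers are the ones quoted in this post):

```python
# (credit, CPU seconds) as quoted in the post; labels are illustrative only.
results = {
    "slower box": (467, 66_672),
    "faster box A": (872, 61_537),
    "faster box B": (933, 52_396),
}


def credit_per_cpu_hour(credit, cpu_seconds):
    """Credit earned for each hour of CPU time."""
    return credit / (cpu_seconds / 3600.0)


rates = {name: credit_per_cpu_hour(c, s) for name, (c, s) in results.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} credits per CPU-hour")
```

That works out to roughly 25, 51 and 64 credits per CPU-hour, so the two similar boxes differ by about a quarter and the slower box earns about half as much per CPU-hour.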
ID: 537 · Rating: 0
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 538 - Posted: 6 Aug 2015, 14:05:24 UTC - in response to Message 537.  

The credit system is probably normalizing for the power of the different machines.
ID: 538 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,329
RAC: 1,481
Message 539 - Posted: 6 Aug 2015, 14:08:20 UTC - in response to Message 530.  

I had v26169 already running with VBox5.0, but will reset the project to get the stock versions of the files, when current cmsRun has finished.

I reset the project. Got vboxwrapper_26169 from the project, yesterday's CMS*.xml and the CMS-vdi of March.
This time the 1st cmsRun started 8 minutes after the boot.
Alt+F2 now shows some PIDs, runtimes and CMDs (process names).
Alt+F5 shows no output (yet?) and the logs contain only a boot.log.
ID: 539 · Rating: 0
PDW

Joined: 20 May 15
Posts: 217
Credit: 5,812,831
RAC: 17,251
Message 540 - Posted: 6 Aug 2015, 14:15:23 UTC - in response to Message 538.  

Two Linux boxes have now started 46.17 jobs, still no problems.

They took just under 24 hours (as usual) to complete the 46.16 jobs and scored 854 for 67,849 CPU seconds and 924 for 73,717 CPU seconds.

The low-scoring Windows box is much older/slower, but the other 4 are quite similar, though they are running different things at the same time. Before you made these recent changes, the older box and my slow laptop used to get better scores than all the faster boxes!
ID: 540 · Rating: 0