Message boards : News : Agent Update

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 500 - Posted: 4 Aug 2015, 11:21:50 UTC

We will shortly be updating the agent that is running in the virtual machine. This should start using the new infrastructure that we have been working on recently. It may not work the first time, so if you have any feedback, please respond to this post.
ID: 500 · Rating: 0

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 501 - Posted: 4 Aug 2015, 15:05:05 UTC - in response to Message 500.  

CVMFS has finally updated so the new agent is there. You should see different messages now. The job wrapper is running but we are checking the actual jobs.
ID: 501 · Rating: 0

rbpeake
Joined: 15 Apr 15
Posts: 38
Credit: 227,251
RAC: 0
Message 502 - Posted: 4 Aug 2015, 16:34:39 UTC

Seems to be stuck on:
Condor started in background, now waiting on process 952
ID: 502 · Rating: 0

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 134
Message 503 - Posted: 4 Aug 2015, 17:09:12 UTC
Last modified: 4 Aug 2015, 17:11:49 UTC

Not (yet) a problem but a related point. More volunteers may now participate in CMS. Presumably those using Windows still need to use either the "CMS patch" (BOINC 7.5.1) or a later BOINC version to avoid interference with other projects. The link to the patch given on the CMS main page has not been reinstated, nor is the patch now available from the download page. It would be a good idea to either provide a working link to the patch or alter the wording to point to appropriate BOINC versions for Windows.
ID: 503 · Rating: 0

Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 504 - Posted: 4 Aug 2015, 17:33:27 UTC - in response to Message 503.  

Not (yet) a problem but a related point. More volunteers may now participate in CMS. Presumably those using Windows still need to use either the "CMS patch" (BOINC 7.5.1) or a later BOINC version to avoid interference with other projects. The link to the patch given on the CMS main page has not been reinstated, nor is the patch now available from the download page. It would be a good idea to either provide a working link to the patch or alter the wording to point to appropriate BOINC versions for Windows.

That would, currently, be BOINC v7.6.6 - available from http://boinc.berkeley.edu/download_all.php

Except we found a new bug in that one yesterday (won't affect CMS-dev, unless members also run multiple GPUs). For that, there's a new patch:

http://www.romwnet.org/files/boinc.030815.x64.zip

That's downloading from Rom Walton's personal webspace - there are currently technical problems restricting access to the BOINC server.
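For anyone wanting to check an installed client against the versions discussed here, a minimal sketch follows. It assumes the client version is available as a plain dotted string and treats 7.5.1 (the "CMS patch" version quoted above) as the minimum; the function names are made up for illustration and are not part of BOINC.

```python
def parse_version(text):
    """Turn a dotted version string such as '7.6.6' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Assumption: 7.5.1 is the minimum, taken from the "CMS patch" mentioned in the thread.
MINIMUM_BOINC = parse_version("7.5.1")


def is_new_enough(installed, minimum=MINIMUM_BOINC):
    """Compare versions numerically, component by component."""
    return parse_version(installed) >= minimum


print(is_new_enough("7.6.6"))   # True
print(is_new_enough("7.4.42"))  # False
```

Comparing tuples of integers avoids the trap of comparing the strings directly, where "7.10.0" would sort before "7.5.1".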
ID: 504 · Rating: 0

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 134
Message 505 - Posted: 4 Aug 2015, 17:57:16 UTC - in response to Message 502.  
Last modified: 4 Aug 2015, 18:10:28 UTC

Seems to be stuck on:
Condor started in background, now waiting on process 952


and mine is similarly stuck. Process 9816.

I also see the message:-

"Unsupported GLIDEIN_SiteWMS encountered" Is this a problem?
ID: 505 · Rating: 0

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 848,858
RAC: 1,746
Message 506 - Posted: 4 Aug 2015, 20:06:40 UTC - in response to Message 501.  

CVMFS has finally updated so the new agent is there. You should see different messages now. The job wrapper is running but we are checking the actual jobs.

Seems to do something somewhere:

[04/08/15 21:55:31] returncode was: 0
[04/08/15 21:55:31] --- we had a job exeption!
[04/08/15 21:55:31] output was: 
[04/08/15 21:55:31] None
[04/08/15 21:55:31] --- error output was: 
[04/08/15 21:55:31] None
[04/08/15 21:55:31] Done with CMS Job, uploading results to somewehere...
[04/08/15 22:02:33] returncode was: 0
[04/08/15 22:02:33] --- we had a job exeption!
[04/08/15 22:02:33] output was: 
[04/08/15 22:02:33] None
[04/08/15 22:02:33] --- error output was: 
[04/08/15 22:02:33] None
[04/08/15 22:02:33] Done with CMS Job, uploading results to somewehere...


and

[04/08/15 21:49:49] Self-signed certificate encountered.
[04/08/15 21:49:49] HTTP request sent, awaiting response... 200 OK
[04/08/15 21:49:49] Length: 53577 (52K) [application/x-sh]
[04/08/15 21:49:49] Saving to: “glidein_startup.sh”
[04/08/15 21:49:49] 
[04/08/15 21:49:49] 0K .......... .......... .......... .......... .......... 95% 866K 0s
[04/08/15 21:49:49] 50K .. 100% 4428G=0.06s
[04/08/15 21:49:49] 
[04/08/15 21:49:49] 2015-08-04 21:49:10 (906 KB/s) - “glidein_startup.sh” saved [53577/53577]
[04/08/15 21:49:49] 
[04/08/15 21:57:03] --2015-08-04 21:57:03-- https://lcggwms02.gridpp.rl.ac.uk:9817/glidein_startup.sh
[04/08/15 21:57:03] Resolving lcggwms02.gridpp.rl.ac.uk... 130.246.180.120
[04/08/15 21:57:03] Connecting to lcggwms02.gridpp.rl.ac.uk|130.246.180.120|:9817... connected.
[04/08/15 21:57:03] WARNING: cannot verify lcggwms02.gridpp.rl.ac.uk’s certificate, issued by “/C=UK/O=eScienceCA/OU=Authority/CN=UK e-Science CA 2B”:
[04/08/15 21:57:03] Self-signed certificate encountered.
[04/08/15 21:57:03] HTTP request sent, awaiting response... 200 OK
[04/08/15 21:57:03] Length: 53577 (52K) [application/x-sh]
[04/08/15 21:57:03] Saving to: “glidein_startup.sh”
[04/08/15 21:57:03] 
[04/08/15 21:57:03] 0K .......... .......... .......... .......... .......... 95% 1.23M 0s
[04/08/15 21:57:04] 50K .. 100% 4428G=0.04s
[04/08/15 21:57:04] 
[04/08/15 21:57:04] 2015-08-04 21:57:03 (1.29 MB/s) - “glidein_startup.sh” saved [53577/53577]
[04/08/15 21:57:04] 
[04/08/15 22:04:03] --2015-08-04 22:04:03-- https://lcggwms02.gridpp.rl.ac.uk:9817/glidein_startup.sh
[04/08/15 22:04:03] Resolving lcggwms02.gridpp.rl.ac.uk... 130.246.180.120
[04/08/15 22:04:03] Connecting to lcggwms02.gridpp.rl.ac.uk|130.246.180.120|:9817... connected.
[04/08/15 22:04:03] WARNING: cannot verify lcggwms02.gridpp.rl.ac.uk’s certificate, issued by “/C=UK/O=eScienceCA/OU=Authority/CN=UK e-Science CA 2B”:
[04/08/15 22:04:03] Self-signed certificate encountered.
[04/08/15 22:04:03] HTTP request sent, awaiting response... 200 OK
[04/08/15 22:04:03] Length: 53577 (52K) [application/x-sh]
[04/08/15 22:04:03] Saving to: “glidein_startup.sh”
[04/08/15 22:04:03] 
[04/08/15 22:04:03] 0K .......... .......... .......... .......... .......... 95% 1.14M 0s
[04/08/15 22:04:03] 50K .. 100% 4428G=0.04s
[04/08/15 22:04:03] 
[04/08/15 22:04:03] 2015-08-04 22:04:03 (1.20 MB/s) - “glidein_startup.sh” saved [53577/53577]
[04/08/15 22:04:03] 
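To see how often the wrapper retries, the gaps between repeated lines in a console log like the one above can be computed with a short script. This is only a sketch: it assumes every line starts with a `[dd/mm/yy hh:mm:ss]` stamp (day first, as the wget dates in the log suggest), and the function names are hypothetical.

```python
import re
from datetime import datetime

# Matches the "[04/08/15 21:55:31] message" layout seen in the console log.
LINE = re.compile(r"^\[(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] ?(.*)$")


def parse_line(line):
    """Return (timestamp, message), or None if the line has no stamp."""
    match = LINE.match(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%d/%m/%y %H:%M:%S")
    return stamp, match.group(2)


def seconds_between(lines, marker):
    """Gaps in seconds between consecutive lines whose message contains marker."""
    parsed = (parse_line(line) for line in lines)
    stamps = [p[0] for p in parsed if p and marker in p[1]]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


sample = [
    "[04/08/15 21:55:31] returncode was: 0",
    "[04/08/15 21:55:31] Done with CMS Job, uploading results to somewehere...",
    "[04/08/15 22:02:33] returncode was: 0",
    "[04/08/15 22:02:33] Done with CMS Job, uploading results to somewehere...",
]
print(seconds_between(sample, "Done with CMS Job"))  # [422.0]
```

On the excerpt above this gives a gap of about seven minutes between job attempts.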
ID: 506 · Rating: 0

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 507 - Posted: 4 Aug 2015, 21:51:27 UTC - in response to Message 505.  

This could be an issue. We had something similar before where due to some CMS configuration magic it would work if you were associated with a traditional Grid site but would fail for people at home. Will check this in more detail tomorrow.
ID: 507 · Rating: 0

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 134
Message 508 - Posted: 4 Aug 2015, 23:52:56 UTC - in response to Message 507.  

This could be an issue. We had something similar before where due to some CMS configuration magic it would work if you were associated with a traditional Grid site but would fail for people at home. Will check this in more detail tomorrow.


The setup sequence waits at the X509 setup script for a very long time (~30 mins).

I caught these error messages:-

ID: 508 · Rating: 0

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 753
Credit: 11,734,345
RAC: 9,133
Message 509 - Posted: 5 Aug 2015, 5:40:24 UTC - in response to Message 507.  

This could be an issue. We had something similar before where due to some CMS configuration magic it would work....



No doubt about it
ID: 509 · Rating: 0

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 510 - Posted: 5 Aug 2015, 9:38:09 UTC - in response to Message 509.  

Many jobs seem to be running but failing for memory reasons. We will need to increase the memory of the VM to 2 GB, which requires a new release. Shortly we will also update the consoles to show better logging.
ID: 510 · Rating: 0

Phil
Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 511 - Posted: 5 Aug 2015, 10:34:14 UTC
Last modified: 5 Aug 2015, 10:38:07 UTC

I'm seeing VBOX start a job every 7-8 minutes, which runs for about 30 s.
ID: 511 · Rating: 0

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 848,858
RAC: 1,746
Message 512 - Posted: 5 Aug 2015, 10:36:06 UTC - in response to Message 510.  

Many jobs seem to be running but failing for memory reasons. We will need to increase the memory of the VM to 2 GB, which requires a new release. Shortly we will also update the consoles to show better logging.

You could also consider upgrading vboxwrapper to version 26169.
Rom (BOINC) fixed several issues compared with your stock wrapper 26165 and the newest wrapper is also compatible with VirtualBox 5.0.
ID: 512 · Rating: 0

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 513 - Posted: 5 Aug 2015, 10:38:45 UTC - in response to Message 512.  

Will do it in the next iteration. I have just made the new version (46.16) with 2 GB. Will try to make some more improvements this afternoon.
ID: 513 · Rating: 0

Phil
Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 514 - Posted: 5 Aug 2015, 12:14:37 UTC
Last modified: 5 Aug 2015, 12:18:35 UTC

Aha!
Seem to be generating output called <OSGTestResults> now.
ID: 514 · Rating: 0

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 142
Message 515 - Posted: 5 Aug 2015, 15:24:34 UTC - in response to Message 514.  

OK. If everyone looks at the top output (console 3), you should see the cmsRun process. If not, let me know. Tomorrow I will try to ensure that the interesting log files are available for download via the graphics feature so that we can investigate any issues.
ID: 515 · Rating: 0

Phil
Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 516 - Posted: 5 Aug 2015, 15:36:45 UTC
Last modified: 5 Aug 2015, 15:40:49 UTC

Wow!!!
Current cmsRun has been going for 50 mins, load average 1.4...
ID: 516 · Rating: 0

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 848,858
RAC: 1,746
Message 517 - Posted: 5 Aug 2015, 16:07:33 UTC - in response to Message 515.  

OK. If everyone looks at the top output (console 3), you should see the cmsRun process. If not, let me know.

No cmsRun process at all. Sleeping time increasing:

ID: 517 · Rating: 0

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,926,620
RAC: 2,840
Message 519 - Posted: 5 Aug 2015, 16:16:39 UTC - in response to Message 517.  

OK. If everyone looks at the top output (console 3), you should see the cmsRun process. If not, let me know.
That's Alt+F3, if you weren't aware.

No cmsRun process at all. Sleeping time increasing:
It does that on my Linux box, then gets working ~30 mins later.
ID: 519 · Rating: 0

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 848,858
RAC: 1,746
Message 520 - Posted: 5 Aug 2015, 16:34:57 UTC - in response to Message 519.  
Last modified: 5 Aug 2015, 17:05:31 UTC

No cmsRun process at all. Sleeping time increasing:
It does that on my Linux box, then gets working ~30 mins later.

After 21 minutes there was activity from the cvmfs2 processes, then after a while glidein_startup, condor_startup, and short periods of high CPU usage from the condor_mips and condor_kflops processes.
Didn't see any trace of a cmsRun process. Uptime on my Windows host is 40 minutes now.

Edit: After 1hr+ I see cmsRun for the first time with about 10% CPU and then increasing up to 93% and keeps on running ... now 2m39s
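Rather than watching top on console 3 by hand, the same check could be scripted inside the VM. The sketch below assumes a process listing in the three-column layout produced by `ps -eo pid,pcpu,comm` on Linux; the function name is hypothetical and the sample listing is made up apart from the process names mentioned in this thread.

```python
def find_process(ps_output, name="cmsRun"):
    """Return (pid, cpu_percent) for each listed process whose command equals name."""
    hits = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)         # pid, %cpu, command
        if len(fields) == 3 and fields[2].strip() == name:
            hits.append((int(fields[0]), float(fields[1])))
    return hits


# Made-up listing in the assumed "pid,pcpu,comm" layout.
sample = """  PID %CPU COMMAND
  952  0.0 condor_master
 3141 93.0 cmsRun
"""
print(find_process(sample))  # [(3141, 93.0)]
```

An empty list means no cmsRun process has started yet, which, going by the reports above, can be normal for the first 30 to 60 minutes.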
ID: 520 · Rating: 0