Agent Update
Author | Message |
---|---|
Joined: 12 Sep 14 Posts: 1067 Credit: 329,589 RAC: 142 |
We will shortly be updating the agent that runs in the virtual machine. It should start using the new infrastructure that we have been working on recently. It may not work the first time, so if you have any feedback, please reply to this post. |
Joined: 12 Sep 14 Posts: 1067 Credit: 329,589 RAC: 142 |
CVMFS has finally updated so the new agent is there. You should see different messages now. The job wrapper is running but we are checking the actual jobs. |
Joined: 15 Apr 15 Posts: 38 Credit: 227,251 RAC: 0 |
Seems to be stuck on: Condor started in background, now waiting on process 952 |
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 134 |
Not (yet) a problem, but a related point. More volunteers may now participate in CMS. Presumably those using Windows still need to use either the "CMS patch" (BOINC 7.5.1) or a later BOINC version to avoid interference with other projects. The link to the patch given on the CMS main page has not been reinstated, nor is the patch now available from the download page. It would be a good idea either to provide a working link to the patch or to alter the wording to point to appropriate BOINC versions for Windows. |
Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
Not (yet) a problem, but a related point. More volunteers may now participate in CMS. Presumably those using Windows still need to use either the "CMS patch" (BOINC 7.5.1) or a later BOINC version to avoid interference with other projects. The link to the patch given on the CMS main page has not been reinstated, nor is the patch now available from the download page. It would be a good idea either to provide a working link to the patch or to alter the wording to point to appropriate BOINC versions for Windows. That would, currently, be BOINC v7.6.6, available from http://boinc.berkeley.edu/download_all.php. Except we found a new bug in that one yesterday (it won't affect CMS-dev, unless members also run multiple GPUs). For that, there's a new patch: http://www.romwnet.org/files/boinc.030815.x64.zip. That's downloading from Rom Walton's personal webspace, as there are currently technical problems restricting access to the BOINC server. |
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 134 |
Seems to be stuck on: And mine is similarly stuck, on process 9816. I also see the message: "Unsupported GLIDEIN_SiteWMS encountered". Is this a problem? |
Joined: 13 Feb 15 Posts: 1185 Credit: 848,858 RAC: 1,746 |
CVMFS has finally updated so the new agent is there. You should see different messages now. The job wrapper is running but we are checking the actual jobs. Seems to do something somewhere:
[04/08/15 21:55:31] returncode was: 0
[04/08/15 21:55:31] --- we had a job exeption!
[04/08/15 21:55:31] output was:
[04/08/15 21:55:31] None
[04/08/15 21:55:31] --- error output was:
[04/08/15 21:55:31] None
[04/08/15 21:55:31] Done with CMS Job, uploading results to somewehere...
[04/08/15 22:02:33] returncode was: 0
[04/08/15 22:02:33] --- we had a job exeption!
[04/08/15 22:02:33] output was:
[04/08/15 22:02:33] None
[04/08/15 22:02:33] --- error output was:
[04/08/15 22:02:33] None
[04/08/15 22:02:33] Done with CMS Job, uploading results to somewehere...
and
[04/08/15 21:49:49] Self-signed certificate encountered.
[04/08/15 21:49:49] HTTP request sent, awaiting response... 200 OK
[04/08/15 21:49:49] Length: 53577 (52K) [application/x-sh]
[04/08/15 21:49:49] Saving to: “glidein_startup.sh”
[04/08/15 21:49:49]
[04/08/15 21:49:49] 0K .......... .......... .......... .......... .......... 95% 866K 0s
[04/08/15 21:49:49] 50K .. 100% 4428G=0.06s
[04/08/15 21:49:49]
[04/08/15 21:49:49] 2015-08-04 21:49:10 (906 KB/s) - “glidein_startup.sh” saved [53577/53577]
[04/08/15 21:49:49]
[04/08/15 21:57:03] --2015-08-04 21:57:03-- https://lcggwms02.gridpp.rl.ac.uk:9817/glidein_startup.sh
[04/08/15 21:57:03] Resolving lcggwms02.gridpp.rl.ac.uk... 130.246.180.120
[04/08/15 21:57:03] Connecting to lcggwms02.gridpp.rl.ac.uk|130.246.180.120|:9817... connected.
[04/08/15 21:57:03] WARNING: cannot verify lcggwms02.gridpp.rl.ac.uk’s certificate, issued by “/C=UK/O=eScienceCA/OU=Authority/CN=UK e-Science CA 2B”:
[04/08/15 21:57:03] Self-signed certificate encountered.
[04/08/15 21:57:03] HTTP request sent, awaiting response... 200 OK
[04/08/15 21:57:03] Length: 53577 (52K) [application/x-sh]
[04/08/15 21:57:03] Saving to: “glidein_startup.sh”
[04/08/15 21:57:03]
[04/08/15 21:57:03] 0K .......... .......... .......... .......... .......... 95% 1.23M 0s
[04/08/15 21:57:04] 50K .. 100% 4428G=0.04s
[04/08/15 21:57:04]
[04/08/15 21:57:04] 2015-08-04 21:57:03 (1.29 MB/s) - “glidein_startup.sh” saved [53577/53577]
[04/08/15 21:57:04]
[04/08/15 22:04:03] --2015-08-04 22:04:03-- https://lcggwms02.gridpp.rl.ac.uk:9817/glidein_startup.sh
[04/08/15 22:04:03] Resolving lcggwms02.gridpp.rl.ac.uk... 130.246.180.120
[04/08/15 22:04:03] Connecting to lcggwms02.gridpp.rl.ac.uk|130.246.180.120|:9817... connected.
[04/08/15 22:04:03] WARNING: cannot verify lcggwms02.gridpp.rl.ac.uk’s certificate, issued by “/C=UK/O=eScienceCA/OU=Authority/CN=UK e-Science CA 2B”:
[04/08/15 22:04:03] Self-signed certificate encountered.
[04/08/15 22:04:03] HTTP request sent, awaiting response... 200 OK
[04/08/15 22:04:03] Length: 53577 (52K) [application/x-sh]
[04/08/15 22:04:03] Saving to: “glidein_startup.sh”
[04/08/15 22:04:03]
[04/08/15 22:04:03] 0K .......... .......... .......... .......... .......... 95% 1.14M 0s
[04/08/15 22:04:03] 50K .. 100% 4428G=0.04s
[04/08/15 22:04:03]
[04/08/15 22:04:03] 2015-08-04 22:04:03 (1.20 MB/s) - “glidein_startup.sh” saved [53577/53577]
[04/08/15 22:04:03] |
Joined: 12 Sep 14 Posts: 1067 Credit: 329,589 RAC: 142 |
This could be an issue. We had something similar before where, due to some CMS configuration magic, it would work if you were associated with a traditional Grid site but would fail for people at home. Will check this in more detail tomorrow. |
Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 134 |
This could be an issue. We had something similar before where, due to some CMS configuration magic, it would work if you were associated with a traditional Grid site but would fail for people at home. Will check this in more detail tomorrow. The setup sequence waits at the X509 setup script for a very long time (~30 mins). I caught these error messages: |
Joined: 8 Apr 15 Posts: 753 Credit: 11,734,345 RAC: 9,133 |
This could be an issue. We had something similar before where due to some CMS configuration magic it would work.... No doubt about it |
Joined: 12 Sep 14 Posts: 1067 Credit: 329,589 RAC: 142 |
Many jobs seem to be running but failing for memory reasons. We will need to increase the memory of the VM to 2 GB, which requires a new release. Shortly we will also update the consoles to show better logging. |
Joined: 9 Apr 15 Posts: 57 Credit: 230,221 RAC: 0 |
I'm seeing VBOX start a job every 7-8 minutes, which runs for about 30 s. |
Joined: 13 Feb 15 Posts: 1185 Credit: 848,858 RAC: 1,746 |
Many jobs seem to be running but failing for memory reasons. We will need to increase the memory of the VM to 2 GB, which requires a new release. Shortly we will also update the consoles to show better logging. You could also consider upgrading vboxwrapper to version 26169. Rom (BOINC) fixed several issues compared with your stock wrapper 26165, and the newest wrapper is also compatible with VirtualBox 5.0. |
Joined: 12 Sep 14 Posts: 1067 Credit: 329,589 RAC: 142 |
Will do it in the next iteration. I have just made the new version (46.16) with 2 GB. Will try to make some more improvements this afternoon. |
Joined: 9 Apr 15 Posts: 57 Credit: 230,221 RAC: 0 |
Aha! Seem to be generating output called <OSGTestResults> now. |
Joined: 12 Sep 14 Posts: 1067 Credit: 329,589 RAC: 142 |
OK. If everyone looks at the top output (console 3), you should see the cmsRun process. If not, let me know. Tomorrow I will try to ensure that the interesting log files are available for download via the graphics feature so that we can investigate any issues. |
Joined: 9 Apr 15 Posts: 57 Credit: 230,221 RAC: 0 |
Wow!!! The current cmsRun has been going for 50 mins, loadav 1.4... |
Joined: 13 Feb 15 Posts: 1185 Credit: 848,858 RAC: 1,746 |
OK. If everyone looks at the top output (console 3), you should see the cmsRun process. If not, let me know. No cmsRun process at all. Sleeping time increasing: |
Joined: 20 Jan 15 Posts: 1129 Credit: 7,926,620 RAC: 2,840 |
OK. If everyone looks at the top output (console 3), you should see the cmsRun process. If not, let me know. That's Alt+F3, if you weren't aware. No cmsRun process at all. Sleeping time increasing: It does that on my Linux box, then gets working ~30 mins later. |
Joined: 13 Feb 15 Posts: 1185 Credit: 848,858 RAC: 1,746 |
No cmsRun process at all. Sleeping time increasing: It does that on my Linux box, then gets working ~30 mins later. After 21 minutes came activity from the cvmfs2 processes, then after a while glidein_startup, condor_startup and short periods of high CPU usage from the condor_mips and condor_kflops processes. Didn't see any trace of a cmsRun process. Uptime on my Windows host is 40 minutes now. Edit: After 1hr+ I see cmsRun for the first time, with about 10% CPU and then increasing up to 93%, and it keeps on running ... now 2m39s |
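
For anyone wanting to check the "new job every 7-8 minutes" observation against their own console output, here is a minimal sketch. It assumes log entries of the form `[dd/mm/yy HH:MM:SS] message`, as in the log posted earlier in this thread, and that "Done with CMS Job" marks the end of each attempt. The function names `job_end_times` and `intervals_seconds` are made up for illustration and are not part of the agent or of BOINC.

```python
import re
from datetime import datetime

# Matches "[dd/mm/yy HH:MM:SS] message" entries; the message runs up to the
# next "[" so several entries can sit on one line.
ENTRY = re.compile(r"\[(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] ([^\[]*)")


def job_end_times(log_text, marker="Done with CMS Job"):
    """Return the timestamps of entries whose message contains `marker`."""
    times = []
    for stamp, message in ENTRY.findall(log_text):
        if marker in message:
            # Day comes first: the wget lines in the same log show 2015-08-04.
            times.append(datetime.strptime(stamp, "%d/%m/%y %H:%M:%S"))
    return times


def intervals_seconds(times):
    """Return the gaps, in seconds, between consecutive timestamps."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


# Two attempts copied from the log quoted in this thread.
sample = (
    "[04/08/15 21:55:31] returncode was: 0 "
    "[04/08/15 21:55:31] Done with CMS Job, uploading results to somewehere... "
    "[04/08/15 22:02:33] returncode was: 0 "
    "[04/08/15 22:02:33] Done with CMS Job, uploading results to somewehere..."
)

print(intervals_seconds(job_end_times(sample)))  # [422.0]
```

The two attempts in the sample are 422 s apart, i.e. just over 7 minutes, which is consistent with what was reported above. Paste a longer stretch of your own console log into `sample` to see whether the interval holds.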
©2024 CERN