New developments
Author | Message |
---|---|
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 728 |
We're at the stage where we have to make disruptive changes to the workflow, in order to get the results onto the Grid from the data-bridge. At some point soon we'll start getting errors for jobs in the current batch, at which time I'll ditch the rest and submit a small test batch. If we're lucky that may be the end of it, we'll have to see. Thanks in advance for your understanding. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 728 |
Oops, we do seem to have broken it... Waiting for feedback from the experts. Later: modified script submitted. Even later: unsuccessfully... :-( |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 728 |
Oh, well, looks like a quiet night. Take a rest, everyone. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 728 |
Just a note, after a very long day (up at 0400 to go to CERN, back home at 2345, still "catching up" with things at [...]). You'll have probably noticed the strange behaviour of the "CMS Jobs" graphs. We made an (essential) change, but it's b0rked things. We're trying to work out what, and the best debug tool at the moment seems to be small batches of short jobs. Hence the weird Dashboard reportage. As ever, we value your participation and your perseverance. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
How did your presentation go? |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 728 |
"How did your presentation go?" Reasonably well. As Ben said afterwards, "At least they didn't tell you to stop!" It's hard to get the discussion going in the way we want, there is a lot of concern about the validity of the results [e.g. malicious Trojan horses] and this skews the dialogue. Still, a lot of new ideas to think upon. You will probably see some changes in the weeks ahead, perhaps even progress on more-than-one-job-at-a-time. |
Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
"Reasonably well. As Ben said afterwards, 'At least they didn't tell you to stop!'" It's good to hear that Ben is still in good shape ;) |
Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
So, at the moment, are we running or not? I see this in my logs:
12:18:01 +0200 2015-10-19 [INFO] CMS glidein Run 13 ended
12:19:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 14
12:19:01 +0200 2015-10-19 [INFO] Reading the BOINC volunteer's information
12:19:02 +0200 2015-10-19 [INFO] Volunteer: Yeti (250) Host: 495
12:19:02 +0200 2015-10-19 [INFO] VMID: a248a608-bb13-4ecc-8fba-70015f0a4b90
12:19:02 +0200 2015-10-19 [INFO] Requesting an X509 credential
subject : /O=Volunteer Computing/O=CERN/CN=Yeti 250/CN=1181170921
issuer : /O=Volunteer Computing/O=CERN/CN=Yeti 250
identity : /O=Volunteer Computing/O=CERN/CN=Yeti 250
type : RFC 3820 compliant impersonation proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 129:03:58 (5.4 days)
12:19:06 +0200 2015-10-19 [INFO] Downloading glidein
12:19:07 +0200 2015-10-19 [INFO] Running glidein (check logs)
12:25:01 +0200 2015-10-19 [INFO] CMS glidein Run 14 ended
12:26:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 15
12:26:01 +0200 2015-10-19 [INFO] Reading the BOINC volunteer's information
12:26:02 +0200 2015-10-19 [INFO] Volunteer: Yeti (250) Host: 495
12:26:02 +0200 2015-10-19 [INFO] VMID: a248a608-bb13-4ecc-8fba-70015f0a4b90
12:26:02 +0200 2015-10-19 [INFO] Requesting an X509 credential
subject : /O=Volunteer Computing/O=CERN/CN=Yeti 250/CN=30085940
issuer : /O=Volunteer Computing/O=CERN/CN=Yeti 250
identity : /O=Volunteer Computing/O=CERN/CN=Yeti 250
type : RFC 3820 compliant impersonation proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 130:00:00 (5.4 days)
12:26:02 +0200 2015-10-19 [INFO] Downloading glidein
12:26:03 +0200 2015-10-19 [INFO] Running glidein (check logs)
12:31:01 +0200 2015-10-19 [INFO] CMS glidein Run 15 ended
12:32:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 16
12:32:01 +0200 2015-10-19 [INFO] Reading the BOINC volunteer's information
12:32:02 +0200 2015-10-19 [INFO] Volunteer: Yeti (250) Host: 495
12:32:02 +0200 2015-10-19 [INFO] VMID: a248a608-bb13-4ecc-8fba-70015f0a4b90
12:32:02 +0200 2015-10-19 [INFO] Requesting an X509 credential
12:32:03 +0200 2015-10-19 [ERROR] Proxy error
12:32:03 +0200 2015-10-19 [INFO] Going to sleep for 1 hour
12:33:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 17
12:33:01 +0200 2015-10-19 [INFO] Reading the BOINC volunteer's information
12:33:02 +0200 2015-10-19 [INFO] Volunteer: Yeti (250) Host: 495
12:33:02 +0200 2015-10-19 [INFO] VMID: a248a608-bb13-4ecc-8fba-70015f0a4b90
12:33:02 +0200 2015-10-19 [INFO] Requesting an X509 credential
subject : /O=Volunteer Computing/O=CERN/CN=Yeti 250/CN=30085940
issuer : /O=Volunteer Computing/O=CERN/CN=Yeti 250
identity : /O=Volunteer Computing/O=CERN/CN=Yeti 250
type : RFC 3820 compliant impersonation proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 129:53:00 (5.4 days)
12:33:02 +0200 2015-10-19 [INFO] Downloading glidein
12:33:03 +0200 2015-10-19 [INFO] Running glidein (check logs)
and in the stderr:
ERROR: Couldn't read proxy from: /tmp/x509up_u500
globus_credential: Error reading proxy credential
globus_credential: Error reading proxy credential: Couldn't read PEM from bio
OpenSSL Error: pem_lib.c:703: in library: PEM routines, function PEM_read_bio: no start line
Use -debug for further information.
ERROR: Couldn't read proxy from: /tmp/x509up_u500
globus_credential: Error reading proxy credential
globus_credential: Error reading proxy credential: Couldn't read PEM from bio
OpenSSL Error: pem_lib.c:703: in library: PEM routines, function PEM_read_bio: no start line
Use -debug for further information.
ERROR: Couldn't read proxy from: /tmp/x509up_u500
globus_credential: Error reading proxy credential
globus_credential: Error reading proxy credential: Couldn't read PEM from bio
OpenSSL Error: pem_lib.c:703: in library: PEM routines, function PEM_read_bio: no start line
Use -debug for further information.
-----------------------------
Should we go standby or stay online? |
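For anyone wanting to check their own logs the same way, the timestamped entries quoted above follow a regular `time offset date [LEVEL] message` pattern, so they can be summarised per run. What follows is only a rough sketch in Python, assuming the line format seen in this post; the function name `summarise_runs` and the shortened `SAMPLE` text are made up for illustration and are not part of BOINC or the CMS wrapper.

```python
import re
from datetime import datetime

# Matches lines such as:
#   12:19:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 14
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2} [+-]\d{4} \d{4}-\d{2}-\d{2}) \[(\w+)\] (.*)$"
)
START_RE = re.compile(r"Starting CMS Application - Run (\d+)")
END_RE = re.compile(r"CMS glidein Run (\d+) ended")


def summarise_runs(log_text):
    """Return {run_number: {"start", "end", "errors"}} for a wrapper log."""
    runs = {}
    current = None  # run number of the most recent "Starting" line
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # credential details and other lines without a timestamp
        stamp = datetime.strptime(match.group(1), "%H:%M:%S %z %Y-%m-%d")
        level, message = match.group(2), match.group(3)
        started = START_RE.search(message)
        ended = END_RE.search(message)
        if started:
            current = int(started.group(1))
            runs[current] = {"start": stamp, "end": None, "errors": []}
        elif ended:
            run = int(ended.group(1))
            runs.setdefault(run, {"start": None, "end": None, "errors": []})
            runs[run]["end"] = stamp
        elif level == "ERROR" and current is not None:
            runs[current]["errors"].append(message)
    return runs


# Shortened excerpt of the log format, for illustration only.
SAMPLE = """\
12:18:01 +0200 2015-10-19 [INFO] CMS glidein Run 13 ended
12:19:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 14
12:25:01 +0200 2015-10-19 [INFO] CMS glidein Run 14 ended
12:32:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 16
12:32:03 +0200 2015-10-19 [ERROR] Proxy error
12:32:03 +0200 2015-10-19 [INFO] Going to sleep for 1 hour
12:33:01 +0200 2015-10-19 [INFO] Starting CMS Application - Run 17
"""

for run, info in sorted(summarise_runs(SAMPLE).items()):
    if info["start"] and info["end"]:
        length = (info["end"] - info["start"]).total_seconds()
    else:
        length = None  # run started or ended outside the excerpt
    print(run, length, info["errors"])
```

Runs that end only a few minutes after starting, or that carry an `[ERROR]` entry, are the ones worth reporting here.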
Joined: 17 Aug 15 Posts: 62 Credit: 296,695 RAC: 0 |
Since I cannot see the consoles due to RDP not existing on my Windows 10 Home edition, I cannot understand what is going or not going on. I see the two consoles on the Challenge tasks using Databridge and I see the charts on standard vLHC@home, nothing on Atlas@home. I used to see all on Windows 8.1 provided by HP, then Microsoft "updated" my Windows and all went down the river. Tullio |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 728 |
Sorry, we are still having problems. There's a small batch of 5 running at the moment which is finishing jobs but failing on transfer to the databridge, so they are resubmitting to Condor. Awaiting the experts' decision on what to do next. |
Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
Okay, I have set "No New Work" for CMS. Let me know if it makes sense to return to crunching. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 728 |
OK, we are starting to get some successful jobs through now. However, there are a large number of curious failures leading to retries, so I'm not going to unleash a large batch just yet. I think I have to analyse the failures to see if there is any commonality -- it might be just one or two hosts misconfigured somehow, or short of the requirements. Many just stop within a few seconds of the main cmsRun process starting. |
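One simple way to look for that kind of commonality is to count failures per host and compare each host's failure rate with its share of all failures. The sketch below is only an illustration in Python: the record layout (`host`, `status`), the function name `failure_summary`, and the host numbers are all invented, since the real job logs are not shown in this thread.

```python
from collections import Counter


def failure_summary(job_records):
    """Rank hosts by failed jobs, with failure rate and share of all failures."""
    total = Counter()
    failed = Counter()
    for rec in job_records:
        total[rec["host"]] += 1
        if rec["status"] == "failed":
            failed[rec["host"]] += 1
    all_failed = sum(failed.values())
    rows = []
    for host, n_failed in failed.most_common():
        rows.append({
            "host": host,
            "failed": n_failed,
            # fraction of this host's jobs that failed
            "failure_rate": n_failed / total[host],
            # fraction of all failures that came from this host
            "share_of_failures": n_failed / all_failed,
        })
    return rows


# Invented records, purely for illustration.
RECORDS = [
    {"host": 1001, "status": "failed"},
    {"host": 1001, "status": "failed"},
    {"host": 1001, "status": "failed"},
    {"host": 1002, "status": "ok"},
    {"host": 1002, "status": "ok"},
    {"host": 1002, "status": "failed"},
    {"host": 1002, "status": "ok"},
    {"host": 1003, "status": "ok"},
    {"host": 1003, "status": "ok"},
]

for row in failure_summary(RECORDS):
    print(row)
```

A host with both a high failure rate and a large share of the failures would be the first candidate for a misconfiguration check; failures spread evenly across hosts would point back at the workflow instead.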
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
"It's hard to get the discussion going in the way we want, there is a lot of concern about the validity of the results [e.g. malicious Trojan horses] and this skews the dialogue" What about Atlas? They do have the same situation. Is nobody concerned about that? |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 728 |
"It's hard to get the discussion going in the way we want, there is a lot of concern about the validity of the results [e.g. malicious Trojan horses] and this skews the dialogue" That's probably something for someone like Ben to comment on, but from what I've seen their "success rate" is also less than what we have when things are going tickety-boo (not tohuwabohu!). I am finally getting some analyses done, but severely hampered by a low-bandwidth connexion to our NFS file-server and at least three jobs wanting to use all of that 100 Mbps link! At the moment, eyeballing a few output graphs, the differences between results from CMS@Home and jobs submitted to the Grid are probably statistically insignificant. This may finally be the chance to use my infamous Energy Test for 2-D histogram comparisons in anger. :-) There is the point, though, that one proposed use of CMS@Home is to look for very rare events. Currently one rogue result is buried in the statistics of tens of thousands of other jobs; when you are looking for one event in 100 million that rogue job might become more significant. |
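The Energy Test mentioned above is not described in this thread, so the following is not that implementation. As a rough illustration of the general idea, here is a Python sketch of the generic energy distance between two 2-D histograms, 2·E|X−Y| − E|X−X'| − E|Y−Y'|, treating each bin as a point mass at its centre (an assumption made here for simplicity). The function name `energy_distance` and the toy histograms are invented.

```python
import math


def energy_distance(hist_a, hist_b):
    """Energy distance between two 2-D histograms.

    Each histogram maps a bin centre (x, y) to a non-negative count.
    Counts are normalised, so only the shapes are compared.
    """
    def normalise(hist):
        total = sum(hist.values())
        if total <= 0:
            raise ValueError("histogram is empty")
        return [(centre, count / total)
                for centre, count in hist.items() if count > 0]

    a = normalise(hist_a)
    b = normalise(hist_b)

    def mean_distance(p, q):
        # Weighted mean Euclidean distance between bin centres of p and q.
        return sum(wp * wq * math.dist(cp, cq)
                   for cp, wp in p for cq, wq in q)

    return 2.0 * mean_distance(a, b) - mean_distance(a, a) - mean_distance(b, b)


# Toy histograms, purely for illustration.
GRID = {(0.0, 0.0): 10, (1.0, 0.0): 20, (0.0, 1.0): 20, (1.0, 1.0): 10}
GRID_SCALED = {centre: 3 * count for centre, count in GRID.items()}
SKEWED = {(0.0, 0.0): 40, (1.0, 0.0): 5, (0.0, 1.0): 5, (1.0, 1.0): 10}

print(energy_distance(GRID, GRID_SCALED))
print(energy_distance(GRID, SKEWED))
```

The statistic is zero for histograms of the same shape and grows as they differ. Turning it into a significance statement would need a reference distribution, for example from a permutation test, which is not shown here.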
Joined: 17 Aug 15 Posts: 62 Credit: 296,695 RAC: 0 |
I got a computation error with the new wrapper. vLHC@home and Atlas@home tasks are running fine despite reboots by Windows 10 for unknown reasons, not related to updates. Tullio |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 728 |
"I got a computation error with the new wrapper. vLHC@home and Atlas@home tasks running fine despite reboots by Windows 10 for unknown reasons, not related to updates," Have you got a job number for that, Tullio, so I can check my logs? |
Joined: 8 Apr 15 Posts: 781 Credit: 12,324,905 RAC: 1,506 |
http://boincai05.cern.ch/CMS-dev/results.php?userid=192 I have never had a problem here with these tasks using Win7 and Win10, and in the past Win8.1. And Windows 10 does not do reboots every day, so that is another problem. I have 3 machines running Windows 10 right now and they all run vLHC, LHC, CMS, and Atlas 24/7. Mad Scientist For Life |
Joined: 8 Apr 15 Posts: 781 Credit: 12,324,905 RAC: 1,506 |
(I guess I should add that after I said that, I decided to update my VB version and do some other updating on one of my Win7 machines, so I had to do a clean install. As usual, you may have to give it a couple of tries to get back up and running, so just ignore that part.) |
Joined: 17 Aug 15 Posts: 62 Credit: 296,695 RAC: 0 |
68091 |
Joined: 8 Apr 15 Posts: 781 Credit: 12,324,905 RAC: 1,506 |
"68091" http://boincai05.cern.ch/CMS-dev/result.php?resultid=68091 Ah, OK, I see what you are trying to say. Well, it does start off by saying "Exit status 194 (0xc2) EXIT_ABORTED_BY_CLIENT" and then "Error Description: The session is not locked (session state: Unlocked)". Did you check the VB Manager before starting a new task? VB never likes starting, stopping, and reboots during tasks, and VB is famous for having problems before anything else on your PC. Have you done a security scan of any type recently? I usually haven't used the newest versions of VB or BOINC over the years, but lately, since I run so many VB tasks, I have been, just to see if they are more reliable. No problems with the newest VB or BOINC lately. |
©2024 CERN