Jobs incoming!
Author | Message |
---|---|
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
Glad there's progress again. Hopefully you are able to iron out all important issues, so you can start your BOINC <==> GRID comparison. Tja, I ran the GRID jobs on Friday night, expecting the CMS@Home to run at the same time -- but then I found the recent problems. So, current run is to compare with what I found on Friday night. Unfortunately, that finished in about seven hours (as I said earlier, there were 1600 jobs running simultaneously at one point). It's a pity that the recent difficulties led many people to switch off, and alerts can only reach people who are (ahem!) alert. OK, we're up to 24 jobs running now, not as many as earlier last week (80 or so) but we have to take what we get. I see 61 results (of 2000 submitted) returned so far. We're not going to win on turnaround -- we never really expected to. What I'm looking for now is completeness (1.5% of the GRID jobs didn't complete) and statistical similarity -- which obviously requires me to run further analysis programmes once the results are in. Expect another delay once these results are in, because I'll be collating the data for official reports, but by then I hope the path is clearer for us to move from the current pre-beta to a short beta; I really do hope we're into production soon but this weekend's events make me a little wary of saying just when. Thanks again for your support[1]. [1] "I shall wear it always" -- Major Dennis Bloodnok |
Send message Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 164 |
Alright. After Run 1 gave up, Run 2 has started with what looks like good news... type : RFC 3820 compliant impersonation proxy strength : 1024 bits path : /tmp/x509up_u500 timeleft : 129:50:58 (5.4 days) 15:39:06 -0700 2015-08-31 [INFO] Downloading glidein 15:39:07 -0700 2015-08-31 [INFO] Running glidein (check logs) so am hopeful of better things ! |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
Fingers crossed! Thanks for the feedback. |
Send message Joined: 4 May 15 Posts: 64 Credit: 55,584 RAC: 0 |
"It was a long weekend", "There was a challenge on", "You told us it was broken", ... two machines woken up and told to report for duty. They'll have to fend for themselves while I sleep. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
Aha, someone's woken up! We've got 67 jobs running now. And 80 results returned. Go team! (If you're in the UK, or can otherwise legitimately access BBC iPlayer, may I recommend last night's introductory episode of Special Forces - Ultimate Hell Week) |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
Sheesh, 80 jobs running already. Keep it coming guys (and gals)! |
Send message Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 164 |
Condor is now flying on mine :-) cmsRun doesn't look like it is going at much more than a fast stroll though :-( Occasional sprints but mostly 5 - 10% CPU on ALT+F3. Will see what they are doing in the |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
Yeah, these jobs take a while to get into their stride -- something like a slow 20-25 seconds of CPU for cmsRun, then a download of several megabytes before the simulation actually begins. I've been canvassing for better workflows, but no-one's stepped up to the plate yet. :-( I can only run what I know, which is not necessarily representative of today's analyses; the current jobs are based on the Phase2 detector to be installed around 2025! |
Send message Joined: 9 Apr 15 Posts: 57 Credit: 230,221 RAC: 0 |
|
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
Condor is now flying on mine :-) I'll smoke to that! (Less than once a year, it should be said...) |
Send message Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 164 |
I have also seen evidence that my cmsRun tasks are now smokin |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 874,807 RAC: 1,175 |
After a glitch of three 'empty' runs, run-6 is doing something useful again. Run-1 and Run-2 each did 5 jobs. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
Yes, we're holding steady at around 80 jobs running; dashboard says 796 successes but still reckons there were only 1293 jobs submitted. 818 results have been returned. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
What is a "JOB"? I have 3 runs so far. Each has 6 lots of 25 events. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
What is a "JOB"? OK, we're trying to stick to this terminology: A task is what you run when you send a request to the server. Currently each task runs for about 24 hours, then stops. The tasks then poll for jobs where each job simulates a given number of events, currently 25, and returns data for further analysis. Currently Laurence has set things up to save logs in the pages you see under "show graphics"; these are grouped into "runs" for some reason, I think it might be the lifetime of the glidein pilot that calls for jobs, |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks for the clarification. Yes, we're holding steady at around 80 jobs running; I guess, that means 80 computers running? AFAIK there can only be one task running on each machine, currently. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
As far as I understand, the job (glidein) starts an HTCondor client which runs three jobs; then HTCondor exits and another glidein starts. In other words, we have a pilot job (the glidein) that runs three real jobs. I can change the logging to report pilot rather than run if that makes things clearer. |
Send message Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
Congratulations! You managed to hit the one point where I couldn't help you for 24 hours :-( Now that I'm back I have re-activated my clients; let's see what it brings. Yeti |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
Yes, we're holding steady at around 80 jobs running; Yes, that's right, on both counts. Looks like this batch will run out tonight at this rate (i.e. another 4 or 5 jobs for each machine) so tomorrow I'll have the timing statistics I want for my comparison: 515 jobs; 0 completed, 0 removed, 411 idle, 89 running, 15 held, 0 suspended |
Send message Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
As far as I understand, the job (glidein) starts an HTCondor client which runs three jobs; then HTCondor exits and another glidein starts. In other words, we have a pilot job (the glidein) that runs three real jobs. I can change the logging to report pilot rather than run if that makes things clearer. When new patches from Microsoft come in (or there is some other reason to restart a machine), I look for a good point to restart the client. At the moment, I wait until I see that a "fresh" cmsRun comes up and then I restart the box; is there a better point? (I want to be sure that the result from the last run is already uploaded) |
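
For anyone trying to keep the task / run (pilot) / job / event terminology from this thread straight, here is a rough Python sketch of how the pieces nest. It is only an illustration: the class names `Task`, `Pilot` and `Job` and the helper `make_task` are made up for this example and are not part of BOINC, HTCondor or the CMS@Home code. The 25 events per job comes from the posts above; the number of jobs per pilot is left as a parameter because the thread reports different counts (three, five and six).

```python
from dataclasses import dataclass, field
from typing import List

# "currently 25" events per job, as quoted in the thread.
EVENTS_PER_JOB = 25


@dataclass
class Job:
    """One unit of simulation work fetched by a pilot (hypothetical model)."""
    events: int = EVENTS_PER_JOB


@dataclass
class Pilot:
    """A glidein pilot; the logs call its lifetime a "run" (hypothetical model)."""
    jobs: List[Job] = field(default_factory=list)

    def total_events(self) -> int:
        # Sum the events simulated by every job this pilot ran.
        return sum(job.events for job in self.jobs)


@dataclass
class Task:
    """What the volunteer's client runs, for about 24 hours (hypothetical model)."""
    pilots: List[Pilot] = field(default_factory=list)

    def job_count(self) -> int:
        return sum(len(pilot.jobs) for pilot in self.pilots)

    def total_events(self) -> int:
        return sum(pilot.total_events() for pilot in self.pilots)


def make_task(pilot_count: int, jobs_per_pilot: int) -> Task:
    """Build a task with the given number of pilots, each running the same number of jobs."""
    pilots = [
        Pilot(jobs=[Job() for _ in range(jobs_per_pilot)])
        for _ in range(pilot_count)
    ]
    return Task(pilots=pilots)


if __name__ == "__main__":
    # The "3 runs, each with 6 lots of 25 events" case reported above.
    task = make_task(pilot_count=3, jobs_per_pilot=6)
    print(task.job_count(), "jobs,", task.total_events(), "events")
```

With the figures from the "3 runs, 6 lots of 25 events" post, this counts 18 jobs and 450 events for the task so far; a pilot that runs three jobs would account for 75 events.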
©2025 CERN