Message boards : News : Jobs incoming!
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 933 - Posted: 31 Aug 2015, 22:35:48 UTC - in response to Message 930.  

Glad there's progress again. Hopefully you are able to iron out all important issues, so you can start your BOINC <==> GRID comparison.

Well, I ran the GRID jobs on Friday night, expecting CMS@Home to run at the same time -- but then I found the recent problems. So the current run is for comparison with what I found on Friday night. Unfortunately, that batch finished in about seven hours (as I said earlier, there were 1600 jobs running simultaneously at one point). It's a pity that the recent difficulties led many people to switch off, and alerts can only reach people who are (ahem!) alert. OK, we're up to 24 jobs running now, not as many as earlier last week (80 or so), but we have to take what we get.
I see 61 results (of 2000 submitted) returned so far.
We're not going to win on turnaround -- we never really expected to. What I'm looking for now is completeness (1.5% of the GRID jobs didn't complete) and statistical similarity -- which obviously requires me to run further analysis programmes once the results are in.
Expect another delay once these results are in, because I'll be collating the data for official reports, but by then I hope the path is clearer for us to move from the current pre-beta to a short beta; I really do hope we're into production soon, but this weekend's events make me a little wary of saying just when.
Thanks again for your support[1].

[1] "I shall wear it always" -- Major Dennis Bloodnok
Profile PDW
Joined: 20 May 15
Posts: 217
Credit: 5,631,631
RAC: 15,737
Message 934 - Posted: 31 Aug 2015, 22:43:38 UTC - in response to Message 933.  

Alright.
After Run 1 gave up, Run 2 has started with what looks like good news...

type : RFC 3820 compliant impersonation proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 129:50:58 (5.4 days)
15:39:06 -0700 2015-08-31 [INFO] Downloading glidein
15:39:07 -0700 2015-08-31 [INFO] Running glidein (check logs)

so I'm hopeful of better things!
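The `timeleft` field in the proxy report above is hours:minutes:seconds, with the hours allowed to run past 24. A minimal Python sketch to check the "(5.4 days)" figure; the function name is hypothetical and not part of any grid tool:

```python
def timeleft_to_days(timeleft: str) -> float:
    """Convert an 'HH:MM:SS' lifetime (hours may exceed 24) into days."""
    hours, minutes, seconds = (int(part) for part in timeleft.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return total_seconds / 86400  # 86400 seconds per day


# 129:50:58 is 467458 seconds, i.e. about 5.41 days
print(f"{timeleft_to_days('129:50:58'):.1f} days")  # 5.4 days
```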
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 935 - Posted: 31 Aug 2015, 22:54:31 UTC - in response to Message 934.  

Fingers crossed! Thanks for the feedback.
Richard Haselgrove
Joined: 4 May 15
Posts: 64
Credit: 55,584
RAC: 0
Message 936 - Posted: 31 Aug 2015, 22:57:26 UTC

"It was a long weekend", "There was a challenge on", "You told us it was broken", ... two machines woken up and told to report for duty. They'll have to fend for themselves while I sleep.
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 937 - Posted: 31 Aug 2015, 23:03:43 UTC

Aha, someone's woken up! We've got 67 jobs running now. And 80 results returned. Go team!
(If you're in the UK, or can otherwise legitimately access BBC iPlayer, may I recommend last night's introductory episode Special Forces - Ultimate Hell Week)
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 938 - Posted: 31 Aug 2015, 23:06:19 UTC
Last modified: 31 Aug 2015, 23:06:46 UTC

Sheesh, 80 jobs running already. Keep it coming guys (and gals)!
Profile PDW
Joined: 20 May 15
Posts: 217
Credit: 5,631,631
RAC: 15,737
Message 939 - Posted: 31 Aug 2015, 23:27:26 UTC - in response to Message 938.  

Condor is now flying on mine :-)

cmsRun doesn't look like it is going at much more than a fast stroll though :-(
Occasional sprints, but mostly 5-10% CPU on ALT+F3.
Will see what they are doing later in the morning!
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 940 - Posted: 31 Aug 2015, 23:45:47 UTC - in response to Message 939.  
Last modified: 31 Aug 2015, 23:49:27 UTC

Yeah, these jobs take a while to get into their stride -- something like a slow 20-25 seconds of CPU for cmsRun, then a download of several megabytes before the simulation actually begins. I've been canvassing for better workflows, but no-one's stepped up to the plate yet. :-( I can only run what I know, which is not necessarily representative of today's analyses; the current jobs are based on the Phase2 detector to be installed around 2025!
Phil
Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 941 - Posted: 1 Sep 2015, 0:08:12 UTC - in response to Message 939.  
Last modified: 1 Sep 2015, 0:09:56 UTC

Condor is now flying on mine :-)

Nothing should disturb that Condor Moment.
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 942 - Posted: 1 Sep 2015, 1:10:15 UTC - in response to Message 941.  

Condor is now flying on mine :-)

Nothing should disturb that Condor Moment.

I'll smoke to that! (Less than once a year, it should be said...)
Profile PDW
Joined: 20 May 15
Posts: 217
Credit: 5,631,631
RAC: 15,737
Message 943 - Posted: 1 Sep 2015, 6:18:01 UTC - in response to Message 942.  

I have also seen evidence that my cmsRun tasks are now smokin
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1180
Credit: 815,336
RAC: 238
Message 944 - Posted: 1 Sep 2015, 8:57:19 UTC

After a glitch of three 'empty' runs, run-6 is doing something useful again.
Run-1 and Run-2 each did 5 jobs.
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 945 - Posted: 1 Sep 2015, 9:56:35 UTC

Yes, we're holding steady at around 80 jobs running; dashboard says 796 successes but still reckons there were only 1293 jobs submitted. 818 results have been returned.
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 946 - Posted: 1 Sep 2015, 10:17:06 UTC

What is a "JOB"?
I have 3 runs so far. Each has 6 lots of 25 events.
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 947 - Posted: 1 Sep 2015, 12:27:39 UTC - in response to Message 946.  

What is a "JOB"?
I have 3 runs so far. Each has 6 lots of 25 events.

OK, we're trying to stick to this terminology:
A task is what you run when you send a request to the server. Currently each task runs for about 24 hours, then stops.
The tasks then poll for jobs where each job simulates a given number of events, currently 25, and returns data for further analysis.
Currently Laurence has set things up to save logs in the pages you see under "show graphics"; these are grouped into "runs" for some reason. I think a run might be the lifetime of the glidein pilot that calls for jobs.
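As a reading aid, the task/run/job/event terminology can be modelled in a few lines of Python. This is a minimal sketch, not project code: the class names are hypothetical, and treating a run as one glidein pilot's lifetime is only the guess stated above.

```python
from dataclasses import dataclass, field
from typing import List

EVENTS_PER_JOB = 25  # the current per-job setting mentioned above


@dataclass
class Job:
    """One unit of simulation work pulled from the queue."""
    events: int = EVENTS_PER_JOB


@dataclass
class Run:
    """A log grouping under "show graphics" (assumed: one glidein pilot's lifetime)."""
    jobs: List[Job] = field(default_factory=list)

    def events(self) -> int:
        return sum(job.events for job in self.jobs)


@dataclass
class Task:
    """What BOINC sends you; it currently runs for about 24 hours, then stops."""
    runs: List[Run] = field(default_factory=list)

    def events(self) -> int:
        return sum(run.events() for run in self.runs)


# Rasputin42's example: 3 runs so far, each with 6 jobs of 25 events.
task = Task(runs=[Run(jobs=[Job() for _ in range(6)]) for _ in range(3)])
print(task.events())  # 450
```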
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 948 - Posted: 1 Sep 2015, 13:07:17 UTC

Thanks for the clarification.

Yes, we're holding steady at around 80 jobs running;


I guess that means 80 computers running?
AFAIK there can only be one task running on each machine, currently.
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 328,405
RAC: 158
Message 949 - Posted: 1 Sep 2015, 15:06:14 UTC - in response to Message 947.  

As far as I understand, the job (glidein) starts an HTCondor client which runs three jobs, then HTCondor exits and another glidein starts. In other words, we have a pilot job (the glidein) that runs three real jobs. I can change the logging to report "pilot" rather than "run" if that makes things clearer.
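The pilot pattern described here can be sketched as a simple loop. This is a hypothetical Python illustration, not HTCondor or glidein code: it assumes each pilot takes at most three jobs from a shared queue and then exits, and it ignores the task's roughly 24-hour lifetime, which would also cut pilots short.

```python
from collections import deque
from typing import Deque, List

JOBS_PER_PILOT = 3  # the figure given above; the real setting may differ


def run_pilots(queue: Deque[int], jobs_per_pilot: int = JOBS_PER_PILOT) -> List[List[int]]:
    """Drain the queue with successive pilots; return the jobs each pilot ran."""
    pilots = []
    while queue:
        # Each pilot (glidein) pulls up to jobs_per_pilot real jobs, then exits,
        # and the next pilot starts while work remains.
        batch = [queue.popleft() for _ in range(min(jobs_per_pilot, len(queue)))]
        pilots.append(batch)
    return pilots


print(run_pilots(deque(range(1, 8))))  # [[1, 2, 3], [4, 5, 6], [7]]
```

Under this model, each pilot would show up as one "run" in the logs, which fits the suggestion to label them "pilot" instead.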
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 950 - Posted: 1 Sep 2015, 16:00:04 UTC

Congratulations!

You managed to hit exactly the moment when I couldn't help you for 24 hours :-(

Now that I'm back, I have re-activated my clients; let's see what it brings.

Yeti
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,541
RAC: 270
Message 951 - Posted: 1 Sep 2015, 18:09:46 UTC - in response to Message 948.  
Last modified: 1 Sep 2015, 18:11:32 UTC

Yes, we're holding steady at around 80 jobs running;

I guess that means 80 computers running?
AFAIK there can only be one task running on each machine, currently.

Yes, that's right, on both counts. Looks like this batch will run out tonight at this rate (i.e. another 4 or 5 jobs for each machine) so tomorrow I'll have the timing statistics I want for my comparison:
515 jobs; 0 completed, 0 removed, 411 idle, 89 running, 15 held, 0 suspended
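The summary line appears to be the totals line that HTCondor's `condor_q` prints. A small Python check, using a hypothetical parser that is not an HTCondor tool, confirms the states add up to the 515 total and that the idle-to-running ratio matches the "another 4 or 5 jobs for each machine" estimate, assuming one running job per machine as stated above:

```python
import re

SUMMARY = "515 jobs; 0 completed, 0 removed, 411 idle, 89 running, 15 held, 0 suspended"


def parse_summary(line: str) -> dict:
    """Split a condor_q-style totals line into {state: count}, plus 'total'."""
    total_text, states_text = line.split(";", 1)
    counts = {"total": int(total_text.split()[0])}
    for count, state in re.findall(r"(\d+) (\w+)", states_text):
        counts[state] = int(count)
    return counts


counts = parse_summary(SUMMARY)
# 0 + 0 + 411 + 89 + 15 + 0 == 515
states = sum(value for key, value in counts.items() if key != "total")
assert states == counts["total"]

# Idle jobs still waiting per job currently running (roughly one per machine)
print(round(counts["idle"] / counts["running"], 1))  # 4.6
```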

Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 952 - Posted: 1 Sep 2015, 18:28:24 UTC - in response to Message 949.  

As far as I understand, the job (glidein) starts an HTCondor client which runs three jobs, then HTCondor exits and another glidein starts. In other words, we have a pilot job (the glidein) that runs three real jobs. I can change the logging to report "pilot" rather than "run" if that makes things clearer.

When new patches from Microsoft come in (or there's some other reason to restart a machine), I look for a good point to restart the client.

At the moment, I wait until I see that a "fresh" cmsRun comes up and then I restart the box; is there a better point? (I want to be sure that the result from the last run has already been uploaded.)
©2024 CERN