Message boards : Theory Application : Duplicate jobs
Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3442 - Posted: 21 May 2016, 10:09:54 UTC

Duplicate jobs are happening again.
First two numbers show "slot number,finished_x_log number"

EDIT: It looks like only jobs with "seed=268" are affected, as was the case 2 days ago.

3,1===> [runRivet] Fri May 20 22:40:41 CEST 2016 [boinc pp uemb-hard 200 4 - pythia6 6.428 a 100000 273]
3,2===> [runRivet] Fri May 20 22:57:03 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia8 8.212 tune-1 100000 273]
3,3===> [runRivet] Sat May 21 00:11:38 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia8 8.205 tune-A14-CTEQL1 100000 268]
3,4===> [runRivet] Sat May 21 01:24:02 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia8 8.205 tune-A14-CTEQL1 100000 268]
3,5===> [runRivet] Sat May 21 02:32:15 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.428 375 100000 268]
3,6===> [runRivet] Sat May 21 03:15:21 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.428 396 100000 268]--------------
3,7===> [runRivet] Sat May 21 03:52:34 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.425 d6t 100000 268]
3,8===> [runRivet] Sat May 21 04:29:22 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.183 default 100000 268]
3,9===> [runRivet] Sat May 21 05:20:22 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.428 396 100000 268]--------------
3,10===> [runRivet] Sat May 21 05:57:19 CEST 2016 [boinc ppbar uemb-soft 900 - - pythia6 6.427 379 100000 268]
3,11===> [runRivet] Sat May 21 06:14:19 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.205 tune-AU2lox 100000 268]
3,12===> [runRivet] Sat May 21 07:19:05 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.183 default 100000 268]
3,13===> [runRivet] Sat May 21 08:10:47 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.183 default 100000 268]
3,14===> [runRivet] Sat May 21 09:03:13 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.425 dw 100000 268]
3,15===> [runRivet] Sat May 21 09:48:07 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia8 8.212 tune-monashstar 100000 268]
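
A listing like the one above can also be checked for repeats mechanically. Below is a minimal Python sketch, assuming every job line ends with a bracketed description that starts with "boinc" and that the text before "===>" is the "slot number,finished_x_log number" label. The name `find_duplicates` is made up for illustration and is not part of any BOINC or MCPlots tool; the sample lines are copied from the listing.

```python
import re
from collections import defaultdict

# Matches the bracketed job description of a runRivet log line,
# e.g. "[boinc ppbar z 1960 -,-,50,120 - pythia6 6.428 396 100000 268]".
# The leading "[runRivet]" tag is skipped because it does not start with "boinc ".
JOB_RE = re.compile(r"\[(boinc [^\]]*)\]")

def find_duplicates(lines):
    """Return {job description: [labels]} for descriptions seen more than once.

    The label is the text before "===>" (assumed to be "slot,log number").
    """
    seen = defaultdict(list)
    for line in lines:
        match = JOB_RE.search(line)
        if match is None:
            continue  # not a job line
        label = line.split("===>", 1)[0].strip()
        seen[match.group(1)].append(label)
    return {job: labels for job, labels in seen.items() if len(labels) > 1}

sample = [
    "3,6===> [runRivet] Sat May 21 03:15:21 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.428 396 100000 268]--------------",
    "3,7===> [runRivet] Sat May 21 03:52:34 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.425 d6t 100000 268]",
    "3,9===> [runRivet] Sat May 21 05:20:22 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.428 396 100000 268]--------------",
]

for job, labels in find_duplicates(sample).items():
    print(labels, job)
```

Keying on the whole bracketed description means two jobs only count as duplicates if generator, version, tune, event count and seed all match; it cannot see the Condor or MCPlots IDs, which are not in these lines.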

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 3445 - Posted: 21 May 2016, 11:52:43 UTC - in response to Message 3442.  

The job you mentioned:

3,11===> [runRivet] Sat May 21 06:14:19 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.205 tune-AU2lox 100000 268]

is now running with Condor JobID 315581 on my host:

===> [runRivet] Sat May 21 13:39:01 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.205 tune-AU2lox 100000 268]

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 3451 - Posted: 21 May 2016, 19:14:23 UTC

The same job again but with Condor JobID 315750
===> [runRivet] Sat May 21 18:18:05 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.205 tune-AU2lox 100000 268]

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 3452 - Posted: 21 May 2016, 19:19:21 UTC

Finally, a different job is running on my host again:

===> [runRivet] Sat May 21 20:18:50 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268] with Condor JobID 315820

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 3453 - Posted: 21 May 2016, 19:28:26 UTC

Rasputin already had this job:

3,14===> [runRivet] Sat May 21 09:03:13 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.425 dw 100000 268]

Now it's running here: ===> [runRivet] Sat May 21 21:15:27 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.425 dw 100000 268]

Condor JobID: 315846

Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3455 - Posted: 21 May 2016, 19:40:47 UTC
Last modified: 21 May 2016, 19:42:45 UTC

Hi Crystal,
How many jobs did you have today with a Condor ID > 316096?
I only seem to get "old" ones that are affected by the duplication issue.

Could you keep an eye on tasks above that value, to see whether they are unaffected?
Much appreciated.

BTW, the MCPlots IDs for duplicates are the same; the Condor IDs are not.
(I guess you know this already.)

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 3457 - Posted: 21 May 2016, 20:29:56 UTC - in response to Message 3455.  
Last modified: 21 May 2016, 20:32:04 UTC

How many jobs did you have today with a Condor ID > 316096?

On 1 VM only two: 316605 and 316912.

The other VM is suspended for testing.

Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3458 - Posted: 21 May 2016, 20:34:15 UTC - in response to Message 3457.  

Thanks, Crystal!

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 3459 - Posted: 21 May 2016, 20:39:53 UTC
Last modified: 21 May 2016, 20:40:28 UTC

Another duplicate, of a job I already had 36 minutes ago:
===> [runRivet] Sat May 21 21:51:07 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268]
Condor JobID: 315866

Rasputin42
Volunteer tester
Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3460 - Posted: 21 May 2016, 21:09:27 UTC - in response to Message 3459.  

I had so many duplicates, I stopped counting.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 3471 - Posted: 23 May 2016, 7:55:36 UTC
Last modified: 23 May 2016, 7:56:36 UTC

The higher-priority jobs are gone, so it's back to the old queue, and duplicates are popping up again: in two different VMs I got a job that I already had on Saturday (MCPlots ID 30484435).

Mon May 23 08:36:54 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268]
Mon May 23 08:57:01 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268]

Leonardo Cristella
Joined: 4 Mar 16
Posts: 31
Credit: 44,320
RAC: 0
Message 3472 - Posted: 23 May 2016, 9:24:35 UTC - in response to Message 3471.  

This is expected: yesterday there was a problem with Condor, and the "fresh" jobs have been put on hold.
Until I manage to resume them, the minimum Condor JobID for new jobs is 317295.
Among the "old" jobs, do you ever get any non-duplicate job?

Thanks for the feedback.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 3473 - Posted: 23 May 2016, 11:03:04 UTC - in response to Message 3472.  

Among the "old" jobs do you ever get any non-duplicate job?

By now the term 'duplicate' is no longer appropriate.
A single job is sent multifold: to the same guest VM, the same host, hosts of the same volunteer, and hosts of different volunteers.
To your question: obviously on my host there are non-multifold jobs, but I can't check whether such a 'single' job was also sent to another volunteer.
Maybe you could run an SQL search over the Theory results of the last week for "MCPlots JobID: ", since 'duplicates' seem to share the same MCPlots JobID.
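
The kind of search suggested here could look roughly like the following. It is only a sketch in Python against an in-memory SQLite database: the table `theory_results` and its columns `condor_jobid` and `mcplots_jobid` are hypothetical, since the real schema of the Theory results is not shown in this thread. The sample rows reuse Condor/MCPlots ID pairs reported elsewhere in the thread.

```python
import sqlite3

def multifold_jobs(conn):
    """Return (mcplots_jobid, count) for every MCPlots JobID that occurs more than once."""
    query = """
        SELECT mcplots_jobid, COUNT(*) AS n
        FROM theory_results          -- hypothetical table name
        GROUP BY mcplots_jobid
        HAVING COUNT(*) > 1
        ORDER BY n DESC, mcplots_jobid
    """
    return conn.execute(query).fetchall()

# Build a throwaway database with a few (Condor JobID, MCPlots JobID) pairs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE theory_results ("
    "condor_jobid INTEGER PRIMARY KEY, mcplots_jobid INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO theory_results VALUES (?, ?)",
    [
        (315960, 30484435),
        (315969, 30484435),
        (316027, 30484435),
        (315992, 30484348),
        (316009, 30484348),
        (315981, 30484114),
    ],
)

for mcplots_jobid, n in multifold_jobs(conn):
    print(mcplots_jobid, "sent", n, "times")
```

Grouping on the MCPlots JobID rather than the Condor JobID is the point of the query: the Condor IDs differ for every copy, so they cannot reveal a repeat by themselves.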

Leonardo Cristella
Joined: 4 Mar 16
Posts: 31
Credit: 44,320
RAC: 0
Message 3477 - Posted: 23 May 2016, 12:51:09 UTC - in response to Message 3473.  

To your question: obviously on my host there are non-multifold jobs

Why is it obvious that a job cannot be sent more than once to the same host?

If I remember correctly, Rasputin had the same job running twice in the same task.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 3478 - Posted: 23 May 2016, 13:52:31 UTC - in response to Message 3477.  
Last modified: 23 May 2016, 13:54:22 UTC

Probably I misunderstood your question:

Among the "old" jobs do you ever get any non-duplicate job?

I can tell when a job is a duplicate, but I can't tell that it's a non-duplicate, because I can't see what's sent to other volunteers' hosts.

Today from the 'old' jobs 3 multifolds and 1 non-duplicate candidate on my host:

Start time (CEST)	Job	Condor JobID	MCPlots JobID

May 23 08:12:00	ppbar z 1800 -,-,50,130 - pythia8 8.108.p1 default 100000 268	315952	30485353
May 23 10:29:33	ppbar z 1800 -,-,50,130 - pythia8 8.108.p1 default 100000 268	316012	30485353

May 23 09:03:23	ppbar z 1800 -,-,50,130 - pythia8 8.183 default 100000 268	315973	30484962
May 23 09:50:07	ppbar z 1800 -,-,50,130 - pythia8 8.183 default 100000 268	315994	30484962

May 23 08:36:54	ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268	315960	30484435
May 23 08:57:01	ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268	315969	30484435
May 23 11:04:33	ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268	316027	30484435

May 23 09:37:42	ppbar z 1960 -,-,50,120 - herwig++powheg 2.7.0 LHC-UE-EE-4 100000 268	315992	30484348
May 23 10:19:36	ppbar z 1960 -,-,50,120 - herwig++powheg 2.7.0 LHC-UE-EE-4 100000 268	316009	30484348

May 23 09:24:00	ppbar z 1960 -,-,50,120 - herwig++powheg 2.5.1 LHC-UE7-2 100000 268	315981	30484114
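
Rows like these can be grouped by their last column to separate multifold jobs from non-duplicate candidates. This is a small Python sketch, assuming tab-separated rows with four fields (start time, job description, Condor JobID, MCPlots JobID); the helper name `group_by_mcplots_id` is made up for illustration, and only a few of the rows are repeated as sample input.

```python
from collections import defaultdict

def group_by_mcplots_id(text):
    """Map MCPlots JobID -> list of Condor JobIDs, read from tab-separated rows."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        # Expected fields: start time, job description, Condor JobID, MCPlots JobID.
        if len(fields) != 4 or not fields[2].strip().isdigit():
            continue  # skip blank lines and anything that is not a data row
        groups[fields[3].strip()].append(int(fields[2]))
    return dict(groups)

rows = (
    "May 23 08:36:54\tppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268\t315960\t30484435\n"
    "May 23 08:57:01\tppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268\t315969\t30484435\n"
    "May 23 11:04:33\tppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268\t316027\t30484435\n"
    "\n"
    "May 23 09:24:00\tppbar z 1960 -,-,50,120 - herwig++powheg 2.5.1 LHC-UE7-2 100000 268\t315981\t30484114\n"
)

for mcplots_id, condor_ids in group_by_mcplots_id(rows).items():
    # A single Condor ID only means no repeat was seen on this host.
    kind = "multifold" if len(condor_ids) > 1 else "non-duplicate candidate"
    print(mcplots_id, kind, condor_ids)
```

As noted in the post, a job that shows up only once here is just a candidate: copies sent to other volunteers' hosts are not visible from one host's logs.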


©2024 CERN