Duplicate jobs
Author | Message |
---|---|
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Duplicate jobs are happening again. The first two numbers on each line are "slot number,finished_x_log number".
EDIT: It looks like only jobs with "seed=268" are affected, as they were 2 days ago.
3,1===> [runRivet] Fri May 20 22:40:41 CEST 2016 [boinc pp uemb-hard 200 4 - pythia6 6.428 a 100000 273]
3,2===> [runRivet] Fri May 20 22:57:03 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia8 8.212 tune-1 100000 273]
3,3===> [runRivet] Sat May 21 00:11:38 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia8 8.205 tune-A14-CTEQL1 100000 268]
3,4===> [runRivet] Sat May 21 01:24:02 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia8 8.205 tune-A14-CTEQL1 100000 268]
3,5===> [runRivet] Sat May 21 02:32:15 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.428 375 100000 268]
3,6===> [runRivet] Sat May 21 03:15:21 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.428 396 100000 268]--------------
3,7===> [runRivet] Sat May 21 03:52:34 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.425 d6t 100000 268]
3,8===> [runRivet] Sat May 21 04:29:22 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.183 default 100000 268]
3,9===> [runRivet] Sat May 21 05:20:22 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.428 396 100000 268]--------------
3,10===> [runRivet] Sat May 21 05:57:19 CEST 2016 [boinc ppbar uemb-soft 900 - - pythia6 6.427 379 100000 268]
3,11===> [runRivet] Sat May 21 06:14:19 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.205 tune-AU2lox 100000 268]
3,12===> [runRivet] Sat May 21 07:19:05 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.183 default 100000 268]
3,13===> [runRivet] Sat May 21 08:10:47 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.183 default 100000 268]
3,14===> [runRivet] Sat May 21 09:03:13 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.425 dw 100000 268]
3,15===> [runRivet] Sat May 21 09:48:07 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia8 8.212 tune-monashstar 100000 268] |
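For anyone who wants to check their own logs the same way, here is a minimal sketch that groups `[runRivet]` header lines by the bracketed `[boinc ...]` description and reports any description seen more than once. Treating that description as the job's identity is an assumption made for illustration, and the function name `find_duplicates` is hypothetical; the sample lines are copied from the listing above.

```python
import re
from collections import defaultdict

# Matches header lines such as
#   3,4===> [runRivet] Sat May 21 01:24:02 CEST 2016 [boinc ppbar z ... 100000 268]
# The leading "slot,log" prefix is optional; anything after the closing "]" is ignored.
LINE_RE = re.compile(
    r"^(?P<prefix>\d+,\d+)?===> \[runRivet\] (?P<when>.+?) \[boinc (?P<job>[^\]]+)\]"
)


def find_duplicates(lines):
    """Return {job description: [prefix or timestamp, ...]} for jobs seen more than once."""
    seen = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a runRivet header line
        # Identify each occurrence by its "slot,log" prefix, or by timestamp if absent.
        seen[m.group("job")].append(m.group("prefix") or m.group("when"))
    return {job: where for job, where in seen.items() if len(where) > 1}


sample = [
    "3,5===> [runRivet] Sat May 21 02:32:15 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.428 375 100000 268]",
    "3,6===> [runRivet] Sat May 21 03:15:21 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.428 396 100000 268]--------------",
    "3,7===> [runRivet] Sat May 21 03:52:34 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.425 d6t 100000 268]",
    "3,8===> [runRivet] Sat May 21 04:29:22 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.183 default 100000 268]",
    "3,9===> [runRivet] Sat May 21 05:20:22 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.428 396 100000 268]--------------",
]

for job, where in find_duplicates(sample).items():
    print(f"{len(where)}x at {where}: {job}")
```

This only catches repeats within the lines you feed it, so it cannot tell whether a job was also sent to another host.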
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
The job you mentioned:
3,11===> [runRivet] Sat May 21 06:14:19 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.205 tune-AU2lox 100000 268]
is now running on my host with Condor JobID 315581:
===> [runRivet] Sat May 21 13:39:01 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.205 tune-AU2lox 100000 268] |
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
The same job again, but with Condor JobID 315750:
===> [runRivet] Sat May 21 18:18:05 CEST 2016 [boinc ppbar z 1800 -,-,50,130 - pythia8 8.205 tune-AU2lox 100000 268] |
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
Finally, a different job is running on my host again:
===> [runRivet] Sat May 21 20:18:50 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268]
with Condor JobID 315820 |
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
Rasputin already had this job:
3,14===> [runRivet] Sat May 21 09:03:13 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.425 dw 100000 268]
Now it's running here:
===> [runRivet] Sat May 21 21:15:27 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - pythia6 6.425 dw 100000 268]
Condor JobID: 315846 |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Hi Crystal,
How many jobs did you have today with a Condor ID >316096? I only seem to get "old" ones, which are affected by the duplication issue. Could you keep an eye on tasks above that value to see whether they are unaffected? Much appreciated.
BTW: the MCPlots IDs of duplicates are the same, but the Condor IDs are not. (I guess you know this already.) |
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
How many jobs did you have today with a Condor ID >316096?
On one VM, only two: 316605 and 316912. The other VM is suspended for testing. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks, Crystal! |
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
Another duplicate, which I already had 36 minutes ago:
===> [runRivet] Sat May 21 21:51:07 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268]
Condor JobID: 315866 |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I've had so many duplicates that I stopped counting. |
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
The higher-priority jobs are gone, so we are back to the old queue, and duplicates are popping up again: this one ran in two different VMs, and I already had it on Saturday.
MCPlots ID 30484435
Mon May 23 08:36:54 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268]
Mon May 23 08:57:01 CEST 2016 [boinc ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268] |
Joined: 4 Mar 16 Posts: 31 Credit: 44,320 RAC: 0 |
This is expected: yesterday there was a problem with Condor, and the "fresh" jobs have been put on hold. Until I manage to resume them, the minimum Condor JobID for new jobs is 317295. Among the "old" jobs, do you ever get any non-duplicate job? Thanks for the feedback. |
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
Among the "old" jobs do you ever get any non-duplicate job?
By now the term 'duplicate' is no longer appropriate. One job is sent multiple times: to the same guest VM, to the same host, to hosts of the same volunteer, and to hosts of different volunteers.
To your question: obviously there are non-multifold jobs on my host, but I can't check whether such a 'single' job has also been sent to another volunteer. Maybe you could run an SQL search over the Theory results of the last week for "MCPlots JobID: ", since 'duplicates' seem to share the same MCPlots JobID. |
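A rough sketch of what such a search could look like, using an in-memory SQLite database so it runs anywhere. The table `job_log` and its columns are hypothetical; the project's real result storage is not described in this thread and very likely looks different. The sample rows reuse Condor and MCPlots IDs reported elsewhere in this thread.

```python
import sqlite3

# Hypothetical schema: one row per executed job, keyed by its Condor JobID.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_log (condor_id INTEGER PRIMARY KEY, mcplots_id INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO job_log (condor_id, mcplots_id) VALUES (?, ?)",
    [
        (315952, 30485353),
        (316012, 30485353),
        (315973, 30484962),
        (315994, 30484962),
        (315981, 30484114),
    ],
)

# Any MCPlots ID that appears under more than one Condor JobID was run multiple times.
multifold = conn.execute(
    """
    SELECT mcplots_id, COUNT(*) AS runs, MIN(condor_id), MAX(condor_id)
    FROM job_log
    GROUP BY mcplots_id
    HAVING COUNT(*) > 1
    ORDER BY mcplots_id
    """
).fetchall()

for mcplots_id, runs, first_condor, last_condor in multifold:
    print(f"MCPlots {mcplots_id}: {runs} runs (Condor {first_condor}..{last_condor})")
```

Run server-side over all volunteers' results, the same `GROUP BY ... HAVING` pattern would also show which jobs were run only once, which a single volunteer cannot see.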
Joined: 4 Mar 16 Posts: 31 Credit: 44,320 RAC: 0 |
To your question: obviously on my host there are non-multifold jobs
Why is it obvious that a job cannot be sent more than once to the same host? If I remember correctly, Rasputin had the same job running twice in the same task. |
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
I probably misunderstood your question:
Among the "old" jobs do you ever get any non-duplicate job?
I can tell when a job is a duplicate, but I can't say that it is a non-duplicate, because I can't see what is sent to other volunteers' hosts.
Today, from the 'old' jobs, 3 multifolds and 1 non-duplicate candidate on my host (columns: start time, job, Condor JobID, MCPlots ID):
May 23 08:12:00 ppbar z 1800 -,-,50,130 - pythia8 8.108.p1 default 100000 268] 315952 30485353
May 23 10:29:33 ppbar z 1800 -,-,50,130 - pythia8 8.108.p1 default 100000 268] 316012 30485353
May 23 09:03:23 ppbar z 1800 -,-,50,130 - pythia8 8.183 default 100000 268] 315973 30484962
May 23 09:50:07 ppbar z 1800 -,-,50,130 - pythia8 8.183 default 100000 268] 315994 30484962
May 23 08:36:54 ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268] 315960 30484435
May 23 08:57:01 ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268] 315969 30484435
May 23 11:04:33 ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268] 316027 30484435
May 23 09:37:42 ppbar z 1960 -,-,50,120 - herwig++powheg 2.7.0 LHC-UE-EE-4 100000 268] 315992 30484348
May 23 10:19:36 ppbar z 1960 -,-,50,120 - herwig++powheg 2.7.0 LHC-UE-EE-4 100000 268] 316009 30484348
May 23 09:24:00 ppbar z 1960 -,-,50,120 - herwig++powheg 2.5.1 LHC-UE7-2 100000 268] 315981 30484114 |
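The distinction made here (a repeat can be proven locally, a single run cannot) can be sketched in a few lines. This assumes each row ends with the Condor JobID followed by the MCPlots ID, as in the rows of this post; `parse_row` and `classify` are hypothetical helper names.

```python
from collections import defaultdict


def parse_row(row):
    """Split '<time> <job description>] <Condor JobID> <MCPlots ID>' into its parts."""
    head, condor_id, mcplots_id = row.rsplit(maxsplit=2)
    return head.rstrip("]"), int(condor_id), int(mcplots_id)


def classify(rows):
    """Group Condor JobIDs by MCPlots ID.

    Returns (multifold, candidates): IDs seen more than once locally are proven
    multifolds; IDs seen once are only *candidates* for being non-duplicates,
    because other hosts may have received the same job.
    """
    by_id = defaultdict(list)
    for row in rows:
        _, condor_id, mcplots_id = parse_row(row)
        by_id[mcplots_id].append(condor_id)
    multifold = {k: v for k, v in by_id.items() if len(v) > 1}
    candidates = [k for k, v in by_id.items() if len(v) == 1]
    return multifold, candidates


rows = [
    "May 23 08:36:54 ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268] 315960 30484435",
    "May 23 08:57:01 ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268] 315969 30484435",
    "May 23 11:04:33 ppbar z 1960 -,-,50,120 - herwig++ 2.5.2 LHC-UE-EE-3-2760 100000 268] 316027 30484435",
    "May 23 09:24:00 ppbar z 1960 -,-,50,120 - herwig++powheg 2.5.1 LHC-UE7-2 100000 268] 315981 30484114",
]

multifold, candidates = classify(rows)
print("multifold:", multifold)
print("non-duplicate candidates:", candidates)
```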
©2024 CERN