Message boards : Theory Application : Endless Theory job

Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,945,852
RAC: 0
Message 4069 - Posted: 13 Aug 2016, 11:16:27 UTC

From my earlier post
33200 events processed
33300 events processed
Display update finished (4 histograms, 33000 events).
33400 events processed
Updating display...
Display update finished (4 histograms, 33000 events).

No further events are processed after 33400 and the only further messages are the repeating
Updating display...
Display update finished (4 histograms, 33000 events).
etc., etc.

I only reported it because it was stuck repeating that update. If other events had been processed it would have been a normal job and therefore not worthy of comment. It was just like that looping Vincia we used to get on Challenge that would run forever until manually reset.
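
The pattern described above (display updates repeating while the event count no longer advances) can be checked for mechanically. A minimal sketch in Python, assuming the log has already been read into a list of lines. The function names and the threshold of 3 are made up for illustration and are not part of BOINC, Rivet or Sherpa:

```python
import re

# Line formats as they appear in the log excerpts in this thread.
EVENTS_RE = re.compile(r"^(\d+) events processed$")
DISPLAY_RE = re.compile(
    r"^Display update finished \((\d+) histograms, (\d+) events\)\.$"
)


def count_trailing_idle_updates(lines):
    """Count display updates seen since the last 'N events processed' line."""
    idle = 0
    for line in lines:
        text = line.strip()
        if EVENTS_RE.match(text):
            idle = 0  # progress was made, so reset the counter
        elif DISPLAY_RE.match(text):
            idle += 1
    return idle


def looks_stuck(lines, threshold=3):
    """Hypothetical heuristic: many display updates with no new events."""
    return count_trailing_idle_updates(lines) >= threshold
```

The threshold is a guess. A real check would probably also want to look at wall-clock time, since a slow but healthy job can show a display update or two between event lines.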
ID: 4069 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1184
Credit: 824,998
RAC: 1,080
Message 4683 - Posted: 22 Feb 2017, 15:52:07 UTC

Here we go again.

===> [runRivet] Wed Feb 22 16:03:28 CET 2017 [boinc pp uemb-soft 53 - - sherpa 2.1.1 default 2000 842]
.
.
.
integration time: ( 4m 56s elapsed / 9s left ) [16:14:28]
7.47328e+08 pb +- ( 2.32511e+06 pb = 0.311123 % ) 310000 ( 647686 -> 48.3 % )
integration time: ( 5m 6s elapsed / 0s left ) [16:14:39]
2_2__j__j__j__j : 7.47328e+08 pb +- ( 2.32511e+06 pb = 0.311123 % ) exp. eff: 0.545449 %
Updating display...
reduce max for 2_2__j__j__j__j to 0.642318 ( eps = 0.001 )
Output_Phase::Output_Phase(): Set output interval 1000000000 events.
----------------------------------------------------------
-- SHERPA generates events with the following structure --
----------------------------------------------------------
Perturbative : Signal_Processes
Perturbative : Hard_Decays
Perturbative : Jet_Evolution:CSS
Perturbative : Lepton_FS_QED_Corrections:Photons
Perturbative : Multiple_Interactions:Amisic
Perturbative : Minimum_Bias:Off
Hadronization : Beam_Remnants
Hadronization : Hadronization:Ahadic
Hadronization : Hadron_Decays
Analysis : HepMC2
Display update finished (0 histograms, 0 events).
Updating display...
Display update finished (0 histograms, 0 events).
Updating display...
Display update finished (0 histograms, 0 events).
Updating display...
Display update finished (0 histograms, 0 events).
Updating display...
Display update finished (0 histograms, 0 events).
ID: 4683 · Rating: 0
Leonardo Cristella
Joined: 4 Mar 16
Posts: 31
Credit: 44,320
RAC: 0
Message 4690 - Posted: 23 Feb 2017, 9:24:06 UTC - in response to Message 4683.  

Hello Crystal,
this seems to be an application bug (in "Rivet" in this case) and not a job-related problem.
It can be reported to MCPlots support: http://mcplots.cern.ch/. The job status should be successful, though.

Leo
ID: 4690 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1184
Credit: 824,998
RAC: 1,080
Message 4694 - Posted: 23 Feb 2017, 13:03:05 UTC - in response to Message 4690.  
Last modified: 23 Feb 2017, 13:16:44 UTC

Hello Crystal,
this seems to be an application bug (in "Rivet" in this case) and not a job-related problem.
It can be reported to MCPlots support: http://mcplots.cern.ch/. The job status should be successful, though.

Leo

Thanks Leonardo,

The job cannot get a status because it runs endlessly.
The VM has to be killed (manually, or by BOINC after the maximum runtime of 18 hours), so the job never returns a result.

I have also reported such jobs on the production platform (LHC@home) and on the nearly retired VirtualLHC@home (T4T) platform.
When a job turns out to be endless, it's always with the generator parameter Sherpa.
In the previous example the looping was not in the event-processing part, but I've also seen the loop arise during event processing.
When it's not in the event-processing part of the job, it's often a Sherpa job with only 2000 events.
ID: 4694 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 670
Credit: 1,874,617
RAC: 7,098
Message 4922 - Posted: 25 May 2017, 10:04:48 UTC

Sherpa with looping:

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=330038

56000 events processed
dumping histograms...
Event 56100 ( 1h 13m 38s elapsed / 57m 37s left ) -> ETA: Thu May 25 01:34
56100 events processed
Event 56200 ( 1h 13m 46s elapsed / 57m 30s left ) -> ETA: Thu May 25 01:35
56200 events processed
Matrix_Element_Handler::GenerateOneEvent(): Point for '2_2__G__G__G__G' exceeds maximum by 10.1045.
Event 56300 ( 1h 13m 52s elapsed / 57m 20s left ) -> ETA: Thu May 25 01:34
56300 events processed
Event 56400 ( 1h 14m elapsed / 57m 12s left ) -> ETA: Thu May 25 01:34
56400 events processed
Event 56500 ( 1h 14m 6s elapsed / 57m 3s left ) -> ETA: Thu May 25 01:34
56500 events processed
Updating display...
Display update finished (4 histograms, 56000 events).
Event 56600 ( 1h 14m 16s elapsed / 56m 57s left ) -> ETA: Thu May 25 01:34
56600 events processed
Event 56700 ( 1h 14m 23s elapsed / 56m 48s left ) -> ETA: Thu May 25 01:34
56700 events processed
Event 56800 ( 1h 14m 30s elapsed / 56m 40s left ) -> ETA: Thu May 25 01:34
56800 events processed
Event 56900 ( 1h 14m 38s elapsed / 56m 32s left ) -> ETA: Thu May 25 01:34
56900 events processed
Event 57000 ( 1h 14m 44s elapsed / 56m 23s left ) -> ETA: Thu May 25 01:34
XS = 3.07216e+10 pb +- ( 1.28433e+08 pb = 0.41 % )
57000 events processed
dumping histograms...
Event 57100 ( 1h 14m 51s elapsed / 56m 14s left ) -> ETA: Thu May 25 01:34
57100 events processed
Error in Splitting_Tools::ConstructKinematics(kt = -nan, z = 0.989177, y = 0.0071532).
Event 57200 ( 1h 15m elapsed / 56m 7s left ) -> ETA: Thu May 25 01:34
57200 events processed
Matrix_Element_Handler::GenerateOneEvent(): Point for '2_2__G__s__G__s' exceeds maximum by 13.4284.
Event 57300 ( 1h 15m 7s elapsed / 55m 58s left ) -> ETA: Thu May 25 01:34
57300 events processed
Updating display...
Display update finished (4 histograms, 57000 events).
Event 57400 ( 1h 15m 17s elapsed / 55m 52s left ) -> ETA: Thu May 25 01:34
57400 events processed
Event 57500 ( 1h 15m 26s elapsed / 55m 45s left ) -> ETA: Thu May 25 01:34
57500 events processed
Updating display...
Display update finished (4 histograms, 57000 events).
Updating display...
Display update finished (4 histograms, 57000 events).
Updating display...

Looping for more than 12 hours now, waiting for the 18-hour limit ;-)
ID: 4922 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 670
Credit: 1,874,617
RAC: 7,098
Message 4930 - Posted: 26 May 2017, 9:36:31 UTC - in response to Message 4922.  

Looping for more than 12 hours now, waiting for the 18-hour limit ;-)


LHC Multi-task has a limit of 19 hours and 30 minutes, not 18 hours.
ID: 4930 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 670
Credit: 1,874,617
RAC: 7,098
Message 4979 - Posted: 10 Jun 2017, 6:11:50 UTC - in response to Message 4922.  

A Sherpa job that started looping near the beginning of the task and has been looping for more than 8 hours:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=339096

Event 16900 ( 16m 11s elapsed / 1h 19m 36s left ) -> ETA: Sat Jun 10 01:53
16900 events processed
Event 17000 ( 16m 16s elapsed / 1h 19m 28s left ) -> ETA: Sat Jun 10 01:53
XS = 3.41448e+10 pb +- ( 2.61461e+08 pb = 0.76 % )
17000 events processed
dumping histograms...
Event 17100 ( 16m 22s elapsed / 1h 19m 21s left ) -> ETA: Sat Jun 10 01:52
17100 events processed
Updating display...
Display update finished (0 histograms, 17000 events).
Updating display...
Display update finished (0 histograms, 17000 events).
Updating display...
ID: 4979 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 670
Credit: 1,874,617
RAC: 7,098
Message 5714 - Posted: 10 Dec 2018, 18:28:00 UTC

sherpa 1.3.1
After 9 hours: Display update finished (0 histograms, 0 events).
3255 days to finish?

5.40387e+16 pb +- ( 5.40358e+16 pb = 99.9946 % ) 213980000 ( 213980048 -> 100 % )
integration time: ( 8h 7m 10s elapsed / 3255d 10h 40m 34s left )
5.40336e+16 pb +- ( 5.40307e+16 pb = 99.9946 % ) 214000000 ( 214000048 -> 100 % )
integration time: ( 8h 7m 13s elapsed / 3255d 18h 21m 20s left )
5.40286e+16 pb +- ( 5.40257e+16 pb = 99.9946 % ) 214020000 ( 214020048 -> 100 % )
integration time: ( 8h 7m 15s elapsed / 3256d 1h 55m 42s left )
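
A "left" estimate that keeps growing, as in the lines above, can be spotted by parsing the integration time lines. A rough sketch in Python, assuming the line format shown in this log. The helper names are hypothetical and are not part of Sherpa or BOINC:

```python
import re

# Matches e.g. "integration time: ( 8h 7m 10s elapsed / 3255d 10h 40m 34s left )"
TIME_RE = re.compile(
    r"integration time: \( (?P<elapsed>.+?) elapsed / (?P<left>.+?) left \)"
)
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def duration_to_seconds(text):
    """Convert a duration such as '3255d 10h 40m 34s' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)([dhms])", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total


def remaining_estimates(lines):
    """Return the 'left' estimate, in seconds, for every integration time line."""
    estimates = []
    for line in lines:
        match = TIME_RE.search(line)
        if match:
            estimates.append(duration_to_seconds(match.group("left")))
    return estimates


def estimate_is_growing(lines):
    """True if there are at least two estimates and each exceeds the previous one."""
    est = remaining_estimates(lines)
    return len(est) >= 2 and all(b > a for a, b in zip(est, est[1:]))
```

Fed the three integration time lines from this post, `estimate_is_growing` returns `True`. For an integration that converges, the remaining time shrinks towards zero and the function returns `False`.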
ID: 5714 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1184
Credit: 824,998
RAC: 1,080
Message 5716 - Posted: 10 Dec 2018, 18:37:42 UTC - in response to Message 5714.  

3255 days to finish?

... and increasing!
ID: 5716 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 670
Credit: 1,874,617
RAC: 7,098
Message 5717 - Posted: 11 Dec 2018, 5:20:09 UTC - in response to Message 5716.  

Crystal,
and a lot of Cobblestones after finishing :-).
ID: 5717 · Rating: 0
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 751
Credit: 11,610,444
RAC: 1,210
Message 5720 - Posted: 11 Dec 2018, 18:59:31 UTC - in response to Message 5714.  

Here, where we have a new power company, the power keeps going off and right back on. That shuts off all the desktops; only the 3 newer ones start right back up, and the other 5 have to be restarted or, on a couple, just logged in.

But one of the older quad-cores that just runs Theory over at LHC started the tasks at 0% progress and 2,875 DAYS remaining.

The remaining time is slowly decreasing, but the tasks still started over, while the elapsed time stayed the same as it was before the restart.

Lately I have had to deal with this more than I would want to, and I know computers would rather not be shut down that way.
ID: 5720 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 670
Credit: 1,874,617
RAC: 7,098
Message 6110 - Posted: 27 Feb 2019, 22:07:25 UTC

With the new Theory (cranky/runc), for the moment only Sherpa tasks are unsuccessful:
http://mcplots-dev.cern.ch/production.php?view=matrix&rev=2279&display=unsucc
ID: 6110 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 670
Credit: 1,874,617
RAC: 7,098
Message 6129 - Posted: 2 Mar 2019, 10:04:36 UTC

This is the FIRST Sherpa with a runtime of more than 18 hours that finished:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2756200
If there is ONLY one Sherpa per computer, it is OK.

We are on the way to a good solution for letting Sherpa run. Thank you.
ID: 6129 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1184
Credit: 824,998
RAC: 1,080
Message 6130 - Posted: 2 Mar 2019, 12:04:20 UTC - in response to Message 6129.  

If there is ONLY one Sherpa per Computer it is ok.

We are on the way to a good solution for letting Sherpa run. Thank you.

18.5 hours of CPU time for 10000 events. It's good it finished.
On a Windows machine it would have been killed under the current Theory_2019_02_20.xml.
ID: 6130 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 670
Credit: 1,874,617
RAC: 7,098
Message 6146 - Posted: 5 Mar 2019, 14:26:36 UTC

This Sherpa 2.2.0 finished successfully after more than 19 hours:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2757163
ID: 6146 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 670
Credit: 1,874,617
RAC: 7,098
Message 6170 - Posted: 7 Mar 2019, 23:45:28 UTC

ID: 6170 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 670
Credit: 1,874,617
RAC: 7,098
Message 6184 - Posted: 11 Mar 2019, 7:46:02 UTC - in response to Message 6170.  
Last modified: 11 Mar 2019, 7:46:35 UTC

This task ran for more than 50 hours and was canceled by me because the time limit was reached.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2758454
===> [runRivet] Fri Mar 8 22:56:17 UTC 2019 [boinc ee zhad 206 - - sherpa 1.4.1 default 2000 28]
.
.
Channel_Basics::Boost : Spacelike four vector ...
Channel_Basics::Boost : Spacelike four vector ...
Channel_Basics::Boost : Spacelike four vector ...
7.47435e+17 pb +- ( 7.42752e+17 pb = 99.3734 % ) 617520000 ( 617537728 -> 99.9 % )
integration time: ( 1d 13h 52m 32s elapsed / 15410d 22h 51m 15s left )
7.47423e+17 pb +- ( 7.4274e+17 pb = 99.3734 % ) 617530000 ( 617547728 -> 99.9 % )
integration time: ( 1d 13h 52m 35s elapsed / 15411d 6h 33m 36s left )
7.47411e+17 pb +- ( 7.42728e+17 pb = 99.3734 % ) 617540000 ( 617557728 -> 99.9 % )
integration time: ( 1d 13h 52m 37s elapsed / 15411d 13h 59m 42s left )
Channel_Basics::Boost : Spacelike four vector ...
ID: 6184 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 670
Credit: 1,874,617
RAC: 7,098
Message 6229 - Posted: 15 Mar 2019, 13:13:34 UTC
Last modified: 15 Mar 2019, 13:26:51 UTC

<message>
Disk usage limit exceeded
</message>
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2760480

In -dev we need more control over these problematic Theory tasks before they are transferred to production!

===> [runRivet] Fri Mar 15 04:03:40 UTC 2019 [boinc pp jets 7000 170,-,2960 - sherpa 2.2.5 default 21000 30]

Setting environment...
grep: /etc/redhat-release: No such file or directory
MCGENERATORS=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt67c
g++ = /cvmfs/sft.cern.ch/lcg/external/gcc/4.8.4/x86_64-slc6/bin/g++
g++ version = 4.8.4
RIVET=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt67c/rivet/2.6.1/x86_64-slc6-gcc48-opt
Rivet version = rivet v2.6.1
RIVET_REF_PATH=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt67c/rivet/2.6.1/x86_64-slc6-gcc48-opt/share/Rivet
RIVET_ANALYSIS_PATH=/shared/analyses
GSL=/cvmfs/sft.cern.ch/lcg/external/GSL/1.10/x86_64-slc6-gcc48-opt
HEPMC=/cvmfs/sft.cern.ch/lcg/external/HepMC/2.06.08/x86_64-slc6-gcc48-opt
FASTJET=/cvmfs/sft.cern.ch/lcg/external/fastjet/3.0.3/x86_64-slc6-gcc48-opt
PYTHON=/cvmfs/sft.cern.ch/lcg/external/Python/2.7.4/x86_64-slc6-gcc48-opt
ROOTSYS=/cvmfs/sft.cern.ch/lcg/app/releases/ROOT/5.34.26/x86_64-slc6-gcc48-opt/root

Input parameters:
mode=boinc
beam=pp
process=jets
energy=7000
params=170,-,2960
specific=-
generator=sherpa
version=2.2.5
tune=default
nevts=21000
seed=30

Prepare temporary directories and files ...
ID: 6229 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1184
Credit: 824,998
RAC: 1,080
Message 6230 - Posted: 15 Mar 2019, 15:26:43 UTC - in response to Message 6229.  

<message>
Disk usage limit exceeded
</message>
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2760480

In -dev we need more control over these problematic Theory tasks before they are transferred to production!

===> [runRivet] Fri Mar 15 04:03:40 UTC 2019 [boinc pp jets 7000 170,-,2960 - sherpa 2.2.5 default 21000 30]

Killing the task when the disk limit is exceeded is the control that prevents endless looping and writing.
The rsc_disk_bound is set to 2000000000 bytes, which equals 1907.348633 MB. That is much lower than for the Theory Vbox tasks (understandably, because there the VM is inside the slot and snapshots may be added),
so a task with this container setup will reach the limit much earlier. Since the Sherpa scientists don't care about these failing tasks, the only thing that could be tried is to reduce the rsc_disk_bound to a more realistic value.
The 4 slots of the tasks running at the moment have 2.772MB, 3.316MB, 4.472MB and 12.656MB occupied.
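
For reference, the conversion of the quoted rsc_disk_bound works out as follows, assuming "MB" here means 1024 x 1024 bytes (that is the reading that reproduces the 1907.348633 figure). The constant and function names are only illustrative:

```python
# Value of rsc_disk_bound as quoted in the post, in bytes.
RSC_DISK_BOUND_BYTES = 2_000_000_000


def bytes_to_mib(n_bytes):
    """Convert bytes to binary megabytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


print(f"{bytes_to_mib(RSC_DISK_BOUND_BYTES):.6f}")  # prints 1907.348633
```

In decimal megabytes (1 MB = 1,000,000 bytes) the same bound is exactly 2000 MB.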
ID: 6230 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 670
Credit: 1,874,617
RAC: 7,098
Message 6231 - Posted: 18 Mar 2019, 9:54:13 UTC

Crystal,
I found this from the developer of Cosmology on GitHub, but for Docker:
https://github.com/marius311/boinc2docker/issues/7
ID: 6231 · Rating: 0


©2024 CERN