Message boards : Theory Application : Endless Theory job
Author | Message |
---|---|
Joined: 13 Apr 15 Posts: 138 Credit: 2,969,210 RAC: 0 |
From my earlier post: 33200 events processed 33300 events processed Display update finished (4 histograms, 33000 events). 33400 events processed Updating display... Display update finished (4 histograms, 33000 events). No further events are processed after 33400, and the only further messages are the repeating Updating display... Display update finished (4 histograms, 33000 events). etc., etc. I only reported it because it was stuck repeating that update. If other events had been processed, it would have been a normal job and therefore not worthy of comment. It was just like the looping Vincia jobs we used to get on Challenge that would run forever until manually reset. |
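The symptom described in this post (display updates keep repeating while the event counter stops advancing) can be checked mechanically. The Python sketch below is only an illustration and not part of the Theory application: the function name `looks_stuck` and the threshold `max_idle_updates` are hypothetical, and it assumes the log is available one message per line. It only covers the event-processing phase; display updates during integration would need separate handling.

```python
import re

# Patterns taken from the log excerpts quoted in this thread.
EVENTS_RE = re.compile(r"^(\d+) events processed")
DISPLAY_RE = re.compile(r"^Display update finished")


def looks_stuck(log_lines, max_idle_updates=3):
    """Return True if `max_idle_updates` display updates occur in a row
    without any new "N events processed" line in between.

    The threshold is an assumption for illustration; a healthy job
    normally reports new events between two display updates.
    """
    idle_updates = 0
    for raw in log_lines:
        line = raw.strip()
        if EVENTS_RE.match(line):
            idle_updates = 0          # progress seen, reset the counter
        elif DISPLAY_RE.match(line):
            idle_updates += 1         # another update without progress
            if idle_updates >= max_idle_updates:
                return True
    return False


if __name__ == "__main__":
    stuck_log = [
        "33400 events processed",
        "Updating display...",
        "Display update finished (4 histograms, 33000 events).",
        "Updating display...",
        "Display update finished (4 histograms, 33000 events).",
        "Updating display...",
        "Display update finished (4 histograms, 33000 events).",
    ]
    print("stuck:", looks_stuck(stuck_log))
```

A wrapper could call such a check periodically and stop the job well before the runtime limit, instead of waiting for a manual reset.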
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 80 |
Here we go again. ===> [runRivet] Wed Feb 22 16:03:28 CET 2017 [boinc pp uemb-soft 53 - - sherpa 2.1.1 default 2000 842] . . . integration time: ( 4m 56s elapsed / 9s left ) [16:14:28] 7.47328e+08 pb +- ( 2.32511e+06 pb = 0.311123 % ) 310000 ( 647686 -> 48.3 % ) integration time: ( 5m 6s elapsed / 0s left ) [16:14:39] 2_2__j__j__j__j : 7.47328e+08 pb +- ( 2.32511e+06 pb = 0.311123 % ) exp. eff: 0.545449 % Updating display... reduce max for 2_2__j__j__j__j to 0.642318 ( eps = 0.001 ) Output_Phase::Output_Phase(): Set output interval 1000000000 events. ---------------------------------------------------------- -- SHERPA generates events with the following structure -- ---------------------------------------------------------- Perturbative : Signal_Processes Perturbative : Hard_Decays Perturbative : Jet_Evolution:CSS Perturbative : Lepton_FS_QED_Corrections:Photons Perturbative : Multiple_Interactions:Amisic Perturbative : Minimum_Bias:Off Hadronization : Beam_Remnants Hadronization : Hadronization:Ahadic Hadronization : Hadron_Decays Analysis : HepMC2 Display update finished (0 histograms, 0 events). Updating display... Display update finished (0 histograms, 0 events). Updating display... Display update finished (0 histograms, 0 events). Updating display... Display update finished (0 histograms, 0 events). Updating display... Display update finished (0 histograms, 0 events). |
Joined: 4 Mar 16 Posts: 31 Credit: 44,320 RAC: 0 |
Hello Crystal, this seems to be an application bug ("Rivet" in this case) and not a job-related problem. It can be reported to MCPlots support: http://mcplots.cern.ch/. The job status should be successful, though. Leo |
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 80 |
Hello Crystal, Thanks Leonardo. The job cannot get a status because it runs endlessly. The VM has to be killed (manually, or by BOINC after the maximum runtime of 18 hours), so the job does not return a result. I have also reported such jobs on the production platform (LHC@home) and on the almost former VirtualLHC@home (T4T) platform. When a job turns out to be endless, it is always with the generator parameter Sherpa. In the previous example the looping was not in the event-processing part, but I have also seen the loop arise during event processing. When it is not in the event-processing part of the job, it is often a Sherpa job with only 2000 events. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
Sherpa with looping: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=330038 56000 events processed dumping histograms... Event 56100 ( 1h 13m 38s elapsed / 57m 37s left ) -> ETA: Thu May 25 01:34 56100 events processed Event 56200 ( 1h 13m 46s elapsed / 57m 30s left ) -> ETA: Thu May 25 01:35 56200 events processed Matrix_Element_Handler::GenerateOneEvent(): Point for '2_2__G__G__G__G' exceeds maximum by 10.1045. Event 56300 ( 1h 13m 52s elapsed / 57m 20s left ) -> ETA: Thu May 25 01:34 56300 events processed Event 56400 ( 1h 14m elapsed / 57m 12s left ) -> ETA: Thu May 25 01:34 56400 events processed Event 56500 ( 1h 14m 6s elapsed / 57m 3s left ) -> ETA: Thu May 25 01:34 56500 events processed Updating display... Display update finished (4 histograms, 56000 events). Event 56600 ( 1h 14m 16s elapsed / 56m 57s left ) -> ETA: Thu May 25 01:34 56600 events processed Event 56700 ( 1h 14m 23s elapsed / 56m 48s left ) -> ETA: Thu May 25 01:34 56700 events processed Event 56800 ( 1h 14m 30s elapsed / 56m 40s left ) -> ETA: Thu May 25 01:34 56800 events processed Event 56900 ( 1h 14m 38s elapsed / 56m 32s left ) -> ETA: Thu May 25 01:34 56900 events processed Event 57000 ( 1h 14m 44s elapsed / 56m 23s left ) -> ETA: Thu May 25 01:34 XS = 3.07216e+10 pb +- ( 1.28433e+08 pb = 0.41 % ) 57000 events processed dumping histograms... Event 57100 ( 1h 14m 51s elapsed / 56m 14s left ) -> ETA: Thu May 25 01:34 57100 events processed Error in Splitting_Tools::ConstructKinematics(kt = -nan, z = 0.989177, y = 0.0071532). Event 57200 ( 1h 15m elapsed / 56m 7s left ) -> ETA: Thu May 25 01:34 57200 events processed Matrix_Element_Handler::GenerateOneEvent(): Point for '2_2__G__s__G__s' exceeds maximum by 13.4284. Event 57300 ( 1h 15m 7s elapsed / 55m 58s left ) -> ETA: Thu May 25 01:34 57300 events processed Updating display... Display update finished (4 histograms, 57000 events). 
Event 57400 ( 1h 15m 17s elapsed / 55m 52s left ) -> ETA: Thu May 25 01:34 57400 events processed Event 57500 ( 1h 15m 26s elapsed / 55m 45s left ) -> ETA: Thu May 25 01:34 57500 events processed Updating display... Display update finished (4 histograms, 57000 events). Updating display... Display update finished (4 histograms, 57000 events). Updating display... over more than 12 hours, waiting for 18 hour limit ;-) |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
"over more than 12 hours, waiting for 18 hour limit ;-)" LHC Multi-task has a limit of 19 hours and 30 minutes, not 18 hours. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
A Sherpa job running from the beginning of the task and looping for more than 8 hours: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=339096 Event 16900 ( 16m 11s elapsed / 1h 19m 36s left ) -> ETA: Sat Jun 10 01:53 16900 events processed Event 17000 ( 16m 16s elapsed / 1h 19m 28s left ) -> ETA: Sat Jun 10 01:53 XS = 3.41448e+10 pb +- ( 2.61461e+08 pb = 0.76 % ) 17000 events processed dumping histograms... Event 17100 ( 16m 22s elapsed / 1h 19m 21s left ) -> ETA: Sat Jun 10 01:52 17100 events processed Updating display... Display update finished (0 histograms, 17000 events). Updating display... Display update finished (0 histograms, 17000 events). Updating display... |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
Sherpa 1.3.1. After 9 hours: Display update finished (0 histograms, 0 events). 3255 days for finishing? 5.40387e+16 pb +- ( 5.40358e+16 pb = 99.9946 % ) 213980000 ( 213980048 -> 100 % ) integration time: ( 8h 7m 10s elapsed / 3255d 10h 40m 34s left ) 5.40336e+16 pb +- ( 5.40307e+16 pb = 99.9946 % ) 214000000 ( 214000048 -> 100 % ) integration time: ( 8h 7m 13s elapsed / 3255d 18h 21m 20s left ) 5.40286e+16 pb +- ( 5.40257e+16 pb = 99.9946 % ) 214020000 ( 214020048 -> 100 % ) integration time: ( 8h 7m 15s elapsed / 3256d 1h 55m 42s left ) |
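Integration lines like the ones quoted in this post carry both the relative error and the estimated time left, so a runaway integration can in principle be spotted from the log alone. The following Python sketch is a hypothetical helper, not something the Theory application or Sherpa provides: the names `integration_status` and `unlikely_to_finish` are made up for illustration, the regular expressions are fitted only to the line format seen in this thread, and the 18-hour default is the runtime limit mentioned earlier in the thread.

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

# "integration time: ( 8h 7m 10s elapsed / 3255d 10h 40m 34s left )"
TIME_RE = re.compile(
    r"integration time:\s*\(\s*(.*?)\s+elapsed\s*/\s*(.*?)\s+left\s*\)"
)
# "... pb = 99.9946 % )" -> relative error in percent
REL_ERR_RE = re.compile(r"=\s*([\d.]+)\s*%\s*\)")


def to_seconds(text):
    """Convert a duration such as '3255d 10h 40m 34s' to seconds."""
    parts = re.findall(r"(\d+)\s*([dhms])", text)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)


def integration_status(line):
    """Extract elapsed/remaining seconds and relative error from one
    integration log line; return None if the line does not match."""
    times = TIME_RE.search(line)
    if times is None:
        return None
    err = REL_ERR_RE.search(line)
    return {
        "elapsed_s": to_seconds(times.group(1)),
        "left_s": to_seconds(times.group(2)),
        "rel_error_pct": float(err.group(1)) if err else None,
    }


def unlikely_to_finish(status, limit_s=18 * 3600):
    """Assumed rule of thumb: the remaining time exceeds the limit."""
    return status is not None and status["left_s"] > limit_s


if __name__ == "__main__":
    sample = ("5.40387e+16 pb +- ( 5.40358e+16 pb = 99.9946 % ) 213980000 "
              "( 213980048 -> 100 % ) integration time: "
              "( 8h 7m 10s elapsed / 3255d 10h 40m 34s left )")
    status = integration_status(sample)
    print(status, unlikely_to_finish(status))
```

A relative error that stays near 100 % while the estimate keeps growing, as in the quoted lines, is the pattern such a check would be meant to catch.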
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 80 |
"3255 days for finishing?" ... and increasing! |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
Crystal, and a lot of Cobblestones after finishing :-). |
Joined: 8 Apr 15 Posts: 787 Credit: 12,981,558 RAC: 11,307 |
Here, where we have a new power company, the power keeps going off and right back on. That shuts off all the desktops; only the 3 newer ones start right back up, while the other 5 have to be restarted or, on a couple, just logged in. One of the older quad-cores that only runs Theory over at LHC restarted its tasks at 0% progress with 2,875 DAYS remaining. That estimate is slowly decreasing, but the tasks are still starting over while the elapsed time stays the same as it was before the restart. Lately I have had to deal with this more than I would like, and I know computers would rather not be shut down that way. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
For the moment, the new Theory (cranky/runc) has only Sherpa tasks listed as unsuccessful: http://mcplots-dev.cern.ch/production.php?view=matrix&rev=2279&display=unsucc |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
This is the FIRST Sherpa with a runtime of more than 18 hours that finished: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2756200 If there is ONLY one Sherpa per Computer it is ok. We are on the way to a good solution for letting Sherpa run. Thank you. |
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 80 |
"If there is ONLY one Sherpa per Computer it is ok." 18.5 hours of CPU time for 10000 events. It's good it finished. On a Windows machine it would have been killed under the current Theory_2019_02_20.xml. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
This Sherpa 2.2.0 finished successfully after more than 19 hours: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2757163 |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
This Sherpa was running in an endless loop: https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1880894 |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
This task ran for more than 50 hours and was cancelled by me because the time limit was reached. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2758454 ===> [runRivet] Fri Mar 8 22:56:17 UTC 2019 [boinc ee zhad 206 - - sherpa 1.4.1 default 2000 28] . . Channel_Basics::Boost : Spacelike four vector ... Channel_Basics::Boost : Spacelike four vector ... Channel_Basics::Boost : Spacelike four vector ... 7.47435e+17 pb +- ( 7.42752e+17 pb = 99.3734 % ) 617520000 ( 617537728 -> 99.9 % ) integration time: ( 1d 13h 52m 32s elapsed / 15410d 22h 51m 15s left ) 7.47423e+17 pb +- ( 7.4274e+17 pb = 99.3734 % ) 617530000 ( 617547728 -> 99.9 % ) integration time: ( 1d 13h 52m 35s elapsed / 15411d 6h 33m 36s left ) 7.47411e+17 pb +- ( 7.42728e+17 pb = 99.3734 % ) 617540000 ( 617557728 -> 99.9 % ) integration time: ( 1d 13h 52m 37s elapsed / 15411d 13h 59m 42s left ) Channel_Basics::Boost : Spacelike four vector ... |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
<message> Disk usage limit exceeded </message> https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2760480 In -dev we need more control over Theory tasks with problems before they are transferred to production! ===> [runRivet] Fri Mar 15 04:03:40 UTC 2019 [boinc pp jets 7000 170,-,2960 - sherpa 2.2.5 default 21000 30] Setting environment... grep: /etc/redhat-release: No such file or directory MCGENERATORS=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt67c g++ = /cvmfs/sft.cern.ch/lcg/external/gcc/4.8.4/x86_64-slc6/bin/g++ g++ version = 4.8.4 RIVET=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt67c/rivet/2.6.1/x86_64-slc6-gcc48-opt Rivet version = rivet v2.6.1 RIVET_REF_PATH=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt67c/rivet/2.6.1/x86_64-slc6-gcc48-opt/share/Rivet RIVET_ANALYSIS_PATH=/shared/analyses GSL=/cvmfs/sft.cern.ch/lcg/external/GSL/1.10/x86_64-slc6-gcc48-opt HEPMC=/cvmfs/sft.cern.ch/lcg/external/HepMC/2.06.08/x86_64-slc6-gcc48-opt FASTJET=/cvmfs/sft.cern.ch/lcg/external/fastjet/3.0.3/x86_64-slc6-gcc48-opt PYTHON=/cvmfs/sft.cern.ch/lcg/external/Python/2.7.4/x86_64-slc6-gcc48-opt ROOTSYS=/cvmfs/sft.cern.ch/lcg/app/releases/ROOT/5.34.26/x86_64-slc6-gcc48-opt/root Input parameters: mode=boinc beam=pp process=jets energy=7000 params=170,-,2960 specific=- generator=sherpa version=2.2.5 tune=default nevts=21000 seed=30 Prepare temporary directories and files ... |
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 80 |
<message> Killing the task when the disk limit is exceeded is the control that prevents endless looping and writing. The rsc_disk_bound is set to 2000000000 bytes, which equals 1907.348633 MB. That is much lower than for the Theory Vbox tasks (understandably, because there the VM is inside the slot and snapshots may be added), so a task with this container setup will reach the limit much earlier. As long as the Sherpa scientists don't care about these failing tasks, the only thing that could be tried is to reduce the rsc_disk_bound to a more realistic value. The 4 slots of the tasks running at the moment have 2.772MB, 3.316MB, 4.472MB and 12.656MB occupied. |
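The conversion quoted in this post can be reproduced in a few lines. The Python sketch below is illustrative only: the helper names are hypothetical, the figure 1907.348633 is obtained by dividing by 1024 × 1024 (so the "MB" in the post is strictly MiB), and the slot sizes are read as decimal megabytes, which is an assumption about the notation used in the post.

```python
MIB = 1024 * 1024

# Values quoted in the post above.
RSC_DISK_BOUND_BYTES = 2_000_000_000
# Assumption: "2.772MB" etc. use a decimal point, not a thousands separator.
SLOT_USAGE_MB = [2.772, 3.316, 4.472, 12.656]


def bound_in_mib(bound_bytes):
    """Convert a byte limit to MiB (bytes / 1024 / 1024)."""
    return bound_bytes / MIB


def usage_fraction(used_mb, bound_bytes):
    """Fraction of the disk bound that a slot currently occupies."""
    return used_mb / bound_in_mib(bound_bytes)


if __name__ == "__main__":
    print(f"bound: {bound_in_mib(RSC_DISK_BOUND_BYTES):.6f} MB")
    for used in SLOT_USAGE_MB:
        share = usage_fraction(used, RSC_DISK_BOUND_BYTES)
        print(f"slot using {used} MB -> {share:.2%} of the bound")
```

Under that reading, even the largest slot occupies well under 1 % of the bound, which is the gap behind the suggestion to lower rsc_disk_bound.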
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
Crystal, I found this from the developer of Cosmology on GitHub, but it is for Docker: https://github.com/marius311/boinc2docker/issues/7 |
©2025 CERN