New Native App - Linux Only
Author | Message |
---|---|
Send message Joined: 10 Mar 17 Posts: 40 Credit: 108,345 RAC: 0 |
"This sherpa was running OK (and fast)" Maybe the problem has already been mentioned and a measure implemented, but how are Sherpa jobs that run endlessly handled? Is there a control mechanism in place or, if not, is one planned for the future? |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
"This sherpa was running OK (and fast) Maybe the problem has already been mentioned and a measure implemented, but how are Sherpa jobs handled that run endlessly? Is there some control mechanism in place or if not, is this planned in future?" No, but when you spot one, send me the log file and I will implement something. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
The last Theory in progress was this Sherpa: ===> [runRivet] Sat Feb 16 11:36:36 UTC 2019 [boinc pp jets 7000 65 - sherpa 1.4.1 default 100000 18] https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752716 |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
The physics results are looking OK. A few minor changes are needed for this to show up in MCPlots. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
"The physics results are looking OK. A few minor changes are needed for this to show up in MCPlots." That's great, Laurence. What about the science application not obeying a suspend request from the user or the BOINC wrapper? BTW: I have a Sherpa running now that is eating my hard drive: 'ISR_Handler::Make_ISR(..) out of bounds'. runRivet.log is meanwhile 2.3 GB and increasing. sda1 is 53% used; the VM has 25 GB in total. Suspend does not work. I made a copy of the log at 1.5 GB and have to look at how to get it out of my Linux VM. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
"BTW: I've a sherpa running now eating my harddrive. 'ISR_Handler::Make_ISR(..) out of bounds'" Solved by computation error: EXIT_DISK_LIMIT_EXCEEDED https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752772 Process name not in stderr.txt - Last line in this port. In BOINC Manager: Aborting task Theory_2279-790023-18_2: exceeded disk limit: 3038.16MB > 1907.35MB. runRivet.log: 3184721191 bytes. File runRivet.log present within my VM. Job: [boinc pp jets 8000 250,-,4160 - sherpa 1.2.3 default 31000 18] |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
"BTW: I've a sherpa running now eating my harddrive. 'ISR_Handler::Make_ISR(..) out of bounds' Solved by computation error: EXIT_DISK_LIMIT_EXCEEDED" Looking at the results so far, I see one other task like this one. Note the value of peak disk usage. The next highest is this one, with 18MB. I will set the max disk usage to 50MB. This may catch the looping Sherpa jobs. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
"Looking at the results so far, I see one other task like this one." Your link points to a host, not a task result. Was that task maybe not a native one, but an older Theory VBox task? IMO, changing the disk limit will not solve the looping Sherpas. That's more a matter of error handling within the science application. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
At the moment I have 2 Sherpas running. ===> [runRivet] Mon Feb 18 09:23:57 UTC 2019 [boinc pp jets 7000 80,-,1060 - sherpa 1.4.3 default 100000 18] This one is still in the full optimization phase. Edit: ETA: Mon Feb 18 15:12 ===> [runRivet] Mon Feb 18 09:34:49 UTC 2019 [boinc pp jets 7000 400 - sherpa 1.2.3 default 31000 18] This is a long-running one, but already in the events processing phase. ETA: Tue Feb 19 11:56 |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
The app version has been updated. We now also require the alice.cern.ch repository. Please update the default.local file to include this new repository: sudo wget https://cern.ch/lfield/default.local -O /etc/cvmfs/default.local |
Send message Joined: 10 Mar 17 Posts: 40 Credit: 108,345 RAC: 0 |
Looks like the queue is empty: 18.02.2019 14:58:17 | lhcathome-dev | Requesting new tasks for CPU 18.02.2019 14:58:19 | lhcathome-dev | Scheduler request completed: got 0 new tasks 18.02.2019 14:58:19 | lhcathome-dev | No tasks sent 18.02.2019 14:58:19 | lhcathome-dev | No tasks are available for Theory Simulation So no testing is possible at the moment. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
"Looks like the queue is empty:" Just need to drain the current queue before submitting more. CP, if you are watching, please can you terminate your jobs. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
New tasks are available. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
"Just need to drain the current queue before submitting more. CP if you are watching please can you terminate your jobs." The weather is too fine to sit behind my screen the whole day. Just aborted the last 4 tasks. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
With the new version 4.13 (cranky-0.0.20) I get only errors. cranky-0.0.20: [INFO] Preparing output. |
Send message Joined: 10 Mar 17 Posts: 40 Credit: 108,345 RAC: 0 |
"With the new version 4.13 (cranky-0.0.20) I get only errors." Same here. 4 out of 4 reported the same error: 16:53:04 2019-02-18: cranky-0.0.20: [INFO] Preparing output. tar: local.txt: Cannot stat: No such file or directory tar: Exiting with failure status due to previous errors 17:53:05 (14091): cranky exited; CPU time 3423.660000 17:53:05 (14091): app exit status: 0x2 17:53:05 (14091): called boinc_finish(195) e.g. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752874 |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
"With the new version 4.13 (cranky-0.0.20) I get only errors." Strange, nothing much changed. I will look into it a bit later this evening. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
The science application ends normally: Disk usage: 5636 Kb CPU usage: 6012 s Clean tmp ... Run finished successfully. This is from task https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752841 |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Am investigating... |
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
"With the new version 4.13 (cranky-0.0.20) I get only errors." Same here: the physics app finishes OK. From this task: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752862 Generator run finished successfully 100000 events processed dumping histograms... Rivet.Analysis.Handler: INFO Finalising analyses Rivet.Analysis.Handler: INFO Processed 100000 events The MCnet usage guidelines apply to Rivet: see http://www.montecarlonet.org/GUIDELINES Please acknowledge plots made with Rivet analyses, and cite arXiv:1003.0694 (http://arxiv.org/abs/1003.0694) Processing histograms... input = /shared/tmp/tmp.b9ZK1W7cQa/flat output = /shared ./runRivet.sh: line 742: 205 Killed display_service $tmpd_dump "$beam $process $energy $params $generator $version $tune" (wd: /shared) mc: ATLAS_2011_S9131140_d01-x01-y02.dat -> /shared/dat/pp/zinclusive/pT-Z-norm/atlas2011-el-dressed/7000/pythia8/8.235/default-CD.dat mc: ATLAS_2011_S9131140_d01-x01-y03.dat -> /shared/dat/pp/zinclusive/pT-Z-norm/atlas2011-el-bare/7000/pythia8/8.235/default-CD.dat mc: ATLAS_2011_S9131140_d02-x01-y02.dat -> /shared/dat/pp/zinclusive/pT-Z-norm/atlas2011-mu-dressed/7000/pythia8/8.235/default-CD.dat mc: ATLAS_2011_S9131140_d02-x01-y03.dat -> /shared/dat/pp/zinclusive/pT-Z-.....(snip).....ATLAS_2011_S9131140_d02-x01-y03.dat -> /shared/dat/pp/zinclusive/pT-Z-norm/atlas2011-mu-bare/7000/ATLAS_2011_S9131140.dat Disk usage: 2440 Kb CPU usage: 12136 s Clean tmp ... Run finished successfully. But then the task fails: 15:35:16 2019-02-18: cranky-0.0.20: [INFO] Running Container 'runc'. ===> [runRivet] Mon Feb 18 15:35:16 UTC 2019 [boinc pp zinclusive 7000 -,-,50,130 - pythia8 8.235 default-CD 100000 19] 19:00:41 2019-02-18: cranky-0.0.20: [INFO] Preparing output. 
tar: local.txt: Cannot stat: No such file or directory tar: Exiting with failure status due to previous errors 19:00:42 (17293): cranky exited; CPU time 12083.959181 19:00:42 (17293): app exit status: 0x2 19:00:42 (17293): called boinc_finish(195) The next one is at ~70000 events so I'll let it run and see what happens. |
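
In the failures reported above, the physics run finishes successfully and the task then fails in the "Preparing output" step, where tar cannot find local.txt and exits with a failure status. The thread does not show how cranky builds the archive, so the following is only a minimal sketch of one defensive approach, written in Python for illustration: pack the output files that exist and report the missing ones instead of failing the whole task. The helper name `pack_outputs` and the archive name `output.tgz` are hypothetical; only the file names runRivet.log and local.txt come from the logs in this thread.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_outputs(workdir, names, archive_name="output.tgz"):
    """Pack the files from `names` that exist in `workdir`.

    Returns (archive_path, missing_names). A missing file is reported
    to the caller rather than aborting the archive step.
    Hypothetical helper; not taken from cranky.
    """
    workdir = Path(workdir)
    # Split the requested names into files that exist and files that do not.
    present = [n for n in names if (workdir / n).is_file()]
    missing = [n for n in names if n not in present]

    archive = workdir / archive_name
    with tarfile.open(archive, "w:gz") as tar:
        for name in present:
            # Store each file under its bare name, not the full path.
            tar.add(workdir / name, arcname=name)
    return archive, missing


if __name__ == "__main__":
    # Reproduce the situation from the thread in a scratch directory:
    # runRivet.log exists, local.txt does not.
    with tempfile.TemporaryDirectory() as scratch:
        Path(scratch, "runRivet.log").write_text("Run finished successfully\n")
        archive, missing = pack_outputs(scratch, ["runRivet.log", "local.txt"])
        print("archive:", archive.name)
        print("missing:", missing)
```

Whether a missing local.txt should be tolerated or treated as a real error depends on what that file is meant to contain, which the thread does not say; the sketch only shows how to separate the two cases so the decision can be made explicitly.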
©2024 CERN