Message boards : Theory Application : New Native App - Linux Only

gyllic

Joined: 10 Mar 17
Posts: 40
Credit: 108,345
RAC: 0
Message 5919 - Posted: 16 Feb 2019, 8:48:00 UTC - in response to Message 5918.  
Last modified: 16 Feb 2019, 8:50:45 UTC

This sherpa was running OK (and fast)
Maybe the problem has already been mentioned and a measure implemented, but how are Sherpa jobs that run endlessly handled? Is there some control mechanism in place or, if not, is one planned for the future?
ID: 5919 · Rating: 0
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5920 - Posted: 16 Feb 2019, 9:29:22 UTC - in response to Message 5919.  
Last modified: 16 Feb 2019, 11:17:47 UTC

This sherpa was running OK (and fast)
Maybe the problem has already been mentioned and a measure implemented, but how are Sherpa jobs handled that run endlessly? Is there some control mechanism in place or if not, is this planned in future?

No, but when you spot one, send me the log file and I will implement something.
ID: 5920 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5922 - Posted: 16 Feb 2019, 19:40:28 UTC

The last Theory in progress was this Sherpa:
===> [runRivet] Sat Feb 16 11:36:36 UTC 2019 [boinc pp jets 7000 65 - sherpa 1.4.1 default 100000 18]

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752716
ID: 5922 · Rating: 0
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5923 - Posted: 17 Feb 2019, 11:08:22 UTC - in response to Message 5922.  

The physics results are looking OK. A few minor changes are needed for this to show up in MCPlots.
ID: 5923 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5924 - Posted: 17 Feb 2019, 11:54:21 UTC - in response to Message 5923.  
Last modified: 17 Feb 2019, 12:04:48 UTC

The physics results are looking OK. A few minor changes are needed for this to show up in MCPlots.

That's great, Laurence.

What about the science application not obeying a suspend issued by the user or the BOINC wrapper?

BTW: I have a Sherpa running now that is eating my hard drive: 'ISR_Handler::Make_ISR(..) out of bounds'
runRivet.log is meanwhile at 2.3 GB and increasing. sda1 is 53% used. VM total is 25 GB.
Suspend does not work. I made a copy of the log at 1.5 GB and have to look at how to get it out of my Linux VM.
ID: 5924 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5925 - Posted: 17 Feb 2019, 12:24:25 UTC - in response to Message 5924.  
Last modified: 17 Feb 2019, 12:25:28 UTC

BTW: I've a sherpa running now eating my harddrive. 'ISR_Handler::Make_ISR(..) out of bounds'
runRivet.log meanwhile 2.3 GB and increasing. sda1 53% used. VM total 25GB
Suspend does not work. Made a copy of 1.5GB. Have to look how to get that out of my Linux VM.
Solved by computation error: EXIT_DISK_LIMIT_EXCEEDED
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752772 Process name not in stderr.txt - see the last line in this post.
In BOINC Manager: Aborting task Theory_2279-790023-18_2: exceeded disk limit: 3038.16MB > 1907.35MB
runRivet.log 3184721191 bytes. File runRivet.log present within my VM.

Job: [boinc pp jets 8000 250,-,4160 - sherpa 1.2.3 default 31000 18]
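
The numbers above fit together if the "MB" in the BOINC Manager message is read as 2**20 bytes. That reading is an assumption, not something the log states. A small sketch of the arithmetic:

```python
MIB = 1024 * 1024  # assumption: "MB" in the BOINC message means 2**20 bytes


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB, rounded to two decimals."""
    return round(n_bytes / MIB, 2)


log_bytes = 3_184_721_191   # size of runRivet.log reported above
limit_mib = 1907.35         # limit quoted in the BOINC Manager message

print(to_mib(log_bytes))              # the log file alone, in MiB
print(to_mib(2_000_000_000))          # a 2e9-byte limit expressed in MiB
print(to_mib(log_bytes) > limit_mib)  # the log alone already exceeds the limit
```

Under that assumption, the log file alone accounts for about 3037 of the 3038.16 MB reported, and the 1907.35 MB limit corresponds to 2,000,000,000 bytes.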
ID: 5925 · Rating: 0
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5926 - Posted: 18 Feb 2019, 9:30:21 UTC - in response to Message 5925.  
Last modified: 18 Feb 2019, 9:31:10 UTC

BTW: I've a sherpa running now eating my harddrive. 'ISR_Handler::Make_ISR(..) out of bounds'
runRivet.log meanwhile 2.3 GB and increasing. sda1 53% used. VM total 25GB
Suspend does not work. Made a copy of 1.5GB. Have to look how to get that out of my Linux VM.
Solved by computation error: EXIT_DISK_LIMIT_EXCEEDED
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752772 Process name not in stderr.txt - Last line in this port.
In BOINC Manager: Aborting task Theory_2279-790023-18_2: exceeded disk limit: 3038.16MB > 1907.35MB
runRivet.log 3184721191 bytes. File runRivet.log present within my VM.

Job: [boinc pp jets 8000 250,-,4160 - sherpa 1.2.3 default 31000 18]


Looking at the results so far, I see one other task like this one. Note the value of peak disk usage. The next highest is this one, with 18MB. I will set the max disk usage to 50MB. This may catch the looping Sherpa jobs.
ID: 5926 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5927 - Posted: 18 Feb 2019, 9:55:01 UTC - in response to Message 5926.  

Looking at the results so far, I see one other task like this one.

Your link points to a host, not a task result.
Was that task maybe not a native one, but an older Theory VBox-task?

Extending the MB-size will not, IMO, solve the looping Sherpas.
That's more a matter of error-handling within the science application.
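
As an illustration of what such error handling could look like, here is a minimal sketch of a check that flags a job once the same error message has been logged more than a chosen number of times. This is not how the science application or the wrapper works; the function name `find_repeated_error` and the threshold are hypothetical.

```python
from typing import Iterable, Optional


def find_repeated_error(lines: Iterable[str], needle: str,
                        max_repeats: int) -> Optional[int]:
    """Return the 1-based line number at which `needle` has been seen
    more than `max_repeats` times, or None if that never happens."""
    seen = 0
    for lineno, line in enumerate(lines, start=1):
        if needle in line:
            seen += 1
            if seen > max_repeats:
                return lineno
    return None


# Hypothetical log excerpt: one normal line, then the same error repeated.
sample_log = ["Event 1"] + ["ISR_Handler::Make_ISR(..) out of bounds"] * 5
print(find_repeated_error(sample_log, "out of bounds", max_repeats=3))
```

A watcher like this could stop a looping job long before the log reaches a disk limit, but a sensible threshold would depend on how often the message appears in healthy runs.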
ID: 5927 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5928 - Posted: 18 Feb 2019, 10:54:00 UTC
Last modified: 18 Feb 2019, 11:39:34 UTC

At the moment I have 2 Sherpas running.

===> [runRivet] Mon Feb 18 09:23:57 UTC 2019 [boinc pp jets 7000 80,-,1060 - sherpa 1.4.3 default 100000 18]

still in the full optimization phase. Edit: ETA: Mon Feb 18 15:12

===> [runRivet] Mon Feb 18 09:34:49 UTC 2019 [boinc pp jets 7000 400 - sherpa 1.2.3 default 31000 18]

This is a long running one, but already in the events processing phase. ETA: Tue Feb 19 11:56
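
For anyone collecting these job lines from logs, a rough parser sketch follows. The field names beam, process, energy, params, generator, version and tune are borrowed from the variable names in the runRivet.sh message that appears in this thread. Reading the ninth field as the event count is a guess based on the "100000 events processed" output, and the meaning of the lone "-" and of the last field is not stated anywhere here, so both are kept opaque.

```python
import re
from typing import NamedTuple, Optional


class RivetJob(NamedTuple):
    beam: str
    process: str
    energy: str
    params: str
    generator: str
    version: str
    tune: str
    nevents: int      # assumption: the ninth field is the event count
    last_field: str   # meaning not stated in the thread


HEADER = re.compile(r"\[boinc ([^\]]+)\]")


def parse_job(line: str) -> Optional[RivetJob]:
    """Parse a '[boinc ...]' job description; return None if it does not fit."""
    match = HEADER.search(line)
    if match is None:
        return None
    parts = match.group(1).split()
    if len(parts) != 10 or not parts[8].isdigit():
        return None
    beam, process, energy, params, _sep, generator, version, tune, nevents, last = parts
    return RivetJob(beam, process, energy, params, generator, version,
                    tune, int(nevents), last)


job = parse_job("===> [runRivet] Mon Feb 18 09:23:57 UTC 2019 "
                "[boinc pp jets 7000 80,-,1060 - sherpa 1.4.3 default 100000 18]")
print(job)
```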
ID: 5928 · Rating: 0
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5929 - Posted: 18 Feb 2019, 13:02:29 UTC - in response to Message 5928.  

The app version has been updated. We now also require the alice.cern.ch repository. Please update the default.local file to include this new repository.

sudo wget https://cern.ch/lfield/default.local -O /etc/cvmfs/default.local
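
To confirm that the new repository is listed after updating, something like the following sketch can help. It assumes the repositories are listed in the CernVM-FS client parameter CVMFS_REPOSITORIES as a comma-separated value. The sample text below is made up for illustration and is not the content of the file linked above.

```python
from typing import List


def configured_repositories(config_text: str) -> List[str]:
    """Return the repositories listed in CVMFS_REPOSITORIES, if any."""
    repos: List[str] = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not assignments
        key, value = line.split("=", 1)
        if key.strip() == "CVMFS_REPOSITORIES":
            value = value.strip().strip("'\"")
            repos = [r.strip() for r in value.split(",") if r.strip()]
    return repos


# Made-up example content; the real default.local may differ.
sample = (
    'CVMFS_REPOSITORIES="grid.cern.ch,sft.cern.ch,alice.cern.ch"\n'
    'CVMFS_HTTP_PROXY="DIRECT"\n'
)
print("alice.cern.ch" in configured_repositories(sample))
```

To check a real installation, read /etc/cvmfs/default.local and pass its text to `configured_repositories`.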
ID: 5929 · Rating: 0
gyllic

Joined: 10 Mar 17
Posts: 40
Credit: 108,345
RAC: 0
Message 5930 - Posted: 18 Feb 2019, 14:02:31 UTC - in response to Message 5929.  

Looks like the queue is empty:
18.02.2019 14:58:17 | lhcathome-dev | Requesting new tasks for CPU
18.02.2019 14:58:19 | lhcathome-dev | Scheduler request completed: got 0 new tasks
18.02.2019 14:58:19 | lhcathome-dev | No tasks sent
18.02.2019 14:58:19 | lhcathome-dev | No tasks are available for Theory Simulation
So no testing possible at the moment.
ID: 5930 · Rating: 0
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5931 - Posted: 18 Feb 2019, 14:07:28 UTC - in response to Message 5930.  

Looks like the queue is empty:
18.02.2019 14:58:17 | lhcathome-dev | Requesting new tasks for CPU
18.02.2019 14:58:19 | lhcathome-dev | Scheduler request completed: got 0 new tasks
18.02.2019 14:58:19 | lhcathome-dev | No tasks sent
18.02.2019 14:58:19 | lhcathome-dev | No tasks are available for Theory Simulation
So no testing possible at the moment.


We just need to drain the current queue before submitting more. CP, if you are watching, could you please terminate your jobs?
ID: 5931 · Rating: 0
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5932 - Posted: 18 Feb 2019, 15:27:54 UTC - in response to Message 5931.  

New tasks are available.
ID: 5932 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5933 - Posted: 18 Feb 2019, 15:33:59 UTC - in response to Message 5931.  

Just need to drain the current queue before submitting more. CP if you are watching please can you terminate your jobs.

The weather is too fine to sit behind my screen all day. Just aborted the last 4 tasks.
ID: 5933 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5934 - Posted: 18 Feb 2019, 16:58:31 UTC

With the new version 4.13 (cranky-0.0.20) I get only errors.

cranky-0.0.20: [INFO] Preparing output.
tar: local.txt: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
cranky exited; CPU time 1006.258542
app exit status: 0x2
called boinc_finish(195)
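
The log suggests that the packaging step fails because tar is asked to archive a file, local.txt, that does not exist, and that tar's non-zero exit status (0x2 above) then fails the whole task. How cranky actually builds its output is not shown here, so the following is only a sketch of a packaging step that tolerates missing files. The function name `pack_existing` and the file list are hypothetical.

```python
import os
import tarfile
import tempfile
from typing import List, Sequence


def pack_existing(archive_path: str, base_dir: str,
                  names: Sequence[str]) -> List[str]:
    """Add the files in `names` (relative to base_dir) to a gzip tar
    archive and return the names that were missing instead of failing."""
    missing: List[str] = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in names:
            full = os.path.join(base_dir, name)
            if os.path.exists(full):
                tar.add(full, arcname=name)
            else:
                missing.append(name)
    return missing


# Demo in a throwaway directory: one file exists, local.txt does not.
with tempfile.TemporaryDirectory() as workdir:
    with open(os.path.join(workdir, "runRivet.log"), "w") as fh:
        fh.write("Run finished successfully\n")
    archive = os.path.join(workdir, "output.tgz")
    demo_missing = pack_existing(archive, workdir, ["runRivet.log", "local.txt"])
    with tarfile.open(archive) as tar:
        demo_members = tar.getnames()

print(demo_missing, demo_members)
```

Whether a missing file should be skipped, logged, or treated as a real failure is a decision for the project; the sketch only shows that the check can be made before the archive is written.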
ID: 5934 · Rating: 0
gyllic

Joined: 10 Mar 17
Posts: 40
Credit: 108,345
RAC: 0
Message 5935 - Posted: 18 Feb 2019, 17:22:19 UTC - in response to Message 5934.  

With the new version 4.13 (cranky-0.0.20) I get only errors.
Same here. 4 out of 4 reported the same error:
16:53:04 2019-02-18: cranky-0.0.20: [INFO] Preparing output.
tar: local.txt: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
17:53:05 (14091): cranky exited; CPU time 3423.660000
17:53:05 (14091): app exit status: 0x2
17:53:05 (14091): called boinc_finish(195)
e.g. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752874
ID: 5935 · Rating: 0
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5936 - Posted: 18 Feb 2019, 17:23:23 UTC - in response to Message 5934.  
Last modified: 18 Feb 2019, 17:24:01 UTC

With the new version 4.13 (cranky-0.0.20) I get only errors.


Strange, nothing much changed. Will look into it a bit later this evening.
ID: 5936 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 5937 - Posted: 18 Feb 2019, 17:36:36 UTC

The science application ends normally:
Disk usage: 5636 Kb

CPU usage: 6012 s

Clean tmp ...

Run finished successfully

from task https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752841
ID: 5937 · Rating: 0
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 129
Message 5938 - Posted: 18 Feb 2019, 18:41:24 UTC - in response to Message 5936.  


Strange, nothing much changed. Will look into it a bit later this evening.


Am investigating...
ID: 5938 · Rating: 0
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 101
Message 5939 - Posted: 18 Feb 2019, 19:12:09 UTC - in response to Message 5936.  
Last modified: 18 Feb 2019, 19:25:38 UTC

With the new version 4.13 (cranky-0.0.20) I get only errors.


Strange, nothing much changed. Will look into it a bit later this evening.


Same here; the physics app finishes OK.
From this task:

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752862

Generator run finished successfully
100000 events processed
dumping histograms...
Rivet.Analysis.Handler: INFO  Finalising analyses
Rivet.Analysis.Handler: INFO  Processed 100000 events

The MCnet usage guidelines apply to Rivet: see http://www.montecarlonet.org/GUIDELINES
Please acknowledge plots made with Rivet analyses, and cite arXiv:1003.0694 (http://arxiv.org/abs/1003.0694)

Processing histograms...
input  = /shared/tmp/tmp.b9ZK1W7cQa/flat
output = /shared
./runRivet.sh: line 742:   205 Killed                  display_service $tmpd_dump "$beam $process $energy $params $generator $version $tune"  (wd: /shared)
mc:  ATLAS_2011_S9131140_d01-x01-y02.dat -> /shared/dat/pp/zinclusive/pT-Z-norm/atlas2011-el-dressed/7000/pythia8/8.235/default-CD.dat
mc:  ATLAS_2011_S9131140_d01-x01-y03.dat -> /shared/dat/pp/zinclusive/pT-Z-norm/atlas2011-el-bare/7000/pythia8/8.235/default-CD.dat
mc:  ATLAS_2011_S9131140_d02-x01-y02.dat -> /shared/dat/pp/zinclusive/pT-Z-norm/atlas2011-mu-dressed/7000/pythia8/8.235/default-CD.dat
mc:  ATLAS_2011_S9131140_d02-x01-y03.dat -> /shared/dat/pp/zinclusive/pT-Z-.....(snip).....ATLAS_2011_S9131140_d02-x01-y03.dat -> /shared/dat/pp/zinclusive/pT-Z-norm/atlas2011-mu-bare/7000/ATLAS_2011_S9131140.dat

Disk usage: 2440 Kb

CPU usage: 12136 s

Clean tmp ...

Run finished successfully

but then the task fails
15:35:16 2019-02-18: cranky-0.0.20: [INFO] Running Container 'runc'.
===> [runRivet] Mon Feb 18 15:35:16 UTC 2019 [boinc pp zinclusive 7000 -,-,50,130 - pythia8 8.235 default-CD 100000 19]
19:00:41 2019-02-18: cranky-0.0.20: [INFO] Preparing output.
tar: local.txt: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
19:00:42 (17293): cranky exited; CPU time 12083.959181
19:00:42 (17293): app exit status: 0x2
19:00:42 (17293): called boinc_finish(195)

The next one is at ~70000 events so I'll let it run and see what happens.
ID: 5939 · Rating: 0


©2024 CERN