Info | Message |
---|---|
1) Message boards : Theory Application : Endless Theory job
Message 4690 Posted 23 Feb 2017 |
Hello Crystal, this seems to be an application bug (in "Rivet" in this case) and not a job-related problem. It can be reported to MCPlots support: http://mcplots.cern.ch/. The job status should be successful, though. Leo |
2) Message boards : Theory Application : Task shutting down prematurely
Message 3606 Posted 25 Jun 2016 |
Yes, we were out of jobs for a while because of a temporary glitch in the injection workflow. Thanks for spotting it. |
3) Message boards : Theory Application : Duplicate jobs
Message 3477 Posted 23 May 2016 |
"To your question: obviously on my host there are non-multifold jobs." Why is it obvious that a job cannot be sent more than once to the same host? If I remember correctly, Rasputin had the same job running twice in the same task. |
4) Message boards : Theory Application : Duplicate jobs
Message 3472 Posted 23 May 2016 |
This is expected: yesterday there was a problem with Condor, and the "fresh" jobs were put on hold. Until I manage to resume them, the minimum Condor JobID for new jobs is 317295. Among the "old" jobs, do you ever get any non-duplicate jobs? Thanks for the feedback. |
5) Message boards : Theory Application : Endless Theory job
Message 3456 Posted 21 May 2016 |
OK, at least it is consistent. There are about 250 old jobs in the queue, but I just enabled the submission of new jobs with higher priority, so you should start to get them in 20 minutes or so. |
6) Message boards : Theory Application : Endless Theory job
Message 3444 Posted 21 May 2016 |
On my side I see at most 26 "fresh" jobs (Condor JobID > 316094) out of 77 total jobs running. Maybe you got some "old" jobs (Condor JobID < 316094), which were affected by the duplicate issue. Can you confirm? |
7) Message boards : Theory Application : Endless Theory job
Message 3443 Posted 21 May 2016 |
I agree with Crystal, but all the information is in this thread now and hopefully the issue is going to be solved. |
8) Message boards : Theory Application : Errors in log
Message 3420 Posted 20 May 2016 |
Those are Rivet errors and they are handled by Rivet itself, so there is no problem. I see you ran a job with Condor JobID > 316094; please let me know if you still get duplicate jobs. |
9) Message boards : Theory Application : Endless Theory job
Message 3412 Posted 19 May 2016 |
From Condor JobID 316094 onwards, jobs are submitted from a different location every 15 minutes. Let's see if that solves the problem. |
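
The "fresh" versus "old" distinction used throughout this thread comes down to a single comparison on the Condor JobID. A minimal Python sketch, assuming the boundary value 316094 itself counts as "fresh" (following the "from 316094 onwards" wording); `is_fresh` and `count_fresh` are hypothetical helper names, not part of Condor or any project tool:

```python
# Threshold quoted in the thread; treating it as inclusive is an assumption.
FRESH_MIN_JOB_ID = 316094


def is_fresh(condor_job_id: int) -> bool:
    """Return True for jobs submitted under the new submission scheme."""
    return condor_job_id >= FRESH_MIN_JOB_ID


def count_fresh(job_ids):
    """Return (fresh, total) for an iterable of Condor JobIDs."""
    ids = list(job_ids)
    return sum(is_fresh(job_id) for job_id in ids), len(ids)
```

Calling `count_fresh` on the JobIDs seen on a host gives a quick ratio of fresh to total jobs to report back.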
10) Message boards : Theory Application : Endless Theory job
Message 3410 Posted 19 May 2016 |
OK, I will try to submit jobs from a different place every time. I will let you know when "fresh" jobs are available. Thanks. |
11) Message boards : Theory Application : Endless Theory job
Message 3403 Posted 19 May 2016 |
According to your output, the "jobdata" file should be in /var/lib/condor/execute/dir_4349/. |
12) Message boards : Theory Application : Endless Theory job
Message 3401 Posted 19 May 2016 |
I don't know if "runid" and "seed" are written in running.log, but they are certainly in "jobdata". Wherever you find them is fine. |
13) Message boards : Theory Application : Endless Theory job
Message 3399 Posted 19 May 2016 |
OK, thanks. We are investigating with the Condor experts. If you can find the "jobdata" file in the job execution dir, it would be helpful to know the "runid" and "seed" values listed in its first lines. |
14) Message boards : Theory Application : Endless Theory job
Message 3396 Posted 19 May 2016 |
So it looks like duplicate jobs appear only in different tasks, never in the same task. Can you please retrieve the MC-plot ID number and the Condor JobID for them? Do you see a "jobdata" file in the job execution dir? |
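
Whether a repeated job shows up across different tasks or within a single task can be checked once the identifiers have been collected by hand. A rough Python sketch, assuming each observation is recorded as a `(task_name, job_key)` pair, where `job_key` is whatever identifies the job (for example the MC-plot ID number); the record layout and both function names are hypothetical:

```python
from collections import defaultdict


def find_duplicates(records):
    """Map each job key to the tasks it ran in, keeping only repeated jobs.

    `records` is an iterable of (task_name, job_key) pairs; this layout
    is an assumption made for the example.
    """
    tasks_by_job = defaultdict(list)
    for task_name, job_key in records:
        tasks_by_job[job_key].append(task_name)
    return {job: tasks for job, tasks in tasks_by_job.items() if len(tasks) > 1}


def same_task_duplicates(duplicates):
    """Keep only the duplicates where one task ran the same job more than once."""
    return {job: tasks for job, tasks in duplicates.items()
            if len(set(tasks)) < len(tasks)}
```

An empty result from `same_task_duplicates` would match the observation that duplicates appear only in different tasks.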
15) Message boards : Theory Application : Endless Theory job
Message 3391 Posted 19 May 2016 |
We have disabled automatic resubmission of Condor jobs for the moment. We are not sure it was the cause of the duplicate jobs, so it would be good to hear back from you. Many thanks. |
16) Message boards : Theory Application : Endless Theory job
Message 3376 Posted 17 May 2016 |
Sorry, my typo: I meant ".run", not ".tgz". By the way, you can find the three numbers as revision=2016, runid=5xxxxx, seed=2xx in some output file. |
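
The three numbers can also be read straight off the executable name. A minimal Python sketch, assuming the name has the form `<revision>-<runid>-<seed>.run`; the field order is inferred from the example values in this thread, and `parse_job_name` is a hypothetical helper rather than anything shipped with the project:

```python
import re
from typing import NamedTuple


class JobKey(NamedTuple):
    revision: int
    runid: int
    seed: int


# Assumed pattern: three dash-separated integers followed by ".run".
_JOB_NAME = re.compile(r"^(\d+)-(\d+)-(\d+)\.run$")


def parse_job_name(name: str) -> JobKey:
    """Split a job executable name into revision, runid and seed."""
    match = _JOB_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised job executable name: {name!r}")
    revision, runid, seed = (int(part) for part in match.groups())
    return JobKey(revision, runid, seed)


if __name__ == "__main__":
    # Made-up values in the shape of the example; not a real job.
    print(parse_job_name("2016-500001-201.run"))
```

Two tasks that yield the same `JobKey` would be candidates for the duplicate-job problem discussed here.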
17) Message boards : Theory Application : Endless Theory job
Message 3374 Posted 17 May 2016 |
Hi Rasputin, can you please retrieve the job executable name from any of them? It should be something like "2016-5xxxxx-2xx.tgz". |
18) Message boards : Theory Application : Endless Theory job
Message 3369 Posted 16 May 2016 |
Dear all, the problem of running the same job several times should be related to the transition between manual and automatic job submission/retrieval. Please let us know if you still see this happening. Thank you, Leonardo |
19) Message boards : Theory Application : No Jobs?
Message 3230 Posted 4 May 2016 |
Yes, we were doing some tests yesterday and we did not submit the usual batch of jobs. Now they are back. Thank you. |
20) Message boards : Theory Application : Errors in log
Message 3160 Posted 1 May 2016 |
I think so. |
©2025 CERN