1) Message boards : Theory Application : Endless Theory job (Message 4690)
Posted 23 Feb 2017 by Profile Leonardo Cristella
Post:
Hello Crystal,
this looks like a bug in the application ("Rivet" in this case) rather than a job-related problem.
It can be reported to MCPlots support at http://mcplots.cern.ch/; the job status should still be successful, though.

Leo
2) Message boards : Theory Application : Task shutting down prematurely (Message 3606)
Posted 25 Jun 2016 by Profile Leonardo Cristella
Post:
Yes, we were out of jobs for a while because of a temporary glitch in the injection workflow.

Thanks for spotting.
3) Message boards : Theory Application : Duplicate jobs (Message 3477)
Posted 23 May 2016 by Profile Leonardo Cristella
Post:
To your question: obviously on my host there are non-multifold jobs

Why is it obvious that a job can not be sent more than once to the same host?

If I remember correctly, Rasputin had the same job running twice in the same task.
4) Message boards : Theory Application : Duplicate jobs (Message 3472)
Posted 23 May 2016 by Profile Leonardo Cristella
Post:
This is expected: yesterday there was a problem with Condor and the "fresh" jobs were put on hold.
Until I manage to resume them, the minimum Condor JobID for new jobs is 317295.
Among the "old" jobs, do you ever get any non-duplicate ones?

Thanks for the feedback.
5) Message boards : Theory Application : Endless Theory job (Message 3456)
Posted 21 May 2016 by Profile Leonardo Cristella
Post:
Ok, at least it is consistent. There are about 250 old jobs in the queue, but I just enabled the submission of new jobs with higher priority, so you should start to get them in 20 minutes or so.
6) Message boards : Theory Application : Endless Theory job (Message 3444)
Posted 21 May 2016 by Profile Leonardo Cristella
Post:
On my side I see at most 26 "fresh" jobs (Condor JobID > 316094) out of 77 total jobs running.
Maybe you got some "old" jobs (Condor JobID < 316094), which were affected by the duplicate issue; can you confirm?
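
As a rough illustration of the split described here, this is a minimal Python sketch that separates a list of Condor JobIDs into "fresh" and "old" ones. The boundary ID 316094 itself is counted as fresh, which is an assumption (this post only gives the strict inequalities); the function and constant names are hypothetical and not part of any project tool.

```python
# Assumed boundary: JobIDs at or above this value are treated as "fresh".
FIRST_FRESH_JOB_ID = 316094


def split_fresh_old(job_ids):
    """Return (fresh, old) lists of Condor JobIDs, preserving input order."""
    fresh = [j for j in job_ids if j >= FIRST_FRESH_JOB_ID]
    old = [j for j in job_ids if j < FIRST_FRESH_JOB_ID]
    return fresh, old
```

Calling split_fresh_old on the JobIDs seen on a host gives a quick count of how many running jobs predate the change.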
7) Message boards : Theory Application : Endless Theory job (Message 3443)
Posted 21 May 2016 by Profile Leonardo Cristella
Post:
I agree with Crystal, but all the information is in this thread now and hopefully the issue will be solved.
8) Message boards : Theory Application : Errors in log (Message 3420)
Posted 20 May 2016 by Profile Leonardo Cristella
Post:
Those are Rivet errors and they are handled by Rivet itself, so they are not a problem.
I see you ran a job with Condor JobID > 316094; please let me know if you still get duplicate jobs.
9) Message boards : Theory Application : Endless Theory job (Message 3412)
Posted 19 May 2016 by Profile Leonardo Cristella
Post:
From Condor JobID 316094 onwards, jobs are submitted from a different location every 15 minutes.
Let's see if that solves the problem.
10) Message boards : Theory Application : Endless Theory job (Message 3410)
Posted 19 May 2016 by Profile Leonardo Cristella
Post:
Ok, I will try to submit jobs from a different place every time.
I will let you know when "fresh" jobs are available.

Thanks
11) Message boards : Theory Application : Endless Theory job (Message 3403)
Posted 19 May 2016 by Profile Leonardo Cristella
Post:
Judging from your output, the "jobdata" file should be in /var/lib/condor/execute/dir_4349/
12) Message boards : Theory Application : Endless Theory job (Message 3401)
Posted 19 May 2016 by Profile Leonardo Cristella
Post:
I don't know whether "runid" and "seed" are written in running.log, but they are certainly in "jobdata".
Wherever you find them is fine.
13) Message boards : Theory Application : Endless Theory job (Message 3399)
Posted 19 May 2016 by Profile Leonardo Cristella
Post:
Ok thanks, we are investigating with Condor experts.
If you can find the "jobdata" file in the job execution dir it would be helpful to know the "runid" and "seed" values listed in the first lines.
14) Message boards : Theory Application : Endless Theory job (Message 3396)
Posted 19 May 2016 by Profile Leonardo Cristella
Post:
So it looks like duplicate jobs appear only in different tasks, never in the same task.
Can you please retrieve the MC-plot ID number and the Condor job ID for them?
Do you see a "jobdata" file in the job execution dir?
15) Message boards : Theory Application : Endless Theory job (Message 3391)
Posted 19 May 2016 by Profile Leonardo Cristella
Post:
We have disabled automatic resubmission of Condor jobs for the moment.
We are not sure it was the cause of the duplicate jobs, so it would be good to hear back from you.

Many thanks.
16) Message boards : Theory Application : Endless Theory job (Message 3376)
Posted 17 May 2016 by Profile Leonardo Cristella
Post:
Sorry, my typo: I meant ".run", not ".tgz".
By the way you can find the three numbers as revision=2016, runid=5xxxxx, seed=2xx in some output file.
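
For anyone comparing jobs by hand, here is a minimal Python sketch of splitting such an executable name into its three numbers. It assumes the name is exactly "revision-runid-seed.run" with purely numeric fields; the function name, the JobId type and any sample value passed to it are hypothetical and not taken from the project's code.

```python
import re
from typing import NamedTuple


class JobId(NamedTuple):
    revision: int
    runid: int
    seed: int


# Assumed layout: three hyphen-separated integers followed by ".run".
_NAME_RE = re.compile(r"^(\d+)-(\d+)-(\d+)\.run$")


def parse_job_name(name: str) -> JobId:
    """Split 'revision-runid-seed.run' into its three integer fields."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected job executable name: {name!r}")
    return JobId(*(int(group) for group in match.groups()))
```

Two jobs whose parsed (runid, seed) pairs are equal would then be candidates for the duplicate issue discussed in this thread.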
17) Message boards : Theory Application : Endless Theory job (Message 3374)
Posted 17 May 2016 by Profile Leonardo Cristella
Post:
Hi Rasputin,
can you please retrieve the job executable name from any of them?
It should be something like "2016-5xxxxx-2xx.tgz".
18) Message boards : Theory Application : Endless Theory job (Message 3369)
Posted 16 May 2016 by Profile Leonardo Cristella
Post:
Dear all,
the problem of the same job running several times should be related to the transition between manual and automatic job submission/retrieval. Please let us know if you still see this happening.

Thank you,
Leonardo
19) Message boards : Theory Application : No Jobs? (Message 3230)
Posted 4 May 2016 by Profile Leonardo Cristella
Post:
Yes, we were doing some tests yesterday and we did not submit the usual bunch of jobs.
Now they are back.

Thank you
20) Message boards : Theory Application : Errors in log (Message 3160)
Posted 1 May 2016 by Profile Leonardo Cristella
Post:
I think so.

