1) Message boards : Theory Application : Endless Theory job
Message 4690
Posted 23 Feb 2017 by Leonardo Cristella
Hello Crystal,
this seems to be a bug in the application ("Rivet" in this case) and not a job-related problem.
It can be reported to MCPlots support at http://mcplots.cern.ch/. The job status should be successful though.

Leo
2) Message boards : Theory Application : Task shutting down prematurly
Message 3606
Posted 25 Jun 2016 by Leonardo Cristella
Yes, we were out of jobs for a while because of a temporary glitch in the injection workflow.

Thanks for spotting.
3) Message boards : Theory Application : Duplicate jobs
Message 3477
Posted 23 May 2016 by Leonardo Cristella
To your question: obviously on my host there are non-multifold jobs

Why is it obvious that a job can not be sent more than once to the same host?

If I remember correctly, Rasputin had the same job running twice in the same task.
4) Message boards : Theory Application : Duplicate jobs
Message 3472
Posted 23 May 2016 by Leonardo Cristella
This is expected: yesterday there was a problem with Condor and the "fresh" jobs were put on hold.
Until I manage to resume them, the minimum Condor JobID for new jobs is 317295.
Among the "old" jobs, do you ever get any non-duplicate jobs?

Thanks for the feedback.
5) Message boards : Theory Application : Endless Theory job
Message 3456
Posted 21 May 2016 by Leonardo Cristella
Ok, at least it is consistent. There are about 250 old jobs in the queue, but I just enabled the submission of new jobs with higher priority, so you should start to get them in 20 minutes or so.
6) Message boards : Theory Application : Endless Theory job
Message 3444
Posted 21 May 2016 by Leonardo Cristella
On my side I see at most 26 "fresh" jobs (Condor JobID > 316094) out of 77 total jobs running.
Maybe you got some "old" jobs (Condor JobID < 316094) which were affected by the duplicate issue. Can you confirm?
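
For anyone checking their own logs, the fresh/old split above can be sketched in a few lines of Python. This is only an illustration: the function name and the "boundary" label are made up here, and the post does not say which side JobID 316094 itself falls on, so the sketch reports it separately instead of guessing.

```python
# Threshold quoted in the post above; everything else here is illustrative.
FRESH_THRESHOLD = 316094


def classify_job(job_id: int, threshold: int = FRESH_THRESHOLD) -> str:
    """Label a Condor JobID as 'fresh', 'old', or 'boundary'."""
    if job_id > threshold:
        return "fresh"      # Condor JobID > 316094
    if job_id < threshold:
        return "old"        # Condor JobID < 316094, possibly duplicated
    return "boundary"       # exactly 316094: not classified by the post


if __name__ == "__main__":
    # Hypothetical JobIDs, not taken from any real log.
    for job_id in (316000, 316094, 316500):
        print(job_id, classify_job(job_id))
```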
7) Message boards : Theory Application : Endless Theory job
Message 3443
Posted 21 May 2016 by Leonardo Cristella
I agree with Crystal, but all the information is in this thread now and hopefully the issue is going to be solved.
8) Message boards : Theory Application : Errors in log
Message 3420
Posted 20 May 2016 by Leonardo Cristella
Those are Rivet errors and they are handled by Rivet itself, so they are not a problem.
I see you ran a job with Condor JobID > 316094. Please let me know if you still get duplicate jobs.
9) Message boards : Theory Application : Endless Theory job
Message 3412
Posted 19 May 2016 by Leonardo Cristella
From Condor JobID 316094 onwards, jobs are submitted from a different location every 15 minutes.
Let's see if that solves the problem.
10) Message boards : Theory Application : Endless Theory job
Message 3410
Posted 19 May 2016 by Leonardo Cristella
Ok, I will try to submit jobs from a different place every time.
I will let you know when "fresh" jobs are available.

Thanks
11) Message boards : Theory Application : Endless Theory job
Message 3403
Posted 19 May 2016 by Leonardo Cristella
According to your output, the "jobdata" file should be in /var/lib/condor/execute/dir_4349/
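
If the directory number differs on your host, a small Python sketch like the one below could help locate the file. It assumes the execution directories follow the dir_<number> pattern under /var/lib/condor/execute, as in the path above. The helper names are hypothetical, and the script only prints the first lines as they are because the layout of "jobdata" is not described in this thread.

```python
from pathlib import Path


def find_jobdata(execute_root="/var/lib/condor/execute"):
    """Return every 'jobdata' file found in dir_* subdirectories."""
    root = Path(execute_root)
    return sorted(p for p in root.glob("dir_*/jobdata") if p.is_file())


def head(path, n=5):
    """Return the first n lines of a text file, without trailing newlines."""
    lines = []
    with open(path, "r", errors="replace") as fh:
        for i, line in enumerate(fh):
            if i >= n:
                break
            lines.append(line.rstrip("\n"))
    return lines


if __name__ == "__main__":
    for jobdata in find_jobdata():
        print(jobdata)
        for line in head(jobdata):
            print("   ", line)
```

Reading those directories may require elevated permissions inside the VM.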
12) Message boards : Theory Application : Endless Theory job
Message 3401
Posted 19 May 2016 by Leonardo Cristella
I don't know whether "runid" and "seed" are written to running.log, but they are certainly in "jobdata".
Wherever you find them is fine.
13) Message boards : Theory Application : Endless Theory job
Message 3399
Posted 19 May 2016 by Leonardo Cristella
Ok thanks, we are investigating with the Condor experts.
If you can find the "jobdata" file in the job execution dir, it would be helpful to know the "runid" and "seed" values listed in the first lines.
14) Message boards : Theory Application : Endless Theory job
Message 3396
Posted 19 May 2016 by Leonardo Cristella
So it looks like duplicate jobs appear only in different tasks, never in the same task.
Can you please retrieve the MC-plot ID number and the Condor JobID for each of them?
Do you see a "jobdata" file in the job execution dir?
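
To check the observation above against your own notes, a sketch along these lines could group jobs by an identifier and list the tasks each one ran in. The record layout (task name, runid, seed) and the choice of (runid, seed) as the job key are assumptions made for illustration, not something the project prescribes.

```python
from collections import defaultdict


def find_duplicates(records):
    """Map each (runid, seed) seen more than once to the tasks it ran in.

    records: iterable of (task, runid, seed) tuples (hypothetical layout).
    A task appears twice in the list if the job ran twice in that task.
    """
    tasks_by_job = defaultdict(list)
    for task, runid, seed in records:
        tasks_by_job[(runid, seed)].append(task)
    return {job: sorted(tasks)
            for job, tasks in tasks_by_job.items()
            if len(tasks) > 1}


if __name__ == "__main__":
    # Made-up records, only to show the shape of the input.
    records = [("task_a", 1, 2), ("task_b", 1, 2), ("task_a", 3, 4)]
    for job, tasks in find_duplicates(records).items():
        same_task = len(set(tasks)) < len(tasks)
        print(job, tasks, "(repeated within a task)" if same_task else "")
```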
15) Message boards : Theory Application : Endless Theory job
Message 3391
Posted 19 May 2016 by Leonardo Cristella
We have disabled the automatic resubmission of Condor jobs for the moment.
We are not sure it was the cause of the duplicate jobs, so it would be good to hear back from you.

Many thanks.
16) Message boards : Theory Application : Endless Theory job
Message 3376
Posted 17 May 2016 by Leonardo Cristella
Sorry, my typo: I meant ".run", not ".tgz".
By the way you can find the three numbers as revision=2016, runid=5xxxxx, seed=2xx in some output file.
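
As a rough sketch, the three numbers can be pulled out of an executable name with a regular expression. This assumes the name has the form revision-runid-seed.run with purely numeric fields, which is inferred from the pattern quoted in this thread. The function name and the sample file name are invented for the example.

```python
import re

# Assumed layout: <revision>-<runid>-<seed>.run, all fields numeric.
JOB_NAME = re.compile(r"(\d+)-(\d+)-(\d+)\.run")


def parse_job_name(name):
    """Return revision, runid and seed as ints, or None if the name differs."""
    match = JOB_NAME.fullmatch(name)
    if match is None:
        return None
    revision, runid, seed = (int(part) for part in match.groups())
    return {"revision": revision, "runid": runid, "seed": seed}


if __name__ == "__main__":
    # Made-up file name, only to show the expected shape.
    print(parse_job_name("2016-512345-200.run"))
```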
17) Message boards : Theory Application : Endless Theory job
Message 3374
Posted 17 May 2016 by Leonardo Cristella
Hi Rasputin,
can you please retrieve the job executable name from any of them?
It should be something like "2016-5xxxxx-2xx.tgz".
18) Message boards : Theory Application : Endless Theory job
Message 3369
Posted 16 May 2016 by Leonardo Cristella
Dear all,
the problem of the same job running several times should be related to the transition between manual and automatic job submission/retrieval. Please let us know if you still see this happening.

Thank you,
Leonardo
19) Message boards : Theory Application : No Jobs?
Message 3230
Posted 4 May 2016 by Leonardo Cristella
Yes, we were doing some tests yesterday and we did not submit the usual bunch of jobs.
Now they are back.

Thank you
20) Message boards : Theory Application : Errors in log
Message 3160
Posted 1 May 2016 by Leonardo Cristella
I think so.


©2025 CERN