Message boards : Theory Application : Errors in log

Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2711 - Posted: 13 Apr 2016, 10:27:02 UTC

Please look at this.
It seems that a process is interrupted when event computation starts.
Then the event computation is interrupted in turn by the original process.

CKIN(1) changed from 2.00000 to 0.00000
CKIN(2) changed from -1.00000 to 7000.00000
MSTJ(22) changed from 1 to 2
PARJ(71) changed from 10.00000 to 10.00000
******************************************************************************
* *
* PYTUNE : Presets for underlying-event (and min-bias) *
* Last Change : Mar 2011 - P. Skands *
* *
* 324 Perugia NOCR *
* Tuned by P. Skands, hep-ph/1005.3457 *
* Physics Model: T. Sjostrand & P. Skands, hep-ph/0408302 *
* CR by M. Sandhoff & P. Skands, in hep-ph/0604120 *
* LEP parameters tuned by Professor, hep-ph/0907.2973 *
* *
* MSTP(51) = 7 PDF set *
* MSTP(52) = 1 PDF set internal (=1) or pdflib (=2) *
* MSTP( 3) = 2 INT switch for choice of LambdaQCD *
* PARJ(81) = 0.2570 FSR LambdaQCD (inside resonance decays) *
* MSTP(64) = 3 ISR alphaS type *
* PARP(64) = 1.0000 ISR renormalization scale prefactor *
* MSTP(67) = 2 ISR coherence option for 1st emission *
* MSTP(68) = 3 ISR phase space choice & ME corrections *
* (Note: MSTP(68) is not explicitly (re-)set by PYTUNE) *
* PARP(67) = 1.0000 ISR Q2max factor *
* MSTP(72) = 1 IFSR scheme for non-decay FSR *
* PARP(71) = 2.0000 IFSR Q2max factor in non-s-channel procs *
* MSTP(70) = 2 ISR IR regularization scheme *
* PARJ(82) = 0.8000 FSR IR cutoff *
* MSTP(33) = 0 "K" switch for K-factor on/off & type *
* MSTP(81) = 21 UE model *
* PARP(82) = 1.9500 UE IR cutoff at reference ecm *
* (Note: PARP(82) replaces PARP(62).) *
* PARP(89) = 1800.0000 UE IR cutoff reference ecm *
* PARP(90) = 0.2400 UE IR cutoff ecm scaling power *
* MSTP(82) = 5 UE hadron transverse mass distribution *
* PARP(83) = 1.8000 UE mass distribution parameter *
* MSTP(88) = 0 BR composite scheme *
* MSTP(89) = 2 BR color scheme *
* PARP(79) = 2.0000 BR composite x enhancement *
* PARP(80) = 0.0100 BR breakup suppression *
* MSTP(91) = 1 BR primordial kT distribution *
* PARP(91) = 2.0000 BR primordial kT width <|kT|> *
* PARP(93) = 10.0000 BR primordial kT UV cutoff *
* MSTP(95) = 0 FSI color (re-)connection model *
* ---------------------------------------------------------------------- *
* MSTJ(11) = 5 HAD choice of fragmentation function(s) *
* PARJ( 1) = 0.0730 HAD diquark suppression *
* PARJ( 2) = 0.2000 HAD strangeness suppression *
* PARJ( 3) = 0.9400 HAD strange diquark suppression *
#--------------------------------------------------------------------------
# FastJet release 3.0.3
# M. Cacciari, G.P. Salam and G. Soyez
# A software package for jet finding and analysis at colliders
# http://fastjet.fr
#
# Please cite EPJC72(2012)1896 [arXiv:1111.6097] if you use this package
# for scientific work and optionally PLB641(2006)57 [hep-ph/0512210].
#
# FastJet is provided without warranty under the terms of the GNU GPLv2.
# It uses T. Chan's closest pair algorithm, S. Fortune's Voronoi code
# and 3rd party plugin jet algorithms. See COPYING file for details.
#--------------------------------------------------------------------------
100 events processed
200 events processed
300 events processed
400 events processed
500 events processed
600 events processed
700 events processed
800 events processed
Updating display...
Display update finished (0 histograms, 0 events).
900 events processed
1000 events processed
dumping histograms...
1100 events processed
1200 events processed
1300 events processed
.
.
.
14100 events processed
14200 events processed.
14300 events processed
14400 events processed
14500 events processed
14600 events processed
14700 events processed
14800 events processed
14900 events processed
15000 events processed
dumping histograms...
Updating display...
15100 events processed
Display update finished (127 histograms, 15000 events).
15200 events processed
15300 events processed
15400 events processed
15500 events processed
15600 events processed
* PARJ( 4) = 0.0320 HAD vector diquark suppression *
* PARJ( 5) = 0.5000 HAD P(popcorn) *
* PARJ( 6) = 0.5000 HAD extra popcorn B(s)-M-B(s) supp *
* PARJ( 7) = 0.5000 HAD extra popcorn B-M(s)-B supp *
* PARJ(11) = 0.3100 HAD P(vector meson), u and d only *
* PARJ(12) = 0.4000 HAD P(vector meson), contains s *
* PARJ(13) = 0.5400 HAD P(vector meson), heavy quarks *
* PARJ(21) = 0.3130 HAD fragmentation pT *
* PARJ(25) = 0.6300 HAD eta0 suppression *
* PARJ(26) = 0.1200 HAD eta0' suppression *
* PARJ(41) = 0.4900 HAD string parameter a(Meson) *
* PARJ(42) = 1.2000 HAD string parameter b *
* PARJ(45) = 0.5000 HAD string a(Baryon)-a(Meson) *
* PARJ(46) = 1.0000 HAD Lund(=0)-Bowler(=1) rQ (rc) *
* PARJ(47) = 1.0000 HAD Lund(=0)-Bowler(=1) rb *
* *
******************************** END OF PYTUNE *******************************
MSTP(5) changed from 0 to 0
1****************** PYINIT: initialization of PYTHIA routines *****************

==============================================================================
I I
I PYTHIA will be initialized for a p+ on p+ collider I
I at 7000.000 GeV center-of-mass energy I
I I
==============================================================================

******** PYMAXI: summary of differential cross-section maximum search ********

==========================================================
I I I
I ISUB Subprocess name I Maximum value I
I I I
==========================================================
I I I
I 11 f + f' -> f + f' (QCD) I 2.3418D-04 I
I 12 f + fbar -> f' + fbar' I 1.7659D-06 I
I 13 f + fbar -> g + g I 1.9398D-06 I
I 28 f + g -> f + g I 1.4489D-03 I
I 53 g + g -> f + fbar I 1.7145D-05 I
I 68 g + g -> g + g I 5.9918D-04 I
I 96 Semihard QCD 2 -> 2 I 1.1214D+04 I
I I I
==========================================================

****** PYMULT: initialization of multiple interactions for MSTP(82) = 5 ******
pT0 = 2.70 GeV gives sigma(parton-parton) = 5.33D+02 mb: accepted

****** PYMIGN: initialization of multiple interactions for MSTP(82) = 5 ******
pT0 = 2.70 GeV gives sigma(parton-parton) = 2.14D+02 mb: accepted

********************** PYINIT: initialization completed **********************

Error type 4 has occured after 36 PYEXEC calls:
(PYSTRF:) caught in infinite loop

Error type 4 has occured after 628 PYEXEC calls:
(PYSTRF:) caught in infinite loop

Error type 4 has occured after 649 PYEXEC calls:
(PYSTRF:) caught in infinite loop

Error type 4 has occured after 862 PYEXEC calls:
(PYSTRF:) caught in infinite loop

Error type 4 has occured after 1004 PYEXEC calls:
(PYSTRF:) caught in infinite loop

Advisory warning type 9 given after 1138 PYEXEC calls:
(PYPTIS:) Sorry, I got a heavy companion quark here. Not handled yet, giving up!

Advisory warning type 9 given after 1522 PYEXEC calls:
(PYPTIS:) Sorry, I got a heavy companion quark here. Not handled yet, giving up!

Error type 4 has occured after 1797 PYEXEC calls:
(PYSTRF:) caught in infinite loop

Error type 4 has occured after 1839 PYEXEC calls:
(PYSTRF:) caught in infinite loop

Advisory warning type 9 given after 1891 PYEXEC calls:
(PYPTIS:) Sorry, I got a heavy companion quark here. Not handled yet, giving up!
ID: 2711 · Rating: 0
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 107
Message 2713 - Posted: 13 Apr 2016, 10:54:53 UTC - in response to Message 2711.  

This looks like an application failure.
ID: 2713 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2714 - Posted: 13 Apr 2016, 10:58:41 UTC - in response to Message 2713.  

Do you need more details to fix this?
ID: 2714 · Rating: 0
Leonardo Cristella

Joined: 4 Mar 16
Posts: 31
Credit: 44,320
RAC: 0
Message 2716 - Posted: 13 Apr 2016, 11:23:24 UTC - in response to Message 2714.  

We cannot fix this kind of error.
The people to report it to are listed in the Support section at the bottom of this website: http://mcplots.cern.ch/
ID: 2716 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2717 - Posted: 13 Apr 2016, 11:29:44 UTC - in response to Message 2716.  

Thanks.
If there is a problem, isn't it up to the project admin to fix it or to report it to someone who can?
ID: 2717 · Rating: 0
Laurence
Project administrator
Project developer
Project tester

Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 107
Message 2718 - Posted: 13 Apr 2016, 13:47:17 UTC - in response to Message 2717.  
Last modified: 13 Apr 2016, 13:49:24 UTC

Yes and no. At the moment we are focusing on the infrastructure, so the job is a black box. It doesn't really matter whether it works or not; what matters is that we can execute it and see those errors in a log file.

As Ivan pointed out in another thread, these jobs and their code are written by scientists, and sometimes they make mistakes. In this case it looks like the code does not yet handle the situation produced by this particular random event:

Sorry, I got a heavy companion quark here. Not handled yet, giving up!

These are real Test4Theory jobs, so you should see similar issues with jobs in the production project.

Also, jobs that fail like this are not necessarily 'failed' jobs: if you count the number of jobs that fail in this way, you get statistics on how frequently those events occur.
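Laurence's suggestion of counting these failures can be scripted. Below is a minimal Python sketch, assuming the PYTHIA 6 messages always have the two-line shape seen in the first post (a header with the message type and PYEXEC call count, then a line naming the routine in parentheses); `tally_pythia_messages` is a hypothetical helper, not part of PYTHIA or of the project's tooling.

```python
import re
from collections import Counter

# Header lines as they appear in the PYTHIA 6 output quoted in this thread
# (PYTHIA itself prints the spelling "occured").
HEADER = re.compile(
    r"^(Error|Advisory warning) type (\d+) (?:has occured|given) "
    r"after (\d+) PYEXEC calls:"
)
# The following line names the reporting routine, e.g. "(PYSTRF:)".
ROUTINE = re.compile(r"^\((\w+):\)")


def tally_pythia_messages(lines):
    """Hypothetical helper: count PYTHIA 6 errors/warnings in log lines.

    Returns a Counter keyed by (kind, type number, routine) and the
    largest PYEXEC call count seen in any message header.
    """
    counts = Counter()
    max_calls = 0
    pending = None  # (kind, type) still waiting for its routine line
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            pending = (header.group(1), int(header.group(2)))
            max_calls = max(max_calls, int(header.group(3)))
            continue
        routine = ROUTINE.match(line)
        if pending is not None and routine:
            counts[pending + (routine.group(1),)] += 1
            pending = None
    return counts, max_calls


if __name__ == "__main__":
    sample = """\
Error type 4 has occured after 36 PYEXEC calls:
(PYSTRF:) caught in infinite loop

Advisory warning type 9 given after 1138 PYEXEC calls:
(PYPTIS:) Sorry, I got a heavy companion quark here. Not handled yet, giving up!
"""
    counts, max_calls = tally_pythia_messages(sample.splitlines())
    for (kind, number, routine), n in sorted(counts.items()):
        print(f"{kind} type {number} in {routine}: {n}")
    print(f"largest PYEXEC call count: {max_calls}")
```

Applied to the excerpt in the first post, the same function would count seven type-4 errors from PYSTRF and three type-9 advisory warnings from PYPTIS, with 1891 as the largest PYEXEC call count. How those counts relate to the number of generated events depends on how the job drives PYTHIA, which is not shown here.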
ID: 2718 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2722 - Posted: 13 Apr 2016, 15:48:19 UTC

I got this. There are a number of failed Condor starts:



04/11/16 16:18:40 Got activate_claim request from shadow (188.184.187.167)
04/11/16 16:18:40 Remote job ID is 260339.0
04/11/16 16:18:40 Got universe "VANILLA" (5) from request classad
04/11/16 16:18:40 State change: claim-activation protocol successful
04/11/16 16:18:40 Changing activity: Idle -> Busy
04/11/16 16:18:41 PERMISSION DENIED to condor@localhost from host 10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15,10.0.2.15, hostname size = 1, original ip address = 10.0.2.15
04/11/16 16:18:53 State change: benchmarks completed
04/11/16 16:21:46 Called deactivate_claim_forcibly()
04/11/16 16:21:46 Starter pid 4557 exited with status 0
04/11/16 16:21:46 State change: starter exited
04/11/16 16:21:46 Changing activity: Busy -> Idle
04/11/16 16:21:47 Got activate_claim request from shadow (188.184.187.167)
04/11/16 16:21:47 Remote job ID is 260340.0
04/11/16 16:21:47 Got universe "VANILLA" (5) from request classad
04/11/16 16:21:47 State change: claim-activation protocol successful
04/11/16 16:21:47 Changing activity: Idle -> Busy
04/11/16 16:21:48 PERMISSION DENIED to condor@localhost from host 10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
04/11/16 16:24:15 Called deactivate_claim_forcibly()
04/11/16 16:24:15 Starter pid 5159 exited with status 0
04/11/16 16:24:15 State change: starter exited
04/11/16 16:24:15 Changing activity: Busy -> Idle
04/11/16 16:24:16 Got activate_claim request from shadow (188.184.187.167)
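In this excerpt, each "Remote job ID is ..." line is followed about a second later by the same DC_CHILDALIVE denial from 10.0.2.15. A minimal Python sketch for checking whether that pattern holds across a longer log is shown below; it assumes the StartLog lines have been copied out as text, and `summarize_startd_log` is a hypothetical helper, not an HTCondor tool.

```python
import re
from collections import Counter

# Patterns copied from the StartLog lines quoted above.
DENIED = re.compile(
    r"PERMISSION DENIED to (\S+) from host (\S+) for command (\d+) "
    r"\((\w+)\), access level (\w+)"
)
JOB_ID = re.compile(r"Remote job ID is (\S+)")


def summarize_startd_log(lines):
    """Hypothetical helper: count authorization denials and list job IDs."""
    denials = Counter()
    job_ids = []
    for line in lines:
        denied = DENIED.search(line)
        if denied:
            _user, host, _number, command, level = denied.groups()
            denials[(command, host, level)] += 1
            continue
        job = JOB_ID.search(line)
        if job:
            job_ids.append(job.group(1))
    return denials, job_ids


if __name__ == "__main__":
    sample = [
        "04/11/16 16:18:40 Remote job ID is 260339.0",
        "04/11/16 16:18:41 PERMISSION DENIED to condor@localhost from host "
        "10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: "
        "reason: DAEMON authorization policy contains no matching ALLOW entry",
        "04/11/16 16:21:47 Remote job ID is 260340.0",
        "04/11/16 16:21:48 PERMISSION DENIED to condor@localhost from host "
        "10.0.2.15 for command 60008 (DC_CHILDALIVE), access level DAEMON: "
        "reason: cached result for DAEMON",
    ]
    denials, job_ids = summarize_startd_log(sample)
    print(f"jobs started: {len(job_ids)}")
    for (command, host, level), n in denials.items():
        print(f"{command} from {host} denied at {level} level: {n}x")
```

Comparing the number of job IDs with the number of denials is only a rough check: it shows how often the denial occurs, not whether it caused any job to fail.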
ID: 2722 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2736 - Posted: 13 Apr 2016, 18:47:34 UTC

Console F2 now shows a number of "EXT4-fs error inode doubly allocated?" messages.

This happens mostly with pythia8 jobs.
ID: 2736 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 2740 - Posted: 13 Apr 2016, 20:34:50 UTC - in response to Message 2736.  

Console F2 now shows a number of "EXT4-fs error inode doubly allocated?" messages.

Those EXT4-fs errors and similar messages are displayed on whichever console is active at the moment they pop up.
All consoles are used like a 'System Console' (Console 0).
ID: 2740 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2741 - Posted: 13 Apr 2016, 20:50:48 UTC - in response to Message 2740.  

I see this only on F2.
F3, for example, shows top (CPU stats).
ID: 2741 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 2742 - Posted: 13 Apr 2016, 21:31:52 UTC - in response to Message 2741.  

I see this only on F2.
F3, for example, shows top (CPU stats).

It will also be displayed on ALT-F3, but because the top screen is refreshed every few seconds, you must be 'lucky' to catch such a message.
ID: 2742 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2743 - Posted: 13 Apr 2016, 21:38:26 UTC - in response to Message 2742.  

In any case, I hope it will be fixed soon.
ID: 2743 · Rating: 0
Leonardo Cristella

Joined: 4 Mar 16
Posts: 31
Credit: 44,320
RAC: 0
Message 2744 - Posted: 14 Apr 2016, 8:17:06 UTC - in response to Message 2743.  

It is related to the ALLOW_DAEMON rule; it is not a critical error.
We are investigating.
ID: 2744 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2745 - Posted: 14 Apr 2016, 8:49:43 UTC
Last modified: 14 Apr 2016, 9:13:49 UTC

There are a large number of different "apps",
for example Pythia 6.xxx and 8.xxx.
Are they, more or less, all doing the same thing?
They have quite different run times (for the same number of events).
Are you trying to find the best one, or is this mix going to stay as it is?
ID: 2745 · Rating: 0
Leonardo Cristella

Joined: 4 Mar 16
Posts: 31
Credit: 44,320
RAC: 0
Message 2746 - Posted: 14 Apr 2016, 9:04:41 UTC - in response to Message 2745.  

If the different "apps" belong to different job types (PYTHIA, Sherpa, ...), then it is not in our hands.
Otherwise, can you please post a screenshot of the log?

Thank you.
ID: 2746 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2747 - Posted: 14 Apr 2016, 9:20:01 UTC - in response to Message 2746.  
Last modified: 14 Apr 2016, 9:20:25 UTC

OK, they are just different types of jobs.
Is there a way to put the outcome of each job in the stderr of the result?
That way, we could monitor whether jobs are failing and how many are done.
ID: 2747 · Rating: 0
Leonardo Cristella

Joined: 4 Mar 16
Posts: 31
Credit: 44,320
RAC: 0
Message 2748 - Posted: 14 Apr 2016, 9:37:59 UTC - in response to Message 2747.  

A failure of the application inside a job is different from a failure of the job itself.
The former is not related to us (it can be reported to MCPlots support: http://mcplots.cern.ch/), and the job status will still be successful.
The latter can be monitored here:

http://www.citizencyberscience.net/t4t-webapp/stats/
http://mcplots-dev.cern.ch/production.php?view=status

and possibly in some other user-related statistics.
ID: 2748 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2749 - Posted: 14 Apr 2016, 10:08:39 UTC - in response to Message 2748.  

Thanks, Leonardo.
ID: 2749 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 849,977
RAC: 1,466
Message 2750 - Posted: 14 Apr 2016, 10:33:11 UTC - in response to Message 2747.  

OK, they are just different types of jobs.
Is there a way to put the outcome of each job in the stderr of the result?
That way, we could monitor whether jobs are failing and how many are done.

Jobs fail rarely: 5 jobs failed out of 831 jobs you ran for vLHCathome on the machine you're testing here: http://mcplots-dev.cern.ch/cache/stats/host-8879-82695.txt.

As you know, Ben Segal wrote that the team is investigating a way to report the Theory jobs from vLHCathome-dev to MCPlots as well.
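Crystal's numbers translate into a failure fraction of 5/831, about 0.60%. With so few failures the estimate is uncertain, so a sketch that also reports an interval may help. The Wilson score interval used here is an assumption of this example, not a method the project is known to use, and `wilson_interval` is a hypothetical name.

```python
import math


def wilson_interval(failed, total, z=1.96):
    """Observed failure fraction plus a Wilson score interval (z=1.96 ~ 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = failed / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return p, centre - half, centre + half


if __name__ == "__main__":
    rate, low, high = wilson_interval(5, 831)
    print(f"failure rate {rate:.2%}, 95% interval {low:.2%} to {high:.2%}")
```

For 5 failures out of 831 jobs this prints a rate of 0.60% with a 95% interval of roughly 0.3% to 1.4%, so "rare" holds even at the upper end of the interval.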
ID: 2750 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2751 - Posted: 14 Apr 2016, 10:39:42 UTC - in response to Message 2750.  

Thanks, Crystal.
ID: 2751 · Rating: 0

©2024 CERN