Message boards : Theory Application : Errors in log
Joined: 16 Aug 15 · Posts: 966 · Credit: 1,211,816 · RAC: 0
Are you interested in this kind of error?

PYTHIA8=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65/pythia8/212/x86_64-slc6-gcc47-opt
Joined: 4 Mar 16 · Posts: 31 · Credit: 44,320 · RAC: 0
Yes, thank you. I have already noticed this error; it should be fixed soon.
Joined: 16 Aug 15 · Posts: 966 · Credit: 1,211,816 · RAC: 0
Massive number of error messages (several-megabyte log file):

Rivet.Analysis.LHCB_2011_I917009: ERROR Could not determine lifetime for particle with PID 990... This V^0 will be considered unprompt!
Joined: 4 Mar 16 · Posts: 31 · Credit: 44,320 · RAC: 0
Does it happen in several jobs? If not, don't worry: it's just an application (Rivet) error that is handled by the application itself.
Joined: 16 Aug 15 · Posts: 966 · Credit: 1,211,816 · RAC: 0
So far only one. It seems it referenced both LHCb and ALICE files, which is strange.
Joined: 4 Mar 16 · Posts: 31 · Credit: 44,320 · RAC: 0
I don't see "ALICE" in the log you posted.
Joined: 16 Aug 15 · Posts: 966 · Credit: 1,211,816 · RAC: 0
Rivet.Analysis.LHCB_2012_I1119400: WARN Lifetime map imcomplete --- 990... assume zero lifetime
100000 events processed
dumping histograms...
Generator run finished successfully
Rivet.Analysis.Handler: INFO Finalising analyses
Rivet.Analysis.Handler: INFO Processed 100000 events
The MCnet usage guidelines apply to Rivet: see http://www.montecarlonet.org/GUIDELINES
Please acknowledge plots made with Rivet analyses, and cite arXiv:1003.0694 (http://arxiv.org/abs/1003.0694)
Processing histograms...
input = /var/lib/condor/execute/dir_16617/tmp/tmp.tr9IFxQOxt/flat
output = /var/lib/condor/execute/dir_16617
./runRivet.sh: line 676: 17752 Killed display_service $tmpd_dump "$beam $process $energy $params $generator $version $tune" (wd: /var/lib/condor/execute/dir_16617)
mc: ALICE_2010_S8624100_d11-x01-y01.dat -> /var/lib/condor/execute/dir_16617/dat/pp/uemb-soft/nch/alice3-eta0.5/900/herwig++/2.7.1/default.dat
Joined: 16 Aug 15 · Posts: 966 · Credit: 1,211,816 · RAC: 0
I checked the finished logs. 3 out of 25 contain references to "ALICE", "lhcb", "atlas", or "cms"; the others do not.
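For reference, a check like the one described above can be scripted. This is only a sketch: the demo_logs directory and the sample file contents are made up for illustration, and the real log file names will differ.

```shell
# Hypothetical sketch: count how many finished-job logs mention an experiment.
# The demo_logs directory and its contents are fabricated for illustration.
mkdir -p demo_logs
printf 'Rivet.Analysis.LHCB_2011_I917009: ERROR ...\n' > demo_logs/job1.log
printf 'Generator run finished successfully\n' > demo_logs/job2.log

# -l lists matching files, -i ignores case, -E enables the alternation
matching=$(grep -liE 'alice|lhcb|atlas|cms' demo_logs/*.log | wc -l)
total=$(ls demo_logs/*.log | wc -l)
echo "$matching of $total logs reference an experiment"
rm -r demo_logs
```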
Joined: 4 Mar 16 · Posts: 31 · Credit: 44,320 · RAC: 0
I see. From the log, the ALICE-related settings start only after the 100000 LHCb events have been processed, so in my opinion they are not mixed together. It should be all right.
Joined: 16 Aug 15 · Posts: 966 · Credit: 1,211,816 · RAC: 0
I have another one like this (massive amounts of error messages). Can I ignore it? (Error messages as posted before.) It begins:

===> [runRivet] Sun May 1 15:42:03 CEST 2016 [boinc pp uemb-soft 7000 - - pythia8 8.180 default-noCR 100000 222]
Setting environment...
MCGENERATORS=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65
gcc = /cvmfs/sft.cern.ch/lcg/external/gcc/4.7.2/x86_64-slc6-gcc47-opt/bin/gcc
gcc version = 4.7.2
RIVET=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65/rivet/2.4.0/x86_64-slc6-gcc47-opt
RIVET_REF_PATH=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65/rivet/2.4.0/x86_64-slc6-gcc47-opt/share/Rivet
RIVET_ANALYSIS_PATH=/var/lib/condor/execute/dir_23382/analyses
ROOTSYS=/cvmfs/sft.cern.ch/lcg/app/releases/ROOT/5.34.19/x86_64-slc6-gcc47-opt/root
Input parameters:
mode=boinc
beam=pp
process=uemb-soft
energy=7000
params=-
specific=-
generator=pythia8
version=8.180
tune=default-noCR
nevts=100000
seed=222
Prepare temporary directories and files ...
workd=/var/lib/condor/execute/dir_23382
tmpd=/var/lib/condor/execute/dir_23382/tmp/tmp.p6XviGk2Rs
tmp_params=/var/lib/condor/execute/dir_23382/tmp/tmp.p6XviGk2Rs/generator.params
tmp_hepmc=/var/lib/condor/execute/dir_23382/tmp/tmp.p6XviGk2Rs/generator.hepmc
tmp_yoda=/var/lib/condor/execute/dir_23382/tmp/tmp.p6XviGk2Rs/generator.yoda
tmp_jobs=/var/lib/condor/execute/dir_23382/tmp/tmp.p6XviGk2Rs/jobs.log
tmpd_flat=/var/lib/condor/execute/dir_23382/tmp/tmp.p6XviGk2Rs/flat
tmpd_dump=/var/lib/condor/execute/dir_23382/tmp/tmp.p6XviGk2Rs/dump
tmpd_html=/var/lib/condor/execute/dir_23382/tmp/tmp.p6XviGk2Rs/html
Prepare Rivet parameters ...
analysesNames=ALICE_2010_S8625980 ALICE_2012_I1181770 ATLAS_2010_S8918562 ATLAS_2011_I894867 ATLAS_2012_I1084540 ATLAS_2012_I1183818 CMS_2010_S8656010 CMS_2011_S8884919 CMS_2011_S8978280 CMS_2011_S9120041 CMS_2011_S9215166 CMS_2012_I1184941 CMS_2012_I1193338 CMS_2013_I1218372 LHCB_2011_I917009 LHCB_2012_I1119400 LHCF_2012_I1115479 MC_GAPS TOTEM_2012_I1115294
Unpack data histograms...
dataFiles = /cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65/rivet/2.4.0/x86_64-slc6-gcc47-opt/share/Rivet/ALICE_2010_S8625980.yoda /cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65/rivet/2.4.0/x86_64-slc6-gcc47-opt/share/Rivet/ALICE_2012_I1181770.yoda /cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65/rivet/2.4.0/x86_64-slc6-gcc47-opt/share/Rivet/ATLAS_2010_S8918562.yoda /cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65/rivet/2.4.0/x86_64-slc6-gcc47-opt/share/Rivet/ATLAS_2011_I894867.yoda /cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65/rivet/2.4.0/x86_64-slc6-gcc47-opt/share/Rivet/ATLAS_2012_I1084540.yoda /cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65/rivet/2.4.0/x86_64-slc6-gcc47-opt/share/Rivet/ATLAS_2012_I1183818.yoda /cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt65/rivet/2.4.0/x86_64-slc6-gcc47-opt/share/Rivet/CMS_2010_S8656010.yoda
Joined: 4 Mar 16 · Posts: 31 · Credit: 44,320 · RAC: 0
I think so.
Joined: 20 Mar 15 · Posts: 243 · Credit: 886,442 · RAC: 0
There are a large number of different "apps". The basic idea is described here, and this post explains what we are doing in more detail. The results end up here, for use as a common resource by theorists. I'm sure there are more recent explanations, but these were provided when T4T (as it was then called) first started and are still useful.
Joined: 16 Aug 15 · Posts: 966 · Credit: 1,211,816 · RAC: 0
Thanks for the info, m. Quite interesting.
Joined: 16 Aug 15 · Posts: 966 · Credit: 1,211,816 · RAC: 0
I had a BOINC task fail with a compute error:

Exit status 194 (0xc2) EXIT_ABORTED_BY_CLIENT
http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=168295

What is that all about?
Joined: 13 Feb 15 · Posts: 1206 · Credit: 885,243 · RAC: 497
"What is that all about?"

This could be the reason, but I don't know why the heartbeat file is suddenly missing:

2016-05-02 21:06:22 (1008): VM Heartbeat file specified, but missing heartbeat.

I saw that only during the first attempts of the applications, due to bad configuration.
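For context, a heartbeat check like the one behind that message is essentially a file-age test: the VM periodically touches a file, and the wrapper treats the VM as hung if the file goes stale or missing. This sketch only illustrates the idea; the file name and the 600-second threshold are assumptions, not the wrapper's actual values.

```shell
# Illustrative only: the kind of staleness test a wrapper can apply to a
# heartbeat file. File name and 600 s threshold are assumptions.
heartbeat=heartbeat.demo
touch "$heartbeat"                                   # the VM would touch this periodically
age=$(( $(date +%s) - $(stat -c %Y "$heartbeat") ))  # GNU stat; use -f %m on BSD/macOS
if [ "$age" -gt 600 ]; then
  echo "heartbeat stale (${age}s): VM considered hung"
else
  echo "heartbeat fresh (${age}s)"
fi
rm -f "$heartbeat"
```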
Joined: 16 Aug 15 · Posts: 966 · Credit: 1,211,816 · RAC: 0
I have not seen that before; that is why I asked.
Joined: 13 Feb 15 · Posts: 1206 · Credit: 885,243 · RAC: 497
Endless looping in a job, going on now for 40 minutes:

Updating display...
Display update finished (0 histograms, 0 events).

Job with parameters:

===> [runRivet] Thu May 5 08:21:35 CEST 2016 [boinc ppbar uemb-soft 63 - - sherpa 2.1.1 default 1000 240]

Sherpa is using 98% of the CPU. The number of events to do is 1,000, and every 100 events progress should be displayed. Maybe progress is so slow that it cannot make 100 events in 2 hours, and so it will not finish the whole job within the 18-hour task limit. I'll wait another 80 minutes; if no progress is shown, I will kill the job and ask for a new one.
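The extrapolation above can be made explicit. The numbers are taken from the post (1,000 events total, a progress line every 100 events, an 18-hour task limit, and no progress line after 2 hours); this is a lower bound, not a measurement.

```shell
# Numbers from the post above; purely an extrapolation, not a measurement.
events_total=1000
events_per_progress_line=100
hours_without_progress=2     # not even the first 100 events were done in 2 h
task_limit_hours=18

# If the first 100 events alone need more than 2 hours, the full job needs
# at least 10 such chunks, i.e. at least 20 hours:
min_hours=$(( events_total / events_per_progress_line * hours_without_progress ))
echo "lower bound: ${min_hours} h (task limit: ${task_limit_hours} h)"
```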
Joined: 13 Feb 15 · Posts: 1206 · Credit: 885,243 · RAC: 497
It looks like I got the same job again with a new BOINC task in a newly created VM. Time to go outside.

===> [runRivet] Thu May 5 12:46:43 CEST 2016 [boinc ppbar uemb-soft 63 - - sherpa 2.1.1 default 1000 240]

If it behaves like the one I posted before (no progress in 2 hours), I'll let it run overnight with an extended <job_duration>.
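For anyone wanting to try the same, the <job_duration> element mentioned above lives in the task's vboxwrapper job description XML. The excerpt below is only a sketch: the file name varies per task, the value shown is an arbitrary example, and anything besides <job_duration> itself is illustrative.

```xml
<!-- Hypothetical excerpt of a vboxwrapper job description file; everything
     except the <job_duration> element itself is illustrative only. -->
<vbox_job>
    <!-- maximum job run time in seconds; e.g. 86400 s = 24 h instead of 18 h -->
    <job_duration>86400</job_duration>
</vbox_job>
```

Note that the BOINC client may overwrite this file when the task is re-downloaded, so such an edit is a one-off experiment rather than a permanent setting.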
Joined: 13 Feb 15 · Posts: 1206 · Credit: 885,243 · RAC: 497
Not sure whether the bold line is causing trouble.

Output_Phase::Output_Phase(): Set output interval 1000000000 events.
----------------------------------------------------------
-- SHERPA generates events with the following structure --
----------------------------------------------------------
Perturbative : Signal_Processes
Perturbative : Hard_Decays
Perturbative : Jet_Evolution:CSS
Perturbative : Lepton_FS_QED_Corrections:Photons
Perturbative : Multiple_Interactions:Amisic
Perturbative : Minimum_Bias:Off
Hadronization : Beam_Remnants
Hadronization : Hadronization:Ahadic
Hadronization : Hadron_Decays
Analysis : HepMC2
Updating display...
Display update finished (0 histograms, 0 events).
Updating display...
Display update finished (0 histograms, 0 events).
Updating display...
Display update finished (0 histograms, 0 events).
Joined: 13 Feb 15 · Posts: 1206 · Credit: 885,243 · RAC: 497
It seems the job was killed:

05/05/16 12:46:42 Remote job ID is 287818.0
05/05/16 12:46:42 Got universe "VANILLA" (5) from request classad
05/05/16 12:46:42 State change: claim-activation protocol successful
05/05/16 12:46:42 Changing activity: Idle -> Busy
05/05/16 15:35:14 condor_write(): Socket closed when trying to write 2878 bytes to collector alicondor01.cern.ch, fd is 9, errno=104 Connection reset by peer
05/05/16 15:35:14 Buf::write(): condor_write() failed
05/05/16 16:50:25 condor_read() failed: recv(fd=6) returned -1, errno = 104 Connection reset by peer, reading 21 bytes from collector alicondor01.cern.ch.
05/05/16 16:50:25 IO: Failed to read packet header
05/05/16 16:50:25 CCBListener: failed to receive message from CCB server alicondor01.cern.ch
05/05/16 16:50:25 CCBListener: connection to CCB server alicondor01.cern.ch failed; will try to reconnect in 60 seconds.
05/05/16 16:51:25 CCBListener: registered with CCB server alicondor01.cern.ch as ccbid 188.184.129.127:9618?addrs=188.184.129.127-9618&noUDP&sock=collector#39120
05/05/16 17:42:13 State change: claim no longer recognized by the schedd - removing claim
05/05/16 17:42:13 Changing state and activity: Claimed/Busy -> Preempting/Killing
05/05/16 17:42:43 starter (pid 4324) is not responding to the request to hardkill its job. The startd will now directly hard kill the starter and all its decendents.
05/05/16 17:42:43 Starter pid 4324 died on signal 9 (signal 9 (Killed))
05/05/16 17:42:43 State change: starter exited
05/05/16 17:42:43 State change: No preempting claim, returning to owner
05/05/16 17:42:43 Changing state and activity: Preempting/Killing -> Owner/Idle
05/05/16 17:42:43 State change: IS_OWNER is false
05/05/16 17:42:43 Changing state: Owner -> Unclaimed
05/05/16 17:42:43 State change: RunBenchmarks is TRUE
05/05/16 17:42:43 Changing activity: Idle -> Benchmarking
05/05/16 17:42:43 BenchMgr:StartBenchmarks()
05/05/16 17:42:44 State change: RunBenchmarks is TRUE
05/05/16 17:42:44 Changing activity: Benchmarking -> Idle
05/05/16 17:42:57 Request accepted.
05/05/16 17:42:57 Remote owner is test4theory@cern.ch
05/05/16 17:42:57 State change: claiming protocol successful
05/05/16 17:42:57 Changing state: Unclaimed -> Claimed
05/05/16 17:42:57 Got activate_claim request from shadow (188.184.187.167)
05/05/16 17:42:57 Remote job ID is 292572.0
05/05/16 17:42:57 Got universe "VANILLA" (5) from request classad
05/05/16 17:42:57 State change: claim-activation protocol successful
05/05/16 17:42:57 Changing activity: Idle -> Busy
05/05/16 17:43:07 State change: benchmarks completed
05/05/16 17:55:32 condor_write(): Socket closed when trying to write 4096 bytes to collector alicondor01.cern.ch, fd is 8, errno=104 Connection reset by peer
05/05/16 17:55:32 Buf::write(): condor_write() failed
©2025 CERN