v0.05 task doing something
Author | Message |
---|---|
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 594 |
The current v0.05 task has been running for 1 hour 45 minutes, with Python using 90% or more of the CPU. Edit: It ran for a little over 2 hours and completed whatever it was doing okay :) |
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Take a look at the LHCb jobs link on the top left. There you can see the outcome of the jobs, and they are all good. Next, the logging will be improved, but not until after Easter. |
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 9 |
CPU usage, yes, but is it doing real work? I returned one task with a run time of over 2 hours; besides other Python processes, a single Python process used 103 CPU-minutes of that. I also saw a process Job128478678, but it had used only 0.01 s of CPU. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I have a task that ran for over two hours. Instead of ending, it started a new python process and appears to be running a second job within the same task. Is that to be expected? EDIT: It finished a little later. |
Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Yes, in theory the LHCb pilot should run for 24 hours and will run as many jobs as possible during that time. For some details about the LHCb pilots, see this presentation: http://indico.cern.ch/event/304944/session/4/contribution/113 |
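The pilot behaviour described above (one long-lived pilot that keeps running jobs while its time window allows) can be sketched as a simple loop. This is a hypothetical illustration, not DIRAC's JobAgent: `run_pilot`, `fetch_job` and `run_job` are made-up names, and the 24-hour budget and 10-job cap are assumed defaults.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical job record: (job id, estimated wall time in seconds)
Job = Tuple[str, float]


def run_pilot(
    fetch_job: Callable[[float], Optional[Job]],
    run_job: Callable[[Job], float],
    wall_limit_s: float = 24 * 3600,  # assumed 24-hour pilot lifetime
    max_jobs: int = 10,  # assumed cap on jobs per pilot
) -> List[str]:
    """Keep pulling and running jobs until time or the job cap runs out."""
    elapsed = 0.0
    finished: List[str] = []
    while len(finished) < max_jobs:
        remaining = wall_limit_s - elapsed
        job = fetch_job(remaining)
        # Stop when nothing is available or the next job would not fit
        if job is None or job[1] > remaining:
            break
        elapsed += run_job(job)  # run_job reports the wall time it used
        finished.append(job[0])
    return finished


if __name__ == "__main__":
    queue = [("job-1", 2 * 3600.0), ("job-2", 2 * 3600.0), ("job-3", 30 * 3600.0)]

    def fetch(remaining: float) -> Optional[Job]:
        return queue.pop(0) if queue else None

    # Pretend each job takes exactly its estimated time
    print(run_pilot(fetch, lambda job: job[1]))
```

Here two 2-hour jobs fit, and the pilot stops when the 30-hour job would overrun the remaining budget. In this sketch that job is simply dropped; a real matcher would only hand out jobs that fit.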
Joined: 13 Apr 15 Posts: 138 Credit: 2,969,210 RAC: 0 |
From the stderr of my last task, which ran for 5 hrs 23 mins with a CPU time of 4 hrs 45 mins:
2016-03-23 17:53:52 (2736): Status Report: Job Duration: '129600.000000'
2016-03-23 17:53:52 (2736): Status Report: Elapsed Time: '6003.223518'
2016-03-23 17:53:52 (2736): Status Report: CPU Time: '5274.062500'
2016-03-23 19:34:00 (2736): Status Report: Job Duration: '129600.000000'
2016-03-23 19:34:00 (2736): Status Report: Elapsed Time: '12011.943056'
2016-03-23 19:34:00 (2736): Status Report: CPU Time: '11166.625000'
2016-03-23 21:14:06 (2736): Status Report: Job Duration: '129600.000000'
2016-03-23 21:14:06 (2736): Status Report: Elapsed Time: '18018.871914'
2016-03-23 21:14:06 (2736): Status Report: CPU Time: '16324.078125'
I interpret that as being 3 jobs done. Taking figures from the first of those:
Potential (max allowed?) job duration: 129600 s (36 hrs)
Wall-clock time: 6003 s (100 mins)
Actual CPU time: 5274 s (88 mins) |
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 9 |
"I interpret that as being 3 jobs done." Nope! Every 6000 seconds, vboxwrapper writes a kind of time stamp to stderr.txt with the CPU seconds used so far. |
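To check this reading, the Status Report lines can be grouped by timestamp. A minimal sketch, assuming the exact line format quoted in this thread (`parse_status_reports` and `STATUS_RE` are made-up names): the values are cumulative for the whole task and the Elapsed Time grows by roughly 6000 s per report, which fits periodic reports rather than one report per finished job.

```python
import re
from typing import Dict, List

# Matches vboxwrapper lines such as:
# 2016-03-23 17:53:52 (2736): Status Report: CPU Time: '5274.062500'
STATUS_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): "
    r"Status Report: (?P<key>Job Duration|Elapsed Time|CPU Time): '(?P<val>[\d.]+)'"
)


def parse_status_reports(text: str) -> List[Dict[str, float]]:
    """Group Status Report values by timestamp, oldest first."""
    reports: Dict[str, Dict[str, float]] = {}
    for m in STATUS_RE.finditer(text):
        reports.setdefault(m.group("ts"), {})[m.group("key")] = float(m.group("val"))
    return [reports[ts] for ts in sorted(reports)]


SAMPLE = """
2016-03-23 17:53:52 (2736): Status Report: Job Duration: '129600.000000'
2016-03-23 17:53:52 (2736): Status Report: Elapsed Time: '6003.223518'
2016-03-23 17:53:52 (2736): Status Report: CPU Time: '5274.062500'
2016-03-23 19:34:00 (2736): Status Report: Job Duration: '129600.000000'
2016-03-23 19:34:00 (2736): Status Report: Elapsed Time: '12011.943056'
2016-03-23 19:34:00 (2736): Status Report: CPU Time: '11166.625000'
2016-03-23 21:14:06 (2736): Status Report: Job Duration: '129600.000000'
2016-03-23 21:14:06 (2736): Status Report: Elapsed Time: '18018.871914'
2016-03-23 21:14:06 (2736): Status Report: CPU Time: '16324.078125'
"""

if __name__ == "__main__":
    for r in parse_status_reports(SAMPLE):
        # Values are cumulative for the whole task, not per job
        share = r["CPU Time"] / r["Elapsed Time"]
        print(f"elapsed {r['Elapsed Time']:8.0f} s  cpu {r['CPU Time']:8.0f} s  ({share:.0%})")
```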
Joined: 15 Apr 16 Posts: 3 Credit: 8,855 RAC: 0 |
I've just completed two LHCb tasks and noticed, in both, a huge difference between run time and CPU time. As you can see from the log below, the VM is checkpointing every 6000 s, but CPU time progresses much more slowly; I would say it almost stalls. Is this normal? Is it designed to behave like this? Guest Log: [INFO] LHCb application starting. Check log files. |
Joined: 17 Aug 15 Posts: 62 Credit: 296,695 RAC: 0 |
I have 3 LHCb 0.05 tasks running, and their CPU usage is close to zero. Tullio |
Joined: 3 Mar 16 Posts: 10 Credit: 33,623 RAC: 0 |
I would need more info to understand what is going on. Could you give me your processor model name? I'll find the jobs you are running and have a look. Cheers, Cinzia |
Joined: 15 Apr 16 Posts: 3 Credit: 8,855 RAC: 0 |
Processor: i7 4770 @ 3.4 GHz. Tasks I'm running: http://lhcathomedev.cern.ch/vLHCathome-dev/results.php?userid=374 Thanks, Max |
Joined: 17 Aug 15 Posts: 62 Credit: 296,695 RAC: 0 |
"I would need more info to understand what is going on." Computer ID is 663. CPU is AMD A10-6700. Tullio |
Joined: 3 Mar 16 Posts: 10 Credit: 33,623 RAC: 0 |
Thanks, I'll get back to you soon. Cinzia |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
Computer-ID 1165 has been running an LHCb task for 2.5 hours (7.14% CPU used).
AMD A10-7850, Win 10 Pro (x64), VirtualBox 5.0.16 with Extension Pack 5.0.18, BOINC 7.6.22.
In RDP with Alt+F10: Welcome to CernVM Virtual Machine, version 3.5.1.14 Machine UUID bbec5d01...... To contextualize your VM log-in to http://cernvm-online.cern.ch
In RDP with Alt+F6: Welcome to CERN Virtual Machine, version 3.5.1.14 based on Scientific Linux release 6.6 (Carbon) Kernel 3.10.64-85.cernvm.x86_64 on an x86_64
In RDP with Alt+F3: Linux console with Python
In RDP with Alt+F2, last two lines: Setting /LocalSite/Site = Boinc.World.org, Setting /LocalSite/GridCE = Boinc-World-CE.org
In RDP with Alt+F1: login info |
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 9 |
LHCb job finished. Not sure whether the result could be reported (complete running.log available):
Retry connection: 2
Waiting 5.000000 second before retry all service(s)
Error while handshaking
Error while handshaking [('Your certificate is invalid', 'SSL routines', 'SSL3_READ_BYTES', 'tlsv1 alert unknown ca')]
Error while handshaking [('Your certificate is invalid', 'SSL routines', 'SSL3_READ_BYTES', 'tlsv1 alert unknown ca')]
Sending accounting records to failover
Cleaning up job working directory
. .
After the job finished, the VM sat idle, and Alt-F1 showed: Pilot finished. Shutting down! But it doesn't shut down. I stopped the task myself. |
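The log suggests the pilot retried its connection, failed the SSL handshake each time ('tlsv1 alert unknown ca'), and then sent its accounting records to a failover. A generic retry-then-failover pattern is sketched below. It is hypothetical, not DIRAC's code: `send_with_failover` and its parameters are made-up names, and reading "Retry connection: 2" as two retries is an assumption.

```python
import ssl
import time
from typing import Callable, List


def send_with_failover(
    records: List[dict],
    send: Callable[[List[dict]], None],
    failover: Callable[[List[dict]], None],
    retries: int = 2,  # assumed meaning of "Retry connection: 2"
    wait_s: float = 5.0,  # matches "Waiting 5.000000 second before retry"
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try the primary service, retrying on connection/SSL errors, then fall back."""
    for attempt in range(retries + 1):
        try:
            send(records)
            return "sent"
        except OSError:  # ssl.SSLError and ConnectionError are both OSError subclasses
            if attempt < retries:
                sleep(wait_s)
    # Every attempt failed; hand the records to the fallback instead of losing them
    failover(records)
    return "failover"


if __name__ == "__main__":
    def rejected(_records: List[dict]) -> None:
        raise ssl.SSLError("tlsv1 alert unknown ca")

    parked: List[dict] = []
    print(send_with_failover([{"cpu_s": 1.0}], rejected, parked.extend, sleep=lambda s: None))
```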
Joined: 17 Aug 15 Posts: 62 Credit: 296,695 RAC: 0 |
On my 64-bit Linux box, an Opteron 1210 with 2 cores:
2016-04-22 23:51:00 (28876): Status Report: CPU Time: '2994.980000'
2016-04-23 01:30:08 (28876): Status Report: Job Duration: '129600.000000'
2016-04-23 01:30:08 (28876): Status Report: Elapsed Time: '36003.678585'
2016-04-23 01:30:08 (28876): Status Report: CPU Time: '3479.530000'
CPU time is 1/10 of elapsed time. Tullio |
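The cumulative ratio hides how the task is doing right now. A minimal sketch that compares the whole-task ratio with the CPU share between the last two reports, using the figures quoted in this post. It assumes the wall-clock gap between the two timestamps approximates elapsed task time (i.e. the task was not suspended in between); `interval_cpu_share` is a made-up name.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def interval_cpu_share(t0: str, cpu0: float, t1: str, cpu1: float) -> float:
    """CPU seconds gained per wall-clock second between two status reports."""
    wall = (datetime.strptime(t1, FMT) - datetime.strptime(t0, FMT)).total_seconds()
    return (cpu1 - cpu0) / wall


# Figures copied from the two reports in the post
overall = 3479.53 / 36003.678585
recent = interval_cpu_share("2016-04-22 23:51:00", 2994.98, "2016-04-23 01:30:08", 3479.53)
print(f"whole task so far: {overall:.1%}, last interval: {recent:.1%}")
```

With these numbers the whole-task share is about 9.7% and the last interval about 8.1% (484.55 CPU seconds over 5948 wall seconds), so the low ratio is not just an artefact of a slow start.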
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
Computer-ID 1165 finished its first LHCb task. CPU time was about 10% of the duration. At the beginning of the task, BOINC showed 0:28 hours of remaining time. Edit: Duration: 129,715.35 sec, CPU: 14,651.53 sec |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
"Computer-ID 1165 finished first LHCb task"
Started the second LHCb task at 8:25 UTC, port 51115.
stdout.log: Output of the job wrapper may appear here.
stderr.log: Error messages may appear here.
running.log:
Directories in PYTHONPATH: ['']
2016-04-25 08:22:52 UTC INFO [Pilot] Executing commands: ['LHCbGetPilotVersion', 'CheckWorkerNode', 'LHCbInstallDIRAC', 'LHCbConfigureBasics', 'LHCbConfigureSite', 'LHCbConfigureArchitecture', 'LHCbConfigureCPURequirements', 'LaunchAgent']
2016-04-25 08:22:52 UTC INFO [Pilot] Requested command extensions: ['LHCbPilot']
2016-04-25 08:22:52 UTC INFO [Pilot] Command LHCbGetPilotVersion instantiated from LHCbPilotCommands
2016-04-25 08:22:52 UTC INFO [LHCbGetPilotVersion] Pilot version not requested as pilot script option, going to find it
2016-04-25 08:22:52 UTC INFO [LHCbGetPilotVersion] Setting pilot version to v8r2p36
2016-04-25 08:22:52 UTC INFO [Pilot] Command CheckWorkerNode instantiated from pilotCommands
2016-04-25 08:22:52 UTC INFO [CheckWorkerNode] Uname = Linux 378-1165-17134 3.10.64-85.cernvm.x86_64 #1 SMP Fri Jan 9 09:53:29 CET 2015 x86_64
2016-04-25 08:22:52 UTC INFO [CheckWorkerNode] Host Name = 378-1165-17134
2016-04-25 08:22:52 UTC INFO [CheckWorkerNode] Host FQDN = localhost.localdomain
2016-04-25 08:22:52 UTC INFO [CheckWorkerNode] WorkingDir = /home/boinc/pilot
2016-04-25 08:22:52 UTC INFO [CheckWorkerNode] RedHat Release = Scientific Linux release 6.6 (Carbon)
2016-04-25 08:22:52 UTC INFO [CheckWorkerNode] Linux release:
2016-04-25 08:22:52 UTC INFO [CheckWorkerNode] LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
2016-04-25 08:22:52 UTC INFO [CheckWorkerNode] CPU (model) = AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
2016-04-25 08:22:52 UTC INFO [CheckWorkerNode] CPU (MHz) = 1 x 3800.176
2016-04-25 08:22:52 UTC INFO [CheckWorkerNode] Memory (kB) = 2050972
2016-04-25 08:22:52 UTC INFO [CheckWorkerNode] FreeMem. (kB) = 1773832
2016-04-25 08:22:52 UTC INFO [CheckWorkerNode] DiskSpace (MB) = 17286
2016-04-25 08:22:52 UTC INFO [Pilot] Command LHCbInstallDIRAC instantiated from LHCbPilotCommands
******************************************************************************** * ---- LHCb Login v8r5p3 ---- * * Building with gcc49 on slc6 x86_64 system (x86_64-slc6-gcc49-opt) * ******************************************************************************** --- User_release_area is set to /home/boinc/pilot/cmtuser --- LHCBPROJECTPATH is set to: /cvmfs/lhcb.cern.ch/lib/lhcb /cvmfs/lhcb.cern.ch/lib/lcg/releases /cvmfs/lhcb.cern.ch/lib/lcg/app/releases /cvmfs/lhcb.cern.ch/lib/lcg/external --------------------------------------------------------------------------------
Using CMTPROJECTPATH = '/cvmfs/lhcb.cern.ch/lib/lhcb:/cvmfs/lhcb.cern.ch/lib/lcg/releases:/cvmfs/lhcb.cern.ch/lib/lcg/app/releases:/cvmfs/lhcb.cern.ch/lib/lcg/external'
Environment for LbScripts v8r5p3 ready. (Compat v1r19 from /cvmfs/lhcb.cern.ch/lib/lhcb/COMPAT/COMPAT_v1r19, LbScripts v8r5p3 from /cvmfs/lhcb.cern.ch/lib/lhcb/LBSCRIPTS/LBSCRIPTS_v8r5p3, LCGCMT 84 from /cvmfs/lhcb.cern.ch/lib/lcg/releases/LCGCMT/LCGCMT_84, Compat v1r19 from /cvmfs/lhcb.cern.ch/lib/lhcb/COMPAT/COMPAT_v1r19)
2016-04-25 08:23:23 UTC INFO [LHCbInstallDIRAC] lb-run DONE, for release v8r2p36
2016-04-25 08:23:23 UTC INFO [Pilot] Command LHCbConfigureBasics instantiated from LHCbPilotCommands
2016-04-25 08:23:23 UTC WARN [LHCbConfigureBasics] Can't find shared area, forcing it to /cvmfs/lhcb.cern.ch/lib
2016-04-25 08:23:23 UTC INFO [LHCbConfigureBasics] Executing command dirac-configure -S "LHCb-Production" -C "dips://lbvobox46.cern.ch:9135/Configuration/Server" -o /LocalSite/ReleaseProject=LHCb -o /LocalSite/ReleaseVersion=v8r2p36 -o /LocalSite/SharedArea=/cvmfs/lhcb.cern.ch/lib -DMH --UseServerCertificate -o /DIRAC/Security/CertFile=/etc/grid-security/hostcert.pem -o /DIRAC/Security/KeyFile=/etc/grid-security/hostkey.pem -O pilot.cfg
Executing: /cvmfs/lhcb.cern.ch/lib/lhcb/DIRAC/DIRAC_v6r14p24/scripts/dirac-configure -S LHCb-Production -C dips://lbvobox46.cern.ch:9135/Configuration/Server -o /LocalSite/ReleaseProject=LHCb -o /LocalSite/ReleaseVersion=v8r2p36 -o /LocalSite/SharedArea=/cvmfs/lhcb.cern.ch/lib -DMH --UseServerCertificate -o /DIRAC/Security/CertFile=/etc/grid-security/hostcert.pem -o /DIRAC/Security/KeyFile=/etc/grid-security/hostkey.pem -O pilot.cfg
Checking DIRAC installation at "/cvmfs/lhcb.cern.ch/lib/lhcb/DIRAC/DIRAC_v6r14p24"
URL banned dips://lhcb-conf2-dirac.cern.ch:9135/Configuration/Server
2016-04-25 08:23:40 UTC INFO [Pilot] Command LHCbConfigureSite instantiated from LHCbPilotCommands
2016-04-25 08:23:40 UTC INFO [LHCbConfigureSite] Executing command dirac-configure -o /LocalSite/GridMiddleware=DIRAC -n "BOINC.World.org" -S "LHCb-Production" -N "Boinc-World-CE.org" -o /LocalSite/GridCE=Boinc-World-CE.org -o /LocalSite/CEQueue=Boinc.World.Queue --UseServerCertificate -o /DIRAC/Security/CertFile=/etc/grid-security/hostcert.pem -o /DIRAC/Security/KeyFile=/etc/grid-security/hostkey.pem -FDMH -O pilot.cfg pilot.cfg
Executing: /cvmfs/lhcb.cern.ch/lib/lhcb/DIRAC/DIRAC_v6r14p24/scripts/dirac-configure -o /LocalSite/GridMiddleware=DIRAC -n BOINC.World.org -S LHCb-Production -N Boinc-World-CE.org -o /LocalSite/GridCE=Boinc-World-CE.org -o /LocalSite/CEQueue=Boinc.World.Queue --UseServerCertificate -o /DIRAC/Security/CertFile=/etc/grid-security/hostcert.pem -o /DIRAC/Security/KeyFile=/etc/grid-security/hostkey.pem -FDMH -O pilot.cfg pilot.cfg
Checking DIRAC installation at "/cvmfs/lhcb.cern.ch/lib/lhcb/DIRAC/DIRAC_v6r14p24"
Will update the output file pilot.cfg
Setting /LocalSite/Site = BOINC.World.org
Setting /LocalSite/GridCE = Boinc-World-CE.org
2016-04-25 08:23:43 UTC INFO [Pilot] Command LHCbConfigureArchitecture instantiated from LHCbPilotCommands
2016-04-25 08:23:43 UTC INFO [LHCbConfigureArchitecture] Executing command dirac-architecture -o /DIRAC/Security/UseServerCertificate=yes pilot.cfg
x86_64-slc6
2016-04-25 08:23:48 UTC INFO [LHCbConfigureArchitecture] Executing command dirac-configure -FDMH --UseServerCertificate -O pilot.cfg pilot.cfg -S "LHCb-Production" -o /LocalSite/Architecture=x86_64-slc6
Executing: /cvmfs/lhcb.cern.ch/lib/lhcb/DIRAC/DIRAC_v6r14p24/scripts/dirac-configure -FDMH --UseServerCertificate -O pilot.cfg pilot.cfg -S LHCb-Production -o /LocalSite/Architecture=x86_64-slc6
Checking DIRAC installation at "/cvmfs/lhcb.cern.ch/lib/lhcb/DIRAC/DIRAC_v6r14p24"
Will update the output file pilot.cfg
Setting /LocalSite/Site = BOINC.World.org
Setting /LocalSite/GridCE = Boinc-World-CE.org
2016-04-25 08:23:51 UTC INFO [LHCbConfigureArchitecture] Setting variable CMTCONFIG=x86_64-slc6
2016-04-25 08:23:51 UTC INFO [Pilot] Command LHCbConfigureCPURequirements instantiated from LHCbPilotCommands
2016-04-25 08:23:51 UTC INFO [LHCbConfigureCPURequirements] Executing command dirac-wms-cpu-normalization -U -o /DIRAC/Security/UseServerCertificate=yes -R pilot.cfg pilot.cfg
Estimated CPU power is 6.6 HS06
MJF not available on this node
2016-04-25 08:25:25 UTC INFO [LHCbConfigureCPURequirements] Current normalized CPU as determined by 'dirac-wms-cpu-normalization' is 6.600000
2016-04-25 08:25:25 UTC INFO [LHCbConfigureCPURequirements] Executing command dirac-wms-get-queue-cpu-time -o /DIRAC/Security/UseServerCertificate=yes pilot.cfg
15151
2016-04-25 08:25:29 UTC INFO [LHCbConfigureCPURequirements] CPUTime left (in seconds) is 15151
2016-04-25 08:25:29 UTC INFO [LHCbConfigureCPURequirements] Queue length (which is also set as CPUTimeLeft) is 99996.600000
2016-04-25 08:25:29 UTC INFO [LHCbConfigureCPURequirements] Executing command dirac-configure -FDMH -o /DIRAC/Security/UseServerCertificate=yes -O pilot.cfg pilot.cfg -o /LocalSite/CPUTimeLeft=99996
Executing: /cvmfs/lhcb.cern.ch/lib/lhcb/DIRAC/DIRAC_v6r14p24/scripts/dirac-configure -FDMH -o /DIRAC/Security/UseServerCertificate=yes -O pilot.cfg pilot.cfg -o /LocalSite/CPUTimeLeft=99996
Checking DIRAC installation at "/cvmfs/lhcb.cern.ch/lib/lhcb/DIRAC/DIRAC_v6r14p24"
Will update the output file pilot.cfg
Setting /LocalSite/Site = BOINC.World.org
Setting /LocalSite/GridCE = Boinc-World-CE.org
2016-04-25 08:25:31 UTC INFO [Pilot] Command LaunchAgent instantiated from pilotCommands
2016-04-25 08:25:31 UTC INFO [LaunchAgent] User Name = boinc
2016-04-25 08:25:31 UTC INFO [LaunchAgent] User Id = 500
2016-04-25 08:25:31 UTC INFO [LaunchAgent] Starting JobAgent
2016-04-25 08:25:31 UTC INFO [LaunchAgent] Executing command dirac-agent WorkloadManagement/JobAgent -o MaxCycles=10 -s /Resources/Computing/CEDefaults -o WorkingDirectory=/home/boinc/pilot -o /LocalSite/MaxCPUTime=99996 -o /LocalSite/CPUTime=99996 -o MaxTotalJobs=10 -o /DIRAC/Security/UseServerCertificate=yes -o /LocalSite/InstancePath=/home/boinc/pilot -o /AgentJobRequirements/ExtraOptions=pilot.cfg pilot.cfg /home/boinc/pilot/pilot.cfg |
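The CPU figures in this running.log fit a simple relation: 15151 s of CPU time left multiplied by the estimated 6.6 HS06 gives 99996.6, the reported queue length, whose integer part is then passed on as CPUTimeLeft, MaxCPUTime and CPUTime. That DIRAC computes it exactly this way is an inference from these numbers, not something the log states; `normalized_queue_length` is a made-up name.

```python
def normalized_queue_length(cpu_time_left_s: int, hs06_power: float) -> float:
    """Scale raw CPU seconds by the benchmarked power (HS06-seconds)."""
    return cpu_time_left_s * hs06_power


queue_length = normalized_queue_length(15151, 6.6)  # values from the log
cpu_time_left = int(queue_length)  # integer part, as in CPUTimeLeft=99996
print(f"{queue_length:.1f} {cpu_time_left}")
```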
Joined: 3 Mar 16 Posts: 10 Credit: 33,623 RAC: 0 |
The jobs that you, Zurlistuta and Tullio, were running are done; no problems found. |
Joined: 3 Mar 16 Posts: 10 Credit: 33,623 RAC: 0 |
The jobs that you were running are done; no problems found. |
©2024 CERN