Name h1OKDmcvoDwnShfckohDCDFpABFKDmABFKDmyiALDmPEFKDmmxtrWm_1
Workunit 1973481
Created 23 Jan 2020, 11:22:38 UTC
Sent 26 Jan 2020, 15:07:05 UTC
Report deadline 2 Feb 2020, 15:07:05 UTC
Received 26 Jan 2020, 15:49:13 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 3717
Run time 37 min 36 sec
CPU time 1 hour 22 min 0 sec
Validate state Valid
Credit 92.07
Device peak FLOPS 17.62 GFLOPS
Application version ATLAS Simulation v1.00 (native_mt) x86_64-pc-linux-gnu
Peak working set size 1.86 GB
Peak swap size 2.55 GB
Peak disk usage 723.13 MB
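
The run time and CPU time above can be combined into a rough measure of how busy the 4 requested threads were. The following is a minimal sketch rather than anything BOINC or the pilot computes: the helper `to_seconds` and the variable names are hypothetical, and the thread count is taken from the `--nthreads 4` argument in the stderr output. The result need not match the `cpuEfficiency` value the pilot reports, since the pilot uses its own timings.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an hours/minutes/seconds triple to seconds (hypothetical helper)."""
    return hours * 3600 + minutes * 60 + seconds

# Values copied from the task header above.
run_time = to_seconds(minutes=37, seconds=36)   # wall-clock run time
cpu_time = to_seconds(hours=1, minutes=22)      # CPU time summed over all threads
threads = 4                                     # assumption: from "--nthreads 4"

# Average number of cores kept busy, and the fraction of the 4 threads used.
cores_busy = cpu_time / run_time
efficiency = cores_busy / threads

print(f"run time:   {run_time} s")
print(f"CPU time:   {cpu_time} s")
print(f"cores busy: {cores_busy:.2f}")
print(f"efficiency: {efficiency:.1%}")
```

Part of the wall-clock time in this task went to setup steps such as the CVMFS probes and the Singularity check, which take roughly two minutes in the log and do not use all threads.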

Stderr output

<core_client_version>7.12.0</core_client_version>
<![CDATA[
<stderr_txt>
16:08:44 (3761): wrapper (7.7.26015): starting
16:08:44 (3761): wrapper: running run_atlas (--nthreads 4)
zo 26 jan 2020 16:08:44 CET: Arguments: --nthreads 4
zo 26 jan 2020 16:08:44 CET: Threads: 4
zo 26 jan 2020 16:08:44 CET: Checking for CVMFS
zo 26 jan 2020 16:08:51 CET: Probing /cvmfs/atlas.cern.ch... OK
zo 26 jan 2020 16:08:55 CET: Probing /cvmfs/atlas-condb.cern.ch... OK
zo 26 jan 2020 16:08:58 CET: Probing /cvmfs/grid.cern.ch... OK
zo 26 jan 2020 16:08:58 CET: Probing /cvmfs/cernvm-prod.cern.ch... OK
zo 26 jan 2020 16:08:59 CET: Probing /cvmfs/sft.cern.ch... OK
zo 26 jan 2020 16:09:01 CET: Probing /cvmfs/alice.cern.ch... OK
zo 26 jan 2020 16:09:02 CET: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
zo 26 jan 2020 16:09:02 CET: 2.5.2.0 3905 0 24424 59624 3 1 3069036 4194304 0 65024 0 0 n/a 20919 4060 http://s1ral-cvmfs.openhtc.io/cvmfs/atlas.cern.ch DIRECT 1
zo 26 jan 2020 16:09:02 CET: CVMFS is ok
zo 26 jan 2020 16:09:02 CET: Using singularity image /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img
zo 26 jan 2020 16:09:02 CET: Checking for singularity binary...
zo 26 jan 2020 16:09:02 CET: Singularity is not installed, using version from CVMFS
zo 26 jan 2020 16:09:02 CET: Checking singularity works with /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img hostname
zo 26 jan 2020 16:11:01 CET: INFO:  Convert SIF file to sandbox... LinAH125 INFO:  Cleaning up image...
zo 26 jan 2020 16:11:01 CET: Singularity works
zo 26 jan 2020 16:11:03 CET: Set ATHENA_PROC_NUMBER=4
zo 26 jan 2020 16:11:03 CET: Starting ATLAS job with PandaID=4002876565
zo 26 jan 2020 16:11:03 CET: Running command: /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec --pwd /var/lib/boinc-client/slots/0 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh start_atlas.sh
zo 26 jan 2020 16:47:20 CET:  *** The last 200 lines of the pilot log: ***
zo 26 jan 2020 16:47:20 CET:           "cpuTime": 1, 
zo 26 jan 2020 16:47:20 CET:           "wallTime": 1
zo 26 jan 2020 16:47:20 CET:         }, 
zo 26 jan 2020 16:47:20 CET:         "preExe": {
zo 26 jan 2020 16:47:20 CET:           "cpuTime": 1, 
zo 26 jan 2020 16:47:20 CET:           "wallTime": 4
zo 26 jan 2020 16:47:20 CET:         }, 
zo 26 jan 2020 16:47:20 CET:         "total": {
zo 26 jan 2020 16:47:20 CET:           "cpuTime": 74, 
zo 26 jan 2020 16:47:20 CET:           "wallTime": 146
zo 26 jan 2020 16:47:20 CET:         }, 
zo 26 jan 2020 16:47:20 CET:         "validation": {
zo 26 jan 2020 16:47:20 CET:           "cpuTime": 0, 
zo 26 jan 2020 16:47:20 CET:           "wallTime": 0
zo 26 jan 2020 16:47:20 CET:         }, 
zo 26 jan 2020 16:47:20 CET:         "wallTime": 142
zo 26 jan 2020 16:47:20 CET:       }
zo 26 jan 2020 16:47:20 CET:     }, 
zo 26 jan 2020 16:47:20 CET:     "machine": {
zo 26 jan 2020 16:47:20 CET:       "cpu_family": "6", 
zo 26 jan 2020 16:47:20 CET:       "linux_distribution": [
zo 26 jan 2020 16:47:20 CET:         "CentOS Linux", 
zo 26 jan 2020 16:47:20 CET:         "7.6.1810", 
zo 26 jan 2020 16:47:20 CET:         "Core"
zo 26 jan 2020 16:47:20 CET:       ], 
zo 26 jan 2020 16:47:20 CET:       "model": "42", 
zo 26 jan 2020 16:47:20 CET:       "model_name": "Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz", 
zo 26 jan 2020 16:47:20 CET:       "node": "LinAH125", 
zo 26 jan 2020 16:47:20 CET:       "platform": "Linux-4.18.0-25-generic-x86_64-with-centos-7.6.1810-Core"
zo 26 jan 2020 16:47:20 CET:     }, 
zo 26 jan 2020 16:47:20 CET:     "transform": {
zo 26 jan 2020 16:47:20 CET:       "cpuEfficiency": 0.6294, 
zo 26 jan 2020 16:47:20 CET:       "cpuPWEfficiency": 0.6602, 
zo 26 jan 2020 16:47:20 CET:       "cpuTime": 7, 
zo 26 jan 2020 16:47:20 CET:       "cpuTimeTotal": 4984, 
zo 26 jan 2020 16:47:20 CET:       "externalCpuTime": 21, 
zo 26 jan 2020 16:47:20 CET:       "processedEvents": 10, 
zo 26 jan 2020 16:47:20 CET:       "trfPredata": null, 
zo 26 jan 2020 16:47:20 CET:       "wallTime": 1945
zo 26 jan 2020 16:47:20 CET:     }
zo 26 jan 2020 16:47:20 CET:   }
zo 26 jan 2020 16:47:20 CET: }
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,836 | DEBUG    | queue_monitor       | pilot.util.auxiliary.4002876565  | update_server             | xml:will send fileinfo
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,837 | DEBUG    | queue_monitor       | pilot.control.job                | get_proper_state          | state=finished
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,837 | DEBUG    | queue_monitor       | pilot.control.job                | get_proper_state          | serverstate=running
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,837 | DEBUG    | queue_monitor       | pilot.control.job                | get_proper_state          | serverstate=finished
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,837 | INFO     | queue_monitor       | pilot.control.job.4002876565     | send_state                | pilot will not update the server (heartbeat message will be written to file)
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,837 | INFO     | queue_monitor       | pilot.control.job.4002876565     | send_state                | job 4002876565 has finished - writing final server update
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,837 | DEBUG    | queue_monitor       | pilot.control.job.4002876565     | get_data_structure        | building data structure to be sent to server with heartbeat
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,838 | INFO     | queue_monitor       | pilot.util.auxiliary.4002876565  | get_job_metrics           | will not add max space = -351703837 B to job metrics
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,838 | DEBUG    | queue_monitor       | pilot.api.analytics              | get_fitted_data           | removing tails from data to be fitted
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,838 | INFO     | queue_monitor       | pilot.api.analytics              | get_fitted_data           | fitting pss+swap vs Time
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,839 | INFO     | queue_monitor       | pilot.api.analytics              | get_fitted_data           | current memory leak: -378.49 B/s (using 26 data points, chi2=1512453)
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,839 | DEBUG    | queue_monitor       | pilot.util.auxiliary.4002876565  | get_job_metrics           | job metrics="coreCount=4 actualCoreCount=1 nEvents=10 leak=-378.49 chi2=1512453"
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,839 | INFO     | queue_monitor       | pilot.control.job.4002876565     | get_data_structure        | total number of processed events: 10 (read)
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,840 | INFO     | queue_monitor       | pilot.user.atlas.utilities       | get_memory_values         | using path: /var/lib/boinc-client/slots/0/PanDA_Pilot-4002876565/memory_monitor_summary.json (trf name=prmon)
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,841 | DEBUG    | queue_monitor       | pilot.user.atlas.utilities       | get_memory_monitor_info   | summary_dictionary={'Max': {'rx_packets': 47963, 'nprocs': 10, 'nthreads': 1, 'rx_bytes': 40431668, 'wtime': 1973, 'rss': 9752940, 'write_bytes': 0, 'vmem': 14019624, 'read_bytes': 0, 'stime': 74, 'tx_bytes': 7936087, 'pss': 2573847, 'wchar': 0, 'rchar': 0, 'tx_packets': 25533, 'swap': 0, 'utime': 4901}, 'Avg': {'write_bytes': 0, 'nprocs': 7, 'nthreads': 0, 'rx_bytes': 20490, 'rx_packets': 24, 'vmem': 9102920, 'read_bytes': 0, 'swap': 0, 'tx_bytes': 4022, 'pss': 1972977, 'wchar': 0, 'rchar': 0, 'tx_packets': 12, 'rss': 6120684}}
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,841 | INFO     | queue_monitor       | pilot.user.atlas.utilities       | get_memory_monitor_info   | extracted standard info from prmon json
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,842 | INFO     | queue_monitor       | pilot.user.atlas.utilities       | get_memory_monitor_info   | extracted standard memory fields from prmon json
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,842 | INFO     | queue_monitor       | pilot.util.auxiliary.4002876565  | timing_report             | ..............................
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,842 | INFO     | queue_monitor       | pilot.util.auxiliary.4002876565  | timing_report             | . Timing measurements:
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,842 | INFO     | queue_monitor       | pilot.util.auxiliary.4002876565  | timing_report             | . get job = 0 s
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,842 | INFO     | queue_monitor       | pilot.util.auxiliary.4002876565  | timing_report             | . initial setup = 2 s
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,842 | INFO     | queue_monitor       | pilot.util.auxiliary.4002876565  | timing_report             | . payload setup = 0 s
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,842 | INFO     | queue_monitor       | pilot.util.auxiliary.4002876565  | timing_report             | . total setup = 2 s
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,842 | INFO     | queue_monitor       | pilot.util.auxiliary.4002876565  | timing_report             | . stage-in = 0 s
zo 26 jan 2020 16:47:20 CET: 2020-01-26 15:46:59,843 | INFO     | queue_monitor       | pilot.util.auxiliary.4002876565  | timing_report             | . payload execution = 2021 s
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,843 | INFO     | queue_monitor       | pilot.util.auxiliary.4002876565  | timing_report             | . stage-out = 3 s
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,843 | INFO     | queue_monitor       | pilot.util.auxiliary.4002876565  | timing_report             | ..............................
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,844 | DEBUG    | queue_monitor       | pilot.control.job.4002876565     | send_state                | wrote heartbeat to file /var/lib/boinc-client/slots/0/heartbeat.json
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,844 | DEBUG    | queue_monitor       | pilot.control.job                | queue_monitor             | job 4002876565 was dequeued from the monitored payloads queue
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,844 | DEBUG    | queue_monitor       | pilot.control.job                | queue_monitor             | tmp job object deleted
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,884 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | 
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,884 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | job summary report
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,884 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | --------------------------------------------------
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,884 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | PanDA job id: 4002876565
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,884 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | task id: 000649-198114-32114
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,884 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | errors: (none)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,885 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | status: LOG_TRANSFER = DONE 
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,885 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | pilot state: finished 
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,885 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | transexitcode: 0
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,885 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | exeerrorcode: 0
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,885 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | exeerrordiag: 
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,885 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | exitcode: 0
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,885 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | exitmsg: OK
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,885 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | cpuconsumptiontime: 4968 s
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,885 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | nevents: 10
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,885 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | neventsw: 0
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,885 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | pid: 10972
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,886 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | pgrp: 10972
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,886 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | corecount: 4
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,886 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | event service: False
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,886 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | --------------------------------------------------
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,886 | INFO     | retrieve            | pilot.util.auxiliary.4002876565  | make_job_report           | 
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,886 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue jobs has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,886 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue payloads has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,886 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue data_in has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,886 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue data_out has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,887 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue current_data_in has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,887 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue validated_jobs has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,887 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue validated_payloads has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,887 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue monitored_payloads has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,887 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue finished_jobs has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,887 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue finished_payloads has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,887 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue finished_data_in has 1 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,887 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue finished_data_out has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,887 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue failed_jobs has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,887 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue failed_payloads has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,887 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue failed_data_in has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,888 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue failed_data_out has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,888 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue completed_jobs has 0 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,888 | INFO     | retrieve            | pilot.util.queuehandling         | queue_report              | queue completed_jobids has 1 job(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,888 | INFO     | retrieve            | pilot.control.job.4002876565     | has_job_completed         | job 4002876565 has completed (purged errors)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,888 | INFO     | retrieve            | pilot.util.processes             | cleanup                   | overall cleanup function is called
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:46:59,893 | DEBUG    | retrieve            | pilot.util.processes             | cleanup                   | work directory was removed: /var/lib/boinc-client/slots/0/PanDA_Pilot-4002876565
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:00,896 | INFO     | retrieve            | pilot.info.jobdata               | collect_zombies           | --- collectZombieJob: --- 10, [10972]
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:00,897 | INFO     | retrieve            | pilot.info.jobdata               | collect_zombies           | zombie collector trying to kill pid 10972
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:00,898 | INFO     | retrieve            | pilot.info.jobdata               | collect_zombies           | harmless exception when collecting zombies: [Errno 10] No child processes
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:01,903 | INFO     | retrieve            | pilot.util.processes             | cleanup                   | collected zombie processes
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:01,903 | INFO     | retrieve            | pilot.util.processes             | cleanup                   | will now attempt to kill all subprocesses of pid=10972
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:01,958 | INFO     | retrieve            | pilot.util.processes             | kill_processes            | process IDs to be killed: [10972] (in reverse order)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:01,996 | WARNING  | retrieve            | pilot.util.processes             | kill_processes            | found no corresponding commands to process id(s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:01,996 | INFO     | retrieve            | pilot.util.processes             | kill_orphans              | Do not look for orphan processes in BOINC jobs
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:01,997 | INFO     | retrieve            | pilot.control.job                | retrieve                  | ready for new job
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:01,997 | INFO     | retrieve            | root                             | retrieve                  | pilot has finished for previous job - re-establishing logging
zo 26 jan 2020 16:47:21 CET: mpi4py not found
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:02,003 | INFO     | retrieve            | pilot.util.auxiliary             | pilot_version_banner      | ****************************************
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:02,003 | INFO     | retrieve            | pilot.util.auxiliary             | pilot_version_banner      | ***  PanDA Pilot version 2.3.4 (12)  ***
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:02,003 | INFO     | retrieve            | pilot.util.auxiliary             | pilot_version_banner      | ****************************************
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:02,003 | INFO     | retrieve            | pilot.util.auxiliary             | pilot_version_banner      | 
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:02,004 | INFO     | retrieve            | pilot.util.auxiliary             | pilot_version_banner      | pilot is running in a VM
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:02,004 | INFO     | retrieve            | pilot.util.auxiliary             | display_architecture_info | architecture information:
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:02,127 | INFO     | retrieve            | pilot.util.auxiliary             | display_architecture_info | 
zo 26 jan 2020 16:47:21 CET: LSB Version:	:core-4.1-amd64:core-4.1-noarch
zo 26 jan 2020 16:47:21 CET: Distributor ID:	CentOS
zo 26 jan 2020 16:47:21 CET: Description:	CentOS Linux release 7.6.1810 (Core) 
zo 26 jan 2020 16:47:21 CET: Release:	7.6.1810
zo 26 jan 2020 16:47:21 CET: Codename:	Core
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:02,127 | INFO     | retrieve            | pilot.util.auxiliary             | pilot_version_banner      | ****************************************
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:02,630 | DEBUG    | retrieve            | pilot.util.monitoring            | check_local_space         | checking local space on /var/lib/boinc-client/slots/0
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:02,764 | INFO     | retrieve            | pilot.util.monitoring            | check_local_space         | sufficient remaining disk space (9652142080 B)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:02,764 | WARNING  | retrieve            | pilot.control.job                | proceed_with_getjob       | since timefloor is set to 0, pilot was only allowed to run one job
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:02,765 | DEBUG    | retrieve            | pilot.control.job                | retrieve                  | [job] retrieve thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:02,901 | DEBUG    | payload             | pilot.control.payload            | control                   | payload control ending since graceful_stop has been set
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:02,902 | DEBUG    | payload             | pilot.control.payload            | control                   | [payload] control thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:03,032 | WARNING  | copytool_out        | pilot.util.common                | should_abort              | data:copytool_out:received graceful stop - abort after this iteration
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:03,100 | DEBUG    | job                 | pilot.control.job                | control                   | job control ending since graceful_stop has been set
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:03,100 | DEBUG    | job                 | pilot.control.job                | control                   | [job] control thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:03,287 | DEBUG    | create_data_payload | pilot.control.job                | create_data_payload       | [job] create_data_payload thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:03,301 | INFO     | monitor             | pilot.control.monitor            | control                   | [monitor] control thread has ended
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:03,351 | INFO     | validate_post       | pilot.control.payload            | validate_post             | [payload] validate_post thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:03,565 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | thread count now at 11 threads
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:03,566 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | enumerate: [<_MainThread(MainThread, started 140253881681728)>, <ExcThread(failed_post, started 140253222397696)>, <ExcThread(copytool_out, started 140253608265472)>, <ExcThread(validate_pre, started 140253616658176)>, <ExcThread(execute_payloads, started 140253205612288)>, <ExcThread(copytool_in, started 140253197219584)>, <ExcThread(data, started 140253633443584)>, <ExcThread(job_monitor, started 140253188826880)>, <ExcThread(queue_monitor, started 140253180434176)>, <ExcThread(queue_monitoring, started 140253214004992)>, <ExcThread(validate, started 140253710608128)>]
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:03,819 | DEBUG    | data                | pilot.control.data               | control                   | data control ending since graceful_stop has been set
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:03,819 | DEBUG    | data                | pilot.control.data               | control                   | [data] control thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:03,819 | DEBUG    | validate            | pilot.control.job                | validate                  | [job] validate thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:03,819 | INFO     | validate_pre        | pilot.control.payload            | validate_pre              | [payload] validate_pre thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:03,903 | DEBUG    | copytool_in         | pilot.control.data               | copytool_in               | [data] copytool_in thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:04,142 | DEBUG    | copytool_out        | pilot.control.data               | copytool_out              | [data] copytool_out thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:04,143 | INFO     | execute_payloads    | pilot.control.payload            | execute_payloads          | [payload] execute_payloads thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:04,400 | INFO     | failed_post         | pilot.control.payload            | failed_post               | [payload] failed_post thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:04,682 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | thread count now at 4 threads
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:04,682 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | enumerate: [<_MainThread(MainThread, started 140253881681728)>, <ExcThread(job_monitor, started 140253188826880)>, <ExcThread(queue_monitor, started 140253180434176)>, <ExcThread(queue_monitoring, started 140253214004992)>]
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:04,961 | WARNING  | queue_monitor       | pilot.util.common                | should_abort              | job:queue_monitor:received graceful stop - abort after this iteration
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:04,961 | DEBUG    | queue_monitor       | pilot.control.job                | queue_monitor             | [job] queue monitor thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:05,862 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | thread count now at 3 threads
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:05,863 | WARNING  | queue_monitoring    | pilot.util.common                | should_abort              | data:queue_monitoring:received graceful stop - abort after this iteration
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:05,863 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | enumerate: [<_MainThread(MainThread, started 140253881681728)>, <ExcThread(job_monitor, started 140253188826880)>, <ExcThread(queue_monitoring, started 140253214004992)>]
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:08,921 | DEBUG    | queue_monitoring    | pilot.control.data               | queue_monitoring          | [data] queue_monitor thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:08,934 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | thread count now at 2 threads
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:08,934 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | enumerate: [<_MainThread(MainThread, started 140253881681728)>, <ExcThread(job_monitor, started 140253188826880)>]
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:18,123 | WARNING  | job_monitor         | pilot.control.job                | check_job_monitor_waiting_time | no jobs in monitored_payloads queue (waited for 72 s)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:18,123 | DEBUG    | job_monitor         | pilot.control.job                | job_monitor               | [job] job monitor thread has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:19,078 | INFO     | MainThread          | pilot.workflow.generic           | run                       | end of generic workflow (traces error code: 0)
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:19,078 | INFO     | MainThread          | root                             | wrap_up                   | traces error code: 0
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:19,078 | INFO     | MainThread          | root                             | wrap_up                   | pilot has finished
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:19 UTC [wrapper] ==== pilot stdout END ====
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:19 UTC [wrapper] ==== wrapper stdout RESUME ====
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:19 UTC [wrapper] Pilot exit status: 0
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:19 UTC [wrapper] STATUSCODE: 0
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:19 UTC [wrapper] apfmon messages muted
zo 26 jan 2020 16:47:21 CET: ---- find pandaID.out ----
zo 26 jan 2020 16:47:21 CET: total 64
zo 26 jan 2020 16:47:21 CET: -rw-------  1 boinc boinc 11357 Jul 25  2019 LICENSE
zo 26 jan 2020 16:47:21 CET: -rw-------  1 boinc boinc    20 Sep  9 13:04 MANIFEST.IN
zo 26 jan 2020 16:47:21 CET: -rw-------  1 boinc boinc     8 Dec 12 19:00 PILOTVERSION
zo 26 jan 2020 16:47:21 CET: -rw-------  1 boinc boinc  2212 Nov 14 11:01 README.md
zo 26 jan 2020 16:47:21 CET: -rw-------  1 boinc boinc   221 Jul 25  2019 TODO.md
zo 26 jan 2020 16:47:21 CET: -rw-------  1 boinc boinc    11 Jan 26 16:12 pandaIDs.out
zo 26 jan 2020 16:47:21 CET: drwx------ 14 boinc boinc  4096 Jan 26 16:12 pilot
zo 26 jan 2020 16:47:21 CET: -rwx------  1 boinc boinc 21225 Dec 12 19:00 pilot.py
zo 26 jan 2020 16:47:21 CET: -rw-------  1 boinc boinc   766 Oct 10 16:01 setup.py
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc 11 Jan 26 16:12 /var/lib/boinc-client/slots/0/pilot2/pandaIDs.out
zo 26 jan 2020 16:47:21 CET: 4002876565
zo 26 jan 2020 16:47:21 CET: 
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:19 UTC [wrapper] Test setup, not cleaning
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:19 UTC [wrapper] ==== wrapper stdout END ====
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:19 UTC [wrapper] ==== wrapper stderr END ====
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:19 UTC [wrapper] wrapper wrapperexiting ec=0, duration=2109
zo 26 jan 2020 16:47:21 CET: 2020-01-26 15:47:19 UTC [wrapper] apfmon messages muted
zo 26 jan 2020 16:47:21 CET:  *** Error codes and diagnostics ***
zo 26 jan 2020 16:47:21 CET:     "exeErrorCode": 0,
zo 26 jan 2020 16:47:21 CET:     "exeErrorDiag": "",
zo 26 jan 2020 16:47:21 CET:     "pilotErrorCode": 0,
zo 26 jan 2020 16:47:21 CET:     "pilotErrorDiag": "",
zo 26 jan 2020 16:47:21 CET:  *** Listing of results directory ***
zo 26 jan 2020 16:47:21 CET: total 379388
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc    267260 jan 20 16:32 pilot2.tar.gz
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc      4492 jan 20 17:15 queuedata.json
zo 26 jan 2020 16:47:21 CET: -rwx------ 1 boinc boinc     12641 jan 20 17:17 runpilot2-wrapper.sh
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc       107 jan 26 16:08 wrapper_26015_x86_64-pc-linux-gnu
zo 26 jan 2020 16:47:21 CET: -rwxr-xr-x 1 boinc boinc      5557 jan 26 16:08 run_atlas
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc       112 jan 26 16:08 job.xml
zo 26 jan 2020 16:47:21 CET: drwxrwx--x 2 boinc boinc      4096 jan 26 16:08 shared
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc      5874 jan 26 16:08 init_data.xml
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc         0 jan 26 16:08 boinc_lockfile
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc      8509 jan 26 16:11 start_atlas.sh
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc       815 jan 26 16:11 RTE.tar.gz
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc    275414 jan 26 16:11 input.tar.gz
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc 365251149 jan 26 16:11 EVNT.14296418._001447.pool.root.1
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc      2948 jan 26 16:11 pandaJob.out
zo 26 jan 2020 16:47:21 CET: drwxr-xr-x 3 boinc boinc      4096 jan 26 16:12 APPS
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc   3662359 jan 26 16:12 agis_schedconf.cvmfs.json
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc   7844343 jan 26 16:12 agis_ddmendpoints.json
zo 26 jan 2020 16:47:21 CET: drwx------ 3 boinc boinc      4096 jan 26 16:12 pilot2
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc       535 jan 26 16:36 boinc_task_state.xml
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc   9091280 jan 26 16:46 HITS.000649-198114-32114._078090.pool.root.1
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc       786 jan 26 16:46 memory_monitor_summary.json
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc    726948 jan 26 16:46 log.000649-198114-32114._078090.job.log.tgz.1
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc     10967 jan 26 16:46 heartbeat.json
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc      8192 jan 26 16:47 boinc_mmap_file
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc        26 jan 26 16:47 wrapper_checkpoint.txt
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc      8389 jan 26 16:47 pilotlog.txt
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc    218897 jan 26 16:47 log.000649-198114-32114._078090.job.log.1
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc       501 jan 26 16:47 h1OKDmcvoDwnShfckohDCDFpABFKDmABFKDmyiALDmPEFKDmmxtrWm.diag
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc       496 jan 26 16:47 output.list
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc       739 jan 26 16:47 runtime_log
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc    972800 jan 26 16:47 result.tar.gz
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc      7393 jan 26 16:47 runtime_log.err
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc      2356 jan 26 16:47 stderr.txt
zo 26 jan 2020 16:47:21 CET: HITS file was successfully produced:
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc 9091280 jan 26 16:46 shared/HITS.pool.root.1
zo 26 jan 2020 16:47:21 CET:  *** Contents of shared directory: ***
zo 26 jan 2020 16:47:21 CET: total 366816
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc 365251149 jan 26 16:08 ATLAS.root_0
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc      8509 jan 26 16:08 start_atlas.sh
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc       815 jan 26 16:08 RTE.tar.gz
zo 26 jan 2020 16:47:21 CET: -rw-r--r-- 1 boinc boinc    275414 jan 26 16:08 input.tar.gz
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc   9091280 jan 26 16:46 HITS.pool.root.1
zo 26 jan 2020 16:47:21 CET: -rw------- 1 boinc boinc    972800 jan 26 16:47 result.tar.gz
16:47:21 (3761): run_atlas exited; CPU time 4920.121866
16:47:21 (3761): called boinc_finish(0)

</stderr_txt>
]]>

