Name nAQNDmy2fHvnShfckohDCDFpABFKDmABFKDmjdFaDmABFKDmfVc7Hn_0
Workunit 1916979
Created 13 Aug 2019, 7:06:14 UTC
Sent 15 Aug 2019, 4:25:15 UTC
Report deadline 22 Aug 2019, 4:25:15 UTC
Received 16 Aug 2019, 3:21:48 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 4008
Run time 22 hours 54 min 19 sec
CPU time 23 hours 20 min 31 sec
Validate state Valid
Credit 389.88
Device peak FLOPS 2.04 GFLOPS
Application version ATLAS Simulation v0.71 (native_mt) x86_64-pc-linux-gnu
Peak working set size 1.84 GB
Peak swap size 4.96 GB
Peak disk usage 541.17 MB

Stderr output

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<stderr_txt>
06:26:30 (13549): wrapper (7.7.26015): starting
06:26:30 (13549): wrapper: running run_atlas (--nthreads 1)
singularity image is /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6
sys.argv = ['run_atlas', '--nthreads', '1']
THREADS=1
Checking for CVMFS
CVMFS is installed
OS:CentOS Linux release 7.4.1708 (Core) 

This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity is installed, version 2.6.1-dist
Testing the function of Singularity...
Checking singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname
Singularity Works...

copy /root/slots/4/shared/ATLAS.root_0
copy /root/slots/4/shared/input.tar.gz
copy /root/slots/4/shared/RTE.tar.gz
copy /root/slots/4/shared/start_atlas.sh
start atlas job with PandaID=4443290924
cmd = singularity exec --pwd /root/slots/4 -B /cvmfs,/root /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 sh start_atlas.sh > runtime_log 2> runtime_log.err
running cmd return value is 0
Moving ./HITS.18722398._024922.pool.root.1 to shared/HITS.pool.root.1
*****************The last 100 lines of the pilot log******************
2019-08-16 03:19:55,778 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | task id: 18722398
2019-08-16 03:19:55,778 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | errors: (none)
2019-08-16 03:19:55,778 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | status: LOG_TRANSFER = DONE 
2019-08-16 03:19:55,778 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | pilot state: finished 
2019-08-16 03:19:55,778 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | transexitcode: 0
2019-08-16 03:19:55,778 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | exeerrorcode: 0
2019-08-16 03:19:55,778 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | exeerrordiag: 
2019-08-16 03:19:55,779 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | exitcode: 0
2019-08-16 03:19:55,779 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | exitmsg: OK
2019-08-16 03:19:55,779 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | cpuconsumptiontime: 82220 s
2019-08-16 03:19:55,779 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | nevents: 200
2019-08-16 03:19:55,779 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | neventsw: 0
2019-08-16 03:19:55,779 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | pid: 20667
2019-08-16 03:19:55,779 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | pgrp: 20667
2019-08-16 03:19:55,779 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | corecount: 1
2019-08-16 03:19:55,780 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | event service: False
2019-08-16 03:19:55,780 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | --------------------------------------------------
2019-08-16 03:19:55,780 | INFO     | retrieve            | pilot.util.auxiliary             | make_job_report           | 
2019-08-16 03:19:55,780 | INFO     | retrieve            | pilot.control.job                | has_job_completed         | job 4443290924 has completed
2019-08-16 03:19:55,780 | INFO     | retrieve            | pilot.util.processes             | cleanup                   | overall cleanup function is called
2019-08-16 03:19:55,784 | DEBUG    | retrieve            | pilot.util.processes             | cleanup                   | work directory was removed: /root/slots/4/PanDA_Pilot-4443290924
2019-08-16 03:19:56,787 | INFO     | retrieve            | pilot.info.jobdata               | collect_zombies           | --- collectZombieJob: --- 10, [20667]
2019-08-16 03:19:56,788 | INFO     | retrieve            | pilot.info.jobdata               | collect_zombies           | zombie collector trying to kill pid 20667
2019-08-16 03:19:56,788 | INFO     | retrieve            | pilot.info.jobdata               | collect_zombies           | harmless exception when collecting zombies: [Errno 10] No child processes
2019-08-16 03:19:57,788 | INFO     | retrieve            | pilot.util.processes             | cleanup                   | collected zombie processes
2019-08-16 03:19:57,789 | INFO     | retrieve            | pilot.util.processes             | cleanup                   | will now attempt to kill all subprocesses of pid=20667
2019-08-16 03:19:57,863 | INFO     | retrieve            | pilot.util.processes             | kill_processes            | process IDs to be killed: [20667] (in reverse order)
2019-08-16 03:19:57,945 | WARNING  | retrieve            | pilot.util.processes             | kill_processes            | found no corresponding commands to process id(s)
2019-08-16 03:19:57,945 | INFO     | retrieve            | pilot.util.processes             | kill_orphans              | Do not look for orphan processes in BOINC jobs
2019-08-16 03:19:57,945 | INFO     | retrieve            | pilot.control.job                | retrieve                  | ready for new job
2019-08-16 03:19:57,945 | INFO     | retrieve            | root                             | retrieve                  | pilot has finished for previous job - re-establishing logging
No handlers could be found for logger "pilot.util.mpi"
2019-08-16 03:19:58,468 | DEBUG    | retrieve            | pilot.control.job                | retrieve                  | getjob_requests=1
2019-08-16 03:20:08,520 | DEBUG    | retrieve            | pilot.control.job                | proceed_with_getjob       | proceed_with_getjob called with getjob_requests=1
2019-08-16 03:20:08,520 | DEBUG    | retrieve            | pilot.util.monitoring            | check_local_space         | checking local space on /root/slots/4
2019-08-16 03:20:08,583 | INFO     | retrieve            | pilot.util.monitoring            | check_local_space         | sufficient remaining disk space (15289286656 B)
2019-08-16 03:20:08,584 | WARNING  | retrieve            | pilot.control.job                | proceed_with_getjob       | since timefloor is set to 0, pilot was only allowed to run one job
2019-08-16 03:20:08,584 | DEBUG    | retrieve            | pilot.control.job                | retrieve                  | [job] retrieve thread has finished
2019-08-16 03:20:08,612 | INFO     | monitor             | pilot.control.monitor            | control                   | [monitor] control thread has ended
2019-08-16 03:20:08,656 | WARNING  | copytool_out        | pilot.util.common                | should_abort              | data:copytool_out:received graceful stop - abort after this iteration
2019-08-16 03:20:08,770 | DEBUG    | validate            | pilot.control.job                | validate                  | [job] validate thread has finished
2019-08-16 03:20:08,771 | DEBUG    | payload             | pilot.control.payload            | control                   | payload control ending since graceful_stop has been set
2019-08-16 03:20:08,771 | DEBUG    | payload             | pilot.control.payload            | control                   | [payload] control thread has finished
2019-08-16 03:20:08,829 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | thread count now at 13 threads
2019-08-16 03:20:08,830 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | enumerate: [<_MainThread(MainThread, started 139778846541568)>, <ExcThread(job, started 139778630268672)>, <ExcThread(validate_post, started 139778234251008)>, <ExcThread(execute_payloads, started 139778200680192)>, <ExcThread(failed_post, started 139778225858304)>, <ExcThread(queue_monitor, started 139777554769664)>, <ExcThread(copytool_in, started 139778192287488)>, <ExcThread(validate_pre, started 139778588305152)>, <ExcThread(job_monitor, started 139777563162368)>, <ExcThread(queue_monitoring, started 139778209072896)>, <ExcThread(copytool_out, started 139778605090560)>, <ExcThread(create_data_payload, started 139778217465600)>, <ExcThread(data, started 139778613483264)>]
2019-08-16 03:20:09,087 | INFO     | validate_post       | pilot.control.payload            | validate_post             | [payload] validate_post thread has finished
2019-08-16 03:20:09,134 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | thread count now at 12 threads
2019-08-16 03:20:09,134 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | enumerate: [<_MainThread(MainThread, started 139778846541568)>, <ExcThread(job, started 139778630268672)>, <ExcThread(execute_payloads, started 139778200680192)>, <ExcThread(failed_post, started 139778225858304)>, <ExcThread(queue_monitor, started 139777554769664)>, <ExcThread(copytool_in, started 139778192287488)>, <ExcThread(validate_pre, started 139778588305152)>, <ExcThread(job_monitor, started 139777563162368)>, <ExcThread(queue_monitoring, started 139778209072896)>, <ExcThread(copytool_out, started 139778605090560)>, <ExcThread(create_data_payload, started 139778217465600)>, <ExcThread(data, started 139778613483264)>]
2019-08-16 03:20:09,391 | DEBUG    | copytool_in         | pilot.control.data               | copytool_in               | [data] copytool_in thread has finished
2019-08-16 03:20:09,434 | DEBUG    | job                 | pilot.control.job                | control                   | job control ending since graceful_stop has been set
2019-08-16 03:20:09,434 | DEBUG    | job                 | pilot.control.job                | control                   | [job] control thread has finished
2019-08-16 03:20:09,455 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | thread count now at 10 threads
2019-08-16 03:20:09,455 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | enumerate: [<_MainThread(MainThread, started 139778846541568)>, <ExcThread(execute_payloads, started 139778200680192)>, <ExcThread(failed_post, started 139778225858304)>, <ExcThread(queue_monitor, started 139777554769664)>, <ExcThread(validate_pre, started 139778588305152)>, <ExcThread(job_monitor, started 139777563162368)>, <ExcThread(queue_monitoring, started 139778209072896)>, <ExcThread(copytool_out, started 139778605090560)>, <ExcThread(create_data_payload, started 139778217465600)>, <ExcThread(data, started 139778613483264)>]
2019-08-16 03:20:09,477 | DEBUG    | data                | pilot.control.data               | control                   | data control ending since graceful_stop has been set
2019-08-16 03:20:09,478 | DEBUG    | data                | pilot.control.data               | control                   | [data] control thread has finished
2019-08-16 03:20:09,528 | DEBUG    | create_data_payload | pilot.control.job                | create_data_payload       | [job] create_data_payload thread has finished
2019-08-16 03:20:09,581 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | thread count now at 8 threads
2019-08-16 03:20:09,582 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | enumerate: [<_MainThread(MainThread, started 139778846541568)>, <ExcThread(execute_payloads, started 139778200680192)>, <ExcThread(failed_post, started 139778225858304)>, <ExcThread(queue_monitor, started 139777554769664)>, <ExcThread(validate_pre, started 139778588305152)>, <ExcThread(job_monitor, started 139777563162368)>, <ExcThread(queue_monitoring, started 139778209072896)>, <ExcThread(copytool_out, started 139778605090560)>]
2019-08-16 03:20:09,582 | INFO     | failed_post         | pilot.control.payload            | failed_post               | [payload] failed_post thread has finished
2019-08-16 03:20:09,657 | DEBUG    | copytool_out        | pilot.control.data               | copytool_out              | [data] copytool_out thread has finished
2019-08-16 03:20:09,682 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | thread count now at 6 threads
2019-08-16 03:20:09,683 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | enumerate: [<_MainThread(MainThread, started 139778846541568)>, <ExcThread(execute_payloads, started 139778200680192)>, <ExcThread(queue_monitor, started 139777554769664)>, <ExcThread(validate_pre, started 139778588305152)>, <ExcThread(job_monitor, started 139777563162368)>, <ExcThread(queue_monitoring, started 139778209072896)>]
2019-08-16 03:20:09,728 | INFO     | execute_payloads    | pilot.control.payload            | execute_payloads          | [payload] execute_payloads thread has finished
2019-08-16 03:20:09,790 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | thread count now at 5 threads
2019-08-16 03:20:09,790 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | enumerate: [<_MainThread(MainThread, started 139778846541568)>, <ExcThread(queue_monitor, started 139777554769664)>, <ExcThread(validate_pre, started 139778588305152)>, <ExcThread(job_monitor, started 139777563162368)>, <ExcThread(queue_monitoring, started 139778209072896)>]
2019-08-16 03:20:09,809 | INFO     | validate_pre        | pilot.control.payload            | validate_pre              | [payload] validate_pre thread has finished
2019-08-16 03:20:09,895 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | thread count now at 4 threads
2019-08-16 03:20:09,895 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | enumerate: [<_MainThread(MainThread, started 139778846541568)>, <ExcThread(queue_monitor, started 139777554769664)>, <ExcThread(job_monitor, started 139777563162368)>, <ExcThread(queue_monitoring, started 139778209072896)>]
2019-08-16 03:20:10,337 | WARNING  | queue_monitor       | pilot.util.common                | should_abort              | job:queue_monitor:received graceful stop - abort after this iteration
2019-08-16 03:20:10,338 | DEBUG    | queue_monitor       | pilot.control.job                | queue_monitor             | [job] queue monitor thread has finished
2019-08-16 03:20:10,422 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | thread count now at 3 threads
2019-08-16 03:20:10,423 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | enumerate: [<_MainThread(MainThread, started 139778846541568)>, <ExcThread(job_monitor, started 139777563162368)>, <ExcThread(queue_monitoring, started 139778209072896)>]
2019-08-16 03:20:11,635 | WARNING  | queue_monitoring    | pilot.util.common                | should_abort              | data:queue_monitoring:received graceful stop - abort after this iteration
2019-08-16 03:20:14,671 | DEBUG    | queue_monitoring    | pilot.control.data               | queue_monitoring          | [data] queue_monitor thread has finished
2019-08-16 03:20:14,714 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | thread count now at 2 threads
2019-08-16 03:20:14,714 | DEBUG    | MainThread          | pilot.workflow.generic           | run                       | enumerate: [<_MainThread(MainThread, started 139778846541568)>, <ExcThread(job_monitor, started 139777563162368)>]
2019-08-16 03:20:46,370 | WARNING  | job_monitor         | pilot.control.job                | check_job_monitor_waiting_time | no jobs in monitored_payloads queue (waited for 62 s)
2019-08-16 03:20:46,370 | DEBUG    | job_monitor         | pilot.control.job                | job_monitor               | [job] job monitor thread has finished
2019-08-16 03:20:46,415 | INFO     | MainThread          | pilot.workflow.generic           | run                       | end of generic workflow (traces error code: 0)
2019-08-16 03:20:46,416 | INFO     | MainThread          | root                             | wrap_up                   | traces error code: 0
2019-08-16 03:20:46,416 | INFO     | MainThread          | root                             | wrap_up                   | pilot has finished
2019-08-16 03:20:46 UTC [wrapper] ==== pilot stdout END ====
2019-08-16 03:20:46 UTC [wrapper] ==== wrapper stdout RESUME ====
2019-08-16 03:20:46 UTC [wrapper] Pilot exit status: 0
2019-08-16 03:20:46 UTC [wrapper] STATUSCODE: 0
2019-08-16 03:20:46 UTC [wrapper] apfmon messages muted
---- find pandaID.out ----
total 44
-rw-rw-r--.  1 320 320 11357 25. Jul 16:38 LICENSE
drwxrwxr-x. 14 320 320   234 15. Aug 06:51 pilot
-rwxrwxr-x.  1 320 320 20463 25. Jul 16:38 pilot.py
-rw-r--r--.  1 320 320     8  7. Aug 13:02 PILOTVERSION
-rw-rw-r--.  1 320 320  2251 25. Jul 16:38 README.md
-rw-rw-r--.  1 320 320   221 25. Jul 16:38 TODO.md

2019-08-16 03:20:46 UTC [wrapper] Test setup, not cleaning
2019-08-16 03:20:46 UTC [wrapper] ==== wrapper stdout END ====
2019-08-16 03:20:46 UTC [wrapper] ==== wrapper stderr END ====
2019-08-16 03:20:46 UTC [wrapper] wrapper wrapperexiting ec=0, duration=82432
2019-08-16 03:20:46 UTC [wrapper] apfmon messages muted
***************diag file************
runtimeenvironments=APPS/HEP/ATLAS-SITE;
Processors=1
WallTime=82432.01s
KernelTime=3395.12s
UserTime=82003.52s
CPUUsage=103%
MaxResidentMemory=1943656kB
AverageResidentMemory=0kB
AverageTotalMemory=0kB
AverageUnsharedMemory=0kB
AverageUnsharedStack=0kB
AverageSharedMemory=0kB
PageSize=4096B
MajorPageFaults=1382871
MinorPageFaults=41873589
Swaps=0
ForcedSwitches=2718005
WaitSwitches=42348705
Inputs=330387488
Outputs=372848
SocketReceived=0
SocketSent=0
Signals=0

nodename=maeax@HPi7COS7
exitcode=0
******************************WorkDir***********************
total 218360
drwxrwx--x. 6 root root      4096 16. Aug 05:20 .
drwxrwx--x. 7 root root        69 15. Aug 06:27 ..
-rw-------. 1 root root   7409162 15. Aug 06:28 agis_ddmendpoints.json
-rw-------. 1 root root   4448398 15. Aug 06:28 agis_schedconf.cvmfs.json
drwx------. 2 root root         6 15. Aug 06:26 .alrb
drwxr-xr-x. 3 root root        17 15. Aug 06:26 APPS
-rw-------. 1 root root       533 15. Aug 06:27 .asetup
-rw-------. 1 root root       826 15. Aug 06:30 .asetup.save
-rw-r--r--. 1 root root         0 15. Aug 06:26 boinc_lockfile
-rw-r--r--. 1 root root      8192 16. Aug 05:20 boinc_mmap_file
-rw-r--r--. 1 root root       537 16. Aug 05:17 boinc_task_state.xml
-rw-------. 1 root root        59 15. Aug 06:27 .directory
-rw-r--r--. 1 root root 206760239 15. Aug 06:26 EVNT.18605775._000493.pool.root.1
-rw-------. 1 root root     71245 16. Aug 05:19 heartbeat.json
-rw-r--r--. 1 root root      5868 15. Aug 06:26 init_data.xml
-rw-r--r--. 1 root root    261622 15. Aug 06:26 input.tar.gz
-rw-r--r--. 1 root root       112 15. Aug 06:26 job.xml
-rw-------. 1 root root   3551201 16. Aug 05:20 log.18722398._024922.job.log.1
-rw-------. 1 root root    419100 16. Aug 05:19 log.18722398._024922.job.log.tgz.1
-rw-------. 1 root root       503 16. Aug 05:20 nAQNDmy2fHvnShfckohDCDFpABFKDmABFKDmjdFaDmABFKDmfVc7Hn.diag
-rw-------. 1 root root       391 16. Aug 05:20 output.list
-rw-------. 1 4871 1028      2957 13. Aug 09:05 pandaJobData.out
drwxr-xr-x. 3  320  320       192 15. Aug 06:51 pilot2
-rw-r--r--. 1 root root    253018 13. Aug 08:21 pilot2.tar.gz
-rw-------. 1 root root     11033 16. Aug 05:20 pilotlog.txt
-rw-r--r--. 1 root root      4468 13. Aug 09:05 queuedata.json
-rw-r--r--. 1 root root       815 15. Aug 06:26 RTE.tar.gz
-rwxr-xr-x. 1 root root      8779 15. Aug 06:26 run_atlas
-rwx------. 1 4871 1028     12641 13. Aug 09:05 runpilot2-wrapper.sh
-rw-r--r--. 1 root root       692 16. Aug 05:20 runtime_log
-rw-r--r--. 1 root root      6667 16. Aug 05:20 runtime_log.err
drwxrwx--x. 2 root root       131 16. Aug 05:20 shared
-rw-r--r--. 1 root root      8659 15. Aug 06:26 start_atlas.sh
-rw-r--r--. 1 root root     18459 16. Aug 05:20 stderr.txt
-rw-r--r--. 1 root root       107 15. Aug 06:26 wrapper_26015_x86_64-pc-linux-gnu
-rw-r--r--. 1 root root        28 16. Aug 05:20 wrapper_checkpoint.txt

running start_atlas return value is 0
Parent exit 0
child process exit 0
05:20:47 (13549): run_atlas exited; CPU time 82004.187294
05:20:47 (13549): called boinc_finish(0)

</stderr_txt>
]]>


©2024 CERN