ATLAS native 0.98 and docker image
Author | Message |
---|---|
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
Version 0.98 has some very minor bug fixes, and I've been using WUs here to test Docker images for running ATLAS tasks. If anyone would like to give it a go, see here: https://hub.docker.com/r/davidgcameron/boinc-atlas. This is not really meant for home PCs, but for server farms running Docker where it's not possible or desirable to install and configure CVMFS or the BOINC client. It provides a one-line command to start running ATLAS without any other dependencies. Best wishes to everyone for the festive season! |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 3 |
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2849446 ended after 258 s of CPU usage. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 3 |
This was after installing Docker following: https://docs.docker.com/install/linux/docker-ce/centos/ |
Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 3 |
I ran one native task with BOINC Manager without installing anything extra. In the result https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2849436, I found this non-destructive error: tar: Pattern matching characters used in file names tar: Use --wildcards to enable pattern matching, or --no-wildcards to suppress this warning tar: */pandaJobData.out: Not found in archive tar: Exiting with failure status due to previous errors |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 3 |
Hi Crystal, two are more than one ;-) Your task was running with Singularity. Have you run: $ sudo systemctl start docker? I saw this now: 11:32:59 Py:ISF INFO Overriding run number to be: 284500 11:32:59 Py:JobProperty :: INFO The JobProperty SimBarcodeOffset is blocked 11:32:59 Py:Athena INFO including file "ISF_Config/ISF_ConfigJobInclude.py" 11:32:59 Py:JobProperty :: INFO The JobProperty InputFormat is blocked 11:34:01 ./runwrapper.EVNTtoHITS.sh: line 10: 17021 Bus error (core dumped) athena.py --preloadlib=/cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasExternals/21.0.15/InstallArea/x86_64-slc6-gcc49-opt/lib/libintlc.so.5:/cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasExternals/21.0.15/InstallArea/x86_64-slc6-gcc49-opt/lib/libimf.so runargs.EVNTtoHITS.py SimuJobTransforms/skeleton.EVGENtoHIT_ISF.py |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 3 |
I have now had a Theory-native task run successfully. Docker is started and no problem was shown for this task. I think it is GCC49; I will check it. https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=4064 |
Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 3 |
Hi Crystal, I don't have Docker installed. I think this is only for CentOS at the moment. I'm running Ubuntu 18.10. I was just testing whether the minor bug fixes caused any major problems ;) If I'm to test this, I suppose I should first install Docker with "sudo snap install docker". |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 3 |
These are the last lines from EVNTtoHITS.log: 08:03:00 simulate : ON ON ON ON ON ON -- -- -- -- -- ON ON ON ON ON ON ON ON ON ON ON -- ON ON 08:03:00 simulateLVL1 : -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 08:03:00 writeBS : -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 08:03:00 writeRDOPool : -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 08:03:00 writeRIOPool : -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 08:03:00 Py:ISF INFO Overriding run number to be: 284500 08:03:01 Py:JobProperty :: INFO The JobProperty SimBarcodeOffset is blocked 08:03:01 Py:Athena INFO including file "ISF_Config/ISF_ConfigJobInclude.py" 08:03:01 Py:JobProperty :: INFO The JobProperty InputFormat is blocked |
Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 3 |
I've seen that the result file is uploaded (Wed 25 Dec 2019 12:33:34 CET: Could not find HITS file from job description), and it's in the listing of the results directory: Wed 25 Dec 2019 12:33:34 CET: -rw------- 1 boinc boinc 9100669 Dec 25 12:32 HITS.000649-2078526-28361._078090.pool.root.1 |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 3 |
Downloaded the Docker image with "docker pull davidgcameron/boinc-atlas:latest". Next step tomorrow... |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 3 |
Docker Image download with: ATLAS - DOCKER :-)) https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1962131 Thanks to David for this experience. My brain is now empty... ;-) |
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
Thu 26 Dec 07:50:42 CET 2019: "exeErrorDiag": "EVNTtoHITS got a SIGBUS signal (exit code 135)", Unfortunately, your tasks fail due to this error. I have seen this before when not enough shared memory was available inside the Docker container, or possibly when other tasks on the same host were using too much memory. |
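The exit code in that diagnostic appears to follow the usual shell convention: a process killed by signal N is reported as exit status 128 + N, so 135 is signal 7, which is SIGBUS on Linux (matching the "Bus error (core dumped)" line from runwrapper.EVNTtoHITS.sh). A minimal Python sketch of that decoding; the function name `describe_exit_code` is hypothetical and not part of the ATLAS pilot:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a shell-style exit status (hypothetical helper).

    Shells report a process killed by signal N as 128 + N. Signal numbers
    are platform-specific: SIGBUS is 7 on Linux but 10 on macOS.
    """
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return f"unknown signal {code - 128}"
    return f"exit status {code}"


print(describe_exit_code(135))  # on Linux: SIGBUS
```

The same helper turns 137 into SIGKILL, which is what an out-of-memory kill usually looks like, so it can help tell the two failure modes David mentions apart.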
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
I ran one native task with BOINC Manager without installing anything extra. In fact, this was a destructive error: it caused the task to fail even though the HITS file was produced. It looks like a difference in the way tar handles wildcards on your system compared to what I tested with. I will fix it and make a new version. |
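The tar warning itself says that patterns such as `*/pandaJobData.out` are only treated as wildcards when `--wildcards` is given. One way to make the lookup independent of the local tar's defaults is to do the matching in code. A minimal Python sketch, assuming the member sits one or more directories deep in the archive; the function name `read_matching` is hypothetical, and the actual fix in the next ATLAS version may be done differently:

```python
import fnmatch
import tarfile


def read_matching(archive_path: str, pattern: str) -> dict[str, bytes]:
    """Return the contents of regular-file members matching a shell-style pattern.

    Matching is done with fnmatchcase, so the result does not depend on
    whether the system tar defaults to --wildcards or --no-wildcards.
    Note that '*' here also matches '/', like GNU tar's default wildcards.
    """
    found: dict[str, bytes] = {}
    with tarfile.open(archive_path) as tar:  # mode "r" detects compression
        for member in tar.getmembers():
            if member.isfile() and fnmatch.fnmatchcase(member.name, pattern):
                fileobj = tar.extractfile(member)
                if fileobj is not None:
                    found[member.name] = fileobj.read()
    return found
```

Usage would be along the lines of `read_matching("result.tar.gz", "*/pandaJobData.out")`, where the archive name is only an example; an empty dict plays the role of tar's "Not found in archive".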
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 3 |
I have started a new task: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2856618 The task had a confirm error and a SIGBUS error. I will look again tomorrow. |
Joined: 8 Apr 15 Posts: 781 Credit: 12,324,905 RAC: 1,836 |
Sure wish I could test these on Windows, here or over at Atlas Test. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 3 |
I have now set the RAM to 12.5 GB, but without success. I will test 0.99 today on an HP i7 without Docker, but native ATLAS tasks are running there at the moment. |
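Total RAM and the shared memory visible inside a container are not the same thing: Docker mounts a separate /dev/shm in each container, 64 MB by default, which can be enlarged with `docker run --shm-size`. Whether this is the cause here is only David's hypothesis. A minimal Python sketch for checking what the container actually sees; the function name `check_shm` and the 1024 MiB threshold are illustrative assumptions, not ATLAS requirements:

```python
import shutil

MIB = 1024 * 1024


def check_shm(path: str = "/dev/shm", min_free_mib: int = 1024) -> tuple[bool, str]:
    """Return (enough, summary) for the filesystem mounted at `path`.

    min_free_mib is a hypothetical threshold chosen for illustration only.
    """
    usage = shutil.disk_usage(path)
    total, free = usage.total // MIB, usage.free // MIB
    summary = f"{path}: total {total} MiB, free {free} MiB"
    return free >= min_free_mib, summary
```

Run inside the container, `check_shm()` reporting a total of about 64 MiB would suggest the default shared-memory size is still in effect, regardless of how much RAM the host has.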
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 3 |
Thu 26 Dec 07:50:42 CET 2019: "exeErrorDiag": "EVNTtoHITS got a SIGBUS signal (exit code 135)", I took a deeper look. The last two 0.98 tasks had no SIGBUS error and showed these lines: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2849445 Thu 26 Dec 11:20:30 CET 2019: 2019-12-26 10:19:58,490 | INFO | queue_monitor | pilot.user.atlas.utilities | get_memory_monitor_info | extracted standard memory fields from prmon json 11:20:30 (16877): run_atlas exited; CPU time 234.618104 11:20:30 (16877): called boinc_finish(0) https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2849442 Thu 26 Dec 10:52:01 CET 2019: 2019-12-26 09:51:21,675 | INFO | queue_monitor | pilot.api.analytics | get_fitted_data | current memory leak: 61.49 B/s (using 9 data points, chi2=10364) 10:52:01 (802): run_atlas exited; CPU time 234.746424 10:52:01 (802): called boinc_finish(0) |
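Comparing several results by hand gets tedious. A minimal Python sketch that pulls the CPU time, the `boinc_finish` status and any `exeErrorDiag` message out of a task's stderr text; the regular expressions are based only on the lines quoted in this thread, so the real output format may vary, and `summarize_stderr` is a hypothetical name:

```python
import re

# Patterns derived from the log lines quoted above (assumed formats).
CPU_RE = re.compile(r"run_atlas exited; CPU time ([0-9.]+)")
FINISH_RE = re.compile(r"called boinc_finish\((-?\d+)\)")
DIAG_RE = re.compile(r'"exeErrorDiag": "([^"]*)"')


def summarize_stderr(text: str) -> dict:
    """Return the first match of each field, or None where it is absent."""
    summary = {"cpu_time": None, "finish_status": None, "error_diag": None}
    if m := CPU_RE.search(text):
        summary["cpu_time"] = float(m.group(1))
    if m := FINISH_RE.search(text):
        summary["finish_status"] = int(m.group(1))
    if m := DIAG_RE.search(text):
        summary["error_diag"] = m.group(1)
    return summary
```

For the two successful 0.98 tasks above, this would report CPU times of about 235 s with `finish_status` 0 and no error diagnostic, while the failing tasks would show the SIGBUS message in `error_diag`.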
©2024 CERN