Message boards : ATLAS Application : ATLAS native 0.98 and docker image

David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 154
Credit: 1,352,539
RAC: 84
Message 6910 - Posted: 19 Dec 2019, 13:51:26 UTC

Version 0.98 has some very minor bug fixes, and I've been using the work units (WUs) here to test docker images for running ATLAS tasks. If anyone would like to give it a go, see here: https://hub.docker.com/r/davidgcameron/boinc-atlas

This is not really for the home PC, but for server farms running docker where it's not possible or desirable to install and configure CVMFS or the BOINC client. It provides a one-line command to start running ATLAS without any dependencies required.

Best wishes to everyone for the festive season!
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,451,312
RAC: 1,368
Message 6911 - Posted: 22 Dec 2019, 4:22:03 UTC

maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,451,312
RAC: 1,368
Message 6913 - Posted: 22 Dec 2019, 9:32:25 UTC

Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1147
Credit: 754,546
RAC: 10
Message 6914 - Posted: 22 Dec 2019, 9:51:49 UTC

I ran one native task with BOINC Manager, without installing anything extra.
In the result https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2849436, I found this non-destructive error:

tar: Pattern matching characters used in file names
tar: Use --wildcards to enable pattern matching, or --no-wildcards to suppress this warning
tar: */pandaJobData.out: Not found in archive
tar: Exiting with failure status due to previous errors
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,451,312
RAC: 1,368
Message 6915 - Posted: 22 Dec 2019, 10:42:58 UTC - in response to Message 6913.  
Last modified: 22 Dec 2019, 10:57:27 UTC

Hi Crystal,
two are more than one ;-)

Your task was running with Singularity.
Have you run: $ sudo systemctl start docker

I just saw this:
11:32:59 Py:ISF INFO Overriding run number to be: 284500
11:32:59 Py:JobProperty :: INFO The JobProperty SimBarcodeOffset is blocked
11:32:59 Py:Athena INFO including file "ISF_Config/ISF_ConfigJobInclude.py"
11:32:59 Py:JobProperty :: INFO The JobProperty InputFormat is blocked
11:34:01 ./runwrapper.EVNTtoHITS.sh: line 10: 17021 Bus error (core dumped) athena.py --preloadlib=/cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasExternals/21.0.15/InstallArea/x86_64-slc6-gcc49-opt/lib/libintlc.so.5:/cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasExternals/21.0.15/InstallArea/x86_64-slc6-gcc49-opt/lib/libimf.so runargs.EVNTtoHITS.py SimuJobTransforms/skeleton.EVGENtoHIT_ISF.py
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,451,312
RAC: 1,368
Message 6916 - Posted: 22 Dec 2019, 12:02:58 UTC - in response to Message 6915.  

I have now had a Theory native task run successfully. Docker is started and
no problem is shown for this task.
I suspect the ATLAS failure is related to GCC49 and will check it.
https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=4064
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1147
Credit: 754,546
RAC: 10
Message 6917 - Posted: 22 Dec 2019, 12:06:43 UTC - in response to Message 6915.  
Last modified: 22 Dec 2019, 12:15:53 UTC

maeax wrote:
Hi Crystal,
two are more than one ;-)

Your task was running with Singularity.
Have you run: $ sudo systemctl start docker

I don't have docker installed. I think this is only for CentOS at the moment. I'm running Ubuntu 18.10.
I was just checking that the minor bug fixes did not cause major problems ;)

If I were to test this, I would first have to install docker with "sudo snap install docker", I suppose.
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,451,312
RAC: 1,368
Message 6918 - Posted: 24 Dec 2019, 7:31:18 UTC

These are the last lines from EVNTtoHITS.log:
08:03:00 simulate : ON ON ON ON ON ON -- -- -- -- -- ON ON ON ON ON ON ON ON ON ON ON -- ON ON
08:03:00 simulateLVL1 : -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
08:03:00 writeBS : -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
08:03:00 writeRDOPool : -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
08:03:00 writeRIOPool : -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
08:03:00 Py:ISF INFO Overriding run number to be: 284500
08:03:01 Py:JobProperty :: INFO The JobProperty SimBarcodeOffset is blocked
08:03:01 Py:Athena INFO including file "ISF_Config/ISF_ConfigJobInclude.py"
08:03:01 Py:JobProperty :: INFO The JobProperty InputFormat is blocked
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1147
Credit: 754,546
RAC: 10
Message 6919 - Posted: 25 Dec 2019, 12:00:43 UTC

I've seen that the result file was uploaded:
wo 25 dec 2019 12:33:34 CET: Could not find HITS file from job description
wo 25 dec 2019 12:33:34 CET: *** Contents of shared directory: ***
wo 25 dec 2019 12:33:34 CET: total 357924
wo 25 dec 2019 12:33:34 CET: -rw-r--r-- 1 boinc boinc 8513 dec 25 11:56 start_atlas.sh
wo 25 dec 2019 12:33:34 CET: -rw-r--r-- 1 boinc boinc 815 dec 25 11:56 RTE.tar.gz
wo 25 dec 2019 12:33:34 CET: -rw-r--r-- 1 boinc boinc 275418 dec 25 11:56 input.tar.gz
wo 25 dec 2019 12:33:34 CET: -rw-r--r-- 1 boinc boinc 365251149 dec 25 11:56 ATLAS.root_0
wo 25 dec 2019 12:33:35 CET: -rw------- 1 boinc boinc 962560 dec 25 12:33 result.tar.gz
12:33:35 (3109): run_atlas exited; CPU time 5017.473930
and the HITS file is in the listing of the results directory:

wo 25 dec 2019 12:33:34 CET: -rw------- 1 boinc boinc 9100669 dec 25 12:32 HITS.000649-2078526-28361._078090.pool.root.1
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,451,312
RAC: 1,368
Message 6920 - Posted: 25 Dec 2019, 17:35:37 UTC

Docker image downloaded with:
docker pull davidgcameron/boinc-atlas:latest
Next step tomorrow....
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,451,312
RAC: 1,368
Message 6921 - Posted: 26 Dec 2019, 6:55:22 UTC - in response to Message 6920.  

maeax wrote:
Docker image downloaded with:
docker pull davidgcameron/boinc-atlas:latest
Next step tomorrow....

ATLAS - DOCKER :-)) https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1962131
Thanks to David for this experience.
My brain is now empty... ;-)
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 154
Credit: 1,352,539
RAC: 84
Message 6924 - Posted: 7 Jan 2020, 12:27:51 UTC - in response to Message 6921.  

Do 26. Dez 07:50:42 CET 2019:     "exeErrorDiag": "EVNTtoHITS got a SIGBUS signal (exit code 135)",


Unfortunately your tasks fail due to this error. I have seen this before when not enough shared memory was available inside the docker container, or maybe when other tasks on the same host were using too much memory.
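
As a rough aid for anyone debugging this: exit code 135 follows the usual shell convention of 128 plus the signal number, and SIGBUS is signal 7 on Linux x86. The Python sketch below decodes such an exit code and reports the free space on a shared-memory mount. The helper names and the `/dev/shm` default path are assumptions made for illustration; none of this is part of the ATLAS wrapper.

```python
import os
import shutil
import signal


def signal_from_exit_code(code):
    """Return the signal number encoded in a shell-style exit code, or None."""
    # Shells report "terminated by signal N" as exit code 128 + N.
    if 128 < code < 256:
        return code - 128
    return None


def signal_name(signum):
    """Best-effort signal name; signal numbering differs between platforms."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return "unknown"


def free_mib(path="/dev/shm"):
    """Free space in MiB on the filesystem holding `path`.

    The default path is an assumption: /dev/shm is the usual
    shared-memory mount on Linux and inside docker containers.
    """
    return shutil.disk_usage(path).free / (1024 * 1024)


if __name__ == "__main__":
    signum = signal_from_exit_code(135)
    print(f"exit code 135 -> signal {signum} ({signal_name(signum)})")
    if os.path.isdir("/dev/shm"):
        print(f"free shared memory: {free_mib():.0f} MiB")
    else:
        print("no /dev/shm on this system")
```

Running the script inside the container shows what the task actually has available. Docker gives a container a 64 MB `/dev/shm` by default unless `--shm-size` is passed to `docker run`; whether that limit is what caused the failure here is not confirmed.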
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 154
Credit: 1,352,539
RAC: 84
Message 6925 - Posted: 7 Jan 2020, 12:59:09 UTC - in response to Message 6914.  

Crystal Pellet wrote:
I ran one native task with BOINC Manager, without installing anything extra.
In the result https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2849436, I found this non-destructive error:

tar: Pattern matching characters used in file names
tar: Use --wildcards to enable pattern matching, or --no-wildcards to suppress this warning
tar: */pandaJobData.out: Not found in archive
tar: Exiting with failure status due to previous errors


In fact this was a destructive error and it caused the task to fail even though the HITS file was produced. It looks like a difference in the way tar handles wildcards on your system compared to what I tested with. I will fix it and make a new version.
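
The warning itself names GNU tar's `--wildcards` option as one way out. Another way to avoid depending on how a given tar build treats wildcards is to do the pattern matching outside tar. The Python sketch below only illustrates that idea and is not the fix that went into the new version; the function names, the demo archive and the `job_0001` directory are made up.

```python
import fnmatch
import io
import os
import tarfile
import tempfile


def find_members(archive, pattern):
    """Names of archive members matching a shell-style pattern."""
    with tarfile.open(archive) as tar:
        # fnmatchcase's "*" also matches "/" so "*/name" finds nested files.
        return [n for n in tar.getnames() if fnmatch.fnmatchcase(n, pattern)]


def read_member(archive, name):
    """Return the bytes of one regular file stored in the archive."""
    with tarfile.open(archive) as tar:
        fileobj = tar.extractfile(name)
        if fileobj is None:
            raise ValueError(f"{name} is not a regular file")
        return fileobj.read()


def build_demo_archive(path):
    """Write a tiny archive with made-up contents, for demonstration only."""
    payload = b"demo job data\n"
    with tarfile.open(path, "w:gz") as tar:
        info = tarfile.TarInfo("job_0001/pandaJobData.out")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "demo.tar.gz")
        build_demo_archive(archive)
        for name in find_members(archive, "*/pandaJobData.out"):
            print(name, read_member(archive, name))
```

Because the matching happens in Python, the result does not depend on the tar version or its default wildcard settings on the host.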
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,451,312
RAC: 1,368
Message 6927 - Posted: 7 Jan 2020, 15:04:06 UTC
Last modified: 7 Jan 2020, 15:14:59 UTC

I have started a new task:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2856618
The task had a confirm error and a SIGBUS error.
I will look again tomorrow.
Magic Quantum Mechanic

Joined: 8 Apr 15
Posts: 654
Credit: 10,929,747
RAC: 1,604
Message 6929 - Posted: 7 Jan 2020, 23:40:02 UTC

Sure wish I could test these on a Windows OS here or over at Atlas Test.
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,451,312
RAC: 1,368
Message 6930 - Posted: 8 Jan 2020, 7:42:27 UTC - in response to Message 6929.  

I have now set the RAM to 12.5 GB, but without success.
I will test 0.99 today on an HP i7 without Docker,
but native ATLAS tasks are running on it at the moment.
maeax

Joined: 22 Apr 16
Posts: 601
Credit: 1,451,312
RAC: 1,368
Message 6937 - Posted: 9 Jan 2020, 20:01:09 UTC - in response to Message 6924.  

David Cameron wrote:
Do 26. Dez 07:50:42 CET 2019:     "exeErrorDiag": "EVNTtoHITS got a SIGBUS signal (exit code 135)",


Unfortunately your tasks fail due to this error. I have seen this before when not enough shared memory was available inside the docker container, or maybe when other tasks on the same host were using too much memory.

I took a closer look. The last two 0.98 tasks had no SIGBUS error and showed these lines:

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2849445
Do 26. Dez 11:20:30 CET 2019: 2019-12-26 10:19:58,490 | INFO | queue_monitor | pilot.user.atlas.utilities | get_memory_monitor_info | extracted standard memory fields from prmon json
11:20:30 (16877): run_atlas exited; CPU time 234.618104
11:20:30 (16877): called boinc_finish(0)

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2849442
Do 26. Dez 10:52:01 CET 2019: 2019-12-26 09:51:21,675 | INFO | queue_monitor | pilot.api.analytics | get_fitted_data | current memory leak: 61.49 B/s (using 9 data points, chi2=10364)
10:52:01 (802): run_atlas exited; CPU time 234.746424
10:52:01 (802): called boinc_finish(0)
