Message boards : ATLAS Application : Native app using Singularity from CVMFS
Author | Message |
---|---|
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
This task is running with CentOS 7.4 https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2799754 06:26:30 (13549): wrapper (7.7.26015): starting 06:26:30 (13549): wrapper: running run_atlas (--nthreads 1) singularity image is /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 sys.argv = ['run_atlas', '--nthreads', '1'] THREADS=1 Checking for CVMFS CVMFS is installed OS:CentOS Linux release 7.4.1708 (Core) This is not SLC6, need to run with Singularity.... Checking Singularity... Singularity is installed, version 2.6.1-dist Testing the function of Singularity... Checking singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname Singularity Works... copy /root/slots/4/shared/ATLAS.root_0 copy /root/slots/4/shared/input.tar.gz copy /root/slots/4/shared/RTE.tar.gz copy /root/slots/4/shared/start_atlas.sh start atlas job with PandaID=4443290924 cmd = singularity exec --pwd /root/slots/4 -B /cvmfs,/root /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 sh start_atlas.sh > runtime_log 2> runtime_log.err |
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
"Singularity is installed, version 2.2.99" This is a really ancient version of Singularity, so it would be better if you updated it. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
"Singularity is installed, version 2.2.99" I have uninstalled Singularity without a reboot, because two Theory native tasks from production are running. It is running now (SL 7.6) with Singularity coming from CVMFS :-)): https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2800426 10:45:48 (25637): wrapper (7.7.26015): starting 10:45:48 (25637): wrapper: running run_atlas (--nthreads 1) singularity image is /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 sys.argv = ['run_atlas', '--nthreads', '1'] THREADS=1 Checking for CVMFS CVMFS is installed OS:Scientific Linux release 7.6 (Nitrogen) This is not SLC6, need to run with Singularity.... Checking Singularity... Singularity seems to be installed but not working: %s Will use version from CVMFS Testing the function of Singularity... Checking singularity with cmd:/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname Singularity Works... copy /root/Downloads/BOINC/slots/2/shared/ATLAS.root_0 copy /root/Downloads/BOINC/slots/2/shared/input.tar.gz copy /root/Downloads/BOINC/slots/2/shared/RTE.tar.gz copy /root/Downloads/BOINC/slots/2/shared/start_atlas.sh start atlas job with PandaID=4445586893 cmd = /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec --pwd /root/Downloads/BOINC/slots/2 -B /cvmfs,/root /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 sh start_atlas.sh > runtime_log 2> runtime_log.err |
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 80 |
Finally I got one task working on Ubuntu 18.10 by uninstalling singularity 2.5.2-2. No newer package is available, so I am now using Singularity from CVMFS. "What if singularity is already installed. One task tested and that failed: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2798232" "Are you sure that your local singularity version is working?" It has been running for 75 minutes on 2 threads and I see 'only' 3 events processed so far. How many events are in 1 task? Edit: At the same time I am running a single-core Theory native task from LHC production. |
Joined: 28 Jul 16 Posts: 485 Credit: 394,839 RAC: 0 |
"... 3 events processed so far. How many events are in 1 task?" 200. Simple monitoring can be set up using the method here: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5102&postid=39606 |
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 80 |
I remembered it was 200, but I had doubts because only 3 were done so far (meanwhile 13). "... 3 events processed so far. How many events are in 1 task?" I'm monitoring the progress in 2 terminals with tail -F /var/lib/boinc-client/slots/0/PanDA_Pilot-4445903070/athenaMP-workers-EVNTtoHITS-sim/worker_0/AthenaMP.log tail -F /var/lib/boinc-client/slots/0/PanDA_Pilot-4445903070/athenaMP-workers-EVNTtoHITS-sim/worker_1/AthenaMP.log Disadvantage: you have to adjust the path (Pilot-task#) with every new task. |
Joined: 28 Jul 16 Posts: 485 Credit: 394,839 RAC: 0 |
"Disadvantage: you have to adjust the path (Pilot-task#) with every new task." That's what my one-liner does automatically, using the find command. In addition, it works for single-core tasks as well as for multicore tasks. They use different log names: single-core -> log.EVNTtoHITS, multicore -> AthenaMP.log. BTW: If you use "find /var/lib/boinc-client/slots ..." instead of "find . ...", the one-liner works from every directory. |
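For readers who prefer a script, the same idea can be sketched in Python. This is not the one-liner from the linked post (which is not reproduced in this thread), only a minimal illustration of searching the slots directory for the two log names mentioned above. The function name `find_atlas_logs` is made up for this example, and the default path is simply the one used in this thread; adjust it to your own BOINC data directory.

```python
import os

# Log names mentioned in this thread: single-core vs. multicore tasks.
LOG_NAMES = ("log.EVNTtoHITS", "AthenaMP.log")


def find_atlas_logs(slots_dir="/var/lib/boinc-client/slots"):
    """Return the sorted paths of all ATLAS event logs found below slots_dir.

    Hypothetical helper for illustration; the default path is an assumption
    taken from the posts above and may differ on other installations.
    """
    found = []
    for root, _dirs, files in os.walk(slots_dir):
        for name in files:
            if name in LOG_NAMES:
                found.append(os.path.join(root, name))
    return sorted(found)
```

Calling `find_atlas_logs()` returns the current log paths regardless of the PanDA_Pilot number, so the result can be handed to `tail -F` without editing the path for every new task.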
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
"Thanks for the info, I will add this information to the original post. But which version of CentOS 7 do you have? As far as I know CentOS 7.6 doesn't require updating the kernel arguments or reboot. Setting a number for max user namespaces with sysctl as I showed in the original post should be enough." Both OS:CentOS Linux release 7.4.1708 (Core) and OS:Scientific Linux release 7.6 (Nitrogen) are working with namespaces and have finished a task successfully. This task is running on a computer that had a namespace error yesterday ("Failed to create mount namespace: Operation not permitted"): https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2800952 |
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 80 |
The machine running the native ATLAS task https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2800501 is still busy with it after 25 hours of wall clock time on 2 threads. It's progressing, but slowly: 156 events are done now. I'm wondering whether this task type is different, because in the meantime I also ran a VBox ATLAS task from LHC production on 4 threads on the same machine, and that task finished after 7 hours and 38 minutes elapsed time. https://lhcathome.cern.ch/lhcathome/result.php?resultid=240389930 |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
This was seen in three tasks today: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2801419 File "/root/slots/5/pilot2/pilot/common/exception.py", line 413, in run self._Thread__target(**self._Thread__kwargs) File "/root/slots/5/pilot2/pilot/control/job.py", line 1351, in retrieve if has_job_completed(queues): File "/root/slots/5/pilot2/pilot/control/job.py", line 1431, in has_job_completed cleanup(job) File "/root/slots/5/pilot2/pilot/util/processes.py", line 565, in cleanup job.collect_zombies(tn=10) File "/root/slots/5/pilot2/pilot/info/jobdata.py", line 776, in collect_zombies _id, rc = os.waitpid(x, os.WNOHANG) exception caught by thread run() function: (<type 'exceptions.TypeError'>, TypeError('an integer is required',), <traceback object at 0x27181b8>) Traceback (most recent call last): File "/root/slots/5/pilot2/pilot/common/exception.py", line 413, in run self._Thread__target(**self._Thread__kwargs) File "/root/slots/5/pilot2/pilot/control/job.py", line 1351, in retrieve if has_job_completed(queues): File "/root/slots/5/pilot2/pilot/control/job.py", line 1431, in has_job_completed cleanup(job) File "/root/slots/5/pilot2/pilot/util/processes.py", line 565, in cleanup job.collect_zombies(tn=10) File "/root/slots/5/pilot2/pilot/info/jobdata.py", line 776, in collect_zombies _id, rc = os.waitpid(x, os.WNOHANG) TypeError: an integer is required Edit: There are 9 slots active (8 for sixtrack and one for lhc-dev ATLAS). |
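The TypeError in the traceback means `os.waitpid` was handed something that is not an integer as the process ID. The pilot source is not shown in this thread, so the following is only a guess at what a defensive version of such a cleanup loop could look like, written for Python 3 (the traceback itself is from Python 2). It is not the pilot's code and not the upstream fix. The function name is borrowed from the traceback, but the signature and body are assumptions.

```python
import os


def collect_zombies(pids):
    """Reap finished child processes without blocking.

    Returns the list of PIDs that were actually reaped. Entries that are not
    positive integers (for example None or a string) are skipped instead of
    being passed to os.waitpid(), which would raise TypeError.
    Illustrative sketch only; not the pilot's implementation.
    """
    reaped = []
    for pid in pids:
        if isinstance(pid, bool) or not isinstance(pid, int) or pid <= 0:
            continue  # not a usable PID
        try:
            done_pid, _status = os.waitpid(pid, os.WNOHANG)
        except OSError:
            continue  # not a child of this process, or already reaped
        if done_pid != 0:  # 0 means the child is still running
            reaped.append(done_pid)
    return reaped
```

With this kind of guard, a stray `None` in the PID list is ignored rather than aborting the whole cleanup thread. Whether that is the right behaviour for the pilot is for the upstream developers to decide.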
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
Thanks for reporting, I have passed this to the upstream developers to take a look. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
I have this task finishing today: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2802027 Pilot.log shows this info: 2019-08-21 07:07:06,658 | INFO | retrieve | root | retrieve | pilot has finished for previous job - re-establishing logging No handlers could be found for logger "pilot.util.mpi" 2019-08-21 07:07:07,165 | DEBUG | retrieve | pilot.control.job | retrieve | getjob_requests=1 |
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 80 |
I see that a new version of ATLAS native is available, without announcement: Linux running on an AMD x86_64 or Intel EM64T CPU, 0.73 (native_mt), 25 Sep 2019, 19:55:33 UTC. I wanted to try that version in my small Linux VM, but I am getting: Sun 29 Sep 2019 10:58:46 AM CEST | lhcathome-dev | Message from server: ATLAS Simulation needs 3933.64MB more disk space. You currently have 11325.14 MB available and it needs 15258.79 MB. The peak disk usage for a 4-core task seems to be 1 GB at most, so why is the demand set so high? Did you just take the setting from the ATLAS VBOX version? |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
Crystal, I have seen the same, but did not report it. That much RAM is not available. |
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
"I see a new version for ATLAS native available." This was a test of version 2.70, which was released shortly afterwards on the production project.
I've set the disk requirement down to 8GB (same as the production project). I don't remember why it was set so high, but maybe it was from a previous test which used more disk space. The setting is per app and not per app version, so it's not possible to have different numbers for the native and vbox apps. |
Joined: 13 Feb 15 Posts: 1188 Credit: 878,593 RAC: 80 |
"The peak disk usage for a 4 core task seems to be max 1 GB, so why is the demand set so high? Did you just take the setting from the ATLAS VBOX version?" "I've set the disk requirement down to 8GB (same as the production project). I don't remember why it was set so high but maybe it was from a previous test which used more disk space. The setting is per-app and not app version, so it's not possible to have different numbers for native and vbox apps." That's a very bad solution. In my opinion these are totally different apps, at least for the BOINC user, though maybe not for the application inside the VBox VM versus a native task. The 16 GB was set because the new CentOS7 vdi is almost 1 GB bigger than the one in production, and 8 GB was too small when a user had to suspend a task and save the VM to disk. If you are not able to differentiate the ATLAS applications, we have to compromise. What about 10 GB for the disk requirement? Btw: The requirement has not been reduced yet: Fri 04 Oct 2019 06:54:01 PM CEST | lhcathome-dev | Message from server: ATLAS Simulation needs 3016.77MB more disk space. You currently have 12242.02 MB available and it needs 15258.79 MB. |
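The 15258.79 MB in the server message matches 16 × 10^9 bytes divided by 1024 × 1024, which suggests the limit is stored in bytes and displayed in units of 1024² bytes. That is inferred from the numbers in this thread, not from any BOINC documentation quoted here. Under that assumption, a small Python sketch reproduces the message and shows what the proposed 10 GB compromise would mean for the same host. The helper name `disk_shortfall_mb` is made up for this example.

```python
MIB = 1024 * 1024  # bytes per "MB" as the client message appears to count them


def disk_shortfall_mb(bound_bytes, available_mb):
    """Return (required_mb, missing_mb), both rounded to two decimals.

    bound_bytes is the assumed per-app disk bound in bytes;
    available_mb is the figure the client reports as available.
    """
    required_mb = bound_bytes / MIB
    missing_mb = max(0.0, required_mb - available_mb)
    return round(required_mb, 2), round(missing_mb, 2)


print(disk_shortfall_mb(16e9, 12242.02))  # the current setting
print(disk_shortfall_mb(10e9, 12242.02))  # the proposed 10 GB compromise
```

The first call gives (15258.79, 3016.77), the same figures as in the message from 04 Oct. The second gives (9536.74, 0.0), so with a 10 GB bound this host would get work.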
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
Native test with an error: 2019-10-07 20:37:39,824: Checking singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img hostname 2019-10-07 20:37:40,894: Singularity isnt working: ERROR : Unknown image format/type: /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img ABORT : Retval = 255 2019-10-07 20:37:42,916: running start_atlas return value is 3 https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2828914 |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
CentOS 7 does not have namespaces active by default. While trying to enable them I got this error: "nsenter: failed to unshare namespaces: Operation not permitted". I found that user_namespace.enable=1 must be changed to namespace.unpriv_enable=1 in some cases. |
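Before changing kernel arguments, it can help to look at the limit that the "max user namespaces" sysctl mentioned earlier in the thread controls. The Python sketch below reads `/proc/sys/user/max_user_namespaces`, which is where Linux exposes the `user.max_user_namespaces` sysctl on kernels that provide it. It does not inspect the kernel boot arguments from the post above, and both function names are invented for this example.

```python
def max_user_namespaces(path="/proc/sys/user/max_user_namespaces"):
    """Return the configured limit as an int, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def describe(limit):
    """Turn the limit into a short human-readable status line."""
    if limit is None:
        return "limit not readable (the kernel may not expose it)"
    if limit == 0:
        return "user namespaces disabled (limit is 0)"
    return "user namespaces allowed, limit %d" % limit
```

Running `print(describe(max_user_namespaces()))` on the host shows whether the limit is still 0. A non-zero limit alone is no guarantee that Singularity will work, since the boot arguments discussed above can still block unprivileged use.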