Message boards : ATLAS Application : Native app using Singularity from CVMFS
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,720,327
RAC: 2,947
Message 6551 - Posted: 15 Aug 2019, 5:00:45 UTC

This task is running with CentOS 7.4
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2799754
06:26:30 (13549): wrapper (7.7.26015): starting
06:26:30 (13549): wrapper: running run_atlas (--nthreads 1)
singularity image is /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6
sys.argv = ['run_atlas', '--nthreads', '1']
THREADS=1
Checking for CVMFS
CVMFS is installed
OS:CentOS Linux release 7.4.1708 (Core)

This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity is installed, version 2.6.1-dist
Testing the function of Singularity...
Checking singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname
Singularity Works...

copy /root/slots/4/shared/ATLAS.root_0
copy /root/slots/4/shared/input.tar.gz
copy /root/slots/4/shared/RTE.tar.gz
copy /root/slots/4/shared/start_atlas.sh
start atlas job with PandaID=4443290924
cmd = singularity exec --pwd /root/slots/4 -B /cvmfs,/root /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 sh start_atlas.sh > runtime_log 2> runtime_log.err
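
The same check can be run by hand to verify a local Singularity against the ATLAS image from CVMFS (a minimal sketch built from the paths in the log above; it assumes CVMFS is mounted and singularity is on the PATH):

# Should print the machine's hostname from inside the SLC6 container image;
# any error here means run_atlas will fail its Singularity test the same way.
singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname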
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 6552 - Posted: 15 Aug 2019, 7:26:06 UTC - in response to Message 6550.  

Singularity is installed, version 2.2.99


This is a really ancient version of Singularity, so it would be better if you update it.
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,720,327
RAC: 2,947
Message 6555 - Posted: 15 Aug 2019, 9:04:14 UTC - in response to Message 6552.  

Singularity is installed, version 2.2.99


This is a really ancient version of Singularity, so it would be better if you update it.

I have uninstalled Singularity without a reboot, because two Theory native tasks from Production are running.
It is running now (on SL 7.6), with Singularity coming from CVMFS :-)):
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2800426
10:45:48 (25637): wrapper (7.7.26015): starting
10:45:48 (25637): wrapper: running run_atlas (--nthreads 1)
singularity image is /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6
sys.argv = ['run_atlas', '--nthreads', '1']
THREADS=1
Checking for CVMFS
CVMFS is installed
OS:Scientific Linux release 7.6 (Nitrogen)

This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity seems to be installed but not working: %s
Will use version from CVMFS
Testing the function of Singularity...
Checking singularity with cmd:/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname
Singularity Works...

copy /root/Downloads/BOINC/slots/2/shared/ATLAS.root_0
copy /root/Downloads/BOINC/slots/2/shared/input.tar.gz
copy /root/Downloads/BOINC/slots/2/shared/RTE.tar.gz
copy /root/Downloads/BOINC/slots/2/shared/start_atlas.sh
start atlas job with PandaID=4445586893
cmd = /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec --pwd /root/Downloads/BOINC/slots/2 -B /cvmfs,/root /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 sh start_atlas.sh > runtime_log 2> runtime_log.err
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 1,800
Message 6556 - Posted: 15 Aug 2019, 11:18:33 UTC - in response to Message 6529.  
Last modified: 15 Aug 2019, 11:38:55 UTC

What if singularity is already installed? One task tested, and it failed: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2798232
Are you sure that your local singularity version is working?

Finally I got one task working on Ubuntu 18.10 by uninstalling singularity 2.5.2-2. No newer package is available, so I'm now using Singularity from CVMFS.

It has been running for 75 minutes on 2 threads and I see 'only' 3 events processed so far. How many events are in one task?

Edit: At the same time I'm running a single-core Theory Native task from LHC production.
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 467
Credit: 389,411
RAC: 449
Message 6557 - Posted: 15 Aug 2019, 11:59:40 UTC - in response to Message 6556.  

... 3 events processed so far. How many events are in 1 task?

200

A simple monitoring can be set up using the method here:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5102&postid=39606
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 1,800
Message 6558 - Posted: 15 Aug 2019, 12:55:08 UTC - in response to Message 6557.  

... 3 events processed so far. How many events are in 1 task?

200

A simple monitoring can be set up using the method here:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5102&postid=39606
I remembered it was 200, but I doubted it because only 3 were done so far (meanwhile 13).
I'm monitoring the progress in 2 terminals with
tail -F /var/lib/boinc-client/slots/0/PanDA_Pilot-4445903070/athenaMP-workers-EVNTtoHITS-sim/worker_0/AthenaMP.log
tail -F /var/lib/boinc-client/slots/0/PanDA_Pilot-4445903070/athenaMP-workers-EVNTtoHITS-sim/worker_1/AthenaMP.log
Disadvantage: you have to adjust the path (the PanDA_Pilot task number) for every new task.
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 467
Credit: 389,411
RAC: 449
Message 6559 - Posted: 15 Aug 2019, 13:17:31 UTC - in response to Message 6558.  

Disadvantage: you have to adjust the path (the PanDA_Pilot task number) for every new task.

That's what my one-liner does automatically, using the find command.
In addition, it works for single-core tasks as well as for multi-core tasks.
They use different log names:
single-core -> log.EVNTtoHITS
multi-core -> AthenaMP.log

BTW:
If you use "find /var/lib/boinc-client/slots ..." instead of "find . ..." the one-liner works from every directory.
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,720,327
RAC: 2,947
Message 6560 - Posted: 16 Aug 2019, 7:13:43 UTC - in response to Message 6541.  
Last modified: 16 Aug 2019, 7:44:31 UTC

Thanks for the info, I will add this information to the original post. But which version of CentOS 7 do you have? As far as I know CentOS 7.6 doesn't require updating the kernel arguments or reboot. Setting a number for max user namespaces with sysctl as I showed in the original post should be enough.

CentOS Linux release 7.4.1708 (Core) and Scientific Linux release 7.6 (Nitrogen)
are working with namespaces and have each finished a task successfully.
This task is running on a computer that had the namespace error yesterday:
"Failed to create mount namespace: Operation not permitted"
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2800952
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 1,800
Message 6561 - Posted: 16 Aug 2019, 11:11:25 UTC
Last modified: 16 Aug 2019, 11:12:07 UTC

The running native ATLAS task https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2800501 is still going after 25 hours wall clock on 2 threads.
It's progressing, but slowly: 156 events are done now. I'm wondering whether this task type is different, because in the meantime I also ran a VBox ATLAS task on 4 threads from LHC production on the same machine, and that task finished after 7 hours and 38 minutes of elapsed time.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=240389930
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,720,327
RAC: 2,947
Message 6564 - Posted: 17 Aug 2019, 16:43:35 UTC
Last modified: 17 Aug 2019, 17:08:23 UTC

This was seen in three tasks today:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2801419

File "/root/slots/5/pilot2/pilot/common/exception.py", line 413, in run
self._Thread__target(**self._Thread__kwargs)
File "/root/slots/5/pilot2/pilot/control/job.py", line 1351, in retrieve
if has_job_completed(queues):
File "/root/slots/5/pilot2/pilot/control/job.py", line 1431, in has_job_completed
cleanup(job)
File "/root/slots/5/pilot2/pilot/util/processes.py", line 565, in cleanup
job.collect_zombies(tn=10)
File "/root/slots/5/pilot2/pilot/info/jobdata.py", line 776, in collect_zombies
_id, rc = os.waitpid(x, os.WNOHANG)
exception caught by thread run() function: (<type 'exceptions.TypeError'>, TypeError('an integer is required',), <traceback object at 0x27181b8>)
Traceback (most recent call last):
File "/root/slots/5/pilot2/pilot/common/exception.py", line 413, in run
self._Thread__target(**self._Thread__kwargs)
File "/root/slots/5/pilot2/pilot/control/job.py", line 1351, in retrieve
if has_job_completed(queues):
File "/root/slots/5/pilot2/pilot/control/job.py", line 1431, in has_job_completed
cleanup(job)
File "/root/slots/5/pilot2/pilot/util/processes.py", line 565, in cleanup
job.collect_zombies(tn=10)
File "/root/slots/5/pilot2/pilot/info/jobdata.py", line 776, in collect_zombies
_id, rc = os.waitpid(x, os.WNOHANG)
TypeError: an integer is required
Edit: There are 9 slots active (8 for SixTrack and one for LHC-dev ATLAS).
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 6567 - Posted: 20 Aug 2019, 10:25:32 UTC - in response to Message 6564.  

Thanks for reporting, I have passed this to the upstream developers to take a look.
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,720,327
RAC: 2,947
Message 6568 - Posted: 21 Aug 2019, 7:19:21 UTC

This task finished today:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2802027
Pilot.log shows this info:
2019-08-21 07:07:06,658 | INFO | retrieve | root | retrieve | pilot has finished for previous job - re-establishing logging
No handlers could be found for logger "pilot.util.mpi"
2019-08-21 07:07:07,165 | DEBUG | retrieve | pilot.control.job | retrieve | getjob_requests=1
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 1,800
Message 6709 - Posted: 29 Sep 2019, 9:13:53 UTC

I see a new version for ATLAS native available.
Linux running on an AMD x86_64 or Intel EM64T CPU 0.73 (native_mt) 25 Sep 2019, 19:55:33 UTC without announcement.

I wanted to try that version within my small Linux VM, but I'm getting:

Sun 29 Sep 2019 10:58:46 AM CEST | lhcathome-dev | Message from server: ATLAS Simulation needs 3933.64MB more disk space. You currently have 11325.14 MB available and it needs 15258.79 MB.

The peak disk usage for a 4-core task seems to be at most 1 GB, so why is the demand set so high? Did you just take the setting from the ATLAS VBox version?
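
Until the server-side requirement is lowered, a possible client-side workaround is to raise BOINC's disk allowance (a hedged sketch assuming the default Linux data directory /var/lib/boinc-client; the 20 GB figure is only an example, not a recommendation):

# Override the client's disk limits and tell the running client to re-read them.
cat > /var/lib/boinc-client/global_prefs_override.xml <<'EOF'
<global_preferences>
   <disk_max_used_gb>20.0</disk_max_used_gb>
   <disk_min_free_gb>1.0</disk_min_free_gb>
</global_preferences>
EOF
boinccmd --read_global_prefs_override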
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,720,327
RAC: 2,947
Message 6710 - Posted: 29 Sep 2019, 10:16:55 UTC - in response to Message 6709.  

Crystal,
I have seen the same, but did not report it.
That much RAM is not available.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 6740 - Posted: 4 Oct 2019, 13:19:01 UTC - in response to Message 6709.  

I see a new version for ATLAS native available.
Linux running on an AMD x86_64 or Intel EM64T CPU 0.73 (native_mt) 25 Sep 2019, 19:55:33 UTC without announcement.


This was a test of version 2.70 that was released shortly after on the production project.



I wanted to try that version within my small Linux VM, but I'm getting:

Sun 29 Sep 2019 10:58:46 AM CEST | lhcathome-dev | Message from server: ATLAS Simulation needs 3933.64MB more disk space. You currently have 11325.14 MB available and it needs 15258.79 MB.

The peak disk usage for a 4-core task seems to be at most 1 GB, so why is the demand set so high? Did you just take the setting from the ATLAS VBox version?


I've set the disk requirement down to 8 GB (same as the production project). I don't remember why it was set so high, but maybe it was from a previous test which used more disk space. The setting is per app and not per app version, so it's not possible to have different numbers for the native and vbox apps.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 1,800
Message 6745 - Posted: 4 Oct 2019, 16:39:38 UTC - in response to Message 6740.  
Last modified: 4 Oct 2019, 16:56:30 UTC

The peak disk usage for a 4-core task seems to be at most 1 GB, so why is the demand set so high? Did you just take the setting from the ATLAS VBox version?
I've set the disk requirement down to 8 GB (same as the production project). I don't remember why it was set so high, but maybe it was from a previous test which used more disk space. The setting is per app and not per app version, so it's not possible to have different numbers for the native and vbox apps.

That's a very bad solution. In my opinion those are totally different apps, at least for the BOINC user, even if the application inside the VBox VM and the native task may not differ.
That 16 GB was set because the new CentOS7 vdi is almost 1 GB bigger than the one in production, and 8 GB was too small when a user had to suspend a task and save the VM to disk.
If you are not able to differentiate the ATLAS applications, we have to compromise. What about 10 GB for the disk requirement?

Btw: The requirement is not yet reduced:
Fri 04 Oct 2019 06:54:01 PM CEST | lhcathome-dev | Message from server: ATLAS Simulation needs 3016.77MB more disk space. You currently have 12242.02 MB available and it needs 15258.79 MB.
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,720,327
RAC: 2,947
Message 6747 - Posted: 7 Oct 2019, 19:12:39 UTC

A native test with an error:
2019-10-07 20:37:39,824: Checking singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img hostname
2019-10-07 20:37:40,894: Singularity isnt working: ERROR : Unknown image format/type: /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img
ABORT : Retval = 255

2019-10-07 20:37:42,916: running start_atlas return value is 3
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2828914
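
If the local Singularity is too old to recognize this image format, the Singularity shipped in CVMFS can be tried by hand as a cross-check (a sketch; the binary path and the image path are taken from the logs earlier in this thread):

# Same test, but with the CVMFS-provided Singularity instead of the local one.
/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img hostname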
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,720,327
RAC: 2,947
Message 6922 - Posted: 30 Dec 2019, 3:43:20 UTC - in response to Message 6540.  
Last modified: 30 Dec 2019, 4:30:29 UTC

CentOS 7 does not have user namespaces active by default. To use them, I found this command:
grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
After this, a reboot is needed.

/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname
Error: Failed to create mount namespace: Operation not permitted


nsenter: failed to unshare namespaces: Operation not permitted
user_namespace.enable=1 must be changed to namespace.unpriv_enable=1 in some cases.
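
Putting the pieces from this thread together, enabling unprivileged user namespaces on CentOS 7 looks roughly like this (a sketch only; the exact sysctl key and boot argument depend on the kernel build, and 15000 is just an example value):

# Allow a non-zero number of user namespaces (takes effect immediately).
sysctl -w user.max_user_namespaces=15000
# Enable user namespaces at boot; on some kernels the argument is
# namespace.unpriv_enable=1 instead (see above).
grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
# A reboot is needed for the kernel argument to take effect.
reboot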