1) Message boards : ATLAS Application : Native app using Singularity from CVMFS
Message 6543
Posted 12 Aug 2019 by gyllic
Can you run the command manually and post the output?

/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname
This command gives no output.

My first thought was that user namespaces are not working properly. However, the kernel config file says "CONFIG_USER_NS=y", /etc/sysctl.conf contains "kernel.unprivileged_userns_clone=1", and /proc/sys/user/max_user_namespaces shows the value "19655".

Are there other parameters that have to be adapted in order for namespaces to work?

I am not sure, but I think someone here or on the production site wrote that he had to compile a custom kernel for Debian testing (which is now stable, Debian 10), but I can't find the post. Since CONFIG_USER_NS is set to "y" in the kernel config file, there should be no need to compile a custom kernel. Or are there other config options that need to be set?
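For reference, the checks above can be bundled into one short script (a diagnostic sketch; the config path assumes a stock Debian kernel, and the unprivileged_userns_clone knob only exists on Debian-patched kernels):

```shell
#!/bin/sh
# Diagnostic sketch for the user-namespace settings discussed above.
# Paths and knobs assume a Debian kernel; adjust for other distributions.
grep CONFIG_USER_NS= "/boot/config-$(uname -r)" 2>/dev/null \
    || echo "CONFIG_USER_NS: kernel config file not found"
cat /proc/sys/kernel/unprivileged_userns_clone 2>/dev/null \
    || echo "unprivileged_userns_clone: knob not present (non-Debian kernel?)"
cat /proc/sys/user/max_user_namespaces 2>/dev/null \
    || echo "max_user_namespaces: not present"
# The decisive test: can an unprivileged user actually create a user namespace?
if unshare -U true 2>/dev/null; then
    echo "user namespaces: working"
else
    echo "user namespaces: NOT working"
fi
```

If the last line reports NOT working even though all three values look correct, the restriction may come from somewhere other than the kernel config (e.g. a seccomp or container policy).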
2) Message boards : ATLAS Application : Native app using Singularity from CVMFS
Message 6535
Posted 11 Aug 2019 by gyllic
I have set up a new VM with Debian 10, installed BOINC and compiled CVMFS. cvmfs_config probe shows all 'OK'.
To enable user namespaces, the following commands were used:
sudo sed -i '$ a\kernel.unprivileged_userns_clone = 1' /etc/sysctl.conf
sudo sysctl -p

Tested a 0.70 native task, but it failed because Singularity was not working:
THREADS=1
Checking for CVMFS
CVMFS is installed
OS:cat: /etc/redhat-release: No such file or directory

This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity is not installed, using version from CVMFS
Testing the function of Singularity...
Checking singularity with cmd:/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname

Singularity isnt working: 

running start_atlas return value is 3
Any ideas why it is failing?
3) Message boards : ATLAS Application : Native app using Singularity from CVMFS
Message 6528
Posted 10 Aug 2019 by gyllic
What if Singularity is already installed? One task was tested and it failed: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2798232
Are you sure that your local singularity version is working? BTW, your singularity version is quite old.

Tested the new version with singularity installed and everything is working fine. The log shows:

Checking for CVMFS
CVMFS is installed
OS:cat: /etc/redhat-release: No such file or directory

This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity is installed, version singularity version 3.3.0-614.gf0cd4b488
Testing the function of Singularity...
Checking singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname
Singularity Works...

copy /home/boinc/boinc1/slots/0/shared/ATLAS.root_0
copy /home/boinc/boinc1/slots/0/shared/RTE.tar.gz
copy /home/boinc/boinc1/slots/0/shared/input.tar.gz
copy /home/boinc/boinc1/slots/0/shared/start_atlas.sh
export ATHENA_PROC_NUMBER=2;start atlas job with grep: pandaJobData.out: No such file or directory
cmd = singularity exec --pwd /home/boinc/boinc1/slots/0 -B /cvmfs,/home /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 sh start_atlas.sh > runtime_log 2> runtime_log.err
4) Message boards : ATLAS Application : Native app using Singularity from CVMFS
Message 6523
Posted 9 Aug 2019 by gyllic
The task is currently running, and there should be no reason why it should fail.
It finished successfully:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2797696
5) Message boards : ATLAS Application : Native app using Singularity from CVMFS
Message 6521
Posted 8 Aug 2019 by gyllic
It is looking good!

Tested it with a system with no singularity installed. The logs show:
...
Checking for CVMFS
CVMFS is installed
OS:cat: /etc/redhat-release: No such file or directory

This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity is not installed, using version from CVMFS
Testing the function of Singularity...
Checking singularity with cmd:/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname
Singularity Works...
...

The task is currently running, and there should be no reason why it should fail.

N.B.: The command
 /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity version
shows 3.2.1-1, while the latest Singularity release is 3.3.0.
6) Message boards : ATLAS Application : Tasks testing new pilot version
Message 6384
Posted 21 May 2019 by gyllic
I've made a change to avoid these connections which will apply to new WU submitted from now. If the problem is confirmed to be fixed I'll apply the changes on the production WU too.
Tested one task (https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2778990) and it looks good. The logs show no connections to ports other than those listed in http://lhcathome.web.cern.ch/test4theory/my-firewall-complaining-which-ports-does-project-use
7) Message boards : ATLAS Application : Tasks testing new pilot version
Message 6381
Posted 20 May 2019 by gyllic
I already mentioned a firewall issue a while ago at LHC-prod regarding pandaserver.cern.ch, port 25085:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5008&postid=38684

The same happens here and should be solved.
I also see connections to this port, especially at task start-up. The logs (captured for only one task) show connections to aipanda034.cern.ch on port 25085.
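For anyone who wants to check this on their own host: connections to that port can be spotted by reading /proc/net/tcp directly (a sketch; the ports in that file are hexadecimal, and `ss -tn 'dport = :25085'` does the same where iproute2 is installed):

```shell
#!/bin/sh
# Sketch: look for TCP connections to port 25085 in /proc/net/tcp.
# The rem_address column (field 3) holds "IP:PORT" with both parts in hex.
hexport=$(printf '%04X' 25085)    # 25085 -> 61FD
awk -v p=":$hexport" '$3 ~ p { print "connection to port 25085:", $3 }' /proc/net/tcp
```

Run it in a loop while a task is starting up to catch the traffic.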
8) Message boards : ATLAS Application : Tasks testing new pilot version
Message 6380
Posted 20 May 2019 by gyllic
This task ran and validated successfully:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2778502

Looking good.
9) Message boards : ATLAS Application : Tasks testing new pilot version
Message 6374
Posted 19 May 2019 by gyllic

Nonetheless the task is marked as invalid.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2777605

Same here:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2777811
and here:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2778227
10) Message boards : Theory Application : New Native App - Linux Only
Message 6098
Posted 26 Feb 2019 by gyllic
Had one of those two days ago:
195 (0x000000C3) EXIT_CHILD_FAILED
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2754240
I don't think this has anything to do with the problem, but for the record:
I restarted my router while the task was running, so the task temporarily had no internet access.
11) Message boards : Theory Application : New Native App - Linux Only
Message 6076
Posted 24 Feb 2019 by gyllic
There are tasks that run for a couple of hundred seconds and some for a couple of hours. That is probably normal fluctuation (different job types, different codes, different event types, ...). With the old vbox app there were also jobs that ran for a couple of hours. It would be odd if the native jobs ran slower than the same jobs inside a VM.
12) Message boards : Theory Application : Windows Version
Message 6053
Posted 22 Feb 2019 by gyllic
Considering how the new vbox app works, is it still worth using a local proxy server?
If yes, are there plans to implement that?
13) Message boards : Theory Application : New Native App - Linux Only
Message 5946
Posted 19 Feb 2019 by gyllic
So far 13/13 tasks with version 4.14 have worked fine. The host is also shown as active in MCPlots (9 jobs at the moment with 0% failure).
Regarding the suspend feature, I have experienced the same behaviour as other users have already mentioned.
14) Message boards : Theory Application : New Native App - Linux Only
Message 5935
Posted 18 Feb 2019 by gyllic
With the new version 4.13 (cranky-0.0.20) I get only errors.
Same here. 4 out of 4 reported the same error:
16:53:04 2019-02-18: cranky-0.0.20: [INFO] Preparing output.
tar: local.txt: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
17:53:05 (14091): cranky exited; CPU time 3423.660000
17:53:05 (14091): app exit status: 0x2
17:53:05 (14091): called boinc_finish(195)
e.g. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752874
15) Message boards : Theory Application : New Native App - Linux Only
Message 5930
Posted 18 Feb 2019 by gyllic
Looks like the queue is empty:
18.02.2019 14:58:17 | lhcathome-dev | Requesting new tasks for CPU
18.02.2019 14:58:19 | lhcathome-dev | Scheduler request completed: got 0 new tasks
18.02.2019 14:58:19 | lhcathome-dev | No tasks sent
18.02.2019 14:58:19 | lhcathome-dev | No tasks are available for Theory Simulation
So no testing possible at the moment.
16) Message boards : Theory Application : New Native App - Linux Only
Message 5919
Posted 16 Feb 2019 by gyllic
This sherpa was running OK (and fast)
Maybe this has already been mentioned and a measure implemented, but how are Sherpa jobs handled that run endlessly? Is there some control mechanism in place, and if not, is one planned for the future?
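The thread gives no indication of what the server actually does; purely as an illustration, a client-side wall-clock guard could look like this (a sketch using coreutils `timeout`; `some_generator_cmd` and the 48-hour limit are made up for the example, not part of the app):

```shell
#!/bin/sh
# Illustrative only: abort a runaway job after a wall-clock limit.
# "some_generator_cmd" is a placeholder, not the real pilot invocation.
timeout --kill-after=60s 48h some_generator_cmd
status=$?
# coreutils timeout exits with status 124 when the time limit was hit
if [ "$status" -eq 124 ]; then
    echo "job exceeded the wall-clock limit"
fi
```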
17) Message boards : Theory Application : Native Setup Linux
Message 5870
Posted 12 Feb 2019 by gyllic
Debian Stretch:
Follow the Ubuntu guide and do the following to enable user namespaces:
Unfortunately, it looks like this fix is only temporary: after rebooting the system, the value in that file is 0 again and the unshare command no longer works.
To fix this for good, I adapted /etc/sysctl.conf so that the value of "/proc/sys/kernel/unprivileged_userns_clone" stays at 1 persistently.
sudo echo "kernel.unprivileged_userns_clone = 1" >>  /etc/sysctl.conf
sudo sysctl -p
Sorry to be fussy, but the command shown ("sudo echo ...") does not work on Debian: the >> redirection is performed by the user's unprivileged shell before sudo runs, so appending to /etc/sysctl.conf fails. Using e.g. sed works.
So I would suggest something like this:

Debian Stretch:
Follow the Ubuntu guide and do the following to enable user namespaces:
For enabling user namespaces for every user permanently, type
sudo sed -i '$ a\kernel.unprivileged_userns_clone = 1' /etc/sysctl.conf
sudo sysctl -p
If you want to enable user namespaces only temporarily, type
sudo su -c 'echo 1 > /proc/sys/kernel/unprivileged_userns_clone'
You will have to execute this command after every system reboot in order to be able to crunch native Theory tasks.
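A quick way to verify the setting afterwards, e.g. following the next reboot (a small check; the knob only exists on Debian-patched kernels, and both values should read 1 once the sysctl.conf entry is in place):

```shell
# Verification sketch: both should print 1 after the change.
sysctl kernel.unprivileged_userns_clone
cat /proc/sys/kernel/unprivileged_userns_clone
# And the functional test from the guide:
unshare -U true && echo "user namespaces enabled"
```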

@Laurence (or any admin/moderator): you can delete this comment if you want, to keep this thread clean.
18) Message boards : Theory Application : New Native App - Linux Only
Message 5858
Posted 11 Feb 2019 by gyllic
sudo su -c 'echo 1 > /proc/sys/kernel/unprivileged_userns_clone'

This did the trick and the application ran successfully on Debian Stretch:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752281

Great! I will add that to the guide.
Unfortunately, it looks like this fix is only temporary: after rebooting the system, the value in that file is 0 again and the unshare command no longer works.
To fix this for good, I adapted /etc/sysctl.conf so that the value of "/proc/sys/kernel/unprivileged_userns_clone" stays at 1 persistently.

Open the file with
sudo nano /etc/sysctl.conf
and add the following to the end of that file
kernel.unprivileged_userns_clone = 1
Then apply the changes with
sudo sysctl -p

With this approach the value of "/proc/sys/kernel/unprivileged_userns_clone" stays at 1 after reboots, and the unshare command also works "out of the box" after rebooting.
19) Message boards : Theory Application : New Native App - Linux Only
Message 5852
Posted 11 Feb 2019 by gyllic
Let me know if you figure it out otherwise I will investigate after looking into Opensuse.
I have found a solution. Taken from the Debian mailing list:
"It turns out that the Debian kernel is set up to disable unprivileged users from unsharing the user namespace by default. This can be worked around using":

sudo su -c 'echo 1 > /proc/sys/kernel/unprivileged_userns_clone'

This did the trick and the application ran successfully on Debian Stretch:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752281
20) Message boards : Theory Application : New Native App - Linux Only
Message 5843
Posted 11 Feb 2019 by gyllic
Try this command for testing:
unshare -U /bin/bash
The output is
unshare: unshare failed: Operation not permitted

Running it with sudo gives a new prompt in the terminal,
nobody@debian:
with "nobody" not being my normal user.
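For comparison, on a host where unprivileged user namespaces are enabled, the same test succeeds without sudo, and the "nobody" prompt is expected behaviour (a sketch; -r is unshare's --map-root-user option):

```shell
# Inside "unshare -U /bin/bash" the user shows up as "nobody" because no
# UID mapping has been written into the new namespace yet.
unshare -U id -u     # typically prints 65534 (the unmapped/overflow UID)
# With -r (--map-root-user) the caller is mapped to root inside the namespace:
unshare -Ur id -u    # prints 0 when user namespaces work
```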

