1) Message boards : ATLAS Application : Native app using Singularity from CVMFS (Message 6543)
Posted 12 Aug 2019 by gyllic
Post:
Can you run the command manually and post the output?

/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname
This command gives no output.

My first thought was that user namespaces are not working properly. However, the kernel config file contains "CONFIG_USER_NS=y", /etc/sysctl.conf contains "kernel.unprivileged_userns_clone=1", and /proc/sys/user/max_user_namespaces shows the value "19655".

Are there other parameters that have to be adapted in order for namespaces to work?

I am not sure, but I think someone here or on the production site wrote that he had to compile a custom kernel for debian testing (which has since become stable, debian 10), but I can't find the post. Since CONFIG_USER_NS is already set to "y" in the kernel config file, there should be no need to compile a custom kernel; or are there other config options that also need to be enabled? A quick way to verify all of these settings is sketched below.
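For reference, all of these settings can be checked in one go (a minimal sketch, assuming a Debian-style kernel config under /boot and util-linux's unshare being installed):

# check that user namespaces are compiled into the kernel
grep CONFIG_USER_NS /boot/config-$(uname -r)    # expect: CONFIG_USER_NS=y
# check the Debian-specific runtime knob and the namespace limit
sysctl kernel.unprivileged_userns_clone         # expect: ... = 1
cat /proc/sys/user/max_user_namespaces          # expect: > 0
# the actual functional test: create a user namespace as an unprivileged user
unshare -U true && echo "user namespaces work" || echo "user namespaces blocked"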
2) Message boards : ATLAS Application : Native app using Singularity from CVMFS (Message 6535)
Posted 11 Aug 2019 by gyllic
Post:
I have set up a new VM with debian 10, installed BOINC and compiled CVMFS. cvmfs_config probe shows all "OK".
To enable user namespaces, the following commands were used:
sudo sed -i '$ a\kernel.unprivileged_userns_clone = 1' /etc/sysctl.conf
sudo sysctl -p

Tested a 0.70 native task, but it failed because singularity is not working:
THREADS=1
Checking for CVMFS
CVMFS is installed
OS:cat: /etc/redhat-release: No such file or directory

This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity is not installed, using version from CVMFS
Testing the function of Singularity...
Checking singularity with cmd:/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname

Singularity isnt working: 

running start_atlas return value is 3
Any ideas why it is failing?
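One thing that might narrow this down is re-running the wrapper's check command with Singularity's debug output enabled (a sketch; --debug is a global option of Singularity 3.x and prints the container setup steps, so the last lines before the failure should show what goes wrong):

/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity \
    --debug exec -B /cvmfs \
    /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname
echo "exit code: $?"    # non-zero confirms the failure the wrapper sees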
3) Message boards : ATLAS Application : Native app using Singularity from CVMFS (Message 6528)
Posted 10 Aug 2019 by gyllic
Post:
What if singularity is already installed? One task tested and that failed: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2798232
Are you sure that your local singularity version is working? BTW, your singularity version is quite old.

Tested the new version with singularity installed and everything is working fine. The log shows:

Checking for CVMFS
CVMFS is installed
OS:cat: /etc/redhat-release: No such file or directory

This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity is installed, version singularity version 3.3.0-614.gf0cd4b488
Testing the function of Singularity...
Checking singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname
Singularity Works...

copy /home/boinc/boinc1/slots/0/shared/ATLAS.root_0
copy /home/boinc/boinc1/slots/0/shared/RTE.tar.gz
copy /home/boinc/boinc1/slots/0/shared/input.tar.gz
copy /home/boinc/boinc1/slots/0/shared/start_atlas.sh
export ATHENA_PROC_NUMBER=2;start atlas job with grep: pandaJobData.out: No such file or directory
cmd = singularity exec --pwd /home/boinc/boinc1/slots/0 -B /cvmfs,/home /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 sh start_atlas.sh > runtime_log 2> runtime_log.err
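As an aside, the cmd line above can be re-run by hand when a task fails at this stage, which makes the error stream easy to inspect (a sketch; the slot path is copied from the log above and will differ per host and task):

cd /home/boinc/boinc1/slots/0
singularity exec --pwd /home/boinc/boinc1/slots/0 -B /cvmfs,/home \
    /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 \
    sh start_atlas.sh > runtime_log 2> runtime_log.err
tail runtime_log.err    # the stderr log usually shows why the job did not start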
4) Message boards : ATLAS Application : Native app using Singularity from CVMFS (Message 6523)
Posted 9 Aug 2019 by gyllic
Post:
The task is currently running, and there should be no reason why it should fail.
It finished successfully:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2797696
5) Message boards : ATLAS Application : Native app using Singularity from CVMFS (Message 6521)
Posted 8 Aug 2019 by gyllic
Post:
It is looking good!

Tested it on a system with no singularity installed. The logs show:
...
Checking for CVMFS
CVMFS is installed
OS:cat: /etc/redhat-release: No such file or directory

This is not SLC6, need to run with Singularity....
Checking Singularity...
Singularity is not installed, using version from CVMFS
Testing the function of Singularity...
Checking singularity with cmd:/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname
Singularity Works...
...

The task is currently running, and there should be no reason why it should fail.

N.B.: The command
/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity version
shows 3.2.1-1, while the most recent singularity release is 3.3.0.
6) Message boards : ATLAS Application : Tasks testing new pilot version (Message 6384)
Posted 21 May 2019 by gyllic
Post:
I've made a change to avoid these connections which will apply to new WU submitted from now. If the problem is confirmed to be fixed I'll apply the changes on the production WU too.
Tested one task (https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2778990) and it is looking good. The logs show no connections to ports other than those listed in http://lhcathome.web.cern.ch/test4theory/my-firewall-complaining-which-ports-does-project-use
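For anyone who wants to repeat this kind of check, one possible approach is to poll the host's established TCP connections while a task is running (a sketch using ss from iproute2; this is just one way to do it, not how the logs above were produced):

# list established TCP connections with the owning process, refreshed every 2 seconds
# (root is needed for -p to show processes of other users, e.g. boinc)
sudo watch -n 2 'ss -tnp state established'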
7) Message boards : ATLAS Application : Tasks testing new pilot version (Message 6381)
Posted 20 May 2019 by gyllic
Post:
I already mentioned a firewall issue a while ago at LHC-prod regarding pandaserver.cern.ch, port 25085:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5008&postid=38684

The same happens here and should be solved.
I also see connections to this port, especially at task start-up. The logs (captured for only one task) show connections to aipanda034.cern.ch on port 25085.
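If it helps to confirm this on other hosts, a temporary firewall logging rule could capture every new outgoing connection to that port (a sketch; the rule is not persistent and the log prefix is arbitrary):

# log each new outgoing TCP connection to port 25085
sudo iptables -I OUTPUT -p tcp --dport 25085 -m state --state NEW -j LOG --log-prefix "panda-25085: "
# then watch the kernel log for matches
sudo journalctl -kf | grep panda-25085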
8) Message boards : ATLAS Application : Tasks testing new pilot version (Message 6380)
Posted 20 May 2019 by gyllic
Post:
This task ran and validated successfully:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2778502

Looking good.
9) Message boards : ATLAS Application : Tasks testing new pilot version (Message 6374)
Posted 19 May 2019 by gyllic
Post:

Nonetheless the task is marked as invalid.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2777605

Same here:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2777811
Same here:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2778227
10) Message boards : Theory Application : New Native App - Linux Only (Message 6098)
Posted 26 Feb 2019 by gyllic
Post:
Had one of those two days ago:
195 (0x000000C3) EXIT_CHILD_FAILED
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2754240
I don't think this has anything to do with the problem, but anyway:
I restarted my router while the task was running, so there was temporarily no internet access available to the task.
11) Message boards : Theory Application : New Native App - Linux Only (Message 6076)
Posted 24 Feb 2019 by gyllic
Post:
There are tasks that run for a couple of hundred seconds and some that run for a couple of hours. This is probably just normal fluctuation (different job types, different codes, different event types, ...). With the old vbox app there were also jobs that ran for a couple of hours. It would be odd if the native jobs ran slower than the same jobs inside a VM.
12) Message boards : Theory Application : Windows Version (Message 6053)
Posted 22 Feb 2019 by gyllic
Post:
Considering how the new vbox app works, is it still worth using a local proxy server?
If yes, are there plans to implement that?
13) Message boards : Theory Application : New Native App - Linux Only (Message 5946)
Posted 19 Feb 2019 by gyllic
Post:
So far, 13 out of 13 tasks with version 4.14 have worked fine. The host is also shown as active in MCPlots (9 jobs at the moment with 0% failure).
Regarding the suspend feature, I have experienced the same behaviour other users have already mentioned.
14) Message boards : Theory Application : New Native App - Linux Only (Message 5935)
Posted 18 Feb 2019 by gyllic
Post:
With the new version 4.13 (cranky-0.0.20) I get only errors.
Same here. 4 out of 4 tasks reported the same error:
16:53:04 2019-02-18: cranky-0.0.20: [INFO] Preparing output.
tar: local.txt: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
17:53:05 (14091): cranky exited; CPU time 3423.660000
17:53:05 (14091): app exit status: 0x2
17:53:05 (14091): called boinc_finish(195)
e.g. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752874
15) Message boards : Theory Application : New Native App - Linux Only (Message 5930)
Posted 18 Feb 2019 by gyllic
Post:
Looks like the queue is empty:
18.02.2019 14:58:17 | lhcathome-dev | Requesting new tasks for CPU
18.02.2019 14:58:19 | lhcathome-dev | Scheduler request completed: got 0 new tasks
18.02.2019 14:58:19 | lhcathome-dev | No tasks sent
18.02.2019 14:58:19 | lhcathome-dev | No tasks are available for Theory Simulation
So no testing is possible at the moment.
16) Message boards : Theory Application : New Native App - Linux Only (Message 5919)
Posted 16 Feb 2019 by gyllic
Post:
This sherpa was running OK (and fast)
Maybe the problem has already been mentioned and a countermeasure implemented, but how are Sherpa jobs that run endlessly handled? Is there some control mechanism in place, and if not, is one planned for the future?
17) Message boards : Theory Application : Native Setup Linux (Message 5870)
Posted 12 Feb 2019 by gyllic
Post:
Debian Stretch:
Follow the Ubuntu guide and do the following to enable user namespaces:
Unfortunately, it looks like this fix is only temporary, since after rebooting the system the value in that file is 0 again and the unshare command no longer works.
To fix this for good, I adapted the sysctl.conf file so that the value of "/proc/sys/kernel/unprivileged_userns_clone" stays persistently at 1.
sudo echo "kernel.unprivileged_userns_clone = 1" >>  /etc/sysctl.conf
sudo sysctl -p
Sorry to be fussy, but the quoted command "sudo echo ..." does not work on Debian: the ">>" redirection is performed by the unprivileged user's shell, not by the root echo process, so it fails with permission denied. Using e.g. sed works.
So I would suggest something like this:

Debian Stretch:
Follow the Ubuntu guide and do the following to enable user namespaces:
To enable user namespaces permanently for every user, type
sudo sed -i '$ a\kernel.unprivileged_userns_clone = 1' /etc/sysctl.conf
sudo sysctl -p
If you want to enable user namespaces only temporarily, type
sudo su -c 'echo 1 > /proc/sys/kernel/unprivileged_userns_clone'
You will have to execute this command after every system reboot in order to be able to crunch native Theory tasks.
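For completeness, the same permanent change can also be made without sed, by letting tee do the appending as root (a sketch with the same effect; this avoids the redirection problem of the "sudo echo" variant):

echo 'kernel.unprivileged_userns_clone = 1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p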

@Laurence (or any admin/moderator): you can delete this comment if you want, in order to keep this thread clean.
18) Message boards : Theory Application : New Native App - Linux Only (Message 5858)
Posted 11 Feb 2019 by gyllic
Post:
sudo su -c 'echo 1 > /proc/sys/kernel/unprivileged_userns_clone'

This did the trick and the application ran successfully on Debian Stretch:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752281

Great! I will add that to the guide.
Unfortunately, it looks like this fix is only temporary, since after rebooting the system the value in that file is 0 again and the unshare command no longer works.
To fix this for good, I adapted the sysctl.conf file so that the value of "/proc/sys/kernel/unprivileged_userns_clone" stays persistently at 1.

Open the file with
sudo nano /etc/sysctl.conf
and add the following to the end of that file
kernel.unprivileged_userns_clone = 1
Then apply the changes with
sudo sysctl -p

Using this approach, the value of "/proc/sys/kernel/unprivileged_userns_clone" stays at 1 after reboots, and the unshare command also works "out of the box" after rebooting.
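An alternative that leaves /etc/sysctl.conf untouched would be a drop-in file under /etc/sysctl.d, which is also read at boot (a sketch; the file name is arbitrary as long as it ends in .conf):

echo 'kernel.unprivileged_userns_clone = 1' | sudo tee /etc/sysctl.d/90-unprivileged-userns.conf
sudo sysctl --system    # reload all sysctl configuration files now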
19) Message boards : Theory Application : New Native App - Linux Only (Message 5852)
Posted 11 Feb 2019 by gyllic
Post:
Let me know if you figure it out otherwise I will investigate after looking into Opensuse.
I have found a solution. Taken from the Debian mailing list:
"It turns out that the Debian kernel is set up to disable unprivileged users from unsharing the user namespace by default. This can be worked around using":

sudo su -c 'echo 1 > /proc/sys/kernel/unprivileged_userns_clone'

This did the trick and the application ran successfully on Debian Stretch:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752281
20) Message boards : Theory Application : New Native App - Linux Only (Message 5843)
Posted 11 Feb 2019 by gyllic
Post:
Try this command for testing:
unshare -U /bin/bash
The output is
unshare: unshare failed: Operation not permitted

Running it with sudo gives a new prompt in the terminal:
nobody@debian:
with "nobody" not being my normal user.

