1) Message boards : ATLAS Application : Native app using Singularity from CVMFS
Message 6543 Posted 12 Aug 2019 by gyllic

> Can you run the command manually and post the output?

This command gives no output. My first thought was that user namespaces are not working properly. However, the kernel config file says "CONFIG_USER_NS=y", /etc/sysctl.conf contains "kernel.unprivileged_userns_clone=1", and /proc/sys/user/max_user_namespaces shows the value "19655". Are there other parameters that have to be adapted for namespaces to work? I am not sure, but I think someone here or on the production site wrote that he had to compile a custom kernel for Debian testing (which is now stable 10), but I can't find the post. Since CONFIG_USER_NS is already set to yes in the kernel config file, there should be no need to compile a custom kernel, or are there other config options that need to be enabled?
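For reference, a minimal sketch of how the values mentioned above can be checked, assuming a stock Debian kernel and util-linux's unshare (this sketch is not part of the original post):

    # Kernel built with user namespace support?
    grep CONFIG_USER_NS /boot/config-$(uname -r)    # expect: CONFIG_USER_NS=y

    # Runtime switches
    sysctl kernel.unprivileged_userns_clone         # expect: kernel.unprivileged_userns_clone = 1
    cat /proc/sys/user/max_user_namespaces          # expect: a value greater than 0

    # Functional test as an unprivileged user: should print uid=0(root) for the
    # new namespace instead of failing with "Operation not permitted"
    unshare -U -r id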
2) Message boards : ATLAS Application : Native app using Singularity from CVMFS
Message 6535 Posted 11 Aug 2019 by gyllic

I have set up a new VM with Debian 10, installed BOINC and compiled CVMFS. cvmfs_config probe shows all 'OK'. To enable user namespaces, the following commands were used:

    sudo sed -i '$ a\kernel.unprivileged_userns_clone = 1' /etc/sysctl.conf
    sudo sysctl -p

I tested a 0.70 native task, but it failed because Singularity is not working:

    THREADS=1
    Checking for CVMFS
    CVMFS is installed
    OS:cat: /etc/redhat-release: No such file or directory
    This is not SLC6, need to run with Singularity....
    Checking Singularity...
    Singularity is not installed, using version from CVMFS
    Testing the function of Singularity...
    Checking singularity with cmd:/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname
    Singularity isnt working: running start_atlas return value is 3

Any ideas why it is failing?
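One way to see the actual error (a debugging sketch, not from the original post) is to run the same probe command the wrapper uses directly in a shell, as the user the BOINC client runs under:

    # The exact test quoted in the log above; any error now goes to the terminal
    /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity \
        exec -B /cvmfs \
        /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 \
        hostname

    # If it fails without output, the global --debug flag of singularity
    # (assuming this build supports it) shows the namespace setup steps:
    /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity \
        --debug exec -B /cvmfs \
        /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 \
        hostname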
3) Message boards : ATLAS Application : Native app using Singularity from CVMFS
Message 6528 Posted 10 Aug 2019 by gyllic

> What if singularity is already installed. One task tested and that failed: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2798232

Are you sure that your local singularity version is working? BTW, your singularity version is quite old. I tested the new app version on a host with singularity installed and everything is working fine. The log shows:

    Checking for CVMFS
    CVMFS is installed
    OS:cat: /etc/redhat-release: Datei oder Verzeichnis nicht gefunden
    This is not SLC6, need to run with Singularity....
    Checking Singularity...
    Singularity is installed, version singularity version 3.3.0-614.gf0cd4b488
    Testing the function of Singularity...
    Checking singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname
    Singularity Works...
    copy /home/boinc/boinc1/slots/0/shared/ATLAS.root_0
    copy /home/boinc/boinc1/slots/0/shared/RTE.tar.gz
    copy /home/boinc/boinc1/slots/0/shared/input.tar.gz
    copy /home/boinc/boinc1/slots/0/shared/start_atlas.sh
    export ATHENA_PROC_NUMBER=2;start atlas job with
    grep: pandaJobData.out: Datei oder Verzeichnis nicht gefunden
    cmd = singularity exec --pwd /home/boinc/boinc1/slots/0 -B /cvmfs,/home /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 sh start_atlas.sh > runtime_log 2> runtime_log.err
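For completeness, an illustrative check (not from the post) to compare the locally installed singularity with the one shipped on CVMFS:

    # Locally installed version
    singularity version

    # Version provided via CVMFS, used when no local installation is found
    /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity version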
4) Message boards : ATLAS Application : Native app using Singularity from CVMFS
Message 6523 Posted 9 Aug 2019 by gyllic

> The task is currently running, and there should be no reason why it should fail.

It finished successfully: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2797696
5) Message boards : ATLAS Application : Native app using Singularity from CVMFS
Message 6521 Posted 8 Aug 2019 by gyllic

It is looking good! Tested it on a system with no singularity installed. The logs show:

    ...
    Checking for CVMFS
    CVMFS is installed
    OS:cat: /etc/redhat-release: Datei oder Verzeichnis nicht gefunden
    This is not SLC6, need to run with Singularity....
    Checking Singularity...
    Singularity is not installed, using version from CVMFS
    Testing the function of Singularity...
    Checking singularity with cmd:/cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-slc6 hostname
    Singularity Works...
    ...

The task is currently running, and there should be no reason why it should fail.

N.B.: The command

    /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity version

shows 3.2.1-1. The most current version of singularity is 3.3.0.
6) Message boards : ATLAS Application : Tasks testing new pilot version
Message 6384 Posted 21 May 2019 by gyllic

> I've made a change to avoid these connections which will apply to new WU submitted from now. If the problem is confirmed to be fixed I'll apply the changes on the production WU too.

Tested one task (https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2778990) and it is looking good. The logs show no connections to ports that are not mentioned in http://lhcathome.web.cern.ch/test4theory/my-firewall-complaining-which-ports-does-project-use
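As an illustration (not described in the post) of how such a check can be done on the host, the destination ports of new outbound connections can be logged while a task runs:

    # Capture only TCP connection attempts (SYN without ACK); requires root
    sudo tcpdump -n -i any 'tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack == 0'

    # Or take periodic snapshots of established TCP connections
    watch -n 5 'ss -tn state established'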
7) Message boards : ATLAS Application : Tasks testing new pilot version
Message 6381 Posted 20 May 2019 by gyllic

> I already mentioned a firewall issue a while ago at LHC-prod regarding pandaserver.cern.ch, port 25085:

I also see connections to this port, especially at task start-up. The logs (only logged for one task) show connections to aipanda034.cern.ch on port 25085.
8) Message boards : ATLAS Application : Tasks testing new pilot version
Message 6380 Posted 20 May 2019 by gyllic

This task ran and validated successfully: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2778502
Looking good.
9) Message boards : ATLAS Application : Tasks testing new pilot version
Message 6374 Posted 19 May 2019 by gyllic

Same here: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2778227
10) Message boards : Theory Application : New Native App - Linux Only
Message 6098 Posted 26 Feb 2019 by gyllic

Had one of those two days ago: 195 (0x000000C3) EXIT_CHILD_FAILED, https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2754240
I don't think this has anything to do with the problem, but anyway: I did a router restart while the task was running, so temporarily there was no internet access available for the task.
11) Message boards : Theory Application : New Native App - Linux Only
Message 6076 Posted 24 Feb 2019 by gyllic

There are tasks that run for a couple of hundred seconds and some that run for a couple of hours. It's probably the normal fluctuation (different job types, different codes, different event types, ...). With the old vbox app there were also jobs that ran for a couple of hours. It would be odd if the native jobs ran slower than jobs running within a VM.
12) Message boards : Theory Application : Windows Version
Message 6053 Posted 22 Feb 2019 by gyllic

Considering how the new vbox app works, is it still worth using a local proxy server? If yes, are there plans to implement that?
13) Message boards : Theory Application : New Native App - Linux Only
Message 5946 Posted 19 Feb 2019 by gyllic

So far 13/13 tasks with version 4.14 have worked fine. The host is also shown as active in MCPlots (showing 9 jobs at the moment with 0% failure). Regarding the suspend feature, I have experienced the same behaviour as other users have already mentioned.
14) Message boards : Theory Application : New Native App - Linux Only
Message 5935 Posted 18 Feb 2019 by gyllic

> With the new version 4.13 (cranky-0.0.20) I get only errors.

Same here. 4 out of 4 reported the same error:

    16:53:04 2019-02-18: cranky-0.0.20: [INFO] Preparing output.
    tar: local.txt: Cannot stat: No such file or directory
    tar: Exiting with failure status due to previous errors
    17:53:05 (14091): cranky exited; CPU time 3423.660000
    17:53:05 (14091): app exit status: 0x2
    17:53:05 (14091): called boinc_finish(195)

e.g. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752874
15) Message boards : Theory Application : New Native App - Linux Only
Message 5930 Posted 18 Feb 2019 by gyllic

Looks like the queue is empty:

    18.02.2019 14:58:17 | lhcathome-dev | Requesting new tasks for CPU
    18.02.2019 14:58:19 | lhcathome-dev | Scheduler request completed: got 0 new tasks
    18.02.2019 14:58:19 | lhcathome-dev | No tasks sent
    18.02.2019 14:58:19 | lhcathome-dev | No tasks are available for Theory Simulation

So no testing is possible at the moment.
16) Message boards : Theory Application : New Native App - Linux Only
Message 5919 Posted 16 Feb 2019 by gyllic

> This sherpa was running OK (and fast)

Maybe the problem has already been mentioned and a measure implemented, but how are Sherpa jobs handled that run endlessly? Is there some control mechanism in place, and if not, is one planned for the future?
17) Message boards : Theory Application : Native Setup Linux
Message 5870 Posted 12 Feb 2019 by gyllic

> Debian Stretch:

Sorry to be fussy, but the shown command "sudo echo ...." does not work for Debian. Using e.g. sed will work, so I would suggest something like this:

Debian Stretch: Follow the Ubuntu guide and do the following to enable user namespaces. To enable user namespaces for every user permanently, type

    sudo sed -i '$ a\kernel.unprivileged_userns_clone = 1' /etc/sysctl.conf
    sudo sysctl -p

If you want to enable user namespaces only temporarily, type

    sudo su -c 'echo 1 > /proc/sys/kernel/unprivileged_userns_clone'

You will have to execute this command after every system reboot in order to be able to crunch native Theory tasks.

@Laurence (or any admin/moderator): You can delete this comment if you want in order to keep this thread clean.
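For context (an explanation added here, not part of the post): the quoted "sudo echo ..." fails because the output redirection is performed by the calling, unprivileged shell rather than by the command run under sudo. Besides sed, a common alternative is tee:

    # Fails: the shell doing the ">>" redirection has no root privileges
    sudo echo 'kernel.unprivileged_userns_clone = 1' >> /etc/sysctl.conf

    # Works: tee runs under sudo and performs the write itself
    echo 'kernel.unprivileged_userns_clone = 1' | sudo tee -a /etc/sysctl.conf
    sudo sysctl -p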
18) Message boards : Theory Application : New Native App - Linux Only
Message 5858 Posted 11 Feb 2019 by gyllic

Unfortunately, it looks like this fix is only temporary, since after rebooting the system the value in that file is again 0 and the unshare command does not work anymore:

    sudo su -c 'echo 1 > /proc/sys/kernel/unprivileged_userns_clone'

To fix this for good, I adapted the sysctl.conf file so that the value of /proc/sys/kernel/unprivileged_userns_clone is persistently 1. Open the file with

    sudo nano /etc/sysctl.conf

and add the following to the end of that file:

    kernel.unprivileged_userns_clone = 1

Then apply the changes with

    sudo sysctl -p

Using this approach, the value of /proc/sys/kernel/unprivileged_userns_clone stays at 1 after reboots and the unshare command also works "out of the box" after rebooting.
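An equivalent persistent setup (a possible variant, not from the original post; the drop-in file name below is illustrative) is a file under /etc/sysctl.d/, which leaves /etc/sysctl.conf untouched:

    # Create a dedicated drop-in file for the setting
    echo 'kernel.unprivileged_userns_clone = 1' | sudo tee /etc/sysctl.d/90-unprivileged-userns.conf

    # Reload all sysctl configuration files, including the new drop-in
    sudo sysctl --system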
19) Message boards : Theory Application : New Native App - Linux Only
Message 5852 Posted 11 Feb 2019 by gyllic

> Let me know if you figure it out otherwise I will investigate after looking into Opensuse.

I have found a solution, taken from the Debian mailing list archives: "It turns out that the Debian kernel is set up to disable unprivileged users from unsharing the user namespace by default. This can be worked around using":

    sudo su -c 'echo 1 > /proc/sys/kernel/unprivileged_userns_clone'

This did the trick and the application ran successfully on Debian Stretch: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752281
20) Message boards : Theory Application : New Native App - Linux Only
Message 5843 Posted 11 Feb 2019 by gyllic

> Try this command for testing:

The output is:

    unshare: unshare failed: Operation not permitted

Running it with sudo gives a new prompt in the terminal, nobody@debian:, with "nobody" not being my normal user.
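A short note (added here, not from the post): inside a freshly created user namespace, IDs are unmapped until a uid/gid mapping is written, so they show up as the overflow user "nobody". Once unprivileged user namespaces are enabled, the test works without sudo:

    # New user namespace with the current user mapped to root inside it
    unshare -U -r id    # expect: uid=0(root) gid=0(root) ...

    # Without -r no mapping exists yet, so ids appear as nobody/nogroup
    unshare -U id       # expect: uid=65534(nobody) ...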