New Native App - Linux Only
Author | Message |
---|---|
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
I've no idea what happened here. For whatever reason the python3 command doesn't exist after installing Python 3.6; it is python36 instead. I have managed to get it working on CentOS7 with some fiddling, but I will try to improve the setup so that this is not necessary. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
From this:
User namespaces are enabled in the kernel, which you can check with the following:
grep CONFIG_USER_NS /boot/config-$(uname -r)
CONFIG_USER_NS=y
However, by default the maximum number of namespaces is set to 0. To fix this, run:
echo 640 > /proc/sys/user/max_user_namespaces |
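The two checks quoted above (kernel built with CONFIG_USER_NS=y, and a non-zero max_user_namespaces limit) can be combined into one diagnosis. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the kernel config lives at /boot/config-<release>, as in the grep command above.

```python
import platform
import re


def user_ns_compiled_in(kernel_config_text):
    """Return True if the kernel config text contains the line CONFIG_USER_NS=y."""
    return re.search(r"^CONFIG_USER_NS=y$", kernel_config_text, re.MULTILINE) is not None


def user_ns_limit_ok(max_user_namespaces_text):
    """Return True if the value read from max_user_namespaces is above zero."""
    return int(max_user_namespaces_text.strip()) > 0


def diagnose(kernel_config_text, max_user_namespaces_text):
    """Summarise the two checks as a short message (hypothetical helper)."""
    if not user_ns_compiled_in(kernel_config_text):
        return "kernel built without user namespaces"
    if not user_ns_limit_ok(max_user_namespaces_text):
        return "user namespaces compiled in, but the limit is 0"
    return "ok"


def check_host():
    """Read both files on the running host; paths are assumed from the post above."""
    try:
        with open("/boot/config-" + platform.release()) as config_file:
            config_text = config_file.read()
        with open("/proc/sys/user/max_user_namespaces") as limit_file:
            limit_text = limit_file.read()
    except OSError as exc:
        return "cannot check: {0}".format(exc)
    return diagnose(config_text, limit_text)
```

Calling check_host() on the machine in question returns one of the three messages, or a "cannot check" message if either file is missing (as reported later in this thread for an older kernel).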
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
I have cloned and built the latest runc code (from https://github.com/opencontainers/runc) on Debian Stretch, and the task produces this error message: Application side, I will fix this. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
cranky-0.0.13 ERROR: 'runc spec version < 1.1 I am testing with openSUSE now. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Application side, I will fix this.
OK, I have removed the need for python3, and the runc command is now taken from CVMFS. The guide for CentOS7 is here. Please give it a try and let me know how it goes. Cheers, Laurence |
Send message Joined: 18 Aug 15 Posts: 14 Credit: 125,335 RAC: 0 |
I managed to snag another one of these tasks this morning. It has been running for 58 minutes. Even though it is allocated 2 CPUs, it looks like it is only using 1 CPU. Please let me know if you need more information. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
Application side, I will fix this.
Before changing SL6.9 to SL6.10 or SL7.6: this is without the Python3 and runc errors, on SL6.9.
<core_client_version>7.5.1</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
13:19:19 (27657): wrapper (7.7.26015): starting
13:19:19 (27657): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.14 ()
cranky-0.0.14 INFO: Starting
Traceback (most recent call last):
  File "../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.14", line 157, in <module>
    logging.info("Detected {} App".format(app))
ValueError: zero length field name in format
13:19:20 (27657): cranky exited; CPU time 0.047992
13:19:20 (27657): app exit status: 0x1
13:19:20 (27657): called boinc_finish(195)
</stderr_txt>
]]>
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752250 |
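The ValueError in the traceback above is what Python 2.6 raises for auto-numbered "{}" fields; str.format only accepts them from Python 2.7 and 3.1 onwards. Assuming the SL6 host runs cranky under its default Python 2.6 (the post does not say so explicitly), a sketch of two spellings that work on old and new interpreters alike could look like this. The value of app is a placeholder, not taken from cranky.

```python
import logging

logging.basicConfig(format="%(message)s", level=logging.INFO)

app = "Theory"  # placeholder value for illustration only

# Auto-numbered fields fail on Python 2.6 with
# "ValueError: zero length field name in format":
#     "Detected {} App".format(app)

# Option 1: explicit field index, accepted by Python 2.6 and later.
message_indexed = "Detected {0} App".format(app)
logging.info(message_indexed)

# Option 2: let the logging module do the %-style interpolation itself.
logging.info("Detected %s App", app)
```

Option 2 has the side benefit that the string is only built if the message is actually emitted at the configured log level.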
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
By default the maximum number of namespaces is set to 0. To fix this run:
I think that this only applies to kernel versions 644 and up; mine is 514, so it should be OK. I had tried this:
sudo sysctl user.max_user_namespaces=15000
and got "... /proc/sys/user/max_user_namespaces" with "no such file or directory", so I assumed it didn't apply yet.
From here:
For background information, see this bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1340238
runc here is version 1.0.0. If the setup is to be done from CVMFS, do I need to remove the installed version? I don't see why I can't get "python3" to go to "python3.6", but if it's to be fixed that's great; lots to learn. The host is running an Atlas native at the moment, so I'll need to wait for it to finish... the job starts again from the beginning if I even think about suspending it... (that's something else that needs sorting out...) |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
I managed to snag another one of these tasks this morning. It has been running for 58 minutes. Even though it is allocated 2 CPUs, it looks like it is only using 1 CPU. This may be related to it starting two processes. |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
1st task finished successfully on my opensuse 42.3 system: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752265
Environment info:
python --version
Python 2.7.13
python3 --version
Python 3.4.6
runc --version
runc version spec: 1.0.0-rc2-dev
2nd task is currently in progress. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Thanks for testing but it will not work on SL6 due to the lack of user namespaces. I would recommend using CentOS 7 rather than SL7. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Thanks, that is interesting information.
It should not matter if it is already on the system. The binary from CVMFS will be used.
The python command does not seem to be consistent between different operating systems. I am relying on just python being there now, but this is not true for Ubuntu, where the default seems to be python3. |
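Since the interpreter command differs between distributions (python, python3, python36 and python3.6 all appear in this thread), one way to cope is to probe a list of candidate names and use the first one found on the PATH. This is a minimal sketch, not what cranky does; the candidate list and the function name are assumptions for illustration.

```python
import shutil

# Candidate command names seen in this thread, in an arbitrary order of preference.
CANDIDATES = ("python3", "python3.6", "python36", "python")


def find_interpreter(candidates=CANDIDATES, which=shutil.which):
    """Return (name, path) for the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return name, path
    return None


if __name__ == "__main__":
    found = find_interpreter()
    if found is None:
        print("no Python interpreter found under the candidate names")
    else:
        print("using {0} at {1}".format(*found))
```

The which parameter exists only so the lookup can be replaced in tests; shutil.which itself requires Python 3.3 or later.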
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
1st task finished successfully on my opensuse 42.3 system: Great!
This we don't need anymore. I will add the OpenSuse guide here later once I have managed to run it myself from a freshly installed machine. |
Send message Joined: 10 Mar 17 Posts: 40 Credit: 108,345 RAC: 0 |
Now it shows the user namespace error also on Debian Stretch:
cranky-0.0.14 INFO: Starting
cranky-0.0.14 INFO: Detected Theory App
cranky-0.0.14 INFO: Checking CVMFS.
cranky-0.0.14 INFO: Checking runc.
cranky-0.0.14 INFO: Creating the filesystem.
cranky-0.0.14 INFO: Using /cvmfs/cernvm-prod.cern.ch/cvm3
cranky-0.0.14 INFO: Updating config.json.
cranky-0.0.14 INFO: Running Container 'runc'.
nsenter: failed to unshare user namespace: Operation not permitted
container_linux.go:336: starting container process caused "process_linux.go:279: running exec setns process for init caused \"exit status 39\""
cranky-0.0.14 ERROR: Container 'runc' failed.
13:52:12 (2366): cranky exited; CPU time 0.212000
13:52:12 (2366): app exit status: 0xce
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752261
The "/proc/sys/user/max_user_namespaces" file shows "31300". Also, the flag in the .config file used for kernel building shows "CONFIG_USER_NS=y". Do you know of any way to fix this problem on Debian Stretch (kernel 4.9)? |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Now it shows the user namespace error also on Debian Stretch: Try this command for testing: unshare -U /bin/bash |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
I guess we are running real scientific simulations, e.g. Herwig++, rather than dummy subtasks, right? Hence the different runtimes of the tasks. Could the stderr.txt be made more verbose to show the scientific app? Would be helpful in case of errors. |
Send message Joined: 10 Mar 17 Posts: 40 Credit: 108,345 RAC: 0 |
Try this command for testing:
The output is:
unshare: unshare failed: Operation not permitted
Running it with sudo gives a new line in the terminal with nobody@debian: with "nobody" not being my normal user. |
Send message Joined: 20 Mar 15 Posts: 243 Credit: 886,442 RAC: 0 |
On Centos7 python gets you python2.7, which the rest of the system needs. After simply installing python3.X you seem to need "python3.x" (note the dot); "python3x" doesn't work, not for me, anyway.
/usr/bin/env python
Python 2.7.5 (default, Nov 6 2016, 00:28:07) [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
/usr/bin/env python3
/usr/bin/env: python3: No such file or directory
/usr/bin/env python3.6
Python 3.6.7 (default, Dec 5 2018, 15:02:05) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
.... trying to get an app_config to get this host to only download one task at a time, but I can't even get that right now..... says "missing start tag", looks OK to me; aaaaarrrgggh. |
Send message Joined: 18 Aug 15 Posts: 14 Credit: 125,335 RAC: 0 |
m wrote: .... trying to get an app_config to get this host to only download one task at a time, but I can't even get that right now..... says "missing start tag"
An app_config will not control the number of tasks that get downloaded; it controls the number of tasks that run concurrently. If you want to limit the number of tasks that get downloaded, use the "Max # Jobs" setting in your project preferences. If you still want to use an app_config for another purpose, post it here and maybe someone else here can help debug. |
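For a "missing start tag" complaint, a quick first step is to check that the app_config.xml is well-formed before handing it to the client. The sketch below uses Python's standard XML parser for that; the helper name is invented, and the app name "Theory" is a placeholder that may not match the short name this project uses. Note that the BOINC client has its own XML parser, so passing this check is necessary but may not be sufficient.

```python
import xml.etree.ElementTree as ET

# Example content; "Theory" is a placeholder app name, not confirmed for this project.
APP_CONFIG = """<app_config>
  <app>
    <name>Theory</name>
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>
"""


def check_app_config(xml_text):
    """Return "ok" or a short description of the first problem found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return "not well-formed: {0}".format(exc)
    if root.tag != "app_config":
        return "root element is <{0}>, expected <app_config>".format(root.tag)
    for app in root.findall("app"):
        if app.find("name") is None:
            return "<app> element without <name>"
    return "ok"


if __name__ == "__main__":
    print(check_app_config(APP_CONFIG))
```

As the reply above points out, max_concurrent limits how many tasks run at once, not how many are downloaded.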
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
I guess we are running real scientific simulations, e.g. Herwig++, rather than dummy subtasks, right?
You can find the job log in a subdirectory of the slot directory. Creating a hard link to it should keep it available after the job ends. If you let me know what info you would like to see, I can make it available. |
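The hard-link suggestion above could be scripted roughly as follows. This is a sketch only: the function name is invented, and the actual location and name of the job log inside the slot directory are not given in this thread, so both paths must be supplied by the caller. A hard link only works when the source and target are on the same filesystem.

```python
import os


def keep_job_log(log_path, keep_dir):
    """Hard-link log_path into keep_dir so the data survives slot cleanup."""
    os.makedirs(keep_dir, exist_ok=True)
    target = os.path.join(keep_dir, os.path.basename(log_path))
    if not os.path.exists(target):
        # Both names now point at the same file; removing one leaves the other.
        os.link(log_path, target)
    return target
```

Because the link shares the file with the running job, the kept copy keeps growing while the job writes to it and remains readable after the slot directory is emptied.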
©2024 CERN