Message boards : Theory Application : New Native App - Linux Only

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 5847 - Posted: 11 Feb 2019, 16:28:11 UTC - in response to Message 5843.  
Last modified: 11 Feb 2019, 17:23:03 UTC

Try this command for testing:
unshare -U /bin/bash
The output is
unshare: unshare failed: Operation not permitted

running it with sudo gives a new line in the terminal with
nobody@debian:
with "nobody" not being my normal user


This is related to rootless containers. It is a newish feature so may not work with older kernel versions. Let me know if you figure it out otherwise I will investigate after looking into Opensuse.
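
The test command above can be wrapped in a small non-interactive check. This is only a sketch: the function name `check_userns` is made up for illustration, and it assumes `unshare` from util-linux is installed. It runs `true` instead of `/bin/bash` so the check returns immediately.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current user can create a
# user namespace without root. Same test as `unshare -U /bin/bash`,
# but it runs `true` so nothing interactive is started.
check_userns() {
    if unshare -U true 2>/dev/null; then
        echo "user namespaces: available"
    else
        echo "user namespaces: not permitted for $(id -un)"
    fi
}

check_userns
```

Run it as the user that BOINC runs under, since the result can differ between accounts.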

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 5848 - Posted: 11 Feb 2019, 18:06:30 UTC - in response to Message 5846.  

I guess we are running real scientific simulations, e.g. Herwig++, rather than dummy subtasks, right?
Hence the different runtimes of the tasks.


Yes. I am copying jobs from the production Theory queue. Will look at sending the results back to MCPlots later this week.

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 0
Message 5849 - Posted: 11 Feb 2019, 18:11:12 UTC - in response to Message 5845.  

m wrote:
.... trying to get an app_config to get this host to only download one task at a time, but I can't even get that right now..... says "missing start tag"

An app_config will not control the number of tasks that get downloaded, it controls the number of tasks that run concurrently. If you want to limit the number of tasks that get downloaded, use the "Max # Jobs" setting in your project preferences.

If you still want to use an app_config for another purpose, post it here and maybe someone else here can help debug.
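
For reference, a minimal app_config that limits concurrency could be generated as sketched below. The app name `example_app` is a placeholder, not the real name of the Theory application, and `make_app_config` is a made-up helper. A "missing start tag" message most likely points to malformed XML, for example a file that does not open with `<app_config>`.

```shell
#!/bin/sh
# Print a minimal app_config.xml to stdout.
# $1: app name (placeholder here; use the short name your client reports)
# $2: maximum number of tasks of that app to run at the same time
make_app_config() {
    cat <<EOF
<app_config>
  <app>
    <name>$1</name>
    <max_concurrent>$2</max_concurrent>
  </app>
</app_config>
EOF
}

make_app_config "example_app" 1
```

The output would be saved as app_config.xml in the project's directory under the BOINC data directory, and the client then told to re-read its config files.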


It's OK, thanks. Just more fingers than keys, more keys than brain cells, or so it seems. Would very much like to see if the python 3 problem is fixed but no work now.

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 5850 - Posted: 11 Feb 2019, 18:19:16 UTC - in response to Message 5849.  


It's OK, thanks. Just more fingers than keys, more keys than brain cells, or so it seems. Would very much like to see if the python 3 problem is fixed but no work now.


Just submitted a few more.

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 0
Message 5851 - Posted: 11 Feb 2019, 18:57:05 UTC
Last modified: 11 Feb 2019, 19:05:11 UTC

OK, thanks, got one with this result.

nsenter: failed to unshare namespaces: Operation not permitted
container_linux.go:336: starting container process caused "process_linux.go:279: running exec setns process for init caused \"exit status 46\""
cranky-0.0.14 ERROR: Container 'runc' failed.

Test Commands
[m@TeeC15 ~]$ unshare -U /bin/bash
[nfsnobody@TeeC15 ~]$

and
[m@TeeC15 ~]$ sudo unshare -U /bin/bash
[nfsnobody@TeeC15 m]$

and
[m@TeeC15 ~]$ grep CONFIG_USER_NS /boot/config-$(uname -r)
CONFIG_USER_NS=y
[m@TeeC15 ~]$ CONFIG_USER_NS=y

gyllic
Joined: 10 Mar 17
Posts: 40
Credit: 108,345
RAC: 0
Message 5852 - Posted: 11 Feb 2019, 19:03:06 UTC - in response to Message 5847.  

Let me know if you figure it out otherwise I will investigate after looking into Opensuse.
I have found a solution. Taken from the Debian mail logs:
"It turns out that the Debian kernel is set up to disable unprivileged users from unsharing the user namespace by default. This can be worked around using":

sudo su -c 'echo 1 > /proc/sys/kernel/unprivileged_userns_clone'

This did the trick and the application ran successfully on Debian Stretch:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752281
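
Before changing anything, it may help to see what the switch is currently set to. The sketch below assumes the Debian-specific proc file named above; the helper name `userns_clone_state` is made up, and the optional path argument exists only so the function can be tried against a test file.

```shell
#!/bin/sh
# Print the current value of the Debian-specific switch, or "absent"
# when the kernel does not expose the file at all.
# $1: path to the proc file (optional, overridable for testing).
userns_clone_state() {
    f="${1:-/proc/sys/kernel/unprivileged_userns_clone}"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "absent"
    fi
}

userns_clone_state

# To set the value for the running system (needs root), sysctl can be
# used in place of the echo redirect:
#   sudo sysctl -w kernel.unprivileged_userns_clone=1
```

A value of 0 means unprivileged users may not create user namespaces; "absent" suggests the kernel does not carry this Debian patch and the cause lies elsewhere.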

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 5853 - Posted: 11 Feb 2019, 19:15:50 UTC - in response to Message 5851.  

OK, thanks, got one with this result.

nsenter: failed to unshare namespaces: Operation not permitted
container_linux.go:336: starting container process caused "process_linux.go:279: running exec setns process for init caused \"exit status 46\""
cranky-0.0.14 ERROR: Container 'runc' failed.
Test Commands
[m@TeeC15 ~]$ unshare -U /bin/bash
[nfsnobody@TeeC15 ~]$

and
[m@TeeC15 ~]$ sudo unshare -U /bin/bash
[nfsnobody@TeeC15 m]$

and
[m@TeeC15 ~]$ grep CONFIG_USER_NS /boot/config-$(uname -r)
CONFIG_USER_NS=y
[m@TeeC15 ~]$ CONFIG_USER_NS=y


User m may be privileged. Try with the boinc user.

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 5854 - Posted: 11 Feb 2019, 19:26:19 UTC - in response to Message 5852.  

Let me know if you figure it out otherwise I will investigate after looking into Opensuse.
I have found a solution. Taken from the Debian mail logs:
"It turns out that the Debian kernel is set up to disable unprivileged users from unsharing the user namespace by default. This can be worked around using":

sudo su -c 'echo 1 > /proc/sys/kernel/unprivileged_userns_clone'

This did the trick and the application ran successfully on Debian Stretch:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752281


Great! I will add that to the guide.

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 0
Message 5855 - Posted: 11 Feb 2019, 20:02:52 UTC - in response to Message 5852.  

Let me know if you figure it out otherwise I will investigate after looking into Opensuse.
I have found a solution. Taken from the Debian mail logs:
"It turns out that the Debian kernel is set up to disable unprivileged users from unsharing the user namespace by default. This can be worked around using":

sudo su -c 'echo 1 > /proc/sys/kernel/unprivileged_userns_clone'

This did the trick and the application ran successfully on Debian Stretch:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752281


There is a similar situation in Centos7 (7.3 in my case) here:-

https://groups.io/g/charliecloud/topic/charliecloud_and_centos7/13269608?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,13269608

I haven't tried it yet.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 5857 - Posted: 11 Feb 2019, 20:57:36 UTC

I created a new VM with OS Ubuntu 18.10 on my Windows 7 host and installed BOINC from the packages into that VM.
Using Laurence's new setup info, I got it to work after correcting some things in the setup text:

sudo wget https://lhcathomedev.cern.ch/lhcathome-dev/download/default.localcvmfs.autofs -O /etc/auto.master.d/cvmfs.autof

should be

sudo wget https://lhcathomedev.cern.ch/lhcathome-dev/download/cvmfs.autofs -O /etc/auto.master.d/cvmfs.autof

and had to reboot before "autofs restart"

Tasks ready so far from this new host: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=3717

gyllic
Joined: 10 Mar 17
Posts: 40
Credit: 108,345
RAC: 0
Message 5858 - Posted: 11 Feb 2019, 21:24:07 UTC - in response to Message 5854.  
Last modified: 11 Feb 2019, 21:32:12 UTC

sudo su -c 'echo 1 > /proc/sys/kernel/unprivileged_userns_clone'

This did the trick and the application ran successfully on Debian Stretch:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752281

Great! I will add that to the guide.
Unfortunately, it looks like this fix is only temporary: after rebooting the system the value in that file is 0 again and the unshare command no longer works.
To fix this for good, I edited the sysctl.conf file so that the value of "/proc/sys/kernel/unprivileged_userns_clone" is persistently 1.

Open the file with
sudo nano /etc/sysctl.conf
and add the following to the end of that file
kernel.unprivileged_userns_clone = 1
Then apply the changes with
sudo sysctl -p

With this approach, the value of "/proc/sys/kernel/unprivileged_userns_clone" stays at 1 after reboots, and the unshare command also works "out of the box" after rebooting.
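
An alternative to editing /etc/sysctl.conf directly is a drop-in file under /etc/sysctl.d. The sketch below is one way to do that, not the method used above: the file name `90-userns.conf` and the helper `write_userns_dropin` are hypothetical choices, and the directory argument is there so the function can be tried in a scratch directory first.

```shell
#!/bin/sh
# Write the setting to a drop-in file instead of editing sysctl.conf.
# $1: target directory; defaults to /etc/sysctl.d (writing there needs root).
# The file name 90-userns.conf is an arbitrary, hypothetical choice.
write_userns_dropin() {
    dir="${1:-/etc/sysctl.d}"
    printf 'kernel.unprivileged_userns_clone = 1\n' > "$dir/90-userns.conf"
}

# Example, run as root:
#   write_userns_dropin && sysctl --system
```

`sysctl --system` reloads all system configuration files, so the new value takes effect without a reboot.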

m
Volunteer tester
Joined: 20 Mar 15
Posts: 243
Credit: 886,442
RAC: 0
Message 5859 - Posted: 11 Feb 2019, 23:29:47 UTC - in response to Message 5855.  


There is a similar situation in Centos7 (7.3 in my case) here:-

https://groups.io/g/charliecloud/topic/charliecloud_and_centos7/13269608?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,13269608

I haven't tried it yet.


Well, I now have

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-514.10.2.el7.x86_64 root=/dev/mapper/cl_teec15-root ro rd.lvm.lv=cl_teec15/root rd.lvm.lv=cl_teec15/swap rhgb quiet LANG=en_GB.UTF-8 user_namespace.enable=1 namespace unpriv_enable=1

and

$ cat /etc/sysctl.d/51-userns.conf
user.max_user_namespaces = 32767

but unshare still doesn't work.

$ unshare -U /bin/bash
[nfsnobody@TeeC15 ~]$


Short of updating the kernel, I don't know what else to do.

computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 1
Message 5860 - Posted: 12 Feb 2019, 6:38:38 UTC - in response to Message 5859.  

... $ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-514.10.2.el7.x86_64 root=/dev/mapper/cl_teec15-root ro rd.lvm.lv=cl_teec15/root rd.lvm.lv=cl_teec15/swap rhgb quiet LANG=en_GB.UTF-8 user_namespace.enable=1 namespace unpriv_enable=1

There may be a missing ".".
You may try "namespace.unpriv_enable=1" instead of "namespace unpriv_enable=1".
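
A typo like this is easy to miss by eye, so a word-by-word check of the command line may help. The sketch below only tests whether the two parameter strings from this thread appear as separate words; it does not tell you whether a given kernel actually honours them. The helper name `check_cmdline` is made up.

```shell
#!/bin/sh
# Report which of the expected parameters appear as separate words on a
# kernel command line.
# $1: command line text; defaults to the running kernel's /proc/cmdline.
check_cmdline() {
    line="${1:-$(cat /proc/cmdline 2>/dev/null)}"
    for want in user_namespace.enable=1 namespace.unpriv_enable=1; do
        case " $line " in
            *" $want "*) echo "found:   $want" ;;
            *)           echo "missing: $want" ;;
        esac
    done
}

check_cmdline
```

Fed the command line quoted above, it would report `namespace.unpriv_enable=1` as missing, because the space splits it into two unrelated words.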

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 5861 - Posted: 12 Feb 2019, 8:35:07 UTC - in response to Message 5859.  
Last modified: 12 Feb 2019, 8:35:47 UTC


There is a similar situation in Centos7 (7.3 in my case) here:-

Short of updating the kernel, I don't know what else to do.


I think this is what you need to do. From the 7.4 release notes:
User namespace is now fully supported
The user namespace feature, previously available as a Technology Preview, is now fully supported. It provides additional security to servers running Linux containers by providing better isolation between the host and the containers. Administrators of a container are no longer able to perform administrative operations on the host, which increases security. (BZ#1138782)


Upgrade to 7.6 if you can.

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 5863 - Posted: 12 Feb 2019, 9:08:00 UTC - in response to Message 5857.  

I created a new VM with OS Ubuntu 18.10 on my Windows 7 host and installed BOINC from the packages into that VM.
Using Laurence's new setup info, I got it to work after correcting some things in the setup text:

sudo wget https://lhcathomedev.cern.ch/lhcathome-dev/download/default.localcvmfs.autofs -O /etc/auto.master.d/cvmfs.autof

should be

sudo wget https://lhcathomedev.cern.ch/lhcathome-dev/download/cvmfs.autofs -O /etc/auto.master.d/cvmfs.autof

and had to reboot before "autofs restart"

Tasks ready so far from this new host: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=3717


Thanks for the feedback. I have updated the guide.

computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 1
Message 5868 - Posted: 12 Feb 2019, 11:50:35 UTC

I'm currently running a Theory native task on opensuse 13.1:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752313

At first I got some errors as runc requested a newer version of libseccomp.so.2.
I copied that lib from opensuse 42.3 and adjusted the library path.

Now it works.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 5869 - Posted: 12 Feb 2019, 12:01:53 UTC - in response to Message 5846.  
Last modified: 12 Feb 2019, 12:04:04 UTC

You can find the job log in a subdirectory of the slot directory. Doing a hard link should make it stay available after the job ends. If you let me know what info you would like to see, I can make it available.

I found a method (not a Linux expert) to monitor the job output for a running job.

The first line shows what kind of job is running, including the number of events to process (second-to-last number):

head -n 1 /var/lib/boinc-client/slots/0/cernvm/shared/runRivet.log

and for a continued look at the last n lines:

watch -n 10 tail -n 15 /var/lib/boinc-client/slots/0/cernvm/shared/runRivet.log

where you can adjust the slot number, the number of lines and/or the watch interval in seconds.
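
The two commands can be generalised so that the slot number does not have to be known in advance. This is a sketch under the assumption that every native Theory task keeps its log at the same relative path as above; `list_jobs` and `keep_log` are made-up helper names. `keep_log` follows the hard-link suggestion quoted at the top of this post.

```shell
#!/bin/sh
# Print the first line of every runRivet.log found under the slots directory.
# $1: slots directory; the default matches the path used above.
list_jobs() {
    slots="${1:-/var/lib/boinc-client/slots}"
    for log in "$slots"/*/cernvm/shared/runRivet.log; do
        [ -f "$log" ] || continue
        printf '%s: %s\n' "$log" "$(head -n 1 "$log")"
    done
}

# Hard-link a log so its contents stay reachable after the slot is cleaned.
# $1: log file, $2: new name; both must be on the same filesystem.
keep_log() {
    ln "$1" "$2"
}

list_jobs
```

If no task is running, or the slots directory is not readable by the current user, `list_jobs` simply prints nothing.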

captainjack
Joined: 18 Aug 15
Posts: 14
Credit: 125,335
RAC: 0
Message 5871 - Posted: 12 Feb 2019, 16:26:55 UTC

Earlier I wrote:
I managed to snag another one of these tasks this morning. It has been running for 58 minutes. Even though it is allocated 2 CPUs, it looks like it is only using 1 CPU.

Please let me know if you need more information.


Laurence responded:

This may be related to it starting two processes.


The latest attempt:

The machine has a 6 core 12 thread CPU. Each thread should show ~8% of total CPU usage according to the System Monitor.

The machine now has a test Theory task running that is allocated 4 CPUs (threads). According to the System Monitor, the processes that I can see that look like they are associated with the test Theory task are rivetvm.exe using 5% of the total CPU and pythia8.exe using 4% of the total CPU. The task has been running for 25+ minutes. Even though it is allocated 4 threads, it looks like it is really only using 1 thread.

Am I missing something?
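
The arithmetic behind that estimate can be written down as follows. On a 12-thread machine one fully busy thread is 100 / 12, or about 8.3% of total CPU, so the combined 5% + 4% corresponds to roughly one thread. The helper name `threads_used` is made up for illustration.

```shell
#!/bin/sh
# Convert a share of total CPU (as shown by a system monitor) into an
# approximate number of busy hardware threads.
# $1: percent of total CPU used, $2: number of hardware threads.
threads_used() {
    awk -v pct="$1" -v n="$2" 'BEGIN { printf "%.2f\n", pct * n / 100 }'
}

# rivetvm.exe at 5% plus pythia8.exe at 4% on a 12-thread CPU
threads_used 9 12   # prints 1.08
```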

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 5872 - Posted: 12 Feb 2019, 17:32:32 UTC - in response to Message 5871.  
Last modified: 12 Feb 2019, 17:39:30 UTC

Am I missing something?
As far as I understand, the "# cores" setting in your preferences is used to run more jobs within a VM (one job per VM core).
Since we are testing Theory native, every BOINC task will run only 1 job.
If you want to run more jobs, select the number of tasks you want in your preferences or set No Limit.

If you have set e.g. 2 cores, BOINC will allocate 2 cores for 1 job and this single job will run a bit faster, but will leave a core partially idle.

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 5873 - Posted: 12 Feb 2019, 21:49:57 UTC - in response to Message 5838.  

The python command does not seem to be consistent between different operating systems. I am relying on just python being there now, but this is not true for Ubuntu, where the default seems to be python3.


I think that I might rewrite cranky in bash to be more portable.
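
As an illustration of the portability problem, and not a description of how cranky handles it, a shell wrapper could probe for whichever interpreter name exists. The helper name `find_python` and the order of the candidates are assumptions.

```shell
#!/bin/sh
# Print the first Python interpreter found on PATH, trying the plain
# "python" name first. Returns non-zero when none is found.
find_python() {
    for name in python python3 python2; do
        if command -v "$name" >/dev/null 2>&1; then
            command -v "$name"
            return 0
        fi
    done
    return 1
}

find_python || echo "no python interpreter found" >&2
```

A probe like this still leaves the script itself needing to run under both Python 2 and Python 3, which is one reason a rewrite in bash is attractive.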


©2024 CERN