Message boards : Theory Application : Suspend/Resume



rilian
Joined: 28 Mar 24
Posts: 7
Credit: 12,604
RAC: 62
Message 8393 - Posted: 7 Apr 2024, 2:13:39 UTC - in response to Message 8391.  
Last modified: 7 Apr 2024, 2:13:49 UTC

Please make your computers visible for other volunteers here:
https://lhcathomedev.cern.ch/lhcathome-dev/prefs.php?subset=project

I made them visible.
Eventually, after the other multi-core project tasks had finished, this one ended with an error:

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3316482
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 478
Credit: 394,720
RAC: 174
Message 8394 - Posted: 7 Apr 2024, 6:24:50 UTC - in response to Message 8393.  

Your Ubuntu box does not support virtualization, hence you run Theory native.
Theory native doesn't support suspend/resume out of the box.
There are 2 possible methods to enable it:

1. Use the traditional method together with cgroups v1.
This must be prepared as described in the prod forum.
But this method is deprecated as most Linux distributions now use cgroups v2 as default.

2. Use cgroups v2 (plus sudo >= v1.9.10).
Your sudo version is 1.9.10, but you did not install the required sudoers file '/etc/sudoers.d/50-lhcathome_boinc_theory_native'.
Check the relevant forum posts and your logfiles.
For method 2, the settings from method 1 must not be used. Disable or uninstall them.


It looks like you are currently running a mix of both methods, but neither of them is set up correctly.
As a result, the tasks ignore the pause/resume signals and keep running even when BOINC shows them as suspended.
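To see which of the two methods a host can use at all, a minimal shell sketch along these lines may help. It assumes GNU coreutils (stat -f, sort -V) and that the first line of sudo -V reads "Sudo version X.Y.Z"; the function names are hypothetical helpers, not part of the LHC@home scripts.

```shell
#!/usr/bin/env bash
# Sketch: which suspend/resume method can this host use for Theory native?
# Assumes GNU coreutils (stat -f, sort -V); helper names are hypothetical.

# Return 0 if dotted version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print "v2" if /sys/fs/cgroup is a pure cgroup2 mount, otherwise "v1"
# (legacy or hybrid layout).
cgroup_mode() {
  if [ "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)" = "cgroup2fs" ]; then
    echo v2
  else
    echo v1
  fi
}

# Print the installed sudo version, e.g. "1.9.10" from "Sudo version 1.9.10".
sudo_version() {
  sudo -V 2>/dev/null | awk 'NR == 1 { print $3 }'
}

# Usage:
#   echo "cgroups: $(cgroup_mode), sudo: $(sudo_version)"
#   [ "$(cgroup_mode)" = v2 ] && version_ge "$(sudo_version)" 1.9.10 &&
#     echo "method 2 (cgroups v2 + sudoers file) is possible"
```

Comparing with sort -V rather than plain string comparison matters here: as strings, "1.9.9" sorts after "1.9.10", which would wrongly accept the stock Ubuntu 22.04 sudo.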
rilian
Joined: 28 Mar 24
Posts: 7
Credit: 12,604
RAC: 62
Message 8396 - Posted: 8 Apr 2024, 1:13:32 UTC - in response to Message 8394.  
Last modified: 8 Apr 2024, 1:14:17 UTC

@computezrmle I installed 50-lhcathome_boinc_theory_native as instructed in the other thread: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=633&postid=8162#8162

$ sudo cat /etc/sudoers.d/50-lhcathome_boinc_theory_native
# save this file as '/etc/sudoers.d/50-lhcathome_boinc_theory_native'
# ownership must be 'root:root' and access rights must be '-r--r-----'
# '@includedir /etc/sudoers.d' must be enabled in /etc/sudoers

# regular expressions are enclosed between '^' and '$'
# this is supported since sudo version 1.9.10
# for more information read 'man sudoers'

# the regex patterns given here must match the command arguments in the calling script
# missing/additional arguments or an argument order not in sync causes a command to be rejected

# the commands are permitted for the local group 'boinc'
# ensure the calling user is a member of that group


Cmnd_Alias LHCATHOMEBOINC_01 = /usr/bin/cat ^/etc/sudoers.d/50-lhcathome_boinc_theory_native$
Cmnd_Alias LHCATHOMEBOINC_02 = /usr/bin/systemctl ^(freeze|thaw) Theory_[-a-zA-Z0-9_]+\.scope$
Cmnd_Alias LHCATHOMEBOINC_03 = /usr/bin/systemd-run ^--scope -u [a-zA-Z0-9_-]+ -p BindsTo=[a-zA-Z0-9_\.@-]+ -p After=[a-zA-Z0-9_\.@-]+ --slice-inherit --uid=[a-zA-Z0-9_-]+ --gid=boinc --same-dir -q -G /[a-zA-Z0-9_\./-]+/(runc|runc\.new|runc\.old) --root state run -b cernvm [a-zA-Z0-9_-]+$

%boinc     ALL = (ALL) NOPASSWD: LHCATHOMEBOINC_01, LHCATHOMEBOINC_02, LHCATHOMEBOINC_03
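As the comments in the file say, sudo rejects a command whose arguments do not match the pattern exactly. A rough way to test a pattern offline is to pass the argument string through grep -E, on the assumption that sudo's regex matching behaves like POSIX extended regular expressions. The scope name used below is taken from the task log further down; args_match and check_sudoers_file are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: check an argument string against a sudoers argument regex, and
# check the file requirements listed in the comments of the sudoers file.

# Argument pattern of LHCATHOMEBOINC_02, copied from the sudoers file.
FREEZE_RE='^(freeze|thaw) Theory_[-a-zA-Z0-9_]+\.scope$'

# Return 0 if the argument string $2 matches the extended regex $1.
args_match() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Hypothetical helper (run as root): syntax check, owner root:root,
# mode 0440 (-r--r-----), and the include directive in /etc/sudoers.
# "#includedir" is the older spelling of "@includedir" and also works.
check_sudoers_file() {
  local f=/etc/sudoers.d/50-lhcathome_boinc_theory_native
  visudo -cf "$f" &&
    [ "$(stat -c '%U:%G %a' "$f")" = "root:root 440" ] &&
    grep -Eq '^[@#]includedir /etc/sudoers\.d' /etc/sudoers
}

# Usage:
#   args_match "$FREEZE_RE" "freeze Theory_2743-2787161-48_0.scope" && echo allowed
#   sudo bash -c '. ./this_script.sh && check_sudoers_file && echo "file OK"'
```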


Now all my tasks fail with an error like this one: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3316642



01:09:09 UTC +00:00 2024-04-08: cranky-0.1.4: [INFO] Starting runc container.
01:09:09 UTC +00:00 2024-04-08: cranky-0.1.4: [INFO] To get some details on systemd level run
01:09:09 UTC +00:00 2024-04-08: cranky-0.1.4: [INFO] systemctl status Theory_2743-2787161-48_0.scope
01:09:09 UTC +00:00 2024-04-08: cranky-0.1.4: [INFO] mcplots runspec: boinc pp jets 7000 80,-,1360 - pythia8 8.301 tune-1 100000 48
01:09:09 UTC +00:00 2024-04-08: cranky-0.1.4: [INFO] ----,^^^^,<<<~_____---,^^^,<<~____--,^^,<~__;_
time="2024-04-08T01:09:09Z" level=error msg="runc run failed: fchown fd 7: operation not permitted"

computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 478
Credit: 394,720
RAC: 174
Message 8397 - Posted: 8 Apr 2024, 7:30:21 UTC - in response to Message 8396.  

time="2024-04-08T01:09:09Z" level=error msg="runc run failed: fchown fd 7: operation not permitted"

This is primarily an error reported by runc.

Please post the output of the command below so we can rule out a side effect of an overly strict systemd setting:
systemctl --no-pager show boinc-client |grep -i protect



In addition:
- You are using BOINC version 7.18.1, which was never intended for Linux.
- You upgraded sudo (Ubuntu 22.04 originally shipped with sudo < 1.9.10).
- Your CVMFS configuration does not follow the latest suggestions (see your logs).
rilian
Joined: 28 Mar 24
Posts: 7
Credit: 12,604
RAC: 62
Message 8398 - Posted: 8 Apr 2024, 13:42:55 UTC - in response to Message 8397.  
Last modified: 8 Apr 2024, 13:50:05 UTC

$ systemctl --no-pager show boinc-client |grep -i protect
ProtectClock=no
ProtectKernelTunables=no
ProtectKernelModules=no
ProtectKernelLogs=no
ProtectControlGroups=yes
ProtectHome=yes
ProtectSystem=strict
ProtectProc=default
ProtectHostname=no


I have the latest BOINC provided by Ubuntu; I do not know whether it is intended for Linux or not:
# apt-get install boinc-client
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
boinc-client is already the newest version (7.18.1+dfsg-4).


Yes, Ubuntu 22.04 ships sudo 1.9.9, so I had to compile sudo from source. A later version was available, but I installed 1.9.10 as requested by the LHC app logs.


cranky-0.1.4: [INFO] Can't find '/etc/cvmfs/domain.d/cern.ch.local'.
cranky-0.1.4: [INFO] Can't find '/etc/cvmfs/config.d/cvmfs-config.cern.ch.local'.

I think this is what you are referring to when you say "Your CVMFS configuration does not follow the latest suggestions (see your logs)".

I could not find any documentation on what should be done with these two files.

Thanks for the help.
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 478
Credit: 394,720
RAC: 174
Message 8399 - Posted: 8 Apr 2024, 14:22:20 UTC - in response to Message 8398.  

The following comments are from the original BOINC service file on GitHub:
# The following options prevent setuid root as they imply NoNewPrivileges=true
# Since Atlas requires setuid root, they break Atlas
# In order to improve security, if you're not using Atlas,
# Add these options to the [Service] section of an override file using
# sudo systemctl edit boinc-client.service
#NoNewPrivileges=true
#ProtectKernelModules=true
#ProtectKernelTunables=true
#RestrictRealtime=true
#RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
#RestrictNamespaces=true
#PrivateUsers=true
#CapabilityBoundingSet=
#MemoryDenyWriteExecute=true
#PrivateTmp=true  #Block X11 idle detection


On your system, the security policy is too strict and prevents communication among services required for ATLAS/Theory native:
ProtectControlGroups=yes
ProtectHome=yes
ProtectSystem=strict
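This check can be scripted as a small filter over the systemctl show output. The sketch below only flags the exact values named in this reply (ProtectControlGroups=yes, ProtectHome=yes, ProtectSystem=strict) and ignores every other hardening option; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read "systemctl show" output on stdin, print each setting named in
# the reply as too strict, and exit 1 if any was found (0 otherwise).
too_strict() {
  awk -F= '
    $1 == "ProtectControlGroups" && $2 == "yes"    { print; bad = 1 }
    $1 == "ProtectHome"          && $2 == "yes"    { print; bad = 1 }
    $1 == "ProtectSystem"        && $2 == "strict" { print; bad = 1 }
    END { exit bad }
  '
}

# Usage:
#   systemctl --no-pager show boinc-client | too_strict &&
#     echo "none of the three blocking settings is active"
```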


Follow the comments above and create an override file with this content:
[Service]
ProtectControlGroups=no
ProtectHome=no
ProtectSystem=full
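If you prefer not to use the interactive editor, the same drop-in can be written directly. This sketch assumes the location systemctl edit uses for overrides (/etc/systemd/system/boinc-client.service.d/override.conf); write_boinc_override is a hypothetical helper, and the directory is a parameter only so the function can be tried in a scratch location.

```shell
#!/usr/bin/env bash
# Sketch: non-interactive equivalent of "sudo systemctl edit boinc-client.service"
# with the override content suggested above.
write_boinc_override() {
  local dir=${1:-/etc/systemd/system/boinc-client.service.d}
  mkdir -p "$dir" &&
    cat > "$dir/override.conf" <<'EOF'
[Service]
ProtectControlGroups=no
ProtectHome=no
ProtectSystem=full
EOF
}

# Usage (as root):
#   write_boinc_override && systemctl daemon-reload && systemctl restart boinc-client
```

systemctl daemon-reload makes systemd pick up the new drop-in; the reboot suggested here achieves the same.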

Then reboot and try a fresh Theory native task.
Let's see if this solves it.


As for CVMFS, you may want to look at this post:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5594&postid=48539
rilian
Joined: 28 Mar 24
Posts: 7
Credit: 12,604
RAC: 62
Message 8400 - Posted: 8 Apr 2024, 14:55:22 UTC - in response to Message 8399.  
Last modified: 8 Apr 2024, 15:00:27 UTC

I updated CVMFS; there are no more warnings.

I ran sudo systemctl edit boinc-client.service and entered:
### Anything between here and the comment below will become the new contents of the file

[Service]
ProtectHome=no
ProtectSystem=full
ProtectControlGroups=no

### Lines below this comment will be discarded


Then I rebooted and checked again:

$ systemctl --no-pager show boinc-client |grep -i protect
ProtectClock=no
ProtectKernelTunables=no
ProtectKernelModules=no
ProtectKernelLogs=no
ProtectControlGroups=no
ProtectHome=no
ProtectSystem=full
ProtectProc=default
ProtectHostname=no


But the same issue occurs again. Here is the fresh task: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3316927

runc run failed: fchown fd 7: operation not permitted

* fchown() changes the ownership of the file referred to by the open file descriptor fd.



Should I add any of these?
#NoNewPrivileges=true
#ProtectKernelModules=true
#ProtectKernelTunables=true
#RestrictRealtime=true
#RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
#RestrictNamespaces=true
#PrivateUsers=true
#CapabilityBoundingSet=
#MemoryDenyWriteExecute=true
#PrivateTmp=true  #Block X11 idle detection
rilian
Joined: 28 Mar 24
Posts: 7
Credit: 12,604
RAC: 62
Message 8401 - Posted: 8 Apr 2024, 15:13:44 UTC - in response to Message 8400.  

There is this line in the log:
- the user running this application is a member of the 'boinc' group


I noticed that BOINC does not start automatically after a reboot, so I always start it with `boinc --daemon`, which means it runs under my own user account.

Maybe that's the issue?

I will find a way to start it automatically.
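To check the group requirement from the log, and to let systemd start the client instead of starting it by hand, a sketch like the following may help. It assumes the packaged boinc-client.service runs the client as the 'boinc' user (worth verifying with systemctl cat boinc-client); in_group is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: return 0 if user $1 belongs to group $2 (primary or supplementary).
in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}

# Usage:
#   in_group boinc boinc || echo "user 'boinc' is not in group 'boinc'"
#   in_group "$USER" boinc || sudo usermod -aG boinc "$USER"   # then log in again
#   # Start the packaged service at boot instead of "boinc --daemon":
#   sudo systemctl enable --now boinc-client
```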
rilian
Joined: 28 Mar 24
Posts: 7
Credit: 12,604
RAC: 62
Message 8402 - Posted: 8 Apr 2024, 16:19:16 UTC - in response to Message 8401.  

This is probably the source of the issue:

Starting BOINC client version 7.18.1 for x86_64-pc-linux-gnu
This a development version of BOINC and may not function properly


As I could not figure out the proper way to start it automatically, I'll try another OS and will report back if I see any new issues.


©2024 CERN