Message boards : Theory Application : New native version v5.60
*"onmouseover="alert(&...
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 8136 - Posted: 21 Aug 2023, 9:59:25 UTC

This new native version aims to support cgroups v2. It is currently a work in progress, so feedback is more than welcome.
maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 420
Message 8137 - Posted: 21 Aug 2023, 15:12:38 UTC
Last modified: 21 Aug 2023, 15:55:37 UTC

computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 8138 - Posted: 21 Aug 2023, 16:26:09 UTC - in response to Message 8136.  

../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.33: line 159: [: d: unary operator expected

This is caused by a missing "-".

Replace cranky line 159:
`if [ d /sys/fs/cgroup/freezer/boinc ]; then`

with:
`if [ -d /sys/fs/cgroup/freezer/boinc ]; then`
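For context, a version-aware check could look like the sketch below. It is not cranky code: the cgroup v1 path is the one from cranky, while the cgroup v2 location `/sys/fs/cgroup/boinc` and the function names are assumptions. It recognises a cgroup v2 mount by the `cgroup.controllers` file that only exists at the root of the unified hierarchy, and it uses `-d`/`-f` tests throughout.

```bash
#!/usr/bin/env bash
# Sketch: find the freezer interface of the "boinc" cgroup on cgroup v1 or v2.
# The v1 path comes from cranky; the v2 path /sys/fs/cgroup/boinc is an assumption.

# cgroup_version [ROOT]: print "v2" if ROOT (default /sys/fs/cgroup) is a
# cgroup v2 mount, else "v1". Only a v2 root has a cgroup.controllers file.
cgroup_version() {
    local root="${1:-/sys/fs/cgroup}"
    if [ -f "$root/cgroup.controllers" ]; then
        echo v2
    else
        echo v1
    fi
}

# freezer_file [ROOT]: print the file used to freeze/thaw the boinc cgroup,
# or return 1 if it does not exist.
freezer_file() {
    local root="${1:-/sys/fs/cgroup}"
    if [ "$(cgroup_version "$root")" = v2 ]; then
        # v2: each non-root cgroup has its own cgroup.freeze file
        if [ -f "$root/boinc/cgroup.freeze" ]; then
            echo "$root/boinc/cgroup.freeze"
            return 0
        fi
    elif [ -d "$root/freezer/boinc" ]; then
        # v1: "-d" is the directory test that was missing on line 159
        echo "$root/freezer/boinc/freezer.state"
        return 0
    fi
    return 1
}
```

On a hybrid layout, where cgroup v2 is only mounted below `/sys/fs/cgroup/unified`, `cgroup_version` reports `v1` and the v1 freezer path is used.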
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 8139 - Posted: 21 Aug 2023, 16:47:19 UTC

This link might be useful:
https://systemd.io/CGROUP_DELEGATION/

Among other hints, it states:
Avoid "/sys/fs/cgroup/unified/".

Check if "Delegate=" needs to be added to the boinc-client.service file.
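If delegation does turn out to be needed, one way to add `Delegate=` without editing the packaged unit file is a systemd drop-in. A minimal sketch, assuming the unit is called `boinc-client.service`; the drop-in name `delegate.conf` and the function name are hypothetical, and `systemctl edit boinc-client` achieves the same interactively.

```bash
#!/usr/bin/env bash
# Sketch: add Delegate=yes to boinc-client.service via a drop-in file.
# Whether cranky needs delegation is still open; "delegate.conf" is hypothetical.

# write_delegate_dropin [UNITDIR]: UNITDIR defaults to /etc/systemd/system.
write_delegate_dropin() {
    local unitdir="${1:-/etc/systemd/system}"
    local dropin_dir="$unitdir/boinc-client.service.d"
    mkdir -p "$dropin_dir" || return 1
    # Delegate= belongs in the [Service] section of the unit
    cat > "$dropin_dir/delegate.conf" <<'EOF'
[Service]
Delegate=yes
EOF
}

# Usage (as root), then reload systemd and restart the client:
#   write_delegate_dropin
#   systemctl daemon-reload
#   systemctl restart boinc-client
```

A drop-in survives package upgrades that replace the unit file itself.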
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 8140 - Posted: 22 Aug 2023, 7:39:44 UTC

Just did some generic tests using systemd to control freeze/thaw.

Run a user program as a service:
`systemd-run --unit foobar_0815 --user sleep 30`

Freeze it:
`systemctl --user freeze foobar_0815.service`

Get the status:
`systemctl --user list-units | grep foobar`
`systemctl --user status foobar_0815.service`

Thaw it:
`systemctl --user thaw foobar_0815.service`

I would prefer using systemd within cranky since it is the standard way to control processes on recent Linux systems.
There are a couple of options that need to be tested (see `man systemd-run`) to ensure the started containers run as expected (e.g. within the right slice) and that the accounting information gets back to BOINC.
Will do some more tests in the afternoon/evening.
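The freeze/thaw steps above could be wrapped in shell functions roughly like this. It is a sketch, not cranky code: the slice name, the unit naming, and the `SYSTEMD_RUN`/`SYSTEMCTL` override variables (handy for dry runs) are hypothetical. `--slice` and `-p CPUAccounting=yes` are standard `systemd-run` options, and `FreezerState` and `CPUUsageNSec` are unit properties readable with `systemctl show`; whether CPU usage is reported for user units may depend on the controllers delegated to the user manager.

```bash
#!/usr/bin/env bash
# Sketch: systemd-based start/freeze/thaw helpers for a cranky-like script.
# Slice/unit names and the SYSTEMD_RUN/SYSTEMCTL overrides are hypothetical.

# start_unit NAME SLICE COMMAND [ARGS...]: run COMMAND as a transient user
# service inside SLICE with CPU accounting enabled.
start_unit() {
    local name="$1" slice="$2"
    shift 2
    "${SYSTEMD_RUN:-systemd-run}" --user --unit "$name" --slice "$slice" \
        -p CPUAccounting=yes "$@"
}

# freeze_unit / thaw_unit NAME: pause or resume all processes of the unit.
freeze_unit() { "${SYSTEMCTL:-systemctl}" --user freeze "$1.service"; }
thaw_unit()   { "${SYSTEMCTL:-systemctl}" --user thaw "$1.service"; }

# freezer_state NAME: print the unit's FreezerState property.
freezer_state() {
    "${SYSTEMCTL:-systemctl}" --user show -p FreezerState --value "$1.service"
}

# cpu_usage_ns NAME: print CPU time used by the unit in nanoseconds,
# a value that could be handed back to BOINC.
cpu_usage_ns() {
    "${SYSTEMCTL:-systemctl}" --user show -p CPUUsageNSec --value "$1.service"
}
```

For example: `start_unit foobar_0815 boinc.slice sleep 30`, then `freeze_unit foobar_0815` and `freezer_state foobar_0815` to confirm the state, and finally `thaw_unit foobar_0815`.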
*"onmouseover="alert(&...
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 8141 - Posted: 22 Aug 2023, 8:12:28 UTC - in response to Message 8139.  

Thanks for the feedback. I have fixed the issue and made some changes, so hopefully suspend/resume will now work.
maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 420
Message 8142 - Posted: 22 Aug 2023, 8:34:12 UTC - in response to Message 8141.  

Thanks Laurence. :-))
*"onmouseover="alert(&...
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 8143 - Posted: 22 Aug 2023, 8:51:09 UTC - in response to Message 8142.  

It is broken. Debugging.
maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 420
Message 8144 - Posted: 22 Aug 2023, 9:09:02 UTC - in response to Message 8143.  

https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=2326389
This task is running with version 5.70 on the COS9-VM.
*"onmouseover="alert(&...
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 8145 - Posted: 22 Aug 2023, 9:19:37 UTC - in response to Message 8144.  

Looks like the issue was on my machine.
maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 420
Message 8146 - Posted: 23 Aug 2023, 5:25:25 UTC - in response to Message 8145.  
Last modified: 23 Aug 2023, 5:26:20 UTC

Laurence,
on the CentOS9-VM (BOINC 7.20.2), the file output.tgz in the slot folder is NOT deleted after the task has finished,
so the next task creates a new folder in slots.
On the CentOS7-VM this works correctly.
*"onmouseover="alert(&...
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 8147 - Posted: 23 Aug 2023, 7:24:42 UTC - in response to Message 8146.  

Do you have any idea why the file is not deleted? I do not see the same behavior on my machine (Ubuntu 23.04).
*"onmouseover="alert(&...
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 8148 - Posted: 23 Aug 2023, 7:30:20 UTC - in response to Message 8147.  

For suspend/resume to work, you will need to use a new cgroups script:

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4971

`sudo wget http://lhcathome.cern.ch/lhcathome/download/create-boinc-cgroup2 -O /sbin/create-boinc-cgroup`

Please restart the client after downloading this file.

`sudo systemctl restart boinc-client`
maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 420
Message 8149 - Posted: 23 Aug 2023, 17:29:40 UTC - in response to Message 8147.  

This CentOS9-VM was created under VirtualBox version 7.0.6 r155176 (Qt5.15.2).
maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 420
Message 8150 - Posted: 24 Aug 2023, 5:10:31 UTC - in response to Message 8148.  

I have done this on this CentOS9-VM: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4690
maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 420
Message 8151 - Posted: 24 Aug 2023, 6:24:38 UTC - in response to Message 8148.  

This is the log after a pause/resume with the cgroup2 script installed.
<core_client_version>7.20.2</core_client_version>
<![CDATA[
<stderr_txt>
07:08:15 (3608): wrapper (7.15.26016): starting
07:08:15 (3608): wrapper (7.15.26016): starting
07:08:15 (3608): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.35 ()
07:08:15 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Detected Theory App
07:08:15 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Checking CVMFS.
07:08:21 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Checking runc.
07:08:21 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Creating the filesystem.
07:08:21 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
mkdir: cannot create directory ‘/sys/fs/cgroup/unified’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/unified’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/unified’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/unified’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/unified’: Read-only file system
07:08:21 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Updating config.json.
07:08:21 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Running Container 'runc'.
07:08:23 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] ===> [runRivet] Thu Aug 24 05:08:21 UTC 2023 [boinc pp jets 8000 170,-,2960 - pythia8 8.176 default 100000 534]
07:26:48 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Pausing container Theory_2390-1131463-534_0.
../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.35: line 150: [: missing `]'
../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.35: line 150: -d: command not found
07:26:48 CEST +02:00 2023-08-24: cranky-0.0.35: [WARNING] Cannot pause container as /sys/fs/cgroup/freezer/boinc/freezer.state or /sys/fs/cgroup/freezer/boinc do not exist.
07:26:53 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Resuming container Theory_2390-1131463-534_0.
container not paused
08:13:17 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Container 'runc' finished with status code 0.
08:13:17 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Preparing output.
08:13:17 (3608): cranky exited; CPU time 3729.908138
08:13:17 (3608): called boinc_finish(0)
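The warning shows that this cranky version still looked for the cgroup v1 `freezer.state` interface. Under cgroup v2 the kernel exposes freezing per cgroup instead: writing `1` to `cgroup.freeze` freezes every process in the group, writing `0` thaws it, and `cgroup.events` reports `frozen 1` once the freeze has completed. A minimal sketch, assuming the container's processes live in a cgroup v2 directory passed as an argument (for example a hypothetical `/sys/fs/cgroup/boinc`); the function names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: pause/resume a cgroup v2 group through the kernel interface.
# The cgroup directory (e.g. /sys/fs/cgroup/boinc) is an assumption.

# set_frozen CGROUP_DIR 0|1: request thaw (0) or freeze (1).
set_frozen() {
    local cg="$1" want="$2"
    case "$want" in
        0|1) ;;
        *) echo "state must be 0 or 1" >&2; return 1 ;;
    esac
    if [ ! -f "$cg/cgroup.freeze" ]; then
        echo "no cgroup.freeze in $cg (not a cgroup v2 group?)" >&2
        return 1
    fi
    echo "$want" > "$cg/cgroup.freeze"
}

# wait_frozen CGROUP_DIR 0|1 [SECONDS]: poll cgroup.events once per second
# (default 10 tries) until it reports the requested frozen state.
wait_frozen() {
    local cg="$1" want="$2" tries="${3:-10}"
    while [ "$tries" -gt 0 ]; do
        grep -qx "frozen $want" "$cg/cgroup.events" && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

pause_cgroup()  { set_frozen "$1" 1 && wait_frozen "$1" 1; }
resume_cgroup() { set_frozen "$1" 0 && wait_frozen "$1" 0; }
```

Waiting on `cgroup.events` matters because the write to `cgroup.freeze` only requests the freeze; the processes may take a moment to stop.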
[AF>Le_Pommier] Jerome_C2005
Joined: 17 Mar 15
Posts: 51
Credit: 602,329
RAC: 0
Message 8154 - Posted: 1 Sep 2023, 9:26:20 UTC
Last modified: 1 Sep 2023, 9:26:47 UTC

Hello

All tasks are failing on this Linux Debian 10 host:

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
05:12:28 (11646): wrapper (7.15.26016): starting
05:12:28 (11646): wrapper (7.15.26016): starting
05:12:28 (11646): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.35 ()
05:12:28 CEST +02:00 2023-09-01: cranky-0.0.35: [INFO] Detected Theory App
05:12:28 CEST +02:00 2023-09-01: cranky-0.0.35: [INFO] Checking CVMFS.
05:12:35 CEST +02:00 2023-09-01: cranky-0.0.35: [INFO] Checking runc.
/cvmfs/grid.cern.ch/vc/containers/runc.new: symbol lookup error: /cvmfs/grid.cern.ch/vc/containers/runc.new: undefined symbol: seccomp_api_get
05:12:35 CEST +02:00 2023-09-01: cranky-0.0.35: [ERROR] 'runc -v' failed.

05:12:36 (11646): cranky exited; CPU time 0.730365
05:12:36 (11646): app exit status: 0xce
05:12:36 (11646): called boinc_finish(195)

</stderr_txt>
]]>

It started failing with v5.60 and fails the same way with v5.70. It was running fine with v5.21, but it seems the last tasks I got back then were from January 2022!

On the other hand, I don't think I changed anything on that Debian host apart from regularly applying patches. I think I saw something about CVMFS during an apt upgrade not long ago, maybe one or two months ago?

Are there any new requirements for running this app natively?
maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 420
Message 8155 - Posted: 1 Sep 2023, 9:43:19 UTC - in response to Message 8154.  

Have you checked the related threads in Production (native, VM, CVMFS...)?
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 8156 - Posted: 1 Sep 2023, 18:21:54 UTC - in response to Message 8154.  

It looks like you don't have a local runc installed.
Hence, cranky tries to use the version provided via CVMFS.
That one is not compatible with the seccomp library (libseccomp) installed on your computer.

Suggestion:
Install a recent runc version provided by your Linux vendor.
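The fallback described above (local runc first, then the CVMFS copy) can be sketched as follows. It illustrates the idea and is not cranky's actual logic; the default CVMFS path is the one from the log above, and the `CVMFS_RUNC` override variable is hypothetical. Running `--version` catches a binary that cannot even start, such as one failing with a `symbol lookup error` because of a libseccomp mismatch.

```bash
#!/usr/bin/env bash
# Sketch: pick a working runc, preferring the locally installed one.
# CVMFS_RUNC is a hypothetical override; the default is the path from the log.
CVMFS_RUNC="${CVMFS_RUNC:-/cvmfs/grid.cern.ch/vc/containers/runc.new}"

# select_runc: print the path of the first runc that starts successfully.
select_runc() {
    local candidate
    for candidate in "$(command -v runc)" "$CVMFS_RUNC"; do
        # skip empty entries (no local runc) and non-executable paths
        [ -n "$candidate" ] && [ -x "$candidate" ] || continue
        # a runc linked against a newer libseccomp fails here with
        # "symbol lookup error: ... undefined symbol: seccomp_api_get"
        if "$candidate" --version >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no working runc found; install runc from your distribution" >&2
    return 1
}
```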


OTOH expect a completely rewritten cranky version for testing early next week.
Requirements:
- local CVMFS is a must since it needs permanent access to online repos
- init process is systemd (may become interesting for WSL2 users)
- cgroups v2 is enabled and 'freezer' is available (not locked by v1 processes)
- the user running cranky is a member of the 'boinc' group
- sudo must be at least version 1.9.10 (can be checked in advance by running 'sudo -V')
- sudoers file provided via LHC@home (detailed information coming soon)
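A quick pre-flight check for most of these requirements might look like this. It is a sketch under assumptions: the default user name `boinc`, the CVMFS probe path, and the function names are hypothetical, `version_ge` relies on version sorting with `sort -V`, and the freezer and sudoers items are not checked here.

```bash
#!/usr/bin/env bash
# Sketch of a pre-flight check for native Theory tasks (not an official tool).

# version_ge A B: succeed if version A >= version B (uses sort -V ordering).
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_requirements [USER]: print one line per failed check and
# return 1 if any check failed. USER defaults to "boinc" (assumption).
check_requirements() {
    local user="${1:-boinc}" rc=0 sudo_ver
    # init process (PID 1) must be systemd
    if [ "$(cat /proc/1/comm 2>/dev/null)" != systemd ]; then
        echo "FAIL: init process is not systemd"; rc=1
    fi
    # /sys/fs/cgroup must be the cgroup v2 (unified) hierarchy
    if [ "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)" != cgroup2fs ]; then
        echo "FAIL: cgroups v2 is not mounted at /sys/fs/cgroup"; rc=1
    fi
    # local CVMFS: accessing a repository path should trigger the automount
    if [ ! -d /cvmfs/grid.cern.ch/vc/containers ]; then
        echo "FAIL: CVMFS repository grid.cern.ch not reachable"; rc=1
    fi
    # the user running cranky must be in the boinc group
    if ! id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx boinc; then
        echo "FAIL: user '$user' is not in group 'boinc'"; rc=1
    fi
    # sudo >= 1.9.10; the first line of 'sudo -V' reads "Sudo version X.Y.Z"
    sudo_ver="$(sudo -V 2>/dev/null | awk 'NR == 1 { print $3 }')"
    if [ -z "$sudo_ver" ] || ! version_ge "$sudo_ver" 1.9.10; then
        echo "FAIL: need sudo >= 1.9.10 (found: ${sudo_ver:-none})"; rc=1
    fi
    return "$rc"
}
```

Run it as `check_requirements boinc`; no output and a zero exit status mean the checked items look fine.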
[AF>Le_Pommier] Jerome_C2005
Joined: 17 Mar 15
Posts: 51
Credit: 602,329
RAC: 0
Message 8158 - Posted: 1 Sep 2023, 23:42:26 UTC - in response to Message 8156.  

Thanks for this. I tried a simple `sudo apt install runc` and got:

Get:1 http://security.debian.org/debian-security buster/updates/main amd64 runc amd64 1.0.0~rc6+dfsg1-3+deb10u2

But now it seems there are no more tasks, or I'm not allowed to get any for some reason:

lhcathome-dev	02 Sep 2023, 01:37:42	This computer has finished a daily quota of 1 tasks
©2024 CERN