New native version v5.60
Author | Message |
---|---|
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
This new native version aims to support cgroups v2. It is currently a work in progress, so feedback is more than welcome. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 2 |
First task on the COS7 VM: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4008 This is the first with the COS9 VM: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3234530 |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1 |
`../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.33: line 159: [: d: unary operator expected`
This is caused by a missing "-". Replace cranky line 159:
`if [ d /sys/fs/cgroup/freezer/boinc ]; then`
with:
`if [ -d /sys/fs/cgroup/freezer/boinc ]; then` |
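The effect of the missing dash can be reproduced outside cranky. The following is only a minimal sketch: the helper name `freezer_dir_exists` is hypothetical and not taken from cranky, and the path is passed as an argument so the snippet runs on any machine.

```sh
#!/bin/sh
# Hypothetical helper (not part of cranky): succeed if $1 is a directory.
freezer_dir_exists() {
    # -d is the unary "is a directory" operator. Without the dash,
    # [ receives two plain strings and bash reports
    # "unary operator expected".
    if [ -d "$1" ]; then
        return 0
    fi
    return 1
}

if freezer_dir_exists /sys/fs/cgroup/freezer/boinc; then
    echo "freezer cgroup directory found"
else
    echo "freezer cgroup directory missing"
fi
```

Quoting `"$1"` also keeps the test well-formed if the path is empty or contains spaces.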
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1 |
This link might be useful: https://systemd.io/CGROUP_DELEGATION/ Among other hints, it states: avoid "/sys/fs/cgroup/unified/". We should check whether "Delegate=" needs to be added to the boinc-client.service file. |
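If delegation turns out to be needed, one way to add it without editing the packaged unit file is a systemd drop-in. This is a sketch under assumptions: whether `Delegate=yes` is actually required is exactly the open question in the post, the file name `delegate.conf` is hypothetical, and the demo writes to a scratch directory instead of the real drop-in location.

```sh
#!/bin/sh
# Write a drop-in that sets Delegate=yes for a unit.
# $1: drop-in directory. For the real service this would be
#     /etc/systemd/system/boinc-client.service.d (needs root).
write_delegate_dropin() {
    mkdir -p "$1" || return 1
    # delegate.conf is a hypothetical name; systemd reads any *.conf here.
    printf '[Service]\nDelegate=yes\n' > "$1/delegate.conf"
}

# Demo against a scratch directory so nothing on the system is modified.
demo_dir=$(mktemp -d)
write_delegate_dropin "$demo_dir/boinc-client.service.d"
cat "$demo_dir/boinc-client.service.d/delegate.conf"
```

After placing such a file in the real location, `sudo systemctl daemon-reload` followed by a restart of boinc-client would be needed for it to take effect.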
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1 |
Just did some generic tests using systemd to control freeze/thaw.
Run a user program as a service:
`systemd-run --unit foobar_0815 --user sleep 30`
Freeze it:
`systemctl --user freeze foobar_0815.service`
Get the status:
`systemctl --user list-units |grep foobar`
`systemctl --user status foobar_0815.service`
Thaw it:
`systemctl --user thaw foobar_0815.service`
I would prefer using systemd within cranky since it is the generic way to control processes on recent Linux systems. There are a couple of options that need to be tested (see: man systemd-run) to ensure the started containers run as expected (e.g. within the right slice) and that we get the accounting information back to BOINC. Will do some more tests in the afternoon/evening. |
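The freeze/thaw cycle above could be wrapped in small functions, roughly like this. It is a sketch, not how cranky does it: the function names and the `RUN` switch are hypothetical, and by default the script only prints the commands so it can be tried on a machine without a systemd user session.

```sh
#!/bin/sh
# RUN is a hypothetical switch: it defaults to "echo" (dry run).
# Call the script with RUN set to the empty string to really execute.
RUN=${RUN-echo}

start_unit()  { $RUN systemd-run --unit "$1" --user sleep 30; }
freeze_unit() { $RUN systemctl --user freeze "$1.service"; }
status_unit() { $RUN systemctl --user status "$1.service"; }
thaw_unit()   { $RUN systemctl --user thaw "$1.service"; }

unit=foobar_0815
start_unit "$unit"
freeze_unit "$unit"
status_unit "$unit"
thaw_unit "$unit"
```

Keeping the unit name in one variable makes it easy to derive it from the task name later, if that is the direction cranky takes.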
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Thanks for the feedback. I have fixed the issue and made some changes so hopefully suspend/resume will now work. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 2 |
"Thanks for the feedback. I have fixed the issue and made some changes so hopefully suspend/resume will now work." Thanks, Laurence. :-)) |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
It is broken. Debugging. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 2 |
https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=2326389 This task is running with version 5.70 on the COS9 VM. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Looks like the issue was on my machine. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 2 |
Laurence, on the CentOS9 VM (BOINC 7.20.2), the file output.tgz in the slot folder is NOT deleted after the task has finished, so the next task creates a new folder in slots. The CentOS7 VM behaves correctly. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Do you have any idea why the file is not deleted? I do not see the same behavior on my machine (Ubuntu 23.04). |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
For suspend/resume to work, you will need to use a new cgroups script: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4971
`sudo wget http://lhcathome.cern.ch/lhcathome/download/create-boinc-cgroup2 -O /sbin/create-boinc-cgroup`
Please restart the client after downloading this file.
`sudo systemctl restart boinc-client` |
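Before switching scripts it can help to confirm which cgroup hierarchy the host actually uses. The sketch below is not part of create-boinc-cgroup; the function name `cgroup_mode` is hypothetical. It relies on the filesystem type mounted at /sys/fs/cgroup: `cgroup2fs` for the unified v2 hierarchy, `tmpfs` for v1 or hybrid setups.

```sh
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup mode.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1-or-hybrid" ;;
        *)         echo "unknown" ;;
    esac
}

# stat -fc %T prints the filesystem type of the mount point.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null)
echo "cgroup mode: $(cgroup_mode "$fstype")"
```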
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 2 |
"Do you have any idea why the file is not deleted? I do not see the same behavior on my machine (Ubuntu 23.04)." This CentOS9 VM was created under VirtualBox version 7.0.6 r155176 (Qt5.15.2). |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 2 |
"For suspend/resume to work you will need to use a new cgroups script" I have done that for this CentOS9 VM: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4690 |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 2 |
This is the log after a cgroup2 pause/resume.
<core_client_version>7.20.2</core_client_version>
<![CDATA[
<stderr_txt>
07:08:15 (3608): wrapper (7.15.26016): starting
07:08:15 (3608): wrapper (7.15.26016): starting
07:08:15 (3608): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.35 ()
07:08:15 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Detected Theory App
07:08:15 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Checking CVMFS.
07:08:21 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Checking runc.
07:08:21 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Creating the filesystem.
07:08:21 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
mkdir: cannot create directory '/sys/fs/cgroup/unified': Read-only file system
mkdir: cannot create directory '/sys/fs/cgroup/unified': Read-only file system
mkdir: cannot create directory '/sys/fs/cgroup/unified': Read-only file system
mkdir: cannot create directory '/sys/fs/cgroup/unified': Read-only file system
mkdir: cannot create directory '/sys/fs/cgroup/unified': Read-only file system
07:08:21 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Updating config.json.
07:08:21 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Running Container 'runc'.
07:08:23 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] ===> [runRivet] Thu Aug 24 05:08:21 UTC 2023 [boinc pp jets 8000 170,-,2960 - pythia8 8.176 default 100000 534]
07:26:48 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Pausing container Theory_2390-1131463-534_0.
../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.35: line 150: [: missing `]'
../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.35: line 150: -d: command not found
07:26:48 CEST +02:00 2023-08-24: cranky-0.0.35: [WARNING] Cannot pause container as /sys/fs/cgroup/freezer/boinc/freezer.state or /sys/fs/cgroup/freezer/boinc do not exist.
07:26:53 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Resuming container Theory_2390-1131463-534_0.
container not paused
08:13:17 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Container 'runc' finished with status code 0.
08:13:17 CEST +02:00 2023-08-24: cranky-0.0.35: [INFO] Preparing output.
08:13:17 (3608): cranky exited; CPU time 3729.908138
08:13:17 (3608): called boinc_finish(0) |
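The two shell errors reported for line 150 ("missing ]" followed by "-d: command not found") are what the shell typically prints when `||` or `&&` is placed inside a single `[ ... ]`, because the operator ends the `[` command early. The real cranky line is not shown in the log, so the following is only a guess at the pattern and a sketch of a well-formed test; `freezer_usable` is a hypothetical name.

```sh
#!/bin/sh
# Suspected broken pattern (assumption, the real line 150 is not shown):
#   if [ ! -e "$dir/freezer.state" || ! -d "$dir" ]; then
# The "||" splits this into two commands, so "[" has no closing "]"
# and "-d" is run as if it were a program.

# Well-formed version: each condition gets its own [ ... ].
freezer_usable() {
    [ -d "$1" ] && [ -e "$1/freezer.state" ]
}

dir=/sys/fs/cgroup/freezer/boinc
if freezer_usable "$dir"; then
    echo "can pause via $dir/freezer.state"
else
    echo "cannot pause: $dir or $dir/freezer.state does not exist"
fi
```

Note that on a pure cgroups v2 host the v1 path /sys/fs/cgroup/freezer does not exist at all, which would explain the warning in the log even once the syntax is fixed.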
Send message Joined: 17 Mar 15 Posts: 51 Credit: 602,329 RAC: 0 |
Hello. All the tasks are failing on that Linux Debian 10 host. <core_client_version>7.14.2</core_client_version> They started to fail with v5.60 and fail the same way with v5.70. The host was running fine with v5.21, but it seems the latest tasks I got then were from January 2022! On the other hand, I don't think I changed anything on that Debian host apart from applying the patches regularly. I think I saw something about CVMFS during an apt upgrade not long ago, one or two months maybe? Are there some new prerequisites for this app to run natively? |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 2 |
Have you checked against the threads in Production (native, VM, CVMFS, ...)? |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1 |
Looks like you don't have a local runc version. Hence, cranky tries to use the version provided via CVMFS. That one is not compatible with the seccomp package installed on your computer. Suggestion: install a recent runc version provided by your Linux vendor. OTOH, expect a completely rewritten cranky version for testing early next week. Requirements:
- local CVMFS is a must since it needs permanent access to online repos
- init process is systemd (may become interesting for WSL2 users)
- cgroups v2 is enabled and 'freezer' is available (not locked by v1 processes)
- the user running cranky is a member of the 'boinc' group
- sudo must be at least version 1.9.10 (may be checked in advance by running 'sudo -V')
- sudoers file provided via LHC@home (detailed information coming soon) |
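Some of these requirements can be checked in advance with a few lines of shell. This is a rough sketch, not an official pre-flight script: `version_ge` is a hypothetical helper that relies on GNU `sort -V`, and the group check applies to the user running the script, which may differ from the user that runs cranky.

```sh
#!/bin/sh
# version_ge A B: succeed if version A >= version B (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# "sudo -V" prints e.g. "Sudo version 1.9.13p3" on its first line;
# the patch suffix ("p3") is cut off before comparing.
sudo_version=$(sudo -V 2>/dev/null | awk 'NR==1 {print $3}')
sudo_version=${sudo_version%%p*}

if [ -n "$sudo_version" ] && version_ge "$sudo_version" 1.9.10; then
    echo "sudo $sudo_version: ok"
else
    echo "sudo ${sudo_version:-not found}: need at least 1.9.10"
fi

# PID 1 is named "systemd" when systemd is the init process.
if [ "$(cat /proc/1/comm 2>/dev/null)" = "systemd" ]; then
    echo "init: systemd"
else
    echo "init: not systemd"
fi

# Group membership of the current user.
if id -nG | tr ' ' '\n' | grep -qx boinc; then
    echo "group boinc: yes"
else
    echo "group boinc: no"
fi
```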
Send message Joined: 17 Mar 15 Posts: 51 Credit: 602,329 RAC: 0 |
Thanks for this. I tried a simple `sudo apt install runc` and I got:
`Get:1 http://security.debian.org/debian-security buster/updates/main amd64 runc amd64 1.0.0~rc6+dfsg1-3+deb10u2`
But now it seems there are no more tasks, or I'm not allowed any for some reason:
lhcathome-dev 02 sept. 2023, 01:37:42 This computer has finished a daily quota of 1 tasks
©2024 CERN