21) Message boards : Theory Application : New native version v5.91 (Message 8181)
Posted 9 Sep 2023 by computezrmle
Post:
No.
The sudoers file created by the setup script ensures cranky is allowed to run certain commands (systemd-run, systemctl freeze, systemctl thaw) without a password.
To make this work, sudo MUST be at least version 1.9.10, as stated in stderr.txt a few lines above.
The log shows that sudo on that system is version 1.9.5.

If cranky runs into that error, it will fall back to legacy mode.
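A quick way to check this before running the setup script is to compare the installed sudo version against 1.9.10. This is a hedged sketch, not part of cranky. The function names are hypothetical, the comparison assumes GNU `sort -V`, and the script assumes that the first line of `sudo -V` reads `Sudo version X.Y.Z`.

```shell
# Hypothetical helpers (not taken from cranky).

# Succeeds if version $1 is at least version $2 (default 1.9.10).
# GNU "sort -V" orders versions such as 1.9.5p2 < 1.9.10 correctly.
version_at_least() {
    local have="$1" need="${2:-1.9.10}"
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n1)" = "$need" ]
}

# Prints the installed sudo version, e.g. "1.9.5p2",
# assuming the first line of "sudo -V" reads "Sudo version 1.9.5p2".
installed_sudo_version() {
    sudo -V 2>/dev/null | head -n1 | awk '{print $3}'
}

# Usage:
#   version_at_least "$(installed_sudo_version)" 1.9.10 || echo "sudo too old"
```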
22) Message boards : Theory Application : New native version v5.60 (Message 8174)
Posted 7 Sep 2023 by computezrmle
Post:
Your logfiles show a couple of entries that will ultimately prevent the system from using the new functions.

Most important:
At least sudo version 1.9.10 must be installed, since this is the first version that supports regular expressions in sudoers files.
Your system reports:
04:04:45 CEST +02:00 2023-09-07: cranky-0.1.0: [INFO] Found Sudo version 1.8.27.

Without it, cranky will definitely fall back to legacy mode.
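To see which sudo version cranky itself detected, you can extract the corresponding line from a task's stderr.txt. This minimal sketch assumes the log line keeps the format shown above (`[INFO] Found Sudo version X.`). The function name is hypothetical.

```shell
# Hypothetical helper: print the sudo version cranky logged in file $1
# (default: stderr.txt), e.g. "1.8.27" from
# "... cranky-0.1.0: [INFO] Found Sudo version 1.8.27."
logged_sudo_version() {
    local logfile="${1:-stderr.txt}"
    # Keep only the version token and drop the sentence-ending dot;
    # if the line appears more than once, report the last occurrence.
    sed -n 's/.*\[INFO] Found Sudo version \([^ ]*\)\.$/\1/p' "$logfile" | tail -n1
}
```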

Runc may still be an issue, but it needs to be tested again with the next cranky version.


Since some other minor issues have been found in the meantime, another new version is in preparation.
You may check this forum for notifications about when it is available.
23) Message boards : Theory Application : New cranky version explained (Message 8171)
Posted 7 Sep 2023 by computezrmle
Post:
New cranky version explained


Legacy version

Cranky is an interface script between BOINC and a software container app (runc) running complex scientific processes (Theory Simulation) in a native Linux environment (i.e. without VirtualBox).

Its most recent major change introduced suspend/resume, which BOINC users expect but which the scientific processes do not support.
That suspend/resume method is based on a freeze/thaw request to runc, which forwards it to systemd.
The method works fine with cgroups v1 but needs distinct cgroup directories to be prepared in advance by the computer's admin (root).
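For illustration, cgroup v1 exposes freezing through a `freezer.state` file in each freezer cgroup. The helpers below are a hypothetical sketch, not cranky code. They assume that root has already created a freezer cgroup (cranky 0.0.33 checks for `/sys/fs/cgroup/freezer/boinc`) and that the caller is allowed to write to it.

```shell
# Hypothetical helpers for a cgroup v1 freezer directory $1,
# e.g. /sys/fs/cgroup/freezer/boinc (must be prepared by root).
v1_freeze() { echo FROZEN > "$1/freezer.state"; }   # suspend all tasks in the cgroup
v1_thaw()   { echo THAWED > "$1/freezer.state"; }   # resume them
# On a real system, reading the file reports THAWED, FREEZING or FROZEN.
v1_state()  { cat "$1/freezer.state"; }
```

On a real v1 system, a freshly written FROZEN may briefly read back as FREEZING until all tasks in the cgroup have stopped.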



New version

Since all major Linux distributions nowadays use cgroups v2 or a (not recommended, even deprecated) hybrid v1/v2 mode, cranky needs to be made ready for cgroups v2.
The most natural point for the split is to keep the scientific container under control of runc and move the cgroups interface to systemd (including suspend/resume).
This results in a single command line like this (within cranky):
sudo [sudo options] systemd-run [systemd options] runc [runc options] [container]
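You can tell whether a host runs pure cgroups v2 or the hybrid v1/v2 mode from the file system type mounted at `/sys/fs/cgroup`. This minimal sketch assumes GNU `stat` and systemd's usual layout, in which hybrid mode mounts the v2 hierarchy at `/sys/fs/cgroup/unified`. The function name is hypothetical.

```shell
# Hypothetical helper: classify the cgroup setup below mount root $1
# (default /sys/fs/cgroup) as unified (v2 only), hybrid or legacy (v1 only).
cgroup_mode() {
    local root="${1:-/sys/fs/cgroup}"
    if [ "$(stat -f -c %T "$root")" = cgroup2fs ]; then
        echo unified    # the whole tree is cgroup2
    elif [ -d "$root/unified" ]; then
        echo hybrid     # v1 controllers plus a v2 mount at unified/
    else
        echo legacy     # cgroups v1 only
    fi
}
```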

As with the legacy version, the new cranky requires permission to access the cgroup freezer (now v2).
In addition, it requires permission to create a temporary systemd scope per task via systemd-run.
On Linux, a standard method to grant permissions is sudo, which checks certain configuration files (sudoers files) containing the permission definitions.

The new cranky version comes with a setup script (must be run once) which creates a well-formed sudoers file, saves it to the right place and sets the correct access rights. It is highly recommended to use that script instead of creating the sudoers file manually, since a typo or wrong permission settings would cause sudo to reject the commands from within cranky.

Since the sudoers file makes use of regular expressions, the sudo version must be at least 1.9.10. Older sudo versions do not support regular expressions.

This oneliner gets the setup script from CERN and executes it.
In case of any errors, post them here.
sudo /bin/bash -c "export script=\"prepare_theory_native_environment\" && wget https://lhcathomedev.cern.ch/lhcathome-dev/download/$script -O /tmp/$script && chmod u+x /tmp/$script && /tmp/$script && rm /tmp/$script"

<edit> corrected script with '$' being escaped.
sudo /bin/bash -c "export script=\"prepare_theory_native_environment\" && wget https://lhcathomedev.cern.ch/lhcathome-dev/download/\$script -O /tmp/\$script && chmod u+x /tmp/\$script && /tmp/\$script && rm /tmp/\$script"
</edit>


What happens on systems that do not meet the requirements?

New cranky will try to run the task in legacy mode.
Be aware that there is no further development to improve that mode, since cgroups v1 will eventually disappear.


Minor changes

New cranky prints more information to the logfile (stderr.txt).
This allows users to see whether basic requirements are missing or which options are recommended, e.g. for the local CVMFS client.
It also prints a hint on how to get information about the running task via systemctl - just copy/paste the command.


Microsoft Windows WSL2

According to Microsoft, Linux guests under WSL2 can be configured to use systemd as the init process.
It may be worth testing whether Theory native can be run within such an environment.
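Microsoft documents enabling systemd in a WSL2 distribution by adding a `[boot]` section with `systemd=true` to `/etc/wsl.conf`, then running `wsl.exe --shutdown` from Windows to restart the distribution. This minimal sketch adds that setting. It assumes the file has no `[boot]` section yet, and it must run as root when using the default path. The function name is hypothetical.

```shell
# Hypothetical helper: enable systemd as init in WSL2 config file $1
# (default /etc/wsl.conf). Assumes no [boot] section exists yet.
enable_wsl_systemd() {
    local conf="${1:-/etc/wsl.conf}"
    # Leave the file alone if systemd=true is already set.
    if grep -qs '^[[:space:]]*systemd[[:space:]]*=[[:space:]]*true' "$conf"; then
        return 0
    fi
    printf '\n[boot]\nsystemd=true\n' >> "$conf"
}

# After restarting the distribution, PID 1 should be systemd:
#   cat /proc/1/comm
```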
24) Message boards : Theory Application : New native version v5.90 (Message 8168)
Posted 6 Sep 2023 by computezrmle
Post:
Just to mention it:
For years, all apps from LHC@home -dev and -prod (except SixTrack) have included code that I suggested and that Laurence or others responsible accepted.

The error messages that you post are caused by requirements your system simply does not fulfil,
either because it is outdated or because you did not run the script Laurence mentioned.
In addition, the -dev project is the right place to identify exactly those issues.
25) Message boards : Theory Application : New native version v5.90 (Message 8166)
Posted 6 Sep 2023 by computezrmle
Post:
Reply to post https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=630&postid=8165 from the now outdated thread.
Jerome_C2005 wrote:
I'm sorry I have no idea how to get that script (and then run it) in command line from the no-GUI debian VM I'm using for LHCdev :/


edit : I actually have another debian on a VM on my iMac that I have never used for LHC, there (I think) that I could run that script now, and attached it to LHCdev and I got native tasks (for both theory and atlas) but (obviously) all is failing because I never ran all the stuff that needs (install applications, setup... ) to be done to run the native applications... (I never intended to do it on that machine)

If you explain me how to run the script on my no-GUI VM (that one is not a VM in my mac, it is a small hosted machine on a cloud provider) I'll give it a try

To be honest, I don't know what you expect.

You got a link to download a script and have been asked to run that script.
Use wget or curl to download it, then run it.
This is basic knowledge and not related to BOINC or LHC.
Well, here is a oneliner doing it for you:
sudo /bin/bash -c "export script=\"prepare_theory_native_environment\" && wget https://lhcathomedev.cern.ch/lhcathome-dev/download/\$script -O /tmp/\$script && chmod u+x /tmp/\$script && /tmp/\$script && rm /tmp/\$script"


Besides that, it has been a must for years to have a local CVMFS client installed and correctly configured before you can run native tasks.
I wonder why you ignore that and instead post complaints about failing tasks.


What you may have missed:
ATLAS tasks on this server are currently generated by an automatic loop with a minimum of events.
Since the person responsible for ATLAS left CERN months ago, there is currently nobody at CERN who takes care of the results.

Hence, it makes no sense to run any ATLAS tasks.
26) Message boards : Theory Application : New native version v5.60 (Message 8164)
Posted 5 Sep 2023 by computezrmle
Post:
The version I mean has been out since this afternoon:
https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=633

Step 1
Download and run the script Laurence has linked to.

Step 2
Try to get a task (I suggest only a few at first).

Step 3
Upload the results, even in case of an error.
27) Message boards : Theory Application : New native version v5.60 (Message 8161)
Posted 4 Sep 2023 by computezrmle
Post:
It makes no sense to investigate that error with the Theory native version currently on the -dev server.

As already mentioned, a new version will be sent out soon.
If your errors persist, it makes sense to investigate them against the new version.
28) Message boards : Theory Application : New native version v5.60 (Message 8156)
Posted 1 Sep 2023 by computezrmle
Post:
Looks like you don't have a local runc version.
Hence, cranky tries to use the version provided via CVMFS.
That one is not compatible with the seccomp package installed on your computer.

Suggestion:
Install a recent runc version provided by your Linux vendor.
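The selection described above (a local runc if one is installed, otherwise the CVMFS copy) can be sketched as follows. This is not cranky's code. The function name is hypothetical, and the CVMFS path is a parameter because it is not given here.

```shell
# Hypothetical helper: print the runc binary to use.
# $1 = fallback path, e.g. a runc provided via CVMFS.
pick_runc() {
    local fallback="$1" local_runc
    if local_runc="$(command -v runc)"; then
        echo "$local_runc"     # a runc from the Linux vendor is in $PATH
    else
        echo "$fallback"
    fi
}

# Check which version a locally installed runc reports:
#   runc --version
```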


On the other hand, expect a completely rewritten cranky version for testing early next week.
Requirements:
- local CVMFS is a must since it needs permanent access to online repos
- init process is systemd (may become interesting for WSL2 users)
- cgroups v2 is enabled and 'freezer' is available (not locked by v1 processes)
- the user running cranky is a member of the 'boinc' group
- sudo must be at least version 1.9.10 (can be checked in advance by running 'sudo -V')
- sudoers file provided via LHC@home (detailed information coming soon)
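Two of these requirements are easy to check from a shell beforehand. The sketch below is hypothetical (not part of the setup script) and covers only the init process and the group membership. The sudo version can be checked with `sudo -V`, and `cvmfs_config probe` tests whether the local CVMFS client can mount its repositories.

```shell
# Hypothetical pre-checks for the requirements listed above.

# Succeeds if PID 1 (the init process) is systemd.
init_is_systemd() {
    [ "$(cat /proc/1/comm 2>/dev/null)" = systemd ]
}

# Succeeds if user $1 is a member of group $2 (default: boinc).
user_in_group() {
    local user="$1" group="${2:-boinc}"
    id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx "$group"
}

# Usage (assuming the BOINC client runs as user "boinc"):
#   init_is_systemd && user_in_group boinc boinc && echo "requirements met"
```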
29) Message boards : Theory Application : New native version v5.60 (Message 8140)
Posted 22 Aug 2023 by computezrmle
Post:
Just did some generic tests using systemd to control freeze/thaw.

Run a user program as a service:
systemd-run --unit foobar_0815 --user sleep 30

freeze it:
systemctl --user freeze foobar_0815.service

get the status:
systemctl --user list-units | grep foobar
systemctl --user status foobar_0815.service

thaw it:
systemctl --user thaw foobar_0815.service

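The same round trip can be wrapped in small shell functions. This is a sketch under assumptions, not cranky code, and the function names are hypothetical. Instead of grepping `list-units`, it reads the unit's `FreezerState` property (available in systemd versions that support freeze/thaw), which reports `running`, `freezing`, `frozen` or `thawing`.

```shell
# Hypothetical wrappers around the commands above.
unit="foobar_0815"    # unit name used in the test above

start_unit()    { systemd-run --user --unit "$unit" sleep 30; }
freeze_unit()   { systemctl --user freeze "$unit.service"; }
thaw_unit()     { systemctl --user thaw "$unit.service"; }
# Prints only the value, e.g. "frozen".
freezer_state() { systemctl --user show --property=FreezerState --value "$unit.service"; }

# Usage:
#   start_unit && freeze_unit && freezer_state && thaw_unit && freezer_state
```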
I would prefer using systemd within cranky since it is the generic way to control recent Linux systems.
There are a couple of options that need to be tested (see: man systemd-run) to ensure the started containers run as expected (e.g. within the right slice) and we get the accounting information back to BOINC.
Will do some more tests in the afternoon/evening.
30) Message boards : Theory Application : New native version v5.60 (Message 8139)
Posted 21 Aug 2023 by computezrmle
Post:
This link might be useful:
https://systemd.io/CGROUP_DELEGATION/

Among other hints, it states:
Avoid "/sys/fs/cgroup/unified/".

Check if "Delegate=" needs to be added to the boinc-client.service file.
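If `Delegate=` turns out to be needed, a drop-in file avoids editing the packaged unit file (`systemctl edit boinc-client.service` does the same interactively). This is a hedged sketch. The drop-in name `delegate.conf` and the function name are hypothetical, and `Delegate=yes` delegates all controllers. The unit directory is a parameter so the function can be tried outside `/etc`.

```shell
# Hypothetical helper: add Delegate=yes to boinc-client.service via a drop-in.
# $1 = unit directory (default /etc/systemd/system); needs root for the default.
add_delegate_dropin() {
    local dir="${1:-/etc/systemd/system}/boinc-client.service.d"
    mkdir -p "$dir"
    printf '[Service]\nDelegate=yes\n' > "$dir/delegate.conf"
}

# Afterwards, as root:
#   systemctl daemon-reload && systemctl restart boinc-client.service
```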
31) Message boards : Theory Application : New native version v5.60 (Message 8138)
Posted 21 Aug 2023 by computezrmle
Post:
../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.33: line 159: [: d: unary operator expected

This is caused by a missing "-".

Replace cranky line 159:
if [ d /sys/fs/cgroup/freezer/boinc ]; then

with:
if [ -d /sys/fs/cgroup/freezer/boinc ]; then
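Why the typo matters: with two arguments, `[` expects the first one to be a unary operator such as `-d`. `d` is not one, so `[` prints the error above and returns status 2, which `if` treats like false. Cranky therefore behaves as if the directory did not exist. A small demonstration, using `/tmp` as a directory that exists:

```shell
dir=/tmp

[ d "$dir" ] 2>/dev/null
broken=$?    # 2: "[: d: unary operator expected"

[ -d "$dir" ]
fixed=$?     # 0: the directory exists
```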
32) Message boards : General Discussion : Uploads failing (Message 8122)
Posted 7 Jul 2023 by computezrmle
Post:
Looks like the failing tasks use a filename that doesn't correspond to the filename expected by either BOINC or a backend system at CERN.
That results in upload issues.
This needs to be solved by the submitter at CERN.

I guess that since David Cameron left CERN, an automatic loop keeps generating the same few tasks again and again.
Hence, I suggest not running ATLAS from -dev until David's successor asks for it.
33) Message boards : Theory Application : Why is there no native_theory on this project? (Message 8069)
Posted 28 Apr 2023 by computezrmle
Post:
There's currently no need.
34) Message boards : ATLAS Application : Sometimes atlas jobs use more cpu and memory than required. (Message 8064)
Posted 27 Apr 2023 by computezrmle
Post:
Unfortunately David Cameron left CERN to start a new job.
Among the last tests he made here were modifications to find optimized (RAM) settings for ATLAS v3.x.
Those tests may not be completed yet and may need to be finished once the team has found a successor.

Besides that, you mentioned ATLAS native but linked to ATLAS vbox tasks.
35) Message boards : ATLAS Application : SSL certificate error in atlas tasks (Message 8063)
Posted 27 Apr 2023 by computezrmle
Post:
The same suggestion came up 2 weeks ago at the -prod forum.
I just replied to the post there:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5986&postid=48045
36) Message boards : ATLAS Application : ATLAS vbox and native 3.01 (Message 8031)
Posted 22 Mar 2023 by computezrmle
Post:
Same as described here:
https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=614&postid=8026
37) Message boards : ATLAS Application : Huge EVNT Files (Message 8029)
Posted 22 Mar 2023 by computezrmle
Post:
[22/Mar/2023:15:11:31 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//217/bM6LDmhcBz2n9Rq4apoT9bVoABFKDmABFKDmSrVUDm0NXKDmZhlDEn_EVNT.32794564._000002.pool.root.1 HTTP/1.1" 200 1166414003 "-" "BOINC client (x86_64-suse-linux-gnu 7.21.0)" TCP_MISS:HIER_DIRECT

[22/Mar/2023:15:27:33 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//1be/MBeLDmLdBz2nsSi4apGgGQJmABFKDmABFKDmjv4TDmjwcKDmTB0Dkm_EVNT.32794564._000002.pool.root.1 HTTP/1.1" 200 1166414003 "-" "BOINC client (x86_64-suse-linux-gnu 7.21.0)" TCP_MISS:HIER_DIRECT


Just noticed those huge ATLAS EVNT files being downloaded from prod to different clients:
1,166,414,003 => 1.2 GB each
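For reference, GNU coreutils `numfmt` converts the byte count as shown below. It rounds away from zero by default, giving about 1.2 GB in decimal units or about 1.1 GiB in binary units.

```shell
bytes=1166414003
numfmt --to=si  "$bytes"    # powers of 1000, prints 1.2G
numfmt --to=iec "$bytes"    # powers of 1024, prints 1.1G
```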


<edit>
Next one:
[22/Mar/2023:15:52:53 +0100] "GET http://lhcathome-upload.cern.ch/lhcathome/download//1a4/eP1NDmDOCz2np2BDcpmwOghnABFKDmABFKDmtdFKDmY3TKDmyMPKRn_EVNT.32794564._000003.pool.root.1 HTTP/1.1" 200 1164691364 "-" "BOINC client (x86_64-suse-linux-gnu 7.21.0)" TCP_MISS:HIER_DIRECT

</edit>
38) Message boards : ATLAS Application : ATLAS vbox and native 3.01 (Message 8028)
Posted 22 Mar 2023 by computezrmle
Post:
One solution for this is to add the library path to LD_LIBRARY_PATH.
It must point to the library version that pacparser/pactester is linked against.

Example (works for the old ATLAS version):

pactester_bin="/cvmfs/atlas.cern.ch/repo/sw/software/21.0/sw/lcg/releases/pacparser/1.3.5-a65a3/x86_64-centos7-gcc62-opt/bin/pactester"

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/cvmfs/atlas.cern.ch/repo/sw/software/21.0/sw/lcg/releases/pacparser/1.3.5-a65a3/x86_64-centos7-gcc62-opt/lib


The export can be made in setup.sh-local, for example.
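To verify that the export has the intended effect, `ldd` can list which shared libraries of the binary still cannot be resolved. This is a minimal sketch with a hypothetical function name. Note that `ldd` may run the inspected program in some cases, so use it only on trusted files such as the CVMFS binary above.

```shell
# Hypothetical helper: print the shared libraries of binary $1 that the
# dynamic loader cannot find with the current LD_LIBRARY_PATH.
# Empty output means every library was resolved.
missing_libs() {
    ldd "$1" 2>/dev/null | grep 'not found' || true
}

# Usage, before and after the export above:
#   missing_libs "$pactester_bin"
```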
39) Message boards : ATLAS Application : ATLAS vbox and native 3.01 (Message 8026)
Posted 22 Mar 2023 by computezrmle
Post:
log.EVNTtoHITS starts but hangs after the last line:

12:16:59 Wed Mar 22 12:16:59 CET 2023
12:16:59 Preloading tcmalloc_minimal.so
12:16:59 Preloading /cvmfs/atlas.cern.ch/repo/sw/software/23.0/AthSimulationExternals/23.0.19/InstallArea/x86_64-centos7-gcc11-opt/lib/libintlc.so.5:/cvmfs/atlas.cern.ch/repo/sw/software/23.0/AthSimulationExternals/23.0.19/InstallArea/x86_64-centos7-gcc11-opt/lib/libimf.so
12:17:08 Py:Sim_tf            INFO ****************** STARTING Simulation *****************
12:17:08 Py:Sim_tf            INFO **** Transformation run arguments
12:17:08 Py:Sim_tf            INFO RunArguments:
12:17:08    AMITag = 's4066'
12:17:08    EVNTFileIO = 'input'
12:17:08    concurrentEvents = 4
12:17:08    conditionsTag = 'OFLCOND-MC21-SDR-RUN3-07'
12:17:08    firstEvent = 99501
12:17:08    geometryVersion = 'ATLAS-R3S-2021-03-02-00'
12:17:08    inputEVNTFile = ['EVNT.29838250._000010.pool.root.1']
12:17:08    inputEVNTFileNentries = 10000
12:17:08    inputEVNTFileType = 'EVNT'
12:17:08    jobNumber = 200
12:17:08    maxEvents = 20
12:17:08    nprocs = 0
12:17:08    outputHITSFile = 'HITS.32413688._000229-2536910-1679482957.pool.root.1'
12:17:08    outputHITSFileType = 'HITS'
12:17:08    perfmon = 'fastmonmt'
12:17:08    postInclude = ['PyJobTransforms.UseFrontier']
12:17:08    preInclude = ['Campaigns.MC23aSimulationMultipleIoV']
12:17:08    randomSeed = 200
12:17:08    runNumber = 601229
12:17:08    simulator = 'FullG4MT_QS'
12:17:08    skipEvents = 9500
12:17:08    threads = 4
12:17:08    totalExecutorSteps = 0
12:17:08    trfSubstepName = 'EVNTtoHITS'
12:17:08 Py:Sim_tf            INFO **** Setting-up configuration flags

pilotlog.txt loops printing lines like those:

2023-03-22 11:44:12,410 | INFO     | 1693.4158494472504s have passed since pilot start
2023-03-22 11:44:18,540 | INFO     | monitor loop #22: job 0:5753961892 is in state 'running'
2023-03-22 11:44:20,310 | INFO     | CPU consumption time for pid=121445: 10.57 (rounded to 11)
2023-03-22 11:44:20,310 | INFO     | executing command: ps -opid --no-headers --ppid 121445
2023-03-22 11:44:20,469 | INFO     | neither /home/boinc9/BOINC_TEST/slots/0/PanDA_Pilot-5753961892/memory_monitor_summary.json, nor /home/boinc9/BOINC_TEST/slots/0/memory_monitor_summary.json exist
2023-03-22 11:44:20,469 | INFO     | using path: /home/boinc9/BOINC_TEST/slots/0/PanDA_Pilot-5753961892/memory_monitor_output.txt (trf name=prmon)
2023-03-22 11:44:20,470 | INFO     | executing command: ps aux -q 109100
2023-03-22 11:44:20,585 | INFO     | neither /home/boinc9/BOINC_TEST/slots/0/PanDA_Pilot-5753961892/memory_monitor_summary.json, nor /home/boinc9/BOINC_TEST/slots/0/memory_monitor_summary.json exist
2023-03-22 11:44:20,585 | INFO     | using path: /home/boinc9/BOINC_TEST/slots/0/PanDA_Pilot-5753961892/memory_monitor_output.txt (trf name=prmon)
2023-03-22 11:44:20,586 | INFO     | max memory (maxPSS) used by the payload is within the allowed limit: 627933 B (2 * maxRSS = 131072000 B)
2023-03-22 11:44:20,587 | INFO     | oom_score(pilot) = 666, oom_score(payload) = 666
2023-03-22 11:44:20,587 | INFO     | payload log (log.EVNTtoHITS) within allowed size limit (2147483648 B): 1606 B
2023-03-22 11:44:20,587 | INFO     | payload log (payload.stdout) within allowed size limit (2147483648 B): 9445 B
2023-03-22 11:44:20,587 | INFO     | executing command: df -mP /home/boinc9/BOINC_TEST/slots/0
2023-03-22 11:44:20,628 | INFO     | sufficient remaining disk space (102636716032 B)
2023-03-22 11:44:20,628 | INFO     | work directory size check will use 61362667520 B as a max limit (10% grace limit added)
2023-03-22 11:44:20,629 | INFO     | size of work directory /home/boinc9/BOINC_TEST/slots/0/PanDA_Pilot-5753961892: 27332 B (within 61362667520 B limit)
2023-03-22 11:44:20,629 | INFO     | pfn file=/home/boinc9/BOINC_TEST/slots/0/PanDA_Pilot-5753961892/HITS.32413688._000229-2536910-1679482957.pool.root.1 does not exist (skip from workdir size calculation)
2023-03-22 11:44:20,629 | INFO     | total size of present files: 0 B (workdir size: 27332 B)
2023-03-22 11:44:20,629 | INFO     | output file size check: skipping output file /home/boinc9/BOINC_TEST/slots/0/PanDA_Pilot-5753961892/HITS.32413688._000229-2536910-1679482957.pool.root.1 since it does not exist
2023-03-22 11:44:21,871 | INFO     | number of running child processes to parent process 121445: 6
2023-03-22 11:44:21,872 | INFO     | maximum number of monitored processes: 6
40) Message boards : ATLAS Application : ATLAS vbox and native 3.01 (Message 8022)
Posted 22 Mar 2023 by computezrmle
Post:
... also setup the squid auto-discovery (Web Proxy Auto Detection (WPAD)) which should find the best squid server to use automatically.

This needs to be tested, especially with complex wpad files.
Frontier clients/pacparser libs run into problems if the server list gets too long.

Please provide some test tasks to check the runtime logs for related Frontier errors/warnings.

<edit>
If possible, not the 500 event ones.
</edit>




©2024 CERN