1) Message boards : Theory Application : New version v6.00 (Message 8426)
Posted 18 hours ago by computezrmle
Post:
Theory 6.00 and 6.01 are there to test vboxwrapper modifications.

The modifications address 2 errors:
1. "the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later"
2. "Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/Theory_2023_12_13.vdi' because it has 2 child media"

Other errors, especially those returned by CVMFS or lower-level scripts, are out of scope.


Details can be found here:
https://github.com/BOINC/boinc/pull/5571
2) Message boards : Theory Application : New version v6.00 (Message 8422)
Posted 18 hours ago by computezrmle
Post:
This is wrong in Theory_2024_04_26_dev.xml:
<multiattach_vdi_file>Theory_2024_04_26_dev.xml</multiattach_vdi_file>

Should be:
<multiattach_vdi_file>Theory_2024_04_26_dev.vdi</multiattach_vdi_file>
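
A mistake like this can be caught with a small check before the file is published. The following is only a sketch: the function name check_multiattach_vdi is made up for illustration and is not part of vboxwrapper, and it assumes the opening and closing tag sit on the same line as shown above.

```shell
# Hypothetical helper: verify that <multiattach_vdi_file> names a .vdi file.
# Assumes the opening and closing tag are on the same line.
check_multiattach_vdi() {
    xml_file=$1
    # Extract the text between the opening and the closing tag.
    value=$(sed -n 's:.*<multiattach_vdi_file>\(.*\)</multiattach_vdi_file>.*:\1:p' "$xml_file")
    case "$value" in
        *.vdi) echo "OK: $value" ;;
        "")    echo "ERROR: no multiattach_vdi_file entry found"; return 1 ;;
        *)     echo "ERROR: expected a .vdi file, got '$value'"; return 1 ;;
    esac
}

# Usage:
#   check_multiattach_vdi Theory_2024_04_26_dev.xml
```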
3) Message boards : CMS Application : CMS multi-core (Message 8412)
Posted 10 days ago by computezrmle
Post:
Unlike VirtualBox, VMware is out of scope for CMS multi-core.
Hence, discussing VirtualBox settings may be useful, while a discussion about VMware vs. VirtualBox is not.
It only distracts from the topic.
4) Message boards : CMS Application : CMS multi-core (Message 8410)
Posted 10 days ago by computezrmle
Post:
Your Ryzen 7 1700 has 8 physical cores and 16 logical cores.
Your 1st log shows that you configured a 15-core VM (in the meantime you have switched to 4-core VMs):
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3314981
2024-04-02 11:41:10 (6376): Setting CPU Count for VM. (15)


15-core VMs should not be configured on a computer with 8 physical cores.
Instead, no VM should be given more cores than the host has physical cores.

See a detailed comment about that here:
https://forums.virtualbox.org/viewtopic.php?t=77413

I suggest respecting that limit so that tests here are not affected by issues that have nothing to do with CERN.
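
On a Linux host the physical core count can be checked with lscpu before choosing the VM size. This is a minimal sketch, not something the project provides: the function names are made up for illustration, and on Windows a different tool is needed.

```shell
# Count physical cores from `lscpu -p=CORE,SOCKET` output read on stdin.
# Each unique (core, socket) pair is one physical core; '#' lines are comments.
count_physical_cores() {
    grep -v '^#' | sort -u | wc -l | tr -d ' '
}

# Succeeds only if the requested VM core count fits into the physical cores.
vm_cores_ok() {
    vm_cores=$1
    physical=$2
    [ "$vm_cores" -le "$physical" ]
}

# Usage on a Linux host (lscpu is part of util-linux):
#   physical=$(lscpu -p=CORE,SOCKET | count_physical_cores)
#   vm_cores_ok 15 "$physical" || echo "VM is larger than the physical core count"
```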
5) Message boards : Theory Application : Suspend/Resume (Message 8399)
Posted 18 days ago by computezrmle
Post:
The following comments are from the original BOINC service file on github:
# The following options prevent setuid root as they imply NoNewPrivileges=true
# Since Atlas requires setuid root, they break Atlas
# In order to improve security, if you're not using Atlas,
# Add these options to the [Service] section of an override file using
# sudo systemctl edit boinc-client.service
#NoNewPrivileges=true
#ProtectKernelModules=true
#ProtectKernelTunables=true
#RestrictRealtime=true
#RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
#RestrictNamespaces=true
#PrivateUsers=true
#CapabilityBoundingSet=
#MemoryDenyWriteExecute=true
#PrivateTmp=true  #Block X11 idle detection


On your system the security policy is too strict and inhibits communication among relevant services required for ATLAS/Theory native.
ProtectControlGroups=yes
ProtectHome=yes
ProtectSystem=strict


Follow the comments above and create an override file with this content:
[Service]
ProtectControlGroups=no
ProtectHome=no
ProtectSystem=full

Then reboot and try a fresh Theory native task.
Let's see if this solves it.
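
If you prefer a script over the interactive editor opened by "systemctl edit", the same drop-in can be written directly. This is only a sketch: the function name is made up, and the target path assumes the standard drop-in location that "systemctl edit" uses (/etc/systemd/system/boinc-client.service.d/override.conf).

```shell
# Write the override below a given systemd configuration root.
# On a real system the root is /etc/systemd/system and root rights are needed.
write_boinc_override() {
    root=$1
    dropin_dir="$root/boinc-client.service.d"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
ProtectControlGroups=no
ProtectHome=no
ProtectSystem=full
EOF
}

# Usage (as root):
#   write_boinc_override /etc/systemd/system
#   systemctl daemon-reload
#   systemctl --no-pager show boinc-client | grep -i protect
```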


As for CVMFS you may look at this post:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5594&postid=48539
6) Message boards : Theory Application : Suspend/Resume (Message 8397)
Posted 18 days ago by computezrmle
Post:
time="2024-04-08T01:09:09Z" level=error msg="runc run failed: fchown fd 7: operation not permitted"

This is primarily an error reported by runc.

Please post the output of the command below to rule out that the error is a side effect of an overly strict systemd setting.
systemctl --no-pager show boinc-client |grep -i protect



In addition:
- You use BOINC version 7.18.1 which was never intended for Linux.
- You upgraded sudo (Ubuntu 22.04 originally came with sudo < 1.9.10)
- Your CVMFS configuration does not follow the latest suggestions (see your logs)
7) Message boards : Theory Application : Suspend/Resume (Message 8394)
Posted 19 days ago by computezrmle
Post:
Your Ubuntu box does not support virtualization, hence you run Theory native.
Theory native doesn't support suspend/resume out of the box.
There are 2 possible methods to enable it:

1.
Use the traditional method together with cgroups v1.
This must be prepared as described in the prod forum.
But this method is deprecated as most Linux distributions now use cgroups v2 as default.

2.
Use cgroups v2 (plus sudo >= v1.9.10)
Your sudo version is 1.9.10, but you did not install the required sudoers file '/etc/sudoers.d/50-lhcathome_boinc_theory_native'.
Check the relevant forum posts and your logfiles.
For method 2, settings from method 1 must not be used. Disable or uninstall them.


It looks like you are currently running a mix of both methods, but neither of them is set up correctly.
As a result the tasks ignore the pause/resume signals and are still running even if BOINC shows them as suspended.
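
To see which method a host qualifies for, the cgroup version and the sudo version can be checked from a shell. The sketch below is not taken from the Theory scripts. The helper names are made up, and the version comparison relies on "sort -V" from GNU coreutils.

```shell
# True if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Extract the version from the first line of `sudo -V` read on stdin,
# which looks like: "Sudo version 1.9.10".
sudo_version() {
    sed -n '1s/^Sudo version //p'
}

# Usage:
#   stat -fc %T /sys/fs/cgroup     # prints "cgroup2fs" on a pure cgroups v2 host
#   v=$(sudo -V | sudo_version)
#   version_ge "$v" 1.9.10 && echo "sudo is new enough" || echo "sudo is too old"
```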
8) Message boards : Theory Application : Suspend/Resume (Message 8391)
Posted 21 days ago by computezrmle
Post:
Please make your computers visible for other volunteers here:
https://lhcathomedev.cern.ch/lhcathome-dev/prefs.php?subset=project
9) Message boards : Theory Application : Veeerrrry long Pythia8 (Message 8318)
Posted 1 Feb 2024 by computezrmle
Post:
I don't understand why ... I just removed them and this time I got a result

Well, that's your problem!
You checked for the first line showing "processed" instead of the last line.


This forum is not the place to give basic lessons.
Neither for Linux nor for Windows.
If you don't want to invest the time to learn the simplest basics, it might be better not to try to "analyse" anything.

Instead of:
sudo grep -m1 'processed' < /var/lib/boinc-client/slots/4/cernvm/shared/runRivet.log

run this:
sudo grep 'processed' /var/lib/boinc-client/slots/4/cernvm/shared/runRivet.log
10) Message boards : Theory Application : Veeerrrry long Pythia8 (Message 8316)
Posted 1 Feb 2024 by computezrmle
Post:
Why "sudo"?
Since you are already in "/var/lib/boinc-client/slots/4" it looks like you have the necessary access rights to list the dirs/files, don't you?

My 1st command is already explained:
Look for a "runRivet.log" that has not been modified recently ("recently" can mean anything from 1 hour to several hours), as this might indicate a task that is either stuck in an endless loop or paused.


The 2nd should be self-explanatory as it uses basic Linux commands.
Make yourself familiar with those commands as they are widely used.
Here the command is used to locate the last entry of the "processed" pattern in "runRivet.log".
The result should look like:
20100 events processed

A typical 1st line in "runRivet.log" looks like:
===> [runRivet] Tue Jan 30 20:52:08 UTC 2024 [boinc pp jets 13000 1500 - pythia8 8.301 CP1-CR1 100000 78]
The second-to-last number (shown in bold in the original post) tells you how many events are to be processed.
Here: 100000

So, roughly 20% of the task is done.
Now, look at the task's walltime, say (example): 21 hours

=> This task has another 84 hours to go.
=> It will finish before the 10-days-limit.
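
The same arithmetic as a small shell sketch. It assumes a constant event rate, which is only a rough approximation, and the function name is made up for illustration.

```shell
# Estimate the remaining hours of a Theory task.
#   $1 = events processed so far (last "processed" line in runRivet.log)
#   $2 = total events (from the first line of runRivet.log)
#   $3 = walltime so far, in hours
# Assumes a constant event rate.
remaining_hours() {
    LC_ALL=C awk -v p="$1" -v total="$2" -v wall="$3" 'BEGIN {
        if (p <= 0) { print "unknown"; exit }
        printf "%.1f\n", wall * (total - p) / p
    }'
}

remaining_hours 20100 100000 21
```

With the numbers from the example this prints 83.5, which matches the rough 84 hours above.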


If the oneliners I suggested don't work for you (e.g. due to missing access rights) feel free to copy the log to a folder where you have full rights and use an editor to look into the copy.
11) Message boards : Theory Application : Veeerrrry long Pythia8 (Message 8314)
Posted 1 Feb 2024 by computezrmle
Post:
Sherpas sometimes get stuck in endless loops.


Check the "runRivet.log" from that task.

Has something been written to the log within the last "mmin" minutes (720 min = 12 h)?
Be aware! This command will also show logs from tasks that are suspended for roughly the same time span.
find /path_to_your/BOINC_working_directory/slots -type f -name "runRivet.log" -mmin +720 |xargs -I {} bash -c "head -n1 {}; ls -hal {}"



Now, use the path to the log from above for the next command.
This shows how many events are already processed (or nothing).
grep -m1 'processed' <(tac /path_to_your_logfile)

Compare that number with the task's total events and the actual runtime of the task.
This allows you to estimate the total runtime.
If that is far more than 10 days the task will not finish within the deadline and should be cancelled.

Be aware! Do not use the runtime estimates shown by BOINC.
BOINC doesn't know anything about the internal structure/logs of Theory tasks.
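
The two numbers needed for that comparison can also be pulled from the log with a few basic commands. This is a sketch under assumptions: the function names are made up, the first log line is expected to end with "... <total events> <seed>]", and progress lines are expected to look like "20100 events processed".

```shell
# Total number of events: second-to-last field of the first log line, e.g.
# ===> [runRivet] ... [boinc pp jets 13000 1500 - pythia8 8.301 CP1-CR1 100000 78]
total_events() {
    head -n1 "$1" | awk '{ print $(NF-1) }'
}

# Events processed so far: first field of the last "processed" line, e.g.
# "20100 events processed"
processed_events() {
    grep 'processed' "$1" | tail -n1 | awk '{ print $1 }'
}

# Usage:
#   log=/path_to_your_logfile
#   echo "$(processed_events "$log") of $(total_events "$log") events"
```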
12) Message boards : General Discussion : Fetching configuration file (Message 8294)
Posted 19 Jan 2024 by computezrmle
Post:
I wonder where the URL "https://lhcathomedev.cern.ch/lhc-dev" comes from (it was you who posted it).
Please
- open a console on the computer that fails to connect
- change to the BOINC working directory
- run the following command
- post the output

find . -maxdepth 1 -name "*.xml" |xargs -I {} grep -H "://lhcathomedev.cern.ch/lhc-dev" {}
13) Message boards : General Discussion : Fetching configuration file (Message 8292)
Posted 18 Jan 2024 by computezrmle
Post:
I'm trying to add LHC-Dev to a linux machine (on VirtualBox) to test xtrack app
But i have this message: "Fetching configuration file from https://lhcathomedev.cern.ch/lhc-dev" and boinc manager stucks

I've tried 3 different distos
I've tried to connect to other projects without problems

You may have used the wrong master URL.
Try this one:
https://lhcathomedev.cern.ch/lhcathome-dev/
14) Message boards : ATLAS Application : ATLAS vbox and native 3.01 (Message 8269)
Posted 3 Jan 2024 by computezrmle
Post:
Since David Cameron left CERN there's no BOINC development for ATLAS.

The tasks you get are created by an automatic loop and contain just a few test events.
Once the queue is empty the same tasks are created again.
Even if a task returns valid results, those are not used for any scientific purpose.

Decide for yourself whether it makes sense to run ATLAS -dev until CERN officially resumes development here.
15) Message boards : Theory Application : All errors (Message 8262)
Posted 23 Dec 2023 by computezrmle
Post:
Looks like the vdi file on -prod is a simple copy of the vdi file on -dev without a fresh UUID.

If -prod and -dev run under the same username (even if that user runs multiple BOINC clients) VirtualBox registers the vdi that comes first.
Hence, if -dev comes first you get the error at -prod and vice versa.

A fresh UUID must be set at CERN when the vdi/app moves from -dev to -prod.
16) Message boards : Theory Application : New native version v5.94 (Message 8245)
Posted 5 Dec 2023 by computezrmle
Post:
On the repository cernvm-prod.cern.ch there are cvm3 and cvm4.
17) Message boards : Theory Application : New native version v5.94 (Message 8243)
Posted 5 Dec 2023 by computezrmle
Post:
Either run a recent Linux with cgroups v2 plus the sudo modification.
That's the recommended way as it delegates the cgroup handling used for suspend/resume to systemd.

Or disable cgroups v2 and modify the system according to the old suggestions on -prod.
Theory runs without those modifications as they are only there to allow suspend/resume independent of systemd.
18) Message boards : Theory Application : New native version v5.94 (Message 8240)
Posted 5 Dec 2023 by computezrmle
Post:
Good:
The task reported a valid scientific result.

Also good:
The fallback mode works.


And since the fallback mode uses cgroups v1 it has no permission to create a directory below cgroup/unified which represents the v2 hierarchy:
mkdir: das Verzeichnis „/sys/fs/cgroup/unified“ kann nicht angelegt werden: Das Dateisystem ist nur lesbar
(English: mkdir: cannot create directory '/sys/fs/cgroup/unified': Read-only file system)



BTW:
The namespace setting is another requirement.
user.max_user_namespaces = 100 works, but 15000 is the value suggested by Fermilab.
19) Message boards : Theory Application : New native version v5.94 (Message 8238)
Posted 5 Dec 2023 by computezrmle
Post:
12:53:19 CET +01:00 2023-12-05: cranky-0.1.4: [INFO] Found Sudo-Version 1.9.5p2.
12:53:19 CET +01:00 2023-12-05: cranky-0.1.4: [INFO] To run this task in new mode
12:53:19 CET +01:00 2023-12-05: cranky-0.1.4: [INFO] Sudo-Version must be at least 1.9.10.
.
.
.
12:53:47 CET +01:00 2023-12-05: cranky-0.1.4: [INFO] Minor requirements are missing. Will try to run this task in legacy mode.

Sudo prior to 1.9.10 doesn't support regular expressions, which are required to define the permitted commands as patterns.
20) Message boards : Theory Application : New native version v5.94 (Message 8236)
Posted 2 Dec 2023 by computezrmle
Post:
I changed my setup back to cvm3 since it has a much lower container error rate.




©2024 CERN