21) Message boards : ATLAS Application : ATLAS vbox and native 3.01 (Message 7959)
Posted 15 Mar 2023 by David Cameron
Post:
I'm submitting some longer tasks now (500 events).
22) Message boards : ATLAS Application : ATLAS vbox and native 3.01 (Message 7955)
Posted 14 Mar 2023 by David Cameron
Post:
Sorry, but to me this looks very unusual. These 2 tasks seemed to run endlessly, and CPU time was way too low:



I'm running Ubuntu 22.04.x

Oh, I see, the results have been uploaded now. Both say "Hits file was produced successfull". Not sure if this is really true.

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3193657

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3193668


The tasks are successful, but the run time is quite large compared to the CPU time. These are short tasks simulating 5 events each; I will submit some longer ones with 50 events to see the difference.

I think the vbox tasks are also working ok now.
23) Message boards : ATLAS Application : ATLAS vbox and native 3.01 (Message 7952)
Posted 14 Mar 2023 by David Cameron
Post:
Looks like native tasks run well now, but I'm still ironing out a few issues with the vbox version.
24) Message boards : ATLAS Application : ATLAS vbox and native 3.01 (Message 7948)
Posted 13 Mar 2023 by David Cameron
Post:
I've started submitting tasks with the Run 3 software now - these are tasks with input file EVNT.29838250._000010.pool.root.1.

Please post if you see any problems or notice anything strange, for example in the event monitor or other places.
25) Message boards : ATLAS Application : ATLAS vbox and native 3.01 (Message 7946)
Posted 13 Mar 2023 by David Cameron
Post:
v3.01 was just released. This contains updated ATLAS software for the latest version of Run 3 simulation.
26) Message boards : ATLAS Application : ATLAS vbox and native 3.00 (Message 7930)
Posted 6 Mar 2023 by David Cameron
Post:
Hi all,

Until now, all ATLAS tasks have been simulations of the ATLAS detector during "Run 2" of the LHC, which ran from 2015 to 2018. Even 5 years later, ATLAS physicists are still analysing the data from that period and require new simulations to be done. However, last year "Run 3" of the LHC started, and it will last until the end of 2025. Soon we will switch the ATLAS tasks here to Run 3 simulations, where many things have changed in the detector and in the software used to simulate and analyse data.

Version 3.0.0 of the app allows us to run both Run 2 and Run 3 simulations in the same version during the transition period. However, this does make the image for vbox a bit larger (4.4GB, or 1.8GB compressed). The Linux native app is the same as before, with one very minor change to set an environment variable required by the new software.

The main benefit of the software used for Run 3 simulations is that it uses far less memory: less than 3GB for an 8-core task. At the moment we are not yet submitting Run 3 tasks here, but we'll let you know when we start sending them.
27) Message boards : ATLAS Application : ATLAS vbox v.1.27 (Message 7900)
Posted 30 Nov 2022 by David Cameron
Post:
v1.27 contains a very minor change to pass information from the bootstrap script to the wrapper script.
28) Message boards : ATLAS Application : ATLAS vbox v.1.26 (Message 7890)
Posted 24 Nov 2022 by David Cameron
Post:
ATLAS vbox 1.26 was just released. It contains some small improvements in handling error conditions in the bootstrap script. Versions 1.20 to 1.25 were already taken by native versions, which is why there is a jump from 1.19 :)
29) Message boards : ATLAS Application : ATLAS vbox v.1.19 (Message 7889)
Posted 22 Nov 2022 by David Cameron
Post:
v1.19 was just released. It contains some improvements to the bootstrap scripts and CVMFS configuration.
30) Message boards : ATLAS Application : ATLAS vbox v.1.18 (Message 7882)
Posted 16 Nov 2022 by David Cameron
Post:
ATLAS v1.18 is now released. This version uses the new vboxwrapper version 26206 and also contains various CVMFS configuration improvements made by computezrmle.
31) Message boards : ATLAS Application : ATLAS vbox v.1.17 (Message 7826)
Posted 19 Oct 2022 by David Cameron
Post:
Hi all,

v1.17 of vbox contains some fixes for CVMFS configuration provided by computezrmle, which should address some of the problems people see with CVMFS being stuck or not working at the start of tasks. We are testing it here on dev to make sure it works before releasing it on the prod server.
32) Message boards : ATLAS Application : ATLAS native 1.25 (Message 7780)
Posted 5 Sep 2022 by David Cameron
Post:
This seemed to solve the problems with tmp dirs so this version is now deployed on the production server.
33) Message boards : ATLAS Application : ATLAS native 1.23 (Message 7770)
Posted 31 Aug 2022 by David Cameron
Post:
I wonder if this could be a side effect of hardening options set in BOINC's systemd service unit.

I have not tested it yet, but the tmp dir forwarded to apptainer should not be the system-wide tmp.
Instead, the tmp below the slot should be used.


Thanks for this tip, it looks like this is indeed the problem. The unit file has:
ProtectSystem=strict
ReadWritePaths=-/var/lib/boinc -/etc/boinc-client


which makes /tmp and /var/tmp read-only.

In v1.25 I set APPTAINERENV_TMPDIR to a directory inside the slots dir and this seems to fix the problem.
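
As a rough illustration of this kind of fix, the Python sketch below creates a tmp directory under a slot directory and exports it through APPTAINERENV_TMPDIR, which apptainer passes into the container as TMPDIR. The helper name `slot_tmp_env`, the subdirectory name `tmp` and the throwaway slot path are assumptions made for the example; this is not the actual wrapper code.

```python
import os
import tempfile


def slot_tmp_env(slot_dir, base_env=None):
    """Return a copy of the environment with APPTAINERENV_TMPDIR set.

    slot_dir is assumed to be the BOINC slot directory of the running task.
    The "tmp" subdirectory name is a hypothetical choice for illustration.
    """
    tmp_dir = os.path.join(slot_dir, "tmp")
    # The slot directory is writable by the BOINC client, unlike /tmp
    # under ProtectSystem=strict.
    os.makedirs(tmp_dir, exist_ok=True)
    env = dict(os.environ if base_env is None else base_env)
    # Variables prefixed with APPTAINERENV_ are set inside the container
    # with the prefix stripped, so this becomes TMPDIR for the payload.
    env["APPTAINERENV_TMPDIR"] = tmp_dir
    return env


if __name__ == "__main__":
    # A temporary directory stands in for a real slot directory here.
    with tempfile.TemporaryDirectory() as fake_slot:
        env = slot_tmp_env(fake_slot)
        print(env["APPTAINERENV_TMPDIR"])
```

The returned dictionary could then be passed as the `env` argument of `subprocess.run` when launching the container.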
34) Message boards : ATLAS Application : ATLAS native 1.25 (Message 7769)
Posted 31 Aug 2022 by David Cameron
Post:
This version sets TMPDIR to a directory inside the slots dir instead of /tmp.
35) Message boards : ATLAS Application : ATLAS native 1.24 (Message 7764)
Posted 23 Aug 2022 by David Cameron
Post:
This version adds some debugging statements to try to figure out the problems with read-only tmp dirs.
36) Message boards : ATLAS Application : 8 core atlas native uses only 1 core. (Message 7763)
Posted 23 Aug 2022 by David Cameron
Post:
Also note that the tasks running here are very short and only process 2 events, compared to 200 per task in production. This is to test things with a quick turnaround and not waste people's resources producing data that is not useful for science. Since the events are split between cores, these short tasks will never use more than 2 CPUs.
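
The arithmetic behind this can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes that each event is handled by a single core at a time, so the number of busy cores is capped by the number of events.

```python
def max_busy_cores(n_events, n_cores):
    """Upper bound on the cores a task can keep busy.

    Assumes one event is processed by one core at a time, so a task
    cannot use more cores than it has events.
    """
    return min(n_events, n_cores)


if __name__ == "__main__":
    print(max_busy_cores(2, 8))    # short dev task: 2 events on 8 cores
    print(max_busy_cores(200, 8))  # production task: 200 events on 8 cores
```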
37) Message boards : ATLAS Application : ATLAS native 1.23 (Message 7754)
Posted 18 Aug 2022 by David Cameron
Post:
This version explicitly mounts /tmp and /var/tmp into the container, to see if this fixes the errors seen in production.
38) Message boards : ATLAS Application : ATLAS native 1.22 (Message 7752)
Posted 18 Aug 2022 by David Cameron
Post:
It looks like there are a lot of failures with this version that were not picked up in testing, so I reverted it in production and will try to debug here.

On one of my own hosts I have a mix of successful (https://lhcathome.cern.ch/lhcathome/result.php?resultid=363399068) and failed (https://lhcathome.cern.ch/lhcathome/result.php?resultid=363399242) tasks.

The change in bind mounts seems to make some tmp directories read-only, giving errors like:

Failed to execute payload:mktemp: failed to create file via template '/tmp/asetup_XXXXXX.sh': Read-only file system
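
One way to check for this condition on a host is to probe whether a directory accepts new files, which is the operation mktemp fails on above. The Python sketch below is only an illustration; the helper name `is_writable_dir` and the choice of directories to probe are assumptions, and it is not part of the ATLAS app.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if a temporary file can be created in path."""
    try:
        fd, name = tempfile.mkstemp(prefix="probe_", dir=path)
    except OSError:
        # Covers read-only file systems, missing directories and
        # permission errors alike.
        return False
    os.close(fd)
    os.remove(name)
    return True


if __name__ == "__main__":
    for candidate in ("/tmp", "/var/tmp"):
        print(candidate, is_writable_dir(candidate))
```

Running it under the same restrictions as the BOINC client would show which tmp directories the payload can actually write to.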
39) Message boards : ATLAS Application : ATLAS native 1.22 (Message 7749)
Posted 18 Aug 2022 by David Cameron
Post:
Thanks for all the testing and feedback here. I just released this version as v2.90 on the production server.
40) Message boards : ATLAS Application : ATLAS native 1.22 (Message 7748)
Posted 18 Aug 2022 by David Cameron
Post:
Thanks, so this confirms that the changes in 1.22 fix the errors seen in production. I will try to deploy this to production today.



