Message boards : ATLAS Application : ATLAS native 1.22

David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 7732 - Posted: 15 Aug 2022, 13:31:16 UTC

Version 1.22 attempts to fix errors such as "failed to create /var/lib/condor directory: mkdir /var/lib/condor: permission denied", which are seen in some situations with certain apptainer versions.

The change is to mount only the current working directory (e.g. /var/lib/boinc/slots/0) into the container rather than the top-level directory (e.g. /var).
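For illustration, here is a minimal Python sketch of the difference: it assembles the container command line once with the old bind (the top-level directory) and once with the 1.22 bind (only the slot directory). The `exec`, `--pwd` and `-B` arguments follow the command printed in the production task log in message 7747; the helper name `build_exec_cmd` and its parameters are hypothetical, and this is not the actual ATLAS wrapper code.

```python
from pathlib import PurePosixPath
from typing import List


def build_exec_cmd(container_bin: str, image: str, workdir: str,
                   bind_top_level: bool) -> List[str]:
    """Hypothetical helper: assemble an `exec` command for singularity/apptainer.

    bind_top_level=True mimics the old behaviour (bind e.g. /var);
    bind_top_level=False mimics 1.22 (bind only the slot directory).
    """
    slot = PurePosixPath(workdir)
    if bind_top_level:
        # slot.parts is ('/', 'var', 'lib', ...); keep only '/' + first component
        bind_dir = PurePosixPath(slot.parts[0], slot.parts[1])
    else:
        bind_dir = slot
    binds = ",".join(["/cvmfs", str(bind_dir)])
    return [container_bin, "exec", "--pwd", str(slot), "-B", binds,
            image, "sh", "start_atlas.sh"]


image = "/cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7"
old = build_exec_cmd("singularity", image, "/var/lib/boinc/slots/0", True)
new = build_exec_cmd("singularity", image, "/var/lib/boinc/slots/0", False)
print(" ".join(old))  # binds /cvmfs,/var
print(" ".join(new))  # binds /cvmfs,/var/lib/boinc/slots/0
```

In this sketch only the `-B` list changes; the working directory passed to `--pwd` stays the same.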
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 7733 - Posted: 15 Aug 2022, 14:08:39 UTC

This fixes the problem for one of my computers.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 7734 - Posted: 16 Aug 2022, 13:20:40 UTC - in response to Message 7733.  

I have also tested with CentOS Stream 9, and it works fine with CVMFS and BOINC installed from standard packages and apptainer from CVMFS: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3108671
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 3
Message 7735 - Posted: 16 Aug 2022, 14:00:07 UTC - in response to Message 7734.  
Last modified: 16 Aug 2022, 14:37:50 UTC

This sounds good.
For the last two days I got no tasks on my CentOS8-VM.
My fault: I had not enabled native tasks.
Edit: CentOS9-VM with epel-release-9-3.el9:
sudo yum install -y cvmfs: no success.
sudo yum install boinc-client boinc-manager: also no success.
On the CentOS8-VM, Singularity is installed locally.
https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=4354
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 3
Message 7736 - Posted: 16 Aug 2022, 15:11:22 UTC - in response to Message 7735.  
Last modified: 16 Aug 2022, 15:49:03 UTC

The cvmfs-release package must be downloaded from https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
My second mistake for today, sorry.
CVMFS now works on the CentOS9-VM!
Tomorrow I'll take a deeper look at installing BOINC (pre-release 7.20.2) on the CentOS9-VM.
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 7737 - Posted: 16 Aug 2022, 16:13:54 UTC
Last modified: 16 Aug 2022, 16:14:06 UTC

I already have ATLAS native running and would be happy to help test apptainer here.

At the moment, Singularity is installed on my Ubuntu 20.04.4 LTS machines.

Could you please tell me the exact steps to install apptainer on these boxes?

Is Apptainer the same as CentOS.....?

yeti
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 3
Message 7738 - Posted: 16 Aug 2022, 16:23:11 UTC - in response to Message 7737.  

Apptainer is the new application replacing Singularity.
David posted some instructions here and on the production site today.
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 3
Message 7739 - Posted: 17 Aug 2022, 4:46:52 UTC - in response to Message 7738.  
Last modified: 17 Aug 2022, 5:46:29 UTC

No success with BOINC 7.20.2 (pre-release).

Link from the Red Hat Customer Portal for the CentOS 9 Stream EPEL repo:
sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm

The BOINC pre-release is now installed on one Threadripper 3995WX,
but neither LHC@home nor LHC@home-dev can be reached.
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 3
Message 7740 - Posted: 17 Aug 2022, 5:55:18 UTC - in response to Message 7739.  
Last modified: 17 Aug 2022, 6:46:06 UTC

CVMFS is now active in the CentOS9-VMs on both Threadripper 3995WX machines.
The first -native task from production is now running:
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10806627

Now -dev is attached as well:
https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=4689
But this CentOS9-VM has only one CPU, so I have to wait for the -native task from production to finish.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 7741 - Posted: 17 Aug 2022, 7:51:19 UTC - in response to Message 7737.  

Hi Yeti,

Apptainer from CVMFS works on Ubuntu, at least on one of my machines running Ubuntu 21.10, so you should not need to install anything locally: just let the tasks use the version from CVMFS. The tasks fall back to a local Singularity installation only if apptainer from CVMFS does not work.
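The selection order described here (CVMFS apptainer first, local Singularity only as a fallback) can be sketched in Python as below. This is an illustration under assumptions, not the ATLAS wrapper's actual code: the function name, the caller-supplied `works` probe, and the use of `shutil.which` to locate a local `singularity` are hypothetical, and the CVMFS apptainer path is left as a parameter because it is not given in this thread.

```python
import shutil
from typing import Callable, Optional


def choose_container_binary(cvmfs_apptainer: str,
                            works: Callable[[str], bool],
                            local_name: str = "singularity") -> Optional[str]:
    """Hypothetical sketch: prefer apptainer from CVMFS, else a local install.

    `works` is a caller-supplied probe, e.g. one that runs
    `<binary> exec -B /cvmfs <image> hostname`, similar to the check
    printed in the task log in message 7747.
    """
    if works(cvmfs_apptainer):
        return cvmfs_apptainer
    # Fall back only if the CVMFS copy failed the probe.
    local = shutil.which(local_name)  # None if not found on PATH
    if local is not None and works(local):
        return local
    return None  # no usable container runtime
```

Passing the probe in as a function keeps the selection order testable without actually launching a container.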
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 3
Message 7742 - Posted: 17 Aug 2022, 8:46:43 UTC - in response to Message 7740.  

Both VMs have identical epel-release downloads.
One CentOS9-VM has the BOINC pre-release 7.20.2 installed,
so why won't the other CentOS9-VM install it?
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 3
Message 7743 - Posted: 17 Aug 2022, 10:07:57 UTC - in response to Message 7742.  

https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=4690
This is now the second CentOS9-VM.
I installed BOINC on it from the dl...fedoraproject....next.... link.
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 7744 - Posted: 17 Aug 2022, 10:19:39 UTC

So far, I haven't received any ATLAS WUs.

My BOINC client is 7.16.6; is this recent enough, or do I need 7.20.x?
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 3
Message 7745 - Posted: 17 Aug 2022, 10:37:39 UTC - in response to Message 7744.  

Your BOINC version is OK, but there are only a small number of -native tasks.
I'm not seeing any new tasks either.
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 7746 - Posted: 17 Aug 2022, 20:01:52 UTC

Meanwhile I have received 18 WUs, and so far it looks as if they have all run fine: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?userid=250
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 3
Message 7747 - Posted: 18 Aug 2022, 7:40:40 UTC

I ran a test in production on a CentOS9-VM with NO Singularity installed.
I got this error:
[2022-08-18 09:01:27] CVMFS is ok
[2022-08-18 09:01:27] Using singularity image /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7
[2022-08-18 09:01:27] Checking for singularity binary...
[2022-08-18 09:01:27] which: no singularity in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin)
[2022-08-18 09:01:27] Singularity is not installed, using version from CVMFS
[2022-08-18 09:01:27] Checking singularity works with /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7 hostname
[2022-08-18 09:01:39] TRCOS9
[2022-08-18 09:01:39] Singularity works
[2022-08-18 09:01:42] Starting ATLAS job with PandaID=5565589513
[2022-08-18 09:01:42] Running command: /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec --pwd /var/lib/boinc/slots/0 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7 sh start_atlas.sh
[2022-08-18 09:01:42] Job failed
[2022-08-18 09:01:42] FATAL: container creation failed: hook function for tag prelayer returns error: failed to create /var/lib/condor directory: mkdir /var/lib/condor: read-only file system
[2022-08-18 09:01:42] ./runtime_log
[2022-08-18 09:01:42] ./runtime_log.err
09:11:42 (665950): run_atlas exited; CPU time 0.198227
09:11:42 (665950): app exit status: 0x1
09:11:42 (665950): called boinc_finish(195)
After installing Singularity, it works now.

OK, this was only a test with production.
The new version in -dev runs well on the CentOS9-VM, with or without Singularity installed.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 7748 - Posted: 18 Aug 2022, 8:17:04 UTC - in response to Message 7747.  

Thanks, so this confirms that the changes in 1.22 fix the errors seen in production. I will try to deploy this to production today.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 7749 - Posted: 18 Aug 2022, 11:44:56 UTC

Thanks for all the testing and feedback here. I have just released this version as v2.90 on the production server.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 7752 - Posted: 18 Aug 2022, 14:36:24 UTC

It looks like there are a lot of failures with this version that were not picked up in testing, so I have reverted it in production and will try to debug here.

On one of my own hosts I have a mix of successful (https://lhcathome.cern.ch/lhcathome/result.php?resultid=363399068) and failed (https://lhcathome.cern.ch/lhcathome/result.php?resultid=363399242) tasks.

The change in bind mounts seems to make some tmp directories read-only, giving errors such as:

Failed to execute payload:mktemp: failed to create file via template '/tmp/asetup_XXXXXX.sh': Read-only file system
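To check from inside a container whether `/tmp` is writable, a small Python probe like the one below could be run there. This is a hypothetical diagnostic, not part of the ATLAS tasks; the function name and the `asetup_` prefix (borrowed from the error message above) are illustrative only. It simply attempts what `mktemp` needs: creating a file in the directory.

```python
import os
import tempfile


def can_create_temp_file(directory: str = "/tmp", prefix: str = "asetup_") -> bool:
    """Return True if a temporary file can be created in `directory`.

    A read-only file system (EROFS), missing permissions, or a missing
    directory all surface as OSError subclasses and yield False.
    """
    try:
        fd, path = tempfile.mkstemp(prefix=prefix, suffix=".sh", dir=directory)
    except OSError:
        return False
    os.close(fd)
    os.unlink(path)  # clean up the probe file
    return True


if __name__ == "__main__":
    print("/tmp writable:", can_create_temp_file())
```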


©2024 CERN