Message boards : ATLAS Application : ATLAS vbox and native 3.01

Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 7966 - Posted: 15 Mar 2023, 11:51:15 UTC

Meanwhile, I have finished several of the 500-event WUs, and they all look fine so far.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 7968 - Posted: 15 Mar 2023, 12:56:20 UTC - in response to Message 7965.  

Those warnings can be ignored, and the first lines about SSL certificate failures should no longer appear in new jobs from today.

I looked in detail at the logs of this task, and it looked like it took 8 or 9 minutes to get going at full steam. Still, it could be that the initialisation phase is indeed longer in this new software.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 7969 - Posted: 15 Mar 2023, 14:40:31 UTC

I am trying out setting the total memory for vbox tasks to 4 GB (independent of the number of cores), since I see the memory used is always around 2.3 GB. Let me know if you see any problems due to this.
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 7970 - Posted: 15 Mar 2023, 15:00:22 UTC

Just started a 2nd native task with most of the CVMFS data from the CVMFS cache and most of the Frontier data from the Squid cache.

Time to setup the working environment: ~14 min

It has to be said that my box is intentionally slightly overloaded, running 30 CMS VMs, 4 Theory native tasks, 2 other CPU tasks and 2 GPU tasks concurrently, plus that 4-core ATLAS test.
From experience, on a partly loaded system I would expect a startup time of ~7-8 min.
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 32
Message 7971 - Posted: 15 Mar 2023, 16:08:11 UTC - in response to Message 7969.  
Last modified: 15 Mar 2023, 16:09:47 UTC

Threadripper PRO 3995WX
2023-03-15 09:51:35 (16452): Detected: vboxwrapper 26206
2023-03-15 09:51:35 (16452): Detected: BOINC client v7.20.2
2023-03-15 09:51:35 (16452): Detected: VirtualBox VboxManage Interface (Version: 6.1.40)
2023-03-15 09:51:36 (16452): Successfully copied 'init_data.xml' to the shared directory.
2023-03-15 09:51:36 (16452): Create VM. (boinc_4bb42867e31f84c6, slot#2)
----
2023-03-15 09:52:37 (16452): Guest Log: ATHENA_CORE_NUMBER=5
2023-03-15 09:52:38 (16452): Guest Log: Apptainer command /cvmfs/atlas.cern.ch/repo/containers/sw/apptainer/x86_64-el7/current/bin/apptainer exec -B /cvmfs,/data,/home/atlas/RunAtlas /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7
2023-03-15 09:52:38 (16452): Guest Log: *** Starting ATLAS job. (PandaID=5753961892 taskID=32413688) ***

< 1 min?
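The "< 1 min?" figure can be checked by subtracting the timestamps in the log. A minimal Python sketch, assuming every vboxwrapper line starts with a "YYYY-MM-DD HH:MM:SS" timestamp as in the excerpt above; the helper names are hypothetical:

```python
from datetime import datetime

# Log lines copied from the vboxwrapper excerpt above.
LOG = """\
2023-03-15 09:51:35 (16452): Detected: vboxwrapper 26206
2023-03-15 09:51:36 (16452): Create VM. (boinc_4bb42867e31f84c6, slot#2)
2023-03-15 09:52:37 (16452): Guest Log: ATHENA_CORE_NUMBER=5
2023-03-15 09:52:38 (16452): Guest Log: *** Starting ATLAS job. (PandaID=5753961892 taskID=32413688) ***
"""


def timestamp(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' (first 19 characters) of a log line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def seconds_between(log: str, start_marker: str, end_marker: str) -> float:
    """Seconds between the first line containing start_marker and the first containing end_marker."""
    lines = log.splitlines()
    start = next(line for line in lines if start_marker in line)
    end = next(line for line in lines if end_marker in line)
    return (timestamp(end) - timestamp(start)).total_seconds()


print(seconds_between(LOG, "Create VM.", "Starting ATLAS job."))  # prints 62.0
```

This only measures the time until the ATLAS job is launched inside the VM, not the time until all cores are busy with event processing.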
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 7973 - Posted: 16 Mar 2023, 13:29:14 UTC - in response to Message 7971.  

I think the others are referring to the time between starting the task and the task using all the CPU cores for processing, which is 5-20 mins depending on load/location/connection.

At the moment the vbox tasks are getting the wrong memory setting (2241 MB); they still succeed but use swap space, so they are slower. I'm working on fixing this.
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 32
Message 7974 - Posted: 16 Mar 2023, 16:59:33 UTC - in response to Message 7971.  

2023-03-15 09:52:38 (16452): Guest Log: *** Starting ATLAS job. (PandaID=5753961892 taskID=32413688) ***

< 1 min?

The next entry comes after 6000 seconds, which correctly reflects the difference to the line before.
top shows no athena.py.
Is this correct?
boboviz

Joined: 24 Oct 19
Posts: 167
Credit: 438,193
RAC: 411
Message 7975 - Posted: 16 Mar 2023, 17:18:10 UTC

I'm not a lucky man.
My Windows machine still doesn't download ATLAS WUs...
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 7976 - Posted: 16 Mar 2023, 17:45:44 UTC - in response to Message 7975.  

You are running 3 computers: 2 Windows, 1 Linux.

Since your Linux computer got ATLAS native tasks, "Run native if available?" is set, but this blocks vbox tasks on Windows.
To get both native and vbox tasks, the Windows and Linux computers must be attached to different venues.


Your Linux tasks failed because the local CVMFS client is missing.
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 7977 - Posted: 16 Mar 2023, 20:18:36 UTC - in response to Message 7974.  
Last modified: 16 Mar 2023, 20:18:54 UTC

top shows no athena.py.
Is this correct?
Yes. It should show a Python process using "cores * 100%" CPU, so my 4-core WU shows the Python process at 400% CPU usage.
boboviz

Joined: 24 Oct 19
Posts: 167
Credit: 438,193
RAC: 411
Message 7978 - Posted: 17 Mar 2023, 7:33:01 UTC - in response to Message 7976.  

Since your Linux computer got ATLAS native tasks, "Run native if available?" is set, but this blocks vbox tasks on Windows.
To get both native and vbox tasks, the Windows and Linux computers must be attached to different venues.

Uh, I didn't know that. Thank you!!


Your Linux tasks failed because the local CVMFS client is missing.

I know, I know. I have some problems with CVMFS on my Linux box...
boboviz

Joined: 24 Oct 19
Posts: 167
Credit: 438,193
RAC: 411
Message 7979 - Posted: 17 Mar 2023, 16:03:47 UTC
Last modified: 17 Mar 2023, 16:04:15 UTC

It works

Some considerations:
- very, very fast at the beginning of the calculation, very, very slow at the end
- over 6 hrs on a 4-core mobile CPU
- 1200 points

Not so bad!
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 7980 - Posted: 18 Mar 2023, 8:11:32 UTC - in response to Message 7979.  

It works

Some considerations:
- very, very fast at the beginning of the calculation, very, very slow at the end
- over 6 hrs on a 4-core mobile CPU
- 1200 points

Not so bad!

Total runtime/CPU time doesn't look too bad.
Usual tasks (on -prod) process 200 events while this one processed 500 events.
=> CPU time <180 s/event.
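The per-event figure is simply the task's CPU time divided by the number of events. A minimal Python sketch; the 24 CPU hours used here is an illustrative assumption (6 h of wall time with all 4 cores busy), not the CPU time actually reported for that task:

```python
def cpu_seconds_per_event(cpu_hours: float, events: int) -> float:
    """CPU seconds spent per simulated event."""
    return cpu_hours * 3600 / events


# Illustrative assumption: 6 h wall time x 4 busy cores = 24 CPU hours for 500 events.
per_event = cpu_seconds_per_event(6 * 4, 500)
print(f"{per_event:.1f} s/event")  # prints 172.8 s/event
```

With these assumed inputs the result stays below the 180 s/event mentioned above; plugging in the CPU time from the task's result page gives the real value.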


... very, very fast at the beginning of the calculation, very, very slow at the end

If you are referring to BOINC's progress bar, it might be misleading, since lots of older tasks (on -dev) processed only 5-20 events. Such differences always confuse BOINC's runtime estimation.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 7981 - Posted: 20 Mar 2023, 7:30:06 UTC - in response to Message 7980.  

The new ATLAS software is much faster than the old version: each event takes 3 minutes on average, compared to 5 minutes previously. This is why I tried 500-event tasks here instead of the usual 200 on prod.

The downside of faster software is a larger HITS file to upload, so please let me know your opinions on whether you prefer quick tasks with smaller output files or longer tasks with larger output files.
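Using the average per-event times above, a rough Python sketch of how task length scales with the number of events. It assumes the 3 and 5 minutes are CPU time per event on one core, perfect scaling across cores, and no setup time, so real runtimes will be somewhat longer; the function name is hypothetical:

```python
def wall_hours(events: int, minutes_per_event: float, cores: int) -> float:
    """Idealised wall time in hours: perfect core scaling, setup time ignored."""
    return events * minutes_per_event / cores / 60


# Average CPU minutes per event: 5 for the old software, 3 for the new one.
for label, minutes in (("old", 5.0), ("new", 3.0)):
    for events in (200, 500):
        hours = wall_hours(events, minutes, cores=4)
        print(f"{label} software, {events} events, 4 cores: {hours:.2f} h")
```

Under these assumptions a 200-event task with the new software needs 60% of the core time of an old 200-event task, while a 500-event task needs 1.5 times as much.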
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 34
Message 7982 - Posted: 20 Mar 2023, 8:07:28 UTC - in response to Message 7981.  

I would prefer quick tasks, though not because of the size of the HITS file.
The problem with ATLAS (and CMS) tasks is that they don't tolerate long suspension periods.
Not all users have fast machines with a lot of cores running 24/7.
They are already happy when they can run a dual-core ATLAS task during their daily machine uptime.
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 7983 - Posted: 20 Mar 2023, 8:54:22 UTC - in response to Message 7982.  

I would prefer quick tasks.

Agreed, even on Linux running native 24/7.
As long as ATLAS native doesn't support snapshots, shorter runtimes make it easier to plan maintenance/upgrade windows.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 7984 - Posted: 20 Mar 2023, 9:28:28 UTC - in response to Message 7983.  

Thanks for the input; we will then stay with 200-event tasks when this new version goes to prod.

From the ATLAS side we prefer longer tasks, mainly because they make bookkeeping and data transfer easier; in fact, on the ATLAS grid we now run 2000-event tasks. But of course that is a very different environment, and it's more important to keep volunteers happy here.
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 7985 - Posted: 20 Mar 2023, 9:42:05 UTC - in response to Message 7984.  

Sorry, but I'm against these smaller tasks. I think the 200-event WUs were a good balance, and I vote for 500 events in future.

My reasons:

Downloads for the new VB tasks will be bigger, even if they still contain only 200 events. Did I understand this right?

The additional download when going from 200 to 500 events for VB tasks will be small. Am I right?

Sure, the upload will be bigger, but I guess that will only be a problem for Magic.

The new 200-event WUs will more than double the downloads. Much more than necessary!

On my machines with modern CPUs, Ubuntu in a VM and a central Squid, the runtimes range between 2 and 2.5 hours for one 4-core WU; in future that will be 1 hour. Not really much; from my point of view, too short.

Perhaps you could make it configurable in the LHC preferences so users can switch between 200- and 500-event tasks; that could help.
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 32
Message 8000 - Posted: 20 Mar 2023, 17:37:33 UTC - in response to Message 7984.  

David,
we had the longrunner native app on Linux with 1,000 collisions.
Is it possible to reactivate it, maybe with those 2,000?
That way, we could choose whether to activate it or not.
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 8001 - Posted: 21 Mar 2023, 6:12:37 UTC

I'm not sure I correctly understand the new RAM setting for ATLAS VMs.

Old:
RAM is set according to 3000 MB + 900 MB * #cores

New:
Fixed RAM setting, currently 2241 MB, may become 4000 MB (?)

Since the new VM should be able to run either the old or the new ATLAS version, how will the RAM setting be configured?
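A small Python sketch comparing the old per-core rule with a fixed allocation. The 4000 MB value is the one under discussion here, not a confirmed project setting, and the helper name is hypothetical:

```python
def old_ram_mb(cores: int) -> int:
    """Old rule: 3000 MB base plus 900 MB per core."""
    return 3000 + 900 * cores


# Fixed allocation under discussion (currently 2241 MB, may become 4000 MB).
NEW_FIXED_RAM_MB = 4000

print("cores  old_MB  new_MB  diff_MB")
for cores in (1, 2, 4, 8):
    old = old_ram_mb(cores)
    print(f"{cores:5d}  {old:6d}  {NEW_FIXED_RAM_MB:6d}  {old - NEW_FIXED_RAM_MB:7d}")
```

For a single-core VM the fixed value is slightly larger than the old rule; from 2 cores upward it is smaller, and the gap grows by 900 MB per extra core.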


©2024 CERN