ATLAS vbox and native 3.01
Author | Message |
---|---|
Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
Meanwhile I have finished several of the 500-event WUs, and they all look fine so far. |
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
Those warnings can be ignored, and the first lines about SSL certificate failures should no longer appear in new jobs from today. I looked at the logs of this task in detail, and it took 8 or 9 minutes to get going at full steam. But it could still be that the initialisation phase is indeed longer in this new software. |
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
I am trying out setting the total memory to 4 GB (independent of the number of cores) for vbox tasks, since I see the memory used is always around 2.3 GB. Let me know if you see any problems due to this. |
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Just started a 2nd native task with most of the CVMFS data from the CVMFS cache and most of the Frontier data from the Squid cache. Time to set up the working environment: ~14 min. It has to be said that my box is intentionally slightly overloaded, running 30 CMS VMs, 4 Theory native tasks, 2 other CPU tasks and 2 GPU tasks concurrently, plus that 4-core ATLAS test. From experience, on a partly loaded system I would expect a startup time of ~7-8 min. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 2 |
Threadripper PRO 3995WX:
2023-03-15 09:51:35 (16452): Detected: vboxwrapper 26206
2023-03-15 09:51:35 (16452): Detected: BOINC client v7.20.2
2023-03-15 09:51:35 (16452): Detected: VirtualBox VboxManage Interface (Version: 6.1.40)
2023-03-15 09:51:36 (16452): Successfully copied 'init_data.xml' to the shared directory.
2023-03-15 09:51:36 (16452): Create VM. (boinc_4bb42867e31f84c6, slot#2)
[...]
2023-03-15 09:52:37 (16452): Guest Log: ATHENA_CORE_NUMBER=5
2023-03-15 09:52:38 (16452): Guest Log: Apptainer command /cvmfs/atlas.cern.ch/repo/containers/sw/apptainer/x86_64-el7/current/bin/apptainer exec -B /cvmfs,/data,/home/atlas/RunAtlas /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7
2023-03-15 09:52:38 (16452): Guest Log: *** Starting ATLAS job. (PandaID=5753961892 taskID=32413688) ***
Less than 1 min? |
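To answer a "< 1 min?" question from a full log rather than by eye, the elapsed time between two vboxwrapper log lines can be computed from their timestamps. This is a minimal sketch, assuming the `YYYY-MM-DD HH:MM:SS (pid): message` line format seen in the excerpt; the marker strings are taken from the excerpt, while `parse_log` and `seconds_between` are hypothetical helper names. It only measures up to the "Starting ATLAS job" line, not until all cores are busy.

```python
import re
from datetime import datetime

# Assumed line format, taken from the excerpt above:
# "YYYY-MM-DD HH:MM:SS (pid): message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): (.*)$")

# A few lines copied from the excerpt, used as sample input.
SAMPLE_LOG = [
    "2023-03-15 09:51:35 (16452): Detected: vboxwrapper 26206",
    "2023-03-15 09:51:36 (16452): Create VM. (boinc_4bb42867e31f84c6, slot#2)",
    "2023-03-15 09:52:37 (16452): Guest Log: ATHENA_CORE_NUMBER=5",
    "2023-03-15 09:52:38 (16452): Guest Log: *** Starting ATLAS job. "
    "(PandaID=5753961892 taskID=32413688) ***",
]


def parse_log(lines):
    """Yield (timestamp, message) for every line matching LINE_RE."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            yield stamp, match.group(2)


def seconds_between(lines, start_marker, end_marker):
    """Seconds from the first line containing start_marker to the next
    line containing end_marker, or None if either line is missing."""
    start = None
    for stamp, message in parse_log(lines):
        if start is None:
            if start_marker in message:
                start = stamp
        elif end_marker in message:
            return (stamp - start).total_seconds()
    return None


if __name__ == "__main__":
    print(seconds_between(SAMPLE_LOG, "Create VM", "Starting ATLAS job"))
```

On the excerpt's timestamps this prints `62.0`, i.e. just over a minute from "Create VM" to "Starting ATLAS job".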
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
I think the others are referring to the time between starting the task and the task using all the CPU cores for processing, which is 5-20 min depending on load/location/connection. The vbox tasks are currently getting the wrong memory setting (2241 MB); they still succeed but use swap space, so they are slower. I'm working on fixing this. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 2 |
"2023-03-15 09:52:38 (16452): Guest Log: *** Starting ATLAS job. (PandaID=5753961892 taskID=32413688) *** < 1 min?" The next log entry comes 6000 seconds later, which is the correct difference from the line before. top shows no athena.py. Is this correct? |
Joined: 24 Oct 19 Posts: 170 Credit: 543,238 RAC: 1,075 |
I'm not a lucky man. My Windows machine still doesn't download ATLAS WUs... |
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
You are running 3 computers: 2 Windows, 1 Linux. Since your Linux computer got ATLAS native tasks, "Run native if available?" is set, but this blocks vbox tasks on Windows. To get both native and vbox tasks, you must assign the Windows and Linux computers to different venues. Your Linux tasks failed because of a missing local CVMFS client. |
Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
"top shows no athena.py" Yes, it should show a python task using "cores × 100%" CPU, so my 4-core WU shows the Python task at 400% CPU usage. |
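The "cores × 100%" rule of thumb can also be checked from a script on the machine (or inside the VM) where the task runs. This is a minimal sketch, assuming a Linux system with `ps -eo pcpu,args` available and that the ATLAS process command line contains "python"; `python_processes`, `uses_all_cores`, `read_ps` and the 0.8 tolerance are hypothetical choices, not part of the ATLAS tooling. Note that `ps` reports %CPU averaged over the process lifetime, whereas `top` shows a recent snapshot, so a task still in its setup phase will read low.

```python
import subprocess


def python_processes(ps_output):
    """Return (cpu_percent, command) for each process whose command line
    mentions "python", from text formatted like `ps -eo pcpu,args`."""
    found = []
    for line in ps_output.splitlines()[1:]:  # first line is the header
        fields = line.split(None, 1)  # %CPU, then the full command line
        if len(fields) == 2 and "python" in fields[1]:
            found.append((float(fields[0]), fields[1]))
    return found


def uses_all_cores(cpu_percent, cores, tolerance=0.8):
    """True if cpu_percent reaches `tolerance` of the cores * 100% target."""
    return cpu_percent >= tolerance * cores * 100


def read_ps():
    """Capture `ps -eo pcpu,args` on the local Linux machine."""
    return subprocess.run(
        ["ps", "-eo", "pcpu,args"], capture_output=True, text=True, check=True
    ).stdout
```

For example, `[(cpu, uses_all_cores(cpu, 4)) for cpu, _ in python_processes(read_ps())]` would flag whether a 4-core task's Python process is near 400%.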
Joined: 24 Oct 19 Posts: 170 Credit: 543,238 RAC: 1,075 |
"Since your Linux computer got ATLAS native, 'Run native if available?' is set, but this blocks vbox tasks on Windows." Uh, I didn't know that. Thank you!! "The fact that your Linux tasks failed is caused by a missing local CVMFS client." I know, I know. I have some problems with CVMFS on my Linux box... |
Joined: 24 Oct 19 Posts: 170 Credit: 543,238 RAC: 1,075 |
It works! Some considerations:
- very, very fast at the beginning of the calculation, very, very slow at the end
- over 6 hrs on a 4-core mobile CPU
- 1200 points
Not so bad! |
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
"It works" Total runtime/CPU time doesn't look too bad. Usual tasks (on -prod) process 200 events, while this one processed 500 events => CPU time < 180 s/event. "... very very fast at the beginning of calculation, very very slow at the end" If you are referring to BOINC's progress bar, it may be misleading, since many older tasks (on -dev) processed only 5-20 events. Those differences always confuse BOINC's runtime estimation. |
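The < 180 s/event figure can be cross-checked with back-of-the-envelope arithmetic from the "over 6 hrs on 4 cores" report: wall time × cores ÷ events. This is a minimal sketch, assuming all 4 cores stayed busy for the whole run (`busy_fraction=1.0`), which ignores the slower setup phase; the function name is hypothetical, and the actual CPU time reported on the task page is the authoritative number.

```python
def cpu_seconds_per_event(wall_hours, cores, events, busy_fraction=1.0):
    """Rough CPU seconds per event: wall time x cores x busy fraction,
    divided by the number of events processed."""
    return wall_hours * 3600 * cores * busy_fraction / events


# Figures from the posts above: "over 6 hrs on 4 cores", 500 events.
estimate = cpu_seconds_per_event(wall_hours=6, cores=4, events=500)
print(f"{estimate:.1f} s/event")
```

This prints `172.8 s/event`, consistent with the "< 180 s/event" estimate.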
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
The new ATLAS software is much faster than the old version, so each event takes 3 minutes compared to 5 minutes on average previously. This is why I tried 500 event tasks here compared to the usual 200 on prod. The downside of faster software is a larger HITS file to upload, so please let me know your opinions on whether you prefer quick tasks with smaller output files or longer tasks with larger output files. |
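Using the per-event times quoted above, the trade-off between 200- and 500-event tasks can be put into rough numbers. This sketch assumes that "3 minutes" and "5 minutes" mean CPU minutes per event, that the event loop scales perfectly across all cores, and that a fixed startup time (5-20 min is mentioned in this thread) can optionally be added via `setup_min`; `task_wall_hours` is a hypothetical name, and real per-host speeds can differ a lot from these averages.

```python
def task_wall_hours(events, cpu_min_per_event, cores, setup_min=0.0):
    """Rough wall-clock hours for one task, assuming the event loop scales
    perfectly over all cores and a fixed setup time comes first."""
    return (setup_min + events * cpu_min_per_event / cores) / 60


if __name__ == "__main__":
    # 5 min/event (old software) vs 3 min/event (new), as stated above.
    for events in (200, 500):
        for minutes in (5, 3):
            hours = task_wall_hours(events, minutes, cores=4)
            print(f"{events} events at {minutes} min/event on 4 cores: {hours:.2f} h")
```

On 4 cores, a 200-event task drops from about 4.2 h to 2.5 h with the new software, while a 500-event task takes about 6.25 h.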
Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 2 |
I would prefer quick tasks, and not because of the size of the HITS file. The problem with ATLAS (and CMS) tasks is that they don't tolerate long suspensions. Not all users have fast machines with a lot of cores running 24/7. They are already happy when they can run a dual-core ATLAS task during their machine's daily uptime. |
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
"I would prefer quick tasks." Agreed, even on Linux running native 24/7. As long as ATLAS native doesn't support snapshots, shorter runtimes make it easier to plan maintenance/upgrade windows. |
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
Thanks for the input; we will then stay with 200-event tasks when this new version goes to prod. From the ATLAS side we prefer longer tasks, mainly because they make bookkeeping and data transfer easier, and in fact on the ATLAS grid we now run 2000-event tasks. But of course that is a very different environment, and it's more important to keep volunteers happy here. |
Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
Sorry, but I'm against these smaller tasks. I think the 200-event WUs were a good balance, and I vote for 500 events in future. My reasons:
- Downloads for the new VB tasks will be bigger, even if they still contain only 200 events. Did I understand this right?
- The additional download when going from 200 to 500 events for VB tasks will be small. Am I right?
- Sure, the upload will be bigger, but I guess that will only be a problem for Magic.
- The new 200-event WUs will more than double the downloads. Much more than necessary!
- On my machines (modern CPUs, Ubuntu in a VM and a central Squid) the run times range from 2 to 2.5 hours for one 4-core WU; in future that will be 1 hour. Not really much; from my point of view, too short.
Perhaps you can make it configurable in the LHC preferences so the user can switch between 200- and 500-event tasks; that could help. |
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 2 |
David, we used to have the long-runner native tasks on Linux with 1,000 collisions. Is it possible to reactivate them, maybe with those 2,000? Then we would be able to choose whether to activate them or not. |
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Not sure if I correctly understand the new RAM setting for ATLAS VMs.
Old: RAM is set according to 3000 MB + 900 MB * #cores
New: fixed RAM setting, currently 2241 MB, may become 4000 MB (?)
Since the new VM should be able to run either the old or the new ATLAS version, how will the RAM setting be configured? |
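The two RAM rules can be compared side by side for typical core counts. This is a minimal sketch using the numbers from this thread: the old formula as quoted in the question, 4000 MB as the fixed value being tried, 2241 MB as the current setting, and ~2.3 GB as the observed usage. The function names are hypothetical, `covers_usage` is only a rough check that ignores guest OS overhead, and the actual mechanism that sets vbox task memory is not shown.

```python
OBSERVED_USAGE_MB = 2300  # "always around 2.3GB", as reported in this thread


def old_vm_ram_mb(cores):
    """Old rule quoted above: 3000 MB + 900 MB per core."""
    return 3000 + 900 * cores


def fixed_vm_ram_mb(cores, fixed_mb=4000):
    """New rule: a fixed size; `cores` is accepted but deliberately ignored."""
    return fixed_mb


def covers_usage(ram_mb, usage_mb=OBSERVED_USAGE_MB):
    """Rough check: is the VM RAM at least the observed memory usage?"""
    return ram_mb >= usage_mb


if __name__ == "__main__":
    for cores in (1, 2, 4, 8):
        print(cores, old_vm_ram_mb(cores), fixed_vm_ram_mb(cores),
              covers_usage(fixed_vm_ram_mb(cores, 2241)))
```

With these numbers, 2241 MB falls just below the ~2.3 GB observed usage, which fits the swapping reported earlier, while a fixed 4000 MB is less than the old rule for any VM with 2 or more cores.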
©2024 CERN