ATLAS long simulation 1.01
Author | Message |
---|---|
Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
Hi all, We are testing long tasks which run over 1000 events instead of 200, so please select "ATLAS very long simulation" from the app list and give us feedback here. It currently runs only as a native Linux app and requires a minimum of 4 CPUs. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
This task ran with version 1.01, but it seems I was too early: it processed only 2 collisions: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2957693 [2021-03-18 09:55:19] Singularity works [2021-03-18 09:56:09] Set ATHENA_PROC_NUMBER=7 [2021-03-18 09:56:09] Starting ATLAS job with PandaID=5002850558 [2021-03-18 09:56:09] Running command: /usr/bin/singularity exec --pwd /var/lib/boinc/slots/4 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh start_atlas.sh [2021-03-18 10:13:06] *** The last 200 lines of the pilot log: *** [2021-03-18 10:13:06] "externalCpuTime": 5, [2021-03-18 10:13:06] "processedEvents": 2, [2021-03-18 10:13:06] "trfPredata": null, [2021-03-18 10:13:06] "wallTime": 879 |
Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
There are still some short test tasks in the system which process 2 events. I've stopped submitting those short tasks now. The first batch of 10 real long tasks with 1000 events is now available. Once those are finished there will be more available. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
Those 10 tasks were picked up by other volunteers faster than light. The message I get is that no tasks are available. |
Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
There are more long tasks in the queue now. I finished one task successfully on one of my hosts and it took 12 hours using 4 cores. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
Thank you David, the first long-running task has been downloaded. https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=2064041 Edit: The progress percentage grows very fast to 100 (in less than one hour). I have 6 CPUs. Calculation run time: 8 hours for these 1000 collisions. |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Got 1 task but forgot to make the test account a member of the singularity group. :-( Now the queue is empty again. :-( <rsc_fpops_est> is currently set to 43200 GFLOPs. I suggest increasing it by a factor of 5 before the next batch goes out. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 70 |
I got 1 task for my 4-core Ubuntu VM with 6144MB RAM -> https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2958027 The initial estimated time to finish was 40 minutes. It took 22 minutes before the 4 athena.py processes were running. The task has now been running for 1 hour (incl. initialization) and the 4 workers together have completed 5 events (between 338 and 1715 seconds) Deadline: 26 Mar 2021, 11:58:53 UTC |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Got a task that started fine. BOINC estimates a runtime of 4 min. ;-D This miscalculation could become a problem when the app moves over to prod since lots of computers will then pull lots of tasks. The more cores a computer reports, the greater the miscalculation. The time to set up the task (~20 min until the worker threads start) is comparable to "ATLAS native short". The logfiles look fine, especially log.EVNTtoHITS. |
Send message Joined: 26 Feb 15 Posts: 26 Credit: 5,042,431 RAC: 910 |
I am getting nothing but errors: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2958101 <core_client_version>7.16.6</core_client_version> |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
...No cvmfs_config command found... This is an ATLAS native test. ATLAS native requires a local CVMFS client to be installed and correctly configured. See the message board threads over at the -prod pages. |
Send message Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
I've increased the flops estimate by a factor of 5 for new tasks. I think BOINC's estimation is also based on previous tasks, and since those have been very short the estimate is very small. Note that for this app the max CPUs is 48, so if you have a big machine please give 48-core tasks a try! |
Send message Joined: 26 Feb 15 Posts: 26 Credit: 5,042,431 RAC: 910 |
...No cvmfs_config command found... I assumed "native" just meant that a VM was no longer needed (no vbox required). It's been so long since I ran the native thing, I forgot all about all the additional manual steps required. Reno, NV Team: SETI.USA |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
I assumed "native" just ... meant that a VM was no longer needed (no vbox required) That's correct. But it also means that you are responsible for providing the environment that would otherwise be provided by the VM. You are welcome to test, but you should first familiarize yourself with the required environment. |
Send message Joined: 26 Feb 15 Posts: 26 Credit: 5,042,431 RAC: 910 |
Yes, I ran the native stuff years ago. It's just been so long that I forgot all about it. Reno, NV Team: SETI.USA |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Just a hint for a monitoring one-liner. Open a console and cd to your BOINC working directory. watch -n 50 -d -x sh -c "find ./slots -name \"AthenaMP.log\" |sort |xargs -n1 -I {} sh -c \"grep 'New average' {} |tail -n1\"" May still be a long list on a 48-core machine. :-) |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
40% done after 5 h 48 min (4 worker threads) Estimated time left: 8 h 15 min + stage-out 2021-03-19 19:26:52,535 ISFG4SimSvc INFO Event nr. 104 took 257.2 s. New average 186.6 +- 7.604 2021-03-19 19:25:21,541 ISFG4SimSvc INFO Event nr. 96 took 161.5 s. New average 200.6 +- 8.843 2021-03-19 19:25:38,851 ISFG4SimSvc INFO Event nr. 96 took 201 s. New average 201.4 +- 7.967 2021-03-19 19:27:27,660 ISFG4SimSvc INFO Event nr. 105 took 302.8 s. New average 183.6 +- 6.909 |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 1 |
https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=2064041 CentOS 8 VM on a Ryzen 3950X with 6 cores: 11.5 hours runtime, 650 MByte upload. Tomorrow I will run the same test on two Ryzen 2700 hosts with 6 cores each in a CentOS 8 VM. A task is also running with 6 cores on an AMD FX-8370E; it has been running for 10 hours so far. |
Send message Joined: 20 Jun 17 Posts: 25 Credit: 5,472,506 RAC: 1,632 |
<core_client_version>7.9.3</core_client_version> <![CDATA[ <message> process exited with code 195 (0xc3, -61)</message> <stderr_txt> 23:47:51 (28346): wrapper (7.7.26015): starting 23:47:51 (28346): wrapper: running run_atlas (--nthreads 4) [2021-03-19 23:47:51] Arguments: --nthreads 4 [2021-03-19 23:47:51] Threads: 4 [2021-03-19 23:47:51] Checking for CVMFS [2021-03-19 23:47:52] Probing /cvmfs/atlas.cern.ch... OK [2021-03-19 23:47:53] Probing /cvmfs/atlas-condb.cern.ch... OK [2021-03-19 23:47:54] Probing /cvmfs/grid.cern.ch... OK [2021-03-19 23:47:56] VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE [2021-03-19 23:47:56] 2.5.2.0 28470 0 23296 81113 3 1 2621483 4194304 0 65024 0 0 n/a 0 0 http://cvmfs-s1bnl.opensciencegrid.org/cvmfs/atlas.cern.ch DIRECT 1 [2021-03-19 23:47:56] CVMFS is ok [2021-03-19 23:47:56] Using singularity image /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img [2021-03-19 23:47:56] Checking for singularity binary... [2021-03-19 23:47:56] Using singularity found in PATH at /usr/bin/singularity [2021-03-19 23:47:56] Running /usr/bin/singularity --version [2021-03-19 23:47:56] 2.4.2-dist [2021-03-19 23:47:56] Checking singularity works with /usr/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img hostname [2021-03-19 23:47:56] Singularity isnt working: ERROR : Unknown image format/type: /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img [2021-03-19 23:47:56] ABORT : Retval = 255 [2021-03-19 23:47:56] 23:57:56 (28346): run_atlas exited; CPU time 0.301484 23:57:56 (28346): app exit status: 0x1 23:57:56 (28346): called boinc_finish(195) </stderr_txt> ]]> singularity --version returns a version, as in the check listed on the main LHC forums. I've run the native app before, but it's been a while. |
Send message Joined: 11 Mar 16 Posts: 23 Credit: 68,680 RAC: 0 |
Unknown image format/type: /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img Try the version of singularity provided on CVMFS: /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img hostname Maybe a hint will appear, e.g. "unsquashfs not found" or "mkdir /home/boinc: permission denied". PS: if it works, just delete the locally installed singularity. |
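
For anyone using the monitoring one-liner from earlier in the thread, the "New average" lines it prints can be turned into a rough time-left figure. The sketch below is only an illustration, not part of the ATLAS app: the function names are made up, and it assumes that the event numbers in the log are per-worker counters and that the remaining events are shared evenly among the workers.

```python
import re

# Matches worker log lines such as:
# "... ISFG4SimSvc INFO Event nr. 104 took 257.2 s. New average 186.6 +- 7.604"
EVENT_RE = re.compile(r"Event nr\. (\d+) took ([\d.]+) s\. New average ([\d.]+)")


def parse_worker_line(line):
    """Return (event_number, average_seconds), or None if the line does not match."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(3))


def estimate_seconds_left(last_lines, total_events):
    """Rough wall-clock estimate from the latest 'New average' line of each worker.

    Assumption (hypothetical, not confirmed in the thread): each event number
    counts events finished by that one worker, and the remaining events are
    split evenly across all workers.
    """
    parsed = [p for p in map(parse_worker_line, last_lines) if p is not None]
    if not parsed:
        raise ValueError("no 'New average' lines found")
    done = sum(number for number, _ in parsed)
    mean_average = sum(avg for _, avg in parsed) / len(parsed)
    remaining = max(total_events - done, 0)
    return remaining * mean_average / len(parsed)
```

Fed with the four lines quoted above for the 4-worker task and a total of 1000 events, this gives about 8 hours left, which is in the same range as the 8 h 15 min estimated in that post.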
©2024 CERN