Message boards :
ATLAS Application :
ATLAS long simulation 1.01
Author | Message |
---|---|
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
On an AMD FX-8370E a task was also running with 6 cores. It has now finished, at about 18 hours runtime and 4 days duration. This is not a computer for these 1k collisions. It also ran in a CentOS8 VM. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2958037 |
Send message Joined: 19 Apr 15 Posts: 4 Credit: 71,032 RAC: 0 |
Just a hint for a monitoring one-liner.
Greetings All,
I am running the following workunit on a Xeon E5-2620 v2 using 6 cores; the OS is Linux Mint 20.01. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2958159
Using the information above that was posted by computezrmle, this is the output so far:
2021-03-20 14:09:34,009 ISFG4SimSvc INFO Event nr. 156 took 286.8 s. New average 292 +- 9.647
2021-03-20 14:14:26,078 ISFG4SimSvc INFO Event nr. 171 took 252.7 s. New average 269.4 +- 8.411
2021-03-20 14:14:15,815 ISFG4SimSvc INFO Event nr. 164 took 317.9 s. New average 280.9 +- 8.913
2021-03-20 14:13:36,407 ISFG4SimSvc INFO Event nr. 157 took 398.3 s. New average 292.1 +- 9.11
2021-03-20 14:11:33,962 ISFG4SimSvc INFO Event nr. 156 took 196.8 s. New average 293.1 +- 9.583
2021-03-20 14:13:03,249 ISFG4SimSvc INFO Event nr. 166 took 403.2 s. New average 276.1 +- 8.551
The workunit is still running at the moment, at 99.782% (14 hrs and 37 mins so far). I also completed an ATLAS Native task from the main project successfully, to verify the integrity of the system. I am now waiting for the workunit to complete before taking any further action.
Cheers |
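The per-worker log lines quoted above can also be summarised with a few lines of Python. This is only a sketch: the regular expression is an assumption derived from the six lines in the post, and `parse_events` and `SAMPLE` are hypothetical names, not part of any ATLAS or BOINC tool.

```python
import re
from statistics import mean

# Pattern inferred from the log lines quoted in the post (an assumption,
# not an official log format specification).
EVENT_RE = re.compile(
    r"ISFG4SimSvc\s+INFO\s+Event nr\.\s+(\d+)\s+took\s+([\d.]+)\s+s\.\s+"
    r"New average\s+([\d.]+)\s+\+-\s+([\d.]+)"
)

def parse_events(text):
    """Return a list of (event_nr, seconds, running_average) tuples."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in EVENT_RE.finditer(text)
    ]

# The six lines from the post, one per worker.
SAMPLE = """\
2021-03-20 14:09:34,009 ISFG4SimSvc INFO Event nr. 156 took 286.8 s. New average 292 +- 9.647
2021-03-20 14:14:26,078 ISFG4SimSvc INFO Event nr. 171 took 252.7 s. New average 269.4 +- 8.411
2021-03-20 14:14:15,815 ISFG4SimSvc INFO Event nr. 164 took 317.9 s. New average 280.9 +- 8.913
2021-03-20 14:13:36,407 ISFG4SimSvc INFO Event nr. 157 took 398.3 s. New average 292.1 +- 9.11
2021-03-20 14:11:33,962 ISFG4SimSvc INFO Event nr. 156 took 196.8 s. New average 293.1 +- 9.583
2021-03-20 14:13:03,249 ISFG4SimSvc INFO Event nr. 166 took 403.2 s. New average 276.1 +- 8.551
"""

events = parse_events(SAMPLE)
overall = mean(avg for _, _, avg in events)
print(f"{len(events)} workers, mean of running averages: {overall:.1f} s per event")
```

In practice the text would come from the task's log file instead of `SAMPLE`; where that file lives depends on the local setup.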
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Logfile entries written to stderr.txt at the beginning of the task are missing in the final report. They are required to identify a CVMFS or Singularity misconfiguration. |
Send message Joined: 11 Mar 16 Posts: 23 Credit: 68,680 RAC: 0 |
Logfile entries written to stderr.txt at the beginning of the task are missing in the final report.
The beginning of the file is lost once the payload is started. It might make sense to have a separate test application without a payload, with only diagnostic log entries, to check the suitability of the host and project settings (memory, threads, etc.). Once upon a time there was a similar benchmark application. |
Send message Joined: 19 Apr 15 Posts: 4 Credit: 71,032 RAC: 0 |
Just a hint for a monitoring one-liner. Greetings The following workunit returned valid with a hits file. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2958159 Regards |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 15 |
My single 4-core task is crawling slowly. After 22 hours of run time the 4 workers show:
2021-03-20 11:54:30,834 ISFG4SimSvc INFO Event nr. 66 took 1081 s. New average 1170 +- 64.03
2021-03-20 11:45:58,903 ISFG4SimSvc INFO Event nr. 66 took 968.2 s. New average 1166 +- 64.4
2021-03-20 11:52:57,758 ISFG4SimSvc INFO Event nr. 69 took 824.6 s. New average 1134 +- 61.04
2021-03-20 11:44:35,597 ISFG4SimSvc INFO Event nr. 64 took 1503 s. New average 1188 +- 56.21
I will definitely not run these long jobs on this machine when they go into production. |
Send message Joined: 20 Jun 17 Posts: 25 Credit: 5,472,506 RAC: 372 |
Unknown image format/type: /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img
That command returns my host name. No errors. Should I delete it? Like this:
sudo apt-get remove --auto-remove singularity
Edit: 2 PCs are getting this. A 3rd, where I just went through the setup thread on LHC, now works after installing gawk; at least it's starting to use some memory/CPU. The two PCs may have had singularity installed via a repository some time ago instead of via cmake etc. |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Is it this computer? https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=3717 I would expect it to require an average of around 290-300 s per event. 1170 s would be very close to a factor of 4.0. I guess it's a Linux guest VM running on a Windows host. Could you check if the guest really runs on 4 cores? The numbers make me suspect the VM is allowed to use only 1 core. |
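The factor-of-4 reasoning above, and the check whether the guest really sees 4 cores, can be sketched in Python. The helper names `slowdown_factor` and `usable_cores` are hypothetical; 292 s is taken from the Xeon log earlier in the thread as a stand-in for the expected 290-300 s per event.

```python
import os

def slowdown_factor(observed_s, expected_s):
    """Ratio of the observed to the expected average time per event."""
    return observed_s / expected_s

def usable_cores():
    """Cores the current process may run on (Linux), else the logical CPU count."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1

# 1170 s observed on the slow host versus roughly 292 s expected.
print(f"slowdown: {slowdown_factor(1170, 292):.2f}x")
# Run inside the guest VM: a value of 1 would support the single-core suspicion.
print(f"cores visible to this process: {usable_cores()}")
```

A core count of 4 inside the guest does not rule out contention on the host; it only shows that the VM itself is configured with 4 virtual CPUs.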
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 15 |
Is it this computer?
Yes it is. The Linux VM uses 4 threads from that i7 2600 (4 cores - 8 threads). The other 4 threads on the host are for 1 long-running multi-core PrimeGrid job and personal PC usage. I expect that PG job to be done sometime Sunday evening / Monday morning. That will speed up the ATLAS task a bit. The Linux VM gets 50% CPU usage all the time from the Win10 host. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
Tomorrow the same Test for https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=2064234 |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
cpuconsumptiontime: 190923 s - AMD Ryzen 9 3950X
cpuconsumptiontime: 360969 s - AMD FX-8370E
cpuconsumptiontime: 228732 s - AMD Ryzen 7 2700
All CentOS8-VM - 6 CPUs |
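To make the three figures easier to compare, they can be converted to CPU hours and expressed relative to the fastest host. This is a small sketch; `summarize` is a hypothetical helper, and dividing by the core count assumes perfect scaling over 6 cores, which real tasks will not reach because setup and stage-out run on a single core.

```python
# cpuconsumptiontime values from the post, in seconds.
CPU_SECONDS = {
    "AMD Ryzen 9 3950X": 190923,
    "AMD Ryzen 7 2700": 228732,
    "AMD FX-8370E": 360969,
}
CORES = 6  # all three tasks ran in a 6-CPU CentOS8 VM

def summarize(cpu_seconds, cores):
    """Return {host: (cpu_hours, cpu_hours_per_core, ratio_to_fastest)}."""
    fastest = min(cpu_seconds.values())
    return {
        host: (secs / 3600, secs / 3600 / cores, secs / fastest)
        for host, secs in cpu_seconds.items()
    }

for host, (hours, per_core, ratio) in summarize(CPU_SECONDS, CORES).items():
    print(f"{host:20s} {hours:7.1f} CPU h  {per_core:5.1f} h/core  {ratio:4.2f}x")
```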
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 15 |
Halfway through my first and last (on this host) ATLAS long simulation:
2021-03-21 09:37:09,229 ISFG4SimSvc INFO Event nr. 124 took 2482 s. New average 1230 +- 47.41
2021-03-21 09:25:02,948 ISFG4SimSvc INFO Event nr. 122 took 595.1 s. New average 1246 +- 49.64
2021-03-21 09:39:32,900 ISFG4SimSvc INFO Event nr. 137 took 724.8 s. New average 1122 +- 43.15
2021-03-21 09:29:00,688 ISFG4SimSvc INFO Event nr. 127 took 1577 s. New average 1191 +- 41.43 |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Still wondering. Can you post a "top" output from that VM? |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 15 |
Tasks: 234 total, 6 running, 228 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1,9 us, 3,1 sy, 95,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
MiB Mem : 5960,4 total, 425,5 free, 3558,1 used, 1976,8 buff/cache
MiB Swap: 1186,4 total, 1163,1 free, 23,3 used. 1982,9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16248 boinc 39 19 2699732 1,9g 114680 R 98,0 32,6 2650:52 athena.py
16247 boinc 39 19 2693440 1,9g 114536 R 93,5 32,5 2650:43 athena.py
16251 boinc 39 19 2696040 1,9g 114388 R 93,5 32,6 2651:36 athena.py
16254 boinc 39 19 2696516 1,9g 116260 R 91,5 32,5 2650:58 athena.py
7961 boinc 39 19 1339856 77012 8168 S 2,9 1,3 91:32.37 python2
18231 boinc 39 19 1324 52 0 R 1,6 0,0 0:00.05 sh
15464 boinc 39 19 2665428 1,8g 136004 S 0,3 31,1 12:16.86 athena.py
1565 boinc 30 10 238772 19992 12284 S 0,0 0,3 12:42.12 boinc
3400 boinc 30 10 6180 3276 2768 S 0,0 0,1 10:31.57 wrapper_26015_x
3402 boinc 39 19 20124 3304 3004 S 0,0 0,1 0:00.01 run_atlas
3403 boinc 39 19 20124 268 0 S 0,0 0,0 0:00.00 run_atlas
3405 boinc 39 19 30700 1952 1724 S 0,0 0,0 0:00.00 awk
4425 boinc 39 19 634744 13568 9492 S 0,0 0,2 0:02.93 starter
4479 boinc 39 19 15148 3296 3024 S 0,0 0,1 0:00.10 sh
4527 boinc 39 19 4364 708 632 S 0,0 0,0 0:00.00 time
4528 boinc 39 19 15672 3608 2848 S 0,0 0,1 0:00.84 runpilot2-wrapp
10893 boinc 39 19 16076 4148 2864 S 0,0 0,1 0:13.02 bash |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
Nothing in there that explains the long event runtimes. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 15 |
It appeared on the server status page as the max runtime in hours of last 100 tasks: ATLAS very long simulation 428 28 11.15 (3.59 - 74.28) https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2958027 |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
Crystal, your task is the winner in runtime ;-) These are the last lines from your task:
[2021-03-22 15:17:28] -rw------- 1 boinc boinc 17756160 mrt 22 15:17 result.tar.gz
[2021-03-22 15:17:28] -rw------- 1 boinc boinc 599 mrt 22 15:17 asKMDmxaQhynfZGDcpSWOuwoABFKDmABFKDm2IFNDmGDFKDm36iUdn.diag
[2021-03-22 15:17:28] -rw-r--r-- 1 boinc boinc 8674 mrt 22 15:17 runtime_log.err
[2021-03-22 15:17:28] -rw-r--r-- 1 boinc boinc 397274 mrt 22 15:17 stderr.txt
[2021-03-22 15:17:28] HITS file was successfully produced:
[2021-03-22 15:17:28] -rw------- 1 boinc boinc 667310809 mrt 22 15:14 shared/HITS.pool.root.1
[2021-03-22 15:17:28] *** Contents of shared directory: ***
[2021-03-22 15:17:28] total 1017256
In my tasks the line with the HITS file is not showing:
[2021-03-20 07:00:19] -rw-------. 1 boinc boinc 7219200 20. Mär 07:00 result.tar.gz
[2021-03-20 07:00:19] -rw-------. 1 boinc boinc 580 20. Mär 07:00 L1xKDm3aQhynfZGDcpSWOuwoABFKDmABFKDm2IFNDmQDFKDmwxvb5m.diag
[2021-03-20 07:00:19] -rw-r--r--. 1 boinc boinc 389757 20. M |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 15 |
Crystal,
Yeah, I saw the upload of my 636 MB HITS file. |
Send message Joined: 26 Feb 15 Posts: 26 Credit: 5,042,431 RAC: 207 |
Let's say you have a 32 core machine. Is it better to run a single task using 32 cores? Or 4 tasks using 8 cores each? Or 8 tasks using 4 cores each? I have tried all three configurations and the credits per hour work out to be roughly the same. As far as I can tell, there is no advantage to me regardless of configuration. So my question is, does the project have a preference? Reno, NV Team: SETI.USA |
Send message Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 0 |
The tasks have 3 main phases:
1. setup
2. calculation
3. stage out
Phases 1 and 3 run on 1 core. Phase 2 runs on n cores (in this case real POSIX threads) depending on your setup. At the end of phase 2 not all threads will finish their work at the same time, hence there will be some waste. This waste depends on:
1. how long the average runtime for a single event is (see CP's extremely long runtimes)
2. whether your computer leaves the free cores idle or not.
The more events a task processes in total, the lower the influence of phases 1/3 and the waste.
Running many low-core tasks concurrently requires more RAM and more total runtime but produces a bit less waste. Running a few high-core tasks requires less RAM and less total runtime but produces a bit more waste.
David Cameron wrote: ATLAS systems cancel any tasks which have been queued for more than two days
I would suggest taking this as a hint and using a setup that allows a task to finish within 1-1.5 days. Besides that, it's a matter of personal preference. |
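The trade-off described above can be put into rough numbers. The following is a simplified model, not how ATLAS schedules events: it assumes every event takes the same average time, ignores phases 1 and 3 unless an `overhead_s` is supplied, and takes 1000 events per task from the "1k collisions" mentioned earlier in the thread. The function names are hypothetical.

```python
import math

def estimate_wall_hours(n_events, avg_event_s, n_workers, overhead_s=0.0):
    """Wall-clock estimate if every event took exactly avg_event_s seconds."""
    rounds = math.ceil(n_events / n_workers)  # events handled by the busiest worker
    return (rounds * avg_event_s + overhead_s) / 3600

def tail_waste_core_hours(avg_event_s, n_workers):
    """Idle core time if all but one worker wait for a single last event."""
    return (n_workers - 1) * avg_event_s / 3600

DEADLINE_HOURS = 36  # upper end of the suggested 1-1.5 days

# (label, average seconds per event, workers), averages taken from the thread
for label, avg_s, workers in [("6-core Xeon", 292, 6), ("4-core VM", 1170, 4)]:
    hours = estimate_wall_hours(1000, avg_s, workers)
    waste = tail_waste_core_hours(avg_s, workers)
    verdict = "fits" if hours <= DEADLINE_HOURS else "too slow"
    print(f"{label}: ~{hours:.1f} h wall time, ~{waste:.2f} core h tail waste, {verdict}")
```

Because real event times vary widely (see the logs above), the tail waste of an actual task can be larger than this figure; the model is only meant to show why slow events make the waste and the total runtime grow together.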
©2024 CERN