Message boards : Theory Application : New Native App - Linux Only
Joined: 28 Jul 16 · Posts: 484 · Credit: 394,839 · RAC: 1

There's something wrong regarding the runtime calculation.

Right, this explains the higher CPU-times. In the mentioned examples I don't understand why the runtime is much higher than the difference between app start and app finish. BTW: there is no visible delay at app start or app finish.
Joined: 12 Sep 14 · Posts: 1069 · Credit: 334,882 · RAC: 0

Have 7 tasks in parallel, but slot numbers are shown up to 21!

Yes, there was an issue with a previous image where the slot directories were not being cleaned. Let me know if that is still the case.
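For anyone who wants to verify this by hand, a minimal sketch that lists leftover slot directories; the data directory /var/lib/boinc-client is an assumption (adjust it to your installation), and the check is only meaningful while the client is idle or stopped:

  # List slot directories that still contain files.
  # Path is an assumption; adjust to your BOINC data directory.
  cd /var/lib/boinc-client/slots
  for d in */ ; do
      [ -n "$(ls -A "$d")" ] && echo "slot $d is not empty"
  done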
Joined: 22 Apr 16 · Posts: 677 · Credit: 2,002,766 · RAC: 2

I have cleaned the slot directories and will keep an eye on it. Thank you. A question about the length of the tasks: they run, for example, 4 or 5 hours for Pythia8. Are these tasks the same as in production? I have the impression they take longer. Can this be?
Joined: 10 Mar 17 · Posts: 40 · Credit: 108,345 · RAC: 0

There are tasks that run for a couple of hundred seconds and some for a couple of hours. It's probably the normal fluctuation (different job types, different codes, different event types, ...). With the old vbox app there were also jobs that ran for a couple of hours. It would be weird if the native jobs ran slower than jobs running within a VM.
Joined: 12 Sep 14 · Posts: 1069 · Credit: 334,882 · RAC: 0

I have cleaned the slot directories and will keep an eye on it. Thank you.

The same jobs are run as in production. There is a mixture of job types and parameters, so it is normal to have different runtimes.
Joined: 12 Sep 14 · Posts: 1069 · Credit: 334,882 · RAC: 0

It's not directed against the very good work from Laurence on this new concept.

Thanks, negative feedback is important. It is better to have it here and try to resolve the issue than in production.
Joined: 28 Jul 16 · Posts: 484 · Credit: 394,839 · RAC: 1

I have one failed task in a row of valid ones. Reason unknown.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755317
Joined: 13 Feb 15 · Posts: 1188 · Credit: 861,475 · RAC: 2

Same here after 8.5 hours of run-time: EXIT_CHILD_FAILED

09:33:23 2019-02-26: cranky-0.0.24: [INFO] Running Container 'runc'.
18:11:23 2019-02-26: cranky-0.0.24: [ERROR] Container 'runc' failed.

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755265
Joined: 10 Mar 17 · Posts: 40 · Credit: 108,345 · RAC: 0

I had one of those, too, two days ago: 195 (0x000000C3) EXIT_CHILD_FAILED
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2754240
I don't think this has anything to do with the problem, but anyway: I restarted my router while the task was running, so the task temporarily had no internet access.
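To scan your own tasks for such container errors, a hedged one-liner; the data directory path and the stderr.txt file name are assumptions based on a standard BOINC client layout:

  # Search all slot stderr logs for cranky error lines.
  # Paths are assumptions; adjust to your installation.
  grep -H "\[ERROR\]" /var/lib/boinc-client/slots/*/stderr.txt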
Joined: 28 Jul 16 · Posts: 484 · Credit: 394,839 · RAC: 1

There's something wrong regarding the runtime calculation.

An upgrade of my BOINC client may have solved that issue.
Joined: 28 Jul 16 · Posts: 484 · Credit: 394,839 · RAC: 1

... An upgrade of my BOINC client may have solved that issue.

Well, my fault. I checked the first tasks after a reboot, when the host was not yet under full load. Under full load the runtime calculation still differs significantly from "finish time - starting time". See these examples:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755583
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755593
Joined: 13 Feb 15 · Posts: 1188 · Credit: 861,475 · RAC: 2

Mostly the CPU-times are (much) higher than the elapsed times.

Run time 2 hours 10 min 54 sec
CPU time 3 hours 3 min 55 sec
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755539

Run time 1 hours 12 min 56 sec
CPU time 1 hours 40 min 16 sec
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755545
Joined: 28 Jul 16 · Posts: 484 · Credit: 394,839 · RAC: 1

Your example: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755539
The log shows 2h 10m 38s between "wrapper starting" and "called boinc_finish", while the runtime shows 2h 10m 54s: a difference of 16s. Not very much, but in my eyes still too much time for a modern computer to fork and shut down a process. Was your computer fully loaded? Is it bare metal or a VM? My full-load example on a bare-metal computer shows a time difference of more than 10 minutes (!), which seems absurd.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755583
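The wall-clock comparison above is easy to reproduce in a shell. A minimal sketch, assuming GNU date and reusing the timestamps from the log excerpt earlier in this thread:

  # Wall-clock delta between two log timestamps (GNU date assumed).
  start="2019-02-26 09:33:23"
  finish="2019-02-26 18:11:23"
  delta=$(( $(date -d "$finish" +%s) - $(date -d "$start" +%s) ))
  printf 'wall clock: %02d:%02d:%02d\n' $((delta/3600)) $((delta%3600/60)) $((delta%60))

The printed value can then be compared with the Run time shown on the result page.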
Joined: 13 Feb 15 · Posts: 1188 · Credit: 861,475 · RAC: 2

Your example:

That task ran in a 4-core Linux VM on a Windows host; only 2 cores of the VM were used. In your task example the run-time must be wrong, so it must be a BOINC/wrapper issue. The CPU-time looks the same as the one reported on the second-to-last line of the result. The elapsed time and the CPU-time being exactly the same is suspicious.
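One thing worth ruling out when the CPU-time exceeds the elapsed time: a Theory task runs several processes at once (e.g. the Herwig and rivetvm.exe processes visible in the top snapshot further down), and their CPU times add up in the reported total. A sketch for inspecting a running task, assuming those process names:

  # Show elapsed time vs accumulated CPU time per process.
  # Process names are taken from the top output in this thread.
  ps -o pid,comm,etime,time -C Herwig,rivetvm.exe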
Joined: 22 Apr 16 · Posts: 677 · Credit: 2,002,766 · RAC: 2

SL 7.6 shows in top: 2 GByte of swap in use with 0 free, while the normal memory is only 50% in use (10 GByte of 20 GByte). Is it like this on other native Linux setups too?
Joined: 13 Feb 15 · Posts: 1188 · Credit: 861,475 · RAC: 2

Is it like this on other native Linux setups too?

Ubuntu 18.10:

top - 12:41:31 up 3:31, 1 user, load average: 1,18, 1,29, 1,34
Tasks: 218 total, 2 running, 216 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0,3 us, 0,7 sy, 28,2 ni, 70,8 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
MiB Mem : 5960,3 total, 3671,1 free, 890,1 used, 1399,1 buff/cache
MiB Swap: 1186,4 total, 1186,4 free, 0,0 used. 4771,1 avail Mem

  PID USER  PR NI   VIRT   RES   SHR S %CPU %MEM    TIME+ COMMAND
23512 boinc 39 19 476848 89408 42576 S 72,1  1,5 58:46.48 Herwig
23399 boinc 39 19 292024 19660 13056 R 43,2  0,3 35:53.56 rivetvm.exe
 7573 boinc 39 19   4132   184   144 S  0,3  0,0  0:00.01 sleep
 1370 boinc 30 10 166892 16492 13028 S  0,0  0,3  0:26.20 boinc
 7521 boinc 39 19   4132    36     0 S  0,0  0,0  0:00.00 sleep
21989 boinc 30 10   6052  3312  2832 S  0,0  0,1  0:06.79 wrapper_26015_x
21991 boinc 39 19  20256  3600  3216 S  0,0  0,1  0:00.01 cranky-0.0.24
22769 boinc 39 19 609124  6624  2024 S  0,0  0,1  0:00.04 runc
22779 boinc 39 19  17728   204     0 S  0,0  0,0  0:00.01 job
22796 boinc 39 19  18664  1748   628 S  0,0  0,0  0:02.95 runRivet.sh
23398 boinc 39 19  18256   796     0 S  0,0  0,0  0:00.04 rungen.sh
23400 boinc 39 19  18796  1888   624 S  0,0  0,0  0:03.63 runRivet.sh
Joined: 22 Apr 16 · Posts: 677 · Credit: 2,002,766 · RAC: 2

Hi Crystal, the same question for you! Theory native uses the swap memory instead of the free normal memory. ATLAS native does not do this! Swap memory is normally a file on disk.
Joined: 28 Jul 16 · Posts: 484 · Credit: 394,839 · RAC: 1

Hi Crystal,

MiB Swap: 1186,4 total, 1186,4 free, 0,0 used

This means:
Swap total: 1186,4 MiB
Swap free: 1186,4 MiB
Swap used: 0,0 MiB

At the moment no swap is used on this host.
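The same information can be read in a column-order-independent way with standard tools that label the swap figures explicitly:

  # Human-readable memory and swap summary.
  free -h
  # List active swap devices/files and their usage.
  swapon --show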
Joined: 22 Apr 16 · Posts: 677 · Credit: 2,002,766 · RAC: 2

Thank you, and sorry: SL 6.9 shows the swap columns as "total used free", SL 7.6 as "total free used". I had not noticed that the column order changed in top.
Joined: 22 Apr 16 · Posts: 677 · Credit: 2,002,766 · RAC: 2

In the -dev folder of BOINC there are both a cranky-0.0.24 and a cranky-0.0.25. The task logs still show cranky-0.0.24:

08:50:27 (20616): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.24 ()
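Which cranky version the installed app version actually references can be checked in the client's state file; a sketch, assuming the default data directory path:

  # Show where cranky is referenced in the client state.
  # Path is an assumption; adjust to your installation.
  grep -n "cranky" /var/lib/boinc-client/client_state.xml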