Message boards : Theory Application : New Native App - Linux Only
Joined: 28 Jul 16 · Posts: 484 · Credit: 394,839 · RAC: 1

There's something wrong regarding the runtime calculation.

Right, this explains the higher CPU-times. In the mentioned examples I don't understand why the runtime is much higher than the difference between app start and app finish. BTW: there is no visible delay at app start or app finish.
Joined: 12 Sep 14 · Posts: 1069 · Credit: 334,882 · RAC: 0

Have 7 tasks in parallel, but slot numbers are shown up to 21!

Yes, there was an issue with a previous image where the slot directories were not being cleaned. Let me know if that is still the case.
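For anyone who wants to verify this by hand, a minimal sketch that lists leftover slot directories; the data directory /var/lib/boinc-client is an assumption (adjust it to your installation), and the check is only meaningful while the client is idle or stopped:

  # List slot directories that still contain files.
  # Path is an assumption; adjust to your BOINC data directory.
  cd /var/lib/boinc-client/slots
  for d in */ ; do
      [ -n "$(ls -A "$d")" ] && echo "slot $d is not empty"
  done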
Joined: 22 Apr 16 · Posts: 677 · Credit: 2,002,766 · RAC: 2

I have cleaned the slot directories and will keep an eye on it. Thank you. A question about the length of the tasks: they run, for example, 4 or 5 hours for Pythia8. Are these tasks the same as in production? I have the impression they take longer. Can this be?
Joined: 10 Mar 17 · Posts: 40 · Credit: 108,345 · RAC: 0

There are tasks that run for a couple of hundred seconds and some for a couple of hours. It's probably the normal fluctuation (different job types, different codes, different event types, ...). With the old vbox app there were also jobs that ran for a couple of hours. It would be weird if the native jobs ran slower than jobs running within a VM.
Joined: 12 Sep 14 · Posts: 1069 · Credit: 334,882 · RAC: 0

I have cleaned the slot directories and will keep an eye on it. Thank you.

The same jobs are run as in production. There is a mixture of job types and parameters, so it is normal to have different runtimes.
Joined: 12 Sep 14 · Posts: 1069 · Credit: 334,882 · RAC: 0

It's not directed against the very good work from Laurence on this new concept.

Thanks, negative feedback is important. It is better to have it here and try to resolve the issue than in production.
Joined: 28 Jul 16 · Posts: 484 · Credit: 394,839 · RAC: 1

I have one failed task in a row of valid ones. Reason unknown.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755317
Joined: 13 Feb 15 · Posts: 1188 · Credit: 861,475 · RAC: 2

Same here after 8.5 hours of run-time: EXIT_CHILD_FAILED

09:33:23 2019-02-26: cranky-0.0.24: [INFO] Running Container 'runc'.
18:11:23 2019-02-26: cranky-0.0.24: [ERROR] Container 'runc' failed.

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755265
Joined: 10 Mar 17 · Posts: 40 · Credit: 108,345 · RAC: 0

I had one of those, too, two days ago: 195 (0x000000C3) EXIT_CHILD_FAILED
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2754240
I don't think this has anything to do with the problem, but anyway: I restarted my router while the task was running, so the task temporarily had no internet access.
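To scan your own tasks for such container errors, a hedged one-liner; the data directory path and the stderr.txt file name are assumptions based on a standard BOINC client layout:

  # Search all slot stderr logs for cranky error lines.
  # Paths are assumptions; adjust to your installation.
  grep -H "\[ERROR\]" /var/lib/boinc-client/slots/*/stderr.txt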
Joined: 28 Jul 16 · Posts: 484 · Credit: 394,839 · RAC: 1

There's something wrong regarding the runtime calculation.

An upgrade of my BOINC client may have solved that issue.
Joined: 28 Jul 16 · Posts: 484 · Credit: 394,839 · RAC: 1

... An upgrade of my BOINC client may have solved that issue.

Well, my fault. I checked the first tasks after a reboot, when the host was not yet under full load. Under full load the runtime calculation still differs significantly from "finish time - starting time". See these examples:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755583
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755593
Joined: 13 Feb 15 · Posts: 1188 · Credit: 861,475 · RAC: 2

Mostly the CPU-times are (much) higher than the elapsed times.

Run time 2 hours 10 min 54 sec
CPU time 3 hours 3 min 55 sec
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755539

Run time 1 hours 12 min 56 sec
CPU time 1 hours 40 min 16 sec
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755545
Joined: 28 Jul 16 · Posts: 484 · Credit: 394,839 · RAC: 1

Your example: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755539
The log shows 2h 10m 38s between "wrapper starting" and "called boinc_finish", while the runtime shows 2h 10m 54s: a difference of 16s. Not very much, but in my eyes still too much time for a modern computer to fork and shut down a process. Was your computer fully loaded? Is it bare metal or a VM? My full-load example on a bare-metal computer shows a time difference of more than 10 minutes (!), which seems absurd.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755583
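The wall-clock comparison above is easy to reproduce in a shell. A minimal sketch, assuming GNU date and reusing the timestamps from the log excerpt earlier in this thread:

  # Wall-clock delta between two log timestamps (GNU date assumed).
  start="2019-02-26 09:33:23"
  finish="2019-02-26 18:11:23"
  delta=$(( $(date -d "$finish" +%s) - $(date -d "$start" +%s) ))
  printf 'wall clock: %02d:%02d:%02d\n' $((delta/3600)) $((delta%3600/60)) $((delta%60))

The printed value can then be compared with the Run time shown on the result page.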
Joined: 13 Feb 15 · Posts: 1188 · Credit: 861,475 · RAC: 2

Your example:

That task ran in a 4-core Linux VM on a Windows host; only 2 cores of the VM were used. In your task example the run-time must be wrong, so it must be a BOINC/wrapper issue. The CPU-time looks the same as the one reported on the second-to-last line of the result. The elapsed time and the CPU-time being exactly the same is suspicious.
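One thing worth ruling out when the CPU-time exceeds the elapsed time: a Theory task runs several processes at once (e.g. the Herwig and rivetvm.exe processes visible in the top snapshot further down), and their CPU times add up in the reported total. A sketch for inspecting a running task, assuming those process names:

  # Show elapsed time vs accumulated CPU time per process.
  # Process names are taken from the top output in this thread.
  ps -o pid,comm,etime,time -C Herwig,rivetvm.exe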
Joined: 22 Apr 16 · Posts: 677 · Credit: 2,002,766 · RAC: 2

SL 7.6 shows in top: 2 GByte of swap in use with 0 free, while the normal memory is only 50% in use (10 GByte of 20 GByte). Is it like this on other native Linux setups too?
Joined: 13 Feb 15 · Posts: 1188 · Credit: 861,475 · RAC: 2

Is it like this on other native Linux setups too?

Ubuntu 18.10:

top - 12:41:31 up 3:31, 1 user, load average: 1,18, 1,29, 1,34
Tasks: 218 total, 2 running, 216 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0,3 us, 0,7 sy, 28,2 ni, 70,8 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
MiB Mem : 5960,3 total, 3671,1 free, 890,1 used, 1399,1 buff/cache
MiB Swap: 1186,4 total, 1186,4 free, 0,0 used. 4771,1 avail Mem

  PID USER  PR NI   VIRT   RES   SHR S %CPU %MEM    TIME+ COMMAND
23512 boinc 39 19 476848 89408 42576 S 72,1  1,5 58:46.48 Herwig
23399 boinc 39 19 292024 19660 13056 R 43,2  0,3 35:53.56 rivetvm.exe
 7573 boinc 39 19   4132   184   144 S  0,3  0,0  0:00.01 sleep
 1370 boinc 30 10 166892 16492 13028 S  0,0  0,3  0:26.20 boinc
 7521 boinc 39 19   4132    36     0 S  0,0  0,0  0:00.00 sleep
21989 boinc 30 10   6052  3312  2832 S  0,0  0,1  0:06.79 wrapper_26015_x
21991 boinc 39 19  20256  3600  3216 S  0,0  0,1  0:00.01 cranky-0.0.24
22769 boinc 39 19 609124  6624  2024 S  0,0  0,1  0:00.04 runc
22779 boinc 39 19  17728   204     0 S  0,0  0,0  0:00.01 job
22796 boinc 39 19  18664  1748   628 S  0,0  0,0  0:02.95 runRivet.sh
23398 boinc 39 19  18256   796     0 S  0,0  0,0  0:00.04 rungen.sh
23400 boinc 39 19  18796  1888   624 S  0,0  0,0  0:03.63 runRivet.sh
Joined: 22 Apr 16 · Posts: 677 · Credit: 2,002,766 · RAC: 2

Hi Crystal, the same question for you! Theory native uses the swap memory instead of the free normal memory. ATLAS native does not do this! Swap memory is normally a file on disk.
Joined: 28 Jul 16 · Posts: 484 · Credit: 394,839 · RAC: 1

Hi Crystal,

MiB Swap: 1186,4 total, 1186,4 free, 0,0 used

This means:
Swap total: 1186,4 MiB
Swap free: 1186,4 MiB
Swap used: 0,0 MiB

At the moment no swap is used on this host.
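The same information can be read in a column-order-independent way with standard tools that label the swap figures explicitly:

  # Human-readable memory and swap summary.
  free -h
  # List active swap devices/files and their usage.
  swapon --show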
Joined: 22 Apr 16 · Posts: 677 · Credit: 2,002,766 · RAC: 2

Thank you, and sorry: SL 6.9 shows the swap columns as "total used free", SL 7.6 as "total free used". I had not noticed that the column order changed in top.
Joined: 22 Apr 16 · Posts: 677 · Credit: 2,002,766 · RAC: 2

In the -dev folder of BOINC there are both a cranky-0.0.24 and a cranky-0.0.25. The task logs still show cranky-0.0.24:

08:50:27 (20616): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.24 ()
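Which cranky version the installed app version actually references can be checked in the client's state file; a sketch, assuming the default data directory path:

  # Show where cranky is referenced in the client state.
  # Path is an assumption; adjust to your installation.
  grep -n "cranky" /var/lib/boinc-client/client_state.xml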