Message boards : Theory Application : New Native App - Linux Only
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 1
Message 6070 - Posted: 23 Feb 2019, 17:48:50 UTC - in response to Message 6069.  

> There's something wrong regarding the runtime calculation.
>
> Even when running single-core tasks, the CPU time is (almost) always higher than the elapsed time.
> When you have idle cores, the application steals from the free core(s).
>
> See my Linux tasks and my Windows VBox tasks.

Right, this explains the higher CPU-times.
In the mentioned examples I don't understand why the runtime is much higher than the difference between app start and app finish.

BTW:
There is no visible delay at app start or app finish.
ID: 6070 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 6073 - Posted: 23 Feb 2019, 21:24:00 UTC - in response to Message 6063.  

> I have 7 tasks running in parallel, but slot numbers up to 21 are shown!
> Maybe they are not deleted after finishing?


Yes, there was an issue with a previous image where the slot directories were not being cleaned up. Let me know if that is still the case.
ID: 6073 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 2
Message 6074 - Posted: 23 Feb 2019, 21:33:03 UTC - in response to Message 6073.  

I have cleaned up the slot directories and will keep an eye on it. Thank you.

A question about the length of the tasks.
Pythia8 tasks, for example, run for 4 or 5 hours.
Are these the same tasks as in production?
I thought they would need longer. Could that be?
ID: 6074 · Rating: 0
gyllic

Joined: 10 Mar 17
Posts: 40
Credit: 108,345
RAC: 0
Message 6076 - Posted: 24 Feb 2019, 10:20:21 UTC - in response to Message 6074.  

There are tasks that run for a couple of hundred seconds and some that run for a couple of hours. It's probably just normal fluctuation (different job types, different codes, different event types, ...). With the old vbox app there were also jobs that ran for a couple of hours. It would be weird if the native jobs ran slower than the same jobs inside a VM.
ID: 6076 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 6080 - Posted: 24 Feb 2019, 21:13:33 UTC - in response to Message 6074.  

> I have cleaned up the slot directories and will keep an eye on it. Thank you.
>
> A question about the length of the tasks.
> Pythia8 tasks, for example, run for 4 or 5 hours.
> Are these the same tasks as in production?
> I thought they would need longer. Could that be?


The same jobs are run as in production. There is a mixture of job types and parameters so it is normal to have different runtimes.
ID: 6080 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 6081 - Posted: 24 Feb 2019, 21:19:11 UTC - in response to Message 6077.  

> This is not meant against the very good work Laurence has done on this new concept.
> Sorry if it is too critical.

Thanks, negative feedback is important. It is better to get it here and try to resolve the issue than to run into it in production.
ID: 6081 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 1
Message 6093 - Posted: 26 Feb 2019, 15:03:03 UTC

I have 1 failed task in a row of valid ones.
Reason unknown.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755317
ID: 6093 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 6095 - Posted: 26 Feb 2019, 18:27:57 UTC - in response to Message 6093.  

Same here after 8.5 hours run-time: EXIT_CHILD_FAILED

09:33:23 2019-02-26: cranky-0.0.24: [INFO] Running Container 'runc'.
18:11:23 2019-02-26: cranky-0.0.24: [ERROR] Container 'runc' failed.

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755265
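
As a rough cross-check of the "8.5 hours", the two cranky log lines above can simply be subtracted. This is only a minimal Python sketch: it assumes every cranky line starts with a `HH:MM:SS YYYY-MM-DD` stamp as in the excerpt, and the helper name `log_time` is made up for illustration.

```python
from datetime import datetime

# The two log lines quoted above, copied verbatim.
LOG_LINES = [
    "09:33:23 2019-02-26: cranky-0.0.24: [INFO] Running Container 'runc'.",
    "18:11:23 2019-02-26: cranky-0.0.24: [ERROR] Container 'runc' failed.",
]

def log_time(line):
    """Parse the leading 'HH:MM:SS YYYY-MM-DD' stamp (hypothetical helper)."""
    # The first ": " in the line is the one right after the date.
    stamp = line.split(": ", 1)[0]
    return datetime.strptime(stamp, "%H:%M:%S %Y-%m-%d")

# Time between container start and container failure.
container_runtime = log_time(LOG_LINES[1]) - log_time(LOG_LINES[0])
print(container_runtime)  # 8:38:00
```

So the container itself ran for 8 h 38 min before it failed, which fits the reported run time of about 8.5 hours.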
ID: 6095 · Rating: 0
gyllic

Joined: 10 Mar 17
Posts: 40
Credit: 108,345
RAC: 0
Message 6098 - Posted: 26 Feb 2019, 20:12:56 UTC - in response to Message 6095.  
Last modified: 26 Feb 2019, 20:13:19 UTC

I had one of those two days ago:
195 (0x000000C3) EXIT_CHILD_FAILED
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2754240
I don't think this has anything to do with the problem, but anyway:
I restarted my router while the task was running, so the task temporarily had no internet access.
ID: 6098 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 1
Message 6101 - Posted: 27 Feb 2019, 9:35:52 UTC - in response to Message 6070.  

> There's something wrong regarding the runtime calculation.
>
> Even when running single-core tasks, the CPU time is (almost) always higher than the elapsed time.
> When you have idle cores, the application steals from the free core(s).
>
> See my Linux tasks and my Windows VBox tasks.
>
> Right, this explains the higher CPU times.
> In the mentioned examples I don't understand why the runtime is much higher than the difference between app start and app finish.
>
> BTW:
> There is no visible delay at app start or app finish.

An upgrade of my BOINC client may have solved that issue.
ID: 6101 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 1
Message 6102 - Posted: 27 Feb 2019, 10:32:07 UTC - in response to Message 6101.  

> ... An upgrade of my BOINC client may have solved that issue.

Well, my fault.
I checked the first tasks after a reboot when the host was not yet under full load.
Under full load the reported runtime still differs significantly from "finish time - starting time".
See examples:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755583
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755593
ID: 6102 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 6103 - Posted: 27 Feb 2019, 10:43:32 UTC - in response to Message 6102.  

Mostly the CPU-times are (much) higher than the elapsed times.

Run time 2 hours 10 min 54 sec
CPU time 3 hours 3 min 55 sec

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755539

Run time 1 hours 12 min 56 sec
CPU time 1 hours 40 min 16 sec

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755545
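
To put a number on "(much) higher", the CPU time can be divided by the run time for the two results above. A small Python sketch using only the figures quoted in this post; the function name `cpu_ratio` is illustrative, not anything BOINC provides.

```python
from datetime import timedelta

# (run time, CPU time) pairs copied from the two results above,
# keyed by the result ID from the corresponding link.
TASKS = {
    2755539: (timedelta(hours=2, minutes=10, seconds=54),
              timedelta(hours=3, minutes=3, seconds=55)),
    2755545: (timedelta(hours=1, minutes=12, seconds=56),
              timedelta(hours=1, minutes=40, seconds=16)),
}

def cpu_ratio(run_time, cpu_time):
    """CPU seconds consumed per second of elapsed time (hypothetical helper)."""
    return cpu_time / run_time  # timedelta / timedelta gives a float

for result_id, (run_time, cpu_time) in TASKS.items():
    print(f"{result_id}: CPU/run = {cpu_ratio(run_time, cpu_time):.2f}")
```

Both tasks come out at roughly 1.4, i.e. on average each one kept about 1.4 cores busy, which a task confined to a single core could not do.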
ID: 6103 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 1
Message 6104 - Posted: 27 Feb 2019, 11:26:46 UTC - in response to Message 6103.  

Your example:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755539

The log shows 2h 10m 38s between "wrapper starting" and "called boinc_finish".
Runtime shows 2h 10m 54s.

A difference of 16s.
Not very much, but in my view too much for a modern computer to fork or shut down a process.

Was your computer fully loaded?
Is it bare metal or a VM?



My full load example on a bare metal computer shows a time difference of more than 10 minutes (!).
This seems to be stupid.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755583
ID: 6104 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 6105 - Posted: 27 Feb 2019, 13:50:13 UTC - in response to Message 6104.  

> Your example:
> https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2755539
> .
> .
> Was your computer fully loaded?
> Is it bare metal or a VM?

That task was on a 4-core Linux VM on a Windows host.
Only 2 cores of the VM were used.

In your task example the run-time must be wrong. It must be a BOINC/vboxwrapper issue.
The CPU time looks the same as the one reported on the second-to-last line of the result.
The elapsed and cpu time being exactly the same is suspicious.
ID: 6105 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 2
Message 6133 - Posted: 5 Mar 2019, 8:30:10 UTC

On SL 76, top shows:
2 GB of swap in use with 0 free.
Normal memory is only 50% in use (10 GB of 20 GB).
Is it the same on other native Linux systems?
ID: 6133 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 6137 - Posted: 5 Mar 2019, 11:46:27 UTC - in response to Message 6133.  

> Is it the same on other native Linux systems?
Ubuntu 18.10
top - 12:41:31 up  3:31,  1 user,  load average: 1,18, 1,29, 1,34
Tasks: 218 total,   2 running, 216 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,3 us,  0,7 sy, 28,2 ni, 70,8 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
MiB Mem :   5960,3 total,   3671,1 free,    890,1 used,   1399,1 buff/cache
MiB Swap:   1186,4 total,   1186,4 free,      0,0 used.   4771,1 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                 
23512 boinc     39  19  476848  89408  42576 S  72,1   1,5  58:46.48 Herwig                  
23399 boinc     39  19  292024  19660  13056 R  43,2   0,3  35:53.56 rivetvm.exe             
 7573 boinc     39  19    4132    184    144 S   0,3   0,0   0:00.01 sleep                   
 1370 boinc     30  10  166892  16492  13028 S   0,0   0,3   0:26.20 boinc                   
 7521 boinc     39  19    4132     36      0 S   0,0   0,0   0:00.00 sleep                   
21989 boinc     30  10    6052   3312   2832 S   0,0   0,1   0:06.79 wrapper_26015_x         
21991 boinc     39  19   20256   3600   3216 S   0,0   0,1   0:00.01 cranky-0.0.24           
22769 boinc     39  19  609124   6624   2024 S   0,0   0,1   0:00.04 runc                    
22779 boinc     39  19   17728    204      0 S   0,0   0,0   0:00.01 job                     
22796 boinc     39  19   18664   1748    628 S   0,0   0,0   0:02.95 runRivet.sh             
23398 boinc     39  19   18256    796      0 S   0,0   0,0   0:00.04 rungen.sh               
23400 boinc     39  19   18796   1888    624 S   0,0   0,0   0:03.63 runRivet.sh
ID: 6137 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 2
Message 6139 - Posted: 5 Mar 2019, 12:31:28 UTC - in response to Message 6137.  

Hi Crystal,
it is the same for you!
Theory-native uses swap instead of the free normal memory.
Atlas-native does not do this!
Swap is normally a file on disk.
ID: 6139 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 484
Credit: 394,839
RAC: 1
Message 6140 - Posted: 5 Mar 2019, 13:15:34 UTC - in response to Message 6139.  

> Hi Crystal,
> it is the same for you!
> Theory-native uses swap instead of the free normal memory.
> Atlas-native does not do this!
> Swap is normally a file on disk.

MiB Swap:   1186,4 total,   1186,4 free,      0,0 used

This means:
Swap total: 1186,4 MiB
Swap free: 1186,4 MiB
Swap used: 0,0 MiB

ATM no swap is used on this host.
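
The same reading can be done mechanically. The following is a minimal Python sketch, not part of any BOINC or top tooling: it assumes the summary line uses decimal commas as in the output above, and the function name `parse_top_line` is made up for illustration.

```python
import re

# The swap line from the top output above (decimal commas from the host's locale).
SWAP_LINE = "MiB Swap:   1186,4 total,   1186,4 free,      0,0 used"

def parse_top_line(line):
    """Return {label: value} for every 'number label' pair in a top summary line."""
    pairs = re.findall(r"(\d+[.,]\d+)\s+([A-Za-z/]+)", line)
    # Convert the decimal comma to a point before parsing the number.
    return {label: float(value.replace(",", ".")) for value, label in pairs}

swap = parse_top_line(SWAP_LINE)
print(swap["total"], swap["free"], swap["used"])  # 1186.4 1186.4 0.0
```

Looking the values up by label rather than by column position means the result does not depend on the order in which a given top version prints the fields.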
ID: 6140 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 2
Message 6141 - Posted: 5 Mar 2019, 13:46:25 UTC - in response to Message 6140.  

Thank you, and sorry.
SL69: total, used, free
SL76: total, free, used
I had not noticed that the order changed in top.
ID: 6141 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 2
Message 6162 - Posted: 7 Mar 2019, 10:39:12 UTC

The -dev project folder of BOINC contains both a cranky-0.0.24 and a cranky-0.0.25.
The task logs still show cranky-0.0.24:
08:50:27 (20616): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.24 ()
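
To check which cranky version a task actually ran, the version can be pulled out of the wrapper line above. A small Python sketch, assuming the script is always named `cranky-<version>` as in this log; `cranky_version` is a hypothetical helper.

```python
import re

# The wrapper line from the task log above, copied verbatim.
WRAPPER_LINE = ("08:50:27 (20616): wrapper: running "
                "../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.24 ()")

def cranky_version(line):
    """Return the version string after 'cranky-', or None if the line has none."""
    match = re.search(r"cranky-(\d+(?:\.\d+)*)", line)
    return match.group(1) if match else None

print(cranky_version(WRAPPER_LINE))  # 0.0.24
```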
ID: 6162 · Rating: 0
©2024 CERN