Message boards : Theory Application : New Multi-core version V1.9
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 159
Message 3694 - Posted: 14 Jul 2016, 16:53:30 UTC - in response to Message 3692.  
Last modified: 14 Jul 2016, 16:54:08 UTC

The default maximum number of slots in Condor is 10. I have just increased this to 16 and it will be active with new VMs in about an hour.

Go crazy :)
ID: 3694
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3696 - Posted: 14 Jul 2016, 17:11:28 UTC
Last modified: 14 Jul 2016, 17:12:27 UTC

One issue to consider is the ultra-long job.
I have had a number of them (maybe 1 in 50).

Such a job will run until it hits the 18h limit.
With multi-core tasks this means that all the other jobs will stop shortly after the 12h mark and sit idle until the 18h mark.
That sounds like a massive waste of computing time.
ID: 3696
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 848,685
RAC: 1,837
Message 3697 - Posted: 14 Jul 2016, 17:54:34 UTC
Last modified: 14 Jul 2016, 18:34:32 UTC

vLHCathome-dev 14 Jul 19:48:23 CEST Starting task Theory_3535_1468445018.400414_0

The one with the 8-core VM:

running-slot1.log 14-Jul-2016 19:53 9.2K
running-slot2.log 14-Jul-2016 19:51 3.1K
running-slot3.log 14-Jul-2016 19:51 2.9K
running-slot4.log 14-Jul-2016 19:53 27K
running-slot5.log 14-Jul-2016 19:53 28K
running-slot6.log 14-Jul-2016 19:51 2.7K
running-slot7.log 14-Jul-2016 19:51 3.1K
running-slot8.log 14-Jul-2016 19:51 2.1K

The next 8 jobs running:
===> [runRivet] Thu Jul 14 19:51:18 CEST 2016 [boinc pp winclusive 7000 -,-,10 - herwig++ 2.7.1 LHC-UE-EE-4 82000 360]
===> [runRivet] Thu Jul 14 19:51:28 CEST 2016 [boinc ee zhad 91.2 - - pythia6 6.428 349 100000 360]
===> [runRivet] Thu Jul 14 19:51:28 CEST 2016 [boinc pp jets 7000 50 - pythia6 6.428 360 75000 360]
===> [runRivet] Thu Jul 14 19:51:31 CEST 2016 [boinc pp jets 7000 25,-,480 - pythia8 8.186 tune-2c 100000 360]
===> [runRivet] Thu Jul 14 19:51:32 CEST 2016 [boinc pp jets 7000 20,-,610 - pythia8 8.186 default-noFsr 100000 360]
===> [runRivet] Thu Jul 14 19:51:32 CEST 2016 [boinc pp jets 7000 10 - pythia8 8.108.p1 default 100000 360]
===> [runRivet] Thu Jul 14 19:51:32 CEST 2016 [boinc ee zhad 91.2 - - pythia8 8.210 montull 100000 360]
===> [runRivet] Thu Jul 14 19:51:31 CEST 2016 [boinc ee zhad 189 - - vincia 1.2.02_8.210 jeppsson2 100000 360]

using together about 1.55 GB RAM from the 4GB I reserved.
ID: 3697
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 848,685
RAC: 1,837
Message 3698 - Posted: 14 Jul 2016, 19:23:07 UTC - in response to Message 3697.  
Last modified: 14 Jul 2016, 19:31:07 UTC

vLHCathome-dev 14 Jul 19:48:23 CEST Starting task Theory_3535_1468445018.400414_0

My first impression after 100,000,000,000,000 CPU cycles:

Using 8 threads is suboptimal. I expected about 90% CPU usage over the 1½ hours of run time so far, but it's only 67.5%.
Running 8 single-core VMs gives much better performance:
at the same 90% cap the average CPU usage is over 90%, although the system was more sluggish.
ID: 3698
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 159
Message 3701 - Posted: 14 Jul 2016, 21:21:09 UTC - in response to Message 3698.  
Last modified: 14 Jul 2016, 21:21:46 UTC

We have seen similar behaviour in our data centre. I forget the cause off the top of my head.
ID: 3701
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 848,685
RAC: 1,837
Message 3705 - Posted: 15 Jul 2016, 5:59:07 UTC - in response to Message 3701.  
Last modified: 15 Jul 2016, 6:24:18 UTC

We have seen similar behaviour in our data centre. I forget the cause off the top of my head.

Maybe hyperthreading, i.e. using more virtual cores than are physically available, could be the cause.
I also noticed that the event-processing part is much slower than when using only 1 or 2 cores in a VM.
The maximum memory use I've seen was a bit over 2GB, with 0k swap used.

Just passed 12 hours of elapsed time. Overall performance: 69.45%.
ID: 3705
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 848,685
RAC: 1,837
Message 3706 - Posted: 15 Jul 2016, 8:11:38 UTC

8-core VM finished:

http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=221606

CPU time                 3.19 days
Elapsed time * 8 threads 4.55 days

A large part of the job INFO is missing from the stderr output. Saved log:

Output of the job wrapper may appear here.
19:51:18 +0200 2016-07-14 [INFO] New Job Starting in slot1
19:51:18 +0200 2016-07-14 [INFO] Condor JobID: 1194628 in slot1
19:51:23 +0200 2016-07-14 [INFO] MCPlots JobID: 31943081 in slot1
19:51:27 +0200 2016-07-14 [INFO] New Job Starting in slot2
19:51:27 +0200 2016-07-14 [INFO] New Job Starting in slot3
19:51:27 +0200 2016-07-14 [INFO] Condor JobID: 1194635 in slot2
19:51:27 +0200 2016-07-14 [INFO] Condor JobID: 1194642 in slot3
19:51:30 +0200 2016-07-14 [INFO] New Job Starting in slot4
19:51:30 +0200 2016-07-14 [INFO] New Job Starting in slot8
19:51:30 +0200 2016-07-14 [INFO] New Job Starting in slot7
19:51:30 +0200 2016-07-14 [INFO] New Job Starting in slot6
19:51:30 +0200 2016-07-14 [INFO] New Job Starting in slot5
19:51:30 +0200 2016-07-14 [INFO] Condor JobID: 1194648 in slot8
19:51:30 +0200 2016-07-14 [INFO] Condor JobID: 1194643 in slot4
19:51:30 +0200 2016-07-14 [INFO] Condor JobID: 1194647 in slot7
19:51:31 +0200 2016-07-14 [INFO] Condor JobID: 1194644 in slot5
19:51:31 +0200 2016-07-14 [INFO] Condor JobID: 1194646 in slot6
19:51:34 +0200 2016-07-14 [INFO] MCPlots JobID: 31943140 in slot2
19:51:34 +0200 2016-07-14 [INFO] MCPlots JobID: 31943115 in slot3
19:51:37 +0200 2016-07-14 [INFO] MCPlots JobID: 31943095 in slot4
19:51:37 +0200 2016-07-14 [INFO] MCPlots JobID: 31943171 in slot8
19:51:37 +0200 2016-07-14 [INFO] MCPlots JobID: 31943166 in slot5
19:51:37 +0200 2016-07-14 [INFO] MCPlots JobID: 31943129 in slot7
19:51:37 +0200 2016-07-14 [INFO] MCPlots JobID: 31943104 in slot6
20:53:31 +0200 2016-07-14 [INFO] Job finished in slot8 with 0.
20:54:12 +0200 2016-07-14 [INFO] New Job Starting in slot8
20:54:16 +0200 2016-07-14 [INFO] Condor JobID: 1181723 in slot8
20:54:34 +0200 2016-07-14 [INFO] MCPlots JobID: 31920578 in slot8
21:09:34 +0200 2016-07-14 [INFO] Job finished in slot5 with 0.
21:10:03 +0200 2016-07-14 [INFO] New Job Starting in slot5
21:10:07 +0200 2016-07-14 [INFO] Condor JobID: 1195400 in slot5
21:10:22 +0200 2016-07-14 [INFO] MCPlots JobID: 31943959 in slot5
21:30:26 +0200 2016-07-14 [INFO] Job finished in slot5 with 0.
21:31:28 +0200 2016-07-14 [INFO] New Job Starting in slot5
21:31:33 +0200 2016-07-14 [INFO] Condor JobID: 1195577 in slot5
21:31:53 +0200 2016-07-14 [INFO] MCPlots JobID: 31944066 in slot5
22:21:35 +0200 2016-07-14 [INFO] Job finished in slot1 with 0.
22:21:49 +0200 2016-07-14 [INFO] New Job Starting in slot1
22:21:53 +0200 2016-07-14 [INFO] Condor JobID: 1196063 in slot1
22:22:09 +0200 2016-07-14 [INFO] MCPlots JobID: 31944519 in slot1
22:26:01 +0200 2016-07-14 [INFO] Job finished in slot8 with 0.
22:26:16 +0200 2016-07-14 [INFO] New Job Starting in slot8
22:26:18 +0200 2016-07-14 [INFO] Condor JobID: 1195514 in slot8
22:26:33 +0200 2016-07-14 [INFO] MCPlots JobID: 31944030 in slot8
22:27:27 +0200 2016-07-14 [INFO] Job finished in slot7 with 0.
22:27:47 +0200 2016-07-14 [INFO] New Job Starting in slot7
22:27:50 +0200 2016-07-14 [INFO] Condor JobID: 1196132 in slot7
22:28:06 +0200 2016-07-14 [INFO] Job finished in slot3 with 0.
22:28:06 +0200 2016-07-14 [INFO] MCPlots JobID: 31944727 in slot7
22:28:18 +0200 2016-07-14 [INFO] New Job Starting in slot3
22:28:21 +0200 2016-07-14 [INFO] Condor JobID: 1196142 in slot3
22:28:43 +0200 2016-07-14 [INFO] Job finished in slot2 with 0.
22:28:46 +0200 2016-07-14 [INFO] MCPlots JobID: 31944655 in slot3
22:29:02 +0200 2016-07-14 [INFO] New Job Starting in slot2
22:29:07 +0200 2016-07-14 [INFO] Condor JobID: 1196147 in slot2
22:29:19 +0200 2016-07-14 [INFO] MCPlots JobID: 31944723 in slot2
22:51:35 +0200 2016-07-14 [INFO] Job finished in slot5 with 0.
22:52:03 +0200 2016-07-14 [INFO] New Job Starting in slot5
22:52:12 +0200 2016-07-14 [INFO] Condor JobID: 1196357 in slot5
22:52:32 +0200 2016-07-14 [INFO] MCPlots JobID: 31944853 in slot5
23:19:46 +0200 2016-07-14 [INFO] Job finished in slot5 with 0.
23:20:07 +0200 2016-07-14 [INFO] New Job Starting in slot5
23:20:10 +0200 2016-07-14 [INFO] Condor JobID: 1196585 in slot5
23:20:12 +0200 2016-07-14 [INFO] Job finished in slot4 with 0.
23:20:26 +0200 2016-07-14 [INFO] MCPlots JobID: 31944967 in slot5
23:21:02 +0200 2016-07-14 [INFO] New Job Starting in slot4
23:21:05 +0200 2016-07-14 [INFO] Condor JobID: 1196579 in slot4
23:21:19 +0200 2016-07-14 [INFO] MCPlots JobID: 31945008 in slot4
23:22:23 +0200 2016-07-14 [INFO] Job finished in slot1 with 0.
23:22:32 +0200 2016-07-14 [INFO] New Job Starting in slot1
23:22:34 +0200 2016-07-14 [INFO] Condor JobID: 1196605 in slot1
23:22:48 +0200 2016-07-14 [INFO] MCPlots JobID: 31945063 in slot1
23:28:45 +0200 2016-07-14 [INFO] Job finished in slot2 with 0.
23:29:04 +0200 2016-07-14 [INFO] New Job Starting in slot2
23:29:08 +0200 2016-07-14 [INFO] Condor JobID: 1196653 in slot2
23:29:26 +0200 2016-07-14 [INFO] MCPlots JobID: 31945085 in slot2
23:43:30 +0200 2016-07-14 [INFO] Job finished in slot3 with 0.
23:43:42 +0200 2016-07-14 [INFO] New Job Starting in slot3
23:43:46 +0200 2016-07-14 [INFO] Condor JobID: 1196777 in slot3
23:44:00 +0200 2016-07-14 [INFO] MCPlots JobID: 31945213 in slot3
00:06:50 +0200 2016-07-15 [INFO] Job finished in slot5 with 0.
00:07:06 +0200 2016-07-15 [INFO] New Job Starting in slot5
00:07:12 +0200 2016-07-15 [INFO] Condor JobID: 1197003 in slot5
00:07:40 +0200 2016-07-15 [INFO] MCPlots JobID: 31945363 in slot5
00:19:56 +0200 2016-07-15 [INFO] Job finished in slot5 with 0.
00:20:09 +0200 2016-07-15 [INFO] New Job Starting in slot5
00:20:14 +0200 2016-07-15 [INFO] Condor JobID: 1197113 in slot5
00:20:33 +0200 2016-07-15 [INFO] MCPlots JobID: 31945503 in slot5
00:31:28 +0200 2016-07-15 [INFO] Job finished in slot6 with 0.
00:31:42 +0200 2016-07-15 [INFO] New Job Starting in slot6
00:31:43 +0200 2016-07-15 [INFO] Condor JobID: 1184238 in slot6
00:31:52 +0200 2016-07-15 [INFO] MCPlots JobID: 31923120 in slot6
00:40:12 +0200 2016-07-15 [INFO] Job finished in slot3 with 0.
00:40:28 +0200 2016-07-15 [INFO] New Job Starting in slot3
00:40:32 +0200 2016-07-15 [INFO] Condor JobID: 1197225 in slot3
00:40:49 +0200 2016-07-15 [INFO] MCPlots JobID: 31945511 in slot3
01:16:29 +0200 2016-07-15 [INFO] Job finished in slot3 with 0.
01:16:46 +0200 2016-07-15 [INFO] New Job Starting in slot3
01:16:54 +0200 2016-07-15 [INFO] Condor JobID: 1197551 in slot3
01:17:13 +0200 2016-07-15 [INFO] MCPlots JobID: 31945892 in slot3
01:25:07 +0200 2016-07-15 [INFO] Job finished in slot3 with 0.
01:25:20 +0200 2016-07-15 [INFO] New Job Starting in slot3
01:25:26 +0200 2016-07-15 [INFO] Condor JobID: 1197651 in slot3
01:25:46 +0200 2016-07-15 [INFO] MCPlots JobID: 31946037 in slot3
01:36:04 +0200 2016-07-15 [INFO] Job finished in slot2 with 0.
01:36:16 +0200 2016-07-15 [INFO] New Job Starting in slot2
01:36:20 +0200 2016-07-15 [INFO] Condor JobID: 1197736 in slot2
01:36:34 +0200 2016-07-15 [INFO] MCPlots JobID: 31946203 in slot2
01:37:12 +0200 2016-07-15 [INFO] Job finished in slot6 with 0.
01:37:22 +0200 2016-07-15 [INFO] New Job Starting in slot6
01:37:27 +0200 2016-07-15 [INFO] Condor JobID: 1197742 in slot6
01:37:42 +0200 2016-07-15 [INFO] MCPlots JobID: 31946103 in slot6
01:45:44 +0200 2016-07-15 [INFO] Job finished in slot5 with 0.
01:46:04 +0200 2016-07-15 [INFO] New Job Starting in slot5
01:46:15 +0200 2016-07-15 [INFO] Condor JobID: 1197819 in slot5
01:46:26 +0200 2016-07-15 [INFO] MCPlots JobID: 31946208 in slot5
02:00:24 +0200 2016-07-15 [INFO] Job finished in slot4 with 0.
02:00:37 +0200 2016-07-15 [INFO] New Job Starting in slot4
02:00:39 +0200 2016-07-15 [INFO] Condor JobID: 1197961 in slot4
02:00:54 +0200 2016-07-15 [INFO] MCPlots JobID: 31946320 in slot4
02:00:55 +0200 2016-07-15 [INFO] Job finished in slot1 with 0.
02:01:08 +0200 2016-07-15 [INFO] New Job Starting in slot1
02:01:12 +0200 2016-07-15 [INFO] Condor JobID: 1197967 in slot1
02:01:27 +0200 2016-07-15 [INFO] MCPlots JobID: 31946284 in slot1
02:05:28 +0200 2016-07-15 [INFO] Job finished in slot3 with 0.
02:05:48 +0200 2016-07-15 [INFO] New Job Starting in slot3
02:05:54 +0200 2016-07-15 [INFO] Condor JobID: 1184136 in slot3
02:06:14 +0200 2016-07-15 [INFO] MCPlots JobID: 31922952 in slot3
02:44:41 +0200 2016-07-15 [INFO] Job finished in slot2 with 0.
02:45:04 +0200 2016-07-15 [INFO] New Job Starting in slot2
02:45:10 +0200 2016-07-15 [INFO] Condor JobID: 1185481 in slot2
02:45:43 +0200 2016-07-15 [INFO] MCPlots JobID: 31924076 in slot2
03:03:24 +0200 2016-07-15 [INFO] Job finished in slot8 with 0.
03:04:03 +0200 2016-07-15 [INFO] New Job Starting in slot8
03:04:13 +0200 2016-07-15 [INFO] Condor JobID: 1198630 in slot8
03:05:00 +0200 2016-07-15 [INFO] MCPlots JobID: 31946962 in slot8
03:23:09 +0200 2016-07-15 [INFO] Job finished in slot1 with 0.
03:23:45 +0200 2016-07-15 [INFO] New Job Starting in slot1
03:23:58 +0200 2016-07-15 [INFO] Condor JobID: 1198798 in slot1
03:24:45 +0200 2016-07-15 [INFO] MCPlots JobID: 31947074 in slot1
04:34:13 +0200 2016-07-15 [INFO] Job finished in slot5 with 0.
04:34:28 +0200 2016-07-15 [INFO] New Job Starting in slot5
04:34:31 +0200 2016-07-15 [INFO] Condor JobID: 1199471 in slot5
04:34:39 +0200 2016-07-15 [INFO] MCPlots JobID: 31947770 in slot5
04:36:00 +0200 2016-07-15 [INFO] Job finished in slot7 with 0.
04:36:07 +0200 2016-07-15 [INFO] New Job Starting in slot7
04:36:09 +0200 2016-07-15 [INFO] Condor JobID: 1185631 in slot7
04:36:24 +0200 2016-07-15 [INFO] MCPlots JobID: 31924956 in slot7
04:36:39 +0200 2016-07-15 [INFO] Job finished in slot1 with 0.
04:36:58 +0200 2016-07-15 [INFO] New Job Starting in slot1
04:37:06 +0200 2016-07-15 [INFO] Condor JobID: 1186234 in slot1
04:37:38 +0200 2016-07-15 [INFO] MCPlots JobID: 31925783 in slot1
04:45:36 +0200 2016-07-15 [INFO] Job finished in slot3 with 0.
04:45:37 +0200 2016-07-15 [INFO] Job finished in slot1 with 0.
04:45:45 +0200 2016-07-15 [INFO] New Job Starting in slot3
04:45:46 +0200 2016-07-15 [INFO] New Job Starting in slot1
04:45:46 +0200 2016-07-15 [INFO] Job finished in slot4 with 0.
04:45:47 +0200 2016-07-15 [INFO] Condor JobID: 1199560 in slot3
04:45:47 +0200 2016-07-15 [INFO] Condor JobID: 1199561 in slot1
04:45:56 +0200 2016-07-15 [INFO] MCPlots JobID: 31947752 in slot1
04:45:58 +0200 2016-07-15 [INFO] New Job Starting in slot4
04:45:59 +0200 2016-07-15 [INFO] Condor JobID: 1199562 in slot4
04:46:02 +0200 2016-07-15 [INFO] MCPlots JobID: 31947742 in slot3
04:46:08 +0200 2016-07-15 [INFO] MCPlots JobID: 31947801 in slot4
05:10:46 +0200 2016-07-15 [INFO] Job finished in slot5 with 0.
05:11:08 +0200 2016-07-15 [INFO] New Job Starting in slot5
05:11:14 +0200 2016-07-15 [INFO] Condor JobID: 1199785 in slot5
05:11:45 +0200 2016-07-15 [INFO] MCPlots JobID: 31948070 in slot5
05:32:37 +0200 2016-07-15 [INFO] Job finished in slot8 with 0.
05:32:56 +0200 2016-07-15 [INFO] New Job Starting in slot8
05:33:02 +0200 2016-07-15 [INFO] Condor JobID: 1200007 in slot8
05:33:52 +0200 2016-07-15 [INFO] MCPlots JobID: 31948260 in slot8
05:43:04 +0200 2016-07-15 [INFO] Job finished in slot6 with 0.
05:43:45 +0200 2016-07-15 [INFO] New Job Starting in slot6
05:43:58 +0200 2016-07-15 [INFO] Condor JobID: 1200104 in slot6
05:44:46 +0200 2016-07-15 [INFO] MCPlots JobID: 31948239 in slot6
06:31:59 +0200 2016-07-15 [INFO] Job finished in slot5 with 0.
06:32:24 +0200 2016-07-15 [INFO] New Job Starting in slot5
06:32:38 +0200 2016-07-15 [INFO] Condor JobID: 1200605 in slot5
06:33:06 +0200 2016-07-15 [INFO] MCPlots JobID: 31948893 in slot5
06:41:37 +0200 2016-07-15 [INFO] Job finished in slot3 with 0.
06:41:52 +0200 2016-07-15 [INFO] New Job Starting in slot3
06:41:56 +0200 2016-07-15 [INFO] Condor JobID: 1200671 in slot3
06:42:32 +0200 2016-07-15 [INFO] MCPlots JobID: 31948829 in slot3
07:04:27 +0200 2016-07-15 [INFO] Job finished in slot1 with 0.
07:05:11 +0200 2016-07-15 [INFO] New Job Starting in slot1
07:05:18 +0200 2016-07-15 [INFO] Condor JobID: 1200904 in slot1
07:05:42 +0200 2016-07-15 [INFO] MCPlots JobID: 31949201 in slot1
07:07:33 +0200 2016-07-15 [INFO] Job finished in slot6 with 0.
07:07:47 +0200 2016-07-15 [INFO] New Job Starting in slot6
07:07:50 +0200 2016-07-15 [INFO] Condor JobID: 1200939 in slot6
07:08:11 +0200 2016-07-15 [INFO] MCPlots JobID: 31949138 in slot6
07:36:08 +0200 2016-07-15 [INFO] Job finished in slot2 with 0.
07:36:44 +0200 2016-07-15 [INFO] New Job Starting in slot2
07:36:53 +0200 2016-07-15 [INFO] Condor JobID: 1201223 in slot2
07:37:46 +0200 2016-07-15 [INFO] MCPlots JobID: 31949405 in slot2
08:02:32 +0200 2016-07-15 [INFO] Job finished in slot2 with 0.
08:04:32 +0200 2016-07-15 [INFO] Job finished in slot8 with 0.
08:05:46 +0200 2016-07-15 [INFO] Job finished in slot6 with 0.
08:06:43 +0200 2016-07-15 [INFO] Job finished in slot7 with 0.
08:17:12 +0200 2016-07-15 [INFO] Job finished in slot3 with 0.
08:30:04 +0200 2016-07-15 [INFO] Job finished in slot4 with 0.
09:00:21 +0200 2016-07-15 [INFO] Job finished in slot5 with 0.
09:27:35 +0200 2016-07-15 [INFO] Job finished in slot1 with 0.
ID: 3706
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 848,685
RAC: 1,837
Message 3708 - Posted: 15 Jul 2016, 8:58:08 UTC

In the BIOS I switched Hyper-Threading off and started a new mt-task with 4 cores, 2048 MB of dedicated RAM and the execution cap set at 90%.
At the moment 1.1 GB is used and 0k swap.
After 32 minutes of run time the average CPU usage is already higher (72%), taking into account the low-CPU-demand startup of the VM and the initialization of the first 4 jobs.
ID: 3708
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 329,589
RAC: 159
Message 3710 - Posted: 15 Jul 2016, 9:28:54 UTC - in response to Message 3706.  
Last modified: 15 Jul 2016, 9:29:42 UTC

Lost wall time due to jobs finishing = 7.4%
Overall CPU efficiency = 70.1%
CPU efficiency considering lost wall time = 80.3%
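The overall figure can be reproduced from the totals on the result page (CPU time 3.19 days, elapsed time × 8 threads = 4.55 days). A minimal sketch of that calculation; the lost-wall-time correction needs the per-slot finish times from the log and is not reproduced here:

```python
# CPU efficiency of a multi-core VM task: total CPU time divided by the
# wall time summed over all cores. Figures are the 8-core totals
# reported earlier in the thread.

cpu_days = 3.19        # total CPU time across all slots
core_wall_days = 4.55  # elapsed time multiplied by the 8 threads

efficiency = cpu_days / core_wall_days
print(f"{efficiency:.1%}")  # 70.1%
```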

Here is a blog post and presentation covering some of the fun of using multi-core VMs internally for our batch worker nodes in OpenStack.

It seems that there is some scope for tuning.
ID: 3710
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 848,685
RAC: 1,837
Message 3711 - Posted: 15 Jul 2016, 11:26:46 UTC - in response to Message 3710.  

Here is a blog post and presentation covering some of the fun with using multi-core VMs internally for our batch work nodes in OpenStack.

Thanks for sharing!

4-core VM uptime 3 hours now (9 jobs finished) and cpu average 77.83% for the 3 VBoxHeadless.exe's together.
ID: 3711
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,918,014
RAC: 2,503
Message 3712 - Posted: 15 Jul 2016, 13:35:11 UTC - in response to Message 3710.  
Last modified: 15 Jul 2016, 13:36:25 UTC

With a lot of help from Laurence, I got an 8-core Theory task running on my 12-core box (along with a -dev CMS task and an LHC@Home CMS task; I've suspended SETI@Home for the moment). top says:

top - 14:23:10 up 51 days,  3:46,  1 user,  load average: 4.02, 2.92, 2.46
Tasks: 385 total,   2 running, 382 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.6%us, 73.5%sy,  1.8%ni, 24.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  65936080k total, 24008816k used, 41927264k free,   717600k buffers
Swap: 62498812k total,        0k used, 62498812k free, 14744484k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
 5915 eesridr   39  19 4613m 2.6g 2.4g S 660.4  4.1  15:14.84 VBoxHeadless      
20779 eesridr   39  19 3634m 2.1g 2.0g S 100.6  3.3  72:11.35 VBoxHeadless      
20785 eesridr   39  19 3564m 2.1g 2.0g S 100.6  3.3  73:55.45 VBoxHeadless      
20643 eesridr   39  19  778m  11m 7436 S  9.6  0.0   4:31.85 VBoxSVC            
20840 eesridr   20   0  402m  26m  16m S  6.3  0.0   2:21.56 boincmgr           
20635 eesridr   39  19  195m 3504 1988 S  4.3  0.0   2:04.58 VBoxXPCOMIPCD

ID: 3712
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,918,014
RAC: 2,503
Message 3713 - Posted: 15 Jul 2016, 13:45:56 UTC - in response to Message 3712.  


ID: 3713
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 848,685
RAC: 1,837
Message 3719 - Posted: 16 Jul 2016, 5:34:44 UTC

I returned my 4-core VM task (HT off).
Performance at 90% Exec.Cap: 79%.
With the 8-core version (HT on) I did 51 jobs; with 4 cores, 55 jobs.
After the 12-hour elapsed time was over and just the last jobs were finishing, I set the Exec.Cap to 100% and the performance during that period rose to 96-97%.

I have now gone back to a 3-core VM (HT on), but set the maximum threads to use to 50% (4 cores), the Exec.Cap to 100%, and run another BOINC task on the 4th thread.
ID: 3719
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,918,014
RAC: 2,503
Message 3721 - Posted: 16 Jul 2016, 10:50:53 UTC - in response to Message 3719.  

I tried 20 cores but the tasks errored out, so I dropped that to eight and it seemed to work. I got an invalid on one 8-core task and a valid on another.
http://lhcathomedev.cern.ch/vLHCathome-dev/results.php?userid=9&offset=0&show_names=0&state=0&appid=4
ID: 3721
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 848,685
RAC: 1,837
Message 3722 - Posted: 16 Jul 2016, 13:15:39 UTC - in response to Message 3721.  
Last modified: 16 Jul 2016, 13:16:29 UTC

Your 20-core VMs were not created properly and could not start, maybe because of this command:

Command: VBoxManage -q controlvm "boinc_abe8f0dfb3126515" keyboardputscancode 0x39
Exit Code: 0
Output:
VBoxManage: error: Error: '0x39' is not a hex byte!


Directly after that error the VM was powered off.

Your 8-core VM wasn't a performance beast either (55%), because 7 jobs ended between elapsed hours 12 and 13, but it seems job 8 never ended.
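The error message quoted above suggests that VBoxManage's keyboardputscancode expects bare two-digit hex bytes (e.g. 39) rather than 0x-prefixed values. A rough illustration of such a token check in Python (a hypothetical sketch for clarity, not VirtualBox's actual validation code):

```python
import string

def is_hex_byte(token: str) -> bool:
    """True for a bare two-digit hex byte, e.g. '39'; '0x39' would be rejected."""
    return len(token) == 2 and all(c in string.hexdigits for c in token)

print(is_hex_byte("39"))    # True  - accepted
print(is_hex_byte("0x39"))  # False - "'0x39' is not a hex byte!"
```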
ID: 3722
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,918,014
RAC: 2,503
Message 3723 - Posted: 16 Jul 2016, 16:49:26 UTC - in response to Message 3722.  

Can you reiterate here what parameters I should be using to try 20-core jobs again? I'll go and check with Google, but it would be worthwhile having them explicitly given here. Thanks.
ID: 3723
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 848,685
RAC: 1,837
Message 3724 - Posted: 16 Jul 2016, 17:24:05 UTC - in response to Message 3723.  

Can you reiterate here what parameters I should be using to try 20-core jobs again? I'll go and check with Google, but it would be worthwhile having them explicitly given here. Thanks.

Written in this thread in message 3674.

In the app_config.xml file add an app_version part:

<app_version>
  <app_name>Theory</app_name>
  <plan_class>vbox64_mt_mcore</plan_class>
  <avg_ncpus>20.000000</avg_ncpus>
  <cmdline>--nthreads 20.000000</cmdline>
  <cmdline>--memory_size_mb 10240</cmdline>
</app_version>
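Note that the <app_version> block must sit inside the <app_config> root element of app_config.xml in the project directory, and the BOINC client has to reread its config files (or be restarted) to pick it up. A quick way to sanity-check that the fragment is well-formed, sketched in Python (a convenience check, not an official BOINC tool):

```python
import xml.etree.ElementTree as ET

# The fragment from above, wrapped in the <app_config> root the client expects.
APP_CONFIG = """
<app_config>
 <app_version>
  <app_name>Theory</app_name>
  <plan_class>vbox64_mt_mcore</plan_class>
  <avg_ncpus>20.000000</avg_ncpus>
  <cmdline>--nthreads 20.000000</cmdline>
  <cmdline>--memory_size_mb 10240</cmdline>
 </app_version>
</app_config>
"""

root = ET.fromstring(APP_CONFIG)  # raises ParseError if malformed
av = root.find("app_version")
print(av.findtext("app_name"))                  # Theory
print([c.text for c in av.findall("cmdline")])  # both cmdline switches
```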
ID: 3724
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,918,014
RAC: 2,503
Message 3726 - Posted: 16 Jul 2016, 18:34:20 UTC - in response to Message 3724.  

Thanks. That looks like what I had, tho' I upped the memory significantly. I'll try again next week; doing it from home is fraught with dangers. Via Laurence, Rom informs me that VirtualBox's limit is 32 cores; I could only approach that on the Xeon Phi (short of hyperthreading the E5-2670 v2 boxen), but it only has 8 GB of total memory.
ID: 3726
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1185
Credit: 848,685
RAC: 1,837
Message 3727 - Posted: 17 Jul 2016, 7:45:31 UTC

4 physical cores - 8 threads - 100% CPU usage allowed.
4 threads used for a 4 processor VM and 1 thread running another BOINC task with high memory throughput.
http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=222593

Overall performance 93.12% - 49 jobs done

The last 4 jobs:

08:10:42 +0200 2016-07-17 [INFO] # Job finished in slot1 with 0.
08:12:38 +0200 2016-07-17 [INFO] # Job finished in slot2 with 0.
08:19:15 +0200 2016-07-17 [INFO] # Job finished in slot4 with 0.
09:07:07 +0200 2016-07-17 [INFO] # Job finished in slot3 with 0.


In the tail-off, 3 cores were in use for 2 minutes,
2 cores for 7 minutes, and
1 core for the final 48 minutes.

Again, not all info was captured from VBox.log into the result log (stderr.txt).
ID: 3727
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3742 - Posted: 19 Jul 2016, 6:40:56 UTC

Here is an example of how an ultra-long job affects a task.

http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=222063

Slot 2 runs all the way to the 18h limit and does not finish.

The other 7 jobs are left idle for about 5 hours.

The more cores/jobs are running, the greater the loss of computing time.

If you don't consider this a problem, please let me know.
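To put numbers on it, here is a back-of-envelope sketch using the figures above (8-core task, 7 slots idle for roughly 5 hours); the formula is my own illustration, not a project-supplied metric:

```python
# Compute lost when one ultra-long job pins a multi-core VM to the 18h
# limit while the remaining slots sit idle.

def lost_core_hours(total_cores: int, idle_hours: float) -> float:
    """Core-hours wasted: every core except the long-runner idles."""
    return (total_cores - 1) * idle_hours

def lost_fraction(total_cores: int, idle_hours: float,
                  wall_limit_hours: float = 18.0) -> float:
    """Wasted core-hours as a fraction of the task's total core-hour budget."""
    return lost_core_hours(total_cores, idle_hours) / (total_cores * wall_limit_hours)

# 8-core task, 7 slots idle for ~5 hours:
print(lost_core_hours(8, 5))          # 35 core-hours
print(round(lost_fraction(8, 5), 3))  # 0.243, i.e. roughly a quarter of the budget
```

As the sketch shows, the loss grows linearly with the core count, which matches the observation that more cores per VM make an ultra-long job more costly.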
ID: 3742


©2024 CERN