Message boards : CMS Application : New Version v48.00

Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4671 - Posted: 19 Feb 2017, 18:27:15 UTC - in response to Message 4670.  

Too close to call it a significant performance increase, I tend to say.


According to your results, there is no difference.
(A variation of about 0.5% is not statistically significant.)



Thanks for trying.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4720 - Posted: 25 Feb 2017, 9:08:18 UTC

Here are some suggestions to improve performance.
They address what is causing the biggest delays.

1) Reduce the upload size.

2) Arrange for the uploading job's core to work on the next job instead of sitting almost idle for as long as the upload takes (move the upload to the background).

3) Stagger the uploads (at least within the same VM).
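
Suggestions 2 and 3 can be illustrated together with a rough sketch. Nothing here reflects how the CMS VM actually handles uploads: `upload_result`, `run_job`, `core_loop` and all timings are hypothetical stand-ins. The idea is that each core hands its finished result to a background thread and starts the next job straight away, while a lock lets only one upload run at a time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

upload_lock = threading.Lock()  # lets only one upload run at a time in this VM
active_uploads = 0              # uploads in progress right now
max_active_uploads = 0          # highest value active_uploads ever reached


def upload_result(job_id: int) -> int:
    """Hypothetical upload: sleeps instead of sending any data."""
    global active_uploads, max_active_uploads
    with upload_lock:  # later uploads wait here, which staggers them
        active_uploads += 1
        max_active_uploads = max(max_active_uploads, active_uploads)
        time.sleep(0.05)  # pretend to transfer the result
        active_uploads -= 1
    return job_id


def run_job(job_id: int) -> int:
    """Hypothetical compute step of one job."""
    time.sleep(0.01)
    return job_id


def core_loop(core: int, jobs_per_core: int, uploader: ThreadPoolExecutor) -> list:
    """Run jobs back to back, queueing each upload in the background."""
    pending = []
    for n in range(jobs_per_core):
        job_id = run_job(core * 100 + n)
        pending.append(uploader.submit(upload_result, job_id))  # do not wait
    return pending


with ThreadPoolExecutor(max_workers=4) as uploader, ThreadPoolExecutor(max_workers=4) as cores:
    loops = [cores.submit(core_loop, core, 2, uploader) for core in range(4)]
    uploads = [u for loop in loops for u in loop.result()]
    uploaded = sorted(u.result() for u in uploads)

print(len(uploaded), max_active_uploads)  # 8 1
```

In this toy version the cores never wait for an upload, and the lock guarantees that no two uploads overlap. Whether the real job wrapper could be restructured this way is an open question.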


I have observed that the overall upload size is more than twice the download size (excluding the initial image download at project initialization).

Example: 2 tasks with two cores each (upload time per job, say, 10 min).

Best case: each job uploads at a different time (CPU time lost: 4*10 = 40 min).
Worst case: all 4 jobs upload simultaneously, so each upload takes four times as long (CPU time lost: 4*4*10 = 160 min).
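
The arithmetic of this example can be written out as a small sketch. It assumes, as the numbers imply, that uploads running at the same time share the link evenly, so n simultaneous uploads each take n times as long, and that a core sits idle for the whole duration of its job's upload. The function name and parameters are made up for illustration.

```python
def cpu_minutes_lost(jobs: int, upload_minutes: float, simultaneous: int) -> float:
    """CPU time lost while cores wait for their uploads.

    Assumption: `simultaneous` uploads share the bandwidth evenly, so each
    one takes `simultaneous` times as long as an upload running alone.
    """
    per_job_wait = upload_minutes * simultaneous
    return jobs * per_job_wait


# 2 tasks x 2 cores = 4 jobs, 10 minutes per upload when run alone.
best = cpu_minutes_lost(jobs=4, upload_minutes=10, simultaneous=1)
worst = cpu_minutes_lost(jobs=4, upload_minutes=10, simultaneous=4)
print(best, worst)  # 40 160
```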

Comments?
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4791 - Posted: 12 Mar 2017, 10:16:08 UTC
Last modified: 12 Mar 2017, 10:18:15 UTC

I have been running two 2-core tasks for a while.
Then I tried a 4-core task.
I have noticed that the total upload is far smaller than with the 2-core tasks.
(Total up/download according to the router traffic for this computer: <30MB for uploading 4 jobs.)

1) Has anyone else noticed that?
2) Is it actually producing valid results? https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=317572
3) Why does this only happen with more than 2 cores? (I have not tested 3-core tasks.)

According to BOINC, the result is valid.
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 4792 - Posted: 12 Mar 2017, 11:48:54 UTC - in response to Message 4791.  

From my tests, 4-core is fundamentally less efficient than 2-core. It looks like the algorithm might have changed since then, when two jobs started together and the other two started twenty minutes later. Look at your timing:
Slot 1 starts 2002
Slot 3 starts 2015
Slot 2 starts 2023
Slot 4 starts 2035

Then at the end:
Slot 1 finishes at 0434, task waits for the rest to finish
Slot 2 finishes at 0533
Slot 4 finishes at 0535
Slot 3 finishes at 0602

So that's a total of (13+21+33+88+29+27=) 211 minutes dead-time in 4x10 hours, or nearly 9%.
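
As a cross-check, the dead-time figure can be reproduced from the slot times above. This is only a sketch: it assumes the finish times fall on the following day and counts a slot as idle between the first start and its own start, and between its own finish and the last finish. The helper name `to_minutes` is made up.

```python
def to_minutes(hhmm: str, day: int = 0) -> int:
    """Convert an 'HHMM' clock reading to minutes, offset by `day` days."""
    return day * 24 * 60 + int(hhmm[:2]) * 60 + int(hhmm[2:])


# Slot number -> start time (evening) and finish time (next morning).
starts = {1: to_minutes("2002"), 2: to_minutes("2023"),
          3: to_minutes("2015"), 4: to_minutes("2035")}
finishes = {1: to_minutes("0434", day=1), 2: to_minutes("0533", day=1),
            3: to_minutes("0602", day=1), 4: to_minutes("0535", day=1)}

first_start = min(starts.values())
last_finish = max(finishes.values())

idle_at_start = sum(s - first_start for s in starts.values())  # 13 + 21 + 33
idle_at_end = sum(last_finish - f for f in finishes.values())  # 88 + 29 + 27
dead = idle_at_start + idle_at_end
wall = last_finish - first_start              # minutes from first start to last finish
fraction = dead / (len(starts) * wall)        # share of the 4 x 10 core-hours

print(dead, wall, f"{fraction:.1%}")  # 211 600 8.8%
```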
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4793 - Posted: 12 Mar 2017, 14:57:25 UTC

Thanks, Ivan.

It looks as if the 4-core task is bypassing the host OS.
The host does not register network activity from the VM, but the router does (of course).

Why that happens with the 4-core but not the 2-core tasks puzzles me.


It would be really nice if simultaneous uploads of job results could be avoided.

2-core tasks are probably the most efficient, compared to single or 3+core tasks.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 4794 - Posted: 12 Mar 2017, 15:01:42 UTC - in response to Message 4792.  

From my tests, 4-core is fundamentally less efficient than 2-core. It looks like the algorithm might have changed since then, when two jobs started together and the other two started twenty minutes later. Look at your timing:
Slot 1 starts 2002
Slot 3 starts 2015
Slot 2 starts 2023
Slot 4 starts 2035

Then at the end:
Slot 1 finishes at 0434, task waits for the rest to finish
Slot 2 finishes at 0533
Slot 4 finishes at 0535
Slot 3 finishes at 0602

So that's a total of (13+21+33+88+29+27=) 211 minutes dead-time in 4x10 hours, or nearly 9%.

I think Rasputin's task was not a normal task.
It looks like it stopped getting jobs when it should have received new ones.
The VM started at 20:00, so it should get new jobs until at least 08:00 the next morning.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4795 - Posted: 12 Mar 2017, 17:12:43 UTC

I just checked the job startup times for a new 4-core task:
1623
1623
1644
1735

If there is a system to that, it escapes me.

Is there?
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 4796 - Posted: 12 Mar 2017, 21:09:31 UTC - in response to Message 4795.  

I just checked the job startup times for a new 4-core task:
1623
1623
1644
1735

If there is a system to that, it escapes me.

Is there?

It's closer to what I've seen in the past, with two jobs starting simultaneously and the next 20 minutes later, but in my case the third and fourth jobs started together. It's a bit weird that your fourth job took so long to start.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4797 - Posted: 13 Mar 2017, 16:13:22 UTC

Task only running for 10h.

Why is that?
I stopped it for a few minutes, but should it not run for at least 12h?

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=317641
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1128
Credit: 7,870,419
RAC: 595
Message 4800 - Posted: 15 Mar 2017, 19:06:11 UTC - in response to Message 4797.  

Some weird things over the last few days due to an auto-update messing with our scripts -- three times! We hope we have all auto-updates shut off now.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4809 - Posted: 20 Mar 2017, 10:09:13 UTC

I have noticed that sometimes a finished job is listed several times, as finished_x.log, finished_x+1.log, finished_x+3.log.
The time difference is about 2 min, so I do not believe it is uploading again.

Has anyone else noticed that?
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4827 - Posted: 4 Apr 2017, 6:48:08 UTC

What happened here?

The task was running for too long. I suspended it and resumed it.

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=320004


It reported shortly after that.