Message boards :
CMS Application :
New Version v48.00
Author | Message |
---|---|
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Too close to call it a significant performance increase, I tend to say. According to your results, there is no difference (a variation of about 0.5% is not statistically significant). Thanks for trying. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Here are some suggestions to improve performance; these address what is causing the biggest delays. 1) Reduce the upload size. 2) Arrange for the uploading job's core to work on the next job instead of sitting almost idle for as long as the upload takes (move the upload to the background). 3) Stagger the uploads (at least within the same VM). I have observed that the overall upload size is more than twice the download size (excluding the initial image download at project initialization). Example: 2 tasks with two cores each (upload per job, say, 10 min). Best case: each job uploads at a different time (CPU time lost: 4*10 = 40 min). Worst case: all 4 jobs upload simultaneously, so each upload takes about four times as long (CPU time lost: 4*4*10 = 160 min). Comments? |
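The best-case and worst-case figures in the example above can be reproduced with a small sketch. It assumes that concurrent uploads share the link equally, so that N overlapping uploads each take N times as long, and that a core sits idle for the whole of its job's upload. The function and parameter names are hypothetical and chosen only for illustration.

```python
def idle_core_minutes(n_jobs: int, solo_upload_min: float, concurrent: int) -> float:
    """Core-minutes lost to uploads (illustrative model, not measured data).

    Assumption: `concurrent` uploads share bandwidth equally, so each
    upload takes `concurrent` times as long as it would alone, and the
    job's core is idle for that whole time.
    """
    per_job_upload = solo_upload_min * concurrent
    return n_jobs * per_job_upload


# 2 tasks x 2 cores = 4 jobs, 10 minutes per upload when uploading alone.
best_case = idle_core_minutes(n_jobs=4, solo_upload_min=10, concurrent=1)
worst_case = idle_core_minutes(n_jobs=4, solo_upload_min=10, concurrent=4)

print(f"best case:  {best_case:.0f} min")
print(f"worst case: {worst_case:.0f} min")
```

Under this model, staggering the uploads only removes the bandwidth contention; moving the upload to the background (suggestion 2) would be needed to recover the remaining 40 minutes.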
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I have been running two 2-core tasks for a while. Then I tried a 4-core task. I have noticed that the total upload is much less than with the 2-core tasks (total upload/download according to the router traffic for this computer: <30 MB for uploading 4 jobs). 1) Has anyone else noticed that? 2) Is it actually producing valid results? https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=317572 3) Why does this only happen with more than 2 cores (I have not tested 3-core tasks)? According to BOINC, the result is valid. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 270 |
From my tests, 4-core is fundamentally less efficient than 2-core. It looks like the algorithm might have changed since then, when two jobs started together and the other two started twenty minutes later. Look at your timing: Slot 1 starts 2002; Slot 3 starts 2015; Slot 2 starts 2023; Slot 4 starts 2035. Then at the end: Slot 1 finishes at 0434 and the task waits for the rest to finish; Slot 2 finishes at 0533; Slot 4 finishes at 0535; Slot 3 finishes at 0602. So that's a total of (13+21+33+88+29+27=) 211 minutes dead-time in 4x10 hours, or nearly 9%. |
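The dead-time figure above can be checked with a short sketch. It assumes, as the sum in the post implies, that startup idle time is counted from the first slot's start and that drain idle time is counted up to the last slot's finish, with all finish times falling on the following day. The helper name `to_minutes` is hypothetical.

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'HHMM' clock string to minutes since midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])


DAY = 24 * 60

# Slot start and finish times as quoted in the post.
starts = {1: "2002", 3: "2015", 2: "2023", 4: "2035"}
ends = {1: "0434", 2: "0533", 4: "0535", 3: "0602"}

start_min = {slot: to_minutes(t) for slot, t in starts.items()}
# Assumption: every job finishes the morning after it started.
end_min = {slot: to_minutes(t) + DAY for slot, t in ends.items()}

first_start = min(start_min.values())
last_end = max(end_min.values())

# Idle time while a slot waits to get its first job.
startup_idle = sum(m - first_start for m in start_min.values())
# Idle time while a finished slot waits for the others.
drain_idle = sum(last_end - m for m in end_min.values())

dead_minutes = startup_idle + drain_idle
dead_percent = 100 * dead_minutes / (4 * 10 * 60)  # 4 cores x 10 hours

print(f"dead time: {dead_minutes} min ({dead_percent:.1f}%)")
```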
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks, Ivan. It looks as if the 4-core task is bypassing the host OS: network activity from the VM is not registered by the host, but the router sees it (of course). Why that happens with the 4-core but not the 2-core tasks puzzles me. It would be really nice if simultaneous uploads of job results could be avoided. 2-core tasks are probably the most efficient, compared to single or 3+ core tasks. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 61 |
"From my tests, 4-core is fundamentally less efficient than 2-core. It looks like the algorithm might have changed since then, when two jobs started together and the other two started twenty minutes later. Look at your timing:" I think Rasputin's task was not a normal task. It looks like it did not get any more jobs when it should have gotten new ones. The VM started at 20:00, so it should get new jobs until at least 08:00 the next morning. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I just checked the job startup times for a new 4-core task: 1623, 1623, 1644, 1735. If there is a system to that, it escapes me. Is there? |
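To make the spacing of those four startup times easier to compare with the earlier task, here is a minimal sketch that computes each job's offset from the first start and the gap between consecutive starts. The helper name `to_minutes` is hypothetical, and the times are assumed to fall on the same day.

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'HHMM' clock string to minutes since midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])


# Job startup times as quoted in the post (assumed same day).
startup_times = ["1623", "1623", "1644", "1735"]
minutes = [to_minutes(t) for t in startup_times]

# Minutes after the first job started.
offsets = [m - minutes[0] for m in minutes]
# Minutes between one job start and the next.
gaps = [later - earlier for earlier, later in zip(minutes, minutes[1:])]

print("offsets from first start:", offsets)
print("gaps between starts:     ", gaps)
```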
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 270 |
"I just checked the job startup times for a new 4-core task:" It's closer to what I've seen in the past, with two jobs starting simultaneously and the next 20 minutes later, but in my case the third and fourth jobs started together. It's a bit weird that your fourth job took so long to start. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
This task only ran for 10 h. Why is that? I stopped it for a few minutes, but should it not run for at least 12 h? https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=317641 |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 270 |
Some weird things over the last few days due to an auto-update messing with our scripts -- three times! We hope we have all auto-updates shut off now. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I have noticed that sometimes a finished job is listed several times, as finished_x.log, finished_x+1.log, finished_x+3.log. The time difference is about 2 min, so I do not believe it is uploading again. Has anyone else noticed that? |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
What happened here? The task was running for too long. I suspended it and resumed it. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=320004 It reported shortly after that. |
©2024 CERN