Message boards :
News :
Multicore Shutdown
Send message Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
A new feature has been implemented to shutdown multicore VMs at the end of their life when the idle time > busy time. This means that if we are wasting more time with empty cores than we would loose by kill the remaining jobs, those jobs are killed. This should avoid the situation where a looping Theory job keeps the VM alive while wasting idle cores. If this works the following message should be seen in the task output. Multicore Shutdown: Idle > Busy ( 1439s > 1382s ) Let me know how it goes. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
When does it start to count the "busy" time? |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 25 |
I'm testing your new feature with a 3-core Theory VM. After the 12 hours lifetime meanwhile 2 jobs have finished. I'm waiting now 27 minutes for the kill of the 3rd (last) job. Idle time of the VM most of the time about 60%. If you're using the 10 minutes grace period for killing VM's also for your new feature: In this 10 minutes the idle time sometimes goes below 50% because of the Plotter.exe process popping up. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 25 |
The task finished oldfashioned after all 3 jobs has finished completely without killing the last job. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=362103 2017-09-13 08:58:10 (12604): Guest Log: [INFO] Job finished in slot3 with 0. 2017-09-13 09:06:11 (12604): Guest Log: [INFO] Job finished in slot2 with 0. 2017-09-13 11:42:01 (12604): Guest Log: [INFO] Job finished in slot1 with 0. |
Send message Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
It starts to count busy time when the job starts to run. |
Send message Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
so assuming my maths is correct ... The VM started at 19:03:56 but was paused between 23:04:07 and 07:29:08 (8h 25m 01s). Job in slot 1 ran for 04h 44m 19s Slot 2 idled for 2h 35m 50s Slot 3 idled for 2h 43m 51s 05h 19m 41s (2h 35m 50s + 2h 43m 51s) > 04h 44m 19s so the last job should have been killed. |
Send message Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
It could be that the busy time did not take into consideration the suspend time so the busy time was 12h 13m 27s |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 25 |
The VM started at 19:03:56 but was paused between 23:04:07 and 07:29:08 (8h 25m 01s). For testing I suspended the VM overnight, cause for close watching the 3 jobs after the 12 hours lifetime (07:03:56 in the morning), it could be a bit early. Therefore the resume the next morning. The lifetime always has counted the suspend time to the lifetime. It only doesn't when the VM is booted. Then the lifetime seems to be zeroed. IMO your feature doesn't work like expected, I think. |
Send message Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
I think you are right. Will look into using a different algorithm. If anyone has any thoughts, let me know. |
Send message Joined: 21 Sep 15 Posts: 89 Credit: 383,017 RAC: 0 |
Haven't been following this, but now I'm confused - your "multicore" tasks just reserve "n" cores instead of one, and run "n" separate processes? How is this any more efficient than just having "n" separate BOINC tasks? Especially if, as discussed here, "n-1" of the threads could be idle? VBox biting you again? Doing "multicore" in order to have less VBox overhead? What is the gain for "multicore"? Sounds like a patch on top of a patch on top of a patch, instead of trying to solve the real problem... |