Thread 'Multicore Shutdown'

Author	Message
Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1150 Credit: 342,328 RAC: 0	Message 5125 - Posted: 12 Sep 2017, 13:07:26 UTC
A new feature has been implemented to shut down multicore VMs at the end of their life when the idle time > busy time. This means that if we are wasting more time with empty cores than we would lose by killing the remaining jobs, those jobs are killed. This should avoid the situation where a looping Theory job keeps the VM alive while wasting idle cores. If this works, the following message should be seen in the task output:
Multicore Shutdown: Idle > Busy ( 1439s > 1382s )
Let me know how it goes.
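For illustration, the rule described in this post can be sketched in Python. This is only a sketch under stated assumptions, not the project's actual code: the function names are hypothetical, and it assumes both times are tracked as whole seconds, as the sample message suggests.

```python
def should_shut_down(idle_seconds: int, busy_seconds: int) -> bool:
    # Hypothetical helper: the remaining jobs are killed only when the
    # accumulated idle time is strictly larger than the busy time.
    return idle_seconds > busy_seconds


def shutdown_message(idle_seconds: int, busy_seconds: int) -> str:
    # Mirrors the format of the sample task-output line quoted in the post.
    return f"Multicore Shutdown: Idle > Busy ( {idle_seconds}s > {busy_seconds}s )"


# Values taken from the sample message in the post.
if should_shut_down(1439, 1382):
    print(shutdown_message(1439, 1382))
```

How the real feature accumulates the idle and busy counters is not stated in the post, so the sketch takes them as inputs.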

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 17	Message 5127 - Posted: 12 Sep 2017, 17:07:26 UTC - in response to Message 5125. Last modified: 12 Sep 2017, 17:08:16 UTC When does it start to count the "busy" time?

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1256 Credit: 1,013,826 RAC: 148	Message 5128 - Posted: 13 Sep 2017, 7:34:32 UTC I'm testing your new feature with a 3-core Theory VM. The 12-hour lifetime has passed and 2 jobs have finished in the meantime. I have now been waiting 27 minutes for the 3rd (last) job to be killed. The VM has been about 60% idle most of the time. If you're also using the 10-minute grace period for killing VMs for your new feature: during those 10 minutes the idle time sometimes drops below 50% because of the Plotter.exe process popping up.

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1256 Credit: 1,013,826 RAC: 148	Message 5129 - Posted: 13 Sep 2017, 10:10:50 UTC
The task finished the old-fashioned way, after all 3 jobs had run to completion, without the last job being killed.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=362103
2017-09-13 08:58:10 (12604): Guest Log: [INFO] Job finished in slot3 with 0.
2017-09-13 09:06:11 (12604): Guest Log: [INFO] Job finished in slot2 with 0.
2017-09-13 11:42:01 (12604): Guest Log: [INFO] Job finished in slot1 with 0.

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1150 Credit: 342,328 RAC: 0	Message 5130 - Posted: 14 Sep 2017, 7:44:05 UTC - in response to Message 5127. It starts to count busy time when the job starts to run.

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1150 Credit: 342,328 RAC: 0	Message 5131 - Posted: 14 Sep 2017, 8:15:16 UTC - in response to Message 5129.
So assuming my maths is correct ...
The VM started at 19:03:56 but was paused between 23:04:07 and 07:29:08 (8h 25m 01s).
Job in slot 1 ran for 04h 44m 19s
Slot 2 idled for 2h 35m 50s
Slot 3 idled for 2h 43m 51s
05h 19m 41s (2h 35m 50s + 2h 43m 51s) > 04h 44m 19s so the last job should have been killed.
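The arithmetic in this post can be rechecked with a short Python sketch. It uses the job-finish timestamps from the guest log quoted earlier in the thread and the pause window given here. The slot 1 busy time is taken directly from the post, since the job's start time does not appear in the thread; the variable names are illustrative only.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"


def clock(text: str) -> datetime:
    # Parse a wall-clock time; the date part is irrelevant for same-day differences.
    return datetime.strptime(text, FMT)


# Finish times from the guest log (slot 1 finished last).
last_finish = clock("11:42:01")
idle_slot2 = last_finish - clock("09:06:11")
idle_slot3 = last_finish - clock("08:58:10")
total_idle = idle_slot2 + idle_slot3

# Busy time of the last job, as stated in the post (not derived here).
busy_slot1 = timedelta(hours=4, minutes=44, seconds=19)

# The pause ran past midnight, so the resume time belongs to the next day.
pause = (clock("07:29:08") + timedelta(days=1)) - clock("23:04:07")

print("idle slot 2:", idle_slot2)
print("idle slot 3:", idle_slot3)
print("total idle :", total_idle)
print("pause      :", pause)
print("idle > busy:", total_idle > busy_slot1)
```

With these inputs the summed idle time exceeds the stated busy time, which matches the conclusion that the last job should have been killed, provided the suspend time is excluded from the busy time.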

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1150 Credit: 342,328 RAC: 0	Message 5132 - Posted: 14 Sep 2017, 12:16:34 UTC - in response to Message 5131. "so the last job should have been killed." It could be that the busy time did not take the suspend time into consideration, so the busy time was 12h 13m 27s.

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1256 Credit: 1,013,826 RAC: 148	Message 5133 - Posted: 14 Sep 2017, 14:45:05 UTC - in response to Message 5131. "The VM started at 19:03:56 but was paused between 23:04:07 and 07:29:08 (8h 25m 01s)." For testing I suspended the VM overnight, because the end of the 12-hour lifetime (07:03:56 in the morning) would have been a bit early for watching the 3 jobs closely. Hence the resume the next morning. The lifetime has always included the suspend time. It only doesn't when the VM is booted; then the lifetime seems to be zeroed. IMO your feature doesn't work like expected, I think.

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1150 Credit: 342,328 RAC: 0	Message 5136 - Posted: 15 Sep 2017, 7:42:25 UTC - in response to Message 5133. "IMO your feature doesn't work like expected, I think." I think you are right. I will look into using a different algorithm. If anyone has any thoughts, let me know.

Tern Joined: 21 Sep 15 Posts: 89 Credit: 383,017 RAC: 0	Message 5164 - Posted: 30 Sep 2017, 15:36:23 UTC Haven't been following this, but now I'm confused - your "multicore" tasks just reserve "n" cores instead of one, and run "n" separate processes? How is this any more efficient than just having "n" separate BOINC tasks? Especially if, as discussed here, "n-1" of the threads could be idle? VBox biting you again? Doing "multicore" in order to have less VBox overhead? What is the gain for "multicore"? Sounds like a patch on top of a patch on top of a patch, instead of trying to solve the real problem...

Development for LHC@home