Message boards : News : Multicore Shutdown

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1021
Credit: 274,753
RAC: 0
Message 5125 - Posted: 12 Sep 2017, 13:07:26 UTC

A new feature has been implemented to shut down multicore VMs at the end of their life when the idle time exceeds the busy time (idle > busy). In other words, if we are wasting more time on empty cores than we would lose by killing the remaining jobs, those jobs are killed. This should avoid the situation where a looping Theory job keeps the VM alive while the other cores sit idle. If this works, the following message should appear in the task output:

Multicore Shutdown: Idle > Busy ( 1439s > 1382s )
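
As a rough illustration of the rule described above, the check can be sketched in a few lines of Python. This is only a minimal sketch, not the project's actual code: the function names `should_shut_down` and `shutdown_message` are hypothetical, and the only details taken from the post are the `idle > busy` comparison and the format of the log line.

```python
def should_shut_down(idle_seconds: int, busy_seconds: int) -> bool:
    """Return True when the accumulated idle time exceeds the busy time.

    Hypothetical helper: the real implementation is not shown in the thread.
    """
    return idle_seconds > busy_seconds


def shutdown_message(idle_seconds: int, busy_seconds: int) -> str:
    """Format a log line in the style of the message quoted in the post."""
    return f"Multicore Shutdown: Idle > Busy ( {idle_seconds}s > {busy_seconds}s )"


# Values taken from the example message in the post.
if should_shut_down(1439, 1382):
    print(shutdown_message(1439, 1382))
```

How the idle and busy counters are accumulated (per slot, and whether suspended time is included) is not specified here, so the sketch takes both values as given.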


Let me know how it goes.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 965
Credit: 1,201,381
RAC: 0
Message 5127 - Posted: 12 Sep 2017, 17:07:26 UTC - in response to Message 5125.  
Last modified: 12 Sep 2017, 17:08:16 UTC

When does it start to count the "busy" time?
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1010
Credit: 591,548
RAC: 2
Message 5128 - Posted: 13 Sep 2017, 7:34:32 UTC

I'm testing your new feature with a 3-core Theory VM.
The 12-hour lifetime has passed and, in the meantime, 2 jobs have finished.
I have now been waiting 27 minutes for the 3rd (last) job to be killed.
The VM's idle time has been about 60% most of the time.

If you're also using the 10-minute grace period for killing VMs with your new feature:
during those 10 minutes the idle time sometimes drops below 50% because the Plotter.exe process pops up.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1010
Credit: 591,548
RAC: 2
Message 5129 - Posted: 13 Sep 2017, 10:10:50 UTC

The task finished the old-fashioned way: all 3 jobs ran to completion, and the last job was not killed.

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=362103

2017-09-13 08:58:10 (12604): Guest Log: [INFO] Job finished in slot3 with 0.

2017-09-13 09:06:11 (12604): Guest Log: [INFO] Job finished in slot2 with 0.

2017-09-13 11:42:01 (12604): Guest Log: [INFO] Job finished in slot1 with 0.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1021
Credit: 274,753
RAC: 0
Message 5130 - Posted: 14 Sep 2017, 7:44:05 UTC - in response to Message 5127.  

It starts to count busy time when the job starts to run.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1021
Credit: 274,753
RAC: 0
Message 5131 - Posted: 14 Sep 2017, 8:15:16 UTC - in response to Message 5129.  

So, assuming my maths is correct ...

The VM started at 19:03:56 but was paused between 23:04:07 and 07:29:08 (8h 25m 01s).

Job in slot 1 ran for 04h 44m 19s
Slot 2 idled for 2h 35m 50s
Slot 3 idled for 2h 43m 51s

05h 19m 41s (2h 35m 50s + 2h 43m 51s) > 04h 44m 19s

so the last job should have been killed.
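
The arithmetic above can be re-checked from the three "Job finished" timestamps in the task log. The following is a minimal sketch using Python's standard `datetime` module. It assumes that a slot counts as idle from the moment its job finished until the last job finished, and it takes the 04h 44m 19s busy time for slot 1 directly from the post rather than deriving it. The variable names are illustrative only.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Finish times copied from the Guest Log lines in the task output.
finish = {
    "slot3": datetime.strptime("2017-09-13 08:58:10", FMT),
    "slot2": datetime.strptime("2017-09-13 09:06:11", FMT),
    "slot1": datetime.strptime("2017-09-13 11:42:01", FMT),
}

# Assumption: a slot is idle from its own finish until the last job finishes.
last_finish = max(finish.values())
idle = {slot: last_finish - t for slot, t in finish.items() if t != last_finish}
total_idle = sum(idle.values(), timedelta())

# Busy time of the last job, as stated in the post (not derived here).
busy = timedelta(hours=4, minutes=44, seconds=19)

for slot, duration in sorted(idle.items()):
    print(slot, "idle for", duration)
print("total idle:", total_idle, "busy:", busy, "idle > busy:", total_idle > busy)
```

Under these assumptions the idle total does exceed the stated busy time, which matches the conclusion in the post.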
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1021
Credit: 274,753
RAC: 0
Message 5132 - Posted: 14 Sep 2017, 12:16:34 UTC - in response to Message 5131.  


"so the last job should have been killed."


It could be that the busy-time calculation did not take the suspend time into account, so the busy time came out as 12h 13m 27s.
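
To illustrate the suspected problem, here is a small sketch of how a wall-clock interval can be corrected for suspended periods. It is not the project's code: `active_time` is a hypothetical helper, and the calendar date of the VM start (12 Sep 2017) is an assumption based on the log being dated the following morning. The example applies the correction to the VM's whole run, from start to the last job finishing, purely to show the size of the effect. It does not attempt to reproduce the 12h 13m 27s figure.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def active_time(start, end, pauses):
    """Wall-clock time between start and end, minus any overlapping pauses."""
    paused = timedelta()
    for pause_start, pause_end in pauses:
        # Clip each pause to the [start, end] window before subtracting it.
        overlap_start = max(start, pause_start)
        overlap_end = min(end, pause_end)
        if overlap_end > overlap_start:
            paused += overlap_end - overlap_start
    return (end - start) - paused


# Times of day are from the thread; the start date is an assumption.
vm_start = datetime.strptime("2017-09-12 19:03:56", FMT)
last_job_end = datetime.strptime("2017-09-13 11:42:01", FMT)
pauses = [(
    datetime.strptime("2017-09-12 23:04:07", FMT),
    datetime.strptime("2017-09-13 07:29:08", FMT),
)]

wall_clock = last_job_end - vm_start
print("wall clock:", wall_clock)
print("active    :", active_time(vm_start, last_job_end, pauses))
```

If a busy counter is based on wall-clock time alone, an overnight suspend of this length is enough to push it well above the idle total, which would explain why the shutdown never triggered.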
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1010
Credit: 591,548
RAC: 2
Message 5133 - Posted: 14 Sep 2017, 14:45:05 UTC - in response to Message 5131.  

"The VM started at 19:03:56 but was paused between 23:04:07 and 07:29:08 (8h 25m 01s)."

For testing, I suspended the VM overnight, because the end of the 12-hour lifetime (07:03:56 in the morning) would have been a bit early for watching the 3 jobs closely.
That is why I resumed it the next morning.

The lifetime has always included the suspend time. The only exception is when the VM is booted: then the lifetime seems to be reset to zero.

IMO your feature doesn't work like expected, I think.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1021
Credit: 274,753
RAC: 0
Message 5136 - Posted: 15 Sep 2017, 7:42:25 UTC - in response to Message 5133.  


"IMO your feature doesn't work like expected, I think."


I think you are right. Will look into using a different algorithm. If anyone has any thoughts, let me know.
Bill Michael
Joined: 21 Sep 15
Posts: 89
Credit: 339,021
RAC: 5
Message 5164 - Posted: 30 Sep 2017, 15:36:23 UTC

Haven't been following this, but now I'm confused - your "multicore" tasks just reserve "n" cores instead of one, and run "n" separate processes?

How is this any more efficient than just having "n" separate BOINC tasks? Especially if, as discussed here, "n-1" of the threads could be idle? VBox biting you again? Doing "multicore" in order to have less VBox overhead?

What is the gain for "multicore"? Sounds like a patch on top of a patch on top of a patch, instead of trying to solve the real problem...



©2020 CERN