ATLAS Monitoring v3
Author | Message |
---|---|
Joined: 28 Jul 16 Posts: 485 Credit: 394,839 RAC: 0 |
To prepare ATLAS Monitoring v3, please find below a simulation of the modified screenshot.

While terms like "standard deviation" or "arithmetic mean" are well defined in scientific environments, other terms and phrases, e.g. "uncertainty", may not be. Hence comments from native speakers (English) are welcome (Are there terms/phrases that fit better?). Of course, others' comments are also welcome.

Uncertainty in this context should give an impression of how "good" the time left estimation might be. It is derived from:
- standard deviation of event runtimes
- number of events not yet finished
- number of workers

Event status lists different task phases:

    worker 3: N/A   # directly after a task has started
    worker 4: ...   # the logfile shows that event 1 has been started but is not yet finished
    worker x: ...   # values from the last finished event

Time values are rounded to integer (s, m) as it makes no sense to be more precise.

Simulated screenshot:

    *********************************************************************
    *                  ATLAS Event Progress Monitoring                  *
    *                              v3.y.z                               *
    *              last display update (VM time): 16:10:23              *
    *********************************************************************
    Number of events      :
       to be processed    : 200
       already finished   : 119
    Event runtimes        :
       arithmetic mean    : 1033 s
       standard deviation : 16 s
    Estimated time left   :
       total              : overdue 0 d 5 h 44 m
       uncertainty        : 0 d 0 h 5 m
    ---------------------------------------------------------------------
    Event status per worker thread:
    worker 1: Event nr. 30 took 1385 s
    worker 2: Event nr. 29 took 978 s
    worker 3: N/A
    worker 4: Event nr. 1 currently processing
    worker 5: Event nr. 30 took 807 s
    worker 6: Event nr. 30 took 1131 s
    ---------------------------------------------------------------------
    Calculation completed. Preparing HITS file ...
|
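As a rough illustration of how the three inputs listed in the post above (standard deviation, events not yet finished, number of workers) could combine into a time-left estimate and its uncertainty, here is a minimal sketch. The formula is an assumption made for illustration only and is not taken from the monitoring script: it assumes the remaining events are spread evenly over the workers and that event runtimes are independent. The function name `estimate_time_left` is hypothetical.

```python
import math


def estimate_time_left(mean_s, sd_s, events_left, workers):
    """Return (time_left_s, uncertainty_s), both rounded to whole seconds.

    Assumption (not the real script's formula): each worker processes its
    share of the remaining events one after another, and event runtimes
    are independent with the given mean and standard deviation.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if events_left <= 0:
        return 0, 0
    # Number of events the busiest worker still has to process.
    per_worker = math.ceil(events_left / workers)
    time_left = mean_s * per_worker
    # The standard deviation of a sum of n independent runtimes grows with sqrt(n).
    uncertainty = sd_s * math.sqrt(per_worker)
    return round(time_left), round(uncertainty)


# Input values from the simulated screenshot: 200 - 119 = 81 events left, 6 workers.
print(estimate_time_left(mean_s=1033, sd_s=16, events_left=81, workers=6))
```

Under these assumptions the uncertainty shrinks as fewer events remain, which matches the stated intent of showing how "good" the estimate is. The real script may weight the inputs differently.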
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
> Hence comments from native speakers (English) are welcome (Are there terms/phrases that fit better?).

Less is more. v2.2.0 is simple, readable, and more than we need. |
Joined: 13 Feb 15 Posts: 1188 Credit: 875,820 RAC: 738 |
> Time values are rounded to integer (s, m) as it makes no sense to be more precise.

That's an improvement.

"overdue" is just for the example, I suppose.

> Estimated time left :
>    total : overdue 0 d 5 h 44 m
>    uncertainty : 0 d 0 h 5 m

Make it simple is my comment. Just call 'arithmetic mean' average and replace 'standard deviation' by 2 lines: minimum and maximum event run time of all done tasks. The runtime during a task may and will vary a lot by itself and by the load of the host too, so it makes no sense to be very precise. Is the host always under 100% load, or are there periods when only the ATLAS task is running? |
Joined: 28 Jul 16 Posts: 485 Credit: 394,839 RAC: 0 |
Thanks for the comments made so far. Over the weekend I modified the monitoring script so it will produce the output below.

Explanation

arithmetic mean
I'd prefer this term instead of average as it explains exactly which average is used here. But since it is only a string it can easily be replaced.

standard deviation
I replaced it with "min / max" following CP's request. I agree that the latter may be more meaningful. In addition sd is used to calculate "uncertainty", hence it would be redundant.

overdue
This term appears only when time left becomes <0, typically when the last event is a longrunner.

Simulated screenshot:

    ***************************************************
    *         ATLAS Event Progress Monitoring         *
    *                     v3.0.0                      *
    *     last display update (VM time): 20:59:12     *
    ***************************************************
    Number of events
       to be processed    : 200
       already finished   : 125
    Event runtimes
       arithmetic mean    : 1035 s
       min / max          : 678 / 1482 s
    Estimated time left
       total              : 10 h 46 min
       uncertainty        : 10 min
    ---------------------------------------------------
    Status of last event per worker thread:
    worker 1: Event nr. 62 took 1073 s
    worker 2: Event nr. 63 took 1207 s
|
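The "Event runtimes" values and the "h / min" time format in the post above can be produced along the following lines. This is only a sketch: the helper names `summarize_runtimes` and `format_duration` are hypothetical, the sample runtime list is made up (only its minimum and maximum match the screenshot), and the actual monitoring script may round differently.

```python
from statistics import mean


def summarize_runtimes(runtimes_s):
    """Return arithmetic mean, minimum and maximum as whole seconds."""
    return {
        "arithmetic mean": round(mean(runtimes_s)),
        "min": min(runtimes_s),
        "max": max(runtimes_s),
    }


def format_duration(seconds):
    """Format a duration as 'H h M min', or 'M min' below one hour."""
    total_min = round(seconds / 60)
    hours, minutes = divmod(total_min, 60)
    return f"{hours} h {minutes} min" if hours else f"{minutes} min"


# Made-up sample runtimes in seconds.
print(summarize_runtimes([678, 945, 1482]))
print(format_duration(38760))
print(format_duration(600))
```

Rounding to whole minutes follows the earlier remark that more precision makes no sense for these estimates.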
Joined: 13 Feb 15 Posts: 1188 Credit: 875,820 RAC: 738 |
> Over the weekend I modified the monitoring script so it will produce the output below.

Can't wait to see this version going live ;) |
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
This version is now live! Some minor comments on the English:
- "to be processed: 200" sounds like this is the number of remaining events, it should be made clearer that this is the total so I would just say "total" here
- "overdue" might make people worry that there is something wrong with the task, I'm not sure it's a good idea to have this, or maybe there can be a clearer message what it means
- "Event status per worker thread" -> "Processing status per worker thread"

EDIT: Ignore the last comment since this phrase was already fixed in the newest version. |
Joined: 28 Jul 16 Posts: 485 Credit: 394,839 RAC: 0 |
> "to be processed: 200" sounds like this is the number of remaining events, it should be made clearer that this is the total so I would just say "total" here

OK. Will postpone the change until it's decided what to do with the next one.

> "overdue" might make people worry that there is something wrong with the task, I'm not sure it's a good idea to have this, or maybe there can be a clearer message what it means

What about "overtime" or "extra time"? Most people may know that from sports.

Just to mention it, nothing to worry about: this text field has a fixed width. If we extend it, the whole layout will be a bit wider. |
Joined: 13 Feb 15 Posts: 1188 Credit: 875,820 RAC: 738 |
> What about "overtime" or "extra time"?

"overtime" sounds good to me. It's also known as 'overwork', which is the case when the job needs more time. |
Joined: 28 Jul 16 Posts: 485 Credit: 394,839 RAC: 0 |
Overtime Message

Forgot to mention: the same term is shown while the HITS file is in preparation. Would you prefer to
- leave it blank and just count the remaining time upwards (confusing?)
- display the same term
- display a different term |
Joined: 13 Feb 15 Posts: 1188 Credit: 875,820 RAC: 738 |
> Overtime Message

When the HITS-file is in preparation, just leave it (overdue) blank and display 0 (zero) for time left and uncertainty. My opinion ;) |
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
> Overtime Message
> When the HITS-file is in preparation, just leave it (overdue) blank and display 0 (zero) for time left and uncertainty. My opinion ;)

That sounds good. I would like to avoid, if possible, alarming people with scary-looking messages when the task is proceeding normally. |
Joined: 28 Jul 16 Posts: 485 Credit: 394,839 RAC: 0 |
Just created a pull request on GitHub to update to v3.1.0.

Changes

Output string
- "Number of events to be processed" to "Number of events total"
- "overdue" to "overtime"

While HITS file generation is in progress
- "overtime" will not appear any more
- "Time left total" will remain 0 (zero) |
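The display rules described in the v3.1.0 change list above can be sketched as a small decision function. This is an illustration only, not the script's actual code: the function name `time_left_line`, the `hits_in_progress` flag and the exact output strings are assumptions.

```python
def time_left_line(time_left_s, hits_in_progress):
    """Build the 'total' line of the 'Estimated time left' block.

    Hypothetical helper: the real script's variable names and string
    layout are not shown in this thread.
    """
    if hits_in_progress:
        # While the HITS file is generated: no "overtime", time left stays 0.
        return "total : 0 min"
    if time_left_s < 0:
        # The estimate has been exceeded, e.g. the last event is a longrunner.
        return f"total : overtime {round(-time_left_s / 60)} min"
    return f"total : {round(time_left_s / 60)} min"


print(time_left_line(600, hits_in_progress=False))
print(time_left_line(-120, hits_in_progress=False))
print(time_left_line(-120, hits_in_progress=True))
```

Checking the HITS state first means a task that is merely finishing up never shows "overtime", which was the concern raised earlier about alarming messages.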
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
3.1.0 is now live. |
Joined: 13 Feb 15 Posts: 1188 Credit: 875,820 RAC: 738 |
> 3.1.0 is now live.

That's looking good: |
Joined: 28 Jul 16 Posts: 485 Credit: 394,839 RAC: 0 |
Last minute change request.

v3.1.0 uses the standard deviation to calculate uncertainty. This covers 68.27% of the runtime values (±1 sigma, i.e. a band 2 sigma wide), assuming they follow a normal distribution. To cover 99.73% (±3 sigma, i.e. a band 6 sigma wide) the standard deviation must be multiplied by 3.

Did a few local simulations with 6 sigma and it looks like the resulting values make more "sense" when compared to other values on the display, e.g. arithmetic mean or min/max.

Sent a GitHub pull request to David that includes the change to v3.2.0.

Besides that, the recent version runs fine and I suggest deploying it on the production server. |
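The coverage percentages quoted in the post above can be checked with the standard error function: for a normal distribution, the fraction of values within k standard deviations on either side of the mean is erf(k / sqrt(2)). The short sketch below uses only Python's standard library; the function name `normal_coverage` is chosen here for illustration.

```python
import math


def normal_coverage(k):
    """Fraction of a normal distribution within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/- {k} sigma: {normal_coverage(k) * 100:.2f} %")
```

Here k = 1 corresponds to the 68.27% figure and k = 3 to the 99.73% figure, so multiplying the standard deviation by 3 is what widens the band from 2 sigma to 6 sigma in total.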
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0 |
3.2.0 is now live here. Let's wait for a few test WU here (I just submitted a few more) then I will deploy in production. |
Joined: 13 Feb 15 Posts: 1188 Credit: 875,820 RAC: 738 |
v3.2.0 running here with an older task, that was 'In progress' for the server and on my client 'Ready to start' waiting for the new Monitoring version. I'll make screen captures every minute to hopefully find nothing weird. |
Joined: 28 Jul 16 Posts: 485 Credit: 394,839 RAC: 0 |
Finished a few 3-core and a few 1-core tasks with monitoring v3.2.0. All of them ran like a Swiss clockwork. ;-) |
Joined: 13 Feb 15 Posts: 1188 Credit: 875,820 RAC: 738 |
> Finished a few 3-core and a few 1-core tasks with monitoring v3.2.0.

You have to look quickly, but when the HITS line is deleted (almost) at the end of the task, just before the VM is stopped and RDP closed, 'uncertainty' shows -1; must be a leap second and not a real issue ;) |
Joined: 28 Jul 16 Posts: 485 Credit: 394,839 RAC: 0 |
> Finished a few 3-core and a few 1-core tasks with monitoring v3.2.0.

Saw this only once during all the tests. It happens while the VM is about to shut down all processes and the monitoring can't get valid data any more. I suggest ignoring it since a few seconds later the VM shuts down anyway. |
©2025 CERN