Thread 'vbox console monitoring with 3.01'

Author	Message
David Cameron Project administrator Project developer Project tester Project scientist Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0	Message 7986 - Posted: 20 Mar 2023, 13:07:14 UTC
The event processing monitoring in console 2 was not working with the new tasks we were trying here. This was due to a slightly different logging format in the new ATLAS software version. I have changed the monitoring tool to work with both old and new formats, and submitted some new tasks (shorter, with 50 events each). It would be great if someone could confirm whether the console works now (it's hard for me to test it myself).

Richie_unstable Joined: 31 Aug 21 Posts: 14 Credit: 1,118,739 RAC: 0	Message 7988 - Posted: 20 Mar 2023, 13:39:33 UTC
I would do that, but I'm not sure where or how. Which keys do I press, or where do I point and click?

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Joined: 28 Jul 16 Posts: 544 Credit: 400,710 RAC: 0	Message 7989 - Posted: 20 Mar 2023, 13:56:16 UTC - in response to Message 7986.
As already mentioned here
https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=614&postid=7964#7964
it's not that easy.
The monitoring scripts assume a single-core task although I'm running a 4-core VM.
Besides that, the log entries written to log.EVNTtoHITS look a bit "messed", meaning they don't follow a unique pattern (recently seen in a native task).
This needs to be sorted out, but I'd prefer to have stable event processing first.
BTW, the dev task I'm currently running still requests Frontier data from atlasfrontier-ai.cern.ch instead of atlascern-frontier.openhtc.io.

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Joined: 28 Jul 16 Posts: 544 Credit: 400,710 RAC: 0	Message 7990 - Posted: 20 Mar 2023, 14:11:44 UTC - in response to Message 7988.
1. Open your VirtualBox Manager.
2. Click on the VM you want to look into.
3. Click the "Show" button (and wait).
4. Once the console window is open, use ALT + Fn to switch between the consoles:
   F2: Progress Monitoring
   F3: top
To close the console window, use "Detach GUI" from the "Machine" menu.
Never use other methods, since those would tell BOINC to suspend/resume the VM, which puts a high load on the computer.
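
For those who prefer a command line over the GUI steps above, here is a minimal sketch that only builds `VBoxManage controlvm` commands: one sends the Alt+Fn key combination to the guest, the other saves a screenshot of the current console. It assumes `VBoxManage` is on the PATH and that the BOINC-created VM is visible to the user running it; the VM name and output path are placeholders, and this is not something the project itself provides.

```python
import subprocess

# PC/AT set-1 scancodes: a key release ("break") is the press code with bit 0x80 set.
ALT_MAKE = 0x38
F1_MAKE = 0x3B  # F1..F10 are consecutive: 0x3B..0x44


def alt_fn_scancodes(n):
    """Return the hex scancodes for pressing and releasing Alt+F<n> (n = 1..10)."""
    if not 1 <= n <= 10:
        raise ValueError("only F1..F10 are handled in this sketch")
    key = F1_MAKE + (n - 1)
    sequence = (ALT_MAKE, key, key | 0x80, ALT_MAKE | 0x80)
    return [f"{code:02x}" for code in sequence]


def switch_console_cmd(vm, n):
    """Build the command that sends Alt+F<n> to the guest keyboard."""
    return ["VBoxManage", "controlvm", vm, "keyboardputscancode", *alt_fn_scancodes(n)]


def screenshot_cmd(vm, png_path):
    """Build the command that saves the guest's current screen as a PNG."""
    return ["VBoxManage", "controlvm", vm, "screenshotpng", png_path]


def run(cmd):
    """Execute a prepared command; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

A call such as `run(switch_console_cmd("<vm name or UUID>", 2))` followed by `run(screenshot_cmd("<vm name or UUID>", "console2.png"))` would capture the progress monitoring console without opening or closing a window. `VBoxManage list runningvms` shows the names and UUIDs to use.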

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Joined: 28 Jul 16 Posts: 544 Credit: 400,710 RAC: 0	Message 7991 - Posted: 20 Mar 2023, 14:34:34 UTC
Other findings
The differencing image grows to at least 1.7 GB.
825 MB are caused by swap usage since the VM has only 2241 MB RAM.
Not sure if the VM's CVMFS cache has been cleaned and refilled with ATLAS v3 data.
This is required to
- reduce the initial VDI size
- keep the differencing image small
- get a faster startup phase

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1281 Credit: 1,048,477 RAC: 49	Message 7992 - Posted: 20 Mar 2023, 15:26:16 UTC Last modified: 20 Mar 2023, 15:55:11 UTC
The monitoring is not working very well.
I'm running a dual-core VM with 4800 MB RAM.
Only 1 worker is displayed (I also see only 1 python process, at almost 200% CPU).
The total number of events is displayed (50).
Already finished, mean, min, max and estimated time left are not displayed, and every 60 seconds several lines of text flash (not readable) and are cleared.
The "Worker 1 Event" line showed the 2nd, 4th and then the 3rd event ??
"for this worker took ### s" (### changes now and then, e.g. 421)
My differencing file is 900 MB.
Edit: [picture] and no swap used, but initial setup etc. lasted over 30 minutes.
Flashing text: [screenshot]

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Joined: 28 Jul 16 Posts: 544 Credit: 400,710 RAC: 0	Message 7993 - Posted: 20 Mar 2023, 15:36:32 UTC
Another 4-core task is in progress.
I manually set the VM's RAM size to 3900 MB (the default used for older 1-core VMs).
top now shows 776 kB swap being used.
The differencing image uses around 880 MB and grows slowly while the events are being processed.
The main python process uses 2.2 GB RAM and close to 400% CPU, which corresponds to the 4-core setup.
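
The image sizes reported so far can be cross-checked with a little arithmetic. The sketch below only uses numbers from Messages 7991, 7992 and 7993; reading "at least 1.7 GB" as 1700 MB is an assumption, so the result is a rough lower bound and not a measurement.

```python
# Numbers taken from the posts above; 1.7 GB is read as 1700 MB (assumption).
image_with_swap_mb = 1700   # Message 7991: 2241 MB RAM, heavy swapping
swap_share_mb = 825         # Message 7991: part of the image caused by swap

# What remains once the swap share is subtracted.
image_without_swap_mb = image_with_swap_mb - swap_share_mb

# Sizes reported for VMs that (almost) did not swap.
reported_no_swap_mb = {"Message 7992 (4800 MB RAM)": 900,
                       "Message 7993 (3900 MB RAM)": 880}

for source, size_mb in reported_no_swap_mb.items():
    print(f"{source}: {size_mb} MB, differs by {size_mb - image_without_swap_mb} MB")
```

Under that reading, the swap-free remainder of the 1.7 GB image is about 875 MB, which is in line with the roughly 880 to 900 MB seen on the VMs with more RAM.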

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Joined: 28 Jul 16 Posts: 544 Credit: 400,710 RAC: 0	Message 7994 - Posted: 20 Mar 2023, 15:40:54 UTC - in response to Message 7992.
"Total number of events is displayed (50). Already finished, mean, min, max, estimated time left is not displayed and every 60 seconds several lines of text are flashing (not readable) and cleared. Worker 1 Event showed 2nd, 4th and then 3rd event ?? for this worker took ### s (### is changing now and then e.g. 421) Edit: [picture] and no swap used Flashing text:..."
Same here.

David Cameron Project administrator Project developer Project tester Project scientist Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0	Message 7995 - Posted: 20 Mar 2023, 15:56:01 UTC
"Not sure if the VM's CVMFS cache has been cleaned and refilled with ATLAS v3 data."
I ran a v3 task inside the existing prod image (since we'll have to run both in parallel for some time) and used that image here, so the software should be cached. This is why the image is much larger than before (4.4 GB vs 2.8 GB).
"825 MB are caused by swap usage since the VM has only 2241 MB RAM."
This is not intentional; something is not configured correctly after removing the per-core memory scaling. I am checking it.
"The monitoring scripts assume a single-core task although I'm running a 4-core VM."
I thought that the new tasks would all behave the same as old single-core tasks, writing to a single file. I changed the code searching for the event times to handle the old and new message formats, but maybe something else needs to be changed. I will look deeper.
"The dev task I'm currently running still requests Frontier data from atlasfrontier-ai.cern.ch instead of atlascern-frontier.openhtc.io."
This was changed last week, but I forgot to restart one service to pick up the changes. New tasks should have the correct Frontier server.

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1281 Credit: 1,048,477 RAC: 49	Message 7996 - Posted: 20 Mar 2023, 16:03:18 UTC - in response to Message 7995.
The readable text is showing this: [screenshot]

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Joined: 28 Jul 16 Posts: 544 Credit: 400,710 RAC: 0	Message 7997 - Posted: 20 Mar 2023, 16:08:06 UTC - in response to Message 7995.
"I thought that the new tasks would all behave the same as old single-core tasks, writing to a single file. I changed the code searching for the event times to handle the old and new message format, but maybe something else needs changed. I will look deeper."
@David
It is not only the search pattern.
An additional point is that the old monitoring expects the result lines in order.
This was guaranteed in single-core mode within the main log, as well as in multi-core mode within each of the worker logs.
Now the log entries are no longer in order, and the monitoring has to take this into account.
Another point that needs to be checked:
The timing averages reported by the workers refer to the worker thread they are coming from.
Hence, they are not valid for calculating the total average.
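
The two points above (entries arriving out of order, and per-worker averages not being usable for the total) can be sketched as follows. The line format `worker N: event M took S s` and every name in the code are hypothetical, since the real log.EVNTtoHITS layout is not shown in this thread; the sketch only illustrates an order-independent approach, not the actual monitoring script.

```python
import re
from statistics import mean

# Hypothetical line format; the real log layout may differ.
EVENT_RE = re.compile(
    r"worker\s+(?P<worker>\d+).*?event\s+(?P<event>\d+).*?took\s+(?P<secs>\d+(?:\.\d+)?)\s*s",
    re.IGNORECASE,
)


def collect_event_times(lines):
    """Map event number -> runtime in seconds, regardless of line order."""
    times = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # skip lines that do not carry a complete result
        times[int(match["event"])] = float(match["secs"])
    return times


def summarize(times, total_events, n_workers):
    """Compute totals from individual event times, not from per-worker averages."""
    if not times:
        return None  # nothing parsed yet; the display would show "N/A"
    values = list(times.values())
    average = mean(values)
    remaining = total_events - len(values)
    return {
        "finished": len(values),
        "mean": average,
        "min": min(values),
        "max": max(values),
        # Rough estimate: remaining events shared evenly between workers.
        "time_left": average * remaining / n_workers,
    }


sample = [
    "worker 1: event 2 took 421 s",
    "worker 1: event 4 took 338 s",
    "worker 2: event 3 took 400 s",
    "worker 1: event took",  # incomplete line, ignored
]
summary = summarize(collect_event_times(sample), total_events=50, n_workers=2)
```

Keying the results by event number makes the order of the lines irrelevant, and taking the mean over all individual event times avoids averaging per-worker averages, which would give each worker the same weight no matter how many events it has finished.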

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Joined: 28 Jul 16 Posts: 544 Credit: 400,710 RAC: 0	Message 7998 - Posted: 20 Mar 2023, 16:15:37 UTC
CP's screenshots show what I described as "messed" logfile lines.
See: "worker 1..."
As a result, the monitoring script can't extract the runtime (here: 338 s).
This leads to the missing values now reported as "N/A".
A side effect is the "flashing text" CP already reported.

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Joined: 28 Jul 16 Posts: 544 Credit: 400,710 RAC: 0	Message 7999 - Posted: 20 Mar 2023, 16:25:00 UTC - in response to Message 7995.
"since we'll have to run both in parallel for some time"
Did the old version change its logging behaviour?
If not, we would need to keep the old monitoring branch and call it whenever an old ATLAS version is processed.
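
Keeping both monitoring branches and choosing one per task could look roughly like the sketch below. Everything in it is hypothetical: the two monitor functions are placeholders, and both the way the version string is obtained and the threshold used to tell "old" from "new" are assumptions that would have to match the real ATLAS release numbering.

```python
def monitor_old_format(lines):
    """Placeholder for the existing branch that expects in-order result lines."""
    return "old"


def monitor_new_format(lines):
    """Placeholder for a branch that tolerates out-of-order result lines."""
    return "new"


# Hypothetical threshold: major versions from this number on use the new format.
NEW_FORMAT_FROM_MAJOR = 3


def select_monitor(version):
    """Pick the monitoring branch from a dotted version string such as '3.01'."""
    major = int(version.split(".")[0])
    if major >= NEW_FORMAT_FROM_MAJOR:
        return monitor_new_format
    return monitor_old_format
```

With this split the old branch stays untouched while both versions run in parallel, and it can be dropped once old tasks are no longer sent out.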

maeax Joined: 22 Apr 16 Posts: 806 Credit: 4,294,466 RAC: 1,957	Message 8007 - Posted: 21 Mar 2023, 10:05:11 UTC - in response to Message 7996. Last modified: 21 Mar 2023, 10:06:55 UTC
Crystal, I'm seeing the same in console F2.
Now 50 instead of 500 collisions.

Development for LHC@home