Message boards : ATLAS Application : vbox console monitoring with 3.01
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0
The event processing monitoring in console 2 was not working with the new tasks we were trying here. This was due to a slightly different logging format in the new ATLAS software version. I have changed the monitoring tool to work with both the old and the new format and submitted some new tasks (shorter, with 50 events each). It would be great if someone could confirm whether the console works now (it's hard for me to test it myself).
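For anyone wondering what a parser covering both formats could look like, here is a minimal Python sketch. The two regular expressions are placeholders for the old and new "event took N s" lines; the real patterns in log.EVNTtoHITS may well differ.

```python
import re

# Placeholder patterns for the two logging styles; the real lines in
# log.EVNTtoHITS (ATLAS v2 vs v3) may differ in detail.
OLD_PATTERN = re.compile(r"Event processing took\s+(\d+(?:\.\d+)?)\s*s")
NEW_PATTERN = re.compile(r"event\s+\d+\s+took\s+(\d+(?:\.\d+)?)\s*s")

def extract_event_times(lines):
    """Return per-event processing times (seconds), accepting either format."""
    times = []
    for line in lines:
        match = OLD_PATTERN.search(line) or NEW_PATTERN.search(line)
        if match:
            times.append(float(match.group(1)))
    return times

if __name__ == "__main__":
    with open("log.EVNTtoHITS", errors="replace") as fh:
        times = extract_event_times(fh)
    if times:
        print(f"events: {len(times)}  mean: {sum(times)/len(times):.1f} s  "
              f"min: {min(times):.1f} s  max: {max(times):.1f} s")
    else:
        print("no event timing lines recognised (N/A)")
```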
Joined: 31 Aug 21 Posts: 13 Credit: 1,118,469 RAC: 0
I would do that, but I'm not sure where or how. Which keys do I press, or where do I point'n'click?
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1
As already mentioned here https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=614&postid=7964#7964 it's not that easy.

The monitoring scripts assume a singlecore task although I'm running a 4-core VM. Besides that, the log entries written to log.EVNTtoHITS look a bit "messed", meaning they don't follow a unique pattern (recently seen in a native task). This needs to be sorted, but I'd prefer to have stable event processing first.

BTW: The dev task I'm currently running still requests Frontier data from atlasfrontier-ai.cern.ch instead of atlascern-frontier.openhtc.io.
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1
1. Open your VirtualBox Manager.
2. Click on the VM you want to look into.
3. Click the "Show" button (and wait).
4. Once the console window is open, use ALT + Fn to switch between the consoles:
F2: Progress Monitoring
F3: top

To close the console window use "Detach GUI" from the "Machine" menu. Never use other methods, since those would tell BOINC to suspend/resume the VM, which puts a high load on the computer.
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1
Other findings:

The differencing image grows to at least 1.7 GB. 825 MB of that are caused by swap usage, since the VM has only 2241 MB RAM.

Not sure if the VM's CVMFS cache has been cleaned and refilled with ATLAS v3 data. This is required to
- reduce the initial VDI size
- keep the differencing image small
- result in a faster startup phase
Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 3
The monitoring is not working very well. I'm running a dual-core VM with 4800 MB RAM.

Only 1 worker is displayed (I also see only 1 python process, at almost 200% CPU).
The total number of events is displayed (50).
Already finished, mean, min, max and estimated time left are not displayed, and every 60 seconds several lines of text are flashing (not readable) and then cleared.
The Worker 1 event display showed the 2nd, 4th and then the 3rd event (??); "for this worker took ### s" (### is changing now and then, e.g. 421).
My differencing file is 900 MB.
Edit: (picture) No swap is used, but the initial setup etc. lasted over 30 minutes.

Flashing text:
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1
Another 4-core task is in progress. I manually set the VM's RAM size to 3900 MB (the default used for older 1-core VMs). Top now shows 776 kB of swap being used. The differencing image uses around 880 MB and grows slowly while the events are being processed. The main python process uses 2.2 GB RAM and close to 400% CPU, which corresponds to the 4-core setup.
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1
> Total number of events is displayed (50)

Same here.
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0
> Not sure if the VM's CVMFS cache has been cleaned and refilled with ATLAS v3 data.

I ran a v3 task inside the existing prod image (since we'll have to run both in parallel for some time) and used that image here, so the software should be cached. This is why the image is much larger than before (4.4 GB vs 2.8 GB).

> 825 MB are caused by swap usage since the VM has only 2241 MB RAM.

This is not intentional; something is not configured correctly after removing the per-core memory scaling. I am checking it.

> The monitoring scripts assume a singlecore task although I'm running a 4-core VM.

I thought that the new tasks would all behave the same as old single-core tasks, writing to a single file. I changed the code searching for the event times to handle the old and new message formats, but maybe something else needs to be changed. I will look deeper.

> The dev task I'm currently running still requests Frontier data from atlasfrontier-ai.cern.ch instead of atlascern-frontier.openhtc.io.

This was changed last week, but I forgot to restart one service to pick up the changes. New tasks should have the correct Frontier.
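For context on the per-core scaling, here is a toy sketch of a linear rule of the form base + per_core × cores. The coefficients are assumptions, chosen only because they reproduce the 3900 MB single-core default mentioned earlier in this thread; the real setup may use different values.

```python
def guest_ram_mb(cores, base_mb=3000, per_core_mb=900):
    """Hypothetical linear per-core RAM scaling. The coefficients are
    assumptions, picked only because they reproduce the 3900 MB default
    for a 1-core VM mentioned earlier in this thread."""
    return base_mb + per_core_mb * cores

# With the scaling removed, a 4-core VM apparently ends up with a fixed
# 2241 MB, well below what even the 1-core rule would assign:
print(guest_ram_mb(1))  # 3900
print(guest_ram_mb(4))  # 6600
```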
Joined: 13 Feb 15 Posts: 1188 Credit: 861,475 RAC: 3
The readable text is showing this:
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1
> I thought that the new tasks would all behave the same as old single-core tasks, writing to a single file. I changed the code searching for the event times to handle the old and new message formats, but maybe something else needs to be changed. I will look deeper.

@David
It is not only the search pattern. An additional point is that the old monitoring expects the result lines in order. This was guaranteed in singlecore mode within the main log, as well as in multicore mode within each of the worker logs. Now the log entries are no longer in order, and this fact has to be respected by the monitoring.

Another point that needs to be checked: the timing averages reported by the workers refer only to the worker thread they come from. Hence, they are not valid for calculating the total average.
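To illustrate the second point, here is a small sketch that pools the per-event durations across all workers and derives the overall statistics from the pooled list, instead of averaging the per-worker averages. The record layout (worker id, event number, seconds) is an assumption about what can be extracted from the interleaved log.

```python
from collections import defaultdict

def pool_event_times(records):
    """records: iterable of (worker_id, event_number, seconds) tuples taken
    from the interleaved log, in whatever order they appear.
    The overall mean/min/max are computed from the pooled per-event times,
    not by averaging the per-worker running averages."""
    per_worker = defaultdict(list)
    for worker_id, _event_no, seconds in records:
        per_worker[worker_id].append(seconds)
    pooled = [t for times in per_worker.values() for t in times]
    return {
        "workers": len(per_worker),
        "events": len(pooled),
        "mean": sum(pooled) / len(pooled) if pooled else 0.0,
        "min": min(pooled, default=0.0),
        "max": max(pooled, default=0.0),
    }

# Out-of-order entries from two workers are handled the same way:
print(pool_event_times([(1, 2, 421.0), (2, 1, 338.0), (1, 4, 390.5), (1, 3, 402.0)]))
```

Computing the statistics from the pooled list sidesteps both issues: the order of the log lines no longer matters, and no per-worker running average enters the overall figures.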
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1
CP's screenshots show what I described as "messed" logfile lines. See: "worker 1..."

As a result the monitoring script can't extract the runtime (here: 338 s). This leads to the missing values now reported as "N/A". A side effect is the "flashing text" CP already reported.
Joined: 28 Jul 16 Posts: 484 Credit: 394,839 RAC: 1
> since we'll have to run both in parallel for some time

Did the old version change its logging behaviour? If not, we would need to keep the old monitoring branch and call it when an old ATLAS version is processed.
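A minimal sketch of such a dispatch, assuming the format marker, the file name and the two parser stubs below are all placeholders rather than the project's actual code:

```python
import re

# Everything below is a placeholder: the marker pattern, the file name and
# the two parser stubs only stand in for the real monitoring branches.
NEW_FORMAT_MARKER = re.compile(r"AthenaMP")  # assumed hint for the new logging style

def uses_new_format(log_path):
    """Guess which logging style a task uses by scanning its main log."""
    try:
        with open(log_path, errors="replace") as fh:
            return any(NEW_FORMAT_MARKER.search(line) for line in fh)
    except OSError:
        return False

def parse_old_format(log_path):
    print("old monitoring branch for", log_path)

def parse_new_format(log_path):
    print("new multicore-aware branch for", log_path)

def monitor(log_path="log.EVNTtoHITS"):
    # Keep both branches and dispatch per task, as suggested above.
    (parse_new_format if uses_new_format(log_path) else parse_old_format)(log_path)

if __name__ == "__main__":
    monitor()
```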
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 3
Seeing the same as Crystal in console F2. Now 50 collisions instead of 500.