Message boards :
Theory Application :
Theory native - getting runRivet.log early
Message board moderation
Author | Message |
---|---|
Send message Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0 |
Found a method to copy the 1st line of runRivet.log to stderr.txt immediately after the log line is written. Code is based on cranky-0.0.28. Insert this function somewhere before "function run": function read_runrivet_log { local logline1="$(grep -m1 '.*' <(tail --pid $1 -F cernvm/shared/runRivet.log 2>/dev/null))" echo $(date -u +"%R:%S %Y-%m-%d:") "runRivet.log: [INFO] ${logline1}" >&2 } Then insert the function call in "function run": trap pause SIGTSTP trap resume SIGCONT read_runrivet_log $$ & ${runc} --root state run -b cernvm ${result_id} & Getting the same line again in "function prepare_output" is now obsolete and can be disabled: #head -n 1 cernvm/shared/runRivet.log >&2 The patch is currently running on only 1 of my hosts. The result can be seen here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=220687133 |
Send message Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0 |
Timestamps written to stderr.txt use local time (wrapper) as well as UTC (cranky). I suggest to let cranky also report in local time to avoid confusion. It would also be helpful if debugging is necessary as the timestamps from the logs then correspond to the timestamps of local files. Nonetheless I would leave the logline copied from runRivet.log untouched as this time is explicitly marked to be UTC. To implement the changes 2 lines would have to be changed. in "function send_to_log" # echo $(date -u +"%R:%S %Y-%m-%d:") "$(basename $0): [$1] $2" >&2 echo $(date +"%R:%S %Y-%m-%d:") "$(basename $0): [$1] $2" >&2 in "function read_runrivet_log" # echo $(date -u +"%R:%S %Y-%m-%d:") "runRivet.log: [INFO] ${logline1}" >&2 echo $(date +"%R:%S %Y-%m-%d:") "runRivet.log: [INFO] ${logline1}" >&2 |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 30 |
Found a method to copy the 1st line of runRivet.log to stderr.txt immediately after the log line is written. That is what I proposed when Laurence introduced it, but Laurence probably wanted to do it in an existing function, but surely it's the best way to get this information as soon as possible in the stdout. This way we can see which jobs are causing an error. I am not a programmer, but fiddled around a bit with your code (open source, I think) to extract only the job-name from that line and came to this code for your function read_runrivet_log: function read_runrivet_log { local logline1="$(grep -m1 -o '\[boinc.*' <(tail --pid $1 -F cernvm/shared/runRivet.log 2>/dev/null))" echo $(date -u +"%R:%S %Y-%m-%d:") "Job started : [INFO] ${logline1}" >&2 } The output looks like this: 17:16:48 2019-04-07: cranky-0.0.28: [INFO] Running Container 'runc'. 17:16:50 2019-04-07: Job started : [INFO] [boinc pp mb-nsd 2360 - - pythia8 8.230 tune-AU2lox 100000 40] |
Send message Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0 |
As soon as the logline is available in cranky you can do with it whatever you want. I left it untouched to show it's written by another process and to see if that process writes garbage but it's also no problem to work with arbitrary parts of it. That's more or less a philosophical question (however a very small and unimportant one). I know that the following has already been asked, but out of the discussion about sherpa: Do we need other information in stderr.txt, e.g. logfile size or last x loglines? Both could easily be included. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 30 |
I left it untouched to show it's written by another process and to see if that process writes garbage but it's also no problem to work with arbitrary parts of it.Yes you are right, so I returned to your version. However I noticed that before [INFO] normally the process name is used, so I suggest to change in the echo line runRivet.log by runRivet.sh with the needed spaces to outline the semi-colons. 07:57:08 2019-04-08: cranky-0.0.28: [INFO] Running Container 'runc'. 07:57:11 2019-04-08: runRivet.sh : [INFO] ===> [runRivet] Mon Apr 8 07:57:09 UTC 2019 [boinc pp jets 8000 180,-,3560 - herwig7 7.0.4 UE-MMHT 100000 40] I know that the following has already been asked, but out of the discussion about sherpa: Maybe only when an error occurs. |
Send message Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0 |
I left it untouched to show it's written by another process and to see if that process writes garbage but it's also no problem to work with arbitrary parts of it.Yes you are right, so I returned to your version. I checked the code of runRivet.sh to be sure the logline is written by that script. It is, hence "runRivet.sh" may be used in the stderr.txt output as you suggested. What about the timestamps? I would prefer local time as written here: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=467&postid=6282 Other comments are, of course, also welcome. @Laurence I'd like to know if the suggestions violate any CERN coding policies. I know that the following has already been asked, but out of the discussion about sherpa: To make it work also for cancelled tasks it would be necessary to implement a signal handler that traps the TERM signal. It would also mean to make some changes in "function fail" to write the log before the script exits. Not such a big deal, but I'd like to get Laurence's comment first. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 30 |
I checked the code of runRivet.sh to be sure the logline is written by that script.I checked that before I made my suggestion with e.g. pid: sudo ls -l /proc/4492/fd What about the timestamps?It would be good not to mix up UTC and local time in the stderr output. I would also prefer local time. In your extracted logline UTC is added to the timegroup, so no confusion possible. |
Send message Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
I would suggest using tail --pid $$ -n 1000 -F /var/lib/boinc-client/slots/0/cernvm/shared/runRivet.log | head -n1 As far as I understand $1 will not give the pid and and $$ is required. I have added -n 1000 to catch the situation where -F does not refresh quick enough and more than 10 lines are printed. |
Send message Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0 |
I have added -n 1000 to catch the situation where -F does not refresh quick enough ... Agree. I'm not sure if "| head" reliably terminated the pipe during my tests. Hence I used grep. The trick is as follows: Insert a subfunction call (here: "read_runrivet_log") just before runc is executed. A subfunction because it can run in the background. Run that subfunction in the background "&". Call the subfunction with "$$" as a parameter. The latter becomes "$1" within the subfunction. Within the subfunction a pipe is set up. Left side of the pipe: grep -m1 '.*' outputs 1 line and then terminates (itself and the whole pipe). Right side of the pipe: tail --pid $1 -F cernvm/shared/runRivet.log 2>/dev/null This waits for the appearance of runRivet.log even if it is not yet created "-F" (uppercase!). Tail terminates if "$1" (the main process) ends (important not to leave a zombie pipe). |
Send message Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
I have added -n 1000 to catch the situation where -F does not refresh quick enough ... I missed this key point :)
ok, I will implement this on Monday as we should never make changes on a Friday afternoon. |
Send message Joined: 28 Jul 16 Posts: 482 Credit: 394,720 RAC: 0 |
@Laurence Do you agree to use local time instead of UTC as discussed below? https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=467&postid=6287 |
Send message Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
@Laurence Yes |
©2024 CERN