Message boards : Theory Application : Theory native - getting runRivet.log early
Message board moderation

To post messages, you must log in.

AuthorMessage
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 6280 - Posted: 6 Apr 2019, 22:24:28 UTC

Found a method to copy the 1st line of runRivet.log to stderr.txt immediately after the log line is written.

Code is based on cranky-0.0.28.

Insert this function somewhere before "function run":
function read_runrivet_log {
    local logline1="$(grep -m1 '.*' <(tail --pid $1 -F cernvm/shared/runRivet.log 2>/dev/null))"
    echo $(date -u +"%R:%S %Y-%m-%d:") "runRivet.log:  [INFO] ${logline1}" >&2
}



Then insert the function call in "function run":
    trap pause SIGTSTP
    trap resume SIGCONT
    read_runrivet_log $$ &
    ${runc} --root state run -b cernvm ${result_id} &



Getting the same line again in "function prepare_output" is now obsolete and can be disabled:
    #head -n 1  cernvm/shared/runRivet.log >&2


The patch is currently running on only 1 of my hosts.
The result can be seen here:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=220687133
ID: 6280 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 6282 - Posted: 7 Apr 2019, 8:34:35 UTC

Timestamps written to stderr.txt use local time (wrapper) as well as UTC (cranky).
I suggest to let cranky also report in local time to avoid confusion.
It would also be helpful if debugging is necessary as the timestamps from the logs then correspond to the timestamps of local files.

Nonetheless I would leave the logline copied from runRivet.log untouched as this time is explicitly marked to be UTC.


To implement the changes 2 lines would have to be changed.
in "function send_to_log"
#   echo $(date -u +"%R:%S %Y-%m-%d:") "$(basename $0): [$1] $2" >&2
    echo $(date +"%R:%S %Y-%m-%d:") "$(basename $0): [$1] $2" >&2


in "function read_runrivet_log"
#   echo $(date -u +"%R:%S %Y-%m-%d:") "runRivet.log:  [INFO] ${logline1}" >&2
    echo $(date +"%R:%S %Y-%m-%d:") "runRivet.log:  [INFO] ${logline1}" >&2
ID: 6282 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Crystal Pellet
Volunteer tester

Send message
Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 30
Message 6283 - Posted: 7 Apr 2019, 17:34:39 UTC - in response to Message 6280.  

Found a method to copy the 1st line of runRivet.log to stderr.txt immediately after the log line is written.

That is what I proposed when Laurence introduced it, but Laurence probably wanted to do it in an existing function,
but surely it's the best way to get this information as soon as possible in the stdout. This way we can see which jobs are causing an error.

I am not a programmer, but fiddled around a bit with your code (open source, I think) to extract only the job-name from that line
and came to this code for your function read_runrivet_log:
function read_runrivet_log {
    local logline1="$(grep -m1 -o '\[boinc.*' <(tail --pid $1 -F cernvm/shared/runRivet.log 2>/dev/null))"
    echo $(date -u +"%R:%S %Y-%m-%d:") "Job started  : [INFO] ${logline1}" >&2
}

The output looks like this:
17:16:48 2019-04-07: cranky-0.0.28: [INFO] Running Container 'runc'.
17:16:50 2019-04-07: Job started  : [INFO] [boinc pp mb-nsd 2360 - - pythia8 8.230 tune-AU2lox 100000 40]
ID: 6283 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 6284 - Posted: 7 Apr 2019, 20:29:57 UTC - in response to Message 6283.  

As soon as the logline is available in cranky you can do with it whatever you want.
I left it untouched to show it's written by another process and to see if that process writes garbage but it's also no problem to work with arbitrary parts of it.
That's more or less a philosophical question (however a very small and unimportant one).


I know that the following has already been asked, but out of the discussion about sherpa:
Do we need other information in stderr.txt, e.g. logfile size or last x loglines?
Both could easily be included.
ID: 6284 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Crystal Pellet
Volunteer tester

Send message
Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 30
Message 6285 - Posted: 8 Apr 2019, 7:59:00 UTC - in response to Message 6284.  

I left it untouched to show it's written by another process and to see if that process writes garbage but it's also no problem to work with arbitrary parts of it.
Yes you are right, so I returned to your version.
However I noticed that before [INFO] normally the process name is used, so I suggest to change in the echo line runRivet.log by runRivet.sh with the needed spaces to outline the semi-colons.

07:57:08 2019-04-08: cranky-0.0.28: [INFO] Running Container 'runc'.
07:57:11 2019-04-08: runRivet.sh  : [INFO] ===> [runRivet] Mon Apr  8 07:57:09 UTC 2019 [boinc pp jets 8000 180,-,3560 - herwig7 7.0.4 UE-MMHT 100000 40]


I know that the following has already been asked, but out of the discussion about sherpa:
Do we need other information in stderr.txt, e.g. logfile size or last x loglines?

Maybe only when an error occurs.
ID: 6285 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 6286 - Posted: 8 Apr 2019, 9:39:39 UTC - in response to Message 6285.  

I left it untouched to show it's written by another process and to see if that process writes garbage but it's also no problem to work with arbitrary parts of it.
Yes you are right, so I returned to your version.
However I noticed that before [INFO] normally the process name is used, so I suggest to change in the echo line runRivet.log by runRivet.sh with the needed spaces to outline the semi-colons.

07:57:08 2019-04-08: cranky-0.0.28: [INFO] Running Container 'runc'.
07:57:11 2019-04-08: runRivet.sh  : [INFO] ===> [runRivet] Mon Apr  8 07:57:09 UTC 2019 [boinc pp jets 8000 180,-,3560 - herwig7 7.0.4 UE-MMHT 100000 40]

I checked the code of runRivet.sh to be sure the logline is written by that script.
It is, hence "runRivet.sh" may be used in the stderr.txt output as you suggested.


What about the timestamps?
I would prefer local time as written here:
https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=467&postid=6282

Other comments are, of course, also welcome.


@Laurence
I'd like to know if the suggestions violate any CERN coding policies.




I know that the following has already been asked, but out of the discussion about sherpa:
Do we need other information in stderr.txt, e.g. logfile size or last x loglines?

Maybe only when an error occurs.

To make it work also for cancelled tasks it would be necessary to implement a signal handler that traps the TERM signal.
It would also mean to make some changes in "function fail" to write the log before the script exits.
Not such a big deal, but I'd like to get Laurence's comment first.
ID: 6286 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Crystal Pellet
Volunteer tester

Send message
Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 30
Message 6287 - Posted: 8 Apr 2019, 9:59:51 UTC - in response to Message 6286.  

I checked the code of runRivet.sh to be sure the logline is written by that script.
It is, hence "runRivet.sh" may be used in the stderr.txt output as you suggested.
I checked that before I made my suggestion with e.g. pid: sudo ls -l /proc/4492/fd

What about the timestamps?
I would prefer local time as written here:
https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=467&postid=6282
It would be good not to mix up UTC and local time in the stderr output. I would also prefer local time.
In your extracted logline UTC is added to the timegroup, so no confusion possible.
ID: 6287 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Laurence
Project administrator
Project developer
Project tester
Avatar

Send message
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 6328 - Posted: 3 May 2019, 12:11:19 UTC - in response to Message 6286.  

I would suggest using

tail --pid $$ -n 1000 -F /var/lib/boinc-client/slots/0/cernvm/shared/runRivet.log | head -n1


As far as I understand $1 will not give the pid and and $$ is required. I have added -n 1000 to catch the situation where -F does not refresh quick enough and more than 10 lines are printed.
ID: 6328 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 6331 - Posted: 3 May 2019, 13:14:20 UTC - in response to Message 6328.  

I have added -n 1000 to catch the situation where -F does not refresh quick enough ...

Agree.


I'm not sure if "| head" reliably terminated the pipe during my tests. Hence I used grep.



The trick is as follows:


Insert a subfunction call (here: "read_runrivet_log") just before runc is executed.
A subfunction because it can run in the background.
Run that subfunction in the background "&".
Call the subfunction with "$$" as a parameter.
The latter becomes "$1" within the subfunction.


Within the subfunction a pipe is set up.

Left side of the pipe:
grep -m1 '.*'
outputs 1 line and then terminates (itself and the whole pipe).



Right side of the pipe:
tail --pid $1 -F cernvm/shared/runRivet.log 2>/dev/null

This waits for the appearance of runRivet.log even if it is not yet created "-F" (uppercase!).
Tail terminates if "$1" (the main process) ends (important not to leave a zombie pipe).
ID: 6331 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Laurence
Project administrator
Project developer
Project tester
Avatar

Send message
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 6334 - Posted: 3 May 2019, 14:08:00 UTC - in response to Message 6331.  

I have added -n 1000 to catch the situation where -F does not refresh quick enough ...

Agree.


I'm not sure if "| head" reliably terminated the pipe during my tests. Hence I used grep.



The trick is as follows:


Insert a subfunction call (here: "read_runrivet_log") just before runc is executed.
A subfunction because it can run in the background.
Run that subfunction in the background "&".
Call the subfunction with "$$" as a parameter.

I missed this key point :)

The latter becomes "$1" within the subfunction.


Within the subfunction a pipe is set up.

Left side of the pipe:
grep -m1 '.*'
outputs 1 line and then terminates (itself and the whole pipe).



Right side of the pipe:
tail --pid $1 -F cernvm/shared/runRivet.log 2>/dev/null

This waits for the appearance of runRivet.log even if it is not yet created "-F" (uppercase!).
Tail terminates if "$1" (the main process) ends (important not to leave a zombie pipe).


ok, I will implement this on Monday as we should never make changes on a Friday afternoon.
ID: 6334 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 28 Jul 16
Posts: 482
Credit: 394,720
RAC: 0
Message 6338 - Posted: 3 May 2019, 15:19:18 UTC

@Laurence

Do you agree to use local time instead of UTC as discussed below?
https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=467&postid=6287
ID: 6338 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Laurence
Project administrator
Project developer
Project tester
Avatar

Send message
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 6352 - Posted: 6 May 2019, 8:21:44 UTC - in response to Message 6338.  

@Laurence

Do you agree to use local time instead of UTC as discussed below?
https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=467&postid=6287


Yes
ID: 6352 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote

Message boards : Theory Application : Theory native - getting runRivet.log early


©2024 CERN