Thread 'Theory native - getting runRivet.log early'

Author	Message
computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Send message Joined: 28 Jul 16 Posts: 520 Credit: 400,710 RAC: 0	Message 6280 - Posted: 6 Apr 2019, 22:24:28 UTC a method to copy the 1st line of runRivet.log to stderr.txt immediately after the log line is written. Code is based on cranky-0.0.28. Insert this function somewhere before "function run": [pre]function read_runrivet_log { local logline1="$(grep -m1 '.*' <(tail --pid $1 -F cernvm/shared/runRivet.log 2>/dev/null))" echo $(date -u +"%R:%S %Y-%m-%d:") "runRivet.log: [INFO] ${logline1}" >&2 }[/pre] Then insert the function call in "function run": [pre] trap pause SIGTSTP trap resume SIGCONT read_runrivet_log $$ & ${runc} --root state run -b cernvm ${result_id} &[/pre] Getting the same line again in "function prepare_output" is now obsolete and can be disabled: [pre] #head -n 1 cernvm/shared/runRivet.log >&2[/pre] The patch is currently running on only 1 of my hosts. The result can be seen here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=220687133 ID: 6280 · Rating: 0 · rate: / Reply Quote

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Send message Joined: 28 Jul 16 Posts: 520 Credit: 400,710 RAC: 0	Message 6282 - Posted: 7 Apr 2019, 8:34:35 UTC amps written to stderr.txt use local time (wrapper) as well as UTC (cranky). I suggest to let cranky also report in local time to avoid confusion. It would also be helpful if debugging is necessary as the timestamps from the logs then correspond to the timestamps of local files. Nonetheless I would leave the logline copied from runRivet.log untouched as this time is explicitly marked to be UTC. To implement the changes 2 lines would have to be changed. in "function send_to_log" [pre]# echo $(date -u +"%R:%S %Y-%m-%d:") "$(basename $0): [$1] $2" >&2 echo $(date +"%R:%S %Y-%m-%d:") "$(basename $0): [$1] $2" >&2[/pre] in "function read_runrivet_log" [pre]# echo $(date -u +"%R:%S %Y-%m-%d:") "runRivet.log: [INFO] ${logline1}" >&2 echo $(date +"%R:%S %Y-%m-%d:") "runRivet.log: [INFO] ${logline1}" >&2[/pre] ID: 6282 · Rating: 0 · rate: / Reply Quote

Crystal Pellet Volunteer tester Send message Joined: 13 Feb 15 Posts: 1244 Credit: 967,581 RAC: 262	Message 6283 - Posted: 7 Apr 2019, 17:34:39 UTC - in response to Message 6280. REPLACE_PRE_CODE BUG text: Found a method to copy the 1st line of runRivet.log to stderr.txt immediately after the log line is written. That is what I proposed when Laurence introduced it, but Laurence probably wanted to do it in an existing function, but surely it's the best way to get this information as soon as possible in the stdout. This way we can see which jobs are causing an error. I am not a programmer, but fiddled around a bit with your code (open source, I think) to extract only the job-name from that line and came to this code for your function read_runrivet_log: [code]function read_runrivet_log { local logline1="$(grep -m1 -o '\[boinc.*' <(tail --pid $1 -F cernvm/shared/runRivet.log 2>/dev/null))" echo $(date -u +"%R:%S %Y-%m-%d:") "Job started : [INFO] ${logline1}" >&2 } [/code] The output looks like this: [pre] 17:16:48 2019-04-07: cranky-0.0.28: [INFO] Running Container 'runc'. 17:16:50 2019-04-07: Job started : [INFO] [boinc pp mb-nsd 2360 - - pythia8 8.230 tune-AU2lox 100000 40][/pre] ID: 6283 · Rating: 0 · rate: / Reply Quote

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Send message Joined: 28 Jul 16 Posts: 520 Credit: 400,710 RAC: 0	Message 6284 - Posted: 7 Apr 2019, 20:29:57 UTC - in response to Message 6283. As soon as the logline is available in cranky you can do with it whatever you want. I left it untouched to show it's written by another process and to see if that process writes garbage but it's also no problem to work with arbitrary parts of it. That's more or less a philosophical question (however a very small and unimportant one). I know that the following has already been asked, but out of the discussion about sherpa: Do we need other information in stderr.txt, e.g. logfile size or last x loglines? Both could easily be included. ID: 6284 · Rating: 0 · rate: / Reply Quote

Crystal Pellet Volunteer tester Send message Joined: 13 Feb 15 Posts: 1244 Credit: 967,581 RAC: 262	Message 6285 - Posted: 8 Apr 2019, 7:59:00 UTC - in response to Message 6284. I left it untouched to show it's written by another process and to see if that process writes garbage but it's also no problem to work with arbitrary parts of it. Yes you are right, so I returned to your version. However I noticed that before [INFO] normally the process name is used, so I suggest to change in the echo line runRivet.log by runRivet.sh with the needed spaces to outline the semi-colons. 07:57:08 2019-04-08: cranky-0.0.28: [INFO] Running Container 'runc'. 07:57:11 2019-04-08: runRivet.sh : [INFO] ===> [runRivet] Mon Apr 8 07:57:09 UTC 2019 [boinc pp jets 8000 180,-,3560 - herwig7 7.0.4 UE-MMHT 100000 40] I know that the following has already been asked, but out of the discussion about sherpa: Do we need other information in stderr.txt, e.g. logfile size or last x loglines? Maybe only when an error occurs. ID: 6285 · Rating: 0 · rate: / Reply Quote

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Send message Joined: 28 Jul 16 Posts: 520 Credit: 400,710 RAC: 0	Message 6286 - Posted: 8 Apr 2019, 9:39:39 UTC - in response to Message 6285. I left it untouched to show it's written by another process and to see if that process writes garbage but it's also no problem to work with arbitrary parts of it. Yes you are right, so I returned to your version. However I noticed that before [INFO] normally the process name is used, so I suggest to change in the echo line runRivet.log by runRivet.sh with the needed spaces to outline the semi-colons. 07:57:08 2019-04-08: cranky-0.0.28: [INFO] Running Container 'runc'. 07:57:11 2019-04-08: runRivet.sh : [INFO] ===> [runRivet] Mon Apr 8 07:57:09 UTC 2019 [boinc pp jets 8000 180,-,3560 - herwig7 7.0.4 UE-MMHT 100000 40] I checked the code of runRivet.sh to be sure the logline is written by that script. It is, hence "runRivet.sh" may be used in the stderr.txt output as you suggested. What about the timestamps? I would prefer local time as written here: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=467&postid=6282 Other comments are, of course, also welcome. @Laurence I'd like to know if the suggestions violate any CERN coding policies. I know that the following has already been asked, but out of the discussion about sherpa: Do we need other information in stderr.txt, e.g. logfile size or last x loglines? Maybe only when an error occurs. To make it work also for cancelled tasks it would be necessary to implement a signal handler that traps the TERM signal. It would also mean to make some changes in "function fail" to write the log before the script exits. Not such a big deal, but I'd like to get Laurence's comment first. ID: 6286 · Rating: 0 · rate: / Reply Quote

Crystal Pellet Volunteer tester Send message Joined: 13 Feb 15 Posts: 1244 Credit: 967,581 RAC: 262	Message 6287 - Posted: 8 Apr 2019, 9:59:51 UTC - in response to Message 6286. I checked the code of runRivet.sh to be sure the logline is written by that script. It is, hence "runRivet.sh" may be used in the stderr.txt output as you suggested. I checked that before I made my suggestion with e.g. pid: sudo ls -l /proc/4492/fd What about the timestamps? I would prefer local time as written here: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=467&postid=6282 It would be good not to mix up UTC and local time in the stderr output. I would also prefer local time. In your extracted logline UTC is added to the timegroup, so no confusion possible. ID: 6287 · Rating: 0 · rate: / Reply Quote

Laurence CERN Project administrator Project developer Project tester Send message Joined: 12 Sep 14 Posts: 1142 Credit: 339,231 RAC: 0	Message 6328 - Posted: 3 May 2019, 12:11:19 UTC - in response to Message 6286. I would suggest using tail --pid $$ -n 1000 -F /var/lib/boinc-client/slots/0/cernvm/shared/runRivet.log \| head -n1 As far as I understand $1 will not give the pid and and $$ is required. I have added -n 1000 to catch the situation where -F does not refresh quick enough and more than 10 lines are printed. ID: 6328 · Rating: 0 · rate: / Reply Quote

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Send message Joined: 28 Jul 16 Posts: 520 Credit: 400,710 RAC: 0	Message 6331 - Posted: 3 May 2019, 13:14:20 UTC - in response to Message 6328. I have added -n 1000 to catch the situation where -F does not refresh quick enough ... Agree. I'm not sure if "\| head" reliably terminated the pipe during my tests. Hence I used grep. The trick is as follows: Insert a subfunction call (here: "read_runrivet_log") just before runc is executed. A subfunction because it can run in the background. Run that subfunction in the background "&". Call the subfunction with "$$" as a parameter. The latter becomes "$1" within the subfunction. Within the subfunction a pipe is set up. Left side of the pipe: grep -m1 '.*' outputs 1 line and then terminates (itself and the whole pipe). Right side of the pipe: tail --pid $1 -F cernvm/shared/runRivet.log 2>/dev/null This waits for the appearance of runRivet.log even if it is not yet created "-F" (uppercase!). Tail terminates if "$1" (the main process) ends (important not to leave a zombie pipe). ID: 6331 · Rating: 0 · rate: / Reply Quote

Laurence CERN Project administrator Project developer Project tester Send message Joined: 12 Sep 14 Posts: 1142 Credit: 339,231 RAC: 0	Message 6334 - Posted: 3 May 2019, 14:08:00 UTC - in response to Message 6331. I have added -n 1000 to catch the situation where -F does not refresh quick enough ... Agree. I'm not sure if "\| head" reliably terminated the pipe during my tests. Hence I used grep. The trick is as follows: Insert a subfunction call (here: "read_runrivet_log") just before runc is executed. A subfunction because it can run in the background. Run that subfunction in the background "&". Call the subfunction with "$$" as a parameter. I missed this key point :) The latter becomes "$1" within the subfunction. Within the subfunction a pipe is set up. Left side of the pipe: grep -m1 '.*' outputs 1 line and then terminates (itself and the whole pipe). Right side of the pipe: tail --pid $1 -F cernvm/shared/runRivet.log 2>/dev/null This waits for the appearance of runRivet.log even if it is not yet created "-F" (uppercase!). Tail terminates if "$1" (the main process) ends (important not to leave a zombie pipe). ok, I will implement this on Monday as we should never make changes on a Friday afternoon. ID: 6334 · Rating: 0 · rate: / Reply Quote

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Send message Joined: 28 Jul 16 Posts: 520 Credit: 400,710 RAC: 0	Message 6338 - Posted: 3 May 2019, 15:19:18 UTC @Laurence Do you agree to use local time instead of UTC as discussed below? https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=467&postid=6287 ID: 6338 · Rating: 0 · rate: / Reply Quote

Laurence CERN Project administrator Project developer Project tester Send message Joined: 12 Sep 14 Posts: 1142 Credit: 339,231 RAC: 0	Message 6352 - Posted: 6 May 2019, 8:21:44 UTC - in response to Message 6338. @Laurence Do you agree to use local time instead of UTC as discussed below? https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=467&postid=6287 Yes ID: 6352 · Rating: 0 · rate: / Reply Quote

Development for LHC@home