Errors
Author | Message |
---|---|
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 | I am getting these errors: 12/05/16 16:12:01 (pid:4505) ReliSock::put_file_with_permissions(): Failed to stat file '/var/lib/condor/execute/dir_4505/pilot.out': No such file or directory (errno: 2, si_error: 1) |
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 | Thanks for reporting this. We have noticed these errors too and are investigating. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 | It appears to be working again, as of about 1 h ago. |
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 | Yes, we have been having some trouble with the LHCb submission recently. On top of that, it looks like an LHCb software version was bumped and the application was removed from CVMFS. The error you saw occurred because the pilots were failing, so their output was not available. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 | Tasks are failing after about 3.5 h with a missing heartbeat: http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=291697 |
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 | There is this interesting post on the topic. Once the consolidation is stable, we can focus on things like this. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 | How can it be that after 13 h it decides the heartbeat is missing? The only way to get a valid result is to shut a task down manually before it reaches the end. http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=291794 |
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 | The VM freezes. This could be related to the post. If the system is loaded in such a way that the internal clock slows down, the heartbeat could slow down to the point where the external monitor thinks the VM is dead. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 | Would it not be better to check the heartbeat file for ANY change? Even if it is not up to date, a change since the last check should be sufficient to declare that the VM is still alive. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 | Disregard this post. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 | The "LHCb jobs" graphs are not working. |
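
The suggestion in the thread, to treat ANY change in the heartbeat file as a sign of life instead of requiring it to be up to date, could look roughly like the sketch below. This is not the code of the actual external monitor, which is not shown in the thread. The names `HeartbeatState`, `file_signature`, `check_heartbeat` and the `max_missed` threshold are all hypothetical and chosen only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class HeartbeatState:
    """What the monitor remembers between two checks (hypothetical structure)."""
    last_signature: Optional[Tuple[int, int]] = None  # (mtime_ns, size) seen last time
    missed_checks: int = 0                            # consecutive checks with no change


def file_signature(path: str) -> Optional[Tuple[int, int]]:
    """Return (mtime_ns, size) of the file, or None if it does not exist."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_mtime_ns, st.st_size)


def check_heartbeat(path: str, state: HeartbeatState, max_missed: int = 3) -> bool:
    """Return True while the VM should still be considered alive.

    The file only has to differ from what was seen at the previous check.
    Its timestamp is never compared with the host clock, so a guest whose
    clock runs slow is not declared dead just because the file looks old.
    """
    sig = file_signature(path)
    if sig is not None and sig != state.last_signature:
        # Something changed since the last check: the VM is alive.
        state.last_signature = sig
        state.missed_checks = 0
        return True
    # No change (or no file): count a miss and give up after max_missed in a row.
    state.missed_checks += 1
    return state.missed_checks < max_missed
```

The monitor would call `check_heartbeat` once per polling interval with the same `HeartbeatState` object. A VM that is truly frozen is still detected, because the file stops changing and the miss counter reaches `max_missed`; a VM that is merely slow keeps resetting the counter whenever it manages to touch the file.
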
©2024 CERN