Message boards : LHCb Application : Errors
| Author | Message |
|---|---|
|
Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 14 |
I am getting these errors: 12/05/16 16:12:01 (pid:4505) ReliSock::put_file_with_permissions(): Failed to stat file '/var/lib/condor/execute/dir_4505/pilot.out': No such file or directory (errno: 2, si_error: 1) |
Laurence CERN Joined: 12 Sep 14 Posts: 1150 Credit: 342,328 RAC: 0 |
Thanks for reporting this. We have noticed them too and are investigating. |
|
Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 14 |
It appears to be working again, as of about an hour ago. |
Laurence CERN Joined: 12 Sep 14 Posts: 1150 Credit: 342,328 RAC: 0 |
Yes, we have been having some trouble with the LHCb submission recently. On top of that, it looks like an LHCb software version was bumped and the application was removed from CVMFS. The error you saw occurred because the pilots failed, so their output (pilot.out) was not available. |
|
Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 14 |
Tasks are failing after about 3.5 h with a missing heartbeat. http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=291697 |
Laurence CERN Joined: 12 Sep 14 Posts: 1150 Credit: 342,328 RAC: 0 |
There is this interesting post on the topic. Once the consolidation is stable, we can focus on things like this. |
|
Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 14 |
How can it be that after 13 h it decides that the heartbeat is missing? The only way to get a valid result is to shut a task down manually before it reaches the end. http://lhcathomedev.cern.ch/vLHCathome-dev/result.php?resultid=291794 |
Laurence CERN Joined: 12 Sep 14 Posts: 1150 Credit: 342,328 RAC: 0 |
The VM freezes. This could be related to the post. If the system is loaded so heavily that the VM's internal clock slows down, the heartbeat could slow to the point where the external monitor thinks the VM is dead. |
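
The failure mode described in this post can be illustrated with a small sketch. It assumes, purely for illustration, that the external monitor declares the VM dead when the time since the last heartbeat exceeds a fixed timeout. The function names, the 60 s interval, and the 600 s timeout are hypothetical and not taken from the actual wrapper.

```python
# Assumed numbers (hypothetical): the guest intends to write a heartbeat
# every 60 s, and the monitor tolerates a gap of 600 s of host time.
INTENDED_INTERVAL = 60.0
TIMEOUT = 600.0


def is_dead_by_age(last_heartbeat: float, now: float, timeout: float) -> bool:
    """Hypothetical age-based check: dead if the last heartbeat is older than timeout."""
    return (now - last_heartbeat) > timeout


def host_gap(slowdown: float) -> float:
    """Host-time gap between heartbeats when the guest clock runs `slowdown` times slower."""
    return INTENDED_INTERVAL * slowdown


# Print the gap and the monitor's verdict for a few slowdown factors.
for slowdown in (1.0, 5.0, 12.0):
    gap = host_gap(slowdown)
    print(slowdown, gap, is_dead_by_age(0.0, gap, TIMEOUT))
```

Under these assumed numbers, a guest that is still running but whose clock is slowed by more than a factor of ten would be reported as dead, even though it keeps writing heartbeats.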
|
Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 14 |
Would it not be better to check the heartbeat file for ANY change? Even if the file is not up to date, a change since the last check should be sufficient to declare that the VM is still alive. |
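
The suggestion in this post could look roughly like the following sketch. It assumes the heartbeat is a regular file that the guest rewrites or touches and that the monitor can stat it from outside. `HeartbeatWatcher` and its methods are hypothetical names, not part of the project's wrapper.

```python
import os


class HeartbeatWatcher:
    """Hypothetical change-based check: alive if the file changed since the last poll."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._last_seen = self._signature()

    def _signature(self):
        # Use mtime (in ns) and size so that any rewrite of the file counts
        # as a change, regardless of how old the timestamp itself is.
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return None
        return (st.st_mtime_ns, st.st_size)

    def changed_since_last_poll(self) -> bool:
        # A missing file never counts as a heartbeat.
        current = self._signature()
        changed = current is not None and current != self._last_seen
        self._last_seen = current
        return changed
```

The caller still has to choose a polling interval. A VM that writes no heartbeat between two polls is reported as unchanged, so the interval needs to be generous enough to cover a slowed-down guest.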
|
Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 14 |
Disregard this post. |
|
Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 14 |
The "LHCb jobs" graphs are not working. |