Message boards : Number crunching : Expect errors eventually
Author | Message |
---|---|
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
That's getting beyond my pay-grade, I'm afraid! :-) |
Joined: 12 Sep 14 Posts: 1128 Credit: 339,230 RAC: 19 |
This makes sense. You can see that although there are many job failures, this does not translate into walltime lost. Those jobs are probably failing on the server, so they are never even sent to the VM. The walltime plot shows a few failures, probably because the finished job could not contact the server. |
Joined: 12 Sep 14 Posts: 1128 Credit: 339,230 RAC: 19 |
It just means that once the condor client in the VM has matched a job, it will keep getting jobs without having to be re-matched. Sort of like keeping a session open instead of reconnecting each time. It makes things more efficient, and as we are not doing complex scheduling with priorities and our VMs will only last for 12h, it should not have any negative effects. It should, however, avoid the claim expired error with suspend/resume. |
Joined: 12 Sep 14 Posts: 1128 Credit: 339,230 RAC: 19 |
Discussions like these should now be in the CMS application topic. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Maybe you should put a big note somewhere to stop this from happening again. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
"Maybe you should put a big note somewhere to stop this from happening again." +1. I'll ask RAL. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
"This makes sense. You can see that although there are many job failures, this does not translate into walltime lost. Those jobs are probably failing in the server so not even sent to the VM. The walltime plot shows a few failures, probably because the finished job could not contact the server." Maybe not walltime lost, but nearly all of the failures ran for 20 min. This adds up to quite a lot of wasted time. |
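
For anyone trying to picture the claim reuse described earlier in the thread (a matched condor client keeps receiving jobs without being re-matched), here is a rough toy model in Python. It is only a sketch and does not use any HTCondor API: the `Slot` class, the `run_jobs` function and the 60-second `match_cost_s` are all made-up names and numbers for illustration. The only figure taken from the thread is the 12h VM lifetime.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Toy stand-in for the condor client in one VM (hypothetical, not the HTCondor API)."""
    reuse_claim: bool
    vm_lifetime_s: int = 12 * 3600  # VMs last 12h, as stated in the thread


def run_jobs(slot, job_durations_s, match_cost_s=60):
    """Return (jobs_completed, matches_made) within one VM lifetime.

    match_cost_s is an assumed cost of one matchmaking round, chosen
    only to make the difference between the two modes visible.
    """
    clock = 0
    matches = 0
    done = 0
    claimed = False
    for duration in job_durations_s:
        # Without claim reuse, every job needs a fresh match.
        # With reuse, only the first job on this slot does.
        if not claimed or not slot.reuse_claim:
            clock += match_cost_s
            matches += 1
            claimed = True
        if clock + duration > slot.vm_lifetime_s:
            break  # the job would outlive the VM, so stop here
        clock += duration
        done += 1
    return done, matches


if __name__ == "__main__":
    jobs = [1800] * 10  # ten hypothetical 30-minute jobs
    print("with reuse:   ", run_jobs(Slot(reuse_claim=True), jobs))
    print("without reuse:", run_jobs(Slot(reuse_claim=False), jobs))
```

In this model the slot with reuse is matched once for the whole batch, while the slot without reuse is matched once per job. How much that saves on the real system depends on the actual matchmaking overhead, which this sketch does not measure.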
©2025 CERN