Message boards : CMS Application : Batch Progress
Author | Message |
---|---|
Joined: 20 Jan 15 Posts: 1152 Credit: 8,310,612 RAC: 0 |
[cms005@lcggwms02:~] > cat stats.sh
#!/bin/bash
grep 'NodeStatus ' $1/node_state.txt|sort|uniq -c
Mon May 23 12:30:46
[cms005@lcggwms02:~] > ./stats.sh 160518_203523:ireid_crab_CMS_at_Home_TTbar_50ev_prodB
4173 NodeStatus = 1; /* "STATUS_READY" */
1075 NodeStatus = 3; /* "STATUS_SUBMITTED" */
4745 NodeStatus = 5; /* "STATUS_DONE" */
7 NodeStatus = 6; /* "STATUS_ERROR" */ |
Joined: 16 Aug 15 Posts: 966 Credit: 1,215,383 RAC: 5 |
Thanks for the info. How can there be fewer "submitted" than "Ready" or "Done"? |
Joined: 20 Jan 15 Posts: 1152 Credit: 8,310,612 RAC: 0 |
When I send a batch (10,000 jobs in this case), all the jobs start as "ready". Then up to ~1,000 are moved into the queue and become "submitted". As jobs are taken up by processes, more jobs move into the queue to replace them, going from "ready" to "submitted"; I think that jobs which are re-queued for a retry are also "submitted". I'm not sure what state running jobs are in, probably also "submitted", since the sum of "idle" and "running" is about the number submitted. And of course jobs which are successful get to "done", while those which fail three tries (or hit other errors) go into "error". Note that all four categories add up to the total number in the batch. |
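The closing remark in the post above, that the four categories add up to the batch size, can be checked directly against the `stats.sh` output. This is a minimal sketch, assuming the `sort | uniq -c` format shown in the first post (count in the first field). The `total_jobs` helper name is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: sum the per-state counts printed by `sort | uniq -c`.
# Assumes the count is the first whitespace-separated field on every line.
total_jobs() {
    awk '{ sum += $1 } END { print sum + 0 }'
}

# Counts copied from the first post in this thread.
total_jobs <<'EOF'
4173 NodeStatus = 1; /* "STATUS_READY" */
1075 NodeStatus = 3; /* "STATUS_SUBMITTED" */
4745 NodeStatus = 5; /* "STATUS_DONE" */
   7 NodeStatus = 6; /* "STATUS_ERROR" */
EOF
# prints 10000
```

In practice the same function could be fed straight from the script, e.g. `./stats.sh <batch directory> | total_jobs`.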
Joined: 16 Aug 15 Posts: 966 Credit: 1,215,383 RAC: 5 |
Thanks for the explanation, Ivan. I thought "submitted" meant the same as in the dashboard. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,215,383 RAC: 5 |
How is the proxy lease? Does it not need a refresh soon? |
Joined: 20 Jan 15 Posts: 1152 Credit: 8,310,612 RAC: 0 |
"How is the proxy lease?" Batch 160518_203523, so it's due on the 25th. Current stats:
3776 NodeStatus = 1; /* "STATUS_READY" */
1084 NodeStatus = 3; /* "STATUS_SUBMITTED" */
1 NodeStatus = 4; /* "STATUS_POSTRUN" */
5132 NodeStatus = 5; /* "STATUS_DONE" */
7 NodeStatus = 6; /* "STATUS_ERROR" */ |
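To read a snapshot like this one as progress through the batch, each state's count can be turned into a share of the total. The sketch below assumes every line carries the state name in double quotes, as in the posted output. The `state_shares` name is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print "STATE count percent" for each line of
# `uniq -c`-style input. Splitting on double quotes puts the state name
# in field 2; the count is the first word of field 1.
state_shares() {
    awk -F'"' '
        NF >= 3 {
            split($1, parts, " ")
            order[++n] = $2
            count[$2] = parts[1]
            total += parts[1]
        }
        END {
            if (total == 0) exit 1   # nothing to report
            for (i = 1; i <= n; i++)
                printf "%s %d %.2f%%\n", order[i], count[order[i]], 100 * count[order[i]] / total
        }'
}

# Counts copied from the post above.
state_shares <<'EOF'
3776 NodeStatus = 1; /* "STATUS_READY" */
1084 NodeStatus = 3; /* "STATUS_SUBMITTED" */
   1 NodeStatus = 4; /* "STATUS_POSTRUN" */
5132 NodeStatus = 5; /* "STATUS_DONE" */
   7 NodeStatus = 6; /* "STATUS_ERROR" */
EOF
```

For these numbers the `STATUS_DONE` line comes out at 51.32%, so the batch was just past the halfway point at that time.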
Joined: 16 Aug 15 Posts: 966 Credit: 1,215,383 RAC: 5 |
Thanks, Ivan! |
Joined: 16 Aug 15 Posts: 966 Credit: 1,215,383 RAC: 5 |
Congrats! Best one, yet. Only 0.32% ERROR. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,215,383 RAC: 5 |
"Looks like we lost a few in the system, though -- I only count 9,816 result files on the data-bridge. Some may yet turn up, but it's doubtful." Does that include the error jobs? |
Joined: 12 Sep 14 Posts: 1150 Credit: 342,328 RAC: 5 |
I prefer 99.68% SUCCESS. We should try to understand what happened to the 32 jobs that failed. |
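The two figures quoted in this exchange, 0.32% error and 99.68% success, are the same result seen from both sides: 32 failed jobs out of the batch. A small sketch of that arithmetic, assuming the batch size of 10,000 mentioned earlier in the thread. `job_rates` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: error and success percentages for a batch.
# $1 = number of failed jobs, $2 = total jobs in the batch.
job_rates() {
    awk -v failed="$1" -v total="$2" 'BEGIN {
        if (total <= 0) exit 1   # avoid dividing by zero
        err = 100 * failed / total
        printf "error %.2f%% success %.2f%%\n", err, 100 - err
    }'
}

job_rates 32 10000
# prints: error 0.32% success 99.68%
```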
Joined: 12 Sep 14 Posts: 1150 Credit: 342,328 RAC: 5 |
If you can give me some examples of missing files I can investigate. |
©2025 CERN