Message boards : CMS Application : Batch Progress
Author | Message |
---|---|
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
[cms005@lcggwms02:~] > cat stats.sh
#!/bin/bash
grep 'NodeStatus ' $1/node_state.txt|sort|uniq -c
Mon May 23 12:30:46
[cms005@lcggwms02:~] > ./stats.sh 160518_203523:ireid_crab_CMS_at_Home_TTbar_50ev_prodB
4173 NodeStatus = 1; /* "STATUS_READY" */
1075 NodeStatus = 3; /* "STATUS_SUBMITTED" */
4745 NodeStatus = 5; /* "STATUS_DONE" */
7 NodeStatus = 6; /* "STATUS_ERROR" */ |
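For anyone who wants to try the same count elsewhere, here is a minimal sketch of the script above wrapped as a function, with the path quoted so that directory names containing spaces survive. The function name `node_status_counts` and the sample file are illustrative only; the layout of `node_state.txt` is assumed from the output shown above, not taken from a real task directory.

```shell
# Sketch: count DAG nodes per status, in the spirit of stats.sh.
# Usage: node_status_counts <task-directory>
node_status_counts() {
    task_dir=$1
    # Keep only the "NodeStatus = N; ..." lines, strip leading blanks so
    # identical states group together, then count identical lines.
    grep 'NodeStatus ' "$task_dir/node_state.txt" \
        | sed 's/^[[:space:]]*//' \
        | sort \
        | uniq -c
}

# Illustrative data only: a tiny stand-in for a real node_state.txt.
demo_dir=$(mktemp -d)
cat > "$demo_dir/node_state.txt" <<'EOF'
  NodeStatus = 5; /* "STATUS_DONE" */
  NodeStatus = 1; /* "STATUS_READY" */
  NodeStatus = 5; /* "STATUS_DONE" */
  NodeStatus = 6; /* "STATUS_ERROR" */
EOF
node_status_counts "$demo_dir"
rm -rf "$demo_dir"
```

The trailing space in the `grep` pattern is kept from the original script, so only lines of the form `NodeStatus = ...` are counted.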
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks for the info. How can there be fewer "submitted" than "Ready" or "Done"? |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
> Thanks for the info.
When I send a batch (10,000 jobs in this case), all the jobs start as "ready". Then up to ~1,000 are moved into the queue and become "submitted". As jobs are taken up by processes, more jobs move into the queue to replace them and thus go from "ready" to "submitted"; I think that jobs which are re-queued for retry are also "submitted". I'm not sure what state running jobs are in, probably also "submitted", since the sum of "idle" and "running" is about the number submitted. And of course jobs which succeed get to "done", while those which fail three tries (or hit other errors) go into "error". Note that all four categories add up to the total number in the batch. |
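The point that the categories add up to the batch size can be checked mechanically. The following is only a sketch: `check_batch_total` is a hypothetical helper, not part of stats.sh, and it assumes its input looks like the `uniq -c` output posted earlier in the thread (count in the first field).

```shell
# Sketch: add up the per-state counts printed by stats.sh and compare the
# total with the expected batch size.
# Usage: ./stats.sh <task-dir> | check_batch_total <expected-total>
check_batch_total() {
    awk -v expected="$1" '
        /NodeStatus =/ { total += $1 }   # first field is the uniq -c count
        END {
            printf "total=%d expected=%d\n", total, expected
            exit (total == expected ? 0 : 1)
        }'
}

# Counts copied from the first status listing in this thread.
check_batch_total 10000 <<'EOF'
   4173 NodeStatus = 1; /* "STATUS_READY" */
   1075 NodeStatus = 3; /* "STATUS_SUBMITTED" */
   4745 NodeStatus = 5; /* "STATUS_DONE" */
      7 NodeStatus = 6; /* "STATUS_ERROR" */
EOF
```

The function returns a non-zero exit status when the counts do not match, so it could be used in a cron job or a wrapper script to flag a batch whose bookkeeping looks off.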
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks for the explanation, Ivan. I thought "submitted" meant the same as in Dashboard. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
No, that's from Condor itself. Confusing terminology, I must admit... |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
How is the proxy lease? Does it not need a refresh soon? |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
> How is the proxy lease?
160518_203523, so due on the 25th. Current stats:
3776 NodeStatus = 1; /* "STATUS_READY" */
1084 NodeStatus = 3; /* "STATUS_SUBMITTED" */
1 NodeStatus = 4; /* "STATUS_POSTRUN" */
5132 NodeStatus = 5; /* "STATUS_DONE" */
7 NodeStatus = 6; /* "STATUS_ERROR" */ |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
Previous large batch:
[cms005@lcggwms02:~] > ./stats.sh 160509_200134:ireid_crab_CMS_at_Home_TTbar_50ev_prodA
9881 NodeStatus = 5; /* "STATUS_DONE" */
119 NodeStatus = 6; /* "STATUS_ERROR" */ |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
Current batch:
[cms005@lcggwms02:~] > ./stats.sh 160518_203523:ireid_crab_CMS_at_Home_TTbar_50ev_prodB
1771 NodeStatus = 1; /* "STATUS_READY" */
1095 NodeStatus = 3; /* "STATUS_SUBMITTED" */
7126 NodeStatus = 5; /* "STATUS_DONE" */
8 NodeStatus = 6; /* "STATUS_ERROR" */
Proxy renewed today. :-) |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks, Ivan! |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
[cms005@lcggwms02:~] > ./stats.sh 160518_203523:ireid_crab_CMS_at_Home_TTbar_50ev_prodB
617 NodeStatus = 3; /* "STATUS_SUBMITTED" */
9356 NodeStatus = 5; /* "STATUS_DONE" */
27 NodeStatus = 6; /* "STATUS_ERROR" */
I'll have to submit a new batch sometime over the (long) weekend. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
New batch (5,000 x 50 TTbar events) submitted. Let's see if Dashboard is fully operational again... |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
Final status:
[cms005@lcggwms02:~] > ./stats.sh 160518_203523:ireid_crab_CMS_at_Home_TTbar_50ev_prodB
9968 NodeStatus = 5; /* "STATUS_DONE" */
32 NodeStatus = 6; /* "STATUS_ERROR" */ |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Congrats! Best one yet. Only 0.32% ERROR. |
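The 0.32% figure follows directly from the final counts (32 errors out of 10,000 jobs). As a small sketch, assuming only the DONE and ERROR counts posted in this thread, a hypothetical helper `error_rate` makes the comparison with the previous batch explicit:

```shell
# Sketch: failure rate of a finished batch as a percentage.
# Usage: error_rate <done-count> <error-count>
# Note: does not guard against an empty batch (both counts zero).
error_rate() {
    awk -v ok="$1" -v err="$2" \
        'BEGIN { printf "%.2f%%\n", 100 * err / (ok + err) }'
}

error_rate 9968 32     # prodB, final status: prints 0.32%
error_rate 9881 119    # prodA, previous large batch: prints 1.19%
```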
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
> Congrats!
Looks like we lost a few in the system, though -- I only count 9,816 result files on the data-bridge. Some may yet turn up, but it's doubtful. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
> Looks like we lost a few in the system, though -- I only count 9,816 result files on the data-bridge. Some may yet turn up, but it's doubtful.
Does that include the error jobs? |
Joined: 12 Sep 14 Posts: 1071 Credit: 334,981 RAC: 6 |
I prefer 99.68% SUCCESS. We should try to understand what happened to the 32 jobs that failed. |
Joined: 12 Sep 14 Posts: 1071 Credit: 334,981 RAC: 6 |
If you can give me some examples of missing files I can investigate. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
> Looks like we lost a few in the system, though -- I only count 9,816 result files on the data-bridge. Some may yet turn up, but it's doubtful.
Yes, I presume they didn't make it all the way through for some reason or another. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
> If you can give me some examples of missing files I can investigate.
I'll see what I can dig out, Laurence. It may be a bit messy, but that's what bash, awk and python are for... |
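One way to dig out examples for the gap between 9,968 DONE jobs and 9,816 result files (152 unaccounted for) is to compare a list of expected job numbers with the job numbers actually found. This is only a sketch under stated assumptions: `missing_jobs` is a hypothetical helper, and how job numbers are extracted from the data-bridge file names is deliberately left out, because that naming is not given in the thread.

```shell
# Sketch: print the job numbers that have no matching result file.
# Usage: missing_jobs <expected-ids-file> <found-ids-file>
# Both files hold one job number per line (an assumed input format).
missing_jobs() {
    # Read the "found" list first and remember every ID in it, then print
    # each expected ID that was never seen. Comparing FILENAME with ARGV[1]
    # keeps this correct even when the "found" list is empty.
    awk 'FILENAME == ARGV[1] { found[$1] = 1; next }
         !($1 in found)' "$2" "$1"
}

# Illustrative data only: ten expected jobs, two results missing.
expected=$(mktemp)
found=$(mktemp)
seq 1 10 > "$expected"
printf '%s\n' 1 2 3 5 6 7 9 10 > "$found"
missing_jobs "$expected" "$found"    # prints 4 and 8, one per line
rm -f "$expected" "$found"
```

The output keeps the order of the expected list, so the first few lines can be passed on directly as examples to investigate.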
©2025 CERN