Message boards : Theory Application : New Version v2.4
Author | Message |
---|---|
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
This new version provides HTCondor 8.4.8 using the standard CernVM production distribution. It should finally solve the suspend/resume problems and is half the size of the previous image. |
Joined: 13 Feb 15 Posts: 1188 Credit: 859,751 RAC: 25 |
Resuming after a suspend period of more than 30,000 seconds of inactivity went fine on all 4 saved VMs. All saved jobs finished fine after the resume, and new jobs started. After the last job finished (more than 12 hours of elapsed time plus the 5 minutes of idle time), some Error remarks appeared in StartLog. They look like cleanup dust:
08/06/16 10:37:52 Got activate_claim request from shadow (188.184.187.167)
08/06/16 10:37:52 Remote job ID is 1461227.0
08/06/16 10:37:52 Got universe "VANILLA" (5) from request classad
08/06/16 10:37:52 State change: claim-activation protocol successful
08/06/16 10:37:52 Changing activity: Idle -> Busy
08/06/16 11:31:36 Called deactivate_claim_forcibly()
08/06/16 11:31:36 Starter pid 36909 exited with status 0
08/06/16 11:31:36 State change: starter exited
08/06/16 11:31:36 Changing activity: Busy -> Idle
08/06/16 11:31:36 State change: START is false
08/06/16 11:31:36 Changing state and activity: Claimed/Idle -> Preempting/Vacating
08/06/16 11:31:36 State change: No preempting claim, returning to owner
08/06/16 11:31:36 Changing state and activity: Preempting/Vacating -> Owner/Idle
08/06/16 11:31:36 State change: IS_OWNER is false
08/06/16 11:31:36 Changing state: Owner -> Unclaimed
08/06/16 11:31:36 Error: can't find resource with ClaimId (<10.0.2.15:14809>#1470431871#1#...) for 444 (ACTIVATE_CLAIM)
08/06/16 11:31:36 Error: can't find resource with ClaimId (<10.0.2.15:14809>#1470431871#1#...) -- perhaps this claim was already removed?
08/06/16 11:31:36 Error: problem finding resource for 403 (DEACTIVATE_CLAIM) |
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
The CLAIM_WORKLIFE value on the server has been increased from 1200s to 86400s. This will hopefully remove that error message. As the image has not changed and seems fine, it will be released to production. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I have a VM that is clearly running, but the VirtualBox Manager shows it as "powered off". How can that be? This might be the reason for the "heartbeat" problem. Shutting down BOINC fully put the VM into the "saved" state. Restarting BOINC started the VM normally. |
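
For anyone wanting to relate the StartLog excerpt quoted earlier in this thread to the CLAIM_WORKLIFE change, here is a minimal Python sketch. It only measures how long the slot was Busy for that one job and compares the result with the old and new values (1200s and 86400s). The helper names `parse_timestamp` and `busy_seconds` are made up for illustration, and the assumption that CLAIM_WORKLIFE limits how long a claim keeps accepting new jobs comes from the general HTCondor documentation; the thread does not confirm that this is the cause of the error.

```python
from datetime import datetime

# Two lines copied from the StartLog excerpt quoted in this thread.
LOG_LINES = [
    "08/06/16 10:37:52 Changing activity: Idle -> Busy",
    "08/06/16 11:31:36 Changing activity: Busy -> Idle",
]

OLD_WORKLIFE = 1200   # seconds, server value before the change
NEW_WORKLIFE = 86400  # seconds, server value after the change


def parse_timestamp(line):
    """Parse the leading 'MM/DD/YY HH:MM:SS' stamp of a StartLog line."""
    return datetime.strptime(line[:17], "%m/%d/%y %H:%M:%S")


def busy_seconds(lines):
    """Seconds between the first 'Idle -> Busy' and the first 'Busy -> Idle'."""
    start = next(parse_timestamp(l) for l in lines if "Idle -> Busy" in l)
    end = next(parse_timestamp(l) for l in lines if "Busy -> Idle" in l)
    return int((end - start).total_seconds())


elapsed = busy_seconds(LOG_LINES)
print(elapsed, elapsed > OLD_WORKLIFE, elapsed > NEW_WORKLIFE)
# prints: 3224 True False
```

The job kept the slot busy for 3224 seconds, which is longer than the old 1200s value and well within the new 86400s value. That would be consistent with the claim having expired by the time the job finished, though it is only one possible reading of the log.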
©2024 CERN