No Tasks
Author | Message |
---|---|
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Why is there a queue of 100 on LHCb and Theory on the dev project? A queue length of 100 is good for the dev project, as the admin does not check very often. Maybe there are so few users on CMS because it often fails. If there is nothing to test, they should shut down the dev project and make people go to the production project. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
The queue is down to 10. Seems I was wrong a few days ago; the queue was manually filled to 100. Now that the new WMAgent is running again, we're back to automatic task creation with a goal of 10 tasks waiting. That's usually enough, although if it were more I wouldn't have to check as much. :-) |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
If there is nothing to test, they should shut down the dev project and make people go to the production project. This I am happy to discuss. In general, the app is identical to production but any changes go here first. If everyone thinks that we should only have tasks when we are testing an update, then we could do that. |
Send message Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0 |
It's better to test with 10 users here than with thousands in production! Last week's upload server crash in production was a good example of what can happen. |
Send message Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 13 |
It's better to test with 10 users here than with thousands in production! In general, I agree, though we should have some trickle-through and not a static situation. For example, Laurence, what is our experience with multi-core/job VMs and are they ready to be deployed to production? |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
...if it were more I wouldn't have to check as much. That is the point. It would give you more time to react if no new tasks are generated, for whatever reason. Otherwise, volunteers' computers will go idle unnecessarily. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
If there is nothing to crunch here, then I would not check very often whether there is anything new to test. Therefore, doing "production"-like crunching here would be good; however, it means you have to check this site every once in a while, even when you are not testing anything new. It is up to you: either maintain this test project to keep crunchers around for when you need to test new things, or do not run "production" tasks here and risk having no (or too few) volunteers to test things when you need them. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
I have thought about this and will not do anything. There are two modes of operation. The first is having a continuous trickle of jobs to keep the system alive, constantly monitoring and, when things break, checking the boards to see if there have been any updates. The other is to announce a change, submit a bunch of jobs and check that everything is working. As we have to make an announcement in both cases, the jobs are real (i.e. useful) jobs and submitting tasks is automated, from an operations perspective they are almost the same. As there is a constant trickle of jobs, you are free to decide which mode you wish to follow. Do you prefer to continuously monitor the output of tasks or the message boards? |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 874,807 RAC: 1,175 |
Do you prefer to continuously monitor the output of tasks or the message boards? I'm around here mostly once a day for News and other posts on the message boards, but it's fine to always have some jobs in the pipeline. It means less work for you when something needs to be tested, like your new CMS version with the updated CVMFS proxy configuration at the moment. I'll test that tomorrow, because tonight I want to shut down my machine and longer suspensions are still evil for the CMS application. |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
So, is the CMS project now completely dead (dev and production)? |
Send message Joined: 12 Sep 14 Posts: 65 Credit: 544 RAC: 0 |
So, is the CMS project now completely dead (dev and production)? No, the CMS people are still trying to get their production job submission working. Don't know how hard it is or how hard they are working - vacation time... |
Send message Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks for the information. The project has not been working correctly, or has had no tasks, for a long time. I believed it was dead. |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 874,807 RAC: 1,175 |
So, is the CMS project now completely dead (dev and production)? After a well-deserved vacation, CMS is back on the dev server: https://lhcathomedev.cern.ch/lhcathome-dev/cms_job.php, although so far all (6) returned tasks have failed. I see a cmsRun busy for over an hour, running at almost 100%, but as before no output in running.log or stdout.log. |
Send message Joined: 12 Sep 14 Posts: 1069 Credit: 334,882 RAC: 0 |
Thanks, it may take a few iterations for this to work correctly. The main change is that the control of the job workflows is moving to the CMS central operations team. |
Send message Joined: 28 Jul 16 Posts: 485 Credit: 394,839 RAC: 0 |
... it may take a few iterations for this to work correctly. Will there be a note here when the changes are implemented and should be tested? |
Send message Joined: 13 Feb 15 Posts: 1188 Credit: 874,807 RAC: 1,175 |
Thanks, it may take a few iterations for this to work correctly. The main change is that the control of the job workflows is moving to the CMS central operations team. After 1 hour VM uptime, I saw this. One of the iterations to change? 11/16/18 12:51:30 PERMISSION DENIED to gsi@unmapped from host 10.0.2.15 for command 448 (GIVE_STATE), access level READ: reason: READ authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15,10.0.2.15, hostname size = 1, original ip address = 10.0.2.15 11/16/18 12:51:30 DC_AUTHENTICATE: Command not authorized, done! |
Send message Joined: 18 Aug 15 Posts: 14 Credit: 125,335 RAC: 0 |
Got this error message on a CMS test task this morning. 11/17/2018 7:24:24 AM | lhcathome-dev | Aborting task CMS_2887042_1542275487.030765_0: exceeded disk limit: 8435.33MB > 7629.39MB Task had been suspended overnight for a PC shutdown. Task aborted after startup this morning. Please let me know if you need more information. |
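For anyone comparing the two figures in the "exceeded disk limit" message above, here is a minimal Python sketch of the arithmetic. It assumes the client's "MB" means 2**20 bytes; that is an assumption, not something confirmed in this thread. Under that assumption the 7629.39MB limit works out to roughly 8,000,000,000 bytes, which would suggest a round 8 GB (decimal) disk bound on the task. The names `MIB` and `mb_to_bytes` are made up for illustration only.

```python
# Assumption: the "MB" in the client message is 2**20 bytes (not 10**6).
MIB = 1024 * 1024


def mb_to_bytes(mb: float) -> int:
    """Convert a value in MB (as printed in the log) to bytes."""
    return round(mb * MIB)


# Values copied from the log line in the post above.
used_mb = 8435.33
limit_mb = 7629.39

limit_bytes = mb_to_bytes(limit_mb)
used_bytes = mb_to_bytes(used_mb)
overshoot_mb = round(used_mb - limit_mb, 2)

print(f"limit     = {limit_bytes:,} bytes")
print(f"used      = {used_bytes:,} bytes")
print(f"overshoot = {overshoot_mb} MB")
```

If the assumption holds, the task was about 806 MB over its limit when it was aborted after the overnight suspension.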
©2025 CERN