Message boards : CMS Application : No Tasks

Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 5277 - Posted: 16 Dec 2017, 12:59:07 UTC - in response to Message 5276.  

Why is there a queue of 100 on LHCb and theory on the dev project?


The queue length of 100 is good for the dev-project, as admin does not check very often.

Maybe there are so few users on CMS because it often fails.

If there is nothing to test, they should shut down the dev project and make people go to the production project.
ID: 5277 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,718
RAC: 266
Message 5278 - Posted: 18 Dec 2017, 9:09:39 UTC - in response to Message 5275.  

The queue is down to 10.
Maybe it is time to intervene?

Seems I was wrong a few days ago; the queue was manually filled to 100. Now that we have the new WMAgent running again, we're using automatic task creation again with a goal of 10 tasks waiting. That's usually enough, although if it were more I wouldn't have to check as much. :-)
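
As a rough illustration of the "goal of 10 tasks waiting" idea, here is a minimal sketch of a top-up rule. The function name and the numbers in the loop are hypothetical, and this is not the actual WMAgent logic, only the arithmetic of refilling a queue to a target.

```python
def tasks_to_create(waiting: int, goal: int = 10) -> int:
    """Return how many new tasks bring the queue back up to `goal`.

    Hypothetical helper for illustration; not part of WMAgent.
    """
    if waiting < 0 or goal < 0:
        raise ValueError("counts must be non-negative")
    # A queue already at or above the goal is left alone.
    return max(goal - waiting, 0)


if __name__ == "__main__":
    for waiting in (0, 3, 10, 100):
        print(waiting, "waiting ->", tasks_to_create(waiting), "to create")
```

Under this rule, a queue that was manually filled to 100 gets no new tasks until it drains below the goal.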
ID: 5278 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 328,405
RAC: 158
Message 5279 - Posted: 18 Dec 2017, 14:33:09 UTC - in response to Message 5277.  
Last modified: 18 Dec 2017, 14:33:33 UTC

If there is nothing to test, they should shut down the dev project and make people go to the production project.


This I am happy to discuss. In general, the app is identical to production, but any changes go here first. If everyone thinks that we should only have tasks when we are testing an update, then we could do that.
ID: 5279 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 666
Credit: 1,807,614
RAC: 2,394
Message 5280 - Posted: 18 Dec 2017, 15:29:56 UTC - in response to Message 5279.  

It's better to test with 10 users here than with thousands in production!
Last week's upload server crash in production was a good example of what can happen.
ID: 5280 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,876,718
RAC: 266
Message 5281 - Posted: 18 Dec 2017, 22:03:20 UTC - in response to Message 5280.  

It's better to test with 10 users here than with thousands in production!
Last week's upload server crash in production was a good example of what can happen.

In general, I agree, though we should have some trickle-through and not a static situation.
For example, Laurence, what is our experience with multi-core/job VMs and are they ready to be deployed to production?
ID: 5281 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 5283 - Posted: 19 Dec 2017, 20:25:01 UTC - in response to Message 5281.  
Last modified: 19 Dec 2017, 20:25:49 UTC

...if it were more I wouldn't have to check as much


That is the point.
It would give you more time to react, if no new tasks are generated, for whatever reason.
Otherwise, volunteers' computers will go idle unnecessarily.
ID: 5283 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 5284 - Posted: 19 Dec 2017, 20:38:13 UTC - in response to Message 5279.  

If there were nothing to crunch here, I would not check very often whether there was anything new to test.
Therefore, doing "production"-like crunching here would be good; however, it would mean you have to check this site every once in a while, even when you are not testing anything new.
It is up to you: either maintain this test project to keep crunchers around for when you need to test new things, or do not run "production" tasks here and risk having no (or too few) volunteers to test things when you need them.
ID: 5284 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 328,405
RAC: 158
Message 5285 - Posted: 20 Dec 2017, 10:56:59 UTC - in response to Message 5284.  

I have thought about this and will not do anything. There are two modes of operation.

The first is having a continuous trickle of jobs to keep the system alive, constantly monitoring it and, when things break, checking the boards to see if there have been any updates. The other is to announce a change, submit a bunch of jobs and check that everything is working. As we have to make an announcement in both cases, the jobs are real (i.e. useful) jobs, and submitting tasks is automated, the two are almost the same from an operations perspective. As there is a constant trickle of jobs, you are free to decide which mode you wish to follow. Do you prefer to continuously monitor the output of tasks or the message boards?
ID: 5285 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1181
Credit: 815,336
RAC: 238
Message 5287 - Posted: 20 Dec 2017, 17:37:49 UTC - in response to Message 5285.  

Do you prefer to continuously monitor the output of tasks or the message boards?

I'm around here mostly once a day for News and other posts on the message boards, but it's fine to always have some jobs in the pipeline.
That means less work for you when something needs to be tested, like your new CMS version with the updated CVMFS proxy configuration at the moment.
I'll test that tomorrow, because tonight I want to shut down my machine, and longer suspensions are still evil for the CMS application.
ID: 5287 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 5478 - Posted: 10 Aug 2018, 17:57:23 UTC

So, is the CMS-project now completely dead? (dev and production)
ID: 5478 · Rating: 0
Ben Segal
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 12 Sep 14
Posts: 65
Credit: 544
RAC: 0
Message 5479 - Posted: 11 Aug 2018, 11:56:06 UTC - in response to Message 5478.  

So, is the CMS-project now completely dead? (dev and production)

No, the CMS people are still trying to get their production job submission working. Don't know how hard it is or how hard they are working - vacation time...
ID: 5479 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 5480 - Posted: 11 Aug 2018, 18:11:21 UTC - in response to Message 5479.  

Thanks for the information.
The project has not been working correctly/no tasks for a long time.
I believed it was dead.
ID: 5480 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1181
Credit: 815,336
RAC: 238
Message 5655 - Posted: 16 Nov 2018, 12:06:10 UTC - in response to Message 5479.  

So, is the CMS-project now completely dead? (dev and production)

No, the CMS people are still trying to get their production job submission working. Don't know how hard it is or how hard they are working - vacation time...

After a well-deserved vacation, CMS is back on the dev-server: https://lhcathomedev.cern.ch/lhcathome-dev/cms_job.php,
although so far all (6) returned tasks have failed.
I see a cmsRun busy for over an hour, running at almost 100%, but as before no output in running.log or stdout.log.
ID: 5655 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 328,405
RAC: 158
Message 5658 - Posted: 16 Nov 2018, 12:47:30 UTC - in response to Message 5655.  
Last modified: 16 Nov 2018, 12:47:46 UTC

Thanks, it may take a few iterations for this to work correctly. The main change is that the control of the job workflows is moving to the CMS central operations team.
ID: 5658 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 475
Credit: 389,411
RAC: 34
Message 5660 - Posted: 16 Nov 2018, 13:05:44 UTC - in response to Message 5658.  

... it may take a few iterations for this to work correctly.

Will there be a note here when the changes are implemented and should be tested?
ID: 5660 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1181
Credit: 815,336
RAC: 238
Message 5662 - Posted: 16 Nov 2018, 14:16:40 UTC - in response to Message 5658.  

Thanks, it may take a few iterations for this to work correctly. The main change is that the control of the job workflows is moving to the CMS central operations team.

After 1 hour VM uptime, I saw this. One of the iterations to change?

11/16/18 12:51:30 PERMISSION DENIED to gsi@unmapped from host 10.0.2.15 for command 448 (GIVE_STATE), access level READ: reason: READ authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15,10.0.2.15, hostname size = 1, original ip address = 10.0.2.15
11/16/18 12:51:30 DC_AUTHENTICATE: Command not authorized, done!
ID: 5662 · Rating: 0
captainjack

Joined: 18 Aug 15
Posts: 14
Credit: 125,335
RAC: 146
Message 5666 - Posted: 17 Nov 2018, 13:46:40 UTC

Got this error message on a CMS test task this morning.

11/17/2018 7:24:24 AM | lhcathome-dev | Aborting task CMS_2887042_1542275487.030765_0: exceeded disk limit: 8435.33MB > 7629.39MB

Task had been suspended overnight for a PC shutdown. Task aborted after startup this morning.
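
For anyone wanting to quantify the overshoot from a message like the one above, here is a minimal sketch that parses the two figures. The regular expression is written against this one log line only, and reading "MB" as 2^20 bytes is an assumption.

```python
import re

# The client message quoted in the post above.
LINE = ("11/17/2018 7:24:24 AM | lhcathome-dev | Aborting task "
        "CMS_2887042_1542275487.030765_0: exceeded disk limit: "
        "8435.33MB > 7629.39MB")

# Pattern written for this message only (an assumption, not a client spec).
PATTERN = re.compile(r"exceeded disk limit: ([\d.]+)MB > ([\d.]+)MB")


def disk_overshoot_mb(line: str) -> float:
    """Return used minus allowed disk space in MB, rounded to 2 decimals."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not contain a disk limit message")
    used, limit = (float(value) for value in match.groups())
    return round(used - limit, 2)


if __name__ == "__main__":
    print(disk_overshoot_mb(LINE), "MB over the limit")
    # If MB means 2**20 bytes (assumption), a limit of 8,000,000,000 bytes
    # matches the figure shown in the message.
    print(round(8_000_000_000 / 2**20, 2), "MB")
```

By this reading the task was roughly 806 MB over its limit, and the limit itself is consistent with 8,000,000,000 bytes.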

Please let me know if you need more information.
ID: 5666 · Rating: 0