61) Message boards : General Discussion : Scheduler Change (Message 6860)
Posted 27 Nov 2019 by Profile Laurence
Post:
Thanks for testing. I am going to update the production scheduler, as I think it is an improvement over what we have now.
62) Message boards : General Discussion : Scheduler Change (Message 6856)
Posted 27 Nov 2019 by Profile Laurence
Post:
There is still a small issue left to be solved. If I have 3 CPUs and have set Max 2 CPUs, the request comes back with two 2-CPU jobs, whereas I would like one 2-CPU job and one 1-CPU job. If these are separate requests, everything is fine. Anyway, I have opened an issue on the topic to hopefully get input from others.
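The split asked for above can be sketched as a greedy fill: each job takes as many CPUs as the per-job maximum allows, and whatever is left over becomes a smaller job rather than another full-size one. This is a minimal C++ sketch under stated assumptions, not the BOINC scheduler's actual logic: `split_cpus` is a hypothetical name, and it assumes a single request should cover all idle CPUs.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of BOINC): split the CPUs available on a
// host into per-job thread counts, honouring "Max # CPUs" (per job) and
// "Max # jobs". Each job greedily takes up to max_cpus_per_job CPUs; the
// remainder becomes a smaller job instead of being rounded up.
std::vector<int> split_cpus(int available, int max_cpus_per_job, int max_jobs) {
    assert(available >= 0 && max_cpus_per_job > 0 && max_jobs >= 0);
    std::vector<int> jobs;
    int remaining = available;
    while (remaining > 0 && static_cast<int>(jobs.size()) < max_jobs) {
        const int n = std::min(remaining, max_cpus_per_job);
        jobs.push_back(n);
        remaining -= n;
    }
    return jobs;
}
```

With 3 CPUs, Max 2 CPUs and Max 2 jobs, this returns {2, 1} instead of {2, 2}. With Max 1 job it returns {2}, and with Max 1 CPU and Max 2 jobs on a 2-CPU host it returns {1, 1}.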
63) Message boards : General Discussion : Scheduler Change (Message 6855)
Posted 27 Nov 2019 by Profile Laurence
Post:
I think I have found a solution. Please can you see if the settings work as expected. You should get the following:

Max 2 CPU, Max 1 Job => one 2-threaded job
Max 1 CPU, Max 2 Jobs => two single-threaded jobs
64) Message boards : General Discussion : Scheduler Change (Message 6851)
Posted 26 Nov 2019 by Profile Laurence
Post:
(not sure where MAX_CPUS is set; looks like a global setting)

In the header file.

// Caps n (the host's usable CPU count) at the "Max # CPUs" preference, if set.
if (project_prefs.max_cpus) {
    if (n > project_prefs.max_cpus) {
        n = project_prefs.max_cpus;
    }
}

This code is here, without any doubt, but why here?
Suppose a user wants to run 2-core ATLAS tasks and sets the web preferences to "2 CPUs".
This code will make every brand-new 256-core AMD/Intel ripper look like Fred Flintstone's 2-core machine.


These lines have now been deleted. Will look where they should be put.
65) Message boards : General Discussion : Scheduler Change (Message 6850)
Posted 26 Nov 2019 by Profile Laurence
Post:
Setting
Max # jobs 3
Max # CPUs 2

With no app_config in use, I got a third task and avg_ncpus stays at 1, creating a single-core VM.
Requesting new tasks, I get: lhcathome-dev 26 Nov 19:06:18 This computer has reached a limit on tasks in progress
When Max # CPUs was misused as a task limit, you got 'No tasks available'.

This is what I would expect for Theory, as it is now a single-core app. Try CMS, which is multi-core.
66) Message boards : General Discussion : Scheduler Change (Message 6846)
Posted 26 Nov 2019 by Profile Laurence
Post:
I have disabled the Max # CPUs setting, so changing it should have no effect.
67) Message boards : General Discussion : Scheduler Change (Message 6845)
Posted 26 Nov 2019 by Profile Laurence
Post:
As far as I understand the scheduler code, the issue is that the project preferences setting limits ncpus and that this value is then used to set the number of threads. We probably want to leave ncpus alone and only set the number of threads.
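A minimal sketch of that separation, assuming hypothetical `HostInfo` and `ProjectPrefs` records (the real scheduler structures differ): the preference caps only the thread count of a multi-core job, while the host's `ncpus` keeps the real core count, so a 256-core host is not reported as a 2-core one.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for the scheduler's host record and the
// project preferences; the real BOINC structures differ.
struct HostInfo {
    int ncpus;  // CPUs the client reports; never overwritten below
};

struct ProjectPrefs {
    int max_cpus;  // "Max # CPUs" web preference; 0 means "not set"
};

// Threads for one multi-core job: capped by the preference when it is set,
// while host.ncpus (taken by const reference) keeps the real core count for
// every other scheduling decision.
int threads_per_job(const HostInfo& host, const ProjectPrefs& prefs) {
    assert(host.ncpus > 0 && prefs.max_cpus >= 0);
    if (prefs.max_cpus > 0) {
        return std::min(host.ncpus, prefs.max_cpus);
    }
    return host.ncpus;
}
```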
68) Message boards : Theory Application : New version 5.00 (Message 6791)
Posted 29 Oct 2019 by Profile Laurence
Post:
I am planning to put v5.18 on the production server tomorrow. Any objections?

I will have to do this later, as the 32-bit app also needs to be updated.
69) Message boards : Theory Application : New version 5.00 (Message 6788)
Posted 28 Oct 2019 by Profile Laurence
Post:
I am planning to put v5.18 on the production server tomorrow. Any objections?
70) Message boards : Theory Application : New version 5.00 (Message 6737)
Posted 2 Oct 2019 by Profile Laurence
Post:
The changes are included in v5.18.
71) Message boards : Theory Application : New version 5.00 (Message 6736)
Posted 2 Oct 2019 by Profile Laurence
Post:
A new version (v5.15) is available that hopefully fixes all outstanding issues.
This job info (an example from a task from when we still ran several jobs per task until 12 hours of elapsed time had passed):
2019-07-02 16:33:11 (372): Guest Log: [INFO] New Job Starting in slot1
2019-07-02 16:33:11 (372): Guest Log: [INFO] Condor JobID:  502264.4 in slot1
2019-07-02 16:33:16 (372): Guest Log: [INFO] MCPlots JobID: 50563426 in slot1
2019-07-02 16:33:22 (372): Guest Log: [INFO] ===> [runRivet] Tue Jul  2 16:33:08 CEST 2019 [boinc ee zhad 206 - - pythia6 6.427 358 100000 76]
2019-07-02 16:44:39 (372): Guest Log: [INFO] Job finished in slot1 with 0.
has not made it into the code so far (unless it is in v5.16).
The line ===> [runRivet] etc. would suffice if it were added to stderr.txt directly after runc has started.

Fixed in v5.18.
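Picking out that line is a simple substring search. The C++ sketch below is illustrative only: `runrivet_summary` is a hypothetical helper, and it assumes the marker `===> [runRivet]` appears verbatim in the guest log line, as in the example above. It returns the text from the marker onwards, or nothing if the line has no marker (C++17 for `std::optional`).

```cpp
#include <optional>
#include <string>

// Hypothetical helper: given one guest log line, return the part starting
// at the "===> [runRivet]" marker, i.e. the summary the post suggests
// writing to stderr.txt once runc has started.
std::optional<std::string> runrivet_summary(const std::string& log_line) {
    static const std::string marker = "===> [runRivet]";
    const auto pos = log_line.find(marker);
    if (pos == std::string::npos) {
        return std::nullopt;  // not the runRivet line
    }
    return log_line.substr(pos);
}
```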
72) Message boards : Theory Application : New version 5.00 (Message 6733)
Posted 1 Oct 2019 by Profile Laurence
Post:
I have just defined the variable. Let's see how far that gets us.
73) Message boards : Theory Application : New version 5.00 (Message 6731)
Posted 1 Oct 2019 by Profile Laurence
Post:
To solve the issue, this section from https://gitlab.cern.ch/vc/vm/blob/master/sbin/bootstrap must be inserted before line 27.

If we are just going to create that file, do we need the part that downloads it or should we just define the variable?
74) Message boards : Theory Application : New version 5.00 (Message 6729)
Posted 1 Oct 2019 by Profile Laurence
Post:
A new version (v5.15) is available that hopefully fixes all outstanding issues. Local proxy use should also be supported as well as OpenHTC.io.
75) Message boards : Theory Application : New version 5.00 (Message 6728)
Posted 1 Oct 2019 by Profile Laurence
Post:
Is this the answer to my question?
How can I extend the job duration?

This will be fixed in the next version.
76) Message boards : Theory Application : New version 5.00 (Message 6726)
Posted 1 Oct 2019 by Profile Laurence
Post:
Some errors that do not seem to affect the correct working of today's new vdi

Am working on it.
77) Message boards : Theory Application : New version 5.00 (Message 6725)
Posted 1 Oct 2019 by Profile Laurence
Post:
The above task needed 23 hours of run time, thus exceeding the default maximum job duration of 18 hours.
I have increased this to ten days.
In the last Theory_2019_09_30.xml I received, the job_duration still is 64800 (18 hours).
I can change it myself, and will if I have a long runner, but it would be better to extend the job duration on the server side.
Maybe you changed it, but it is not being sent to the clients.

I thought I had changed it, but the changed file wasn't picked up.
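A quick check of the numbers in this exchange, assuming `job_duration` is expressed in seconds (as 64800 = 18 hours implies). The helper names are hypothetical; the `static_assert`s simply confirm the arithmetic at compile time.

```cpp
// Hypothetical unit helpers; job_duration is assumed to be in seconds.
constexpr long hours_to_seconds(long hours) { return hours * 3600; }
constexpr long days_to_seconds(long days) { return days * 24 * 3600; }

// True if a job's run time is longer than the configured limit.
constexpr bool exceeds_job_duration(long run_seconds, long job_duration) {
    return run_seconds > job_duration;
}

static_assert(hours_to_seconds(18) == 64800, "old default: 18 hours");
static_assert(days_to_seconds(10) == 864000, "new limit: ten days");
static_assert(exceeds_job_duration(hours_to_seconds(23), 64800),
              "a 23-hour task exceeds the 18-hour default");
static_assert(!exceeds_job_duration(hours_to_seconds(23), days_to_seconds(10)),
              "a 23-hour task fits in ten days");
```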
78) Message boards : Theory Application : New version 5.00 (Message 6719)
Posted 30 Sep 2019 by Profile Laurence
Post:
... causes top to be stopped if a user accidentally hits a key at the top console.

If possible, be so kind as to use a tty number that is not related to one of the usual function keys (F1..F12).
If /dev/tty7 is used and a user hits ALT-F7, the top process at /dev/tty3 will stop.

I will fix this when I address the proxy functionality.
79) Message boards : Theory Application : New version 5.00 (Message 6717)
Posted 30 Sep 2019 by Profile Laurence
Post:
top has disappeared on ALT-F3

Right.
The recent bootstrap redirects stdin from /dev/null.
I also tested that and a couple of other sources like files or fifos.
None of them worked except /dev/ttyn.
Hence I suggested using a "fake" tty, e.g. /dev/tty13.
# doesn't work
bash -c "top -d 5 >/dev/tty3 2>/dev/null </dev/null" &

# might work, hence should be tested.
bash -c "top -d 5 >/dev/tty3 2>/dev/null </dev/tty13" &


Fixed in v5.11

Thanks.
80) Message boards : Theory Application : New version 5.00 (Message 6715)
Posted 30 Sep 2019 by Profile Laurence
Post:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2827020
runspec=boinc pp jets 13000 180,-,3560 - pythia8 8.235 tune-1 91000 132

The above task needed 23 hours of run time, thus exceeding the default maximum job duration of 18 hours.

I have increased this to ten days.




©2024 CERN