41) Message boards : CMS Application : Dip? (Message 4902)
Posted 12 May 2017 by Rasputin42
Post:
Things don't always fail on Friday afternoon, but when they do there's a higher probability of no-one being around to fix it.



Understood.

Thanks, Ivan.
42) Message boards : CMS Application : Dip? (Message 4900)
Posted 12 May 2017 by Rasputin42
Post:
No, it's Friday night. :-( The WMAgent is down


Is this some anticipated (known to recur) incident that nobody can or wants to fix?

Maybe some scheduled maintenance should be implemented. Then, at least, these downtimes would occur at a known time, rather than at more or less random intervals.

Similar to the SETI downtime every Tuesday.
43) Message boards : CMS Application : Dip? (Message 4898)
Posted 12 May 2017 by Rasputin42
Post:
Cannot get new jobs?
Just a glitch?
44) Message boards : CMS Application : Dip? (Message 4894)
Posted 5 May 2017 by Rasputin42
Post:
Drop in the activity graph.


This time all graphs.
Still a dashboard issue?
45) Message boards : CMS Application : Dip? (Message 4859)
Posted 26 Apr 2017 by Rasputin42
Post:
WMAgent has died again. Set no new tasks if you can...



Déjà vu.
46) Message boards : Theory Application : A new 32bit image is available (Message 4850)
Posted 20 Apr 2017 by Rasputin42
Post:
I am running 32-bit tasks at LHC@home (no VT-x).

A LOT of time is wasted by tasks hitting the 18h mark and therefore not finishing the last job, losing at least 6h of computing time.
(This also applies to 64-bit tasks.)

About 3 out of 12 tasks end that way (in my case).

This issue has been known for many months, yet nothing appears to have been done to address it.

Is there any intention to make improvements, or is admin happy that it "kind of works good enough"?
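
To put a rough number on the figures above (3 of 12 tasks cut off, at least 6h lost each), here is a minimal sketch. It assumes every task runs for the full 18h, which overstates the total runtime, so the resulting percentage is only a lower bound. The function and constant names are made up for illustration.

```python
# Figures from the post; the "every task runs 18 h" total is an assumption.
TASKS_OBSERVED = 12          # tasks in the sample
TASKS_CUT_OFF = 3            # tasks that hit the 18 h mark mid-job
HOURS_LOST_PER_CUT_OFF = 6   # "at least 6h" lost per cut-off task
TASK_LIMIT_HOURS = 18        # assumed runtime of every task (upper bound)


def wasted_time(tasks, cut_off, hours_lost, limit_hours):
    """Return (hours lost, total hours, lost fraction) for the sample."""
    lost = cut_off * hours_lost
    total = tasks * limit_hours
    return lost, total, lost / total


lost, total, frac = wasted_time(
    TASKS_OBSERVED, TASKS_CUT_OFF, HOURS_LOST_PER_CUT_OFF, TASK_LIMIT_HOURS
)
print(f"at least {lost} h lost out of at most {total} h ({frac:.1%} or more)")
```

Under those assumptions, at least 18h out of at most 216h is lost, i.e. roughly 8% or more of the computing time.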
47) Message boards : CMS Application : CMS@Home downtime pencilled in for Thursday (Message 4845)
Posted 19 Apr 2017 by Rasputin42
Post:
Thanks for the notice.
What is the estimate for when jobs will be available again?
48) Message boards : CMS Application : Dip? (Message 4840)
Posted 11 Apr 2017 by Rasputin42
Post:
Please set No New Tasks if you can.


Too late!
This REALLY needs to be fixed.
It has happened 3 or more times.

I looked at the "Task Tracker" and NOTHING has been done for months.
It should be removed.
49) Message boards : Number crunching : Scheduler request failed: HTTP file not found (Message 4831)
Posted 6 Apr 2017 by Rasputin42
Post:
+1
(cms tasks cannot report, no new tasks are downloaded)
50) Message boards : CMS Application : New Version v48.00 (Message 4827)
Posted 4 Apr 2017 by Rasputin42
Post:
What happened here?

The task was running for too long. I suspended it and resumed.

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=320004


It reported shortly after that.
51) Message boards : CMS Application : Dip? (Message 4817)
Posted 24 Mar 2017 by Rasputin42
Post:
Thanks, Ivan.

BTW, the running jobs graph is getting more and more "spiky".
52) Message boards : CMS Application : Dip? (Message 4815)
Posted 23 Mar 2017 by Rasputin42
Post:
Here we go again -- something else in WMAgent has died. Set No New Tasks to protect your daily quota. :-(


This is a nice idea, if you catch it before the quota is used up.

I really think that needs to be fixed for good.
53) Message boards : CMS Application : New Version v48.00 (Message 4809)
Posted 20 Mar 2017 by Rasputin42
Post:
I have noticed that sometimes a finished job is listed several times, as finished_x.log, finished_x+1.log, finished_x+3.log.
The time difference is about 2 min, so I do not believe it is uploading again.

Has anyone noticed that?
54) Message boards : CMS Application : Dip? (Message 4806)
Posted 18 Mar 2017 by Rasputin42
Post:
Good news.
Is there a way to view the results?
You posted a link a little while ago, but I could not get it to work.
55) Message boards : CMS Application : Dip? (Message 4803)
Posted 18 Mar 2017 by Rasputin42
Post:
Thanks for the reply, Ivan.
I just thought I'd mention it.
If the BOINC task could be made not to produce an error (send a shutdown file to the "shared" folder) when no jobs are available, the problem would be solved.
56) Message boards : CMS Application : Dip? (Message 4801)
Posted 18 Mar 2017 by Rasputin42
Post:
No Jobs?


The bad thing about this is that BOINC tasks error out, ruining the quota.

Therefore, when jobs are available again, the quota is used up and no tasks can be run for another 24h.

Maybe it is a good time to address this issue.
57) Message boards : CMS Application : New Version v48.00 (Message 4797)
Posted 13 Mar 2017 by Rasputin42
Post:
Task only running for 10h.

Why is that?
I stopped it for a few minutes, but should it not run for at least 12h?

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=317641
58) Message boards : CMS Application : New Version v48.00 (Message 4795)
Posted 12 Mar 2017 by Rasputin42
Post:
I just checked the job startup times for a new 4-core task:
1623
1623
1644
1735

If there is a system to that, it escapes me.

Is there?
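
For reference, the gaps between those startups can be worked out with a small sketch like the one below. It assumes the four values are same-day clock times in HHMM format, which is my reading and not stated in the post. The helper name is made up for illustration.

```python
from datetime import datetime


def gaps_in_minutes(hhmm_times):
    """Minutes between consecutive HHMM clock times (assumed same day)."""
    parsed = [datetime.strptime(t, "%H%M") for t in hhmm_times]
    return [
        int((later - earlier).total_seconds() // 60)
        for earlier, later in zip(parsed, parsed[1:])
    ]


# Startup times as listed in the post, read as HHMM.
startups = ["1623", "1623", "1644", "1735"]
gaps = gaps_in_minutes(startups)
print(gaps)
```

Read that way, the first two jobs start in the same minute, the third 21 minutes later, and the fourth another 51 minutes after that.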
59) Message boards : CMS Application : New Version v48.00 (Message 4793)
Posted 12 Mar 2017 by Rasputin42
Post:
Thanks, Ivan.

It looks like the 4-core task is bypassing the host OS.
Network activity from the VM is not registered by the host, but the router sees it (of course).

Why that happens on the 4-core but not the 2-core tasks puzzles me.


It would be really nice if simultaneous upload of job results could be avoided.

2-core tasks are probably the most efficient, compared to single-core or 3+ core tasks.
60) Message boards : CMS Application : New Version v48.00 (Message 4791)
Posted 12 Mar 2017 by Rasputin42
Post:
I have been running two 2-core tasks for a while.
Then I tried a 4-core task.
I have noticed that the total upload is substantially less than for the 2-core tasks.
(Total up/download according to the router traffic for this computer: <30 MB for uploading 4 jobs.)

1) Has anyone noticed that?
2) Is it actually producing valid results? https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=317572
3) Why does this only happen with > 2 cores? (I have not tested 3-core tasks.)

According to BOINC, the result is valid.

