Message boards : CMS Application : Here we go again.....
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 736
Credit: 11,558,539
RAC: 1,940
Message 5428 - Posted: 20 May 2018, 3:56:57 UTC

2018-05-19 20:09:02 (6672): Guest Log: [INFO] Requesting an X509 credential from LHC@home
: Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
Guest Log: [INFO] Running the fast benchmark.
Guest Log: [INFO] Machine performance 11.67 HEPSEC06
: Guest Log: [INFO] CMS application starting. Check log files.
: Guest Log: [DEBUG] HTCondor ping
: Guest Log: [DEBUG] 0
Guest Log: [ERROR] Condor exited after 885s without running a job.
: Guest Log: [INFO] Shutting Down.
: VM Completion File Detected.
: VM Completion Message: Condor exited after 885s without running a job.
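For anyone tallying how many tasks ended this way, here is a minimal Python sketch that scans a task's log text for the "Condor exited after Ns without running a job" message and returns the seconds. The regex and the name `condor_idle_seconds` are my own assumptions based only on the log lines quoted in this thread. It is not an official LHC@home tool.

```python
import re
from typing import List

# Pattern taken from the log lines quoted in this thread (an assumption:
# other wrapper versions may word the message differently).
CONDOR_NO_JOB = re.compile(r"Condor exited after (\d+)s without running a job")


def condor_idle_seconds(log_text: str) -> List[int]:
    """Return the seconds reported by every 'no job' Condor exit in log_text."""
    return [int(m.group(1)) for m in CONDOR_NO_JOB.finditer(log_text)]


sample = (
    "Guest Log: [ERROR] Condor exited after 885s without running a job.\n"
    "VM Completion Message: Condor exited after 885s without running a job.\n"
)
secs = condor_idle_seconds(sample)
minutes, seconds = divmod(secs[0], 60)
print(secs, f"{minutes} min {seconds} s")  # [885, 885] 14 min 45 s
```

Both the Guest Log line and the VM Completion Message match, so count one hit per task log rather than one per match.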

I will try a couple more, and if it just keeps doing this, I will as usual switch back to Theory tasks.

(I have 42 over at LHC that I have not started yet, but they will probably do the same thing.)

You would think the person that sends these out would run a couple first.
Mad Scientist For Life
mmonnin

Joined: 20 Jun 17
Posts: 25
Credit: 2,940,586
RAC: 2,287
Message 5429 - Posted: 20 May 2018, 18:46:33 UTC - in response to Message 5428.  

You would think the person that sends these out would run a couple first


You would think they wouldn't generate tasks without subtasks.
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 736
Credit: 11,558,539
RAC: 1,940
Message 5432 - Posted: 2 Jun 2018, 8:17:35 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=1054

I almost tried running some here again and am glad I didn't (but I got 20+ of these *no jobs* tasks over at LHC).

I am glad to see it is also happening with Linux.
mmonnin

Joined: 20 Jun 17
Posts: 25
Credit: 2,940,586
RAC: 2,287
Message 5535 - Posted: 22 Sep 2018, 18:59:15 UTC

2018-09-22 08:42:29 (5764): Guest Log: [ERROR] Condor exited after 30947s without running a job.
2018-09-22 08:42:29 (5764): Guest Log: [INFO] Shutting Down.
2018-09-22 08:42:29 (5764): VM Completion File Detected.
2018-09-22 08:42:29 (5764): VM Completion Message: Condor exited after 30947s without running a job.

How does it run so long and use more than 1 CPU core without a job? Why are tasks available when there are no jobs? After years and years, I fail to see why this is still an issue.
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 736
Credit: 11,558,539
RAC: 1,940
Message 5536 - Posted: 22 Sep 2018, 19:31:05 UTC - in response to Message 5535.  

Well mmonnin, as usual even more server problems happen on the weekend, so nothing will get done, and we have the same problem over at LHC.

All the Theory tasks are gone over there, and I got close to 70 of those tasks that do nothing but end after about 23 minutes with the usual [ERROR] Condor exited after 759s without running a job.

And I always check the tasks of another member who runs lots of them, like I do, to make sure it is all of us wasting computer time.

Not to mention the new version over there gives only about 1/6 of the credit of the previous version, so that usually makes people switch to something else.

And here I am only running one LHCb at a time, since here in the western hemisphere we can't download a new, almost 1 GB VDI 5 different times on 10 different computers without paying even more money to the ISP and the power company.

BUT so far the new version of LHCb is working, though I am only halfway finished with the 2nd one on this PC.

And of course we are the only ones who watch what is going on, and we get no information.......until maybe Monday.

Maybe it was a bird flying over the LHC and dropping a piece of bread on the power supply box again.
Mad Scientist For Life
mmonnin

Joined: 20 Jun 17
Posts: 25
Credit: 2,940,586
RAC: 2,287
Message 5537 - Posted: 23 Sep 2018, 4:51:46 UTC

Server status said 1 user for CMS, and now it's 2 with me, so there's not much to compare to.

I thought CMS ended in about 10 min if there was no Condor connection, and that there would be no CPU time during those 10 min. These actually had more CPU time than run time and ran welllll past 10 min.

I'm running LHCb here. They are using a hefty 6 GB of RAM, but they are completing.
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,719,912
RAC: 3,195
Message 5538 - Posted: 23 Sep 2018, 5:15:46 UTC
Last modified: 23 Sep 2018, 5:17:35 UTC

CMS task finished with a Condor error:
2018-09-22 22:06:48 (9992): Status Report: Job Duration: '64800.000000'
2018-09-22 22:06:48 (9992): Status Report: Elapsed Time: '36000.000000'
2018-09-22 22:06:48 (9992): Status Report: CPU Time: '67702.484375'
2018-09-22 23:33:31 (9992): Guest Log: [ERROR] Condor exited after 40890s without running a job.
2018-09-22 23:33:31 (9992): Guest Log: [INFO] Shutting Down.
This task:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2414834
Is it possible that Condor doesn't know that a CMS job was running?
OpenHTC.io is missing!
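The Status Report lines above can be read as a rough average core count: CPU time divided by elapsed time. A minimal Python sketch, assuming both values are in seconds and that "Elapsed Time" is wall-clock time; `average_busy_cores` is a hypothetical helper name. The report was written about 87 minutes before the Condor error, so this is a snapshot rather than the final figure.

```python
def average_busy_cores(cpu_time_s: float, elapsed_s: float) -> float:
    """CPU seconds divided by elapsed seconds: the average number of busy cores."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_time_s / elapsed_s


# Values copied from the Status Report lines above (assumed to be seconds).
cores = average_busy_cores(67702.484375, 36000.0)
print(f"{cores:.2f} cores on average")  # 1.88 cores on average
```

A result near 2 means the VM kept almost two cores busy for ten hours, even though Condor later reported it never ran a job. That matches the "more CPU time than run time" observation in the previous post.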
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 736
Credit: 11,558,539
RAC: 1,940
Message 5539 - Posted: 23 Sep 2018, 8:24:44 UTC
Last modified: 23 Sep 2018, 8:26:08 UTC

Yes, the only thing working for me at CERN right now is LHCb here (but not at LHC).

But I only have this running on one PC, since I'm not going to download that VDI again on the other PCs for a while.

And as you mentioned, you need more than 8 GB of RAM to run more than one of these multi-core tasks, and this one only has 8 GB, so one at a time. (They don't use much of the CPU.)

My other 8-core PCs have 16-24 GB of RAM, but they were last running the previous versions of LHCb, and as you know we had 4 different version changes in 2 weeks.

This is a rare time when I have 7 of my computers shut down.
Mad Scientist For Life
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 467
Credit: 389,411
RAC: 503
Message 5540 - Posted: 23 Sep 2018, 11:39:49 UTC - in response to Message 5539.  

... they don't use much of the CPU ...

Well, during job processing LHCb should use 100 % of 1 core (per internal slot).
Usually for around 80 min on my hosts.
Each job should produce an intermediate result of about 60-80 MB that should be uploaded immediately to lbboinc01.cern.ch, TCP port 9148.
During this upload the CPU usage drops to nearly idle.

If your hosts are idle and you don't notice an upload, the jobs got stuck either during job setup or during stageout.
I suspect it's not an error on the volunteer's side but a wrong/missing connection to a backend system at CERN.
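To rule out the volunteer side of the upload path mentioned above, a TCP connect test to lbboinc01.cern.ch on port 9148 shows whether that host is reachable from your network. This is a minimal Python sketch using only the standard library. It only checks that a TCP connection can be opened, not that the upload protocol works, and the 5 s timeout is an arbitrary choice. `tcp_reachable` is a hypothetical helper name.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Run it as `tcp_reachable("lbboinc01.cern.ch", 9148)` on the host that runs the VM. A False result points at a local firewall or routing problem. A True result only rules those out and says nothing about the CERN backend.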
mmonnin

Joined: 20 Jun 17
Posts: 25
Credit: 2,940,586
RAC: 2,287
Message 5541 - Posted: 23 Sep 2018, 21:43:38 UTC - in response to Message 5540.  

... they don't use much of the CPU ...

Well, during job processing LHCb should use 100 % of 1 core (per internal slot).
Usually for around 80 min on my hosts.
Each job should produce an intermediate result of about 60-80 MB that should be uploaded immediately to lbboinc01.cern.ch, TCP port 9148.
During this upload the CPU usage drops to nearly idle.

If your hosts are idle and you don't notice an upload, the jobs got stuck either during job setup or during stageout.
I suspect it's not an error on the volunteer's side but a wrong/missing connection to a backend system at CERN.


It's using about half a CPU core on my 3570K and runs for like 12-13 hours. Much longer than 80 min.
Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 736
Credit: 11,558,539
RAC: 1,940
Message 5542 - Posted: 23 Sep 2018, 22:34:07 UTC

I am just watching Task Manager as it runs a 2-core task.

It is at 62% (11 hours of running), and Task Manager says it uses 5.5 GB of RAM with the CPU at 35%, and the CPU has been at that % every time I take a look.



©2024 CERN