Message boards : Number crunching : CMS doesn't crunch

Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 547 - Posted: 12 Aug 2015, 18:25:04 UTC

Hi, as already mentioned in the News-Thread:

Okay, I checked around and got the feeling that CMS should work and do something useful now?

Re-enabled CMS, downloaded the latest files and started. So far, so good.

Inside the VM nothing useful seems to happen. I can go to console 3, but most of the time it is idle; %CPU is only 0.3 to 1.3, used by top or init.

Runtime is now 1 hour and 15 minutes, but nothing real is happening.

If I go through consoles 1 - 10, I see nothing that gives me the feeling of crunching.

What can I do? What should I do?

EDIT: Windows 7, VB 4.3.12


I played around a bit: cancelled the running job after 6 hours (it was still doing nothing at all), aborted the task and fetched a new WU.

While booting the new VM, I could take the following snapshot:



Don't know if this really matters, but so far (12 minutes in) the CMS VM still does not seem to be crunching anything.
ID: 547 · Rating: 0
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 548 - Posted: 12 Aug 2015, 18:32:01 UTC - in response to Message 547.  

By the way, I'm behind a strict firewall, and you have set up a new configuration / technique. Does it work with the ports and IPs listed at http://lhcathome.web.cern.ch/faq ?



This project uses the following ports:

•Jabber messaging which needs XMPP (port 5222),
•Chirp (port 9094) for moving data in and out,
•HTTP (port 80) and
•HTTPS (port 443)

And if you want, you can grant access to the entire CERN network:
•137.138.0.0/16
•128.141.0.0/16
•128.142.0.0/16

No other ports are used by the project.
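
A minimal sketch of how the ports listed above could be probed from behind a firewall, assuming Python 3 is available on the host. The helper names `port_reachable` and `probe` are hypothetical and not part of any CERN or BOINC tooling, and the host to probe has to be supplied by the user since the FAQ excerpt names no specific server.

```python
import socket

# Ports taken from the FAQ excerpt above.
DOCUMENTED_PORTS = {5222: "XMPP (Jabber)", 9094: "Chirp", 80: "HTTP", 443: "HTTPS"}


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


def probe(host: str, timeout: float = 5.0) -> dict:
    """Map each documented port to True/False for the given host."""
    return {port: port_reachable(host, port, timeout) for port in DOCUMENTED_PORTS}
```

Calling `probe("some.server.name")` with a server the VM actually contacts returns a dictionary such as `{5222: True, 9094: False, ...}`. A `False` entry only says the TCP connection failed from this host; it does not prove that the firewall is the cause.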

ID: 548 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1126
Credit: 7,861,186
RAC: 9
Message 551 - Posted: 12 Aug 2015, 20:38:49 UTC - in response to Message 547.  

Are you getting tasks since I submitted more jobs? The screenshot you posted is familiar to me; it does continue after a while. There are lots of delays in the startup processes, mainly to do with downloading files, filling the cvmfs cache, etc. When you start throwing in time-out delays before switching to secondary servers, etc., it is a bit glacial at times.
ID: 551 · Rating: 0
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 552 - Posted: 12 Aug 2015, 21:58:44 UTC

Just checked, but it looks as if nothing is crunching.

I have to stop for now; we can continue tomorrow.
ID: 552 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1126
Credit: 7,861,186
RAC: 9
Message 553 - Posted: 12 Aug 2015, 22:05:00 UTC - in response to Message 552.  

Well, something seems to be running Condor jobs, but just a few of them so far.
ID: 553 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1126
Credit: 7,861,186
RAC: 9
Message 554 - Posted: 13 Aug 2015, 7:22:07 UTC - in response to Message 553.  

Well, something seems to be running Condor jobs, but just a few of them so far.

OK, they've been running overnight:



At the moment, I think there are only 45 Condor slots available; in addition, each job is retried twice after failure. For a more detailed analysis, here's a snapshot from CMS Dashboard (.pdf).
ID: 554 · Rating: 0
Phil

Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 555 - Posted: 13 Aug 2015, 10:56:41 UTC - in response to Message 553.  
Last modified: 13 Aug 2015, 10:59:45 UTC

Well, something seems to be running Condor jobs, but just a few of them so far.

I've got 2 machines today (Thursday) running cmsRun jobs.
Took about 15 mins from bootup and now doing jobs taking 30-50 mins.
ID: 555 · Rating: 0
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 556 - Posted: 13 Aug 2015, 15:49:01 UTC

Okay, so far I'm out of ideas :-(

I have updated my VB to 4.3.30, aborted the old CMS-dev WU and downloaded a new one, but it is still the same. I gave it 30 to 40 minutes to start crunching, but nothing happens.

I have opened a big hole in my firewall to check whether the firewall is blocking anything, but nothing helps :-(

Are there still Jobs in the Queue ?

On which Screen ALT/F? should I see a crunching WU ?

ALT/F1 says: Starting vmcontext_epilog
............ bootlogd: no process killed

ALT/F2 says: sh
............ CMSJobAgent.sh
............ python
............ wget

ALT/F3 says: normal Task overview

ALT/F4 /F5 : blank Screen

ALT/F6 says: welcome to CERN Virtual Machine Version 3.3.0.20
............ ...
............ localhost Login

ALT/F7 - F9: nothing

ALT/F10 says: welcome to CERN Virtual Machine Version 3.3.0.20
............ ...
............ Instance pairing pin:
ID: 556 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1126
Credit: 7,861,186
RAC: 9
Message 558 - Posted: 13 Aug 2015, 21:23:00 UTC - in response to Message 556.  

Okay, so far I'm out of ideas :-(
Are there still Jobs in the Queue ?
Last time I looked there were 999 or so. However, we only have about 45 slots to serve them out and there are about 90 machines asking for jobs...
On which Screen ALT/F? should I see a crunching WU ?
ALT/F3 says: normal Task overview
That would usually be top; if you type 'u' then 'boinc' you should reduce the clutter to just crunching jobs, if any...
ALT/F6 says: welcome to CERN Virtual Machine Version 3.3.0.20
............ ...
............ localhost Login
You can log-in there if you can suss the password... :-)
ID: 558 · Rating: 0
Phil

Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 559 - Posted: 13 Aug 2015, 21:53:13 UTC - in response to Message 558.  
Last modified: 13 Aug 2015, 21:54:58 UTC

However, we only have about 45 slots to serve them out and there are about 90 machines asking for jobs...

My PS display shows either Nothing, or glidin_startup with "sleep" for the past few hours, and loadav of 0.00, 0.00, 0.01

[edit] Wow, as I type it's found another job, did you kick something?[/edit]
ID: 559 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1126
Credit: 7,861,186
RAC: 9
Message 560 - Posted: 13 Aug 2015, 22:19:55 UTC - in response to Message 559.  
Last modified: 13 Aug 2015, 22:22:36 UTC

[edit]Wow, as I type it's found another job, did you kick something?[/edit]
No, you probably got lucky and struck a ready slot. I'll ask Andrew tomorrow if we can increase them. As I'm sure you know, we're really still in an alpha, or at least pre-beta, stage, and didn't expect this level of interest before we officially went public.
I am pleased at the progress the development team has made lately though. I have a meeting in four weeks where I'd like to present significant results from CMS@Home compared to normal GRID jobs. Hopefully, summer holidays permitting, we might make that milestone.
We must also look at the "Server Status" page -- the job backlog there bears no resemblance to what I believe is the actual situation. Historical baggage not updated perhaps?
ID: 560 · Rating: 0
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 561 - Posted: 13 Aug 2015, 23:15:44 UTC

Okay, on my laptop I installed VB 4.3.28, and from home it worked immediately; I could crunch my first real CMS WU.

On my desktop in the office, I reset the CMS project, but that didn't help. So I can narrow it down to two candidates:

VirtualBox 4.3.30
Firewall

Tomorrow I will first try with VB 4.3.28.
ID: 561 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1126
Credit: 7,861,186
RAC: 9
Message 562 - Posted: 14 Aug 2015, 10:19:13 UTC - in response to Message 560.  

you probably got lucky and struck a ready slot. I'll ask Andrew tomorrow if we can increase them. As I'm sure you know, we're really still in an alpha, or at least pre-beta, stage, and didn't expect this level of interest before we officially went public.

Looks like I'd misunderstood -- the number of job slots isn't fixed, so since running jobs < active tasks (according to ServerStatus) there must be other reasons for tasks to sit idle. :-(
ID: 562 · Rating: 0
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 563 - Posted: 14 Aug 2015, 10:52:28 UTC

Okay, I think I finally got it !

Take a guess: was it VB 4.3.30 or the firewall?

Right, it was the firewall: a range of IPs and ports that are not listed in the official network statement from CERN. I will start a separate thread about this.

I had to open several ports and IPs in the firewall, but now it looks as if both boxes are doing fine.
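
A rough sketch of how destinations seen in a firewall log could be compared against the published rules, assuming the documented ports and CERN ranges quoted earlier in this thread. The function name `undocumented` and the sample endpoints are purely hypothetical; they are not the ports or IPs that were actually opened here.

```python
import ipaddress

# Values quoted from the project FAQ earlier in this thread.
DOCUMENTED_PORTS = {5222, 9094, 80, 443}
CERN_NETWORKS = [
    ipaddress.ip_network(n)
    for n in ("137.138.0.0/16", "128.141.0.0/16", "128.142.0.0/16")
]


def undocumented(endpoints):
    """Return the (address, port) pairs that the published rules would not allow.

    Assumption: an endpoint counts as documented only if its port is listed
    AND its address lies inside one of the CERN ranges.
    """
    flagged = []
    for address, port in endpoints:
        ip = ipaddress.ip_address(address)
        in_range = any(ip in net for net in CERN_NETWORKS)
        if not (in_range and port in DOCUMENTED_PORTS):
            flagged.append((address, port))
    return flagged


# Hypothetical sample data, e.g. copied by hand from a firewall log.
observed = [("137.138.1.2", 443), ("128.142.5.6", 8080), ("192.0.2.10", 80)]
print(undocumented(observed))
```

With this sample data, the second entry is flagged for its port and the third for its address, which is the kind of list that would be useful to post in a follow-up thread.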

Now I need an admin from CMS who can check whether my boxes are really doing fine now and whether you are getting back everything you need from them.

Thanks in advance

By the way: It would be nice to get a link where we can check this ourselves like "MCPLOTS Stats" from vLHC
ID: 563 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1126
Credit: 7,861,186
RAC: 9
Message 567 - Posted: 14 Aug 2015, 13:20:54 UTC - in response to Message 563.  

Now I need an admin from CMS who can check whether my boxes are really doing fine now and whether you are getting back everything you need from them.
We're working on that; I'm told it will be "real soon now".

By the way: It would be nice to get a link where we can check this ourselves like "MCPLOTS Stats" from vLHC
I guess we'll work on that too, but let's take things one step at a time.
ID: 567 · Rating: 0
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 568 - Posted: 14 Aug 2015, 13:27:24 UTC - in response to Message 567.  

Now I need an admin from CMS who can check whether my boxes are really doing fine now and whether you are getting back everything you need from them.
We're working on that; I'm told it will be "real soon now".

Could you check this once for me? I would like to make sure that I found all the needed ports and IPs.
ID: 568 · Rating: 0
Profile ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1126
Credit: 7,861,186
RAC: 9
Message 569 - Posted: 14 Aug 2015, 16:58:03 UTC - in response to Message 568.  

Now I need an admin from CMS who can check whether my boxes are really doing fine now and whether you are getting back everything you need from them.
We're working on that; I'm told it will be "real soon now".

Could you check this once for me? I would like to make sure that I found all the needed ports and IPs.

The only way I know (at the moment) is to wait for your jobs to finish and look at the stderr; Laurence may know some magic incantations to identify jobs-in-progress. If your "top" window is showing cmsRun at ~100% CPU for 15 minutes or so (perhaps more depending on what jobs are in the queue) and then nothing for ~10 minutes, then running again, you are at least getting jobs and running them successfully.
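
The busy/idle pattern described above can be sketched as a small check over CPU readings, assuming one %CPU sample for cmsRun is noted per minute (for example by watching top). The function names, the 90% threshold and the 10-minute minimum are illustrative assumptions, not project-defined values.

```python
from itertools import groupby


def busy_idle_runs(cpu_percent, threshold=90.0):
    """Collapse per-minute CPU samples into [(state, minutes), ...] runs."""
    states = ("busy" if sample >= threshold else "idle" for sample in cpu_percent)
    return [(state, sum(1 for _ in group)) for state, group in groupby(states)]


def cycles_through_jobs(cpu_percent, threshold=90.0, min_busy=10):
    """True if at least two busy runs of min_busy+ minutes appear in the samples.

    Consecutive busy runs are always separated by an idle gap, because
    adjacent samples with the same state are merged into a single run.
    """
    runs = busy_idle_runs(cpu_percent, threshold)
    long_busy = [n for state, n in runs if state == "busy" and n >= min_busy]
    return len(long_busy) >= 2


# Hypothetical readings: 15 busy minutes, 10 idle minutes, 15 busy minutes.
samples = [99.0] * 15 + [1.0] * 10 + [98.0] * 15
print(busy_idle_runs(samples))
print(cycles_through_jobs(samples))
```

A host that never leaves the idle state, like the one described at the start of this thread, would produce a single idle run and fail the check.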
ID: 569 · Rating: 0
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 570 - Posted: 14 Aug 2015, 19:56:35 UTC

Okay, it seems as if we are out of jobs now. Both machines are sitting here idle.

------------------------------------------

One more for the to-do list: I suspended BOINC on the laptop, shut it down and took it home with me. At home I switched it on and resumed BOINC, but the VM seemed to have gone into limbo: all screens were black and there was no reaction to ALT/Fx.

I started it with a display (not headless) and sent CTRL-ALT-DEL and "Switch off via ACPI"; I don't know which command did it, but the VM shut itself down.

Afterwards I could use it as normal in BOINC.
ID: 570 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1167
Credit: 785,507
RAC: 664
Message 571 - Posted: 14 Aug 2015, 20:04:41 UTC - in response to Message 568.  

Now I need an admin from CMS who can check whether my boxes are really doing fine now and whether you are getting back everything you need from them.
We're working on that; I'm told it will be "real soon now".

Could you check this once for me? I would like to make sure that I found all the needed ports and IPs.

Maybe you'll find some information in the CMS@home machine logs, accessible by BOINC Manager.
Highlight the CMS-dev task and press the 'Show graphics' button on the left.
ID: 571 · Rating: 0
Yeti
Joined: 29 May 15
Posts: 147
Credit: 2,842,484
RAC: 0
Message 572 - Posted: 14 Aug 2015, 20:05:13 UTC

Sorry, but one more:

The VMs seem to differ from host to host.

My Laptop says in ALT/F1:

starting VM_context Epilog ...
bootlogd: no process killed
started CMS Job Agent

My desktop shows only this in ALT/F1:
starting VM_context Epilog ...
bootlogd: no process killed

On my laptop ALT/F4 shows a full-screen log, but on my desktop ALT/F4 shows nothing.
ID: 572 · Rating: 0


©2024 CERN