Message boards : Number crunching : Issues running jobs
Author | Message |
---|---|
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
OK, starting a new thread for people to report issues with running jobs, and hopefully to get feedback from the crew at the coalface as to what might be going on. Currently, the Server Status page reports 100 active tasks; on the Condor machine I see 49 of "my" jobs running, so there must be about fifty tasks that have not picked up a job to run. If your task is one of them*, please feel free to report here and ask for help. * To check, bring up the VM interface (you need the extension pack matching your VirtualBox version to be installed) and go to the "top" display using ALT+F3. You can de-clutter the display considerably by pressing "u" (for user) and then typing "boinc" followed by Enter to display only boinc's jobs. If all is well, about 20 minutes after starting you should see a process called cmsRun near the top of the display, and after a little while it should be showing close to 100% CPU usage. This will run for some time (the current jobs are a little over-ambitious and may take several hours) before stopping to report results and download a new job, at which point you should see cmsRun working again, and so on until the task times out at around 24 hours. |
Joined: 20 May 15 Posts: 217 Credit: 6,193,119 RAC: 0 |
I can see the one in front of me (on my laptop) is running fine (>95% CPU usage). Do you need me to look at all my machines, or only to report when I think there is a problem? |
Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
If your task is one of them*, please feel free to report here and ask for help. Okay, I need help with this: http://boincai05.cern.ch/CMS-dev/forum_thread.php?id=67 As long as this isn't fixed I cannot use my laptop, and I'm afraid I might lose a lot of crunching power by suspending tasks (and my network is configured to suspend when normal tasks ask for CPU power). |
Joined: 13 Feb 15 Posts: 1221 Credit: 920,536 RAC: 1,595 |
Currently, the Server Status page reports 100 active tasks; on the Condor machine I see 49 of "my" jobs running, so there must be about fifty tasks that have not picked up a job to run. If your task is one of them*, please feel free to report here and ask for help. I suppose the 100 'active' tasks you mean come from the CMS-dev Server Status page (at the moment 102 in progress). Those 102 are not necessarily 'active' in the sense of having already started on one of the BOINC clients. They may not have started because of a lower resource share compared to other BOINC projects, or because a user has suspended the task. Anyway, the client has a whole week to finish the 24-hour BOINC task. And of course there are users who are not babysitting their machines and are not aware that a started CMS task isn't doing real work: it is not using CPU, and the VM is not asking for or getting CMS jobs. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I have 3 VBoxHeadless.exe processes running in Windows. One is close to 100% (of one core) utilization, the other 2 have zero run-time. The task has been running for 20h or so without any interruption. This is no bug as such, but why are these extra 2 processes there? Is there a way to modify/set the amount of RAM the VirtualBox VM uses? (I tried, but it is locked.) Would a higher amount improve performance for the project? |
Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
I have 3 VBoxHeadless.exe processes running in Windows. One is close to 100% (of one core) utilization, the other 2 have zero run-time. The task has been running for 20h or so without any interruption. I have 5x VBoxHeadless and I run 1x CMS and 4x ATLAS. So I guess the 2 sleeping VBoxHeadless processes are from other, suspended projects. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
I do not run any other projects that use VirtualBox (nor have I for the past few months). |
Joined: 13 Feb 15 Posts: 1221 Credit: 920,536 RAC: 1,595 |
I have 3 VBoxHeadless.exe processes running in Windows. One is close to 100% (of one core) utilization, the other 2 have zero run-time. The task has been running for 20h or so without any interruption. It's how Oracle VirtualBox has set up the running of virtual machines: 1 VM uses 3 processes. The 3rd child is doing the real work and uses the most CPU. It has to do with running processes in a more secure sandbox. |
Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0 |
Thanks for the info. The two other instances have ZERO run-time, so they are doing absolutely nothing. |
Joined: 12 Sep 14 Posts: 1128 Credit: 339,230 RAC: 10 |
Yeti, solving the suspend/resume issue is high on our priority list. |
Joined: 13 Feb 15 Posts: 1221 Credit: 920,536 RAC: 1,595 |
The current cmsRun in the VM has already been running for 11 hours and 20 minutes and consumes the 'normal' ~98% of the CPU. I don't think that's right. There is no output on the ALT+F4 and ALT+F5 screens, and the cmsRun-stdout.log in the Logs is from the very first finished job. |
Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
The current cmsRun in the VM has already been running for 11 hours and 20 minutes and consumes the 'normal' ~98% of the CPU. This is normal behaviour on all my clients; the only way to see whether it is crunching is to use the ALT+F3 screen and look for cmsRun. |
Joined: 13 Feb 15 Posts: 1221 Credit: 920,536 RAC: 1,595 |
The current cmsRun in the VM has already been running for 11 hours and 20 minutes and consumes the 'normal' ~98% of the CPU. I think you don't understand or did not read well. I've been looking at the same cmsRun for 12 hours and 12 minutes already. Two days ago I had long-running jobs lasting about 3-4 hours, and after Ivan killed those longer ones, the jobs on my system normally lasted about 35 minutes (the equivalent of 200 records). |
Joined: 29 May 15 Posts: 147 Credit: 2,842,484 RAC: 0 |
I think you don't understand or did not read well. I've been looking at the same cmsRun for 12 hours and 12 minutes already. Indeed, I didn't realize this, sorry. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
The current cmsRun in the VM has already been running for 11 hours and 20 minutes and consumes the 'normal' ~98% of the CPU. I do see a job in the Condor system that has 0+12:45:50 runtime; nothing else is over 3h, so it does indeed look like something has gone amiss. Probably best to abort it so your CPU goes to better use. |
Joined: 13 Feb 15 Posts: 1221 Credit: 920,536 RAC: 1,595 |
I do see a job in the Condor system that has 0+12:45:50 runtime; nothing else is over 3h, so it does indeed look like something has gone amiss. Probably best to abort it so your CPU goes to better use. Ok, thanks. Will reboot the VM. |
Joined: 20 Jan 15 Posts: 1139 Credit: 8,310,612 RAC: 0 |
The current cmsRun in the VM has already been running for 11 hours and 20 minutes and consumes the 'normal' ~98% of the CPU. I've just come up with one like that too. When the first job was finishing there was nothing on tty4 and tty5 even though I was watching the stdout file growing as events were processed; when the second job started the cmsRun-stdout.log didn't get overwritten. There was also a failure in the stage-out, but I'll go straight to Laurence with that. [Edit] Firstly, there was no stage-out failure; my browser hadn't refreshed... Secondly, the later jobs which don't appear on the console or refresh the browser display appear to continue to run. I tracked one within the VM and when it finished the output file appeared on the stage-out server. (Currently we have 85.5% successful completion of the first 1000 jobs of the present batch, and 91% for the second half.) [/Edit] |
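
For anyone who wants to spot a runaway job like the one discussed above without eyeballing the list, here is a rough Python sketch. It assumes runtimes are written as `days+HH:MM:SS`, as in the `0+12:45:50` value quoted in the thread, and it uses the "nothing else is over 3h" remark as a default limit. The job labels and the function names `parse_runtime` and `find_overdue` are made up for illustration; this is not how the project itself checks jobs.

```python
import re
from typing import Dict, List

# Assumed format: "D+HH:MM:SS", matching the "0+12:45:50" runtime quoted in the thread.
_RUNTIME_RE = re.compile(r"^(\d+)\+(\d{2}):(\d{2}):(\d{2})$")


def parse_runtime(text: str) -> int:
    """Convert a 'days+HH:MM:SS' runtime string into a number of seconds."""
    match = _RUNTIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised runtime: {text!r}")
    days, hours, minutes, seconds = (int(group) for group in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_overdue(runtimes: Dict[str, str], limit_hours: float = 3.0) -> List[str]:
    """Return the labels of jobs whose runtime is longer than limit_hours."""
    limit_seconds = limit_hours * 3600
    return [label for label, runtime in runtimes.items()
            if parse_runtime(runtime) > limit_seconds]


if __name__ == "__main__":
    # Hypothetical job labels; only the first runtime comes from the thread.
    sample = {"job-a": "0+12:45:50", "job-b": "0+02:10:05", "job-c": "0+00:35:00"}
    print(find_overdue(sample))
```

The 3-hour default is only a guess based on this batch; longer jobs would need a higher `limit_hours`.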
©2025 CERN