Message boards : CMS Application : Multi-core VM

ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,870,629
RAC: 576
Message 4199 - Posted: 20 Oct 2016, 10:15:55 UTC
Last modified: 20 Oct 2016, 10:17:00 UTC

Nobody else seems to be running the multi-core CMS application at the moment. I revisited it last week, having initially had start-up problems, and have worked my way up to 2x8-core tasks on one of my 20-core Xeons. But I see no other machines running like this:

[cms005@lcggwms02:~] > condor_status|grep slot
slot1@9-1054-19383 LINUX X86_64 Claimed Busy 1.000 3937 0+02:13:20
slot2@9-1054-19383 LINUX X86_64 Claimed Busy 1.000 3937 0+02:13:21
slot3@9-1054-19383 LINUX X86_64 Claimed Busy 0.990 3937 0+01:52:49
slot4@9-1054-19383 LINUX X86_64 Claimed Busy 1.000 3937 0+01:52:50
slot5@9-1054-19383 LINUX X86_64 Claimed Busy 1.000 3937 0+01:33:16
slot6@9-1054-19383 LINUX X86_64 Claimed Busy 1.000 3937 0+01:33:17
slot7@9-1054-19383 LINUX X86_64 Claimed Busy 0.990 3937 0+01:13:11
slot8@9-1054-19383 LINUX X86_64 Claimed Busy 1.050 3937 0+01:13:04
slot1@9-1054-22251 LINUX X86_64 Claimed Busy 1.040 3937 0+01:55:46
slot2@9-1054-22251 LINUX X86_64 Claimed Busy 1.060 3937 0+01:55:47
slot3@9-1054-22251 LINUX X86_64 Claimed Busy 1.050 3937 0+01:35:55
slot4@9-1054-22251 LINUX X86_64 Claimed Busy 1.040 3937 0+01:35:56
slot5@9-1054-22251 LINUX X86_64 Claimed Busy 1.050 3937 0+01:15:41
slot6@9-1054-22251 LINUX X86_64 Claimed Busy 1.120 3937 0+00:55:49
slot7@9-1054-22251 LINUX X86_64 Claimed Busy 1.040 3937 0+01:15:42
slot8@9-1054-22251 LINUX X86_64 Claimed Busy 1.060 3937 0+00:55:42
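For anyone who wants to tally a listing like this automatically, here is a minimal Python sketch. It assumes the column layout shown above (first field `slotN@<vm-name>`, with `Claimed` somewhere on the line); the function name `slots_per_vm` is made up for illustration and is not part of HTCondor.

```python
from collections import Counter


def slots_per_vm(condor_status_text):
    """Count claimed slots per VM from `condor_status | grep slot` output.

    Assumes each relevant line starts with `slotN@<vm-name>`.
    """
    counts = Counter()
    for line in condor_status_text.splitlines():
        fields = line.split()
        if not fields or "@" not in fields[0]:
            continue  # skip blank lines and anything that is not a slot line
        slot, vm = fields[0].split("@", 1)
        if slot.startswith("slot") and "Claimed" in fields:
            counts[vm] += 1
    return dict(counts)


# Three lines copied from the listing above.
sample = """\
slot1@9-1054-19383 LINUX X86_64 Claimed Busy 1.000 3937 0+02:13:20
slot2@9-1054-19383 LINUX X86_64 Claimed Busy 1.000 3937 0+02:13:21
slot1@9-1054-22251 LINUX X86_64 Claimed Busy 1.040 3937 0+01:55:46
"""
print(slots_per_vm(sample))
```

Run against the full listing, this should report 8 slots for each of the two VMs.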


My app_config.xml file is
<app_config>
  <project_max_concurrent>2</project_max_concurrent>
  <app>
    <name>ATLAS</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>ALICE</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>CMS</name>
    <max_concurrent>2</max_concurrent>
  </app>
  <app_version>
    <app_name>CMS</app_name>
    <plan_class>vbox64_mt_mcore_cms</plan_class>
    <avg_ncpus>8.000000</avg_ncpus>
    <cmdline>--nthreads 8.000000</cmdline>
    <cmdline>--memory_size_mb 20480</cmdline>
  </app_version>
  <app>
    <name>LHCb</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>Theory</name>
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>
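
Before starting several multi-core VMs it can help to see what a file like this asks of the host in total. The following is only a rough Python sketch: it multiplies `max_concurrent` by `avg_ncpus` and by the `--memory_size_mb` value for one app. The helper name `requested_footprint` and the fallback of 1 when `max_concurrent` is missing are my own assumptions, and the sketch simply scans every `<cmdline>` element it finds, which says nothing about how the BOINC client itself treats repeated elements.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the CMS parts of the file above.
SAMPLE = """\
<app_config>
  <app>
    <name>CMS</name>
    <max_concurrent>2</max_concurrent>
  </app>
  <app_version>
    <app_name>CMS</app_name>
    <plan_class>vbox64_mt_mcore_cms</plan_class>
    <avg_ncpus>8.000000</avg_ncpus>
    <cmdline>--nthreads 8.000000</cmdline>
    <cmdline>--memory_size_mb 20480</cmdline>
  </app_version>
</app_config>
"""


def requested_footprint(app_config_xml, app_name="CMS"):
    """Return (total cores, total memory in MB) requested for one app."""
    root = ET.fromstring(app_config_xml)

    max_concurrent = 1  # assumption: treat a missing limit as 1 for this estimate
    for app in root.findall("app"):
        if app.findtext("name") == app_name:
            max_concurrent = int(app.findtext("max_concurrent", "1"))

    ncpus, mem_mb = 1.0, 0
    for version in root.findall("app_version"):
        if version.findtext("app_name") != app_name:
            continue
        ncpus = float(version.findtext("avg_ncpus", "1"))
        for cmd in version.findall("cmdline"):
            tokens = (cmd.text or "").split()
            if "--memory_size_mb" in tokens[:-1]:
                mem_mb = int(tokens[tokens.index("--memory_size_mb") + 1])

    return max_concurrent * ncpus, max_concurrent * mem_mb


print(requested_footprint(SAMPLE))
```

For the settings above this gives 16 cores and 40960 MB for the two CMS tasks together.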


My earlier problems may have come from trying to start too many VMs at once: they complained of not being able to find the boot image. With just two VMs I'm not seeing that problem now. Is anyone else in a position to retry the multi-core VM now?
ID: 4199 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 467
Credit: 389,411
RAC: 503
Message 4200 - Posted: 20 Oct 2016, 10:41:13 UTC - in response to Message 4199.  

Nobody else seems to be running the multi-core CMS application at the moment. ...

I would, but I still get only an old version (v47.30) from the project server.
:-(

See:
https://lhcathome.cern.ch/vLHCathome-dev/results.php?hostid=1464
ID: 4200 · Rating: 0
ivan
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Jan 15
Posts: 1129
Credit: 7,870,629
RAC: 576
Message 4201 - Posted: 20 Oct 2016, 13:20:25 UTC - in response to Message 4200.  

Nobody else seems to be running the multi-core CMS application at the moment. ...

I would, but I still get only an old version (v47.30) from the project server.
:-(

See:
https://lhcathome.cern.ch/vLHCathome-dev/results.php?hostid=1464

Have you tried a project reset? That should force downloading a new image when you resume/update.
ID: 4201 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 4202 - Posted: 20 Oct 2016, 14:53:34 UTC - in response to Message 4199.  

Is anyone else in a position to re-try the multi-core VM again now?

I gave it a try after resetting the project.
On a host with 8 threads and 16 GB of RAM, I configured 2 multi-core VMs, each with 4 cores and 6144 MB.
However, I got 2 tasks of version CMS Simulation v47.40 (vbox64_mt_mcore) windows_x86_64,
and the plan class vbox64_mt_mcore_cms is unknown.

vLHCathome-dev 20 Oct 16:41:52 Entry in app_config.xml for app 'CMS', plan class 'vbox64_mt_mcore_cms' doesn't match any app versions

I retried with plan class vbox64_mt_mcore and both 4-core VMs started.
4 minutes in now and cvmfs2 is rather busy; TTYL
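
The mismatch is easy to spot by comparing the plan class in the task's version string with the one in app_config.xml. A small Python sketch of that comparison, assuming the plan class is the text in parentheses as in the version string quoted in this post (`plan_class_of` is a made-up helper, not a BOINC function):

```python
import re


def plan_class_of(app_version_label):
    """Return the text inside the first pair of parentheses, or None."""
    match = re.search(r"\(([^)]+)\)", app_version_label)
    return match.group(1) if match else None


label = "CMS Simulation v47.40 (vbox64_mt_mcore) windows_x86_64"
configured = "vbox64_mt_mcore_cms"  # value taken from the app_config.xml in use

served = plan_class_of(label)
if served != configured:
    print(f"plan class mismatch: server sent {served!r}, app_config.xml has {configured!r}")
```

If the two differ, the client reports that the entry "doesn't match any app versions", as in the event log line quoted in this post.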
ID: 4202 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 467
Credit: 389,411
RAC: 503
Message 4203 - Posted: 20 Oct 2016, 15:17:53 UTC - in response to Message 4201.  

Have you tried a project reset? ...


I have. See:
https://lhcathome.cern.ch/vLHCathome-dev/forum_thread.php?id=302&postid=4183


The same happened today within the regular project. My host downloaded CMS Simulations v47.42 although only v47.50 is listed:
https://lhcathome.cern.ch/vLHCathome/result.php?resultid=6645841

At least the regular WU has been running normally for 1.5 h now.
ID: 4203 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 4204 - Posted: 20 Oct 2016, 15:36:15 UTC - in response to Message 4202.  
Last modified: 20 Oct 2016, 16:26:08 UTC

I retried with plan class vbox64_mt_mcore and both 4-core VM's started.
4 minutes in now and cvmfs2 rather busy; TTYL

Running for over 45 minutes now, and each VM is running 4 CMS jobs.
Jobs 5321, 5322, 5371 and 5372 are in one VM, and jobs 5369, 5370, 5319 and 5320 in the other.

On both VMs, 2 jobs started about 20 minutes later than the first 2.
ID: 4204 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,720,123
RAC: 3,061
Message 4205 - Posted: 20 Oct 2016, 19:08:24 UTC

With JOBS:1 and CPUS:2 set in the -dev preferences:

I reset the project before the task started. The application version was 47.40.

https://lhcathome.cern.ch/vLHCathome-dev/result.php?resultid=276361

The task failed with exit error 207 after a few minutes.
ID: 4205 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 4206 - Posted: 20 Oct 2016, 19:39:16 UTC - in response to Message 4205.  

JOBS:1 CPUS:2 in -dev-preferences:

Reset of project before task was starting. 47.40 was Application.

https://lhcathome.cern.ch/vLHCathome-dev/result.php?resultid=276361

exit-Error 207 after a few minutes.

As far as I know, a CMS-mt task will only run when using an app_config.xml with the right settings in the app_version part.
ID: 4206 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 4207 - Posted: 20 Oct 2016, 19:41:43 UTC - in response to Message 4204.  

On both VM's 2 jobs started about 20 minutes later than the first 2.

Due to my own fault (exhausting the host memory) one task crashed -> https://lhcathome.cern.ch/vLHCathome-dev/result.php?resultid=276244
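
A quick way to avoid this is to check how much RAM is left for the host itself before starting the VMs. A back-of-the-envelope Python sketch, assuming a host with 16 GB of RAM running two VMs of 6144 MB each (the function name is made up, and how much headroom is actually enough depends on what else the machine is doing):

```python
def host_headroom_mb(host_ram_mb, vm_count, vm_ram_mb):
    """Memory left for the host OS and other programs once all VMs are running."""
    return host_ram_mb - vm_count * vm_ram_mb


print(host_headroom_mb(16 * 1024, 2, 6144))  # 4096
```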
ID: 4207 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 467
Credit: 389,411
RAC: 503
Message 4208 - Posted: 20 Oct 2016, 20:04:47 UTC - in response to Message 4206.  

As far as I know, a CMS-mt task will only run when using an app_config.xml with the right settings in the app_version part.


false:
If you focus on "using an app_config.xml". Apps without an app_config.xml run with standard settings.

true:
If you focus on "with the right settings in the app_version part". Wrong settings may lead to scheduler requests that can't be served.
ID: 4208 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,720,123
RAC: 3,061
Message 4209 - Posted: 21 Oct 2016, 7:01:24 UTC - in response to Message 4206.  
Last modified: 21 Oct 2016, 7:09:11 UTC


As far as I know, a CMS-mt task will only run when using an app_config.xml with the right settings in the app_version part.


Hello CP,

I have no app_config.xml.

With JOBS:1 and TASKS:1 it did run, but that was weeks ago. I will test again today.

The error message is the same as in my first post, but in German :-))
Edit:
<message>
Der Ring 2-Stapel wird bereits verwendet. (English: The ring 2 stack is in use.)
(0xcf) - exit code 207 (0xcf)
</message>
ID: 4209 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 467
Credit: 389,411
RAC: 503
Message 4210 - Posted: 21 Oct 2016, 8:24:45 UTC

Next try:
Fr 21 Okt 2016 10:06:44 CEST | vLHCathome-dev | Resetting project
Fr 21 Okt 2016 10:07:57 CEST | vLHCathome-dev | update requested by user
Fr 21 Okt 2016 10:08:04 CEST | vLHCathome-dev | Master file download succeeded
Fr 21 Okt 2016 10:08:09 CEST | vLHCathome-dev | Sending scheduler request: Requested by user.
Fr 21 Okt 2016 10:08:09 CEST | vLHCathome-dev | Requesting new tasks for CPU
Fr 21 Okt 2016 10:08:11 CEST | vLHCathome-dev | Scheduler request completed: got 1 new tasks
Fr 21 Okt 2016 10:08:13 CEST | vLHCathome-dev | Started download of vboxwrapper_26196_x86_64-pc-linux-gnu
Fr 21 Okt 2016 10:08:13 CEST | vLHCathome-dev | Started download of CMS_2016_03_22.xml
Fr 21 Okt 2016 10:08:13 CEST | vLHCathome-dev | Started download of CMS_2016_08_08.vdi
Fr 21 Okt 2016 10:08:15 CEST | vLHCathome-dev | work fetch suspended by user
Fr 21 Okt 2016 10:08:15 CEST | vLHCathome-dev | Giving up on download of vboxwrapper_26196_x86_64-pc-linux-gnu: permanent HTTP error
Fr 21 Okt 2016 10:08:15 CEST | vLHCathome-dev | Giving up on download of CMS_2016_03_22.xml: permanent HTTP error
Fr 21 Okt 2016 10:08:15 CEST | vLHCathome-dev | Giving up on download of CMS_2016_08_08.vdi: permanent HTTP error
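
To pull the failed files out of a longer event log, something like the following Python sketch would do. It assumes the message wording shown above ("Giving up on download of <file>: <reason>"); the helper name `failed_downloads` is made up.

```python
def failed_downloads(log_text):
    """Return (file name, reason) pairs for downloads the client gave up on."""
    marker = "Giving up on download of "
    failures = []
    for line in log_text.splitlines():
        if marker not in line:
            continue
        rest = line.split(marker, 1)[1]
        name, _, reason = rest.partition(": ")
        failures.append((name, reason.strip()))
    return failures


# Two lines copied from the log above.
log = """\
Fr 21 Okt 2016 10:08:13 CEST | vLHCathome-dev | Started download of CMS_2016_08_08.vdi
Fr 21 Okt 2016 10:08:15 CEST | vLHCathome-dev | Giving up on download of CMS_2016_08_08.vdi: permanent HTTP error
"""
for name, reason in failed_downloads(log):
    print(name, "->", reason)
```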
ID: 4210 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 467
Credit: 389,411
RAC: 503
Message 4211 - Posted: 21 Oct 2016, 9:44:44 UTC - in response to Message 4210.  

Got 1 WU running.
But still v47.30 on only 1 core with 2 GB RAM and without an app_config.xml.
I will cancel the WU as v47.50 should be tested.
ID: 4211 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 4215 - Posted: 21 Oct 2016, 13:41:54 UTC

I'm running another multi-core VM with 4 cores and 6144MB of memory using an app_config.xml.
I'm still getting application v47.40.

Again I noticed that 2 jobs start immediately and the other 2 jobs 20 minutes later.
No idea why. Inside the VM there was 3 GB of RAM free.
Now, with 4 jobs running, there is still 1.3 GB of memory free.
ID: 4215 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,720,123
RAC: 3,061
Message 4216 - Posted: 22 Oct 2016, 6:11:59 UTC - in response to Message 4209.  


With JOBS:1 and TASKS:1 it would run, but weeks ago. Will test today again.


On both computers this combination finished successfully, but with version 47.40:

https://lhcathome.cern.ch/vLHCathome-dev/result.php?resultid=277159

https://lhcathome.cern.ch/vLHCathome-dev/result.php?resultid=276983
ID: 4216 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4301 - Posted: 8 Nov 2016, 22:50:32 UTC
Last modified: 8 Nov 2016, 23:05:56 UTC

Would all volunteers who are unable to run tasks with more than one core please make a short post?

It seems the setting in preferences for "Max # CPUs" is completely ignored.

I am running ATLAS multi-core tasks without any problems.

Apparently, some volunteers can and others cannot.

Even with an app_config.xml it is not possible.

(Sorry, accidental double posting)
ID: 4301 · Rating: 0
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 4302 - Posted: 8 Nov 2016, 22:54:34 UTC
Last modified: 8 Nov 2016, 23:00:47 UTC

Would all volunteers who are unable to run tasks with more than one core please make a short post?

It seems the setting in preferences for "Max # CPUs" is completely ignored.

I am running ATLAS multi-core tasks without any problems.

Apparently, some volunteers can and others cannot.

Even with an app_config.xml it is not possible (BOINC 7.6.33).
ID: 4302 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,720,123
RAC: 3,061
Message 4373 - Posted: 1 Dec 2016, 10:00:46 UTC
Last modified: 1 Dec 2016, 10:02:21 UTC

I have tested multi-core CMS (2 CPUs). The task ended after about 12 minutes.
BOINC 7.6.22, VirtualBox 5.1.10, computer ID 1165

https://lhcathome.cern.ch/vLHCathome-dev/result.php?resultid=290410

The project was reset before the new task was fetched.
ID: 4373 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 660
Credit: 1,720,123
RAC: 3,061
Message 4376 - Posted: 1 Dec 2016, 11:23:05 UTC - in response to Message 4373.  
Last modified: 1 Dec 2016, 11:25:01 UTC

I have no app_config.xml.

Via RDP port 51630:
ALT+F2: only the first line, "Running job output should appear here", is shown (no work?)

ALT+F3: Python and cvmfs2 are shown using the most CPU

The task finished after about 12 minutes.

2016-12-01 10:45:29 (1268): Guest Log: [INFO] Reading volunteer information
2016-12-01 10:45:29 (1268): Guest Log: [INFO] Volunteer: maeax (378) Host: 1165
2016-12-01 10:45:29 (1268): Guest Log: [INFO] VMID: 6d0ae20b-f23e-4d5d-b5ca-600a8fb1d26c
2016-12-01 10:45:29 (1268): Guest Log: [INFO] Requesting an X509 credential from vLHC@home
2016-12-01 10:45:33 (1268): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2016-12-01 10:45:33 (1268): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
2016-12-01 10:45:33 (1268): Guest Log: [INFO] CMS application starting. Check log files.
2016-12-01 10:45:33 (1268): Guest Log: [DEBUG] HTCondor ping
2016-12-01 10:45:33 (1268): Guest Log: [DEBUG] 0
2016-12-01 10:55:40 (1268): Guest Log: [ERROR] Condor exited after 613s without running a job.
2016-12-01 10:55:40 (1268): Guest Log: [INFO] Shutting Down.
2016-12-01 10:55:40 (1268): VM Completion File Detected.
2016-12-01 10:55:40 (1268): VM Completion Message: Condor exited after 613s without running a job.
.
2016-12-01 10:55:40 (1268): Powering off VM.
2016-12-01 10:55:44 (1268): Successfully stopped VM.
2016-12-01 10:55:49 (1268): Deregistering VM. (boinc_83dffbc3ff2390d3, slot#3)
2016-12-01 10:55:49 (1268): Removing virtual disk drive(s) from VM.
2016-12-01 10:55:49 (1268): Removing network bandwidth throttle group from VM.
2016-12-01 10:55:49 (1268): Removing storage controller(s) from VM.
2016-12-01 10:55:50 (1268): Removing VM from VirtualBox.
10:55:55 (1268): called boinc_finish(206)
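
For anyone checking many result logs for the same failure, this Python sketch picks out the "no job" case. It assumes the exact `[ERROR]` wording shown in the log above; the helper name is made up.

```python
import re


def idle_condor_exit_seconds(log_text):
    """Return the seconds Condor waited before exiting without a job, or None."""
    match = re.search(
        r"\[ERROR\] Condor exited after (\d+)s without running a job", log_text
    )
    return int(match.group(1)) if match else None


# One line copied from the log above.
line = (
    "2016-12-01 10:55:40 (1268): Guest Log: [ERROR] Condor exited "
    "after 613s without running a job."
)
print(idle_condor_exit_seconds(line))  # 613
```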
ID: 4376 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 249
Message 4384 - Posted: 1 Dec 2016, 14:28:20 UTC - in response to Message 4376.  

When we moved to plan classes to address the memory issue, we did not specify max_threads so it defaulted to 1 CPU. This value has now been set to 32 and it is working for Theory. It should also work for CMS but this has not been tested.
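
The effect of the missing max_threads can be pictured with a deliberately simplified Python model. This is not the scheduler's actual code: the function name, its arguments and the idea that the grant is just the smallest of the three limits are all assumptions made for illustration.

```python
def granted_threads(requested, host_cpus, max_threads=1):
    """Simplified model: a task gets at most the smallest of the three limits."""
    return max(1, min(requested, host_cpus, max_threads))


print(granted_threads(4, 8))                  # 1, max_threads left at its default
print(granted_threads(4, 8, max_threads=32))  # 4, after raising the limit to 32
```

Under this model a volunteer asking for more than one core would still get a single-core task until the limit was raised, which fits the reports earlier in the thread.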
ID: 4384 · Rating: 0
©2024 CERN