Multi-core VM

Laurence Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1067 Credit: 329,397 RAC: 234
Message 4411 - Posted: 2 Dec 2016, 12:41:50 UTC - in response to Message 4404.
"I suppose you don't use an app_config.xml, so what CMS_year_mm_dd.xml do you have and what are the contents?"
CMS_2016_03_22.xml with CMS_2016_10_31.vdi

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1182 Credit: 815,528 RAC: 214
Message 4412 - Posted: 2 Dec 2016, 13:57:42 UTC - in response to Message 4411.
"I suppose you don't use an app_config.xml, so what CMS_year_mm_dd.xml do you have and what are the contents?"
"CMS_2016_03_22.xml with CMS_2016_10_31.vdi"
I had, and still have, the same. I reattached with the old, non-secure URL and now I got <cmdline> --memory_size_mb 3384</cmdline> with my work request (2 CPUs set). It looks like switching between the old and new URLs is not transparent.

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1182 Credit: 815,528 RAC: 214
Message 4413 - Posted: 2 Dec 2016, 15:34:37 UTC
This whole part is missing in the sched_reply.xml when attaching with the new URL https://lhcathome.cern.ch/vLHCathome-dev/

<app>
    <name>CMS</name>
    <user_friendly_name>CMS Simulation</user_friendly_name>
    <non_cpu_intensive>0</non_cpu_intensive>
    <fraction_done_exact>0</fraction_done_exact>
</app>
<file_info>
    <name>vboxwrapper_26196_windows_x86_64.exe</name>
    <url>http://lhcathomedev.cern.ch/vLHCathome-dev/download/vboxwrapper_26196_windows_x86_64.exe</url>
    <executable/>
    <file_signature>
428d85c43d7e1a7530937ee738375f8c3604e727f6e5a7a16b0ba6256836046d
1e07a5706106299a306210b41bfcba3406f1f594797747e5b8f70dcaf0a1a85c
cf472d88aefe910e3ab367f0accf899dcafa7e8ab30bca14e6bc0dfbca0de4d5
c3c8ce370d5dadb5138acab6b9b3b32f0903744d4d4148edbccad9f8a592b58a
.
    </file_signature>
    <nbytes>1370920</nbytes>
</file_info>
<file_info>
    <name>CMS_2016_03_22.xml</name>
    <url>http://lhcathomedev.cern.ch/vLHCathome-dev/download/CMS_2016_03_22.xml</url>
    <file_signature>
27c5337125b46e12a0962231ec20aefd6b12c40d604826ac4da8a5657dbdb29c
c2d315484e7f2ef1c4d18324e7aac95e566d9f878b9a8b02b59c5bfae9296511
3542fa6969ab661ce3f43287b83a9675116dfb78661717e3f85759db7631ea90
3bd1085404be01f5eb9aec4862a332558f15cef893243746190bd87cc57a0705
.
    </file_signature>
    <nbytes>577</nbytes>
</file_info>
<file_info>
    <name>CMS_2016_10_31.vdi</name>
    <url>http://lhcathomedev.cern.ch/vLHCathome-dev/download/CMS_2016_10_31.vdi</url>
    <gzipped_url>http://lhcathomedev.cern.ch/vLHCathome-dev/download/CMS_2016_10_31.vdi.gz</gzipped_url>
    <file_signature>
847e52ae60465ee5ca009ea01e1b9ce7285264e6a449777643dcce9f15fa40c9
415ee2de817841186d83640b8b15b1d21199bb248832f70190a8d8977f626213
9b059a8003e32e5c1df130f4f8639a72b7a85d3b433ecc3263cfc1507725aac9
9f293f2b5ccd6a2b6346dfc905b686f11947faaf843f2e30dac429a1f5e81842
.
    </file_signature>
    <nbytes>1730150400</nbytes>
    <gzipped_nbytes>665579700</gzipped_nbytes>
</file_info>
<file_info>
    <name>vboxwrapper_26196_windows_x86_64.pdb</name>
    <url>http://lhcathomedev.cern.ch/vLHCathome-dev/download/vboxwrapper_26196_windows_x86_64.pdb</url>
    <executable/>
    <file_signature>
5e9905841873b9e3a47e154b042b5e14ea13ac7847d2fc3d1d0aca1044b4b374
23988a98d8b4b7e0721bb04c51c878e20387bfc8f79a5761f874a3e2ca7a69d2
609942e0e69ed16879bff9200e69409aa7686f733a0667c530951a00ebd5f53f
7f2804b785c204a8d228deb7e983a8a28875ab4b1829c0847a79c32ed489c71d
.
    </file_signature>
    <nbytes>6310912</nbytes>
</file_info>
<app_version>
    <app_name>CMS</app_name>
    <version_num>4770</version_num>
    <api_version>7.7.0</api_version>
    <file_ref>
        <file_name>vboxwrapper_26196_windows_x86_64.exe</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>CMS_2016_03_22.xml</file_name>
        <open_name>vbox_job.xml</open_name>
    </file_ref>
    <file_ref>
        <file_name>CMS_2016_10_31.vdi</file_name>
        <open_name>vm_image.vdi</open_name>
        <copy_file/>
    </file_ref>
    <file_ref>
        <file_name>vboxwrapper_26196_windows_x86_64.pdb</file_name>
    </file_ref>
    <dont_throttle/>
    <needs_network/>
    <platform>windows_x86_64</platform>
    <plan_class>vbox64_mt_mcore_cms</plan_class>
    <avg_ncpus>2.000000</avg_ncpus>
    <max_ncpus>2.000000</max_ncpus>
    <flops>30991330939.967888</flops>
    <cmdline> --memory_size_mb 3384</cmdline>
</app_version>
<workunit>
    <rsc_fpops_est>1000000000005000.000000</rsc_fpops_est>
    <rsc_fpops_bound>2000000000000000000.000000</rsc_fpops_bound>
    <rsc_memory_bound>3548381184.000000</rsc_memory_bound>
    <rsc_disk_bound>8000000000.000000</rsc_disk_bound>
    <name>CMS_9704_1480644979.690483</name>
    <app_name>CMS</app_name>
</workunit>
<result>
    <report_deadline>1481291397</report_deadline>
    <wu_name>CMS_9704_1480644979.690483</wu_name>
    <name>CMS_9704_1480644979.690483_0</name>
    <report_immediately/>
    <platform>windows_x86_64</platform>
    <version_num>4770</version_num>
    <plan_class>vbox64_mt_mcore_cms</plan_class>
</result>
<code_sign_key>
1024
d26a9d6cba06f561aabe6dab5d76b59a087d8c84b6445082d44059429a2f5c2e
f51f5ae57167c8afda52df605193dce8088016d284967af06532ac4056056d33
b19b863b1cbe0278aeedb9fe509f1a73dd50cc82e73724066e4f58b52f299abc
107391364c19db2dc7999611c745b9956cdc4ba43ca9f581b169eb51977b08d5
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000010001
.
</code_sign_key>

maeax Joined: 22 Apr 16 Posts: 667 Credit: 1,807,614 RAC: 2,394
Message 4414 - Posted: 2 Dec 2016, 15:59:26 UTC Last modified: 2 Dec 2016, 16:00:35 UTC
This task ended after TWO minutes (https is in use):
2016-12-02 16:54:04 (6072): Guest Log: [INFO] Reading volunteer information
2016-12-02 16:54:04 (6072): Guest Log: [INFO] Volunteer: maeax (378) Host: 1377
2016-12-02 16:54:04 (6072): Guest Log: [INFO] VMID: 6d0ae20b-f23e-4d5d-b5ca-600a8fb1d26c
2016-12-02 16:54:04 (6072): Guest Log: [INFO] Requesting an X509 credential from vLHC@home
2016-12-02 16:54:04 (6072): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2016-12-02 16:54:04 (6072): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
2016-12-02 16:54:14 (6072): Guest Log: [INFO] CMS application starting. Check log files.
2016-12-02 16:54:14 (6072): Guest Log: [DEBUG] HTCondor ping
2016-12-02 16:54:14 (6072): Guest Log: [DEBUG] 139
2016-12-02 16:54:14 (6072): Guest Log: [DEBUG] 12/02/16 16:54:01 recognized DC_NOP as command name, using command 60011.
2016-12-02 16:54:14 (6072): Guest Log: [ERROR] Could not ping HTCondor.
2016-12-02 16:54:14 (6072): Guest Log: [INFO] Shutting Down.
2016-12-02 16:54:14 (6072): VM Completion File Detected.
2016-12-02 16:54:14 (6072): VM Completion Message: Could not ping HTCondor.

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Joined: 28 Jul 16 Posts: 475 Credit: 389,411 RAC: 28
Message 4415 - Posted: 2 Dec 2016, 16:21:58 UTC - in response to Message 4413.
"This whole part is missing in the sched_reply.xml when attaching with the new URL https://lhcathome.cern.ch/vLHCathome-dev/ ..."
Those parts are not always included in the request/reply, only if there were changes since the last communication between the client and the server.

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1182 Credit: 815,528 RAC: 214
Message 4416 - Posted: 2 Dec 2016, 16:57:17 UTC - in response to Message 4415. Last modified: 2 Dec 2016, 17:00:36 UTC
"This whole part is missing in the sched_reply.xml when attaching with the new URL https://lhcathome.cern.ch/vLHCathome-dev/ ..."
"Those parts are not always included in the request/reply. Only if there were changes since the last communication between the client and the server."
That's probably right, so I reattached for the 20th time and looked at the differences between the last lines of the app_version part using the old and new URL:

OLD
<app_version>
    ..
    ..
    <dont_throttle/>
    <needs_network/>
    <platform>windows_x86_64</platform>
    <plan_class>vbox64_mt_mcore_cms</plan_class>
    <avg_ncpus>2.000000</avg_ncpus>
    <max_ncpus>2.000000</max_ncpus>
    <flops>30991330939.967888</flops>
    <cmdline> --memory_size_mb 3384</cmdline>
</app_version>

NEW
<app_version>
    ..
    ..
    <dont_throttle/>
    <needs_network/>
    <platform>windows_x86_64</platform>
    <plan_class>vbox64_mt_mcore_cms</plan_class>
    <avg_ncpus>2.000000</avg_ncpus>
    <max_ncpus>2.000000</max_ncpus>
    <flops>30740549607.500664</flops>
    <cmdline>--nthreads 2.000000</cmdline>
</app_version>

Conclusion: those getting the right RAM assigned are using the old (not secure) URL (including Laurence).
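The OLD/NEW comparison above can also be done mechanically. The following is a minimal Python sketch, assuming the `<app_version>` fragment has been copied out of sched_reply.xml and is well-formed on its own (a full reply file may not parse this cleanly). The constant and function names are made up for illustration, and the fragments are trimmed copies of the ones in the post.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the two <app_version> endings quoted in the post.
OLD_APP_VERSION = """
<app_version>
    <plan_class>vbox64_mt_mcore_cms</plan_class>
    <avg_ncpus>2.000000</avg_ncpus>
    <cmdline> --memory_size_mb 3384</cmdline>
</app_version>
"""

NEW_APP_VERSION = """
<app_version>
    <plan_class>vbox64_mt_mcore_cms</plan_class>
    <avg_ncpus>2.000000</avg_ncpus>
    <cmdline>--nthreads 2.000000</cmdline>
</app_version>
"""


def cmdline_options(fragment: str) -> dict:
    """Return the <cmdline> of an <app_version> fragment as {option: value}."""
    root = ET.fromstring(fragment.strip())
    tokens = (root.findtext("cmdline") or "").split()
    # Both fragments use "--name value" pairs, so pair up alternating tokens.
    return dict(zip(tokens[0::2], tokens[1::2]))


old_opts = cmdline_options(OLD_APP_VERSION)
new_opts = cmdline_options(NEW_APP_VERSION)

print("old:", old_opts)
print("new:", new_opts)
print("options only in old reply:", sorted(set(old_opts) - set(new_opts)))
```

Only the option names are compared here; the `<flops>` values differ between the two replies as well, but they are not part of the command line.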

Laurence Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1067 Credit: 329,397 RAC: 234
Message 4417 - Posted: 2 Dec 2016, 19:35:53 UTC - in response to Message 4416.
Thanks for pointing this out. It is because the http and https sites are on essentially different servers and the plan class was missing. I have added it so hopefully https will work now.

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1182 Credit: 815,528 RAC: 214
Message 4418 - Posted: 2 Dec 2016, 22:10:19 UTC - in response to Message 4417.
"Thanks for pointing this out. It is because the http and https sites are on essentially different servers and the plan class was missing. I have added it so hopefully https will work now."
I tested with 5 cores. The RAM issued is 6768 MB, which is in accordance with your formula (1+5)*1128 MB.

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Joined: 28 Jul 16 Posts: 475 Credit: 389,411 RAC: 28
Message 4419 - Posted: 3 Dec 2016, 10:59:03 UTC
I have been running 1 CMS-dev WU (2 cores) on each of my 2 hosts for 11 hours and I am sure they will finish successfully. The RAM request has been sent correctly (as defined) by the server, but I limited the VMs to 2944 MB to check if this is enough.
My experience regarding the RAM requirement is as follows:
Base (1 core): 2176 MB (2.125 GB)
Add per core: 768 MB
1-core WU: 2176 MB
2-core WU: 2944 MB
Lower settings work, but the VM starts using more and more internal swap (this is not necessarily bad). Higher values lead to more free RAM inside the VM (wasted).
The numbers are slightly lower than the current formula on the server but higher than the bare minimum of Rasputin42. They reflect the RAM requirements of the most recent WUs. Future WUs may have different needs.
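The figures in the post above amount to a base of 2176 MB plus 768 MB for each core beyond the first. Here is a minimal Python sketch of that rule; the function name is hypothetical, and only the 1-core and 2-core values come from the post. Anything above 2 cores is an untested extrapolation.

```python
BASE_MB = 2176       # observed requirement for a 1-core WU
PER_EXTRA_CORE_MB = 768  # observed increment for each additional core


def empirical_cms_memory_mb(cores: int) -> int:
    """RAM in MB suggested by the observations above for a CMS VM with `cores` cores."""
    if cores < 1:
        raise ValueError("a VM needs at least one core")
    return BASE_MB + PER_EXTRA_CORE_MB * (cores - 1)


for n in (1, 2):
    print(f"{n}-core WU: {empirical_cms_memory_mb(n)} MB")
```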

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1129 Credit: 7,876,718 RAC: 266
Message 4424 - Posted: 3 Dec 2016, 13:28:32 UTC
Ah, I think I may have spotted why my multi-core tasks don't get assigned CMS jobs. I just tried again, and while the 2x8 slots were active on the Condor server I did an analysis of one of the jobs in the queue:

1736464.000: Run analysis summary. Of 744 machines,
     16 are rejected by your job's requirements
     72 reject your job because of their own requirements
    654 match and are already running your jobs
      0 match but are serving other users
      2 are available to run your job

The Requirements expression for your job is:

    ( ( ( target.IS_GLIDEIN isnt true ) || ( target.GLIDEIN_CMSSite isnt undefined ) ) &&
    ( GLIDEIN_REQUIRED_OS is "rhel6" || OpSysMajorVer is 6 ) ) &&
    ( ( Memory >= 1 ) && ( Disk >= 1 ) ) && ( TARGET.Arch == "X86_64" ) &&
    ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) &&
    ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer )

Your job defines the following attributes:

    RequestDisk = 1
    RequestMemory = 2000

The Requirements expression for your job reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[1]         744  target.GLIDEIN_CMSSite isnt undefined
[4]         744  OpSysMajorVer is 6
[7]         744  Memory >= 1
[8]         744  Disk >= 1
[11]        744  TARGET.Arch == "X86_64"
[13]        744  TARGET.OpSys == "LINUX"
[15]        744  TARGET.Disk >= RequestDisk
[17]        728  TARGET.Memory >= RequestMemory

Suggestions:

    Condition                                                                          Machines Matched  Suggestion
    ---------                                                                          ----------------  ----------
1   ( GLIDEIN_REQUIRED_OS is "rhel6" || OpSysMajorVer is 6 )                           0                 REMOVE
2   ( ( Memory >= 1 ) && ( Disk >= 1 ) )                                               0                 REMOVE
3   ( TARGET.Memory >= 2000 )                                                          728
4   ( ( target.IS_GLIDEIN isnt true ) || ( target.GLIDEIN_CMSSite isnt undefined ) )   744
5   ( TARGET.Arch == "X86_64" )                                                        744
6   ( TARGET.OpSys == "LINUX" )                                                        744
7   ( TARGET.Disk >= 1 )                                                               744
8   ( TARGET.HasFileTransfer )                                                         744

The formatting is screwed up by the forum display, but you should be able to see that there were 744 slots available, but only 728 of them would accept the job, 16 being rejected for not having >=2000 MB of memory available -- my 16! Now, if I look at one of my slots:

[cms005@lcggwms02:~] > condor_status -l slot8@9-1054-20424.9-1054-20424|sort|grep -i memory
DetectedMemory = 9907
Memory = 1875
TotalMemory = 15000
TotalSlotMemory = 1875
TotalVirtualMemory = 11193544
VirtualMemory = 1399193

and compare it with a slot that has a running job:

[cms005@lcggwms02:~] > condor_status -l 9819-67287-6433.9819-67287-6433|sort|grep -i memory
DetectedMemory = 1996
Memory = 3000
TotalMemory = 3000
TotalSlotMemory = 3000
TotalVirtualMemory = 3089540
VirtualMemory = 3089540

then indeed my slot does not have the requested memory! Now I just have to find out how to change that...
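The check that rejects the 16 slots can be reproduced on the two `condor_status` listings above. This is a minimal Python sketch that assumes the listing has been pasted in as plain `Name = value` lines with integer values, as in the post. The helper names are hypothetical, and the comparison only mirrors the `TARGET.Memory >= RequestMemory` clause, not the full Requirements expression.

```python
REQUEST_MEMORY_MB = 2000  # RequestMemory from the job analysis above

# The two listings from the post, copied verbatim.
IDLE_SLOT = """
DetectedMemory = 9907
Memory = 1875
TotalMemory = 15000
TotalSlotMemory = 1875
TotalVirtualMemory = 11193544
VirtualMemory = 1399193
"""

BUSY_SLOT = """
DetectedMemory = 1996
Memory = 3000
TotalMemory = 3000
TotalSlotMemory = 3000
TotalVirtualMemory = 3089540
VirtualMemory = 3089540
"""


def parse_attributes(listing: str) -> dict:
    """Turn 'Name = value' lines into a dict of integer attributes."""
    attrs = {}
    for line in listing.strip().splitlines():
        name, _, value = line.partition("=")
        attrs[name.strip()] = int(value.strip())
    return attrs


def has_enough_memory(attrs: dict, request_mb: int = REQUEST_MEMORY_MB) -> bool:
    """Mirror only the TARGET.Memory >= RequestMemory clause."""
    return attrs["Memory"] >= request_mb


for label, listing in (("8-core slot", IDLE_SLOT), ("running slot", BUSY_SLOT)):
    attrs = parse_attributes(listing)
    print(label, attrs["Memory"], "MB ->", has_enough_memory(attrs))
```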

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1182 Credit: 815,528 RAC: 214
Message 4425 - Posted: 3 Dec 2016, 14:33:30 UTC - in response to Message 4424.
"Ah, I think I may have spotted why my multi-core tasks don't get assigned CMS jobs. ... Now I just have to find out how to change that..."
If you like, I could try an 8-core CMS task on my Windows machine and see whether my task also exits without running a job.

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Joined: 28 Jul 16 Posts: 475 Credit: 389,411 RAC: 28
Message 4426 - Posted: 3 Dec 2016, 14:43:38 UTC - in response to Message 4424.
Once upon a time a couple of volunteers started a discussion about the correct RAM settings for CMS WUs. It seems that your contribution can shed some light on this if you translate it into a RAM formula that can be used by the BOINC system. What we need are values for the following variables:
- Minimum RAM for 1 slot/core (including the basic system of the VM)
- Additional RAM for each additional slot/core
;-)

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 966 Credit: 1,211,816 RAC: 0
Message 4427 - Posted: 3 Dec 2016, 18:02:31 UTC Last modified: 3 Dec 2016, 18:45:21 UTC
I think the "optimal" memory size is midway between the bare minimum and the amount it wants to take if you give it plenty. For a single-core task that would be about 2560 MB. Comments?
EDIT: Scrap that idea, no good.

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1129 Credit: 7,876,718 RAC: 266
Message 4428 - Posted: 3 Dec 2016, 18:21:35 UTC - in response to Message 4426.
OK, I set my preference to one CPU and have two tasks running. The Condor slot memory is

[cms005@lcggwms02:~] > condor_status -l 9-1054-10500.9-1054-10500|grep -i memory
VirtualMemory = 3302048
MachineResources = "Cpus Memory Disk Swap"
TotalMemory = 4500
TotalSlotMemory = 4500
TotalVirtualMemory = 3302048
DetectedMemory = 2200
Memory = 4500

Meanwhile, John Greer is running 6 slots at once, and details for one of his slots are

[cms005@lcggwms02:~] > condor_status -l slot6@314-1207-3364.314-1207-3364|grep -i memory
VirtualMemory = 1486757
MachineResources = "Cpus Memory Disk Swap"
TotalMemory = 12000
TotalSlotMemory = 2000
TotalVirtualMemory = 8920544
DetectedMemory = 7687
Memory = 2000

captainjack has a 4-slot task running on all cores

[cms005@lcggwms02:~] > condor_status -l slot4@287-1548-31691.287-1548-31691|grep -i memory
VirtualMemory = 1661374
MachineResources = "Cpus Memory Disk Swap"
TotalMemory = 9000
TotalSlotMemory = 2250
TotalVirtualMemory = 6645496
DetectedMemory = 5465
Memory = 2250

...and OLI is running 4 cores but only one is busy:

[cms005@lcggwms02:~] > condor_status -l slot3@222-361-5738.222-361-5738|grep -i memory
VirtualMemory = 1661374
MachineResources = "Cpus Memory Disk Swap"
TotalMemory = 9000
TotalSlotMemory = 2250
TotalVirtualMemory = 6645496
DetectedMemory = 5465
Memory = 2250

Curiously, his three idle slots have the same memory figures, so there must be some other constraint stopping them from running.
Finally, David Duvall has 4 slots all running:

[cms005@lcggwms02:~] > condor_status -l slot1@217-1483-19635.217-1483-19635|grep -i memory
VirtualMemory = 1661374
MachineResources = "Cpus Memory Disk Swap"
TotalMemory = 9000
TotalSlotMemory = 2250
TotalVirtualMemory = 6645496
DetectedMemory = 5465
Memory = 2250

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1129 Credit: 7,876,718 RAC: 266
Message 4429 - Posted: 3 Dec 2016, 18:26:48 UTC - in response to Message 4428. Last modified: 3 Dec 2016, 18:28:25 UTC
Looks like the formula 3 GB + 1.5 GB/core is being applied somewhere...

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1182 Credit: 815,528 RAC: 214
Message 4431 - Posted: 3 Dec 2016, 20:25:22 UTC
I tried an 8-core CMS VM without success. VM RAM allocated 10152 MB.
StartLog:

12/03/16 21:07:45 ****************************************************
12/03/16 21:07:45 condor_startd (CONDOR_STARTD) STARTING UP
12/03/16 21:07:45 /usr/sbin/condor_startd
12/03/16 21:07:45 SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
12/03/16 21:07:45 Configuration: subsystem:STARTD local:<NONE> class:DAEMON
12/03/16 21:07:45 $CondorVersion: 8.4.8 Jun 30 2016 BuildID: 373513 $
12/03/16 21:07:45 $CondorPlatform: x86_64_RedHat6 $
12/03/16 21:07:45 PID = 4341
12/03/16 21:07:45 Log last touched time unavailable (No such file or directory)
12/03/16 21:07:45 ****************************************************
12/03/16 21:07:45 Using config source: /etc/condor/condor_config
12/03/16 21:07:45 Using local config sources:
12/03/16 21:07:45 /etc/condor/config.d/10_security.config
12/03/16 21:07:45 /etc/condor/config.d/14_network.config
12/03/16 21:07:45 /etc/condor/config.d/20_workernode.config
12/03/16 21:07:45 /etc/condor/config.d/30_lease.config
12/03/16 21:07:45 /etc/condor/config.d/35_cms.config
12/03/16 21:07:45 /etc/condor/config.d/40_ccb.config
12/03/16 21:07:45 /etc/condor/condor_config.local
12/03/16 21:07:45 config Macros = 156, Sorted = 156, StringBytes = 6021, TablesBytes = 5712
12/03/16 21:07:45 CLASSAD_CACHING is ENABLED
12/03/16 21:07:45 Daemon Log is logging: D_ALWAYS D_ERROR
12/03/16 21:07:45 Daemoncore: Listening at <10.0.2.15:56167> on TCP (ReliSock).
12/03/16 21:07:45 DaemonCore: command socket at <10.0.2.15:56167?addrs=10.0.2.15-56167&noUDP>
12/03/16 21:07:45 DaemonCore: private command socket at <10.0.2.15:56167?addrs=10.0.2.15-56167>
12/03/16 21:07:55 CCBListener: registered with CCB server lcggwms02.gridpp.rl.ac.uk:9623 as ccbid 130.246.180.120:9623#1384525
12/03/16 21:07:56 HibernationSupportedStates invalid '' in ad from hibernation plugin /usr/libexec/condor/condor_power_state
12/03/16 21:07:56 VM-gahp server reported an internal error
12/03/16 21:07:56 VM universe will be tested to check if it is available
12/03/16 21:07:56 History file rotation is enabled.
12/03/16 21:07:56 Maximum history file size is: 20971520 bytes
12/03/16 21:07:56 Number of rotated history files is: 2
12/03/16 21:07:56 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
12/03/16 21:07:56 slot1: New machine resource allocated
12/03/16 21:07:56 Setting up slot pairings
12/03/16 21:07:56 slot2: New machine resource allocated
12/03/16 21:07:56 Setting up slot pairings
12/03/16 21:07:56 slot3: New machine resource allocated
12/03/16 21:07:56 Setting up slot pairings
12/03/16 21:07:56 slot4: New machine resource allocated
12/03/16 21:07:56 Setting up slot pairings
12/03/16 21:07:56 slot5: New machine resource allocated
12/03/16 21:07:56 Setting up slot pairings
12/03/16 21:07:56 slot6: New machine resource allocated
12/03/16 21:07:56 Setting up slot pairings
12/03/16 21:07:56 slot7: New machine resource allocated
12/03/16 21:07:56 Setting up slot pairings
12/03/16 21:07:56 slot8: New machine resource allocated
12/03/16 21:07:56 Setting up slot pairings
12/03/16 21:07:56 CronJobList: Adding job 'mips'
12/03/16 21:07:56 CronJobList: Adding job 'kflops'
12/03/16 21:07:56 CronJob: Initializing job 'mips' (/usr/libexec/condor/condor_mips)
12/03/16 21:07:56 CronJob: Initializing job 'kflops' (/usr/libexec/condor/condor_kflops)
12/03/16 21:07:56 slot1: State change: IS_OWNER is false
12/03/16 21:07:56 slot1: Changing state: Owner -> Unclaimed
12/03/16 21:07:56 State change: RunBenchmarks is TRUE
12/03/16 21:07:56 slot1: Changing activity: Idle -> Benchmarking
12/03/16 21:07:56 BenchMgr:StartBenchmarks()
12/03/16 21:07:56 slot2: State change: IS_OWNER is false
12/03/16 21:07:56 slot2: Changing state: Owner -> Unclaimed
12/03/16 21:07:56 State change: RunBenchmarks is TRUE
12/03/16 21:07:56 slot2: Changing activity: Idle -> Benchmarking
12/03/16 21:07:56 slot2: Changing activity: Benchmarking -> Idle
12/03/16 21:07:56 slot3: State change: IS_OWNER is false
12/03/16 21:07:56 slot3: Changing state: Owner -> Unclaimed
12/03/16 21:07:56 State change: RunBenchmarks is TRUE
12/03/16 21:07:56 slot3: Changing activity: Idle -> Benchmarking
12/03/16 21:07:56 slot3: Changing activity: Benchmarking -> Idle
12/03/16 21:07:56 slot4: State change: IS_OWNER is false
12/03/16 21:07:56 slot4: Changing state: Owner -> Unclaimed
12/03/16 21:07:56 State change: RunBenchmarks is TRUE
12/03/16 21:07:56 slot4: Changing activity: Idle -> Benchmarking
12/03/16 21:07:56 slot4: Changing activity: Benchmarking -> Idle
12/03/16 21:07:56 slot5: State change: IS_OWNER is false
12/03/16 21:07:56 slot5: Changing state: Owner -> Unclaimed
12/03/16 21:07:56 State change: RunBenchmarks is TRUE
12/03/16 21:07:56 slot5: Changing activity: Idle -> Benchmarking
12/03/16 21:07:56 slot5: Changing activity: Benchmarking -> Idle
12/03/16 21:07:56 slot6: State change: IS_OWNER is false
12/03/16 21:07:56 slot6: Changing state: Owner -> Unclaimed
12/03/16 21:07:56 State change: RunBenchmarks is TRUE
12/03/16 21:07:56 slot6: Changing activity: Idle -> Benchmarking
12/03/16 21:07:56 slot6: Changing activity: Benchmarking -> Idle
12/03/16 21:07:56 slot7: State change: IS_OWNER is false
12/03/16 21:07:56 slot7: Changing state: Owner -> Unclaimed
12/03/16 21:07:56 State change: RunBenchmarks is TRUE
12/03/16 21:07:56 slot7: Changing activity: Idle -> Benchmarking
12/03/16 21:07:56 slot7: Changing activity: Benchmarking -> Idle
12/03/16 21:07:56 slot8: State change: IS_OWNER is false
12/03/16 21:07:56 slot8: Changing state: Owner -> Unclaimed
12/03/16 21:07:56 State change: RunBenchmarks is TRUE
12/03/16 21:07:56 slot8: Changing activity: Idle -> Benchmarking
12/03/16 21:07:56 slot8: Changing activity: Benchmarking -> Idle
12/03/16 21:08:26 State change: RunBenchmarks is TRUE
12/03/16 21:08:26 slot1: Changing activity: Benchmarking -> Idle
12/03/16 21:08:26 State change: benchmarks completed

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1182 Credit: 815,528 RAC: 214
Message 4432 - Posted: 3 Dec 2016, 20:43:37 UTC - in response to Message 4429.
"Looks like the formula 3 GB + 1.5 GB/core is being applied somewhere..."
Do you mean for the VM itself or something inside the VM?
For CMS, BOINC uses the formula 1128 MB + #cores * 1128 MB, so 3384 MB for a dual-core task.
For Theory, BOINC uses the formula 630 MB + #cores * 100 MB, so 1030 MB for a 4-core task.
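Both BOINC formulas have the same shape, a fixed base plus a per-core amount. A minimal Python sketch, with hypothetical function and constant names and only the MB figures taken from the post:

```python
# (base MB, MB per core) for each application, as quoted in the post.
CMS = (1128, 1128)
THEORY = (630, 100)


def boinc_vm_memory_mb(app: tuple, cores: int) -> int:
    """VM memory in MB that BOINC assigns: base + cores * per-core amount."""
    base_mb, per_core_mb = app
    return base_mb + cores * per_core_mb


print("CMS, 2 cores:   ", boinc_vm_memory_mb(CMS, 2), "MB")
print("CMS, 5 cores:   ", boinc_vm_memory_mb(CMS, 5), "MB")
print("CMS, 8 cores:   ", boinc_vm_memory_mb(CMS, 8), "MB")
print("Theory, 4 cores:", boinc_vm_memory_mb(THEORY, 4), "MB")
```

The 5-core and 8-core CMS results agree with the 6768 MB and 10152 MB reported earlier in the thread.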

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1129 Credit: 7,876,718 RAC: 266
Message 4433 - Posted: 3 Dec 2016, 22:14:21 UTC - in response to Message 4432.
"Looks like the formula 3 GB + 1.5 GB/core is being applied somewhere..."
"Do you mean for the VM itself or something inside the VM? For CMS, BOINC uses the formula 1128 MB + #cores * 1128 MB, so 3384 MB for a dual-core task. For Theory, BOINC uses the formula 630 MB + #cores * 100 MB, so 1030 MB for a 4-core task."
I was looking at the TotalMemory statistic. For the cases I quoted, that scales as the formula, and your previous post verifies what I found for eight cores, 1875 MB/slot => 15 GB total.

12/03/16 21:07:56 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%
slot type 0: Cpus: 1.000000, Memory: 1875, Swap: 12.50%, Disk: 12.50%

Since we've established that the Condor ClassAd requires >= 2000 MB for a job to run, the maximum possible at present is 6 cores/task. How and why these limits are set is a whole other story. Doing an Alt-F3 in the VM window of my running 1-CPU VM shows that the total memory utilisation is very close to 2 GB. Tomorrow, when my current tasks have expired, I'll submit some 6-core tasks and see what they actually use.
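The 6-core limit follows from the observed TotalMemory pattern and the 2000 MB job requirement. The Python sketch below treats "3 GB + 1.5 GB/core" as 3000 MB + 1500 MB per core, which matches the TotalMemory values quoted in the thread (4500, 9000, 12000, 15000). It is an observed pattern, not a documented HTCondor rule, and the function names are made up for illustration.

```python
REQUEST_MEMORY_MB = 2000  # RequestMemory required by the CMS jobs


def total_memory_mb(cores: int) -> int:
    """Observed TotalMemory pattern: 3000 MB plus 1500 MB per core."""
    return 3000 + 1500 * cores


def slot_memory_mb(cores: int) -> int:
    """Memory per slot when the total is split equally, one slot per core."""
    return total_memory_mb(cores) // cores


def max_cores_with_jobs(limit: int = 16) -> int:
    """Largest core count (up to `limit`) whose slots still satisfy the job requirement."""
    return max(n for n in range(1, limit + 1)
               if slot_memory_mb(n) >= REQUEST_MEMORY_MB)


for n in (1, 4, 6, 8):
    print(f"{n} cores: {slot_memory_mb(n)} MB per slot")
print("largest core count that still gets jobs:", max_cores_with_jobs())
```

Per-slot memory is 3000/n + 1500 MB, so it stays at or above 2000 MB only while n <= 6.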

Laurence Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1067 Credit: 329,397 RAC: 234
Message 4434 - Posted: 3 Dec 2016, 22:45:42 UTC - in response to Message 4433. Last modified: 3 Dec 2016, 22:46:25 UTC
The Condor ClassAd requirement of >= 2000 MB is a remnant of the general 2 GB per core rule of thumb that is used in WLCG. The issue is that HTCondor automatically assigns 1 job slot per core and then splits the memory equally between them. This leaves us with less than 2 GB per core, which is why we don't get any jobs in some multicore VMs. So if we are to optimize the memory usage, we also need to either relax that requirement to reflect what we can actually run on, or go back to having 2 GB per core.
As an aside, when we are happy with multicore, we can add it to LHC@home as the standard way of working, just as ATLAS has done.

ivan Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 20 Jan 15 Posts: 1129 Credit: 7,876,718 RAC: 266
Message 4435 - Posted: 3 Dec 2016, 22:57:19 UTC - in response to Message 4434.
Thanks, Laurence. Do we know where the 3 GB + N*1.5 GB for TotalMemory comes from?
