Thread 'New version v1.03'

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1156 Credit: 342,328 RAC: 0
Message 5487 - Posted: 3 Sep 2018, 9:09:11 UTC

CVMFS configuration improvements

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Joined: 28 Jul 16 Posts: 531 Credit: 400,710 RAC: 0
Message 5488 - Posted: 3 Sep 2018, 13:02:41 UTC

Comments regarding a 2-core WU that is currently in progress.

CVMFS
Works perfectly together with openhtc.io as well as with the local proxy. As a result, the startup time is rather short.
[pre]2018-09-03 13:36:58 (9955): vboxwrapper (7.7.26196): starting
2018-09-03 13:38:16 (9955): Guest Log: [DEBUG] Detected squid proxy http://<hostname_censored_by_volunteer/>:3128
2018-09-03 13:39:23 (9955): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-09-03 13:39:23 (9955): Guest Log: 2.4.4.0 3529 1 25696 7069 2 1 1734417 10240001 2 65024 0 15 100 1 2 http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch http://<local_IP_censored_by_volunteer/>:3128 1
2018-09-03 13:41:26 (9955): Guest Log: [INFO] New Job Starting in slot1
2018-09-03 13:41:26 (9955): Guest Log: [INFO] New Job Starting in slot2[/pre]

Multicore Delay
A very short job in slot1 caused the same delay that is described on the non-dev message board: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4790&postid=36609
The next job in slot1 only started about 11 minutes after the previous one finished:
[pre]2018-09-03 13:50:36 (9955): Guest Log: [INFO] Job finished in slot1 with .
2018-09-03 14:01:46 (9955): Guest Log: [INFO] New Job Starting in slot1[/pre]

VM's RAM Setting
Much lower than in the non-dev version. As a result, kswapd0 uses lots of CPU cycles (6 min within 1:15 runtime). I suggest setting it higher for the final version, or at least giving users with enough RAM a hint to tune the RAM setting via app_config.xml.
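The two "Guest Log" lines at 13:39:23 above look like the CVMFS client's status table: a header row followed by one row of values. A minimal Python sketch, not part of the project's tooling, that pairs the columns up so HOST, PROXY and HITRATE(%) can be checked at a glance. It assumes the fields are whitespace-separated and contain no spaces, which holds for this excerpt; the function names are illustrative.

```python
# Header and value lines copied from the Guest Log excerpt above
# (the proxy address is censored in the original post).
HEADER = ("2018-09-03 13:39:23 (9955): Guest Log: VERSION PID UPTIME(M) MEM(K) "
          "REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX "
          "NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE")
VALUES = ("2018-09-03 13:39:23 (9955): Guest Log: 2.4.4.0 3529 1 25696 7069 2 1 "
          "1734417 10240001 2 65024 0 15 100 1 2 "
          "http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch "
          "http://<local_IP_censored_by_volunteer/>:3128 1")


def strip_prefix(line: str) -> str:
    """Drop the vboxwrapper timestamp/PID prefix up to and including 'Guest Log:'."""
    _, sep, rest = line.partition("Guest Log:")
    return rest.strip() if sep else line.strip()


def parse_cvmfs_stat(header_line: str, value_line: str) -> dict:
    """Pair each column name with its value; fields are whitespace-separated."""
    names = strip_prefix(header_line).split()
    values = strip_prefix(value_line).split()
    if len(names) != len(values):
        raise ValueError(f"{len(names)} columns but {len(values)} values")
    return dict(zip(names, values))


stat = parse_cvmfs_stat(HEADER, VALUES)
print(stat["HOST"])                        # CVMFS server in use
print(stat["PROXY"])                       # proxy in use
print(stat["HITRATE(%)"], stat["ONLINE"])  # cache hit rate and online flag
```

For this excerpt, HOST is the openhtc.io server, PROXY is the (censored) local squid, and HITRATE(%) is 100, which fits the observation that CVMFS works with both.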

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Joined: 28 Jul 16 Posts: 531 Credit: 400,710 RAC: 0
Message 5489 - Posted: 4 Sep 2018, 5:18:28 UTC

It looks like the LHCb VM uses the same default RAM size as the Theory VM (1-core: 730 MB; 2-core: 830 MB). This is much too low, as every single job inside the VM needs around 1.3 GB. It causes lots of swapping activity and finally a crash: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2362165
2 other tasks at 2432 MB (1-core) and 4864 MB (2-core) are running fine and are both close to the finish line:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2362161
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2362166
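The numbers in this post can be cross-checked with a small sketch. It assumes, without confirmation from the thread, that VM RAM grows linearly as a base plus a per-core amount and that each core runs one job; it also rounds "around 1.3 GB" to 1300 MB. The helper names are hypothetical.

```python
# Assumption: one job per core, each needing ~1.3 GB (rounded to 1300 MB).
JOB_MB = 1300


def fit_linear(mem_1core: int, mem_2core: int) -> tuple:
    """Return (base_mb, per_core_mb) from the observed 1-core and 2-core VM sizes."""
    per_core = mem_2core - mem_1core
    return mem_1core - per_core, per_core


def shortfall_mb(vm_mb: int, n_cores: int, job_mb: int = JOB_MB) -> int:
    """RAM missing if every core runs one job (0 means it fits).

    Ignores the guest OS's own footprint, so the result is optimistic.
    """
    return max(0, n_cores * job_mb - vm_mb)


theory_base, theory_per_core = fit_linear(730, 830)
print(theory_base, theory_per_core)
print(shortfall_mb(730, 1), shortfall_mb(830, 2))    # Theory defaults
print(shortfall_mb(2432, 1), shortfall_mb(4864, 2))  # the poster's settings
```

For the Theory defaults this gives a base of 630 MB plus 100 MB per core, leaving a 1-core VM about 570 MB short and a 2-core VM about 1770 MB short before the guest OS is even counted, while the 2432 MB and 4864 MB VMs cover their jobs.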

maeax Joined: 22 Apr 16 Posts: 788 Credit: 4,098,402 RAC: 1,762
Message 5490 - Posted: 4 Sep 2018, 5:26:49 UTC

Thinking that this two-core LHCb task is not running well, I upgraded BOINC (7.12.1) and VirtualBox (5.2.18).
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2362164
Windows 10 Pro is also being updated because of AMD-Meltdown corrections:
https://support.microsoft.com/en-us/help/4346783/windows-10-update-kb4346783

Magic Quantum Mechanic Joined: 8 Apr 15 Posts: 950 Credit: 17,290,175 RAC: 18,023
Message 5491 - Posted: 4 Sep 2018, 9:19:06 UTC

I was not having any problems with the previous version, but now they are trying to d/l another HUGE vdi, so that will take many hours for each host I use here.

v1.03 is 968.76MB just for that vdi and it is d/l'ing at a SLOW 8.6kbps, and this is after 2am here, which is when I have the fastest time on my end.

So far I see 2 hosts d/l'ing this at the same time, and one is at 85% after over 21 HOURS.
OK, after looking at host #3, it is at 77% after 15 HOURS.
And the one I am on is at 24% after 14 HOURS.

I hope the closest one gets finished soon so maybe the other 2 will speed up.

One thing for sure is I won't be d/l'ing this on my other 8-core PCs I have running over at LHC.

Mad Scientist For Life
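A rough way to estimate how long the remaining downloads will take is to extrapolate linearly from the progress figures in this post. This sketch assumes a constant rate and reads "8.6kbps" as kilobytes per second (if it means kilobits, transfer times are about eight times longer); the function names are illustrative only.

```python
def projected_total_hours(elapsed_h: float, fraction_done: float) -> float:
    """Linear extrapolation; assumes the transfer rate stays constant."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_h / fraction_done


def transfer_hours(size_mb: float, rate_kb_per_s: float) -> float:
    """Time to move size_mb at rate_kb_per_s, using 1 MB = 1024 KB."""
    return size_mb * 1024 / rate_kb_per_s / 3600


# The three hosts from the post: (hours elapsed, fraction done).
for elapsed, done in [(21, 0.85), (15, 0.77), (14, 0.24)]:
    total = projected_total_hours(elapsed, done)
    print(f"{done:.0%} after {elapsed} h -> ~{total - elapsed:.1f} h left")

print(f"{transfer_hours(968.76, 8.6):.1f} h for the whole vdi at 8.6 KB/s")
```

For the three hosts this gives roughly 3.7, 4.5 and 44.3 hours left, and about 32 hours for the full 968.76 MB image at a steady 8.6 KB/s.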

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1156 Credit: 342,328 RAC: 0
Message 5492 - Posted: 5 Sep 2018, 7:39:21 UTC - in response to Message 5489.

Yes, this is using the Theory plan class. We will need to create one for LHCb. What are good values for the base memory and the memory per CPU?

Magic Quantum Mechanic Joined: 8 Apr 15 Posts: 950 Credit: 17,290,175 RAC: 18,023
Message 5493 - Posted: 5 Sep 2018, 7:45:38 UTC - in response to Message 5492. Last modified: 5 Sep 2018, 7:46:55 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/results.php?userid=192

Only 2 Valids and now nothing but these Invalids:
[pre]09/05/18 09:21:28 recognized DC_NOP as command name, using command 60011.
2018-09-05 00:21:40 (3496): Guest Log: 09/05/18 09:21:40 Condor GSI authentication failure
2018-09-05 00:21:40 (3496): Guest Log: GSS Major Status: Authentication Failed
2018-09-05 00:21:40 (3496): Guest Log: GSS Minor Status Error Chain:
2018-09-05 00:21:40 (3496): Guest Log: globus_gss_assist: Error during context initialization
2018-09-05 00:21:40 (3496): Guest Log: globus_gsi_callback_module: Could not verify credential
(3496): Guest Log: globus_gsi_callback_module: Invalid CRL: The available CRL has expired
2018-09-05 00:21:40 (3496): Guest Log: 09/05/18 09:21:41 SECMAN: required authentication with local collector failed, so aborting command DC_SEC_QUERY.
2018-09-05 00:22:25 (3496): Guest Log: [ERROR] Could not ping HTCondor.[/pre]

computezrmle Volunteer moderator Project tester Volunteer developer Volunteer tester Help desk expert Joined: 28 Jul 16 Posts: 531 Credit: 400,710 RAC: 0
Message 5494 - Posted: 5 Sep 2018, 8:55:16 UTC - in response to Message 5492.

[quote]Yes this is using the theory plan class. We will need to create an LHCb. What are good values for the base memory and memory per cpu.[/quote]
A good starting point for a 1-core setup would be 2048 MB, since this value is defined in LHCb_2017_05_05.xml. Maybe a bit more to avoid swapping: my single-core VMs at the production site use 2432 MB and swap out roughly 12 MB.

Recent jobs need up to 1.3 GB per core, but IIRC there were jobs with larger requests in the past. The project scientists should know what can be expected in the future. That value should be added per additional core.
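The suggestion above (2048 MB for one core, plus about 1.3 GB per additional core) can be written as the base-plus-per-core pair Laurence asked for. A minimal sketch, assuming 1.3 GB is taken as 1300 MB and that the plan class computes VM memory as base + per_core * n_cores. BOINC plan classes appear to support this via mem_usage_base_mb and mem_usage_per_cpu_mb, but treat that mapping as an assumption and check the BOINC plan-class documentation before relying on it.

```python
# Values from the suggestion in this post (1.3 GB rounded to 1300 MB).
ONE_CORE_MB = 2048
PER_CORE_MB = 1300

# Same rule expressed as a base + per-core pair: base = 2048 - 1300 = 748 MB.
BASE_MB = ONE_CORE_MB - PER_CORE_MB


def vm_memory_mb(n_cores: int,
                 base_mb: int = BASE_MB,
                 per_core_mb: int = PER_CORE_MB) -> int:
    """Hypothetical VM RAM for n_cores under the linear base + per-core rule."""
    if n_cores < 1:
        raise ValueError("n_cores must be >= 1")
    return base_mb + per_core_mb * n_cores


for n in (1, 2, 4, 8):
    print(n, vm_memory_mb(n))
```

This yields 2048 MB for 1 core, 3348 MB for 2 cores and 5948 MB for 4. Raising ONE_CORE_MB toward the 2432 MB used by the poster's production VMs would add the headroom against swapping mentioned above.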

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1156 Credit: 342,328 RAC: 0
Message 5495 - Posted: 5 Sep 2018, 9:16:48 UTC - in response to Message 5494.

I have added a plan class for LHCb to reflect those values. Please let me know how it goes.

Magic Quantum Mechanic Joined: 8 Apr 15 Posts: 950 Credit: 17,290,175 RAC: 18,023
Message 5496 - Posted: 5 Sep 2018, 9:28:53 UTC

I guess I will have to add that I suspended all of mine here. I have been running most of these LHCb multi-core tasks and they do not work with this new version, AND you are welcome to check anyone else's errors with this new version, because theirs are not working either.

Since I have been running most of these, I already have 30 of the errors I mentioned, and checking the few that were run by other members, I see they all have the same error.

I never had any problems with RAM with the previous version of these tasks. v1.02.......hundreds of those tasks worked with no problems.

Mad Scientist For Life

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1275 Credit: 1,042,491 RAC: 182
Message 5497 - Posted: 5 Sep 2018, 12:36:18 UTC - in response to Message 5495. Last modified: 5 Sep 2018, 13:05:16 UTC

[quote]I have added a plan class for LHCb to reflect those values. Please let me know how it goes.[/quote]
The VMs are shut down => "Could not ping HTCondor".
Is something wrong with the CRL? "Invalid CRL: The available CRL has expired"
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2362317

Magic Quantum Mechanic Joined: 8 Apr 15 Posts: 950 Credit: 17,290,175 RAC: 18,023
Message 5498 - Posted: 5 Sep 2018, 21:32:39 UTC

Sometimes I wonder if my posts ever get read here........ I suggest that certain people pay attention to what is said by the member who does MOST of the testing here, rather than wait for a member who has only done one task of any of these new versions.

maeax Joined: 22 Apr 16 Posts: 788 Credit: 4,098,402 RAC: 1,762
Message 5499 - Posted: 5 Sep 2018, 23:42:52 UTC

Is it possible to go back to the old .vdi 1.02?

Magic Quantum Mechanic Joined: 8 Apr 15 Posts: 950 Credit: 17,290,175 RAC: 18,023
Message 5500 - Posted: 6 Sep 2018, 19:03:54 UTC - in response to Message 5499.

[quote]Is it possible to go back to the old .vdi 1.02?[/quote]

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1156 Credit: 342,328 RAC: 0
Message 5501 - Posted: 7 Sep 2018, 8:27:28 UTC - in response to Message 5499.

[quote]Is it possible to go back to the old .vdi 1.02?[/quote]
No. The purpose here is to test things rather than to maintain a stable running service. We should be looking towards v1.04.

The issue is this: the Certificate Revocation Lists (CRLs) are not being updated. When the VM was built they were current. Now they are stale, but they should be refreshed over CVMFS. The CVMFS configuration is therefore not broken, but it is not working either. We have to investigate.

The purpose of this change is to move to openhtc.io, as we have done for CMS and Theory.
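The diagnosis above is that the CRLs inside the VM have gone stale because they are no longer refreshed over CVMFS. A minimal sketch of how one might check a CRL's expiry: it parses the nextUpdate line that `openssl crl -noout -nextupdate -in <file>` prints and compares it with the current UTC time. The CRL file location inside the VM is not given in the thread, and the date in the example call is made up for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_next_update(line: str) -> datetime:
    """Parse a line such as 'nextUpdate=Sep  4 12:00:00 2018 GMT' as a UTC datetime."""
    _, _, value = line.partition("=")
    value = value.strip()
    if value.endswith(" GMT"):
        value = value[:-4]
    # %b expects an English (C locale) month abbreviation.
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y")
    return parsed.replace(tzinfo=timezone.utc)


def crl_is_stale(next_update_line: str, now: Optional[datetime] = None) -> bool:
    """True once 'now' is past the CRL's nextUpdate, i.e. it should have been refreshed."""
    now = now or datetime.now(timezone.utc)
    return now > parse_next_update(next_update_line)


# Hypothetical example: a CRL whose nextUpdate passed the day before the
# 5 Sep 2018 failures reported in this thread.
failure_time = datetime(2018, 9, 5, 9, 21, 40, tzinfo=timezone.utc)
print(crl_is_stale("nextUpdate=Sep  4 12:00:00 2018 GMT", failure_time))  # True
```

A refresh mechanism that works (over CVMFS, as intended here) would keep nextUpdate in the future, so this check would stay False.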

maeax Joined: 22 Apr 16 Posts: 788 Credit: 4,098,402 RAC: 1,762
Message 5502 - Posted: 7 Sep 2018, 8:33:18 UTC - in response to Message 5501. Last modified: 7 Sep 2018, 9:04:02 UTC

Thank you Laurence,

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1156 Credit: 342,328 RAC: 0
Message 5503 - Posted: 7 Sep 2018, 8:33:41 UTC - in response to Message 5498.

[quote]Sometimes I wonder if my posts ever get read here........I suggest that certain people pay attention to what is said by the member that does MOST of the testing here. Not wait for a member that has only done one task of any of these new versions.[/quote]
Your posts always get read! But not all may get an answer. Yesterday was a public holiday.

Magic Quantum Mechanic Joined: 8 Apr 15 Posts: 950 Credit: 17,290,175 RAC: 18,023
Message 5505 - Posted: 7 Sep 2018, 12:29:30 UTC - in response to Message 5503.

[quote]Your posts always get read! But not all may get an answer. Yesterday was a public holiday.[/quote]
Scroll back and see that I said, in BOLD text, that this new version does not work, and you posted here both before and after that and said nothing in reply to it. I posted it the first time minutes after you posted something, and I started typing ALL of the facts 2 minutes after your post so you would see everything. Then I said it again and you posted about something else.

The posts that should get an answer/reply are the ones from a member who does most of the testing here of those multi-core Theory and LHCb tasks, not questions or statements from members who have not even been running these tasks.

And btw those stats pages STILL are not repaired, so I have to go through the members one by one to find ANYONE else who might have run ANY of these tasks to make a comparison. As usual it will say many members have been running these tasks, yet if you take a look, they have not been here at all for over a year and never ran one single multi-core task, whether Theory, LHCb or even CMS. It is only two members doing that, and I have done most of them, yet those don't seem to be the tasks that get checked here to see if things are working. I did thousands of the Theory multi-cores before they finally got moved over to LHC, and I was the only one that did that, and now 500 of these LHCb's are Valid and working fine, yet that doesn't seem to be how we test things here. I am the only member that has been running all the testing for CERN since day one, and 24/7 since VB started in 2011, including the Atlas-Alpha testing before they even came over here to -dev.

The stats pages here are still all wrong, so I have to check everyone on the first page to see who is actually here running any of these LHCb's, the same as when I did the thousands of Theory multi-cores. Funny how it still has members at the top of a stats page who have not been here for over a year and never ran a single multi-core Theory or LHCb task. I am willing to bet I am the only one that does these tasks and then digs through that stats list trying to find ONE single member here running these tasks while I do, and the funny part is I asked him to run some so it wasn't just me doing this, for a comparison........is that my job here too?

I am even typing this out at 5:30am, so I know what time it is over there (and no, I didn't get up early either), and I have never pulled "it's a holiday or a vacation time" out of my pocket either. So I will just check all 9 of my computers that I have running the same Theory multi-core at LHC and then finally get some sleep. (And as usual the only problem there is at the CERN server end.) Goodnight

Mad Scientist For Life

Development for LHC@home