Message boards : Theory Application : New Native App - Linux Only
Joined: 28 Jul 16 · Posts: 519 · Credit: 400,710 · RAC: 6
One reason to suspend/stop a task (especially a long runner) is to prepare a client shutdown, e.g. to run system updates. At LHC production David Cameron explained that, regardless of whether ATLAS native respects the STOP/CONT signals, a task would always start from scratch after a client restart or a reboot. How would Theory native behave in such a case? Would it also start from scratch, or would it continue the task from the point where it stopped? Well, on Windows you would snapshot the VirtualBox VM.
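For context, the STOP/CONT behaviour discussed above can be sketched with plain POSIX signals. This is only an illustration of why suspend/resume keeps in-memory state while a restart does not; it is not the actual BOINC wrapper code:

```shell
#!/bin/sh
# Illustration only: suspending and resuming a process with SIGSTOP/SIGCONT.
# A process resumed with CONT continues exactly where it stopped; that state
# survives only as long as the process itself (or, for vbox apps, a VM
# snapshot) exists. A client restart kills the process, hence "from scratch".
sleep 300 &                  # stand-in for a running Theory job
pid=$!

kill -STOP "$pid"            # what a "suspend" would send
sleep 1
state_stopped=$(awk '/^State:/ {print $2}' "/proc/$pid/status")

kill -CONT "$pid"            # what a "resume" would send
sleep 1
state_resumed=$(awk '/^State:/ {print $2}' "/proc/$pid/status")

kill "$pid" 2>/dev/null      # clean up the demo process
echo "stopped=$state_stopped resumed=$state_resumed"
```

After STOP the kernel reports state `T` (stopped); after CONT the sleeping process is back in state `S`. Unless the application checkpoints to disk, nothing of this survives a reboot.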
Joined: 22 Apr 16 · Posts: 731 · Credit: 2,205,280 · RAC: 1,580
This task crashed a few minutes ago: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2759703

Three volunteers had this problem:
max # of error/total/success tasks: 3, 3, 1
Error: too many total results
Joined: 22 Apr 16 · Posts: 731 · Credit: 2,205,280 · RAC: 1,580
https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1882669

===> [runRivet] Wed Mar 13 03:14:46 UTC 2019 [boinc pp jets 8000 25 - pythia6 6.428 390 100000 30]
Setting environment...
grep: /etc/redhat-release: No such file or directory
MCGENERATORS=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt67c
g++ = /cvmfs/sft.cern.ch/lcg/external/gcc/4.8.4/x86_64-slc6/bin/g++
g++ version = 4.8.4
RIVET=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt67c/rivet/2.6.1/x86_64-slc6-gcc48-opt
Rivet version = rivet v2.6.1
RIVET_REF_PATH=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_lcgcmt67c/rivet/2.6.1/x86_64-slc6-gcc48-opt/share/Rivet
RIVET_ANALYSIS_PATH=/shared/analyses
GSL=/cvmfs/sft.cern.ch/lcg/external/GSL/1.10/x86_64-slc6-gcc48-opt
HEPMC=/cvmfs/sft.cern.ch/lcg/external/HepMC/2.06.08/x86_64-slc6-gcc48-opt
FASTJET=/cvmfs/sft.cern.ch/lcg/external/fastjet/3.0.3/x86_64-slc6-gcc48-opt
PYTHON=/cvmfs/sft.cern.ch/lcg/external/Python/2.7.4/x86_64-slc6-gcc48-opt
ROOTSYS=/cvmfs/sft.cern.ch/lcg/app/releases/ROOT/5.34.26/x86_64-slc6-gcc48-opt/root
Input parameters:
mode=boinc beam=pp process=jets energy=8000 params=25 specific=- generator=pythia6 version=6.428 tune=390 nevts=100000 seed=30

It was unsuccessful for all three volunteers:

<core_client_version>7.5.1</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
04:14:25 (22719): wrapper (7.15.26016): starting
04:14:25 (22719): wrapper (7.15.26016): starting
04:14:25 (22719): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.28 ()
03:14:25 2019-03-13: cranky-0.0.28: [INFO] Detected Theory App
03:14:25 2019-03-13: cranky-0.0.28: [INFO] Checking CVMFS.
03:14:42 2019-03-13: cranky-0.0.28: [INFO] Checking runc.
03:14:43 2019-03-13: cranky-0.0.28: [INFO] Creating the filesystem.
03:14:43 2019-03-13: cranky-0.0.28: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
03:14:43 2019-03-13: cranky-0.0.28: [INFO] Updating config.json.
03:14:43 2019-03-13: cranky-0.0.28: [INFO] Running Container 'runc'.
05:24:33 2019-03-13: cranky-0.0.28: [ERROR] Container 'runc' terminated with status code 1.
06:24:33 (22719): cranky exited; CPU time 6570.264835
06:24:33 (22719): app exit status: 0xce
06:24:33 (22719): called boinc_finish(195)
</stderr_txt>
]]>
Joined: 13 Feb 15 · Posts: 1223 · Credit: 937,009 · RAC: 1,066
"was unsuccessful for all three Volunteers"

The same happened with this workunit: https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1882576
Three times the same error: Exit status 195 (0x000000C3) EXIT_CHILD_FAILED
I do not know the job description.
Joined: 22 Apr 16 · Posts: 731 · Credit: 2,205,280 · RAC: 1,580
Thanks Crystal, two are more than one ;-) It could be this Pythia 6.
Joined: 12 Sep 14 · Posts: 1129 · Credit: 339,230 · RAC: 2
I see that we get a few jobs like this per day. I have found one and will run it myself to see what is happening.
Joined: 12 Sep 14 · Posts: 1129 · Credit: 339,230 · RAC: 2
Yes, I see that the job failed.

100000 events processed
dumping histograms...
Rivet.Analysis.Handler: INFO  Finalising analyses
terminate called after throwing an instance of 'YODA::LowStatsError'
  what():  Requested variance of a distribution with only one effective entry
./runRivet.sh: line 376:   267 Aborted                 (core dumped) $rivetExecString  (wd: /shared/tmp/tmp.8MuWyGp9sd)
Processing histograms...
input  = /shared/tmp/tmp.8MuWyGp9sd/flat
output = /shared
./runRivet.sh: line 850:   268 Killed                  display_service $tmpd_dump "$beam $process $energy $params $generator $version $tune"  (wd: /shared)
ERROR: following histograms should be produced according to run parameters, but missing from Rivet output:

It then gets resubmitted, as it may be a host issue. We will see if we can stop these jobs from either being sent or being resubmitted.
Joined: 22 Apr 16 · Posts: 731 · Credit: 2,205,280 · RAC: 1,580
This computer shows MCPlot tasks for today, but the last one finished 36 hours ago, on March 11: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=3723

2019-03-10: 33 0 33
2019-03-11: 29 0 29
2019-03-12: 7 0 7
2019-03-13: 6 0 6
Joined: 22 Apr 16 · Posts: 731 · Credit: 2,205,280 · RAC: 1,580
SL76 with BOINC 7.5.1, clean start of Linux: the -dev tasks (max. 6 tasks) download ONE per minute, not all at once! This info is important for the transfer to production, when it runs on other Linux hosts as well. The server rewards it with better performance.

CVMFS configured with openhtc.io: openhtc.io shows an entry 206.167.181.94:3128. Is this OK?
Joined: 28 Jul 16 · Posts: 519 · Credit: 400,710 · RAC: 6
"SL76 with Boinc 7.5.1: [...] openhtc.io shows an entry 206.167.181.94:3128. Is this OK?"

That seems weird. This is not Cloudflare:

dig -x 206.167.181.94

; <<>> DiG 9.9.9-P1 <<>> -x 206.167.181.94
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10471
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 5

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;94.181.167.206.in-addr.arpa.   IN      PTR

;; ANSWER SECTION:
94.181.167.206.in-addr.arpa. 3600 IN    PTR     206-167-181-94.cloud.computecanada.ca.

;; AUTHORITY SECTION:
181.167.206.in-addr.arpa. 86400 IN      NS      ns1.zonerisq.ca.
181.167.206.in-addr.arpa. 86400 IN      NS      ns2.zonerisq.ca.

;; ADDITIONAL SECTION:
ns1.zonerisq.ca.        86400   IN      A       162.219.54.2
ns2.zonerisq.ca.        86400   IN      A       162.219.55.2
ns1.zonerisq.ca.        86400   IN      AAAA    2620:10a:80eb::2
ns2.zonerisq.ca.        86400   IN      AAAA    2620:10a:80ec::2

;; Query time: 590 msec
;; SERVER: 192.168.0.12#53(192.168.0.12)
;; WHEN: Fri Mar 15 12:01:07 CET 2019
;; MSG SIZE  rcvd: 240

Port 3128 is a typical squid port; openhtc.io uses port 80 instead. Did you configure this proxy in your local CVMFS configuration? If not, did you configure automatic proxy detection? You may try "cvmfs_config showconfig -s" to see if one of the repositories uses a special setup.
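For comparison, a local CVMFS setup that really goes through openhtc.io would look roughly like the fragment below. This is only a sketch: the file path, the stratum-1 host names, and the proxy value are examples of the kind of entries such a setup contains, not a verified configuration for this project.

```
# /etc/cvmfs/domain.d/cern.ch.local  (illustrative sketch, values are examples)
CVMFS_SERVER_URL="http://s1cern-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1fnal-cvmfs.openhtc.io/cvmfs/@fqrn@"

# openhtc.io is reached on port 80; a proxy on port 3128 only appears here
# if you (or an auto-detection mechanism like WPAD) configured a local squid:
CVMFS_HTTP_PROXY="DIRECT"
```

Checking the effective values with "cvmfs_config showconfig -s <repository>" shows whether a repository picked up a proxy entry you did not set yourself.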
Joined: 22 Apr 16 · Posts: 731 · Credit: 2,205,280 · RAC: 1,580
I have 7 tasks in parallel, but slot numbers are shown up to 21! For the last 2 days only native Theory has been running on one SL76 machine, so only 6 tasks are possible, yet the slot numbers have grown to 14 at the moment. I am watching to find the point at which a new slot is created. So far I have found no info in the BOINC event log that matches the creation of a new slot number; it can be tracked by the new date/time shown in the Linux file manager. Could it happen when one task finishes while, in parallel, another one is downloading and starts running at exactly the same time?
Joined: 13 Feb 15 · Posts: 1223 · Credit: 937,009 · RAC: 1,066
"Could it happen when one task finishes while, in parallel, another one is downloading and starts running at exactly the same time?"

I run only native Theory, and directly after a job has finished the same slot is used by the next task in the queue. Are the unused slots empty, and are empty slots removed after a BOINC restart?
Joined: 22 Apr 16 · Posts: 731 · Credit: 2,205,280 · RAC: 1,580
At the moment 18 and 19 are active as new slot numbers, besides 0 through 3 (because of 6 tasks). The gap between a new slot number and the previous one is sometimes one hour, sometimes 2-3 hours, judging by the timestamps in the file manager. BOINC has been running continuously for two days, so no exit; but I think all slots were cleared after a restart. I will test this when it becomes important. It must be a special constellation that makes BOINC create a new slot number. I have BOINC 7.5.1; is this a problem? Laurence knows about this new slot creation; see the copied answer in the message. BTW, WCG has no problems with the same BOINC 7.5.1 on another SL76 machine, where a Sherpa task has now been running for 126 hours. OMG. This is the computer I am watching: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=3723 The old slot numbers are empty; sorry, I had forgotten to answer that. Crystal, this evening I have no more time. Ned-D ;-)
Joined: 13 Feb 15 · Posts: 1223 · Credit: 937,009 · RAC: 1,066
"I have BOINC 7.5.1; is this a problem?"

Could be the problem. 7.5.1 is not even an official Berkeley version. I have BOINC 7.12.0 running.

"Crystal, this evening I have no more time."

I think I'll be watching 'Tatort' from Munich ;) Depends on the plot.
Joined: 22 Apr 16 · Posts: 731 · Credit: 2,205,280 · RAC: 1,580
"Could be the problem."

This version is from the native ATLAS installation guide. SL has no software package of its own for BOINC. When you start this version, there is an event-log message that this is a development version. I don't know which BOINC version is best to run on Scientific Linux. Up to now there have been no problems with BOINC (SL69 or SL76).
Joined: 22 Apr 16 · Posts: 731 · Credit: 2,205,280 · RAC: 1,580
WCG shows in its download section a Fedora note pointing to the EPEL release. Is it an option for SL69 or SL76 to install this BOINC version from EPEL?
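If EPEL is usable on your SL release, installing the client would look roughly like this. An untested sketch: `epel-release` and `boinc-client` are the usual package names in EPEL, but the available BOINC version and the service name may differ on SL6 vs. SL7.

```
# Untested sketch: installing the EPEL BOINC client on Scientific Linux.
sudo yum install epel-release      # enables the EPEL repository
sudo yum install boinc-client      # the BOINC client packaged in EPEL

# SL7 (systemd):
sudo systemctl enable --now boinc-client
# SL6 (SysV init) instead:
#   sudo service boinc-client start
```

Since EPEL tracks an officially released BOINC, this would also replace the development version 7.5.1 discussed above.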
Joined: 22 Apr 16 · Posts: 731 · Credit: 2,205,280 · RAC: 1,580
SL76 with BOINC 7.5.1, 6 Theory tasks possible: the scratch area shows folders, for example 0a, 0b, 0c..., with timestamps from days ago. Is it possible that they were not cleared after the Theory tasks finished? BOINC needs 13 MByte more disk space. I am running only one Sherpa of the 6 possible tasks because of the space limit. I can clear this manually, or exit BOINC after the last Sherpa finishes.
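One way to clear such leftovers by hand is to delete only the directories that are completely empty, with the client stopped. The sketch below runs on a throwaway directory so nothing real is touched; the real slot directories live under the BOINC data directory (a path that varies by installation), which is an assumption you must adapt:

```shell
#!/bin/sh
# Demo on a temporary tree; point the variable at your real
# <boinc_data_dir>/slots (with the client stopped!) to use it for real.
demo=$(mktemp -d)
BOINC_SLOTS="$demo/slots"

mkdir -p "$BOINC_SLOTS/0" "$BOINC_SLOTS/14"
touch "$BOINC_SLOTS/0/init_data.xml"   # slot 0: still in use by a task
# slot 14: empty leftover from a finished task

# Delete only slot directories that are completely empty,
# leaving anything that still holds files untouched:
find "$BOINC_SLOTS" -mindepth 1 -maxdepth 1 -type d -empty -delete

ls "$BOINC_SLOTS"
```

After the `find`, only slot `0` remains, because `-empty -delete` never touches a directory that still contains files. Exiting the client cleanly has a similar effect for finished tasks, as the client normally cleans up its own slots.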
©2025 CERN