ATLAS v0.50 and 0.51
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0
We are testing ATLAS tasks that use the new backend filesystem on LHC-dev. These are exactly the same WUs as on LHC@home, so let us know if you see any problems. The new app versions are available for Windows and native Linux.
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0
I get no work for Windows (number of tasks: 1, number of CPUs: 1): https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=1165
21.12.2017 20:23:31 | lhcathome-dev | Sending scheduler request: To fetch work.
21.12.2017 20:23:31 | lhcathome-dev | Requesting new tasks for CPU
21.12.2017 20:23:32 | lhcathome-dev | Scheduler request completed: got 0 new tasks
21.12.2017 20:23:32 | lhcathome-dev | No tasks sent
21.12.2017 20:23:32 | lhcathome-dev | No tasks are available for ATLAS Simulation
21.12.2017 20:23:32 | lhcathome-dev | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
Will test the native app as well, this evening or tomorrow morning.
Joined: 10 May 15 Posts: 4 Credit: 39,333 RAC: 0
All my ATLAS work failed on this machine: https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=2290
This is an example:
Name DyAMDm4oKmrnOmbckoZBDNAoABFKDmABFKDm4NGKDmABFKDmPLzY5n_0
Workunit 360631
Created 21 Dec 2017, 18:32:56 UTC
Sent 22 Dec 2017, 0:49:58 UTC
Report deadline 29 Dec 2017, 0:49:58 UTC
Received 22 Dec 2017, 3:50:17 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 195 (0x000000C3) EXIT_CHILD_FAILED
Computer ID 2290
Run time 10 min 4 sec
CPU time
Validate state Invalid
Credit 0.00
Device peak FLOPS 39.97 GFLOPS
Application version ATLAS Simulation v0.50 (native_mt) x86_64-pc-linux-gnu
Peak working set size 6.91 MB
Peak swap size 170.48 MB
Peak disk usage 253.28 MB
Stderr output
<core_client_version>7.7.0</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
20:11:16 (22315): wrapper (7.7.26015): starting
20:11:16 (22315): wrapper: running run_atlas (--nthreads 12)
sys.argv = ['run_atlas', '--nthreads', '12']
THREADS=12
Checking for CVMFS
sh: cvmfs_config: command not found
sh: cvmfs_config: command not found
sh: cvmfs_config: command not found
check cvmfs return values are 32512, 32512
CVMFS not found, aborting
running start_atlas return value is 1
tar czvf shared/result.tar.gz
tar: Cowardly refusing to create an empty archive
Try `tar --help' or `tar --usage' for more information.
*****************The last 100 lines of the pilot log******************
tail: cannot open ‘pilotlog.txt’ for reading: No such file or directory
***************diag file************
cat: *.diag: No such file or directory
******************************WorkDir***********************
total 36
drwxrwx--x 3 root root 177 Dec 21 20:11 .
drwxrwx--x 31 root root 286 Dec 14 19:37 ..
-rw-r--r-- 1 root root 0 Dec 21 20:11 boinc_lockfile
-rw-r--r-- 1 root root 8192 Dec 21 20:11 boinc_mmap_file
-rw-r--r-- 1 root root 5920 Dec 21 20:11 init_data.xml
-rw-r--r-- 1 root root 112 Dec 21 20:11 job.xml
-rwxr-xr-x 1 root root 7456 Dec 21 20:11 run_atlas
drwxrwx--x 2 root root 86 Dec 21 20:11 shared
-rw-r--r-- 1 root root 760 Dec 21 20:11 stderr.txt
-rw-r--r-- 1 root root 107 Dec 21 20:11 wrapper_26015_x86_64-pc-linux-gnu
parent process exit 1
child process exit 1
20:21:19 (22315): run_atlas exited; CPU time 0.034227
20:21:19 (22315): app exit status: 0x1
20:21:19 (22315): called boinc_finish(195)
</stderr_txt>
]]>
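The value 32512 in "check cvmfs return values are 32512, 32512" is consistent with a raw POSIX wait status, such as the one Python's os.system returns on Linux: the child's exit code sits in the second byte, and 32512 >> 8 is 127, the exit code a POSIX shell uses when a command cannot be found. That matches the "sh: cvmfs_config: command not found" lines. That run_atlas obtains this value through os.system is an assumption; the minimal sketch below only shows the decoding (on Python 3.9+, os.waitstatus_to_exitcode does the same on Unix).

```python
def exit_code_from_wait_status(status: int) -> int:
    """Extract the child's exit code from a POSIX wait status (e.g. from os.system)."""
    # The low 7 bits hold the terminating signal (0 for a normal exit);
    # the exit code itself is stored in the next byte up.
    return (status >> 8) & 0xFF


COMMAND_NOT_FOUND = 127  # exit code POSIX shells use when a command is missing

if __name__ == "__main__":
    status = 32512  # value printed by run_atlas in the log above
    code = exit_code_from_wait_status(status)
    print(code, code == COMMAND_NOT_FOUND)  # prints: 127 True
```

A return value of 0 would instead mean cvmfs_config ran and succeeded, so a 32512 here points at a missing or not-on-PATH CVMFS client rather than a CVMFS repository problem.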
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0
Quote: "Will test the native app as well, this evening or tomorrow morning."
The native app also gets no new tasks.
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 15
I can't test Windows multi-core version 0.51 without getting a task:
lhcathome-dev 22-12-2017 7:15 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 7:27 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 7:28 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 7:41 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 7:42 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 7:48 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 7:49 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 8:00 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 8:03 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 8:11 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 8:18 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 8:42 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 9:04 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 9:05 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 9:20 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 9:35 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 9:45 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 9:47 No tasks are available for ATLAS Simulation
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0
You need to install CVMFS and Singularity to run the native app; please see this thread for details: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4395
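Before attaching a host to the native app, a quick pre-flight check can confirm that both prerequisites are actually reachable. The minimal sketch below assumes the relevant executables are named cvmfs_config and singularity (the names seen in this thread) and uses atlas.cern.ch only as an example repository for `cvmfs_config probe`; the exact repositories the tasks need are described in the linked thread, not here.

```python
import shutil
import subprocess

# Executables the native app needs, as named in this thread (assumption).
REQUIRED_COMMANDS = ("cvmfs_config", "singularity")


def missing_commands(commands=REQUIRED_COMMANDS) -> list:
    """Return the required commands that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


def probe_repository(repo: str = "atlas.cern.ch") -> bool:
    """Run 'cvmfs_config probe <repo>' and report whether it exited with status 0."""
    if shutil.which("cvmfs_config") is None:
        return False  # same situation as the 'command not found' log above
    result = subprocess.run(["cvmfs_config", "probe", repo],
                            capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("missing on PATH:", ", ".join(missing))
    else:
        print("atlas.cern.ch probe ok:", probe_repository())
```

Running it as the same user and with the same PATH as the BOINC client gives the most representative answer, since that is the environment run_atlas sees.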
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0
Quote: "I can't test Windows multi-core version 0.51 without getting a task:"
There was a syntax error in the configuration of the Windows tasks. I've fixed it and can see that some of you have WUs now.
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0
Both the native app and Windows tasks are running now. Thank you, David.
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 15
I got an ATLAS task now; however, the task was postponed for 86400 seconds after 10 minutes of run time, and the VM was powered off.
Quote: "I can't test Windows multi-core version 0.51 without getting a task:"
Quote: "There was a syntax error in the configuration of the windows tasks, I've fixed it and see that some of you have WU now."
22-Dec-2017 10:26:37 [lhcathome-dev] Sending scheduler request: To fetch work.
22-Dec-2017 10:26:37 [lhcathome-dev] Requesting new tasks for CPU and AMD/ATI GPU
22-Dec-2017 10:26:39 [lhcathome-dev] Scheduler request completed: got 1 new tasks
22-Dec-2017 10:26:41 [lhcathome-dev] Started download of 9ZNNDmpGRmrnOmbckoZBDNAoABFKDmABFKDmffNKDmABFKDm1os8em_EVNT.12575767._000064.pool.root.1 --> 263.546.489 bytes
22-Dec-2017 10:26:41 [lhcathome-dev] Started download of 9ZNNDmpGRmrnOmbckoZBDNAoABFKDmABFKDmffNKDmABFKDm1os8em_input.tar.gz
22-Dec-2017 10:26:43 [lhcathome-dev] Finished download of 9ZNNDmpGRmrnOmbckoZBDNAoABFKDmABFKDmffNKDmABFKDm1os8em_input.tar.gz
22-Dec-2017 10:26:43 [lhcathome-dev] Started download of rte_9ZNNDmpGRmrnOmbckoZBDNAoABFKDmABFKDmffNKDmABFKDm1os8em.tar.gz
22-Dec-2017 10:26:44 [lhcathome-dev] Finished download of rte_9ZNNDmpGRmrnOmbckoZBDNAoABFKDmABFKDmffNKDmABFKDm1os8em.tar.gz
22-Dec-2017 10:26:44 [lhcathome-dev] Started download of boinc_job_script.hsr0s1
22-Dec-2017 10:26:45 [lhcathome-dev] Finished download of boinc_job_script.hsr0s1
22-Dec-2017 10:26:46 [World Community Grid] update requested by user
22-Dec-2017 10:26:50 [World Community Grid] Sending scheduler request: Requested by user.
22-Dec-2017 10:26:50 [World Community Grid] Requesting new tasks for CPU
22-Dec-2017 10:26:51 [World Community Grid] Scheduler request completed: got 0 new tasks
22-Dec-2017 10:26:51 [World Community Grid] No tasks sent
22-Dec-2017 10:26:51 [World Community Grid] No tasks are available for FightAIDS@Home - Vina
22-Dec-2017 10:26:51 [World Community Grid] No tasks are available for the applications you have selected.
22-Dec-2017 10:27:09 [lhcathome-dev] Finished download of 9ZNNDmpGRmrnOmbckoZBDNAoABFKDmABFKDmffNKDmABFKDm1os8em_EVNT.12575767._000064.pool.root.1
22-Dec-2017 10:27:12 [lhcathome-dev] Starting task 9ZNNDmpGRmrnOmbckoZBDNAoABFKDmABFKDmffNKDmABFKDm1os8em_1
. .
22-Dec-2017 11:37:08 [lhcathome-dev] task postponed 86400.000000 sec: VM job unmanageable, restarting later.
After restarting BOINC, the task restarted OK and is running now.
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0
The Windows task finished after 20 minutes: https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=360714
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 15
Quote: "Windows task finished after 20 min:"
Not good; it should have crunched 200 events. BOINC validated it OK, because it's not your machine's fault, but the ATLAS job failed with:
Error information from the Sim_tf transformation, report version 2.0.7
Error TRF_EXEC_FAIL, exit code 65: Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider"
Executor error: Non-zero return code from EVNTtoHITS (65)
Error details:
FATAL: line 8486: AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider
ERROR: line 8485: AthMpEvtLoopMgr... ERROR Unable to map the flag on all subprocesses in the group
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0
Thanks, Crystal; I was surprised by the quick finish. I will stop Windows tasks on -dev for the next two weeks (holidays here too ;-). The native Linux app has been running for more than 6 hours and seems to work well.
Joined: 13 Feb 15 Posts: 1188 Credit: 862,257 RAC: 15
The Windows task was postponed again:
22-Dec-2017 20:33:19 [lhcathome-dev] task postponed 86400.000000 sec: VM job unmanageable, restarting later.
This happened after more than 9 hours of run time and more than 36 hours of CPU time. The problem is that the VM is not saved but powered off, so after BOINC restarts, the job has to start from scratch. Not happy crunching :-(
Joined: 10 May 15 Posts: 4 Credit: 39,333 RAC: 0
I installed the two applications but still get the same error. I stopped and restarted BOINC after the install: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=373956
The system is Linux CentOS Linux 7 (Core) [3.10.0-693.11.1.el7.x86_64], and this shows that the software is loaded:
[root@dingo4 ~]# yum info cvmfs
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: centos.gbeservers.com
 * extras: mirror.steadfast.net
 * updates: mirror.dal10.us.leaseweb.net
Available Packages
Name : cvmfs
Arch : x86_64
Version : 2.4.4
Release : 1.el7.centos
Size : 8.8 M
Repo : cernvm/7/x86_64
Summary : CernVM File System
License : BSD
Description : HTTP File System for Distributing Software to CernVM.
 : See http://cernvm.cern.ch
 : Copyright (c) CERN
----------------------------------------------------------------
yum could not find singularity because it wasn't installed with yum, but it is on the system:
[root@dingo4 ~]# find / -name singularity
/root/singularity
/root/singularity/bin/singularity
/root/singularity/etc/bash_completion.d/singularity
/usr/local/bin/singularity
/usr/local/etc/bash_completion.d/singularity
/usr/local/etc/singularity
/usr/local/include/singularity
/usr/local/lib/singularity
/usr/local/libexec/singularity
/usr/local/var/singularity
I can't seem to get this running; any help is appreciated.
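When reading `yum info` output, it helps to note which section header a package appears under: yum lists installed packages under "Installed Packages" and packages that are only available from an enabled repository under "Available Packages". The minimal sketch below (a hypothetical helper, not part of any tool used in this thread) reports that section for a given package name; applied to the output above, it returns "Available Packages" for cvmfs.

```python
def yum_info_section(output: str, package: str):
    """Return the yum section header ('Installed Packages' or 'Available Packages')
    under which `package` is listed in `yum info` output, or None if it is absent."""
    section = None
    for line in output.splitlines():
        stripped = line.strip()
        if stripped in ("Installed Packages", "Available Packages"):
            section = stripped  # remember the most recent section header
        elif stripped.startswith("Name") and ":" in stripped:
            name = stripped.split(":", 1)[1].strip()
            if name == package:
                return section
    return None


SAMPLE = """Loaded plugins: fastestmirror
Available Packages
Name : cvmfs
Arch : x86_64
Version : 2.4.4"""

if __name__ == "__main__":
    print(yum_info_section(SAMPLE, "cvmfs"))  # prints: Available Packages
```

A more direct check on an RPM-based system is `rpm -q cvmfs`, which exits with a non-zero status and prints "package cvmfs is not installed" when the package is missing.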
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0
The first native Linux app task finished: https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=360627
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0
Hi Dingo, if you like, there is a short write-up for SL69 in production: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4502
Otherwise, take a closer look at https://atlasathome.cern.ch/boinc_conf
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0
Thanks for all the feedback! Since things seem to work OK in general, I've increased the number of ATLAS WUs coming here.
This error "AthMpEvtLoopMgr FATAL makePool failed" usually means there is not enough memory to run the WU. The memory settings are the same as for LHC@Home, so I wonder if you have an app_config.xml with old settings for ATLAS?
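To check for leftover ATLAS overrides, one can list the app_version entries in the project's app_config.xml. The minimal sketch below uses the standard BOINC app_config.xml elements (app_version, app_name, plan_class, avg_ncpus, cmdline); the app name "ATLAS" and the sample values (2 CPUs, --memory_size_mb 4000) are assumptions for illustration only, not recommended settings. Real entries usually also carry a plan_class.

```python
import xml.etree.ElementTree as ET


def atlas_overrides(xml_text: str, app_name: str = "ATLAS") -> list:
    """List the <app_version> overrides for one app in a BOINC app_config.xml."""
    root = ET.fromstring(xml_text)
    overrides = []
    for version in root.findall("app_version"):
        if (version.findtext("app_name") or "").strip() != app_name:
            continue  # entry belongs to another application
        overrides.append({
            "plan_class": (version.findtext("plan_class") or "").strip(),
            "avg_ncpus": (version.findtext("avg_ncpus") or "").strip(),
            "cmdline": (version.findtext("cmdline") or "").strip(),
        })
    return overrides


# Hypothetical example file; the values are placeholders, not recommendations.
SAMPLE = """<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <avg_ncpus>2</avg_ncpus>
    <cmdline>--memory_size_mb 4000</cmdline>
  </app_version>
</app_config>"""

if __name__ == "__main__":
    for entry in atlas_overrides(SAMPLE):
        print(entry)
```

Any cmdline memory value found this way overrides the server-side setting, which is why an outdated entry could produce the memory error described above.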
Joined: 20 Apr 16 Posts: 180 Credit: 1,355,327 RAC: 0
Quote: "This error 'AthMpEvtLoopMgr FATAL makePool failed' usually means there is not enough memory to run the WU. The memory settings are the same as for LHC@Home so I wonder if you have an app_config.xml with old settings for ATLAS?"
Sorry, this is actually a problem on our side. Some time ago on LHC@home we had a bug where single-core WUs actually ran 8 processes inside the VM, which obviously fails because there is not enough memory. This bug had crept back in here, but I've fixed it now.
Joined: 8 Apr 15 Posts: 781 Credit: 12,422,653 RAC: 2,032
David, did you finally tell the server to send me some of the ATLAS tasks at full speed instead of taking 10 hours to download them? I just got two of the 2-core tasks with a download speed of 22 Mbps... a bit faster than the usual 22 Kbps.
Joined: 22 Apr 16 Posts: 677 Credit: 2,002,766 RAC: 0
Quote: "Sorry, this is actually a problem on our side. Some time ago on LHC we had a bug where single core WU actually ran 8 processes inside the VM, which obviously fails because there is not enough memory. This bug had crept back in here but I've fixed it now."
At the moment I can't test ATLAS because I need a new power supply; the computer is down.
©2024 CERN