Message boards : ATLAS Application : ATLAS v0.50 and 0.51
David Cameron
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 5289 - Posted: 21 Dec 2017, 15:42:27 UTC

We are testing out ATLAS tasks using the new backend filesystem on LHC-dev. These are exactly the same WU as on LHC so let us know if you see any problems. The new app versions are available for Windows and Linux native.

maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 2
Message 5290 - Posted: 21 Dec 2017, 19:46:02 UTC
Last modified: 21 Dec 2017, 19:52:12 UTC

I get no work for Windows (number of tasks: 1, number of CPUs: 1):
https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=1165
21.12.2017 20:23:31 | lhcathome-dev | Sending scheduler request: To fetch work.
21.12.2017 20:23:31 | lhcathome-dev | Requesting new tasks for CPU
21.12.2017 20:23:32 | lhcathome-dev | Scheduler request completed: got 0 new tasks
21.12.2017 20:23:32 | lhcathome-dev | No tasks sent
21.12.2017 20:23:32 | lhcathome-dev | No tasks are available for ATLAS Simulation
21.12.2017 20:23:32 | lhcathome-dev | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them

I will test the native app as well, this evening or tomorrow morning.

Dingo
Joined: 10 May 15
Posts: 4
Credit: 39,333
RAC: 0
Message 5292 - Posted: 22 Dec 2017, 3:56:27 UTC

All my ATLAS work failed on this machine: https://lhcathomedev.cern.ch/lhcathome-dev/show_host_detail.php?hostid=2290

This is an example:

Name DyAMDm4oKmrnOmbckoZBDNAoABFKDmABFKDm4NGKDmABFKDmPLzY5n_0
Workunit 360631
Created 21 Dec 2017, 18:32:56 UTC
Sent 22 Dec 2017, 0:49:58 UTC
Report deadline 29 Dec 2017, 0:49:58 UTC
Received 22 Dec 2017, 3:50:17 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 195 (0x000000C3) EXIT_CHILD_FAILED
Computer ID 2290
Run time 10 min 4 sec
CPU time
Validate state Invalid
Credit 0.00
Device peak FLOPS 39.97 GFLOPS
Application version ATLAS Simulation v0.50 (native_mt)
x86_64-pc-linux-gnu
Peak working set size 6.91 MB
Peak swap size 170.48 MB
Peak disk usage 253.28 MB
Stderr output
<core_client_version>7.7.0</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
20:11:16 (22315): wrapper (7.7.26015): starting
20:11:16 (22315): wrapper: running run_atlas (--nthreads 12)
sys.argv = ['run_atlas', '--nthreads', '12']
THREADS=12
Checking for CVMFS
sh: cvmfs_config: command not found
sh: cvmfs_config: command not found
sh: cvmfs_config: command not found
check cvmfs return values are 32512, 32512
CVMFS not found, aborting

running start_atlas return value is 1
tar czvf shared/result.tar.gz
tar: Cowardly refusing to create an empty archive
Try `tar --help' or `tar --usage' for more information.

*****************The last 100 lines of the pilot log******************
tail: cannot open ‘pilotlog.txt’ for reading: No such file or directory
***************diag file************
cat: *.diag: No such file or directory
******************************WorkDir***********************
total 36
drwxrwx--x 3 root root 177 Dec 21 20:11 .
drwxrwx--x 31 root root 286 Dec 14 19:37 ..
-rw-r--r-- 1 root root 0 Dec 21 20:11 boinc_lockfile
-rw-r--r-- 1 root root 8192 Dec 21 20:11 boinc_mmap_file
-rw-r--r-- 1 root root 5920 Dec 21 20:11 init_data.xml
-rw-r--r-- 1 root root 112 Dec 21 20:11 job.xml
-rwxr-xr-x 1 root root 7456 Dec 21 20:11 run_atlas
drwxrwx--x 2 root root 86 Dec 21 20:11 shared
-rw-r--r-- 1 root root 760 Dec 21 20:11 stderr.txt
-rw-r--r-- 1 root root 107 Dec 21 20:11 wrapper_26015_x86_64-pc-linux-gnu
parent process exit 1
child process exit 1
20:21:19 (22315): run_atlas exited; CPU time 0.034227
20:21:19 (22315): app exit status: 0x1
20:21:19 (22315): called boinc_finish(195)

</stderr_txt>
]]>


maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 2
Message 5293 - Posted: 22 Dec 2017, 7:46:11 UTC - in response to Message 5290.  

I will test the native app as well, this evening or tomorrow morning.

The native app also gets no new tasks.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 5294 - Posted: 22 Dec 2017, 8:51:20 UTC

I can't test Windows multi-core version 0.51 without getting a task:

lhcathome-dev 22-12-2017 7:15 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 7:27 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 7:28 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 7:41 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 7:42 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 7:48 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 7:49 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 8:00 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 8:03 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 8:11 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 8:18 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 8:42 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 9:04 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 9:05 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 9:20 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 9:35 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 9:45 No tasks are available for ATLAS Simulation
lhcathome-dev 22-12-2017 9:47 No tasks are available for ATLAS Simulation

David Cameron
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 5295 - Posted: 22 Dec 2017, 8:57:44 UTC - in response to Message 5292.  


Checking for CVMFS
sh: cvmfs_config: command not found
sh: cvmfs_config: command not found
sh: cvmfs_config: command not found
check cvmfs return values are 32512, 32512
CVMFS not found, aborting


You have to install CVMFS and Singularity to run the native app - please see this thread for more details: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4395
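
For reference, the return value 32512 in the log above is consistent with a raw POSIX wait status: 32512 = 127 × 256, and 127 is the exit code a shell returns for "command not found". The snippet below is only a minimal sketch, assuming the values come from something like Python's os.system() (the source of run_atlas is not shown in this thread). The helper names are made up for illustration.

```python
import shutil


def exit_code_from_wait_status(status: int) -> int:
    """Extract the exit code from a POSIX wait status (stored in the high byte)."""
    return (status >> 8) & 0xFF


def missing_tools(tools=("cvmfs_config", "singularity")):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # 32512 decodes to 127, the shell's "command not found" exit code.
    print(exit_code_from_wait_status(32512))
    # Lists whichever of the two tools this user's PATH does not contain.
    print("missing:", missing_tools())
```

Running this as the same user the BOINC client runs as shows whether that user's PATH actually contains the two tools.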

David Cameron
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 5296 - Posted: 22 Dec 2017, 9:51:45 UTC - in response to Message 5294.  

I can't test Windows multi-core version 0.51 without getting a task:


There was a syntax error in the configuration of the Windows tasks. I've fixed it and see that some of you have WUs now.

maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 2
Message 5297 - Posted: 22 Dec 2017, 11:07:01 UTC

Both the native app and the Windows app are running now.
Thank you, David.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 5298 - Posted: 22 Dec 2017, 11:35:16 UTC - in response to Message 5296.  

I can't test Windows multi-core version 0.51 without getting a task:
There was a syntax error in the configuration of the Windows tasks. I've fixed it and see that some of you have WUs now.
I got an ATLAS task now; however, the task was postponed for 86400 seconds after 10 minutes of run time and the VM was powered off.

22-Dec-2017 10:26:37 [lhcathome-dev] Sending scheduler request: To fetch work.
22-Dec-2017 10:26:37 [lhcathome-dev] Requesting new tasks for CPU and AMD/ATI GPU
22-Dec-2017 10:26:39 [lhcathome-dev] Scheduler request completed: got 1 new tasks
22-Dec-2017 10:26:41 [lhcathome-dev] Started download of 9ZNNDmpGRmrnOmbckoZBDNAoABFKDmABFKDmffNKDmABFKDm1os8em_EVNT.12575767._000064.pool.root.1 --> 263.546.489 bytes
22-Dec-2017 10:26:41 [lhcathome-dev] Started download of 9ZNNDmpGRmrnOmbckoZBDNAoABFKDmABFKDmffNKDmABFKDm1os8em_input.tar.gz
22-Dec-2017 10:26:43 [lhcathome-dev] Finished download of 9ZNNDmpGRmrnOmbckoZBDNAoABFKDmABFKDmffNKDmABFKDm1os8em_input.tar.gz
22-Dec-2017 10:26:43 [lhcathome-dev] Started download of rte_9ZNNDmpGRmrnOmbckoZBDNAoABFKDmABFKDmffNKDmABFKDm1os8em.tar.gz
22-Dec-2017 10:26:44 [lhcathome-dev] Finished download of rte_9ZNNDmpGRmrnOmbckoZBDNAoABFKDmABFKDmffNKDmABFKDm1os8em.tar.gz
22-Dec-2017 10:26:44 [lhcathome-dev] Started download of boinc_job_script.hsr0s1
22-Dec-2017 10:26:45 [lhcathome-dev] Finished download of boinc_job_script.hsr0s1
22-Dec-2017 10:26:46 [World Community Grid] update requested by user
22-Dec-2017 10:26:50 [World Community Grid] Sending scheduler request: Requested by user.
22-Dec-2017 10:26:50 [World Community Grid] Requesting new tasks for CPU
22-Dec-2017 10:26:51 [World Community Grid] Scheduler request completed: got 0 new tasks
22-Dec-2017 10:26:51 [World Community Grid] No tasks sent
22-Dec-2017 10:26:51 [World Community Grid] No tasks are available for FightAIDS@Home - Vina
22-Dec-2017 10:26:51 [World Community Grid] No tasks are available for the applications you have selected.
22-Dec-2017 10:27:09 [lhcathome-dev] Finished download of 9ZNNDmpGRmrnOmbckoZBDNAoABFKDmABFKDmffNKDmABFKDm1os8em_EVNT.12575767._000064.pool.root.1
22-Dec-2017 10:27:12 [lhcathome-dev] Starting task 9ZNNDmpGRmrnOmbckoZBDNAoABFKDmABFKDmffNKDmABFKDm1os8em_1
.
.
22-Dec-2017 11:37:08 [lhcathome-dev] task postponed 86400.000000 sec: VM job unmanageable, restarting later.


After restarting BOINC the task restarted OK and is running now.
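
As a side note, 86400 seconds is exactly 24 hours, so a postponed task would sit idle for a full day unless BOINC is restarted. The following is a small sketch for pulling the delay and the reason out of a client log line like the one above. The regular expression and function name are illustrative assumptions based only on the single message format quoted in this post.

```python
import re

# Matches e.g. "task postponed 86400.000000 sec: VM job unmanageable, restarting later."
POSTPONED = re.compile(r"task postponed (?P<seconds>\d+(?:\.\d+)?) sec: (?P<reason>.+)$")


def parse_postponed(line: str):
    """Return (seconds, reason) for a 'task postponed' line, or None if it does not match."""
    match = POSTPONED.search(line)
    if match is None:
        return None
    return float(match.group("seconds")), match.group("reason").strip()


if __name__ == "__main__":
    line = ("22-Dec-2017 11:37:08 [lhcathome-dev] task postponed 86400.000000 sec: "
            "VM job unmanageable, restarting later.")
    seconds, reason = parse_postponed(line)
    print(f"postponed for {seconds / 3600:.0f} hours: {reason}")
```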

maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 2
Message 5299 - Posted: 22 Dec 2017, 14:03:25 UTC


Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 5300 - Posted: 22 Dec 2017, 16:56:03 UTC - in response to Message 5299.  

Windows task finished after 20 min:
https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=360714

Not good, it should have crunched 200 events. BOINC validated OK, because it's not your machine's fault, but the ATLAS-job errored with:

Error information from the Sim_tf transformation, report version 2.0.7
Error TRF_EXEC_FAIL, exit code 65: Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider"

Executor error: Non-zero return code from EVNTtoHITS (65)
Error details:
FATAL: line 8486:
AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider
ERROR: line 8485:
AthMpEvtLoopMgr... ERROR Unable to map the flag on all subprocesses in the group

maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 2
Message 5301 - Posted: 22 Dec 2017, 18:04:58 UTC

Thanks Crystal,
I was surprised by the quick finish. I will stop Windows tasks on -dev for the next two weeks (holidays and so on... ;-).
The native Linux app has been running for more than 6 hours and seems to work well.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 5302 - Posted: 22 Dec 2017, 21:01:41 UTC

The Windows task was postponed again:

22-Dec-2017 20:33:19 [lhcathome-dev] task postponed 86400.000000 sec: VM job unmanageable, restarting later.

after > 9 hours running and > 36 hours CPU-time.

Problem is that the VM is not saved, but powered off, so after BOINC's restart the job has to start from scratch.

Not happy crunching :-(

Dingo
Joined: 10 May 15
Posts: 4
Credit: 39,333
RAC: 0
Message 5303 - Posted: 23 Dec 2017, 3:51:54 UTC - in response to Message 5295.  


Checking for CVMFS
sh: cvmfs_config: command not found
sh: cvmfs_config: command not found
sh: cvmfs_config: command not found
check cvmfs return values are 32512, 32512
CVMFS not found, aborting


You have to install CVMFS and Singularity to run the native app - please see this thread for more details: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4395



I installed the two applications but still get the same error. I stopped and started BOINC after the install. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=373956

The system is Linux CentOS Linux 7 (Core) [3.10.0-693.11.1.el7.x86_64] and this shows that the software is loaded:

[root@dingo4 ~]# yum info cvmfs
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: centos.gbeservers.com
* extras: mirror.steadfast.net
* updates: mirror.dal10.us.leaseweb.net
Available Packages
Name : cvmfs
Arch : x86_64
Version : 2.4.4
Release : 1.el7.centos
Size : 8.8 M
Repo : cernvm/7/x86_64
Summary : CernVM File System
License : BSD
Description : HTTP File System for Distributing Software to CernVM.
: See http://cernvm.cern.ch
: Copyright (c) CERN

----------------------------------------------------------------

yum could not find singularity as it wasn't installed with yum. But it is on the system:

[root@dingo4 ~]# find / -name singularity
/root/singularity
/root/singularity/bin/singularity
/root/singularity/etc/bash_completion.d/singularity
/usr/local/bin/singularity
/usr/local/etc/bash_completion.d/singularity
/usr/local/etc/singularity
/usr/local/include/singularity
/usr/local/lib/singularity
/usr/local/libexec/singularity
/usr/local/var/singularity


I cannot seem to run this; any help appreciated.
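
One detail in the yum output above: the package is listed under "Available Packages". yum lists packages that are already on the system under "Installed Packages", so this output suggests that cvmfs is known to the repository but may not actually be installed. Below is a rough sketch of that check. The function name is hypothetical and the sample text is abbreviated from the output above.

```python
def yum_info_state(output: str) -> str:
    """Classify `yum info <package>` output by its section header."""
    headers = {line.strip() for line in output.splitlines()}
    if "Installed Packages" in headers:
        return "installed"
    if "Available Packages" in headers:
        return "available"
    return "unknown"


# Abbreviated from the output quoted in this post.
SAMPLE = """\
Loaded plugins: fastestmirror
Available Packages
Name : cvmfs
Arch : x86_64
Version : 2.4.4
"""

if __name__ == "__main__":
    # "available" means the repository offers the package; it is not on the system yet.
    print(yum_info_state(SAMPLE))
```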


maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 2
Message 5306 - Posted: 23 Dec 2017, 8:35:03 UTC


maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 2
Message 5307 - Posted: 23 Dec 2017, 8:47:09 UTC - in response to Message 5303.  
Last modified: 23 Dec 2017, 8:47:36 UTC

Hi Dingo,

If you like, there is some brief information for SL69 on the production forum:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4502
Otherwise, take a closer look at https://atlasathome.cern.ch/boinc_conf

David Cameron
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 5323 - Posted: 8 Jan 2018, 14:44:32 UTC

Thanks for all the feedback! Since things seem to work ok in general I've increased the number of ATLAS WU coming here.

This error "AthMpEvtLoopMgr FATAL makePool failed" usually means there is not enough memory to run the WU. The memory settings are the same as for LHC@Home so I wonder if you have an app_config.xml with old settings for ATLAS?
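
To check for leftover overrides, it can help to list what an app_config.xml in the project directory actually sets. The sketch below parses a sample file and prints the per-version overrides. The element names (app_version, app_name, plan_class, avg_ncpus, cmdline) are standard BOINC app_config elements, but the app name, plan class, and values in the sample are hypothetical placeholders, not the project's real settings.

```python
import xml.etree.ElementTree as ET

# Hypothetical example file; the name, plan class, and numbers are placeholders.
SAMPLE = """<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>example_plan_class</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <cmdline>--nthreads 1 --memory_size_mb 3900</cmdline>
  </app_version>
</app_config>"""


def list_overrides(xml_text: str):
    """Return one dict per <app_version> block with the settings it overrides."""
    root = ET.fromstring(xml_text)
    fields = ("app_name", "plan_class", "avg_ncpus", "cmdline")
    return [
        {field: version.findtext(field, default="") for field in fields}
        for version in root.iter("app_version")
    ]


if __name__ == "__main__":
    for override in list_overrides(SAMPLE):
        print(override)
```

If the file contains an ATLAS entry with an old memory or thread setting, removing that entry (or the whole file) and re-reading the config files in the BOINC client returns the app to the server-side defaults.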

David Cameron
Project administrator
Project developer
Project tester
Project scientist
Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 5324 - Posted: 8 Jan 2018, 15:51:37 UTC - in response to Message 5323.  

This error "AthMpEvtLoopMgr FATAL makePool failed" usually means there is not enough memory to run the WU. The memory settings are the same as for LHC@Home so I wonder if you have an app_config.xml with old settings for ATLAS?


Sorry, this is actually a problem on our side. Some time ago on LHC we had a bug where single core WU actually ran 8 processes inside the VM, which obviously fails because there is not enough memory. This bug had crept back in here but I've fixed it now.
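
The arithmetic behind this failure can be sketched roughly as follows. All numbers here are hypothetical placeholders chosen only to illustrate the effect; the real VM size and per-process memory use are not given in this thread.

```python
def fits_in_vm(n_processes: int, vm_memory_mb: int, base_mb: int, per_process_mb: int) -> bool:
    """True if a fixed base cost plus one share per worker process fits in the VM's memory."""
    return base_mb + n_processes * per_process_mb <= vm_memory_mb


# Hypothetical figures, for illustration only.
VM_MEMORY_MB = 3000     # memory given to a single-core VM
BASE_MB = 1500          # memory used regardless of the number of processes
PER_PROCESS_MB = 1000   # additional memory per worker process

if __name__ == "__main__":
    for n in (1, 8):
        needed = BASE_MB + n * PER_PROCESS_MB
        verdict = "fits" if fits_in_vm(n, VM_MEMORY_MB, BASE_MB, PER_PROCESS_MB) else "does not fit"
        print(f"{n} process(es): need {needed} MB of {VM_MEMORY_MB} MB, {verdict}")
```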

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 781
Credit: 12,324,905
RAC: 1,506
Message 5325 - Posted: 8 Jan 2018, 21:44:21 UTC

David, did you finally tell the server to send me some of the ATLAS tasks at full speed instead of taking 10 hours to d/l these?

I just got two of the 2-core tasks with the d/l speed at 22 Mbps... a bit faster than the usual 22 Kbps.
Mad Scientist For Life

maeax
Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 2
Message 5326 - Posted: 9 Jan 2018, 10:31:53 UTC - in response to Message 5324.  

Sorry, this is actually a problem on our side. Some time ago on LHC we had a bug where single core WU actually ran 8 processes inside the VM, which obviously fails because there is not enough memory. This bug had crept back in here but I've fixed it now.

At the moment I can't test ATLAS because I need a new power supply. The computer is down.