Message boards : ATLAS Application : ATLAS native 0.99

David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 6926 - Posted: 7 Jan 2020, 13:59:13 UTC

To fix this problem I've released 0.99 and submitted a bunch of test tasks.

Thanks for testing and reporting problems. If 0.99 looks OK, then hopefully I can release it into production this week.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 3
Message 6931 - Posted: 8 Jan 2020, 8:02:49 UTC

I tested several of the 0.99 tasks. They seem OK to me, but it is better to look at the results yourself: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=3717&offset=0&show_names=0&state=0&appid=5

The part of the log where there was obviously a tar problem now looks like this:
di  7 jan 2020 21:49:54 CET: CVMFS is ok
di  7 jan 2020 21:49:54 CET: System is not Red Hat/CentOS 7, singularity is required
di  7 jan 2020 21:49:54 CET: Using singularity image /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img
di  7 jan 2020 21:49:54 CET: Checking for singularity binary...
di  7 jan 2020 21:49:54 CET: Singularity is not installed, using version from CVMFS
di  7 jan 2020 21:49:54 CET: Checking singularity works with /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img hostname
di  7 jan 2020 21:50:34 CET: INFO:  Convert SIF file to sandbox... LinAH125 INFO:  Cleaning up image...
di  7 jan 2020 21:50:34 CET: Singularity works
di  7 jan 2020 21:50:34 CET: Set ATHENA_PROC_NUMBER=4
di  7 jan 2020 21:50:34 CET: Starting ATLAS job with PandaID=4002876565
di  7 jan 2020 21:50:34 CET: Running command: /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity exec --pwd /var/lib/boinc-client/slots/0 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh start_atlas.sh
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 3
Message 6932 - Posted: 8 Jan 2020, 11:57:11 UTC

This task was running on an HP i7 without Docker.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2856616
Mi 8. Jan 12:51:06 CET 2020: *** Error codes and diagnostics ***
Mi 8. Jan 12:51:06 CET 2020: "exeErrorCode": 65,
Mi 8. Jan 12:51:06 CET 2020: "exeErrorDiag": "EVNTtoHITS got a SIGBUS signal (exit code 135)",
Mi 8. Jan 12:51:06 CET 2020: "pilotErrorCode": 0,
Mi 8. Jan 12:51:06 CET 2020: "pilotErrorDiag": "",

I don't know what is needed.
ATLAS native on the production server is running well:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10618519
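
For reference, the "exit code 135" in the error log above follows the common Unix shell convention of reporting 128 plus the signal number when a process is killed by a signal. On Linux x86-64, SIGBUS is signal 7, so 128 + 7 = 135. A small Python sketch that decodes such codes is shown below; the function name is made up for illustration, and the result depends on the signal numbering of the platform it runs on.

```python
import signal
from typing import Optional


def signal_from_exit_code(exit_code: int) -> Optional[str]:
    """Return the signal name if exit_code looks like 128 + signal number.

    Returns None for ordinary exit codes and for values that do not map
    to a signal known on this platform.
    """
    if exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        return None


# The code from the log; on Linux x86-64 this decodes to SIGBUS.
print(signal_from_exit_code(135))
```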
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 3
Message 6933 - Posted: 8 Jan 2020, 17:42:42 UTC - in response to Message 6932.  
Last modified: 8 Jan 2020, 18:11:04 UTC

The HP i7 CentOS7 VM has now been rebooted; the previous reboot was on 1.12.19(!).
It has been upgraded and now has 15 GByte of RAM in use. The task is now running 0.99 without Docker.
https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1969419
After writing these lines, the task crashed.....
Mi 8. Jan 18:45:45 CET 2020: "exeErrorDiag": "EVNTtoHITS got a SIGBUS signal (exit code 135)",

Edit: Tomorrow I will get 32 GByte more RAM for the Ryzen 2700.
I will then test the CentOS7 VM WITH Docker again.

Edit2: Is another version of Python needed?
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 6954 - Posted: 20 Jan 2020, 12:44:56 UTC

This version has now been released on the production server. Once it looks OK I will update the Docker configuration to pull WUs from there instead of the LHC-dev server.
David Cameron
Project administrator
Project developer
Project tester
Project scientist

Joined: 20 Apr 16
Posts: 180
Credit: 1,355,327
RAC: 0
Message 6955 - Posted: 20 Jan 2020, 15:56:49 UTC - in response to Message 6954.  

Well, that didn't go too well: lots of SIGBUS errors like the ones maeax reported. Sorry for not looking more closely at these problems here.

I will submit some more test WUs to try to get to the bottom of the problem.
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 3
Message 6956 - Posted: 20 Jan 2020, 17:34:14 UTC - in response to Message 6955.  

No risk, no fun ;-)
