Message boards : Theory Application : New Native App - Linux Only

Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 319,055
RAC: 49
Message 5798 - Posted: 7 Feb 2019, 12:20:37 UTC
Last modified: 7 Feb 2019, 12:35:31 UTC

Theory has been reconfigured to provide the native application for Linux. To use it, you will need an up-to-date Linux distribution with runc and cvmfs installed. CVMFS also needs to be configured: ensure that the file /etc/cvmfs/default.local contains the following:
CVMFS_REPOSITORIES=cernvm-prod.cern.ch,grid.cern.ch,sft.cern.ch
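A quick way to confirm this setting is to parse the file and compare the repository list. The following is only a minimal sketch and not part of the Theory app: the function name `missing_repositories` is made up for illustration, and it assumes the file consists of simple `KEY=value` lines like the one above.

```python
REQUIRED = ("cernvm-prod.cern.ch", "grid.cern.ch", "sft.cern.ch")


def missing_repositories(config_text, required=REQUIRED):
    """Return the required repositories that CVMFS_REPOSITORIES does not list."""
    configured = []
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not a KEY=value assignment.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "CVMFS_REPOSITORIES":
            # A later assignment replaces an earlier one.
            value = value.strip().strip("'\"")
            configured = [r.strip() for r in value.split(",") if r.strip()]
    return [r for r in required if r not in configured]


# Example with the line from the post: nothing is missing.
sample = "CVMFS_REPOSITORIES=cernvm-prod.cern.ch,grid.cern.ch,sft.cern.ch\n"
print(missing_repositories(sample))
```

To check a real machine, read /etc/cvmfs/default.local and pass its contents to the function.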

This has been successfully tested using Ubuntu 18.10.
Note that a new submission method is also being used which may add some instability to the job handling in the BOINC server.

Edit: Note that the results of these jobs are not currently sent back to the mcplots server
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 319,055
RAC: 49
Message 5800 - Posted: 7 Feb 2019, 12:34:46 UTC - in response to Message 5798.  
Last modified: 7 Feb 2019, 12:34:59 UTC

If you want to follow the job log file:
sudo tail -f /var/lib/boinc-client/slots/0/cernvm/shared/runRivet.log

Replace the slot number with the relevant value.
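To find out which slot numbers actually have a job log, the slots directory can be scanned. This is a minimal sketch that assumes the directory layout from the command above; the helper name `find_rivet_logs` is made up for illustration. Reading the BOINC data directory may need elevated permissions, as the `sudo` in the command suggests.

```python
import glob
import os


def find_rivet_logs(slots_dir="/var/lib/boinc-client/slots"):
    """Map slot number -> runRivet.log path for every slot that has one."""
    pattern = os.path.join(slots_dir, "*", "cernvm", "shared", "runRivet.log")
    logs = {}
    for path in glob.glob(pattern):
        # The slot directory name is the first component below slots_dir.
        slot = os.path.relpath(path, slots_dir).split(os.sep)[0]
        if slot.isdigit():
            logs[int(slot)] = path
    return logs


for slot, path in sorted(find_rivet_logs().items()):
    print(f"slot {slot}: sudo tail -f {path}")
```

If the directory is missing or unreadable, the function returns an empty mapping and nothing is printed.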
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 467
Credit: 379,047
RAC: 0
Message 5801 - Posted: 7 Feb 2019, 12:45:56 UTC
Last modified: 7 Feb 2019, 12:53:59 UTC

Reactivated a test client that worked with lhc-dev in the past.
Did a project reset.
Got a task, but it failed.

See:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2751935

<edit>
The error may be due to a wrong filename.
My project folder contains a file named "Tensorflow_job_2018_12_12.xml" but no file named "init_data.xml".
</edit>
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 319,055
RAC: 49
Message 5802 - Posted: 7 Feb 2019, 13:04:02 UTC - in response to Message 5801.  

Reactivated a test client that worked with lhc-dev in the past.
Did a project reset.
Got a task, but it failed.

See:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2751935


The error may be due to a wrong filename.
My project folder contains a file named "Tensorflow_job_2018_12_12.xml" but no file named "init_data.xml".


Thanks, I have fixed this but it is just a cosmetic issue. Have sent you a private email.
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 720
Credit: 11,489,696
RAC: 796
Message 5803 - Posted: 7 Feb 2019, 21:34:37 UTC

Sure would have been nice if you had let the Windows Theory tasks finish and be sent back, since they had been running for over 16 hours.

Instead I got 2/7/2019 1:03:11 AM | lhcathome-dev | [error] garbage_collect(); still have active task for acked result Theory_3244172_1549350823.091561_0; state 5
Exit status 202 (0x000000CA) EXIT_ABORTED_BY_PROJECT

It has been cold enough where I live to have penguins running around my place but so far in all my years I have never allowed one on my computers

So I guess a rare occasion where I won't have -dev tasks running.
(but I always check the server a couple times a day)
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 319,055
RAC: 49
Message 5804 - Posted: 8 Feb 2019, 9:17:31 UTC - in response to Message 5803.  

Sure would have been nice if you had let the Windows Theory tasks finish and be sent back, since they had been running for over 16 hours.

Instead I got 2/7/2019 1:03:11 AM | lhcathome-dev | [error] garbage_collect(); still have active task for acked result Theory_3244172_1549350823.091561_0; state 5
Exit status 202 (0x000000CA) EXIT_ABORTED_BY_PROJECT

It has been cold enough where I live to have penguins running around my place but so far in all my years I have never allowed one on my computers

So I guess a rare occasion where I won't have -dev tasks running.
(but I always check the server a couple times a day)


Sorry about that. I tried to cancel only unsent tasks, but it looks like a few running ones were also cancelled. I will try to get a Windows version running soon after the Linux version is working.
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 319,055
RAC: 49
Message 5805 - Posted: 8 Feb 2019, 9:19:33 UTC - in response to Message 5798.  
Last modified: 8 Feb 2019, 9:19:49 UTC

Congratulations to captainjack, who is the first person to successfully run this native app. Note that they also have Ubuntu 18.10.
maeax

Joined: 22 Apr 16
Posts: 651
Credit: 1,678,202
RAC: 7
Message 5806 - Posted: 8 Feb 2019, 10:12:18 UTC
Last modified: 8 Feb 2019, 10:29:39 UTC

This task got errors from three users, including Laurence ;-)
<core_client_version>7.5.1</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
11:07:54 (10952): wrapper (7.7.26015): starting
11:07:54 (10952): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.12 ()
/usr/bin/env: python3: No such file or directory
11:07:55 (10952): cranky exited; CPU time 0.000999
11:07:55 (10952): app exit status: 0x7f
11:07:55 (10952): called boinc_finish(195)

</stderr_txt>
]]>
https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1876564

For SL69, Python3 is not available.
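The numbers in this log can be decoded with a little arithmetic. In the minimal sketch below, `0x7f` is 127, the status that `env` returns when the requested command cannot be found, and 195 is the same byte that the client prints as `0xc3` and, read as a signed 8-bit value, as `-61`. The helper names are made up for illustration.

```python
def as_signed_byte(code):
    """Interpret an exit code (0-255) as a signed 8-bit integer."""
    code &= 0xFF
    return code - 256 if code >= 128 else code


def describe_env_status(status):
    """Explain the conventional meaning of an exit status returned by env."""
    if status == 127:
        return "command not found"
    if status == 126:
        return "command found but could not be invoked"
    return "exit status of the invoked command"


print(hex(195), as_signed_byte(195))   # 0xc3 -61
print(describe_env_status(0x7F))       # command not found
```

So the wrapper's `app exit status: 0x7f` says the same thing as the `python3` error message on the line above it: the interpreter named in the script's first line was not found on the PATH.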
Profile Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 720
Credit: 11,489,696
RAC: 796
Message 5807 - Posted: 8 Feb 2019, 11:05:40 UTC - in response to Message 5804.  



Sorry about that. I tried to cancel only unsent tasks, but it looks like a few running ones were also cancelled. I will try to get a Windows version running soon after the Linux version is working.


Thanks Laurence, and you know I will be ready when you are.
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 867,936
RAC: 27
Message 5808 - Posted: 8 Feb 2019, 15:05:21 UTC
Last modified: 8 Feb 2019, 15:46:50 UTC

I've no idea what happened here.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752133
This is CentOS 7, which (apparently) must have Python 2.7. I've also installed Python 3.6, which fixed this:
14:34:52 (4883): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.13 ()
/usr/bin/env: python3: No such file or directory

but it still doesn't work...
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 867,936
RAC: 27
Message 5809 - Posted: 8 Feb 2019, 19:06:20 UTC
Last modified: 8 Feb 2019, 19:46:09 UTC

From this:-
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752133
came this:-
cranky-0.0.13 INFO: Running Container 'runc'.
nsenter: failed to unshare user namespace: Invalid argument
container_linux.go:336: starting container process caused "process_linux.go:279: running exec setns process for init caused \"exit status 39\""
cranky-0.0.13 ERROR: Container 'runc' failed.

For those more knowledgeable than I, there may be some ideas here:-
https://coderwall.com/p/s_ydlq/using-user-namespaces-on-docker
I've followed the instructions to enable namespaces on the kernel, but now there's no work to try it out... and it's Friday.
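Even without work available, the kernel side can be checked. On kernels that expose `/proc/sys/user/max_user_namespaces`, a value of 0 means no user namespaces can be created. The sketch below only reads that file; whether it exists on a given CentOS 7 kernel is an assumption, and the helper names are made up for illustration.

```python
SYSCTL = "/proc/sys/user/max_user_namespaces"


def user_namespace_limit(path=SYSCTL):
    """Return the per-user namespace limit, or None if it cannot be read."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def explain(limit):
    """Turn the limit into a short human-readable statement."""
    if limit is None:
        return "limit not exposed by this kernel (or not readable)"
    if limit == 0:
        return "user namespaces are disabled (limit is 0)"
    return f"up to {limit} user namespaces may be created"


print(explain(user_namespace_limit()))
```

A non-zero limit does not guarantee that runc will work, but a limit of 0 would be consistent with the `failed to unshare user namespace` message above.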
maeax

Joined: 22 Apr 16
Posts: 651
Credit: 1,678,202
RAC: 7
Message 5814 - Posted: 9 Feb 2019, 7:58:19 UTC - in response to Message 5806.  
Last modified: 9 Feb 2019, 8:02:05 UTC

/usr/bin/env: python3: No such file or directory
11:07:55 (10952): cranky exited; CPU time 0.000999
11:07:55 (10952): app exit status: 0x7f
11:07:55 (10952): called boinc_finish(195)
</stderr_txt>
https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1876564

For SL69, Python3 is not available.


Leap 15.0 received a Python3 update from SUSE yesterday.
gyllic

Joined: 10 Mar 17
Posts: 40
Credit: 108,345
RAC: 0
Message 5815 - Posted: 9 Feb 2019, 12:58:57 UTC - in response to Message 5809.  

From this:-
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752133
came this:-
cranky-0.0.13 INFO: Running Container 'runc'.
nsenter: failed to unshare user namespace: Invalid argument
container_linux.go:336: starting container process caused "process_linux.go:279: running exec setns process for init caused \"exit status 39\""
cranky-0.0.13 ERROR: Container 'runc' failed.

For those more knowledgeable than I, there may be some ideas here:-
https://coderwall.com/p/s_ydlq/using-user-namespaces-on-docker
I've followed the instructions to enable namespaces on the kernel, but now there's no work to try it out... and it's Friday.
No clue if this will help fix your problem (if it still persists), but here are two links:
https://github.com/moby/moby/issues/34011
https://github.com/opencontainers/runc/issues/1343

Yeah, unfortunately no testing is possible since no work is available.
maeax

Joined: 22 Apr 16
Posts: 651
Credit: 1,678,202
RAC: 7
Message 5819 - Posted: 10 Feb 2019, 23:34:36 UTC
Last modified: 10 Feb 2019, 23:36:52 UTC

This is a new task with version 4.04, again with the Python3 error:
<core_client_version>7.5.1</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
00:31:39 (23696): wrapper (7.7.26015): starting
00:31:39 (23696): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.13 ()
/usr/bin/env: python3: No such file or directory
00:31:40 (23696): cranky exited; CPU time 0.000000
00:31:40 (23696): app exit status: 0x7f
00:31:40 (23696): called boinc_finish(195)
</stderr_txt>
]]>
https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1876593
m
Volunteer tester

Joined: 20 Mar 15
Posts: 243
Credit: 867,936
RAC: 27
Message 5820 - Posted: 11 Feb 2019, 4:44:15 UTC
Last modified: 11 Feb 2019, 5:03:30 UTC

The python3 error is back.

 /usr/bin/env: python3: No such file or directory 

Python3.6 is in ./usr/bin/python3.6, and the above command works if python3 is changed to python3.6, so python3 must be missing from an environment or path somewhere. I haven't been able to work out how (or where) to properly fix this (or to add an alias?). The existing python (which works) should provide a clue, but I can't find that either, so I'm stuck for now.
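One way to see what is and is not on the PATH is to list the versioned interpreters next to the plain `python3` lookup. The sketch below is an illustration only; `versioned_pythons` is a made-up helper, and the script has to be started with an interpreter that does exist, for example `python3.6 check.py` (the file name is arbitrary).

```python
import os
import re
import shutil


def versioned_pythons(path=None):
    """Return executables named python3.X found in the PATH directories."""
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            # Missing or unreadable PATH entries are simply skipped.
            continue
        for name in names:
            full = os.path.join(directory, name)
            if re.fullmatch(r"python3\.\d+", name) and os.access(full, os.X_OK):
                found.append(full)
    return found


print("python3 resolves to:", shutil.which("python3"))
print("versioned interpreters:", versioned_pythons())
```

Note that `/usr/bin/env python3` searches the PATH for a file literally named `python3`, so a shell alias would not be seen by it. If only a versioned interpreter shows up, a `python3` entry in one of the PATH directories (for example a symlink) is the usual kind of remedy, though whether that is appropriate depends on the distribution.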
maeax

Joined: 22 Apr 16
Posts: 651
Credit: 1,678,202
RAC: 7
Message 5821 - Posted: 11 Feb 2019, 7:44:43 UTC - in response to Message 5820.  
Last modified: 11 Feb 2019, 7:45:03 UTC

Laurence had a successful task on CentOS7 on Friday.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752161
We have to wait for his answer.
gyllic

Joined: 10 Mar 17
Posts: 40
Credit: 108,345
RAC: 0
Message 5822 - Posted: 11 Feb 2019, 8:25:28 UTC
Last modified: 11 Feb 2019, 8:29:53 UTC

I have cloned and built the latest runc code (from https://github.com/opencontainers/runc) on Debian Stretch, and the task produces this error message:

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
09:10:56 (2772): wrapper (7.7.26015): starting
09:10:56 (2772): wrapper: running ../../projects/lhcathomedev.cern.ch_lhcathome-dev/cranky-0.0.13 ()
cranky-0.0.13 INFO: Starting
cranky-0.0.13 INFO: Detected Theory App
cranky-0.0.13 INFO: Checking CVMFS.
cranky-0.0.13 INFO: Checking runc.
cranky-0.0.13 ERROR: 'runc spec version < 1.1
09:11:02 (2772): cranky exited; CPU time 0.188000
09:11:02 (2772): app exit status: 0x1
09:11:02 (2772): called boinc_finish(195)

</stderr_txt>
]]>

This is the corresponding task:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2752234

The command "runc --version" gives the output: "runc version spec: 1.0.1-dev"

Am I doing something wrong or is the mistake on the application side?
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 467
Credit: 379,047
RAC: 0
Message 5823 - Posted: 11 Feb 2019, 8:32:39 UTC - in response to Message 5822.  

cranky-0.0.13 ERROR: 'runc spec version < 1.1

The app requires runc to be at least version 1.1.

I have the same situation with openSUSE, where the standard package is <1.1.
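How cranky performs its check is not shown in this thread, but a plain numeric comparison illustrates why a reported spec version of 1.0.1-dev falls below 1.1. This is a minimal sketch; the function names are made up for illustration, and the input string is the `runc --version` output quoted in the previous post.

```python
import re


def parse_spec_version(output):
    """Extract the spec version from text like 'spec: 1.0.1-dev' as a tuple of ints."""
    match = re.search(r"spec:\s*(\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError("no spec version found")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version, minimum=(1, 1)):
    """Compare component by component: (1, 0, 1) is lower than (1, 1)."""
    return version >= minimum


reported = parse_spec_version("runc version spec: 1.0.1-dev")
print(reported, meets_minimum(reported))   # (1, 0, 1) False
```

The comparison goes through the components from left to right, so the second component (0 versus 1) decides the result and the trailing `.1-dev` does not matter.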
gyllic

Joined: 10 Mar 17
Posts: 40
Credit: 108,345
RAC: 0
Message 5824 - Posted: 11 Feb 2019, 8:36:19 UTC - in response to Message 5823.  
Last modified: 11 Feb 2019, 9:12:39 UTC

cranky-0.0.13 ERROR: 'runc spec version < 1.1

The app requires runc to be at least version 1.1.

I have the same situation with openSUSE, where the standard package is <1.1.
Yes, but I used the latest runc source code from their GitHub master branch to build it. Is there another repository where the active development is done, or where the newest code lives? The latest code on GitHub delivers version 1.0.1. The standard runc version in Debian Stretch is 0.1.1.
Profile Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 319,055
RAC: 49
Message 5825 - Posted: 11 Feb 2019, 9:15:40 UTC - in response to Message 5806.  



For SL69, Python3 is not available.


It will not work on SL6 as the kernel does not support user namespaces.