Message boards : Theory Application : Docker on Windows

boboviz

Joined: 24 Oct 19
Posts: 208
Credit: 581,115
RAC: 828
Message 8621 - Posted: 26 Mar 2025, 13:41:06 UTC - in response to Message 8620.  

In reply to Crystal Pellet's message of 26 Mar 2025:
I tried a Theory docker task on Windows 10 where I didn't have WSL installed. (so from scratch).
I scraped the information left and right and tried a first Theory task with docker instead of VBox.


Maybe it's a good idea to write a little "guide" (maybe a thread in the forum) to sum up all the info to start from scratch?
ID: 8621 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 519
Credit: 400,710
RAC: 11
Message 8622 - Posted: 26 Mar 2025, 13:48:58 UTC - in response to Message 8620.  

+1

The interesting part of the log is this:
Mounted CVMFS in the container.
job: htmld=/var/www/lighttpd
job: unpack exitcode=0
job: run exitcode=1
job: diskusage=6828
job: logsize=16 k
job: times=
0m0.010s 0m0.000s
0m19.678s 0m7.623s
job: cpuusage=27
===> [runRivet] Wed Mar 26 13:03:36 UTC 2025 [boinc pp z1j 8000 180 - pythia8 8.313 tune-monash13 100000 92]
Job Finished

It shows the log output of the scientific app.
The task was very short because the scientific app failed with "job: run exitcode=1".
ATM there are lots of those around, but they are not related to docker.

The next docker version will tail runRivet.log to the BOINC slot (already available on Linux) for easier monitoring.
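
For anyone who wants to check a result log for this pattern automatically, here is a minimal Python sketch. It assumes the log uses "job: key=value" lines exactly as in the excerpt above; the function name parse_job_lines is made up for illustration and is not part of any BOINC or Theory tooling.

```python
import re

# Matches lines such as "job: run exitcode=1" or "job: diskusage=6828".
JOB_LINE = re.compile(r"^job:\s*(?P<key>[^=]+?)\s*=\s*(?P<value>.*)$")


def parse_job_lines(log_text):
    """Collect the 'job: key=value' pairs of a Theory task log into a dict."""
    fields = {}
    for line in log_text.splitlines():
        match = JOB_LINE.match(line.strip())
        if match:
            fields[match.group("key")] = match.group("value").strip()
    return fields


# Sample lines copied from the log excerpt above.
sample = """\
job: unpack exitcode=0
job: run exitcode=1
job: diskusage=6828
"""

fields = parse_job_lines(sample)
if fields.get("run exitcode") != "0":
    print("scientific app failed, run exitcode =", fields.get("run exitcode"))
```

A non-zero "run exitcode" together with "unpack exitcode=0" points at the scientific app rather than at the container setup.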
ID: 8622 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 519
Credit: 400,710
RAC: 11
Message 8623 - Posted: 26 Mar 2025, 13:53:29 UTC - in response to Message 8621.  

In reply to boboviz's message of 26 Mar 2025:
In reply to Crystal Pellet's message of 26 Mar 2025:
I tried a Theory docker task on Windows 10 where I didn't have WSL installed. (so from scratch).
I scraped the information left and right and tried a first Theory task with docker instead of VBox.


Maybe it's a good idea to write a little "guide" (maybe a thread in the forum) to sum up all the info to start from scratch?

You can find most of it in a few threads here.
It's currently a moving target, so it makes no sense to write complete documentation now.
What works on platform A today may not work on platform B, so modifications will have to be tested first.
ID: 8623 · Rating: 0
Laurence CERN
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1129
Credit: 339,230
RAC: 3
Message 8624 - Posted: 26 Mar 2025, 14:23:26 UTC - in response to Message 8620.  

In reply to Crystal Pellet's message of 26 Mar 2025:
I tried a Theory docker task on Windows 10 where I didn't have WSL installed. (so from scratch).

I scraped the information left and right and tried a first Theory task with docker instead of VBox.
Miraculously this very first docker task went right away, but was very short, so I'm not sure if any events were actually processed.

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3391407

A second task has processed 100000 events:

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3391435


This is great! Hopefully most of the setup will be done by the Windows installer for BOINC, so we have to wait for the upstream release of the client.
ID: 8624 · Rating: 0
boboviz

Joined: 24 Oct 19
Posts: 208
Credit: 581,115
RAC: 828
Message 8625 - Posted: 26 Mar 2025, 14:30:49 UTC - in response to Message 8623.  

In reply to computezrmle's message of 26 Mar 2025:
You can find most of it in a few threads here.
It's currently a moving target, so it makes no sense to write complete documentation now.
What works on platform A today may not work on platform B, so modifications will have to be tested first.


Ok. Thank you!
ID: 8625 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 731
Credit: 2,205,280
RAC: 2,384
Message 8626 - Posted: 26 Mar 2025, 17:18:40 UTC
Last modified: 26 Mar 2025, 17:29:42 UTC

ID: 8626 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 731
Credit: 2,205,280
RAC: 2,384
Message 8629 - Posted: 27 Mar 2025, 6:19:20 UTC - in response to Message 8626.  

Where on GitHub is the link to install BOINC 8.1.0 for Windows with Docker support?
At the moment I have 8.0.4.
ID: 8629 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1223
Credit: 933,122
RAC: 1,135
Message 8630 - Posted: 27 Mar 2025, 6:28:28 UTC
Last modified: 27 Mar 2025, 6:39:57 UTC

Some remarks on the docker version:

1. Suspending a task works with or without "Leave non-GPU tasks in memory while suspended" ticked.
2. After a BOINC restart the task survives, but starts from scratch.
3. High CPU-usage during event processing part of the main process vmmem:
--- On a quad-core with VBox for 1 task 25%, with Docker jumping between 27% and 42%.
ID: 8630 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1223
Credit: 933,122
RAC: 1,135
Message 8631 - Posted: 27 Mar 2025, 6:55:17 UTC

2 error-tasks:
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3391508
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3391569
Reason:
Errors during downloading metadata for repository 'epel'

Status code: 503 for https://mirrors.fedoraproject.org/metalink?repo=epel-9&arch=x86_64&infra=container&content=$contentdir (IP: 18.159.254.57)
ID: 8631 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 519
Credit: 400,710
RAC: 11
Message 8632 - Posted: 27 Mar 2025, 7:46:57 UTC - in response to Message 8631.  

Some remarks on the docker version:

1. Suspending a task works with or without "Leave non-GPU tasks in memory while suspended" ticked.
2. After a BOINC restart the task survives, but starts from scratch.
3. High CPU-usage during event processing part of the main process vmmem:
--- On a quad-core with VBox for 1 task 25%, with Docker jumping between 27% and 42%.

Thanks. Good to know.
As for (2.), that's at least not worse than before using native.

As for (3.)
Does it mean you ran a scenario 'A' with 1 vbox task and no docker tasks and later you ran 'B1' with no vbox tasks and n (how many?) docker tasks?
Or did you run vbox beside docker in scenario 'B2'?

If you monitor the docker containers - e.g. running 'docker stats' or 'podman stats' - you may notice that some containers use far more than 100% CPU. This is because each task runs 2 processes, the mc-generator and rivetvm.
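
To spot such containers quickly, the CPU column can be filtered with a few lines of Python. This is only a sketch: it assumes the text was produced with a format string that prints one "name percent%" pair per line (the docker command in the comment below is one way to get that; for podman, check its documentation for the matching placeholders). The container names and numbers in the sample are invented for illustration.

```python
def busy_containers(stats_text, threshold=100.0):
    """Return (name, cpu_percent) for containers above the threshold."""
    busy = []
    for line in stats_text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].endswith("%"):
            continue  # skip headers, blank lines, anything unexpected
        try:
            cpu = float(parts[1].rstrip("%"))
        except ValueError:
            continue
        if cpu > threshold:
            busy.append((parts[0], cpu))
    return busy


# Assumed input format: one "name percent%" pair per line, e.g. from
#   docker stats --no-stream --format "{{.Name}} {{.CPUPerc}}"
# The names and values below are made up.
sample = """\
theory_task_a 135.20%
theory_task_b 42.00%
"""

for name, cpu in busy_containers(sample):
    print(f"{name}: {cpu:.1f}% CPU (generator and rivetvm together)")
```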



Errors during downloading metadata for repository 'epel'
Status code: 503 for https://mirrors.fedoraproject.org/metalink?repo=epel-9&arch=x86_64&infra=container&content=$contentdir (IP: 18.159.254.57)

A temporary glitch affecting the CDN where Fedora hosts the mirror list.
As a result, the image build can't complete.
This is not under CERN's control.
It should work again after a few minutes, when the DNS records time out and the next request gets the IP of a 'good' server.
ID: 8632 · Rating: 0
boboviz

Joined: 24 Oct 19
Posts: 208
Credit: 581,115
RAC: 828
Message 8633 - Posted: 27 Mar 2025, 8:39:57 UTC - in response to Message 8630.  
Last modified: 27 Mar 2025, 8:40:56 UTC

In reply to Crystal Pellet's message of 27 Mar 2025:
2. After a BOINC restart the task survives, but starts from scratch.

+1
But the WUs are not that long, so the missing checkpoint is only a relative problem.

3. High CPU-usage during event processing part of the main process vmmem:
--- On a quad-core with VBox for 1 task 25%, with Docker jumping between 27% and 42%.

Strange. On my pc, running 2 wus uses 20% of cpu...
ID: 8633 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1223
Credit: 933,122
RAC: 1,135
Message 8634 - Posted: 27 Mar 2025, 8:49:40 UTC - in response to Message 8632.  

In reply to computezrmle's message of 27 Mar 2025:
Does it mean you ran a scenario 'A' with 1 vbox task and no docker tasks and later you ran 'B1' with no vbox tasks and n (how many?) docker tasks?
The comparison is with standalone tasks.

- you may notice that some containers use far more than 100% CPU. This is because each task runs 2 processes, the mc-generator and rivetvm.

That explains the higher CPU usage. I have seen the same when using 2 CPUs for a Theory VM.
It depends on the generator used (Pythia, Herwig, Sherpa etc.) and how fast the events are processed.
In between, rivetvm has to do some processing. In the past we also had plotter.exe running every now and then.
Example of a 2-core VBox-task: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3389481 : Run time 1 hours 8 min 35 sec <==> CPU time 1 hours 32 min 38 sec
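
A quick way to read those two numbers: dividing CPU time by run time gives the average number of cores the task kept busy. A small Python check using only the figures quoted above (the helper name to_seconds is just for illustration):

```python
def to_seconds(hours, minutes, seconds):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


run_time = to_seconds(1, 8, 35)    # wall-clock time of the task
cpu_time = to_seconds(1, 32, 38)   # CPU time summed over both cores

avg_cores = cpu_time / run_time
print(f"average load: {avg_cores:.2f} cores")
```

That works out to roughly 1.35 cores on average, i.e. more than one core but well short of two.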

Users who want a responsive system could make use of

<app_config>
  <app_version>
    <app_name>Theory</app_name>
    <plan_class>docker</plan_class>
    <avg_ncpus>2</avg_ncpus>
  </app_version>
</app_config>


in app_config.xml
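
If you prefer to generate the file rather than type it, the following Python sketch produces the same settings wrapped in the <app_config> root element that BOINC expects. The values are the ones from this post; the function name build_app_config is made up, and saving the result into the right project folder is left to you.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, avg_ncpus):
    """Return app_config.xml content with a single <app_version> entry."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    return ET.tostring(root, encoding="unicode")


# Values taken from the fragment above.
xml_text = build_app_config("Theory", "docker", 2)
print(xml_text)
```

With avg_ncpus set to 2, the client budgets two cores per docker task, so fewer tasks run side by side and the machine stays more responsive.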
ID: 8634 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1223
Credit: 933,122
RAC: 1,135
Message 8635 - Posted: 27 Mar 2025, 8:56:16 UTC - in response to Message 8633.  

In reply to boboviz's message of 27 Mar 2025:
3. High CPU-usage during event processing part of the main process vmmem:
--- On a quad-core with VBox for 1 task 25%, with Docker jumping between 27% and 42%.

Strange. On my pc, running 2 wus uses 20% of cpu...

The "during event processing" part of the sentence is important.
There are a lot of short tasks ending before the event processing starts.
ID: 8635 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 519
Credit: 400,710
RAC: 11
Message 8636 - Posted: 27 Mar 2025, 8:56:55 UTC - in response to Message 8633.  

In reply to boboviz's message of 27 Mar 2025:
In reply to Crystal Pellet's message of 27 Mar 2025:
2. After a BOINC restart the task survives, but starts from scratch.

+1
But the WUs are not that long, so the missing checkpoint is only a relative problem.

Depends on mcplots.
ATM the queue sends lots of tasks that fail early.
In the future there will be short tasks as well as tasks running for days, as usual, because the scientific payload is more or less the same.

3. High CPU-usage during event processing part of the main process vmmem:
--- On a quad-core with VBox for 1 task 25%, with Docker jumping between 27% and 42%.

Strange. On my pc, running 2 wus uses 20% of cpu...

25% of 4 core => 1 core

20% of 12 core => 2.4 cores => 2 tasks with 120% each
20% of 16 core => 3.2 cores => 2 tasks with 160% each

At least for the 12-core case that is roughly within the normal range.
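
The same conversion as a small Python sketch, for anyone who wants to plug in their own numbers. It assumes the percentage is the overall figure shown for the whole machine and that the load is spread evenly over the tasks; the function name per_task_cpu_percent is made up.

```python
def per_task_cpu_percent(overall_percent, cores, tasks):
    """Convert an overall CPU percentage into percent of one core per task."""
    if cores <= 0 or tasks <= 0:
        raise ValueError("cores and tasks must be positive")
    return overall_percent * cores / tasks


print(per_task_cpu_percent(25, 4, 1))    # VBox, quad-core, 1 task
print(per_task_cpu_percent(20, 12, 2))   # 2 tasks on a 12-core machine
print(per_task_cpu_percent(20, 16, 2))   # 2 tasks on a 16-core machine
# Docker range reported for the quad-core, 1 task:
print(per_task_cpu_percent(27, 4, 1), per_task_cpu_percent(42, 4, 1))
```

For the quad-core Docker case, 27% to 42% overall corresponds to 108% to 168% of one core for the single task.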
ID: 8636 · Rating: 0
boboviz

Joined: 24 Oct 19
Posts: 208
Credit: 581,115
RAC: 828
Message 8637 - Posted: 27 Mar 2025, 12:57:42 UTC - in response to Message 8635.  
Last modified: 27 Mar 2025, 12:57:52 UTC

In reply to Crystal Pellet's message of 27 Mar 2025:
The "during event processing" part of the sentence is important.
There are a lot of short tasks ending before the event processing starts.


Yep. I now see some very short WUs. But these are, anyway, validated.
ID: 8637 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 519
Credit: 400,710
RAC: 11
Message 8638 - Posted: 27 Mar 2025, 13:41:21 UTC - in response to Message 8637.  

In reply to boboviz's message of 27 Mar 2025:
But these are, anyway, validated.

On dev, the number of consecutive invalids/errors is limited to 32 (IIRC per core per computer).
If a computer exceeds this limit, it will not get further work for 24h (or maybe until midnight).

Since those errors are usually not what we test here, the tasks report a success back to BOINC.
This may change once the app_version moves to prod (or maybe if people misuse dev as a prod-like project).
ID: 8638 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 731
Credit: 2,205,280
RAC: 2,384
Message 8657 - Posted: 30 Mar 2025, 4:11:24 UTC

No new tasks at the moment.
ID: 8657 · Rating: 0
boboviz

Joined: 24 Oct 19
Posts: 208
Credit: 581,115
RAC: 828
Message 8672 - Posted: 2 Apr 2025, 20:14:16 UTC

After a lot of successful WUs, I now get some errors (after 10 minutes of run time):

<message>
Incorrect function.
(0x1) - exit code 1 (0x1)</message>
<stderr_txt>
docker_wrapper config:
workdir: /boinc_slot_dir
use GPU: no
Web graphics guest port: 80
create args: --log-driver=k8s-file --cap-add=SYS_ADMIN --device /dev/fuse
verbose: 1
Using podman
running docker command: ps --all --filter "name=boinc__lhcathomedev.cern.ch_lhcathome-dev__theory_2848-4566191-333_0"
command output:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
EOM
creating container boinc__lhcathomedev.cern.ch_lhcathome-dev__theory_2848-4566191-333_0
running docker command: images
command output:
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/library/almalinux 9 df3270cc8bc8 3 weeks ago 217 MB
EOM
building image
running docker command: build . -t boinc__lhcathomedev.cern.ch_lhcathome-dev__theory_2848-4566191-333 -f Dockerfile
read_from_pipe() error: timeout
build_image() failed: -182

</stderr_txt>
ID: 8672 · Rating: 0
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1223
Credit: 933,122
RAC: 1,135
Message 8685 - Posted: 3 Apr 2025, 20:51:36 UTC

A valid task from BOINC's point of view, but run exit code = 1

87800 events processed
87900 events processed
./rungen.sh: line 2669: 2499 Segmentation fault (core dumped) /scratch/pythia8/pythia8.exe /scratch/tmp/tmp.4llDBUdfz2/generator.params /scratch/tmp/tmp.4llDBUdfz2/generator.hepmc
ERROR: failed to run pythia8 8.313
terminate called after throwing an instance of 'HepMC::IO_Exception'
what(): input stream encountered invalid data, stream is now corrupt
[1]- 1883 Exit 1 ( env $origEnv $generatorExecString; exit $? )
[2]+ 1884 Running ( $rivetExecString; exit $? ) & (wd: /scratch/tmp/tmp.4llDBUdfz2)
ERROR: fail to run pythia8 8.313 or Rivet (error exit code)
ID: 8685 · Rating: 0
maeax

Joined: 22 Apr 16
Posts: 731
Credit: 2,205,280
RAC: 2,384
Message 8686 - Posted: 4 Apr 2025, 4:47:09 UTC
Last modified: 4 Apr 2025, 4:50:09 UTC

Computer 4639 is in MC production, but 5337 is not.
User ID 378.
ID: 8686 · Rating: 0



©2025 CERN