Message boards : Theory Application : New version 5.00

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 6668 - Posted: 23 Sep 2019, 9:11:30 UTC

I am working on a new VM version that works similarly to the native version. It may take a few iterations until it works.

mmonnin
Joined: 20 Jun 17
Posts: 25
Credit: 5,472,506
RAC: 2,958
Message 6669 - Posted: 23 Sep 2019, 10:20:24 UTC
Last modified: 23 Sep 2019, 10:21:40 UTC

Wow, 24.75GB of memory usage on a 32t system. Is that necessary?

I have some v5.00 and v5.01 tasks. Are the 5.00 ones still needed?

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 6670 - Posted: 23 Sep 2019, 13:34:08 UTC - in response to Message 6669.  

With v5.02 the job starts to run.

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 6671 - Posted: 23 Sep 2019, 13:40:51 UTC - in response to Message 6670.  

With v5.02 the job starts to run.

But it dies due to a missing heartbeat. A new version is on its way.

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 6672 - Posted: 23 Sep 2019, 14:12:14 UTC - in response to Message 6671.  

With v5.02 the job starts to run.

But it dies due to a missing heartbeat. A new version is on its way.


v5.03 goes further.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 6673 - Posted: 23 Sep 2019, 15:20:00 UTC - in response to Message 6672.  

I don't understand. So far you were testing the Linux Native and Windows BOINC VM apps here.
Now suddenly you are testing Linux VBox here, is that right?

mmonnin
Joined: 20 Jun 17
Posts: 25
Credit: 5,472,506
RAC: 2,958
Message 6674 - Posted: 23 Sep 2019, 15:25:55 UTC - in response to Message 6670.  

With v5.02 the job starts to run.

Wait, so with v5.00 and v5.01 the tasks do nothing? It looks like none of mine have returned yet.

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 6675 - Posted: 23 Sep 2019, 15:27:20 UTC - in response to Message 6673.  

I don't understand. So far you were testing the Linux Native and Windows BOINC VM apps here.
Now suddenly you are testing Linux VBox here, is that right?


That is correct. My first task returned ok. I will push out the Windows and Mac versions now.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 6676 - Posted: 23 Sep 2019, 16:30:23 UTC - in response to Message 6675.  
Last modified: 23 Sep 2019, 16:54:20 UTC

My first Windows task has been running for 12 minutes without doing a job.
During boot I saw this:

On my host the shared folder is there, with init_data.xml in it.
Now my VM is showing this without using any CPU:

ALT-F2 shows the 'top' output. In all other vbox applications F3 is used for that and F2 for the job output. You swapped the two.
I stopped the task gracefully with shutdown in the shared folder.
Result: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2825186

Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,969,210
RAC: 0
Message 6677 - Posted: 23 Sep 2019, 18:49:03 UTC
Last modified: 23 Sep 2019, 19:00:42 UTC

Single-core base memory has gone up from 730MB to 1500MB. My 2-core Linux host with 4GB of RAM used to be able to run 2 x Theory (VBox or Native); now it can only run 1, with the other "waiting for memory". One task just started on a Windows host: 2-core (accidentally, I usually prefer singles), memory 2250MB.

From the BoincVM thread "it should be possible to manage the BOINC client in the Guest via a Web browser."
"Show Graphics" button gets to an Apache landing page. Is this where that control will be?

Same output as CP.
Compared with the stderr of Laurence's successful task, mine gets to the line corresponding to
2019-09-23 16:00:54 (11016): Guest Log: 00:00:00.008616 main 5.2.6 r120293 started. Verbose level = 0
but no further.
The next line is the shutdown so I'll leave it for now and see what happens in an hour or so. These might just be "Blanks", not intended to do any actual work as presumably there would be details of a Job between those 2 lines.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 6678 - Posted: 23 Sep 2019, 20:55:03 UTC

I somehow managed, in a way I can't reproduce, to run a valid task on my Windows host.

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2825190

The job info is missing from both your result and mine.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 6679 - Posted: 24 Sep 2019, 7:14:43 UTC
Last modified: 24 Sep 2019, 7:38:38 UTC

This time I didn't see ERROR init_data.xml is missing, but a WARNING and an ERROR further on in the boot process:


Edit: The next task is running a Herwig++ job, without any intervention from me.
It is a resend from a user who aborted most of their tasks, returned 6 valid ones and has 2 in progress, on a Mac machine.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 6680 - Posted: 24 Sep 2019, 12:02:45 UTC

Not all Windows tasks end in an error. After my Herwig++ job finished successfully, a new job (pythia8) started fine out of the box:

Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1069
Credit: 334,882
RAC: 0
Message 6681 - Posted: 24 Sep 2019, 13:41:34 UTC - in response to Message 6680.  

I think there is a race condition where the job starts before the shared directory is mounted. It will need a new VM, but I probably can't do that today.
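
A guard against this kind of race could look roughly like the sketch below: poll until a marker file is visible in the shared directory, and only then start the job. The function name, the timeout values and the choice of init_data.xml as the marker are assumptions made for illustration; this is not the actual VM bootstrap code.

```python
import time
from pathlib import Path


def wait_for_shared_dir(shared_dir: str, marker: str = "init_data.xml",
                        timeout_s: float = 60.0, poll_s: float = 1.0) -> bool:
    """Return True once `marker` exists in `shared_dir`, False on timeout.

    Hypothetical helper: the marker file and the timings are assumptions.
    """
    deadline = time.monotonic() + timeout_s
    marker_path = Path(shared_dir) / marker
    while True:
        if marker_path.is_file():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    # "/path/to/shared" is a placeholder, not the real mount point.
    if wait_for_shared_dir("/path/to/shared", timeout_s=5.0):
        print("shared directory is ready, the job can start")
    else:
        print("shared directory not ready, not starting the job")
```

Polling with a deadline keeps the job from starting early while still failing visibly if the mount never appears.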

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 6682 - Posted: 24 Sep 2019, 14:36:47 UTC - in response to Message 6681.  
Last modified: 24 Sep 2019, 14:56:16 UTC

I think there is a race condition where the job starts before the shared directory is mounted. It will need a new VM, but I probably can't do that today.

OK, I'll stop testing then, once the tasks that are running now have returned.
I hit the brakes and started 4 tasks one by one on an otherwise idle system. All 4 are running a job, as is the one I already had running.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 6683 - Posted: 24 Sep 2019, 17:42:18 UTC - in response to Message 6681.  

It will need a new VM, but I probably can't do that today.

When you create a new image, could you also have a look at the memory requirements for Theory vbox tasks?
For a long time the requirement was 630MB + (100MB * ncores).
Now it is suddenly 750MB + (750MB * ncores), so 1500MB for a single-core Theory VM.

This was also changed in production 1 or 2 months ago (unnecessarily, IMO), so BOINC reserves too much memory and tasks end up waiting for memory that is in principle available.
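
For illustration, the two formulas can be compared with a short Python sketch. The function names and the 50% usable-RAM fraction are assumptions made up for this example; they are not BOINC or project settings taken from this thread.

```python
def old_requirement_mb(ncores: int) -> int:
    """Earlier rule quoted in this post: 630MB + 100MB per core."""
    return 630 + 100 * ncores


def new_requirement_mb(ncores: int) -> int:
    """Current rule quoted in this post: 750MB + 750MB per core."""
    return 750 + 750 * ncores


def tasks_that_fit(host_ram_mb: int, per_task_mb: int,
                   usable_fraction: float = 0.5) -> int:
    """How many VMs fit if BOINC may use `usable_fraction` of host RAM.

    The 0.5 default is an assumption for illustration only.
    """
    return int(host_ram_mb * usable_fraction) // per_task_mb


if __name__ == "__main__":
    for ncores in (1, 2):
        print(ncores, old_requirement_mb(ncores), new_requirement_mb(ncores))
    # A 4GB host under the assumed 50% limit:
    print(tasks_that_fit(4096, old_requirement_mb(1)))
    print(tasks_that_fit(4096, new_requirement_mb(1)))
```

Under that assumed limit, a 4GB host fits two single-core VMs with the old rule and only one with the new rule, which is consistent with what Ray Murray reported earlier in the thread.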

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 6684 - Posted: 24 Sep 2019, 20:18:02 UTC
Last modified: 24 Sep 2019, 20:34:19 UTC

In one of the VMs I have a job running with this description:
ee zhad 91.2 - - sherpa 2.2.5 default 2000
This is a known long runner, or even a never-ending job.
I see that the job will be killed after 18 hours of elapsed time and will not get credit, in contrast to what happens with the current Theory VBox production tasks when they are stopped gracefully.

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 781
Credit: 12,330,025
RAC: 1,722
Message 6685 - Posted: 25 Sep 2019, 8:38:03 UTC - in response to Message 6680.  

Not all Windows tasks end in an error. After my Herwig++ job finished successfully, a new job (pythia8) started fine out of the box:

I just ran 10 of those and they were Valid with that same

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 6686 - Posted: 25 Sep 2019, 9:04:50 UTC - in response to Message 6684.  

I see that the job will be killed after 18 hours of elapsed time and will not get credit, in contrast to what happens with the current Theory VBox production tasks when they are stopped gracefully.

Not the aforementioned sherpa, but a pythia8 job needing more than 18 hours of elapsed time:
boinc pp jets 7000 100 - pythia8 8.235 cr1 100000 123
The machine did its job properly and the task was killed gracefully, but no credit was granted because the result file to upload was missing.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2825825

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1188
Credit: 861,475
RAC: 2
Message 6687 - Posted: 25 Sep 2019, 11:42:12 UTC - in response to Message 6684.  

ee zhad 91.2 - - sherpa 2.2.5 default 2000
I see that the job will be killed after 18 hours of elapsed time and will not get credit, in contrast to what happens with the current Theory VBox production tasks when they are stopped gracefully.

The sherpa job ended in the error condition for running too long: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2825826


©2024 CERN