Message boards : Theory Application : The Theory Application

Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,969,210
RAC: 14
Message 2687 - Posted: 12 Apr 2016, 14:56:05 UTC
Last modified: 12 Apr 2016, 14:59:17 UTC

Tasks on both machines got past the previous halt error, and after c. 10 mins each has started processing events. I followed the Graphics button to the Machine logs link, and from Alt-F2 I can see events processing in real time. One looked like it had trouble with its initial Sherpa (like CP's) but has dropped that in favour of a Pythia, which looks to be running fine. (Similarly, one machine has a Pythia6, the other a Pythia8.)
I'll try to watch when each of these jobs finishes to see whether another takes over. The current BOINC estimated time remaining is a little over 4 days, but extrapolating from % progress gives c. 18 hrs. Is there a self-termination at 24 hrs like the standard Theory tasks?
Both tasks happy at 60 and 40 mins respectively.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 30
Message 2688 - Posted: 12 Apr 2016, 15:00:17 UTC - in response to Message 2687.  

Ray Murray wrote: "The current BOINC estimated time remaining is a little over 4 days, but extrapolating from % progress gives c. 18 hrs. Is there a self-termination at 24 hrs like the standard Theory tasks?"

The wrapper will kill the BOINC task after 18 hours: the job duration is set to 64800 seconds.
Ben Segal
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 12 Sep 14
Posts: 65
Credit: 544
RAC: 0
Message 2690 - Posted: 12 Apr 2016, 15:02:37 UTC - in response to Message 2686.  

The problem of the VM shutting itself down seems to be solved. Very good!

The Sherpa jobs don't run well, but my first Pythia6 and Pythia8 jobs do.

===> [runRivet] Tue Apr 12 16:31:15 CEST 2016 [boinc ppbar uemb-hard 1800 - - pythia6 6.428 391 100000 188]

===> [runRivet] Tue Apr 12 16:52:55 CEST 2016 [boinc ppbar uemb-hard 1800 15 - pythia8 8.186 tune-4c 100000 188]

Yes CP (and Ray), the first "real" T4T jobs have been submitted and the web logs are also working. The next step is to feed the results back into MCPlots, which Leonardo is doing currently, so you will start to get MCPlots stats updates for the work you do.

A lot of progress today!

So Rasputin, you can also restart testing now...
Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,969,210
RAC: 14
Message 2691 - Posted: 12 Apr 2016, 16:38:10 UTC

Update to my last post.

Job finished fine, replaced by another, and another. Task continues normally.
Good work, guys 8¬)

Crystal Pellet wrote: "The wrapper will kill the BOINC task after 18 hours: the job duration is set to 64800 seconds."

I wonder if that's a typo for the 86400 seconds (24 hrs) that T4T runs?

With the base memory being 2GB rather than the ordinary 256MB, I'll need to squeeze some extra memory into 2 of my boxes to allow them to run these. (I was going to anyway but this is the spur to actually do something about it.)
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 30
Message 2693 - Posted: 12 Apr 2016, 17:26:13 UTC - in response to Message 2691.  

Ray Murray wrote: "I wonder if that's a typo for the 86400 seconds (24 hrs) that T4T runs?"

No, it's the same 64800 seconds (18-hour limit) that the CMS application uses when a VM is not stopped normally after one run of 6 or 12 hours.

Ray Murray wrote: "With the base memory being 2GB rather than the ordinary 256MB, I'll need to squeeze some extra memory into 2 of my boxes to allow them to run these. (I was going to anyway but this is the spur to actually do something about it.)"

Memory used at the moment is ~1.2GB and swap used is 0k, so maybe the 2GB could be lowered a bit.
For the multi-threaded Challenge-VM we used 512kB per thread as a rule of thumb.
Ray Murray
Joined: 13 Apr 15
Posts: 138
Credit: 2,969,210
RAC: 14
Message 2695 - Posted: 12 Apr 2016, 20:11:07 UTC
Last modified: 12 Apr 2016, 20:11:54 UTC

I exited BOINC and made sure VBox had saved the VM before restarting my hosts to apply some Windows updates. On reboot and restart of BOINC, the TASK continued but the previously running JOB was lost; a new JOB started up to replace it. (Or it might have been a restart of that job; I didn't note its ID, so I can't tell either way.)

If I have time tomorrow I'll do some more "robustness" testing.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 2696 - Posted: 12 Apr 2016, 20:57:06 UTC - in response to Message 2695.  

Back to the joys of Suspend and Resume with Condor :( I think we need to start a new thread on this topic. The advantage this time is that the infrastructure is all under our control, so we should be able to make better progress. To help, the Condor log files (MasterLog, StartLog and StarterLog) can be found in the Web logs directory. The log file handling for the jobs has also been improved, so each job log is now archived rather than being overwritten. This will be available once CVMFS has updated.
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 30
Message 2698 - Posted: 13 Apr 2016, 6:28:01 UTC

I noticed that Sherpa is also running well now:

===> [runRivet] Wed Apr 13 07:04:24 CEST 2016 [boinc ee zhad 197 - - sherpa 1.4.1 default 100000 188]
PDW

Joined: 20 May 15
Posts: 217
Credit: 6,106,329
RAC: 9,076
Message 2752 - Posted: 14 Apr 2016, 11:28:40 UTC - in response to Message 2698.  

Back to getting failure to start work due to...

2016-04-14 12:17:09 (16392): Guest Log: [INFO] Theory application starting. Check log files.
2016-04-14 12:23:10 (16392): Guest Log: [ERROR] App is not supported. Shutting down!
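
For anyone collecting these from the task output, here is a minimal Python sketch that picks out guest log lines by level. It assumes only the line layout shown above (timestamp, a number in parentheses, `Guest Log:`, a bracketed level, then the message); the function name `parse_guest_log` is made up for illustration and is not part of BOINC.

```python
import re
from datetime import datetime

# Layout assumed from the two lines quoted above:
# "<date> <time> (<number>): Guest Log: [<LEVEL>] <message>"
GUEST_LOG = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): Guest Log: \[(\w+)\] (.*)$"
)

def parse_guest_log(text):
    """Return (timestamp, level, message) tuples for every guest log line."""
    entries = []
    for line in text.splitlines():
        match = GUEST_LOG.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append((stamp, match.group(3), match.group(4)))
    return entries

sample = """\
2016-04-14 12:17:09 (16392): Guest Log: [INFO] Theory application starting. Check log files.
2016-04-14 12:23:10 (16392): Guest Log: [ERROR] App is not supported. Shutting down!
"""

entries = parse_guest_log(sample)
errors = [entry for entry in entries if entry[1] == "ERROR"]
# Time between the start message and the first error
delay = errors[0][0] - entries[0][0]
print(errors[0][2], delay)
```

On the two lines above, this shows the VM giving up about six minutes after the application start message.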
Phil

Joined: 9 Apr 15
Posts: 57
Credit: 230,221
RAC: 0
Message 2753 - Posted: 14 Apr 2016, 11:43:53 UTC - in response to Message 2752.  

PDW wrote:
"Back to getting failure to start work due to...

2016-04-14 12:17:09 (16392): Guest Log: [INFO] Theory application starting. Check log files.
2016-04-14 12:23:10 (16392): Guest Log: [ERROR] App is not supported. Shutting down!"


I just had one of those for a BOINC job that started with Sherpa.


I have also seen a number of EXT4 inode addressing errors (not recorded in the logs, alas), so I'll get a new VDI.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 2754 - Posted: 14 Apr 2016, 12:18:42 UTC

What is the task exit strategy?
Does the task just time out after 18h, or is it properly shut down at the end of a job?
One task that exited by itself stated:
2016-04-14 05:59:24 (2448): Guest Log: [ERROR] App is not supported. Shutting down!
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 30
Message 3098 - Posted: 29 Apr 2016, 10:12:32 UTC

I set the memory for the Theory VMs to 1024MB.
I have 3 Theory VMs running now.
Max memory used is 916632k and max swap used is 4528k out of 1GB.

Referring to Rasputin's question: what is the finish strategy for a BOINC task?
Does it just run for 18 hours, or is there a proper finish somewhere after 12 hours, when a Theory job has uploaded its result? So far I haven't found the answer.
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1067
Credit: 334,882
RAC: 0
Message 3099 - Posted: 29 Apr 2016, 10:19:42 UTC - in response to Message 3098.  

I am just about to push a new version. Should I set the machine memory to 1024MB or 512MB?
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3100 - Posted: 29 Apr 2016, 10:30:18 UTC - in response to Message 3099.  
Last modified: 29 Apr 2016, 10:44:56 UTC

It should be set to the minimum at which it does not crash and barely uses swap.
If wanted, we can assign more.

I really think it needs a second core. Any unused CPU will be used elsewhere.

The average CPU load with nearly all jobs is well above 1.5 (using 2 cores).
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 30
Message 3106 - Posted: 29 Apr 2016, 11:31:53 UTC - in response to Message 3099.  

Laurence wrote: "I am just about to push a new version. Should I set the machine memory to 1024MB or 512MB?"

The VMs I've seen are using between 750MB and 895MB, so 512 or 1024 should not be the only choices.
With the configured swap space, and without too much swapping, 896MB should suffice.
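
A rough Python sketch of that arithmetic follows. The 916632k peak is the figure reported earlier in the thread, read here as KiB. Rounding up to a multiple of 128MB is only an assumption made for illustration, not a project rule, and `suggest_vm_memory_mb` is a made-up helper name.

```python
import math

def suggest_vm_memory_mb(peak_used_kib, step_mb=128):
    """Round a peak memory reading (in KiB) up to the next multiple of step_mb."""
    peak_mb = peak_used_kib / 1024
    return math.ceil(peak_mb / step_mb) * step_mb

peak_kib = 916632                       # "Max memory used 916632k"
print(round(peak_kib / 1024))           # 895
print(suggest_vm_memory_mb(peak_kib))   # 896
```

Any headroom for swap-free operation would have to be added on top of this; the sketch only rounds the observed peak.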

I don't agree with Rasputin about configuring the VM with 2 cores.
In the past it was good to have 2 cores because only 1 task was sent.
Now we can run several VMs in parallel, each using 100% CPU.
In the past I've measured that a VM with 2 cores uses between 1.2 and 1.7 cores, depending on what kind of job is running (Pythia6, Pythia8, Vincia, Sherpa, Herwig++ etc.).
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3163 - Posted: 1 May 2016, 19:09:15 UTC

Rasputin42 wrote: "What is the task exit strategy?"

It seems that the first job still running past the 12h mark is allowed to finish, and then the task is shut down properly.
Also, the MCPlots IDs of processed jobs are in stderr.log.
However, they no longer show on console F4 or in the logs.
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 689
Message 3165 - Posted: 2 May 2016, 9:35:17 UTC

Computer: 1164 OpenSuse 13.2(x64) Guest under Windows 8.1(x64)pro.
Boinc 7.2.42 Virtualbox 5.0.18_SUSEr10667

Theory_2016_04_30.xml:

<vbox_job>
<os_name>Linux26_64</os_name>
<memory_size_mb>630</memory_size_mb>
<enable_network/>
<enable_remotedesktop/>
<copy_to_shared>init_data.xml</copy_to_shared>
<completion_trigger_file>shutdown</completion_trigger_file>
<enable_shared_directory/>
<pf_host_port>7859</pf_host_port>
<pf_guest_port>80</pf_guest_port>
<job_duration>64800</job_duration>
<enable_vm_savestate_usage/>
<disable_automatic_checkpoints/>
<heartbeat_filename>heartbeat</heartbeat_filename>
<minimum_heartbeat_interval>1200</minimum_heartbeat_interval>
</vbox_job>
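
For reference, a short Python sketch that reads the settings discussed in this thread out of that file. The XML is a trimmed copy of the listing above; nothing here is part of BOINC or the VirtualBox wrapper, it only uses the standard library parser.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the vbox_job listing above
VBOX_JOB = """
<vbox_job>
<os_name>Linux26_64</os_name>
<memory_size_mb>630</memory_size_mb>
<enable_network/>
<completion_trigger_file>shutdown</completion_trigger_file>
<job_duration>64800</job_duration>
<heartbeat_filename>heartbeat</heartbeat_filename>
<minimum_heartbeat_interval>1200</minimum_heartbeat_interval>
</vbox_job>
"""

root = ET.fromstring(VBOX_JOB)
memory_mb = int(root.findtext("memory_size_mb"))
duration_s = int(root.findtext("job_duration"))
heartbeat_s = int(root.findtext("minimum_heartbeat_interval"))

print(f"memory: {memory_mb} MB")
print(f"job duration: {duration_s} s = {duration_s / 3600:g} h")
print(f"heartbeat interval: {heartbeat_s} s = {heartbeat_s / 60:g} min")
```

The job duration works out to the 18 hours mentioned earlier in the thread.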

The task is waiting because I am upgrading BOINC to the latest version.
Rasputin42
Volunteer tester

Joined: 16 Aug 15
Posts: 966
Credit: 1,211,816
RAC: 0
Message 3166 - Posted: 2 May 2016, 11:56:08 UTC

I had a couple of JOBS with very long runtimes.

Currently I have one that has been running for close to 4h and shows:

Display update finished (127 histograms, 1000 events).
Updating display...
Display update finished (127 histograms, 1000 events).
Event 1600 ( 1h 44m 36s elapsed / 11h 19m 59s left ) -> ETA: Tue May 03 01:09
1600 events processed
Updating display...


I have an average CPU and have already allocated more memory and a second core.

How can it be avoided that a job takes longer than the cutoff time of the BOINC task?
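
One way to at least spot such a job early, sketched in Python: parse the "elapsed / left" pair from the progress line and compare the projected finish against the 64800-second job duration. The line format is taken from the output above and only the "h m s" form shown there is handled; the function names are made up, and the time the BOINC task has already run must be supplied by the caller.

```python
import re

JOB_DURATION_S = 64800  # 18-hour limit mentioned earlier in the thread

PROGRESS = re.compile(
    r"Event (\d+) \( (\d+)h (\d+)m (\d+)s elapsed / (\d+)h (\d+)m (\d+)s left \)"
)

def parse_progress(line):
    """Return (elapsed_s, remaining_s) from a progress line, or None if no match."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    _, eh, em, es, lh, lm, ls = (int(g) for g in match.groups())
    return eh * 3600 + em * 60 + es, lh * 3600 + lm * 60 + ls

def exceeds_cutoff(line, task_elapsed_s):
    """True if the job's estimated remaining time would push the task past the limit."""
    times = parse_progress(line)
    if times is None:
        raise ValueError("not a progress line")
    _, remaining_s = times
    return task_elapsed_s + remaining_s > JOB_DURATION_S

line = "Event 1600 ( 1h 44m 36s elapsed / 11h 19m 59s left ) -> ETA: Tue May 03 01:09"
print(parse_progress(line))            # (6276, 40799)
print(exceeds_cutoff(line, 4 * 3600))  # False
print(exceeds_cutoff(line, 8 * 3600))  # True
```

With the estimate above, a task that has already run 4 hours would still fit, while one that has run 8 hours would not.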
Crystal Pellet
Volunteer tester

Joined: 13 Feb 15
Posts: 1188
Credit: 859,751
RAC: 30
Message 3167 - Posted: 2 May 2016, 13:35:53 UTC - in response to Message 3166.  

Rasputin42 wrote: "How can it be avoided that a job takes longer than the cutoff time of the BOINC task?"

It can't. But maybe you have an endless loop. That sometimes happens.
There are 2 options to get rid of it:
- Abort the task in BOINC Manager (no credits)
- Put a shutdown file into the shared directory (credits granted)
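
The second option could be scripted roughly like this in Python. It assumes the trigger is an empty file named `shutdown` (matching `completion_trigger_file` in the job XML posted earlier in the thread) and that the path of the task's shared directory is supplied by the caller; the helper name and the example path in the comment are hypothetical.

```python
from pathlib import Path

def request_shutdown(shared_dir, trigger_name="shutdown"):
    """Create an empty trigger file in the given shared directory."""
    shared = Path(shared_dir)
    if not shared.is_dir():
        raise FileNotFoundError(f"shared directory not found: {shared}")
    trigger = shared / trigger_name
    trigger.touch(exist_ok=True)  # creates the file; contents stay empty
    return trigger

# Example (hypothetical path, adjust to your own BOINC slot directory):
# request_shutdown("/var/lib/boinc/slots/0/shared")
```

The check for the directory is there so that a mistyped path fails loudly instead of silently creating a file somewhere the VM never looks.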
maeax

Joined: 22 Apr 16
Posts: 677
Credit: 2,002,766
RAC: 689
Message 3194 - Posted: 3 May 2016, 9:42:21 UTC

Computer ID: 1165, Win 10 (x64) Pro
Has been working for two and a half hours.

finished_1.log up to finished_4.log

Mem: 619.856k total 565.800k used
Swap: 1.048.572k total 44.884k used

Seems to be running OK for the moment.


©2024 CERN