Thread 'The Theory Application'

Author	Message
Ray Murray Joined: 13 Apr 15 Posts: 138 Credit: 3,015,630 RAC: 0	Message 2687 - Posted: 12 Apr 2016, 14:56:05 UTC Last modified: 12 Apr 2016, 14:59:17 UTC Tasks on both machines got past the previous halt error and after c. 10 mins each has started processing events. I followed the Graphics button to the Machine logs link and from Alt-F2 can see events processing in real time. One looked like it had trouble with its initial Sherpa (like CP's) but has dropped that in favour of a Pythia which looks to be running fine. (Similarly, one machine has a Pythia6, the other a Pythia8.) I'll try to pay attention when each of these jobs finishes to see whether another takes over. The current BOINC estimated time remaining is a little over 4 days, but extrapolating from % progress gives c. 18 hrs. Is there a self-termination at 24 hrs like the standard Theory tasks? Both tasks happy at 60 and 40 mins respectively.
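The extrapolation from % progress mentioned above can be sketched as follows. This is only an illustration of the arithmetic, assuming progress grows roughly linearly with run time; the function name and the sample numbers are hypothetical and are not taken from the actual tasks.

```python
def estimate_remaining_seconds(elapsed_seconds: float, fraction_done: float) -> float:
    """Linear extrapolation: projected total time = elapsed / fraction done."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    projected_total = elapsed_seconds / fraction_done
    return projected_total - elapsed_seconds


# Hypothetical example: one hour elapsed at 25 % progress.
remaining = estimate_remaining_seconds(3600, 0.25)
print(f"about {remaining / 3600:.1f} h remaining")
```

The BOINC client's own estimate can differ a lot from this, since it is not based purely on the reported progress fraction.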

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 2688 - Posted: 12 Apr 2016, 15:00:17 UTC - in response to Message 2687. "Current Boinc estimated time remaining is a little over 4 days but extrapolating from % progress gives c.18hrs. Is there a self-termination at 24hrs like the standard Theory tasks?" The wrapper will kill BOINC task after 18 hours -> job duration set to 64800 seconds

Ben Segal Volunteer moderator Volunteer developer Volunteer tester Joined: 12 Sep 14 Posts: 65 Credit: 544 RAC: 0	Message 2690 - Posted: 12 Apr 2016, 15:02:37 UTC - in response to Message 2686. The VM shutting itself down seems to be solved. Very well! The Sherpas don't run well, but my first Pythia6 and Pythia8 do. ===> [runRivet] Tue Apr 12 16:31:15 CEST 2016 [boinc ppbar uemb-hard 1800 - - pythia6 6.428 391 100000 188] ===> [runRivet] Tue Apr 12 16:52:55 CEST 2016 [boinc ppbar uemb-hard 1800 15 - pythia8 8.186 tune-4c 100000 188] Yes CP (and Ray), the first "real" T4T jobs have been submitted and the web logs are also working. The next step is to feed the results back into MCPlots, which Leonardo is doing currently, so you will start to get MCPlots stats updates for the work you do. A lot of progress today! So Rasputin, you can also restart testing now...

Ray Murray Joined: 13 Apr 15 Posts: 138 Credit: 3,015,630 RAC: 0	Message 2691 - Posted: 12 Apr 2016, 16:38:10 UTC Update to my last post. Job finished fine, replaced by another, and another. Task continues normally. Good work, guys 8¬) "The wrapper will kill BOINC task after 18 hours -> job duration set to 64800 seconds" I wonder if that's a typo for the 86400 seconds (24 hrs) that T4T runs? With the base memory being 2GB rather than the ordinary 256MB, I'll need to squeeze some extra memory into 2 of my boxes to allow them to run these. (I was going to anyway but this is the spur to actually do something about it.)

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 2693 - Posted: 12 Apr 2016, 17:26:13 UTC - in response to Message 2691. "I wonder if that's a typo from the 86400 seconds (24hrs) that T4T runs?" No, it's the same 64800 (18-hour limit) as is used by the CMS application when a VM is not stopped normally after 1 run of 6 or 12 hours. "With the base memory being 2GB rather than the ordinary 256MB, I'll need to squeeze some extra memory into 2 of my boxes to allow them to run these. (I was going to anyway but this is the spur to actually do something about it.)" Memory used at the moment ~1.2GB, swap used 0k, so maybe the 2GB could be lowered a bit. For the multi-threaded Challenge-VM we used as a rule of thumb 512kB for 1 thread.
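For anyone checking the two figures being compared here, a minimal sketch of the conversion (the helper name is just for illustration):

```python
def seconds_to_hours(seconds: int) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600


print(seconds_to_hours(64800))  # job duration quoted for the wrapper
print(seconds_to_hours(86400))  # the 24-hour figure mentioned for T4T
```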

Ray Murray Joined: 13 Apr 15 Posts: 138 Credit: 3,015,630 RAC: 0	Message 2695 - Posted: 12 Apr 2016, 20:11:07 UTC Last modified: 12 Apr 2016, 20:11:54 UTC I exited BOINC and made sure VBox had saved the VM before restarting my hosts to apply some Windows updates. On reboot, and restart of BOINC, the TASK continued but the previously running JOB was lost; however, a new job started up to replace it (or it might have been a restart of that job; I didn't note its ID so can't tell either way). If I have time tomorrow I'll do some more "robustness" testing.

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1159 Credit: 342,328 RAC: 0	Message 2696 - Posted: 12 Apr 2016, 20:57:06 UTC - in response to Message 2695. Back to the joys of Suspend and Resume with Condor :( I think we need to start a new thread on this topic. The advantage this time is that the infrastructure is all under our control, so we should be able to make better progress. To help, the Condor log files (MasterLog, StartLog and StarterLog) can be found in the Web logs directory. The log file handling for the jobs has also been improved, so each job log is now archived rather than being overwritten. This will be available once the CVMFS has updated.

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 2698 - Posted: 13 Apr 2016, 6:28:01 UTC I noticed that Sherpa is also running well now: ===> [runRivet] Wed Apr 13 07:04:24 CEST 2016 [boinc ee zhad 197 - - sherpa 1.4.1 default 100000 188]

PDW Joined: 20 May 15 Posts: 217 Credit: 6,294,052 RAC: 0	Message 2752 - Posted: 14 Apr 2016, 11:28:40 UTC - in response to Message 2698. Back to getting failure to start work due to... 2016-04-14 12:17:09 (16392): Guest Log: [INFO] Theory application starting. Check log files. 2016-04-14 12:23:10 (16392): Guest Log: [ERROR] App is not supported. Shutting down!

Phil Joined: 9 Apr 15 Posts: 57 Credit: 230,221 RAC: 0	Message 2753 - Posted: 14 Apr 2016, 11:43:53 UTC - in response to Message 2752. "Back to getting failure to start work due to... 2016-04-14 12:17:09 (16392): Guest Log: [INFO] Theory application starting. Check log files. 2016-04-14 12:23:10 (16392): Guest Log: [ERROR] App is not supported. Shutting down!" I just had one of those for a BOINC job that started with Sherpa. I have also seen a number of EXT4 inode addressing errors (not recorded in the logs, alas), so I'll get a new VDI.

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 2754 - Posted: 14 Apr 2016, 12:18:42 UTC What is the task exit strategy? Does the task just time out after 18h or is it properly shut down at the end of a job? One task that exited by itself stated: 2016-04-14 05:59:24 (2448): Guest Log: [ERROR] App is not supported. Shutting down!

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 3098 - Posted: 29 Apr 2016, 10:12:32 UTC I set the memory for the Theory VMs to 1024MB. I have 3 Theory VMs running now. Max memory used 916632k and max swap used 4528k out of 1GB. Referring to Rasputin's question: what is the finish strategy for a BOINC task? Just let it run for 18 hours, or a proper finish somewhere after 12 hours, when a Theory job has uploaded the result? So far I haven't found the answer.

Laurence CERN Project administrator Project developer Project tester Joined: 12 Sep 14 Posts: 1159 Credit: 342,328 RAC: 0	Message 3099 - Posted: 29 Apr 2016, 10:19:42 UTC - in response to Message 3098. I am just about to push a new version. Should I set the machine memory to 1024MB or 512MB?

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 3100 - Posted: 29 Apr 2016, 10:30:18 UTC - in response to Message 3099. Last modified: 29 Apr 2016, 10:44:56 UTC It should be set to the minimum at which it does not crash and barely uses swap. If wanted, we can assign more. I really think it needs a second core. Any unused CPU will be used elsewhere. The average CPU load with nearly all jobs is well above 1.5 (using 2 cores).

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 3106 - Posted: 29 Apr 2016, 11:31:53 UTC - in response to Message 3099. "I am just about to push a new version. Should I set the machine memory to 1024MB or 512MB?" The VMs I've seen are using between 750MB and 895MB, so 512 or 1024 should not be the only choices. With the configured swap space and without too much swapping, 896MB should suffice. I don't agree with Rasputin about configuring the VM with 2 cores. In the past it was good to have 2 cores because only 1 task was sent. Now we can run several VMs in parallel, each using 100% CPU. In the past I've tested that a VM with 2 cores uses between 1.2 and 1.7 cores depending on what kind of job is running (Pythia6, Pythia8, Vincia, Sherpa, Herwig++ etc.)

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 3163 - Posted: 1 May 2016, 19:09:15 UTC "What is the task exit strategy?" It seems that the first job going over 12h is finished and then the task is shut down properly. Also, the MC-plot-Ids of processed jobs are in the stderr.log. However, they do not show on console F4 or in the logs any more.

maeax Joined: 22 Apr 16 Posts: 792 Credit: 4,211,539 RAC: 5,747	Message 3165 - Posted: 2 May 2016, 9:35:17 UTC Computer: 1164, OpenSuse 13.2 (x64) guest under Windows 8.1 (x64) Pro. Boinc 7.2.42, Virtualbox 5.0.18_SUSEr10667. Theory_2016_04_30.xml:
<vbox_job>
  <os_name>Linux26_64</os_name>
  <memory_size_mb>630</memory_size_mb>
  <enable_network/>
  <enable_remotedesktop/>
  <copy_to_shared>init_data.xml</copy_to_shared>
  <completion_trigger_file>shutdown</completion_trigger_file>
  <enable_shared_directory/>
  <pf_host_port>7859</pf_host_port>
  <pf_guest_port>80</pf_guest_port>
  <job_duration>64800</job_duration>
  <enable_vm_savestate_usage/>
  <disable_automatic_checkpoints/>
  <heartbeat_filename>heartbeat</heartbeat_filename>
  <minimum_heartbeat_interval>1200</minimum_heartbeat_interval>
</vbox_job>
Task is waiting because Boinc is being upgraded to the latest version.
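As a quick way to read the settings discussed in this thread (memory size and job duration) out of such a file, here is a minimal sketch using Python's standard XML parser. The embedded XML is a shortened copy of the file posted above; the function name and the keys of the returned dictionary are made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the posted Theory_2016_04_30.xml
VBOX_JOB_XML = """<vbox_job>
  <os_name>Linux26_64</os_name>
  <memory_size_mb>630</memory_size_mb>
  <enable_network/>
  <job_duration>64800</job_duration>
  <minimum_heartbeat_interval>1200</minimum_heartbeat_interval>
</vbox_job>"""


def summarize_vbox_job(xml_text: str) -> dict:
    """Extract a few settings from a vbox_job XML document."""
    root = ET.fromstring(xml_text)
    return {
        "memory_mb": int(root.findtext("memory_size_mb")),
        "job_duration_hours": int(root.findtext("job_duration")) / 3600,
        "heartbeat_interval_s": int(root.findtext("minimum_heartbeat_interval")),
        # Empty elements such as <enable_network/> act as on/off flags here.
        "network_enabled": root.find("enable_network") is not None,
    }


print(summarize_vbox_job(VBOX_JOB_XML))
```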

Rasputin42 Volunteer tester Joined: 16 Aug 15 Posts: 967 Credit: 1,216,795 RAC: 0	Message 3166 - Posted: 2 May 2016, 11:56:08 UTC I had a couple of JOBS which had a very long runtime. Currently, I have one that has been running for close to 4h and shows:
Display update finished (127 histograms, 1000 events).
Updating display...
Display update finished (127 histograms, 1000 events).
Event 1600 ( 1h 44m 36s elapsed / 11h 19m 59s left ) -> ETA: Tue May 03 01:09
1600 events processed
Updating display...
I have an average CPU and have already allocated more memory and a second core. How can it be avoided that a job takes longer than the cutoff time of the BOINC task?
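To check whether a job like this is likely to fit before the task is cut off, the "elapsed / left" figures in the Event line can be parsed and compared with the 64800-second job duration mentioned earlier in the thread. This is a rough sketch only: the regular expression is inferred from the single log line quoted above and may not match other generators or lines without an hours field, and the task runtime passed in is a hypothetical value.

```python
import re

# Pattern inferred from one quoted log line; treat it as an assumption.
PROGRESS_PATTERN = re.compile(
    r"\(\s*(\d+)h\s+(\d+)m\s+(\d+)s\s+elapsed\s*/\s*"
    r"(\d+)h\s+(\d+)m\s+(\d+)s\s+left\s*\)"
)


def parse_progress(line: str):
    """Return (elapsed_seconds, left_seconds), or None if the line does not match."""
    match = PROGRESS_PATTERN.search(line)
    if match is None:
        return None
    eh, em, es, lh, lm, ls = (int(group) for group in match.groups())
    return eh * 3600 + em * 60 + es, lh * 3600 + lm * 60 + ls


def fits_before_cutoff(task_runtime_s: int, job_left_s: int, cutoff_s: int = 64800) -> bool:
    """True if the job's remaining time still fits inside the task cutoff."""
    return task_runtime_s + job_left_s <= cutoff_s


line = "Event 1600 ( 1h 44m 36s elapsed / 11h 19m 59s left ) -> ETA: Tue May 03 01:09"
elapsed_s, left_s = parse_progress(line)
# 4 hours of task runtime is a hypothetical figure for illustration.
print(fits_before_cutoff(4 * 3600, left_s))
```

The job's own estimate of time left can change as it runs, so this is only an indication.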

Crystal Pellet Volunteer tester Joined: 13 Feb 15 Posts: 1279 Credit: 1,045,826 RAC: 136	Message 3167 - Posted: 2 May 2016, 13:35:53 UTC - in response to Message 3166. "How can it be avoided, that job takes longer than the cutoff time of the boinc task?" It can't. But maybe you have an endless loop. That sometimes happens. There are 2 options to get rid of it:
- Abort the task in BOINC Manager (no credits)
- Put a shutdown file into the shared directory (credits granted)
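The second option could be scripted along these lines. This is a sketch under assumptions: the file name "shutdown" matches the completion_trigger_file in the job XML posted earlier in the thread, but the location of the task's shared directory depends on the installation and is not given here, so it has to be passed in by the caller. The function name is hypothetical.

```python
from pathlib import Path


def request_vm_shutdown(shared_dir: Path, trigger_name: str = "shutdown") -> Path:
    """Create an empty trigger file in the given shared directory.

    shared_dir is an assumption: point it at the shared directory of the
    running task's slot on your own machine.
    """
    if not shared_dir.is_dir():
        raise FileNotFoundError(f"shared directory not found: {shared_dir}")
    trigger = shared_dir / trigger_name
    trigger.touch(exist_ok=True)
    return trigger
```

Usage would be something like `request_vm_shutdown(Path("<path to the task's shared directory>"))`, with the placeholder replaced by the real path.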

maeax Joined: 22 Apr 16 Posts: 792 Credit: 4,211,539 RAC: 5,747	Message 3194 - Posted: 3 May 2016, 9:42:21 UTC Computer ID: 1165, Win 10 (x64) Pro, has been working for two and a half hours. finished_1.log up to finished_4.log. Mem: 619.856k total, 565.800k used. Swap: 1.048.572k total, 44.884k used. Seems to be running OK for the moment.

Development for LHC@home