Message boards :
Theory Application :
New version v3.12
Joined: 8 Apr 15 Posts: 753 Credit: 11,737,111 RAC: 8,707
After 24 valids in a row today I started getting the "no sub tasks" errors (6 in a row on three different PCs):
[ERROR] Condor exited after 714s without running a job.
Log: 00:56:34.612036 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={872da645-4a9b-1727-bee2-5585105b9eed} aComponent={ConsoleWrap} aText={The object is not ready}, preserve=false aResultDetail=0
Joined: 8 Apr 15 Posts: 753 Credit: 11,737,111 RAC: 8,707
I guess my power of the mind worked and Laurence got my signal this morning. Back up and running here and over at LHC, and even getting new tasks to run (I had lots of them saved, since I suspended all of them when the "no sub tasks" errors started). I just started up all the cores here and they look good in the VirtualBox log. Thanks again Laurence... goodnight!
Joined: 8 Apr 15 Posts: 753 Credit: 11,737,111 RAC: 8,707
"no sub tasks" is the first thing I see when I get up... and all the ones I started at LHC failed to run, but the next batch is up and running. I am going to try one here right now, just to see if it will get sub-tasks.
Joined: 22 Apr 16 Posts: 672 Credit: 1,899,163 RAC: 5,654
We have to wait until Laurence is back in the office tomorrow. It is better to test with only one task; I have been doing that for a while now (one task with one CPU).
Joined: 22 Apr 16 Posts: 672 Credit: 1,899,163 RAC: 5,654
This message is shown, but in German ("The ring 2 stack is already in use"):
<message> Der Ring 2-Stapel wird bereits verwendet. (0xcf) - exit code 207 (0xcf)</message> <stderr_txt>
This was occasionally shown in the past:
2018-12-03 12:00:39 (10216): Guest Log: 12/03/18 11:49:54 HibernationSupportedStates invalid '' in ad from hibernation plugin /usr/libexec/condor/condor_power_state
2018-12-03 12:00:39 (10216): Guest Log: 12/03/18 11:49:54 VM-gahp server reported an internal error
2018-12-03 12:00:39 (10216): Guest Log: 12/03/18 11:49:54 VM universe will be tested to check if it is available
2018-12-03 12:00:39 (10216): Guest Log: 12/03/18 11:49:54 History file rotation is enabled.
Joined: 13 Feb 15 Posts: 1185 Credit: 848,858 RAC: 1,746
"Der Ring 2-Stapel wird bereits verwendet." ("The ring 2 stack is already in use.") This message is completely off the mark. The Windows meaning of error code 207 is used in the result, but error code 207 from BOINC should be used, meaning EXIT_NO_SUB_TASKS. This morning I also had 2 results without jobs. Now I have a task processing events.
Joined: 8 Apr 15 Posts: 753 Credit: 11,737,111 RAC: 8,707
I just tried one, with no luck again. I might start one more before I leave for a few hours. I hope all the ones I started for LHC are having better luck right now, but I will find out when I get home.
Joined: 22 Apr 16 Posts: 672 Credit: 1,899,163 RAC: 5,654
2018-12-03 14:13:14 (9652): Guest Log: [INFO] New Job Starting in slot1
2018-12-03 14:13:14 (9652): Guest Log: [INFO] Condor JobID: 482425.951 in slot1
2018-12-03 14:13:19 (9652): Guest Log: [INFO] MCPlots JobID: 47548970 in slot1
This task https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2741458 finished, but MCProd shows no entry. User=378 HostID=3452
Joined: 22 Apr 16 Posts: 672 Credit: 1,899,163 RAC: 5,654
One hour later: MCProd is now available.
Joined: 22 Apr 16 Posts: 672 Credit: 1,899,163 RAC: 5,654
Yes, we have to wait again. A task is downloading just at this moment; in twenty minutes we'll know, good or bad! Magic, your link gives "access denied", sorry.
Joined: 8 Apr 15 Posts: 753 Credit: 11,737,111 RAC: 8,707
Yeah, I always forget that we can't just post a link to our tasks, which would make it easy to see all the invalids without checking each computer one at a time. https://lhcathomedev.cern.ch/lhcathome-dev/hosts_user.php?userid=192 I just turned off all of my computers so they aren't wasting electricity and ISP data usage (I have 18,433.0 MB of high-speed data left until the 13th). On this one I decided to download 2 ATLAS tasks, but it is basically a waste of time since I have to download over 1 GB just to get those 2 tasks, which take only a few hours and earn very little credit, so I will leave that to all the other ATLAS users. I don't want to waste the high-speed internet just to wait 4 hours to download the vdi plus one hour for each task. I could fire up all the GPU cards and run Einsteins, but I'd rather not double my light bill this time of year.
Joined: 22 Apr 16 Posts: 672 Credit: 1,899,163 RAC: 5,654
I saw this error at GPUGrid too. They had a discussion about this BOINC parameter for Condor: <rsc_disk_bound>. Maybe Laurence can help us with that.
Joined: 13 Feb 15 Posts: 1185 Credit: 848,858 RAC: 1,746
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2742586
Is this only happening on one certain host? Is it happening only when running 2 or more jobs (cores) in the Theory VM? Did you get an error message in BOINC's event log?
The maximum allowed is 7629.39453125 MB; your result shows "Peak disk usage 18,331.37 MB".
All my Theory tasks occupy between 700 MB and 1000 MB in the working slot directory. Some users have reported similar errors with tasks crunched on the production server.
Joined: 8 Apr 15 Posts: 753 Credit: 11,737,111 RAC: 8,707
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2742747
I just got another one on the same host, always with 2-core tasks on this one. It has valids before and after these, and it also does them over at LHC (an 8-core always running four 2-core tasks), and those are all valids (except for that "no jobs" problem). I occasionally run a 2-core task here on another 8-core, and 2 single-core tasks on a quad-core that is also running a 2-core for LHC. I do tend to look at the logs and they were running normally, but after they finished and sent in these invalids there is no log to look at in the VirtualBox Manager. In the BOINC log for this one is:
12/10/2018 9:48:25 PM | lhcathome-dev | Aborting task Theory_863944_1544362736.575293_0: exceeded disk limit: 13680.38MB > 7629.39MB
12/10/2018 9:49:27 PM | lhcathome-dev | Computation for task Theory_863944_1544362736.575293_0 finished
12/10/2018 9:49:56 PM | lhcathome-dev | Sending scheduler request: To report completed tasks.
12/10/2018 9:49:56 PM | lhcathome-dev | Reporting 1 completed tasks
12/10/2018 9:49:56 PM | lhcathome-dev | Not requesting tasks: "no new tasks" requested via Manager
12/10/2018 9:54:01 PM | lhcathome-dev | Scheduler request failed: HTTP gateway timeout
And the BOINC preferences are always set to use 100% of memory and CPU, with no disk limit.
Joined: 13 Feb 15 Posts: 1185 Credit: 848,858 RAC: 1,746
At least BOINC detects the oversized usage in the slot directory. It would be useful to know which file is suddenly so big (GBs more than allowed). I've not seen it on my system, otherwise I would write a batch file to monitor the files and their sizes in the slots used by Theory.
Joined: 22 Apr 16 Posts: 672 Credit: 1,899,163 RAC: 5,654
Edit: Sorry, the second core had started half an hour later!
08:46:20 +0100 2018-12-12 [INFO] New Job Starting in slot2
08:46:20 +0100 2018-12-12 [INFO] New Job Starting in slot1
08:46:20 +0100 2018-12-12 [INFO] Condor JobID: 483255.108 in slot2
08:46:20 +0100 2018-12-12 [INFO] Condor JobID: 483255.96 in slot1
08:46:26 +0100 2018-12-12 [INFO] MCPlots JobID: 47688538 in slot2
08:46:26 +0100 2018-12-12 [INFO] MCPlots JobID: 47688491 in slot1
Task with 3 cores running: https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1867840
12/12/18 08:26:02 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
slot type 0: Cpus: 1.000000, Memory: 500, Swap: 33.33%, Disk: 33.33%
slot type 0: Cpus: 1.000000, Memory: 500, Swap: 33.33%, Disk: 33.33%
slot type 0: Cpus: 1.000000, Memory: 500, Swap: 33.33%, Disk: 33.33%
12/12/18 08:26:02 slot1: New machine resource allocated
12/12/18 08:26:02 Setting up slot pairings
12/12/18 08:26:02 slot2: New machine resource allocated
12/12/18 08:26:02 Setting up slot pairings
12/12/18 08:26:02 slot3: New machine resource allocated
12/12/18 08:26:02 Setting up slot pairings
Output of the job wrapper may appear here.
08:26:50 +0100 2018-12-12 [INFO] New Job Starting in slot3
08:26:50 +0100 2018-12-12 [INFO] New Job Starting in slot1
08:26:50 +0100 2018-12-12 [INFO] Condor JobID: 483254.75 in slot3
08:26:50 +0100 2018-12-12 [INFO] Condor JobID: 483254.65 in slot1
08:26:55 +0100 2018-12-12 [INFO] MCPlots JobID: 47688377 in slot3
08:26:55 +0100 2018-12-12 [INFO] MCPlots JobID: 47688432 in slot1
Where is core TWO?
Joined: 13 Feb 15 Posts: 1185 Credit: 848,858 RAC: 1,746
"Where is core TWO?" At the start, jobs are launched pairwise at 20-minute intervals, but the slot numbers used are mostly not in sequence. Fast-finishing jobs confuse the sequence and the interval too.
2018-11-30 15:33:03 (3164): Guest Log: [INFO] New Job Starting in slot4
2018-11-30 15:33:03 (3164): Guest Log: [INFO] New Job Starting in slot1
2018-11-30 15:53:12 (3164): Guest Log: [INFO] New Job Starting in slot2
2018-11-30 15:53:12 (3164): Guest Log: [INFO] New Job Starting in slot3
2018-11-30 16:14:21 (3164): Guest Log: [INFO] New Job Starting in slot5
2018-11-30 16:14:22 (3164): Guest Log: [INFO] New Job Starting in slot6
2018-11-30 16:35:08 (3164): Guest Log: [INFO] New Job Starting in slot8
2018-11-30 16:35:09 (3164): Guest Log: [INFO] New Job Starting in slot7
2018-11-30 16:56:16 (3164): Guest Log: [INFO] New Job Starting in slot9
2018-11-30 18:02:48 (3164): Guest Log: [INFO] New Job Starting in slot2
2018-11-30 18:43:35 (3164): Guest Log: [INFO] New Job Starting in slot4
2018-11-30 19:11:20 (3164): Guest Log: [INFO] New Job Starting in slot8
2018-11-30 19:12:26 (3164): Guest Log: [INFO] New Job Starting in slot2
2018-11-30 19:32:21 (3164): Guest Log: [INFO] New Job Starting in slot5
2018-11-30 19:36:07 (3164): Guest Log: [INFO] New Job Starting in slot3
That was just for a test. When you have the RAM, just run single-core Theory VMs only.
©2024 CERN