Message boards :
Theory Application :
New version v3.12
Joined: 8 Apr 15 Posts: 753 Credit: 11,737,111 RAC: 8,707
After 24 valids in a row today I started getting the "no sub tasks" errors (6 in a row on three different PCs):
[ERROR] Condor exited after 714s without running a job.
Log: 00:56:34.612036 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={872da645-4a9b-1727-bee2-5585105b9eed} aComponent={ConsoleWrap} aText={The object is not ready}, preserve=false aResultDetail=0
Joined: 8 Apr 15 Posts: 753 Credit: 11,737,111 RAC: 8,707
I guess my power of the mind worked and Laurence got my signal this morning. Back up and running here and over at LHC, and even getting new tasks to run (I had lots of them saved, since I suspended all of them when the "no sub tasks" errors started). I just started up all the cores here and they look good in the VirtualBox log. Thanks again Laurence... goodnight!
Joined: 8 Apr 15 Posts: 753 Credit: 11,737,111 RAC: 8,707
"no sub tasks" is the first thing I see when I get up... and all the ones I started at LHC failed to run, but the next batch is up and running. I am going to try one here right now, just to see if it will get sub-tasks.
Joined: 22 Apr 16 Posts: 672 Credit: 1,899,163 RAC: 5,654
We have to wait until Laurence is back in the office tomorrow. It is better to test with only one task; I have been doing that for a while now (one task with one CPU).
Joined: 22 Apr 16 Posts: 672 Credit: 1,899,163 RAC: 5,654
This message is shown, but in German ("The ring 2 stack is already in use"):
<message> Der Ring 2-Stapel wird bereits verwendet. (0xcf) - exit code 207 (0xcf)</message> <stderr_txt>
This was occasionally shown in the past:
2018-12-03 12:00:39 (10216): Guest Log: 12/03/18 11:49:54 HibernationSupportedStates invalid '' in ad from hibernation plugin /usr/libexec/condor/condor_power_state
2018-12-03 12:00:39 (10216): Guest Log: 12/03/18 11:49:54 VM-gahp server reported an internal error
2018-12-03 12:00:39 (10216): Guest Log: 12/03/18 11:49:54 VM universe will be tested to check if it is available
2018-12-03 12:00:39 (10216): Guest Log: 12/03/18 11:49:54 History file rotation is enabled.
Joined: 13 Feb 15 Posts: 1185 Credit: 848,858 RAC: 1,746
"Der Ring 2-Stapel wird bereits verwendet." ("The ring 2 stack is already in use.") This message is completely off the mark. The Windows meaning of error code 207 is used in the result, but error code 207 from BOINC should be used, meaning EXIT_NO_SUB_TASKS. This morning I also had 2 results without jobs. Now I have a task processing events.
Joined: 8 Apr 15 Posts: 753 Credit: 11,737,111 RAC: 8,707
I just tried one, with no luck again. I might start one more before I leave for a few hours. I hope all the ones I started for LHC are having better luck right now, but I will find out when I get home.
Joined: 22 Apr 16 Posts: 672 Credit: 1,899,163 RAC: 5,654
2018-12-03 14:13:14 (9652): Guest Log: [INFO] New Job Starting in slot1
2018-12-03 14:13:14 (9652): Guest Log: [INFO] Condor JobID: 482425.951 in slot1
2018-12-03 14:13:19 (9652): Guest Log: [INFO] MCPlots JobID: 47548970 in slot1
This task https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2741458 finished, but MCProd shows no entry. User=378 HostID=3452
Joined: 22 Apr 16 Posts: 672 Credit: 1,899,163 RAC: 5,654
One hour later: MCProd is now available.
Joined: 22 Apr 16 Posts: 672 Credit: 1,899,163 RAC: 5,654
Yes, we have to wait again. A task is downloading just at this moment; in twenty minutes we'll know, good or bad! Magic, your link gives "access denied", sorry.
Joined: 8 Apr 15 Posts: 753 Credit: 11,737,111 RAC: 8,707
Yeah, I always forget that we can't just post a link to our tasks, which would make it easy to see all the invalids without checking each computer one at a time. https://lhcathomedev.cern.ch/lhcathome-dev/hosts_user.php?userid=192 I just turned off all of my computers so they aren't wasting electricity and ISP data usage (I have 18,433.0 MB of high-speed data left until the 13th). On this one I decided to download 2 ATLAS tasks, but it is basically a waste of time since I have to download over 1 GB just to get those 2 tasks, which take only a few hours and earn very little credit, so I will leave that to all the other ATLAS users. I don't want to waste the high-speed internet just to wait 4 hours to download the vdi plus one hour for each task. I could fire up all the GPU cards and run Einsteins, but I'd rather not double my light bill this time of year.
Joined: 22 Apr 16 Posts: 672 Credit: 1,899,163 RAC: 5,654
I saw this error at GPUGrid too. They had a discussion about this BOINC parameter for Condor: <rsc_disk_bound>. Maybe Laurence can help us with that.
Joined: 13 Feb 15 Posts: 1185 Credit: 848,858 RAC: 1,746
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2742586
Is this only happening on one certain host? Is it happening only when running 2 or more jobs (cores) in the Theory VM? Did you get an error message in BOINC's event log?
The maximum allowed is 7629.39453125 MB; your result shows "Peak disk usage 18,331.37 MB".
All my Theory tasks occupy between 700 MB and 1000 MB in the working slot directory. Some users have reported similar errors with tasks crunched on the production server.
Joined: 8 Apr 15 Posts: 753 Credit: 11,737,111 RAC: 8,707
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2742747
I just got another one on the same host, always with 2-core tasks on this one. It has valids before and after these, and it also does them over at LHC (an 8-core always running four 2-core tasks), and those are all valids (except for that "no jobs" problem). I occasionally run a 2-core task here on another 8-core, and 2 single-core tasks on a quad-core that is also running a 2-core for LHC. I do tend to look at the logs and they were running normally, but after they finished and sent in these invalids there is no log to look at in the VirtualBox Manager. In the BOINC log for this one is:
12/10/2018 9:48:25 PM | lhcathome-dev | Aborting task Theory_863944_1544362736.575293_0: exceeded disk limit: 13680.38MB > 7629.39MB
12/10/2018 9:49:27 PM | lhcathome-dev | Computation for task Theory_863944_1544362736.575293_0 finished
12/10/2018 9:49:56 PM | lhcathome-dev | Sending scheduler request: To report completed tasks.
12/10/2018 9:49:56 PM | lhcathome-dev | Reporting 1 completed tasks
12/10/2018 9:49:56 PM | lhcathome-dev | Not requesting tasks: "no new tasks" requested via Manager
12/10/2018 9:54:01 PM | lhcathome-dev | Scheduler request failed: HTTP gateway timeout
And the BOINC preferences are always set to use 100% of memory and CPU, with no disk limit.
Joined: 13 Feb 15 Posts: 1185 Credit: 848,858 RAC: 1,746
At least BOINC detects the oversized usage in the slot directory. It would be useful to know which file is suddenly so big (GBs more than allowed). I've not seen it on my system, otherwise I would write a batch file to monitor the files and their sizes in the slots used by Theory.
Joined: 22 Apr 16 Posts: 672 Credit: 1,899,163 RAC: 5,654
Edit: Sorry, the second core had started half an hour later!
08:46:20 +0100 2018-12-12 [INFO] New Job Starting in slot2
08:46:20 +0100 2018-12-12 [INFO] New Job Starting in slot1
08:46:20 +0100 2018-12-12 [INFO] Condor JobID: 483255.108 in slot2
08:46:20 +0100 2018-12-12 [INFO] Condor JobID: 483255.96 in slot1
08:46:26 +0100 2018-12-12 [INFO] MCPlots JobID: 47688538 in slot2
08:46:26 +0100 2018-12-12 [INFO] MCPlots JobID: 47688491 in slot1
Task with 3 cores running: https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1867840
12/12/18 08:26:02 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
slot type 0: Cpus: 1.000000, Memory: 500, Swap: 33.33%, Disk: 33.33%
slot type 0: Cpus: 1.000000, Memory: 500, Swap: 33.33%, Disk: 33.33%
slot type 0: Cpus: 1.000000, Memory: 500, Swap: 33.33%, Disk: 33.33%
12/12/18 08:26:02 slot1: New machine resource allocated
12/12/18 08:26:02 Setting up slot pairings
12/12/18 08:26:02 slot2: New machine resource allocated
12/12/18 08:26:02 Setting up slot pairings
12/12/18 08:26:02 slot3: New machine resource allocated
12/12/18 08:26:02 Setting up slot pairings
Output of the job wrapper may appear here.
08:26:50 +0100 2018-12-12 [INFO] New Job Starting in slot3
08:26:50 +0100 2018-12-12 [INFO] New Job Starting in slot1
08:26:50 +0100 2018-12-12 [INFO] Condor JobID: 483254.75 in slot3
08:26:50 +0100 2018-12-12 [INFO] Condor JobID: 483254.65 in slot1
08:26:55 +0100 2018-12-12 [INFO] MCPlots JobID: 47688377 in slot3
08:26:55 +0100 2018-12-12 [INFO] MCPlots JobID: 47688432 in slot1
Where is core TWO?
Joined: 13 Feb 15 Posts: 1185 Credit: 848,858 RAC: 1,746
"Where is core TWO?" At the start, jobs are launched pairwise at 20-minute intervals, but the slot numbers used are mostly not in sequence. Fast-finishing jobs confuse the sequence and the interval too.
2018-11-30 15:33:03 (3164): Guest Log: [INFO] New Job Starting in slot4
2018-11-30 15:33:03 (3164): Guest Log: [INFO] New Job Starting in slot1
2018-11-30 15:53:12 (3164): Guest Log: [INFO] New Job Starting in slot2
2018-11-30 15:53:12 (3164): Guest Log: [INFO] New Job Starting in slot3
2018-11-30 16:14:21 (3164): Guest Log: [INFO] New Job Starting in slot5
2018-11-30 16:14:22 (3164): Guest Log: [INFO] New Job Starting in slot6
2018-11-30 16:35:08 (3164): Guest Log: [INFO] New Job Starting in slot8
2018-11-30 16:35:09 (3164): Guest Log: [INFO] New Job Starting in slot7
2018-11-30 16:56:16 (3164): Guest Log: [INFO] New Job Starting in slot9
2018-11-30 18:02:48 (3164): Guest Log: [INFO] New Job Starting in slot2
2018-11-30 18:43:35 (3164): Guest Log: [INFO] New Job Starting in slot4
2018-11-30 19:11:20 (3164): Guest Log: [INFO] New Job Starting in slot8
2018-11-30 19:12:26 (3164): Guest Log: [INFO] New Job Starting in slot2
2018-11-30 19:32:21 (3164): Guest Log: [INFO] New Job Starting in slot5
2018-11-30 19:36:07 (3164): Guest Log: [INFO] New Job Starting in slot3
That was just for a test. When you have the RAM, just run single-core Theory VMs only.
©2024 CERN