Message boards : Theory Application : New version v3.12

Magic Quantum Mechanic
Joined: 8 Apr 15
Posts: 753
Credit: 11,737,111
RAC: 8,707
Message 5698 - Posted: 1 Dec 2018, 23:35:50 UTC
Last modified: 2 Dec 2018, 0:30:17 UTC

After 24 Valids in a row today I started getting the "no sub tasks" errors (6 in a row on three different PCs).

[ERROR] Condor exited after 714s without running a job.

Log: 00:56:34.612036 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={872da645-4a9b-1727-bee2-5585105b9eed} aComponent={ConsoleWrap} aText={The object is not ready}, preserve=false aResultDetail=0

maeax
Joined: 22 Apr 16
Posts: 672
Credit: 1,899,163
RAC: 5,654
Message 5699 - Posted: 2 Dec 2018, 7:15:31 UTC
Last modified: 2 Dec 2018, 7:19:31 UTC


Magic Quantum Mechanic
Message 5700 - Posted: 2 Dec 2018, 9:26:49 UTC

I guess my power of the mind worked and Laurence got my signal this morning

Back up and running here and over at LHC, and even getting new tasks to run (I did have lots of them saved since I suspended all of them when the "no sub tasks" started happening).

Just started up all the cores here and they look good on the VB Log.

Thanks again Laurence...........goodnight!

Magic Quantum Mechanic
Message 5701 - Posted: 2 Dec 2018, 20:24:08 UTC

"no sub tasks"

First thing I see when I get up......and all the ones I started at LHC failed to run but the next batch is up and running.

I am just going to try one here right now just to see if one will get sub tasks

maeax
Message 5702 - Posted: 2 Dec 2018, 20:56:41 UTC

We have to wait until Laurence is in the office tomorrow.
It is better to test with only one task.
I have been doing this for a while (one task with one CPU).

maeax
Message 5703 - Posted: 3 Dec 2018, 12:01:05 UTC
Last modified: 3 Dec 2018, 12:04:22 UTC

This message is shown, but in German (in English: "The ring 2 stack is in use."):
<message>
Der Ring 2-Stapel wird bereits verwendet.
(0xcf) - exit code 207 (0xcf)</message>
<stderr_txt>
This has sometimes been shown in the past.
2018-12-03 12:00:39 (10216): Guest Log: 12/03/18 11:49:54 HibernationSupportedStates invalid '' in ad from hibernation plugin /usr/libexec/condor/condor_power_state

2018-12-03 12:00:39 (10216): Guest Log: 12/03/18 11:49:54 VM-gahp server reported an internal error

2018-12-03 12:00:39 (10216): Guest Log: 12/03/18 11:49:54 VM universe will be tested to check if it is available

2018-12-03 12:00:39 (10216): Guest Log: 12/03/18 11:49:54 History file rotation is enabled.

Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1185
Credit: 848,858
RAC: 1,746
Message 5704 - Posted: 3 Dec 2018, 14:50:22 UTC - in response to Message 5703.  
Last modified: 3 Dec 2018, 14:51:45 UTC

Der Ring 2-Stapel wird bereits verwendet.
This message is completely off the mark.

The result shows the Windows text for error code 207, but the BOINC meaning of exit code 207 should be used instead: EXIT_NO_SUB_TASKS.

This morning I also had 2 results without jobs. Now I've a task processing events.
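
To illustrate the point about code 207: the same number means different things depending on which table it is looked up in. Below is a minimal Python sketch, assuming only the two meanings mentioned in this thread. The two dictionaries and the `describe_exit_code` helper are hypothetical names for illustration, not part of BOINC or Windows.

```python
# Two lookup tables for the same numeric code. Only code 207 is filled in,
# because it is the only one discussed in this thread.
WINDOWS_ERRORS = {207: "The ring 2 stack is in use."}  # text Windows reports
BOINC_EXIT_CODES = {207: "EXIT_NO_SUB_TASKS"}           # meaning in BOINC


def describe_exit_code(code: int) -> str:
    """Prefer the BOINC meaning and fall back to the Windows text."""
    if code in BOINC_EXIT_CODES:
        return f"{code} (0x{code:x}): {BOINC_EXIT_CODES[code]}"
    if code in WINDOWS_ERRORS:
        return f"{code} (0x{code:x}): {WINDOWS_ERRORS[code]}"
    return f"{code} (0x{code:x}): unknown"


print(describe_exit_code(207))  # 207 (0xcf): EXIT_NO_SUB_TASKS
```

Looking up the BOINC table first is the design choice here: a task's exit code comes from the application, so its own code list is the more likely source of meaning.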

Magic Quantum Mechanic
Message 5705 - Posted: 3 Dec 2018, 16:49:36 UTC

I just tried one with no luck again.

I might start one more before I leave for a few hours.

I hope all the ones I started for LHC are having better luck right now but I will find out when I get home.

maeax
Message 5706 - Posted: 3 Dec 2018, 19:00:44 UTC

2018-12-03 14:13:14 (9652): Guest Log: [INFO] New Job Starting in slot1
2018-12-03 14:13:14 (9652): Guest Log: [INFO] Condor JobID: 482425.951 in slot1
2018-12-03 14:13:19 (9652): Guest Log: [INFO] MCPlots JobID: 47548970 in slot1

This task
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2741458
finished, but MCProd shows no entry.
User=378 HostID=3452

maeax
Message 5707 - Posted: 3 Dec 2018, 20:50:29 UTC

MCProd is now available, one hour later.

Magic Quantum Mechanic
Message 5708 - Posted: 4 Dec 2018, 1:10:54 UTC


maeax
Message 5709 - Posted: 4 Dec 2018, 5:26:21 UTC - in response to Message 5708.  

Yes, we have to wait again.
A task is being downloaded right now. In twenty minutes we will know... good or bad!
Magic, your link gives "access denied", sorry.

Magic Quantum Mechanic
Message 5710 - Posted: 4 Dec 2018, 6:58:08 UTC - in response to Message 5709.  

Yeah, I always forget that we can't just post a link to our own tasks, which would make it easy to see all the Invalids without checking each computer one at a time.

https://lhcathomedev.cern.ch/lhcathome-dev/hosts_user.php?userid=192

I just turned off all of my computers so they aren't just wasting electricity and ISP data usage (I have 18433.0 MB of high-speed data left until the 13th).

On this one I decided to d/l 2 Atlas tasks, but it is basically a waste of time since I have to d/l over 1 GB just to get those 2 tasks, which will only take a few hours and earn very little credit, so I will leave that to all the other Atlas users.

I don't want to waste the high-speed internet just to wait 4 hours to d/l the vdi and one hour for each task.

I could fire up all the GPU cards and run Einsteins, but I'd rather not double my light bill this time of year.

Magic Quantum Mechanic
Message 5711 - Posted: 9 Dec 2018, 21:39:28 UTC


maeax
Message 5712 - Posted: 10 Dec 2018, 4:30:42 UTC
Last modified: 10 Dec 2018, 4:32:09 UTC

I also saw this error at gpu_grid:
They had a discussion about this BOINC parameter for Condor:
<rsc_disk_bound>
Maybe Laurence can help us with this.

Crystal Pellet
Volunteer tester
Message 5715 - Posted: 10 Dec 2018, 18:31:59 UTC - in response to Message 5711.  

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2742586

Gee thanks

Is this only happening on one certain host?

Is this happening only when running 2 or more jobs (cores) in the Theory VM?

Did you get an error message in BOINC's event log?

Max allowed is 7629.39453125MB, your result shows: - "Peak disk usage 18,331.37 MB"

All my Theory tasks occupy between 700 MB and 1000 MB in the working slot directory.
Some users have reported similar errors with tasks crunched on the production server.
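
For reference, the odd-looking limit of 7629.39453125 MB works out to exactly 8,000,000,000 bytes when 1 MB is taken as 1024*1024 bytes. That byte value is my own arithmetic, not something read from the project configuration, so treat it as an assumption. A quick Python check:

```python
MIB = 1024 * 1024  # bytes per MB as BOINC appears to print it (assumption)

assumed_bound_bytes = 8_000_000_000  # assumed disk bound, not read from config
limit_mb = assumed_bound_bytes / MIB
peak_mb = 18331.37  # "Peak disk usage" from the result page

print(limit_mb)            # 7629.39453125
print(peak_mb / limit_mb)  # roughly 2.4 times the allowed size
```

So the task did not just creep over the limit; it used well over twice the allowed disk space.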

Magic Quantum Mechanic
Message 5718 - Posted: 11 Dec 2018, 6:22:33 UTC
Last modified: 11 Dec 2018, 6:33:51 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2742747

Just got another one on the same host and always 2-core tasks on this one.

It has Valids before and after these and it also does them over at LHC ( 8-core running four 2-core tasks always) and those are all Valids (except that no jobs problem)

I run a 2-core task here on another 8-core occasionally and 2 single core tasks on a quad-core that also is running a 2-core for LHC

I do tend to look at the logs and they were running normally, but after they finished and sent in these Invalids there is no log left to look at in the VB Manager.

The BOINC log for this one shows:

12/10/2018 9:48:25 PM | lhcathome-dev | Aborting task Theory_863944_1544362736.575293_0: exceeded disk limit: 13680.38MB > 7629.39MB
12/10/2018 9:49:27 PM | lhcathome-dev | Computation for task Theory_863944_1544362736.575293_0 finished
12/10/2018 9:49:56 PM | lhcathome-dev | Sending scheduler request: To report completed tasks.
12/10/2018 9:49:56 PM | lhcathome-dev | Reporting 1 completed tasks
12/10/2018 9:49:56 PM | lhcathome-dev | Not requesting tasks: "no new tasks" requested via Manager
12/10/2018 9:54:01 PM | lhcathome-dev | Scheduler request failed: HTTP gateway timeout

And the BOINC preferences are always set to use 100% of memory and CPU, with no disk limit.

Crystal Pellet
Volunteer tester
Message 5719 - Posted: 11 Dec 2018, 10:00:56 UTC - in response to Message 5718.  

At least BOINC detects the excess disk usage in the slot directory.
It would be useful to know which file is suddenly so big (GBs more than allowed).

I have not seen it on my system; otherwise I would write a batch file to monitor the files and their sizes in the slots used by Theory.
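
A rough idea of what such a monitor could look like, written in Python rather than as a batch file. This is only a sketch: the slot path is a placeholder that must be pointed at the real BOINC slot directory, and `largest_files` is a hypothetical helper name.

```python
import os


def largest_files(slot_dir, top=10):
    """Return (size_in_bytes, path) for the biggest files under slot_dir."""
    sizes = []
    for root, _dirs, files in os.walk(slot_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # The file vanished or is locked by the VM; skip it.
                continue
    return sorted(sizes, reverse=True)[:top]


if __name__ == "__main__":
    # Placeholder path: replace with the slot directory of the Theory task.
    slot = os.path.join(".", "slots", "0")
    for size, path in largest_files(slot):
        print(f"{size / (1024 * 1024):10.2f} MB  {path}")
```

Running it every few minutes while a task is active (for example from a scheduled task) should show which file grows past the limit.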

maeax
Message 5723 - Posted: 12 Dec 2018, 7:40:01 UTC
Last modified: 12 Dec 2018, 8:02:54 UTC

Edit: Sorry, the second core started half an hour later!
08:46:20 +0100 2018-12-12 [INFO] New Job Starting in slot2
08:46:20 +0100 2018-12-12 [INFO] New Job Starting in slot1
08:46:20 +0100 2018-12-12 [INFO] Condor JobID: 483255.108 in slot2
08:46:20 +0100 2018-12-12 [INFO] Condor JobID: 483255.96 in slot1
08:46:26 +0100 2018-12-12 [INFO] MCPlots JobID: 47688538 in slot2
08:46:26 +0100 2018-12-12 [INFO] MCPlots JobID: 47688491 in slot1

Task with 3 Cores running:
https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1867840

12/12/18 08:26:02 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
slot type 0: Cpus: 1.000000, Memory: 500, Swap: 33.33%, Disk: 33.33%
slot type 0: Cpus: 1.000000, Memory: 500, Swap: 33.33%, Disk: 33.33%
slot type 0: Cpus: 1.000000, Memory: 500, Swap: 33.33%, Disk: 33.33%
12/12/18 08:26:02 slot1: New machine resource allocated
12/12/18 08:26:02 Setting up slot pairings
12/12/18 08:26:02 slot2: New machine resource allocated
12/12/18 08:26:02 Setting up slot pairings
12/12/18 08:26:02 slot3: New machine resource allocated
12/12/18 08:26:02 Setting up slot pairings

Output of the job wrapper may appear here.
08:26:50 +0100 2018-12-12 [INFO] New Job Starting in slot3
08:26:50 +0100 2018-12-12 [INFO] New Job Starting in slot1
08:26:50 +0100 2018-12-12 [INFO] Condor JobID: 483254.75 in slot3
08:26:50 +0100 2018-12-12 [INFO] Condor JobID: 483254.65 in slot1
08:26:55 +0100 2018-12-12 [INFO] MCPlots JobID: 47688377 in slot3
08:26:55 +0100 2018-12-12 [INFO] MCPlots JobID: 47688432 in slot1


Where is Core TWO?

Crystal Pellet
Volunteer tester
Message 5724 - Posted: 12 Dec 2018, 9:14:07 UTC - in response to Message 5723.  

Where is Core TWO?
At the start, jobs are started in pairs at 20-minute intervals,
but the slot numbers used are mostly not in sequence.
Fast-finishing jobs disturb the sequence and the interval too.
2018-11-30 15:33:03 (3164): Guest Log: [INFO] New Job Starting in slot4
2018-11-30 15:33:03 (3164): Guest Log: [INFO] New Job Starting in slot1
2018-11-30 15:53:12 (3164): Guest Log: [INFO] New Job Starting in slot2
2018-11-30 15:53:12 (3164): Guest Log: [INFO] New Job Starting in slot3
2018-11-30 16:14:21 (3164): Guest Log: [INFO] New Job Starting in slot5
2018-11-30 16:14:22 (3164): Guest Log: [INFO] New Job Starting in slot6
2018-11-30 16:35:08 (3164): Guest Log: [INFO] New Job Starting in slot8
2018-11-30 16:35:09 (3164): Guest Log: [INFO] New Job Starting in slot7
2018-11-30 16:56:16 (3164): Guest Log: [INFO] New Job Starting in slot9
2018-11-30 18:02:48 (3164): Guest Log: [INFO] New Job Starting in slot2
2018-11-30 18:43:35 (3164): Guest Log: [INFO] New Job Starting in slot4
2018-11-30 19:11:20 (3164): Guest Log: [INFO] New Job Starting in slot8
2018-11-30 19:12:26 (3164): Guest Log: [INFO] New Job Starting in slot2
2018-11-30 19:32:21 (3164): Guest Log: [INFO] New Job Starting in slot5
2018-11-30 19:36:07 (3164): Guest Log: [INFO] New Job Starting in slot3

This was just a test. If you have the RAM, run only single-core Theory VMs.
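
For anyone who wants to check the start pattern in their own log, here is a small Python sketch that pulls the "New Job Starting" lines out of a stderr log and prints the slot order and the gaps between starts. The sample lines are the first four from the log above; the function names and the regular expression are my own and assume the timestamp format shown there.

```python
import re
from datetime import datetime

# Matches e.g. "2018-11-30 15:33:03 (3164): Guest Log: [INFO] New Job Starting in slot4"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*New Job Starting in slot(\d+)"
)


def job_starts(lines):
    """Return (timestamp, slot) for every 'New Job Starting' line."""
    starts = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            starts.append((when, int(m.group(2))))
    return starts


def seconds_between(starts):
    """Whole seconds between consecutive job starts."""
    return [
        int((b[0] - a[0]).total_seconds())
        for a, b in zip(starts, starts[1:])
    ]


log = [
    "2018-11-30 15:33:03 (3164): Guest Log: [INFO] New Job Starting in slot4",
    "2018-11-30 15:33:03 (3164): Guest Log: [INFO] New Job Starting in slot1",
    "2018-11-30 15:53:12 (3164): Guest Log: [INFO] New Job Starting in slot2",
    "2018-11-30 15:53:12 (3164): Guest Log: [INFO] New Job Starting in slot3",
]
starts = job_starts(log)
print([slot for _when, slot in starts])  # [4, 1, 2, 3]
print(seconds_between(starts))           # [0, 1209, 0]
```

The 1209 seconds between the first and second pair is the roughly 20-minute interval, and the slot order 4, 1, 2, 3 shows that the numbering is not sequential.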