Message boards : Sixtrack Application : The Sixtrack Application

captainjack
Joined: 18 Aug 15
Posts: 14
Credit: 117,668
RAC: 1,115
Message 5615 - Posted: 7 Nov 2018, 16:54:28 UTC

More error messages that might help with problem resolution:

Wed 07 Nov 2018 10:47:48 AM CST | lhcathome-dev | [error] garbage_collect(); still have active task for acked result Sixtrack_1538966_1540999458.954204_1579_1; state 9
Wed 07 Nov 2018 10:47:49 AM CST | lhcathome-dev | [error] garbage_collect(); still have active task for acked result Sixtrack_1538966_1540999458.954204_1579_1; state 5
Wed 07 Nov 2018 10:47:49 AM CST | lhcathome-dev | Output file Sixtrack_1538966_1540999458.954204_1579_1_r674300233_0 for task Sixtrack_1538966_1540999458.954204_1579_1 absent


Just in case you need it, computer id = 3601.

Please let me know if you need more info.
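The garbage_collect() lines quoted above can be filtered out of a client event log with a small script. This is only a minimal Python sketch, assuming the log format matches the three lines shown in this post; the function name `find_gc_errors` and the `SAMPLE_LOG` text are illustrative, not part of BOINC.

```python
import re

# Matches the client's "[error] garbage_collect()" lines as quoted in this post.
# Assumption: other client versions may word the message differently.
GC_ERROR = re.compile(
    r"\[error\] garbage_collect\(\); still have active task "
    r"for acked result (?P<task>\S+); state (?P<state>\d+)"
)


def find_gc_errors(log_text):
    """Return (task_name, state) pairs for every garbage_collect error line."""
    hits = []
    for line in log_text.splitlines():
        match = GC_ERROR.search(line)
        if match:
            hits.append((match.group("task"), int(match.group("state"))))
    return hits


# The three log lines from this post, used as sample input.
SAMPLE_LOG = """\
Wed 07 Nov 2018 10:47:48 AM CST | lhcathome-dev | [error] garbage_collect(); still have active task for acked result Sixtrack_1538966_1540999458.954204_1579_1; state 9
Wed 07 Nov 2018 10:47:49 AM CST | lhcathome-dev | [error] garbage_collect(); still have active task for acked result Sixtrack_1538966_1540999458.954204_1579_1; state 5
Wed 07 Nov 2018 10:47:49 AM CST | lhcathome-dev | Output file Sixtrack_1538966_1540999458.954204_1579_1_r674300233_0 for task Sixtrack_1538966_1540999458.954204_1579_1 absent
"""

for task, state in find_gc_errors(SAMPLE_LOG):
    print(task, state)
```

The third line ("Output file ... absent") is deliberately not matched; only the two garbage_collect errors are reported.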
ID: 5615 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 660
Credit: 1,719,912
RAC: 3,195
Message 5616 - Posted: 7 Nov 2018, 17:20:59 UTC - in response to Message 5615.  
Last modified: 7 Nov 2018, 17:40:54 UTC

For the past half hour I have been running one task on one CPU, computer ID = 3596.
It is OK now: BOINC says no new tasks are needed and finishes the running task successfully.
Will see how it works overnight.
Edit:
Started a second computer, ID = 1164.
Captain Jack,
I also saw the garbage_collect error for one task.
We have to see what happens overnight.
ID: 5616 · Rating: 0
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 5617 - Posted: 7 Nov 2018, 17:42:10 UTC - in response to Message 5615.  

I've seen those
[error] garbage_collect()
messages several times.
Those tasks were already running when the server aborted them, and they get the Aborted state in BOINC Manager,
but you can still look at the task properties, which show that there is still a process ID.
Well, aborted tasks normally don't have a process ID.
After the next scheduler update, the status of such an aborted task changes to Computation error.
Worse, the OS does not terminate that BOINC process, so its memory (330 MB) is not freed.

Meanwhile I've found a workaround that avoids the server abortions.
I set the server-side task limit to 8 on my 14-core machine. The other threads are running WCG and, importantly, 'Leave Applications In Memory' (LAIM) is set on.
Set your local work buffer to 7 days. Every new SixTrack task then starts directly in High Priority, possibly kicking out a WCG task (hence LAIM on).
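The two client-side parts of this workaround (7-day work buffer, LAIM on) can also be written to a local preferences override instead of being set in BOINC Manager. A minimal Python sketch, assuming the standard `global_prefs_override.xml` elements `work_buf_min_days` and `leave_apps_in_memory`; the helper name `build_prefs_override` is made up for illustration, and the server-side task limit is a project web preference that this does not touch.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(buffer_days=7.0, leave_in_memory=True):
    """Build the text of a global_prefs_override.xml for the workaround.

    Assumption: only the two settings named in the post are overridden;
    everything else keeps its existing value.
    """
    root = ET.Element("global_preferences")
    # Local work buffer ("store at least N days of work").
    ET.SubElement(root, "work_buf_min_days").text = f"{buffer_days:g}"
    # Leave Applications In Memory (LAIM) while suspended or preempted.
    ET.SubElement(root, "leave_apps_in_memory").text = "1" if leave_in_memory else "0"
    return ET.tostring(root, encoding="unicode")


xml_text = build_prefs_override()
print(xml_text)
```

The printed text would go into `global_prefs_override.xml` in the BOINC data directory; the client should pick it up after `boinccmd --read_global_prefs_override` or a restart.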
ID: 5617 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 660
Credit: 1,719,912
RAC: 3,195
Message 5618 - Posted: 7 Nov 2018, 17:59:20 UTC
Last modified: 7 Nov 2018, 18:53:32 UTC

Thanks Crystal,
leave in memory is always my default!
I'll look at the memory issue from the garbage_collect error tomorrow.
Let us crunch.
I have a third computer, ID = 3600, running.
Edit:
The garbage_collect error comes after 57 seconds, when BOINC says
there are no new tasks because SixTrack has reached a limit of tasks.
ID: 5618 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 660
Credit: 1,719,912
RAC: 3,195
Message 5619 - Posted: 8 Nov 2018, 6:23:41 UTC

Computer ID = 3596 got only:
process got signal 11
for hundreds of tasks in the past 11 hours.
ID: 5619 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 660
Credit: 1,719,912
RAC: 3,195
Message 5620 - Posted: 8 Nov 2018, 7:46:40 UTC - in response to Message 5619.  

The computer needs at least 2 GB of RAM; then swap usage is zero.
In top I saw no problems with old BOINC tasks after the garbage_collect error.
Computer ID 1164.
ID: 5620 · Rating: 0
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 5621 - Posted: 8 Nov 2018, 7:51:45 UTC

Unless it's a faulty batch of tasks (it doesn't look like it), signal 11 mostly means that something is wrong with your system.
It's often a memory or virtual memory issue. Maybe a reboot will solve it (temporarily :( ).
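For reference, signal 11 on Linux is SIGSEGV, i.e. a segmentation fault (an invalid memory access by the process). A tiny Python check, assuming it is run on Linux where the signal numbering applies:

```python
import signal

# Look up the symbolic name of signal number 11 on this platform.
sig = signal.Signals(11)
print(sig.name)
```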

Although my workaround described earlier works rather well, 4 sixtrack processes were still orphaned during the last 8¾ hours, locking up 1300 MB of RAM.
ID: 5621 · Rating: 0
computezrmle
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Help desk expert
Joined: 28 Jul 16
Posts: 467
Credit: 389,411
RAC: 503
Message 5622 - Posted: 8 Nov 2018, 9:05:11 UTC

Same situation as a few days ago.

I started with a cleaned test environment and a restarted BOINC client.
With the first request I got 11 SixTrack tasks.
The next request sent another 11 tasks and cancelled the first ones except the currently running one.

I set the client to No New Tasks (NNT), but the following request cancelled all remaining tasks except the second running one.

Final result:
- 2 tasks finished successfully
- 20 tasks were cancelled by the project
- no garbage remained in RAM or in a slot
ID: 5622 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 660
Credit: 1,719,912
RAC: 3,195
Message 5623 - Posted: 8 Nov 2018, 10:34:43 UTC
Last modified: 8 Nov 2018, 11:11:42 UTC

I now have a fourth computer active, computer ID 3607.
They all run different Linux distributions: openSUSE (13.2, 42.3, 15.0) and SL69.
Edit:
The SL69 task is waiting without a garbage_collect error?
Does only openSUSE get it??
ID: 5623 · Rating: 0
mmonnin
Joined: 20 Jun 17
Posts: 25
Credit: 2,940,586
RAC: 2,287
Message 5625 - Posted: 10 Nov 2018, 1:32:09 UTC

Looks like admins are still giving us the middle finger :(
ID: 5625 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 660
Credit: 1,719,912
RAC: 3,195
Message 5626 - Posted: 10 Nov 2018, 8:02:24 UTC

Cancelled tasks take up a larger share than normal.
But that is what development is for.
We need to be patient and stand by.
I will reboot my Linux machines every day. They stop doing work unexpectedly and need watching for some time.
After a reboot the work goes on.
ID: 5626 · Rating: 0
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 5627 - Posted: 10 Nov 2018, 8:44:11 UTC

A reboot is no longer needed for me, since with my workaround I get only a few server-cancelled tasks a day.

See: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=1778&offset=0&show_names=0&state=6&appid=7

The few in limbo I kill with: kill -9 'PID', which unlocks the memory.
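The manual `kill -9` step could be scripted roughly as follows. This is only a sketch under stated assumptions: it is Linux-only (it reads `/proc`), it assumes the orphaned executables have "sixtrack" in their process name, and it expects the caller to supply the PIDs of tasks BOINC still considers active (`active_pids` is a hypothetical input; how you obtain it is up to you). It defaults to a dry run so nothing is killed by accident.

```python
import os
import signal


def list_processes(proc_root="/proc"):
    """Return (pid, name) pairs read from a Linux /proc tree."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                found.append((int(entry), handle.read().strip()))
        except OSError:
            continue  # the process exited while we were scanning
    return found


def find_orphans(processes, active_pids, name_hint="sixtrack"):
    """Pick processes whose name contains name_hint but that are not active tasks."""
    return [
        pid
        for pid, name in processes
        if name_hint in name.lower() and pid not in active_pids
    ]


def kill_orphans(pids, dry_run=True):
    """Send SIGKILL (the signal behind `kill -9`) to each PID unless dry_run is set."""
    for pid in pids:
        if dry_run:
            print(f"would send SIGKILL to {pid}")
        else:
            os.kill(pid, signal.SIGKILL)


# Example with made-up PIDs: 103 is still an active task, 101 is in limbo.
example = [(101, "sixtrack_lin64"), (102, "boinc"), (103, "SixTrack_avx")]
kill_orphans(find_orphans(example, active_pids={103}))
```

On a real host one would pass `list_processes()` instead of the example list and only switch off `dry_run` after checking the printed PIDs.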
ID: 5627 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 660
Credit: 1,719,912
RAC: 3,195
Message 5628 - Posted: 10 Nov 2018, 8:56:30 UTC

Thanks CP,
will set work buffer to seven days.
BTW, I have more successful tasks than cancelled ones ;-))
ID: 5628 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 660
Credit: 1,719,912
RAC: 3,195
Message 5629 - Posted: 10 Nov 2018, 10:22:36 UTC

I now have a fifth computer, computer ID = 3609.
openSUSE Tumbleweed.
ID: 5629 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 660
Credit: 1,719,912
RAC: 3,195
Message 5631 - Posted: 12 Nov 2018, 7:57:07 UTC
Last modified: 12 Nov 2018, 8:18:19 UTC

-dev has been running again since 16:30 UTC yesterday. Thank you for your weekend work!

I don't know what the fix should be to get SixTrack running well.
After a day or so, every task fails with an error right after download.
A reboot gets it running successfully again.
OK, Crystal has a solution for that.
We will see what we as volunteers can do to help eliminate this problem.
Edit:
This computer has had hundreds of tasks with errors since 22:00 UTC yesterday:
https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=3596

Is it possible to stop downloads after more than 20 unsuccessful tasks?
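On the client side, something like this could be approximated by counting consecutive failures and then switching the project to "no new tasks". A minimal Python sketch, assuming the caller already has a list of recent task outcomes (how they are collected is left open) and assuming the project URL below, which is only inferred from the results links in this thread; `nomorework` is the boinccmd project operation for stopping work fetch, and the function names are illustrative.

```python
def consecutive_failures(outcomes):
    """Count failed tasks at the end of the list (True = success, most recent last)."""
    count = 0
    for succeeded in reversed(outcomes):
        if succeeded:
            break
        count += 1
    return count


def should_stop_fetching(outcomes, threshold=20):
    """True once more than `threshold` tasks in a row have failed."""
    return consecutive_failures(outcomes) > threshold


def nomorework_command(project_url):
    """Build (but do not run) the boinccmd call that stops work fetch for a project."""
    return ["boinccmd", "--project", project_url, "nomorework"]


# Assumed project URL, taken from the results links quoted in this thread.
PROJECT_URL = "https://lhcathomedev.cern.ch/lhcathome-dev/"

recent = [True, True] + [False] * 21  # two good tasks, then 21 failures in a row
if should_stop_fetching(recent):
    print(" ".join(nomorework_command(PROJECT_URL)))
```

The command list could be handed to `subprocess.run` on a host where boinccmd is installed; `allowmorework` would re-enable fetching once the machine is healthy again.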
ID: 5631 · Rating: 0
Crystal Pellet
Volunteer tester
Joined: 13 Feb 15
Posts: 1178
Credit: 810,985
RAC: 2,009
Message 5632 - Posted: 12 Nov 2018, 9:04:26 UTC - in response to Message 5631.  

This computer has had hundreds of tasks with errors since 22:00 UTC yesterday:
https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=3596

I did not view all the errors, but what I've seen so far are 'Signal 11' errors, and since not all tasks on all clients get that error,
it seems to me to be a local problem on your 1-core AMD10. At least those are not the server cancellations we suffer from.
Have you set the preference limit to 1 for that machine?

Is it possible to stop downloads after more than 20 unsuccessful tasks?

The server reduces the maximum number of tasks per day with every error:
Sixtrack Simulation 467.03 x86_64-pc-linux-gnu (avx) is now at 501 and
Sixtrack Simulation 467.03 x86_64-pc-linux-gnu (sse2) at 419,

but it is also incremented when you return a valid task.
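The quota behaviour described here can be pictured with a toy model. This is a simplified Python sketch and not the server's actual code: the step size of 1 in both directions, the floor of 1 and the `project_max` cap are all assumptions made for illustration.

```python
def update_daily_quota(quota, result_valid, project_max, floor=1):
    """Toy model: raise the per-day task quota on a valid result, lower it on an error.

    Assumptions (not taken from the server code): step size 1 both ways,
    never below `floor`, never above `project_max`.
    """
    if result_valid:
        return min(project_max, quota + 1)
    return max(floor, quota - 1)


# Start from the avx figure quoted above and apply 20 errors in a row.
quota = 501
for _ in range(20):
    quota = update_daily_quota(quota, result_valid=False, project_max=1000)
print(quota)  # 481
```

Under these assumptions a host that errors out on every task loses quota only one task per error, which is why hundreds of failing tasks can still be sent in a day.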
ID: 5632 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 660
Credit: 1,719,912
RAC: 3,195
Message 5634 - Posted: 12 Nov 2018, 14:16:48 UTC - in response to Message 5632.  
Last modified: 12 Nov 2018, 14:18:10 UTC

Have you set the preference limit to 1 for that machine?


Crystal,
I have one Linux machine with 2 CPUs and 2 tasks in location home, and
the other four have 1 CPU with 1 task in location work.
The Linux machine with this many error tasks has location work!
500 possible error tasks is quite a lot.
ID: 5634 · Rating: 0
maeax
Joined: 22 Apr 16
Posts: 660
Credit: 1,719,912
RAC: 3,195
Message 5636 - Posted: 13 Nov 2018, 11:14:22 UTC

It's important to stop the download of SixTrack tasks,
because there are so many server-killed tasks and
client-killed tasks (when the 57 seconds are over and
BOINC looks for a new task although one is still running).

The drop in new tasks shows up so slowly in the server stats (only a few hundred per day).
It would work better if the task errors were fixed.

One core can do about 200 tasks a day (7-8 minutes per task).
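The figure of about 200 tasks a day follows directly from the run time per task. A quick Python check of the arithmetic, assuming the core runs SixTrack around the clock with no idle time between tasks:

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def tasks_per_day(minutes_per_task):
    """Tasks one core can finish in a day at a fixed run time per task."""
    return MINUTES_PER_DAY / minutes_per_task


print(round(tasks_per_day(7)))  # 206
print(round(tasks_per_day(8)))  # 180
```

So 7 to 8 minutes per task gives roughly 180 to 206 tasks per core per day.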

None of the unfinished tasks free their memory (including swap)!
ID: 5636 · Rating: 0
Laurence
Project administrator
Project developer
Project tester
Joined: 12 Sep 14
Posts: 1064
Credit: 325,950
RAC: 278
Message 5641 - Posted: 14 Nov 2018, 15:29:58 UTC - in response to Message 5636.  

The Sixtrack application has been disabled for now. The scale tests were successful but the stale jobs need to be cleared.
ID: 5641 · Rating: 0