The Sixtrack Application
Author | Message |
---|---|
Joined: 18 Aug 15 Posts: 14 Credit: 125,335 RAC: 0 |
More error messages that might help with resolving the problem: Wed 07 Nov 2018 10:47:48 AM CST | lhcathome-dev | [error] garbage_collect(); still have active task for acked result Sixtrack_1538966_1540999458.954204_1579_1; state 9. Just in case you need it, the computer ID is 3601. Please let me know if you need more info. |
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420 |
Half an hour ago I started with one task and one CPU on computer ID 3596. It's OK now: BOINC says no new tasks are needed, and it finished the running task successfully. I'll see how it works overnight. Edit: I started a second computer, ID 1164. It also showed the garbage_collect error that Captain Jack reported, for one task. We'll have to see what happens overnight. |
Joined: 13 Feb 15 Posts: 1188 Credit: 857,561 RAC: 33 |
I've seen those [error] garbage_collect() messages several times. Those tasks were already running when the server aborted them, and they get the Aborted state in BOINC Manager. However, you can still look at the task properties, and they show that there is still a process ID, even though aborted tasks shouldn't have one. After the next scheduler update, the status of such an aborted task changes to Computation error. Worse, the OS does not terminate that BOINC process, so its memory (330 MB) is not freed. Meanwhile I've found a workaround that avoids server aborts: I set the server-side task limit to 8 on my 14-core machine. The other threads run WCG, and it's important to turn 'Leave Applications In Memory' (LAIM) on. Set your local work buffer to 7 days. Every new SixTrack task then starts directly in high priority, possibly kicking out a WCG task (which is why LAIM must be on). |
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420 |
Thanks Crystal, 'leave in memory' is always on by default for me! I'll look at the memory left behind by the garbage error tomorrow. Let's crunch. I have a third computer, ID 3600, running. Edit: the garbage error appears after 57 seconds, when BOINC says there are no new tasks because SixTrack has reached a task limit. |
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420 |
Computer ID 3596 got only "process got signal 11" errors, for hundreds of tasks in the past 11 hours. |
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420 |
The computer needs at least 2 GB of RAM; then swap usage stays at zero. In top I saw no problems with old BOINC tasks after the garbage error on computer ID 1164. |
Joined: 13 Feb 15 Posts: 1188 Credit: 857,561 RAC: 33 |
If it's not a faulty batch of tasks (and it doesn't look like one), signal 11 usually means there is something wrong with your system. It's often a memory or virtual memory issue. Maybe a reboot will solve it (temporarily :( ). Although my workaround described before works rather well, I still have 4 SixTrack processes orphaned during the last 8¾ hours, locking up 1300 MB of RAM. |
Joined: 28 Jul 16 Posts: 481 Credit: 394,720 RAC: 0 |
Same situation as a few days ago. I started with a cleaned test environment and a restarted BOINC client. With the first request I got 11 SixTrack tasks. The next request sent another 11 tasks and cancelled the first ones except the currently running one. I set the client to NNT (no new tasks), but the following request cancelled all remaining tasks except the second running one. Final result: 2 tasks finished successfully; 20 tasks were cancelled by the project; no garbage remained in RAM or in a slot. |
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420 |
I now have a fourth computer active, computer ID 3607. They all run different Linux versions: openSUSE (13.2, 42.3, 15.0) and SL69. Edit: the SL69 task is waiting without a garbage error? Only openSUSE gets it?? |
Joined: 20 Jun 17 Posts: 25 Credit: 4,777,813 RAC: 5,496 |
Looks like admins are still giving us the middle finger :( |
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420 |
Cancelled tasks take up more room than normal. That's to be expected in development. We need to be patient and stand by. I will reboot my Linux machines every day: they unexpectedly stop doing work and need watching from time to time. After a reboot, the work goes on. |
Joined: 13 Feb 15 Posts: 1188 Credit: 857,561 RAC: 33 |
A reboot is no longer needed for me: with my workaround I get only a few server-cancelled tasks a day. See: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=1778&offset=0&show_names=0&state=6&appid=7 The few tasks stuck in limbo I kill with kill -9 'PID', which frees the memory. |
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420 |
Thanks CP, I'll set the work buffer to seven days. BTW, my successful tasks outnumber my cancelled ones ;-)) |
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420 |
I now have a fifth computer, ID 3609, running openSUSE Tumbleweed. |
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420 |
-dev has been running again since 16:30 UTC yesterday. Thank you for your weekend work! I don't know what the fix should be to get SixTrack running well. After a day or so, it always runs into errors after download, and a reboot gets it running successfully again. OK, Crystal has a workaround for that. We'll see how we as volunteers can help eliminate this problem. Edit: This computer has had hundreds of tasks with errors since 22:00 UTC yesterday: https://lhcathomedev.cern.ch/lhcathome-dev/results.php?hostid=3596 Is it possible to stop downloads after more than 20 unsuccessful tasks? |
Joined: 13 Feb 15 Posts: 1188 Credit: 857,561 RAC: 33 |
"This Computer had hundreds of tasks with error since 22:00 UTC yesterday": I didn't look at all the errors, but what I've seen so far are signal 11 errors, and since not all tasks on all clients get that error, it looks to me like a local problem on your 1-core AMD10. At least those are not the server cancellations we suffer from. Have you set the preference limit to 1 for that machine? "Is it possible to stop download after more than 20 unsuccessful tasks?": The server reduces the maximum number of tasks per day with every error. For Sixtrack Simulation 467.03 x86_64-pc-linux-gnu (avx) it is now 501, and for Sixtrack Simulation 467.03 x86_64-pc-linux-gnu (sse2) it is 419, but the limit is also incremented when you return a valid task. |
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420 |
"Have you set the preference limit to 1 for that machine?": Crystal, I have one Linux machine with 2 CPUs and 2 tasks in the "home" location, and the other four have 1 CPU with 1 task each in the "work" location. The machine with this many error tasks is in the "work" location! 500 possible error tasks is quite a lot. |
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420 |
It's important to stop downloads of SixTrack tasks because of the many server-killed and client-killed tasks (killed when the 57 seconds are over and BOINC is looking for a new task while a task is still running). The daily task limit shown in the server stats drops only slowly (only some hundreds per day). It would work better if the task errors were fixed. One core can do about 200 tasks a day (7-8 minutes per task). None of the unfinished tasks free their memory (including swap)! |
Joined: 12 Sep 14 Posts: 1067 Credit: 334,882 RAC: 0 |
The Sixtrack application has been disabled for now. The scale tests were successful but the stale jobs need to be cleared. |
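
Crystal's manual cleanup of tasks stuck in limbo comes down to one comparison: a SixTrack process that is still running although the BOINC client no longer lists its task is an orphan, and `kill -9` is what frees its memory. Below is a minimal Python sketch of that check. It assumes you have already collected the running SixTrack processes together with their task names (for example from `ps` and the slot directories) and the client's active task names (for example from `boinccmd --get_tasks`); gathering those lists is left out, and the function names and example data are hypothetical. By default it only prints the commands.

```python
import os
import signal


def find_orphans(running, active_tasks):
    """Return PIDs of SixTrack processes whose task the client no longer tracks.

    running: hypothetical dict {pid: task_name} of SixTrack processes on the host
    active_tasks: hypothetical set of task names the BOINC client still lists
    """
    return sorted(pid for pid, task in running.items() if task not in active_tasks)


def kill_orphans(pids, dry_run=True):
    """Send SIGKILL (the `kill -9 PID` from the thread) to each orphaned PID."""
    for pid in pids:
        if dry_run:
            print(f"would run: kill -9 {pid}")
        else:
            os.kill(pid, signal.SIGKILL)  # Unix only; frees the stuck process's memory


if __name__ == "__main__":
    running = {4101: "Sixtrack_A_0", 4102: "Sixtrack_B_0"}  # made-up example data
    active = {"Sixtrack_B_0"}
    kill_orphans(find_orphans(running, active))  # prints: would run: kill -9 4101
```

Keeping `dry_run=True` as the default means the sketch never kills anything until you have checked that the PID list really matches the processes you saw in top.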
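
Crystal's answer about the daily limit describes a feedback rule: every error lowers the host's maximum tasks per day, and every valid result raises it again. The Python sketch below models that rule under stated assumptions: a step of exactly 1 in each direction, a floor of 1 and an arbitrary ceiling. BOINC's real adjustment may use different steps, so treat this only as an illustration of why a host starting at 501 can go through hundreds of failing tasks before the limit bites. The function names and the ceiling value are hypothetical.

```python
def update_quota(quota, outcome, ceiling):
    """Simplified model of the per-host daily task quota (not BOINC's actual code).

    Assumption: each error lowers the quota by 1 (never below 1) and each
    valid result raises it by 1 (never above `ceiling`).
    """
    if outcome == "error":
        return max(1, quota - 1)
    if outcome == "valid":
        return min(ceiling, quota + 1)
    return quota  # other outcomes leave the quota unchanged in this model


def errors_until(quota, target):
    """Consecutive errors needed, under this model, to push `quota` down to `target`."""
    return max(0, quota - max(1, target))


if __name__ == "__main__":
    q = 501  # the avx figure quoted in the thread
    for outcome in ["error"] * 10 + ["valid"] * 3:
        q = update_quota(q, outcome, ceiling=1000)  # ceiling is a made-up value
    print(q)  # 494
    print(errors_until(501, 20))  # 481
```

Under this model a quota of 20 is reached only after 481 consecutive errors, which is why the limit does little to stop a host that fails every task, matching the remark that 500 possible error tasks is quite a lot.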
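
The throughput and memory figures quoted in the thread can be checked with a little arithmetic. The Python sketch below assumes a core runs SixTrack tasks back to back with no gaps or overhead, and reuses the roughly 330 MB per stuck process mentioned earlier in the thread; both are simplifications, and the function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def tasks_per_day(minutes_per_task, cores=1):
    """Upper bound: tasks run back to back with no idle time or overhead (assumption)."""
    return cores * MINUTES_PER_DAY / minutes_per_task


def leaked_mb(orphans, mb_per_task=330):
    """RAM held by orphaned processes, using the ~330 MB per task reported in the thread."""
    return orphans * mb_per_task


if __name__ == "__main__":
    for m in (7, 7.5, 8):
        print(f"{m} min/task -> {tasks_per_day(m):.0f} tasks/day per core")
    print(leaked_mb(4), "MB")  # 1320 MB
```

At 7-8 minutes per task a core tops out at roughly 180-206 tasks a day, consistent with the estimate of about 200, and four orphaned processes at 330 MB each account for the roughly 1300 MB reported as locked up.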
©2024 CERN