The Sixtrack Application
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420
Sorry, Crystal, but is it possible to move messages 5588, 5590 and 5591 to the LHCb thread? This is a SixTrack thread. Thank you. BTW, with one CPU and ONE task, SixTrack runs without an ABORT from the server! Edit: some tasks are aborted as well, with this message in BOINC: [error] garbage_collect() still have active task for acked result Sixtrack....
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420
When a task starts, a new request for another task is sent 57 seconds later. At that moment some tasks were interrupted and failed. Is it possible to eliminate this request? BOINC wrote: "Requesting new tasks for CPU", "Scheduler request completed: got 0 new tasks", "No tasks sent", "No tasks are available for Sixtrack Simulation", "This computer has reached a limit on tasks in progress". I am using one task and ONE CPU in my preferences.
Joined: 13 Feb 15 Posts: 1188 Credit: 857,561 RAC: 33
Sorry, I can't do that. I'm not a moderator here and don't want to be one. Laurence or Nils could do it. But since this project is only for development and testing, 'normal' (non-testing) users should spend their CPU power on the LHC production project. 'We' testers should be prepared for several kinds of failure: broken software, no credit given, or even a crash of the testing client machine.
Joined: 13 Feb 15 Posts: 1188 Credit: 857,561 RAC: 33
Quote: "BTW One CPU and ONE Task are running for sixtrack without ABORT from Server!" Even when only one task is received, that one is also cancelled by the server if it is not yet running when you request more work.
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420
My understanding for ONE task is: BOINC loads a new task AFTER uploading the finished one, and that is what I see in BOINC at the moment. So why is there a new scheduler request for a task after ONE minute of work? Does it come from the server? If many computers start their tasks at the same time and all send this request simultaneously, then good night ;-)) That is a lot of traffic for a poor server.
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420
When the network adapter is disconnected after the download, the client does not request new work from the server. SixTrack downloaded 4 tasks (my preferences allow 7). So this is an infrastructure problem, not a BOINC scheduling error. I will test it over the next few hours. Computer ID 1164.
Joined: 13 Feb 15 Posts: 1188 Credit: 857,561 RAC: 33
Another major misbehavior of this SixTrack version: after running for a while, with running tasks being aborted by the server, free memory and swap shrink rapidly until only 1 or 2 tasks are actually running on a 14-core system. The aborted tasks disappear from BOINC, of course, but their memory is not freed. Here are the boinc processes and their allocated memory at a moment when only 1 task is busy and the other 13 cores are idle:

    top - 13:01:42 up 13:44, 1 user, load average: 1.48, 5.39, 8.90
    Tasks: 168 total, 3 running, 165 sleeping, 0 stopped, 0 zombie
    Cpu(s): 0.0%us, 0.0%sy, 7.2%ni, 92.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 9725296k total, 5097936k used, 4627360k free, 7332k buffers
    Swap: 2097148k total, 1907536k used, 189612k free, 24880k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    6427 boinc 39 19 446m 330m 1996 R 100 3.5 6:33.56 sixtrack_lin64_
    1224 boinc 30 10 118m 11m 2584 S 0 0.1 2:21.07 boinc
    5462 boinc 39 19 510m 2152 2140 S 0 0.0 0:14.68 sixtrack_lin64_
    5736 boinc 39 19 510m 80m 2328 S 0 0.9 1:00.64 sixtrack_lin64_
    5738 boinc 39 19 510m 330m 2332 S 0 3.5 1:00.62 sixtrack_lin64_
    5742 boinc 39 19 510m 2456 2332 S 0 0.0 1:00.63 sixtrack_lin64_
    5750 boinc 39 19 510m 330m 2332 S 0 3.5 1:00.50 sixtrack_lin64_
    5814 boinc 39 19 510m 2452 2328 S 0 0.0 0:19.39 sixtrack_lin64_
    5818 boinc 39 19 510m 330m 2328 S 0 3.5 0:24.51 sixtrack_lin64_
    5830 boinc 39 19 510m 2344 2328 S 0 0.0 0:22.45 sixtrack_lin64_
    5834 boinc 39 19 510m 91m 2332 S 0 1.0 0:17.18 sixtrack_lin64_
    5842 boinc 39 19 510m 294m 2328 S 0 3.1 0:23.40 sixtrack_lin64_
    5852 boinc 39 19 510m 330m 2332 S 0 3.5 0:16.35 sixtrack_lin64_
    5864 boinc 39 19 510m 330m 2260 S 0 3.5 0:19.43 sixtrack_lin64_
    5870 boinc 39 19 510m 330m 2332 S 0 3.5 1:00.75 sixtrack_lin64_
    5876 boinc 39 19 510m 330m 2328 S 0 3.5 0:23.50 sixtrack_lin64_
    5881 boinc 39 19 510m 330m 2332 S 0 3.5 0:57.05 sixtrack_lin64_
    5889 boinc 39 19 510m 324m 1764 S 0 3.4 0:13.61 sixtrack_lin64_
    5895 boinc 39 19 510m 330m 2332 S 0 3.5 0:50.08 sixtrack_lin64_
    5903 boinc 39 19 510m 325m 2152 S 0 3.4 0:14.32 sixtrack_lin64_
    5909 boinc 39 19 510m 330m 2340 S 0 3.5 0:54.34 sixtrack_lin64_
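To put a number on the memory that the aborted tasks keep, a rough sketch (Linux only) can read the resident size of every sixtrack process from /proc instead of eyeballing top. The helper name resident_kib and the "sixtrack" name prefix are assumptions for illustration; any PID it lists that has no matching task in BOINC Manager would be a leftover.

```python
from pathlib import Path


def resident_kib(name_prefix: str = "sixtrack", proc_root: str = "/proc") -> dict[int, int]:
    """Map PID -> resident memory (VmRSS, in kB) for processes whose name starts with name_prefix.

    Hypothetical helper. Linux only: reads /proc/<pid>/status.
    Zombie processes have no VmRSS line and are counted as 0.
    """
    usage: dict[int, int] = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            text = (entry / "status").read_text()
        except OSError:
            continue  # the process exited while we were scanning
        fields = dict(line.split(":", 1) for line in text.splitlines() if ":" in line)
        if not fields.get("Name", "").strip().startswith(name_prefix):
            continue
        usage[int(entry.name)] = int(fields.get("VmRSS", "0 kB").split()[0])
    return usage


if __name__ == "__main__" and Path("/proc").is_dir():
    usage = resident_kib()
    for pid, kib in sorted(usage.items()):
        print(f"{pid:>7} {kib / 1024:8.1f} MiB")
    print(f"{len(usage)} processes, {sum(usage.values()) / 1024:.1f} MiB resident in total")
```

The proc_root parameter only exists so the scan can be pointed at a copy of /proc; on a live system the defaults are enough.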
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420
With or without the BOINC option report_results_immediately (0/1): 13 download errors and 35 SixTrack tasks run in three hours. It seems to be a network traffic problem in the infrastructure. BOINC checks the network connection more than 10 times per minute. It is too difficult to find a good answer from the client side. This test was done with openSUSE 13.2, a very stable Linux, and BOINC 7.2.42.
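For anyone repeating this test: report_results_immediately is a client option set in cc_config.xml in the BOINC data directory. A minimal fragment, assuming the file has no other options yet (1 = on, 0 = off):

```xml
<cc_config>
  <options>
    <report_results_immediately>1</report_results_immediately>
  </options>
</cc_config>
```

A running client can be told to re-read the file with boinccmd --read_cc_config instead of being restarted.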
Joined: 28 Jul 16 Posts: 481 Credit: 394,720 RAC: 0
This is more likely a BOINC client issue than a SixTrack issue. Could you check whether the SixTrack processes disappear when you quit your BOINC client?
Joined: 13 Feb 15 Posts: 1188 Credit: 857,561 RAC: 33
Quote: "This is more likely a BOINC client issue than a SixTrack issue." No, I already checked that yesterday. When I shut down the BOINC client, the sixtrack processes don't disappear, and memory and swap stay allocated. The fastest and cleanest solution here is: $ reboot
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420
Four tasks were downloaded. While ONE was running, every minute the other three were killed and three new tasks were downloaded! I'm finishing my SixTrack test now!
Joined: 28 Jul 16 Posts: 481 Credit: 394,720 RAC: 0
The normal way to end a child process is to send a TERM signal first (which the child can trap or ignore) and then a KILL signal after a grace period. The latter can't be trapped and tells the kernel to destroy the process immediately. Since I doubt there is a general kernel problem, and since the cancellation is initiated by the client, I still think it's a BOINC client issue. Maybe the client only sends the TERM signal. One way to check this would be to read the client source code.
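A minimal sketch of the TERM-then-KILL pattern described above, written in Python for a generic child process. It is not the BOINC client's actual code; the function name stop_child and the 5-second default grace period are illustrative assumptions. The demo child ignores SIGTERM, so only the SIGKILL escalation ends it.

```python
import signal
import subprocess
import sys


def stop_child(proc: subprocess.Popen, grace_seconds: float = 5.0) -> str:
    """Stop a child with SIGTERM, escalating to SIGKILL after a grace period (POSIX).

    Returns "TERM" if the child was gone within the grace period, "KILL" otherwise.
    """
    proc.send_signal(signal.SIGTERM)  # polite request: the child may trap or ignore it
    try:
        proc.wait(timeout=grace_seconds)  # wait() also reaps the child, so no zombie remains
        return "TERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be trapped, the kernel destroys the process
        proc.wait()  # reap it; a parent that never calls wait() leaves a <defunct> entry
        return "KILL"


if __name__ == "__main__":
    # Demo child that ignores SIGTERM and reports once its handler is installed.
    code = ("import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); "
            "print('ready', flush=True); time.sleep(60)")
    child = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE, text=True)
    child.stdout.readline()  # wait until SIGTERM is really being ignored
    print(stop_child(child, grace_seconds=1.0))  # prints KILL
    child.stdout.close()
```

The readiness handshake matters: without it, SIGTERM could arrive before the child installs its handler, and the default action would end it at the TERM stage.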
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420
The problems with server-cancelled tasks began on 18/11/5 at 8:30 UTC. Before that, all tasks ran without these errors. Is there anything to be seen in the server status?
Joined: 13 Feb 15 Posts: 1188 Credit: 857,561 RAC: 33
Quote: "The normal way to end a child process would be to send a TERM signal first (which can be trapped/ignored by the child) and to send a KILL signal after a grace period." I retested the memory that stays allocated and is not freed after tasks are aborted by the server. There were 14 cores available at first, but meanwhile all memory is used, so 12 tasks run at 100% CPU and 2 tasks at 0% CPU because there is not enough memory. Many aborted tasks no longer show up in BOINC Manager, but they have not freed their memory:

    top - 18:10:18 up 4:58, 1 user, load average: 11.94, 12.00, 12.41
    Tasks: 183 total, 13 running, 168 sleeping, 0 stopped, 2 zombie
    Cpu(s): 0.0%us, 0.1%sy, 85.8%ni, 14.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 9725296k total, 9350200k used, 375096k free, 13992k buffers
    Swap: 2097148k total, 2095516k used, 1632k free, 171360k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    5686 boinc 39 19 446m 330m 1996 R 100 3.5 5:11.88 sixtrack_lin64_
    5965 boinc 39 19 446m 330m 1996 R 100 3.5 0:33.89 sixtrack_lin64_
    5968 boinc 39 19 446m 330m 1992 R 100 3.5 0:31.76 sixtrack_lin64_
    5583 boinc 39 19 446m 330m 1996 R 100 3.5 6:52.98 sixtrack_lin64_
    5600 boinc 39 19 446m 330m 1996 R 100 3.5 6:36.72 sixtrack_lin64_
    5607 boinc 39 19 446m 330m 1992 R 100 3.5 6:30.59 sixtrack_lin64_
    5642 boinc 39 19 446m 330m 1996 R 100 3.5 5:55.95 sixtrack_lin64_
    5925 boinc 39 19 446m 330m 1996 R 100 3.5 1:10.54 sixtrack_lin64_
    5931 boinc 39 19 446m 330m 1996 R 100 3.5 1:05.43 sixtrack_lin64_
    5950 boinc 39 19 446m 330m 1992 R 100 3.5 0:48.11 sixtrack_lin64_
    5976 boinc 39 19 446m 330m 1992 R 100 3.5 0:24.64 sixtrack_lin64_
    5935 boinc 39 19 446m 330m 1992 R 100 3.5 1:02.33 sixtrack_lin64_
    1235 boinc 30 10 117m 11m 2552 S 1 0.1 1:19.89 boinc
    3331 boinc 39 19 510m 1908 1900 S 0 0.0 0:55.75 sixtrack_lin64_
    3335 boinc 39 19 510m 58m 1900 S 0 0.6 0:50.66 sixtrack_lin64_
    3340 boinc 39 19 510m 1908 1900 S 0 0.0 0:32.39 sixtrack_lin64_
    3344 boinc 39 19 510m 11m 1252 S 0 0.1 0:03.28 sixtrack_lin64_
    3360 boinc 39 19 510m 1676 1376 S 0 0.0 0:10.49 sixtrack_lin64_
    3368 boinc 39 19 510m 2580 1712 S 0 0.0 0:16.48 sixtrack_lin64_
    3374 boinc 39 19 510m 330m 1832 S 0 3.5 0:18.43 sixtrack_lin64_
    3380 boinc 39 19 510m 192m 1252 S 0 2.0 0:09.22 sixtrack_lin64_
    3384 boinc 39 19 510m 320m 1376 S 0 3.4 0:12.37 sixtrack_lin64_
    3389 boinc 39 19 510m 322m 1376 S 0 3.4 0:13.36 sixtrack_lin64_
    3393 boinc 39 19 510m 324m 1376 S 0 3.4 0:12.32 sixtrack_lin64_
    3405 boinc 39 19 510m 330m 1900 S 0 3.5 0:17.43 sixtrack_lin64_
    3416 boinc 39 19 510m 324m 1376 S 0 3.4 0:12.37 sixtrack_lin64_
    3421 boinc 39 19 510m 324m 1376 S 0 3.4 0:12.19 sixtrack_lin64_
    3427 boinc 39 19 510m 324m 1372 S 0 3.4 0:12.34 sixtrack_lin64_
    3431 boinc 39 19 510m 319m 1004 S 0 3.4 0:02.05 sixtrack_lin64_
    3435 boinc 39 19 510m 324m 1376 S 0 3.4 0:12.37 sixtrack_lin64_
    3440 boinc 39 19 510m 322m 1252 S 0 3.4 0:04.14 sixtrack_lin64_
    3452 boinc 39 19 510m 319m 1004 S 0 3.4 0:02.14 sixtrack_lin64_
    3458 boinc 39 19 510m 330m 1904 S 0 3.5 0:45.25 sixtrack_lin64_
    5070 boinc 39 19 510m 320m 1380 S 0 3.4 0:01.24 sixtrack_lin64_
    6001 boinc 39 19 0 0 0 Z 0 0.0 0:00.00 sixtrack_lin64_ <defunct>
    6002 boinc 39 19 0 0 0 Z 0 0.0 0:00.00 sixtrack_lin64_ <defunct>

The second table shows all processes owned by boinc after all tasks were finished, uploaded and reported, so not a single task is shown in BOINC Manager:

    top - 18:22:12 up 5:10, 1 user, load average: 1.08, 5.20, 9.09
    Tasks: 169 total, 1 running, 168 sleeping, 0 stopped, 0 zombie
    Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 9725296k total, 5161796k used, 4563500k free, 17112k buffers
    Swap: 2097148k total, 2094268k used, 2880k free, 22144k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    1235 boinc 30 10 117m 11m 2440 S 0 0.1 1:23.49 boinc
    3331 boinc 39 19 510m 1908 1900 S 0 0.0 0:55.76 sixtrack_lin64_
    3335 boinc 39 19 510m 58m 1900 S 0 0.6 0:50.68 sixtrack_lin64_
    3340 boinc 39 19 510m 1908 1900 S 0 0.0 0:32.39 sixtrack_lin64_
    3344 boinc 39 19 510m 11m 1252 S 0 0.1 0:03.28 sixtrack_lin64_
    3360 boinc 39 19 510m 1676 1376 S 0 0.0 0:10.50 sixtrack_lin64_
    3368 boinc 39 19 510m 2580 1712 S 0 0.0 0:16.49 sixtrack_lin64_
    3374 boinc 39 19 510m 330m 1832 S 0 3.5 0:18.44 sixtrack_lin64_
    3380 boinc 39 19 510m 192m 1252 S 0 2.0 0:09.22 sixtrack_lin64_
    3384 boinc 39 19 510m 320m 1376 S 0 3.4 0:12.38 sixtrack_lin64_
    3389 boinc 39 19 510m 322m 1376 S 0 3.4 0:13.38 sixtrack_lin64_
    3393 boinc 39 19 510m 324m 1376 S 0 3.4 0:12.35 sixtrack_lin64_
    3405 boinc 39 19 510m 330m 1900 S 0 3.5 0:17.44 sixtrack_lin64_
    3416 boinc 39 19 510m 324m 1376 S 0 3.4 0:12.38 sixtrack_lin64_
    3421 boinc 39 19 510m 324m 1376 S 0 3.4 0:12.20 sixtrack_lin64_
    3427 boinc 39 19 510m 324m 1372 S 0 3.4 0:12.35 sixtrack_lin64_
    3431 boinc 39 19 510m 319m 1004 S 0 3.4 0:02.06 sixtrack_lin64_
    3435 boinc 39 19 510m 324m 1376 S 0 3.4 0:12.38 sixtrack_lin64_
    3440 boinc 39 19 510m 322m 1252 S 0 3.4 0:04.16 sixtrack_lin64_
    3452 boinc 39 19 510m 319m 1004 S 0 3.4 0:02.15 sixtrack_lin64_
    3458 boinc 39 19 510m 330m 1904 S 0 3.5 0:45.25 sixtrack_lin64_
    5070 boinc 39 19 510m 320m 1380 S 0 3.4 0:01.26 sixtrack_lin64_

The third table shows all processes after I stopped the BOINC client with 'sudo /etc/init.d/boinc-client stop':

    top - 18:24:40 up 5:12, 1 user, load average: 0.10, 3.20, 7.77
    Tasks: 168 total, 1 running, 167 sleeping, 0 stopped, 0 zombie
    Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 9725296k total, 5161456k used, 4563840k free, 18392k buffers
    Swap: 2097148k total, 2091792k used, 5356k free, 28808k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    3331 boinc 39 19 510m 1912 1900 S 0 0.0 0:55.77 sixtrack_lin64_
    3335 boinc 39 19 510m 58m 1900 S 0 0.6 0:50.68 sixtrack_lin64_
    3340 boinc 39 19 510m 1912 1900 S 0 0.0 0:32.40 sixtrack_lin64_
    3344 boinc 39 19 510m 11m 1252 S 0 0.1 0:03.29 sixtrack_lin64_
    3360 boinc 39 19 510m 1676 1376 S 0 0.0 0:10.51 sixtrack_lin64_
    3368 boinc 39 19 510m 2580 1712 S 0 0.0 0:16.49 sixtrack_lin64_
    3374 boinc 39 19 510m 330m 1832 S 0 3.5 0:18.44 sixtrack_lin64_
    3380 boinc 39 19 510m 192m 1252 S 0 2.0 0:09.22 sixtrack_lin64_
    3384 boinc 39 19 510m 320m 1376 S 0 3.4 0:12.39 sixtrack_lin64_
    3389 boinc 39 19 510m 322m 1376 S 0 3.4 0:13.39 sixtrack_lin64_
    3393 boinc 39 19 510m 324m 1376 S 0 3.4 0:12.35 sixtrack_lin64_
    3405 boinc 39 19 510m 330m 1900 S 0 3.5 0:17.44 sixtrack_lin64_
    3416 boinc 39 19 510m 324m 1376 S 0 3.4 0:12.38 sixtrack_lin64_
    3421 boinc 39 19 510m 324m 1376 S 0 3.4 0:12.20 sixtrack_lin64_
    3427 boinc 39 19 510m 324m 1372 S 0 3.4 0:12.35 sixtrack_lin64_
    3431 boinc 39 19 510m 319m 1004 S 0 3.4 0:02.07 sixtrack_lin64_
    3435 boinc 39 19 510m 324m 1376 S 0 3.4 0:12.38 sixtrack_lin64_
    3440 boinc 39 19 510m 322m 1252 S 0 3.4 0:04.16 sixtrack_lin64_
    3452 boinc 39 19 510m 319m 1004 S 0 3.4 0:02.15 sixtrack_lin64_
    3458 boinc 39 19 510m 330m 1904 S 0 3.5 0:45.25 sixtrack_lin64_
    5070 boinc 39 19 510m 320m 1380 S 0 3.4 0:01.27 sixtrack_lin64_
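The first table contains two zombies (<defunct>), and the third shows processes that outlive the client. A rough sketch (Linux only) that separates these cases by reading State and PPid from /proc/<pid>/status could help decide whether the client is still the parent of the leftovers. The function name classify_processes is hypothetical, and treating "parent is PID 1" as orphaned is an assumption: on systems with a configured subreaper, orphans are re-parented there instead.

```python
from pathlib import Path


def classify_processes(name_prefix: str = "sixtrack", proc_root: str = "/proc") -> dict[str, list[int]]:
    """Group matching processes into 'zombie', 'orphaned' and 'attached' (Linux only).

    zombie:   exited, but the parent has not yet reaped it with wait()
    orphaned: still alive, but re-parented to PID 1 because its parent went away
              (with a subreaper configured, such a process would show up as 'attached')
    attached: still alive and still owned by some other parent, e.g. the client
    """
    groups: dict[str, list[int]] = {"zombie": [], "orphaned": [], "attached": []}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a PID directory
        try:
            lines = (entry / "status").read_text().splitlines()
        except OSError:
            continue  # process vanished during the scan
        fields = {k: v.strip() for k, v in (ln.split(":", 1) for ln in lines if ":" in ln)}
        if not fields.get("Name", "").startswith(name_prefix):
            continue
        state = (fields.get("State") or "?")[0]  # e.g. "Z (zombie)" -> "Z"
        ppid = int(fields.get("PPid") or 0)
        if state == "Z":
            groups["zombie"].append(int(entry.name))
        elif ppid == 1:
            groups["orphaned"].append(int(entry.name))
        else:
            groups["attached"].append(int(entry.name))
    return {kind: sorted(pids) for kind, pids in groups.items()}


if __name__ == "__main__" and Path("/proc").is_dir():
    for kind, pids in classify_processes().items():
        print(f"{kind:>9}: {pids}")
```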
Joined: 18 Aug 15 Posts: 14 Credit: 125,335 RAC: 0
Me too: I have 45 aborted tasks that each take up 328 MiB of memory. All 16 GB of memory is in use, plus 1 GB of the swap file, and the system currently shows no active SixTrack test tasks in BOINC. Time for a reboot, and time to stop running SixTrack test tasks. By the way, does anybody know what we are testing with all these SixTrack tasks? I would love to know whether we are supposed to watch for anything specific.
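A quick check of the numbers in this post, assuming each of the 45 leftover processes holds the full 328 MiB:

$$
45 \times 328\ \text{MiB} = 14\,760\ \text{MiB} \approx 14.4\ \text{GiB}
$$

On their own, the leftovers would therefore account for most of the 16 GB of RAM.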
Joined: 28 Jul 16 Posts: 481 Credit: 394,720 RAC: 0
I also found one of them on my test system, although I started the last task yesterday morning. Both the task and the BOINC client are inactive, but the task still "lives" in RAM and in the slot folder. This line is from its init_data.xml: <result_name>Sixtrack_1538806_1540999193.697062_9302_1</result_name> The client_state.xml does not contain a corresponding ID, but the result can be found in my task list: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2488109 The same WU was sent a second time to the same host: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2488108 Another host (CP's Opteron) also got the same WU twice: https://lhcathomedev.cern.ch/lhcathome-dev/workunit.php?wuid=1743356 As multiple sends to the same host are not very common, I wonder whether we have stumbled over a BOINC client bug.
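The manual check described above (a result_name in a slot's init_data.xml with no matching entry in client_state.xml) can be sketched in Python. This assumes the usual data-directory layout (client_state.xml at the top, slots/<n>/init_data.xml per slot), that both files parse as plain XML, and that each tracked result appears as a <result> element with a <name> child; orphaned_slots is a hypothetical helper, not part of BOINC.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def orphaned_slots(boinc_dir: str) -> dict[str, str]:
    """Return {slot directory name: result name} for slots whose result the client no longer tracks.

    Assumed layout: <boinc_dir>/client_state.xml and <boinc_dir>/slots/<n>/init_data.xml.
    """
    base = Path(boinc_dir)
    state = ET.parse(base / "client_state.xml").getroot()
    # Every <result> the client still knows about carries a <name> child.
    known = {r.findtext("name", "").strip() for r in state.iter("result")}
    leftovers: dict[str, str] = {}
    for init_data in sorted((base / "slots").glob("*/init_data.xml")):
        name = (ET.parse(init_data).getroot().findtext("result_name") or "").strip()
        if name and name not in known:
            leftovers[init_data.parent.name] = name
    return leftovers


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python orphaned_slots.py <BOINC data directory>
    for slot, result in orphaned_slots(sys.argv[1]).items():
        print(f"slot {slot}: {result} is not in client_state.xml")
```

Reading these files usually requires the permissions of the account the client runs under.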
Joined: 13 Feb 15 Posts: 1188 Credit: 857,561 RAC: 33
Quote: "As multiple sends to the same host are not very common I wonder if we stumbled over a BOINC client bug." It's no bug on either the client or the server side. It's a server configuration setting that avoids sending the same task to the same host and/or the same user. Settings: <one_result_per_user_per_wu/> <one_result_per_host_per_wu/>
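For context, these are project-side options, not client settings. Assuming the standard layout of a BOINC project's config.xml, they would appear as empty flag elements inside its <config> section, roughly like this:

```xml
<boinc>
  <config>
    <!-- other project options omitted -->
    <one_result_per_user_per_wu/>
    <one_result_per_host_per_wu/>
  </config>
</boinc>
```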
Joined: 28 Jul 16 Posts: 481 Credit: 394,720 RAC: 0
Quote: "No bug on either client nor server side." Those server options are usually used to ensure that the result from one user/host can be verified against the result from a different user/host. If the same WU is sent twice to the same host, the client must handle this without crashing. Nonetheless, there are obviously lots of crashes, and I suspect those multiple sends may be the reason.
Joined: 20 Jun 17 Posts: 25 Credit: 4,777,813 RAC: 5,496
You mean like I mentioned before in this post? Not a word was said about it then: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=415 Now nearly everything is being canceled by the server. This was working fine before the server went down.
Joined: 22 Apr 16 Posts: 675 Credit: 1,989,451 RAC: 420
Quote: "Now nearly everything is being canceled by the server. This was working fine before the server went down." The problems with server-cancelled tasks began on 18/11/5 at 8:30 UTC.
©2024 CERN