Opened 10 years ago
Closed 10 years ago
#3787 closed defect (invalid)
OSCam crashes often, even after changing computer hardware
Reported by: | fz | Owned by: | |
---|---|---|---|
Priority: | critical | Component: | General |
Severity: | high | Keywords: | oscam crush |
Cc: | Sensitive: | no |
Description
Revision
All recent versions, most recently OSCam r9791
Issue Description
While running, OSCam randomly crashes by itself.
When the issue occurs
I run OSCam on Debian Linux 7 amd64 with the latest updates installed.
I tried changing the hardware and moved to a different server, but the problem persists.
OSCam crashes with these errors in /var/log/messages:
(...)
Jul 10 06:33:16 localhost kernel: traps: garbage_collect[20387] general protection ip:42b4a0 sp:7f2ba81f6ec0 error:0 in oscam[400000+f6000]
Jul 10 06:46:15 localhost kernel: garbage_collect[20891]: segfault at 0 ip 00007f12c39dbae9 sp 00007f12c4974eb0 error 4 in libc-2.13.so[7f12c3960000+182000]
Jul 10 08:06:43 localhost kernel: traps: garbage_collect[23399] general protection ip:7f27d8b43ac9 sp:7f27d9ad3eb0 error:0 in libc-2.13.so[7f27d8ac8000+182000]
Jul 10 10:19:47 localhost kernel: traps: wc27-cache-INDI[17723] general protection ip:7f5f0f1a6fed sp:7f5efeeadc30 error:0 in libc-2.13.so[7f5f0f130000+182000]
Jul 10 10:28:06 localhost kernel: traps: wr01-cache-SHAR[17804] general protection ip:7f3c0ce396ba sp:7f3c0765eb70 error:0 in libc-2.13.so[7f3c0cdc0000+182000]
Jul 10 11:53:10 localhost kernel: traps: serve_process[19187] general protection ip:7f3e72b6efed sp:7f3e6296aca0 error:0 in libc-2.13.so[7f3e72af8000+182000]
Jul 10 14:17:18 localhost kernel: traps: garbage_collect[22507] general protection ip:42b4a0 sp:7ffff7fd3ec0 error:0 in oscam[400000+f6000]
(...)
I have created a backtrace log with the "gdb" debugging tool.
I have attached all my logs to this ticket.
If any other logs or configuration files would help the developers, I can provide whatever you want.
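The kernel lines above give both the faulting instruction pointer and, where known, the mapping it fell into in the form name[base+size]. As a rough sketch, the offset inside that mapping is simply ip minus base, which is often the first thing needed when matching a crash against a debug build. The helper name and the regular expression below are illustrative assumptions, not part of OSCam or of any existing tool.

```python
import re

# Matches the "ip:<hex>" or "ip <hex>" field and the trailing
# "in <name>[<base>+<size>]" mapping that the kernel prints when it knows it.
TRAP_RE = re.compile(
    r"\bip[: ](?P<ip>[0-9a-f]+).*? in (?P<module>\S+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def fault_offset(line):
    """Return (module, offset_in_mapping) for a kernel trap line, or None."""
    m = TRAP_RE.search(line)
    if m is None:
        return None  # line carries no mapping information
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return m.group("module"), ip - base

line = ("Jul 10 06:33:16 localhost kernel: traps: garbage_collect[20387] "
        "general protection ip:42b4a0 sp:7f2ba81f6ec0 error:0 "
        "in oscam[400000+f6000]")
module, offset = fault_offset(line)
print(module, hex(offset))
```

Lines that lack the trailing mapping make the helper return None rather than guess.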
Attachments (2)
Change History (18)
comment:1 by , 10 years ago
I forgot to mention: my Linux kernel version is 3.8.13 amd64.
The system has the latest package updates for the Debian stable branch installed.
comment:2 by , 10 years ago
Yesterday I installed the latest libc (https://www.debian.org/security/2014/dsa-2976) from Debian.
Today I already see:
Jul 12 13:31:13 localhost kernel: garbage_collect[4623]: segfault at 0 ip 00007f188db3bbb9 sp 00007f188d262eb0 error 4 in libc-2.13.so[7f188dac0000+182000]
Jul 12 13:54:25 localhost kernel: traps: cw_process[5155] general protection ip:44ec5c sp:7fa39bccedc8 error:0
Jul 12 14:10:35 localhost kernel: garbage_collect[6259]: segfault at 7f1dae9bd4a4 ip 0000000000463629 sp 00007f1d3513aec0 error 4
This is such an annoying bug!
Can anybody help me?
comment:4 by , 10 years ago
You are right, sorry for my earlier poor debug log.
Today I created a proper backtrace log.
#0 0x00007ffff6bcf407 in GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff6bd07e8 in GI_abort () at abort.c:89
#2 0x00007ffff6c1228d in malloc_assert (assertion=assertion@entry=0x7ffff6cfc185 "(bck->bk->size & 0x4) == 0", file=file@entry=0x7ffff6cfc04f "malloc.c",
line=line@entry=3524, function=function@entry=0x7ffff6cfc3d7 <func.11523> "_int_malloc") at malloc.c:293
#3 0x00007ffff6c1505f in _int_malloc (av=0x7fffd4000020, bytes=944) at malloc.c:3524
#4 0x00007ffff6c16220 in GI_libc_malloc (bytes=944) at malloc.c:2891
#5 0x00000000004368fe in cs_malloc (result=0x7fffb6c36950, size=944) at /home/tricky/oscam-svn/oscam-string.c:11
#6 0x0000000000414cda in get_ecmtask () at /home/user/oscam-svn/oscam-ecm.c:684
#7 0x000000000044de3a in cc_cache_push_in (cl=0xc243f0, buf=0x7fffd423fb84 <incomplete sequence \315>) at /home/tricky/oscam-svn/module-cccam.c:2364
#8 0x000000000044f3e2 in cc_parse_msg (cl=0xc243f0, buf=0x7fffd423fb80 "", l=85) at /home/user/oscam-svn/module-cccam.c:2731
#9 0x00000000004523f6 in cc_recv (cl=0xc243f0, buf=0x7fffd423fb80 "", l=2048) at /home/user/oscam-svn/module-cccam.c:3553
#10 0x0000000000422c58 in work_thread (ptr=0xa1d9c0) at /home/tricky/oscam-svn/oscam-work.c:316
#11 0x00007ffff6f4b0ca in start_thread (arg=0x7fffb6c37700) at pthread_create.c:309
#12 0x00007ffff6c8006d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:11
Hope it helps.
comment:5 by , 10 years ago
There is some progress in the forum which supersedes r9791.
Please try the latest patch from there to see if it resolves this.
Forum thread: http://www.streamboard.tv/wbb2/thread.php?threadid=41540
Latest patch: http://www.streamboard.tv/wbb2/attachment.php?attachmentid=53456
by , 10 years ago
Attachment: | cccam_cachepush_datasize.patch added |
---|
Only cachepush if there is valid data, plz try and report!
comment:6 by , 10 years ago
1) Backtrace:
#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:41
#1 0x0000000000411839 in check_is_pushed (cw=0x0, cl=0xbcad00) at /home/user/oscam-svn/oscam-cache.c:85
#2 0x000000000045b276 in camd35_cache_push_chk (cl=0xbcad00, er=0x7fffe806ee00) at /home/user/oscam-svn/module-camd35.c:671
#3 0x0000000000422ee9 in work_thread (ptr=0xa80b00) at /home/user/oscam-svn/oscam-work.c:371
#4 0x00007ffff6f4b0ca in start_thread (arg=0x7fffb6ceb700) at pthread_create.c:309
#5 0x00007ffff6c8006d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
2) Backtrace:
#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:41
#1 0x0000000000411839 in check_is_pushed (cw=0x0, cl=0xcfc0e0) at /home/user/oscam-svn/oscam-cache.c:85
#2 0x000000000045b276 in camd35_cache_push_chk (cl=0xcfc0e0, er=0x7fffc841cde0) at /home/user/oscam-svn/module-camd35.c:671
#3 0x0000000000422ee9 in work_thread (ptr=0xb05060) at /home/user/oscam-svn/oscam-work.c:371
#4 0x00007ffff6f4b0ca in start_thread (arg=0x7fffb6c97700) at pthread_create.c:309
#5 0x00007ffff6c8006d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
comment:7 by , 10 years ago
I can only explain this if the client no longer exists, but that is checked just before calling camd35_cache_push_chk. So I must assume that your host system is too weak to handle all the data to and from the clients.
comment:8 by , 10 years ago
Yesterday I applied cccam_cachepush_datasize.patch to the latest svn version.
In the past few days I changed my server, and I now use a virtualized AMD Opteron 2800 x 2 with 2 GB of RAM. It is the commercial service called "VPS cloud 1" from OVH.
Is that not enough?
My friends use other, weaker VPSes (like VPS Classic from OVH) and everything works fine.
Today I have more backtraces:
1)
#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:41
#1 0x0000000000411839 in check_is_pushed (cw=0x0, cl=0x9aa9b0) at /home/user/oscam-svn/oscam-cache.c:85
#2 0x000000000044d74e in cc_cache_push_chk (cl=0x9aa9b0, er=0x7fffd0136a60) at /home/user/oscam-svn/module-cccam.c:2235
#3 0x0000000000422e5a in work_thread (ptr=0x8dc250) at /home/user/oscam-svn/oscam-work.c:364
#4 0x00007ffff6f4b0ca in start_thread (arg=0x7ffff4103700) at pthread_create.c:309
#5 0x00007ffff6c8006d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
2)
#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:41
#1 0x0000000000411839 in check_is_pushed (cw=0x0, cw@entry=<error reading variable: Cannot access memory at address 0x7fffb6e76d58>, cl=0xa1cc10)
at /home/user/oscam-svn/oscam-cache.c:85
3)
#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:41
#1 0x0000000000411839 in check_is_pushed (cw=0x0, cl=0xabba60) at /home/user/oscam-svn/oscam-cache.c:85
#2 0x000000000045b276 in camd35_cache_push_chk (cl=0xabba60, er=0x7ffff00f5a60) at /home/user/oscam-svn/module-camd35.c:671
#3 0x0000000000422ee9 in work_thread (ptr=0x9d1850) at /home/user/oscam-svn/oscam-work.c:371
#4 0x00007ffff6f4b0ca in start_thread (arg=0x7fffb6c45700) at pthread_create.c:309
#5 0x00007ffff6c8006d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
comment:9 by , 10 years ago
Another backtrace.
#0 0x00007ffff6bcf407 in GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff6bd07e8 in GI_abort () at abort.c:89
#2 0x00007ffff6c0d394 in libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7ffff6d00030 "* Error in `%s': %s: 0x%s *\n")
at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007ffff6c12b7e in malloc_printerr (action=1, str=0x7ffff6cfc100 "free(): invalid size", ptr=<optimized out>) at malloc.c:4996
#4 0x00007ffff6c13886 in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3840
#5 0x0000000000424452 in ll_destroy_free_data (l=0x7fffd40b0aa0) at /home/user/oscam-svn/oscam-llist.c:78
#6 0x0000000000413144 in cacheex_free_csp_lastnodes (er=0x7fffd40c7630) at /home/user/oscam-svn/module-cacheex.h:36
#7 0x0000000000414c8d in free_push_in_ecm (ecm=0x7fffd40c7630) at /home/user/oscam-svn/oscam-ecm.c:671
#8 0x00000000004143dd in cw_process () at /home/user/oscam-svn/oscam-ecm.c:377
#9 0x00007ffff6f4b0ca in start_thread (arg=0x7ffff7f5c700) at pthread_create.c:309
#10 0x00007ffff6c8006d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
comment:10 by , 10 years ago
Maybe:
An ECM is in cl->joblist while the same ECM is in function free_ecm,
so a segmentation fault is possible.
This part of OSCam is not thread safe.
An ECM is in free_ecm, which sets its matching_rdr to NULL, while
the same ECM is still in use elsewhere, for example in for(ea = er->matching_rdr; ea; ea = ea->next),
so a segmentation fault is possible.
An ECM is still being processed while it is being removed from memory,
so a segmentation fault is possible (the ECM was freed at a bad time).
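One common way to rule out this kind of "freed while still in use" race is to reference-count the shared object under a lock, so that only the last holder actually frees it. The following is a minimal Python model of that idea, not OSCam code and not the project's actual fix; the class name and its fields are hypothetical and only borrow the matching_rdr name from the comment above.

```python
import threading

class RefCountedEcm:
    """Toy stand-in for an ECM request shared between threads (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._refs = 1            # the creator holds the first reference
        self.freed = False
        self.matching_rdr = ["reader-a", "reader-b"]

    def acquire(self):
        """Take a reference; fails if the object has already been freed."""
        with self._lock:
            if self.freed:
                return False
            self._refs += 1
            return True

    def release(self):
        """Drop a reference; only the last holder performs the free."""
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                self.matching_rdr = None
                self.freed = True

ecm = RefCountedEcm()
ecm.acquire()   # a worker starts iterating over matching_rdr
ecm.release()   # the creator is done, but the worker still holds a reference
ecm.release()   # the worker finishes; only now is the ECM freed
```

In this model the cleanup path cannot clear matching_rdr while a worker that called acquire() is still running, which is the situation the comment above suspects.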
comment:11 by , 10 years ago
Yes, that might be it, but why does only this user have issues?
So there must be a unique OSCam config that triggers these issues.
Which clienttimeout is in use?
comment:12 by , 10 years ago
comment:14 by , 10 years ago
clienttimeout 2000 means garbage is cleaned after 5 seconds. Increase it to 8000 and observe whether it makes any difference to the frequency of the crashes.
comment:15 by , 10 years ago
Today I ran some tests using "clienttimeout = 8000", as theparasol suggested.
Yesterday I had no crashes (with r9792). I don't know why.
Today I had some crashes with r9796, using the same configuration.
Also, while trying to capture a backtrace today, I saw these entries in the live log:
2014/07/16 14:09:45 97B100 p WARNING: job queue reader cache3-USER1 has more than 2000 jobs! count=2001, dropped!
2014/07/16 14:09:45 97B100 p WARNING: job queue reader cache3-USER2 has more than 2000 jobs! count=2001, dropped!
2014/07/16 14:09:45 97B100 p WARNING: job queue reader cache3-USER3 has more than 2000 jobs! count=2001, dropped!
2014/07/16 14:09:45 97B100 p WARNING: job queue reader cache3-USER4 has more than 2000 jobs! count=2001, dropped!
So I think the cache counter is too small and I want to increase it, but the "max_count" option in the [cache] section is deprecated.
I want to use this option with "max_count = 10000".
Someone in a forum said that I should change the job limit in oscam-work.c from 2000 to 2500, but I think the best solution would be a NEW CONFIGURATION VALUE in oscam.conf to raise or lower this limit for such problems.
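To make the request concrete, here is a small sketch of a job queue whose limit is a parameter instead of a fixed 2000. It is only a Python model of the idea: the class name, the max_jobs parameter and the drop-newest behaviour are assumptions for illustration, and the real logic in oscam-work.c may behave differently.

```python
from collections import deque

class JobQueue:
    """Bounded job queue with a configurable limit (hypothetical model)."""

    def __init__(self, max_jobs=2000):
        self.max_jobs = max_jobs   # would come from a config value in this sketch
        self.jobs = deque()
        self.dropped = 0

    def add(self, job):
        """Queue a job, or drop it once the queue already holds max_jobs."""
        if len(self.jobs) >= self.max_jobs:
            self.dropped += 1
            return False
        self.jobs.append(job)
        return True

queue = JobQueue(max_jobs=3)
for n in range(5):
    queue.add(f"job-{n}")
print(len(queue.jobs), queue.dropped)
```

A larger limit only changes when jobs start being dropped; it does not make the reader work through the queue any faster.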
I attach some backtraces:
1) Backtrace
#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:41
#1 0x0000000000411848 in check_is_pushed (cw=0x0, cl=0xa583c0) at /home/user/oscam-svn-debug/oscam-cache.c:85
#2 0x000000000045b272 in camd35_cache_push_chk (cl=0xa583c0, er=0x7fffd41a3050) at /home/user/oscam-svn-debug/module-camd35.c:671
#3 0x0000000000422ef8 in work_thread (ptr=0xa58060) at /home/user/oscam-svn-debug/oscam-work.c:371
#4 0x00007ffff6f4b0a4 in start_thread (arg=0x7fffb76e2700) at pthread_create.c:309
#5 0x00007ffff6c8004d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
2) Backtrace
#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:41
#1 0x0000000000411848 in check_is_pushed (cw=0x0, cw@entry=<error reading variable: Cannot access memory at address 0x7ffff41e0d58>, cl=0x9b5bb0)
at /home/user/oscam-svn-debug/oscam-cache.c:85
3) Backtrace
#0 0x00000000004350b4 in garbage_collector () at /home/user/oscam-svn-debug/oscam-garbage.c:108
#1 0x00007ffff6f4b0a4 in start_thread (arg=0x7ffff7fd1700) at pthread_create.c:309
#2 0x00007ffff6c8004d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
4) Backtrace
#0 0x00007ffff6bcf407 in GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff6bd07e8 in GI_abort () at abort.c:89
#2 0x00007ffff6c0d394 in libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7ffff6d00010 "* Error in `%s': %s: 0x%s *\n")
at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007ffff6c12b6e in malloc_printerr (action=1, str=0x7ffff6d00038 "munmap_chunk(): invalid pointer", ptr=<optimized out>) at malloc.c:4996
#4 0x00000000004351c9 in garbage_collector () at /home/user/oscam-svn-debug/oscam-garbage.c:132
#5 0x00007ffff6f4b0a4 in start_thread (arg=0x7ffff7fd1700) at pthread_create.c:309
#6 0x00007ffff6c8004d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
comment:16 by , 10 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Just as I thought, you have far too many clients!
Increase the total number of clients if you want, but realize that no client is interested in old or delayed ECM responses. So instead of fixing the problem you make it worse.
You have to bring down the total number of clients on the OSCam instance.
Fewer hops could work too.
For that reason I am closing this ticket.
OScam crash