Opened 10 years ago

Closed 10 years ago

#3787 closed defect (invalid)

OSCam crashes often, even after changing computer hardware

Reported by: fz
Owned by:
Priority: critical
Component: General
Severity: high
Keywords: oscam crush
Cc:
Sensitive: no

Description

Revision

All recent versions, up to the latest OSCam r9791

Issue Description

While running, OSCam randomly crashes by itself.

When the issue occurs

I run OSCam on Debian Linux 7 amd64 with the latest updates installed.

I tried changing the hardware and even moved to a different server, but I still have the same problem.

OSCam crashes with these errors in /var/log/messages:

(...)
Jul 10 06:33:16 localhost kernel: traps: garbage_collect[20387] general protection ip:42b4a0 sp:7f2ba81f6ec0 error:0 in oscam[400000+f6000]
Jul 10 06:46:15 localhost kernel: garbage_collect[20891]: segfault at 0 ip 00007f12c39dbae9 sp 00007f12c4974eb0 error 4 in libc-2.13.so[7f12c3960000+182000]
Jul 10 08:06:43 localhost kernel: traps: garbage_collect[23399] general protection ip:7f27d8b43ac9 sp:7f27d9ad3eb0 error:0 in libc-2.13.so[7f27d8ac8000+182000]
Jul 10 10:19:47 localhost kernel: traps: wc27-cache-INDI[17723] general protection ip:7f5f0f1a6fed sp:7f5efeeadc30 error:0 in libc-2.13.so[7f5f0f130000+182000]
Jul 10 10:28:06 localhost kernel: traps: wr01-cache-SHAR[17804] general protection ip:7f3c0ce396ba sp:7f3c0765eb70 error:0 in libc-2.13.so[7f3c0cdc0000+182000]
Jul 10 11:53:10 localhost kernel: traps: serve_process[19187] general protection ip:7f3e72b6efed sp:7f3e6296aca0 error:0 in libc-2.13.so[7f3e72af8000+182000]
Jul 10 14:17:18 localhost kernel: traps: garbage_collect[22507] general protection ip:42b4a0 sp:7ffff7fd3ec0 error:0 in oscam[400000+f6000]
(...)
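
The kernel lines above can be read mechanically: the value after "ip" is the faulting instruction address, and "image[base+size]" is the memory mapping that contains it. Below is a minimal Python sketch that extracts both and computes the offset of the fault inside the image. It assumes only the two line formats that appear in this ticket; the helper name locate_fault is hypothetical and is not part of OSCam or any existing tool.

```python
import re

# Matches both formats seen in this ticket:
#   "... general protection ip:42b4a0 sp:... error:0 in oscam[400000+f6000]"
#   "... segfault at 0 ip 00007f12c39dbae9 sp ... error 4 in libc-2.13.so[7f12c3960000+182000]"
FAULT_RE = re.compile(
    r"\bip[: ](?P<ip>[0-9a-f]+).*?"
    r"in (?P<image>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def locate_fault(line):
    """Return image name, faulting address and offset, or None if no image is logged."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    size = int(match.group("size"), 16)
    return {
        "image": match.group("image"),
        "ip": ip,
        "offset": ip - base,                  # position inside the mapped image
        "inside": base <= ip < base + size,   # sanity check on the mapping
    }


if __name__ == "__main__":
    sample = ("Jul 10 06:33:16 localhost kernel: traps: garbage_collect[20387] "
              "general protection ip:42b4a0 sp:7f2ba81f6ec0 error:0 in oscam[400000+f6000]")
    info = locate_fault(sample)
    print(info["image"], hex(info["offset"]))
```

With an unstripped binary, an address obtained this way can usually be mapped to a source line with a tool such as addr2line. The mix of faulting images in the excerpt (oscam itself and libc, in several different threads) is the kind of pattern that often points to memory corruption rather than one single faulty instruction, although the log lines alone cannot prove that.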

I have created a backtrace log with the "gdb" debugging tool.
I have attached all my logs to this ticket.
If any other logs or configs would help the developers, I can provide whatever you want.

Attachments (2)

Crash.txt (72.7 KB ) - added by fz 10 years ago.
OScam crash
cccam_cachepush_datasize.patch (344 bytes ) - added by theparasol 10 years ago.
Only cachepush if there is valid data, plz try and report!

Change History (18)

by fz, 10 years ago

Attachment: Crash.txt added

OScam crash

comment:1 by fz, 10 years ago

I forgot to mention: my Linux kernel version is 3.8.13 amd64.
The system has the latest updated packages from the Debian stable branch installed.

comment:2 by fz, 10 years ago

Yesterday I installed the latest libc (https://www.debian.org/security/2014/dsa-2976) from Debian.

Today I already have:

Jul 12 13:31:13 localhost kernel: garbage_collect[4623]: segfault at 0 ip 00007f188db3bbb9 sp 00007f188d262eb0 error 4 in libc-2.13.so[7f188dac0000+182000]
Jul 12 13:54:25 localhost kernel: traps: cw_process[5155] general protection ip:44ec5c sp:7fa39bccedc8 error:0
Jul 12 14:10:35 localhost kernel: garbage_collect[6259]: segfault at 7f1dae9bd4a4 ip 0000000000463629 sp 00007f1d3513aec0 error 4

It is such an annoying bug!
Can anybody help me?

comment:4 by fz, 10 years ago

You are right, sorry for my poor earlier debug log.

Today I created a proper backtrace log.

#0 0x00007ffff6bcf407 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff6bd07e8 in __GI_abort () at abort.c:89
#2 0x00007ffff6c1228d in __malloc_assert (assertion=assertion@entry=0x7ffff6cfc185 "(bck->bk->size & 0x4) == 0", file=file@entry=0x7ffff6cfc04f "malloc.c", line=line@entry=3524, function=function@entry=0x7ffff6cfc3d7 <__func__.11523> "_int_malloc") at malloc.c:293
#3 0x00007ffff6c1505f in _int_malloc (av=0x7fffd4000020, bytes=944) at malloc.c:3524
#4 0x00007ffff6c16220 in __GI___libc_malloc (bytes=944) at malloc.c:2891
#5 0x00000000004368fe in cs_malloc (result=0x7fffb6c36950, size=944) at /home/tricky/oscam-svn/oscam-string.c:11
#6 0x0000000000414cda in get_ecmtask () at /home/user/oscam-svn/oscam-ecm.c:684
#7 0x000000000044de3a in cc_cache_push_in (cl=0xc243f0, buf=0x7fffd423fb84 <incomplete sequence \315>) at /home/tricky/oscam-svn/module-cccam.c:2364
#8 0x000000000044f3e2 in cc_parse_msg (cl=0xc243f0, buf=0x7fffd423fb80 "", l=85) at /home/user/oscam-svn/module-cccam.c:2731
#9 0x00000000004523f6 in cc_recv (cl=0xc243f0, buf=0x7fffd423fb80 "", l=2048) at /home/user/oscam-svn/module-cccam.c:3553
#10 0x0000000000422c58 in work_thread (ptr=0xa1d9c0) at /home/tricky/oscam-svn/oscam-work.c:316
#11 0x00007ffff6f4b0ca in start_thread (arg=0x7fffb6c37700) at pthread_create.c:309
#12 0x00007ffff6c8006d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:11

Hope it helps.

comment:5 by capncook, 10 years ago

There is some progress in the forum that supersedes r9791.
Please try the latest patch from there to see if it resolves this.

Forum thread: http://www.streamboard.tv/wbb2/thread.php?threadid=41540
Latest patch: http://www.streamboard.tv/wbb2/attachment.php?attachmentid=53456

by theparasol, 10 years ago

Only cachepush if there is valid data, plz try and report!

comment:6 by fz, 10 years ago

1) Backtrace:

#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:41
#1 0x0000000000411839 in check_is_pushed (cw=0x0, cl=0xbcad00) at /home/user/oscam-svn/oscam-cache.c:85
#2 0x000000000045b276 in camd35_cache_push_chk (cl=0xbcad00, er=0x7fffe806ee00) at /home/user/oscam-svn/module-camd35.c:671
#3 0x0000000000422ee9 in work_thread (ptr=0xa80b00) at /home/user/oscam-svn/oscam-work.c:371
#4 0x00007ffff6f4b0ca in start_thread (arg=0x7fffb6ceb700) at pthread_create.c:309
#5 0x00007ffff6c8006d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

2) Backtrace:

#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:41
#1 0x0000000000411839 in check_is_pushed (cw=0x0, cl=0xcfc0e0) at /home/user/oscam-svn/oscam-cache.c:85
#2 0x000000000045b276 in camd35_cache_push_chk (cl=0xcfc0e0, er=0x7fffc841cde0) at /home/user/oscam-svn/module-camd35.c:671
#3 0x0000000000422ee9 in work_thread (ptr=0xb05060) at /home/user/oscam-svn/oscam-work.c:371
#4 0x00007ffff6f4b0ca in start_thread (arg=0x7fffb6c97700) at pthread_create.c:309
#5 0x00007ffff6c8006d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

comment:7 by theparasol, 10 years ago

I can only explain this if the client no longer exists, but that is checked just before camd35_cache_push_chk is called. So I have to assume that your host system is too weak to handle all the data going to and from the clients.

comment:8 by fz, 10 years ago

Yesterday I applied cccam_cachepush_datasize.patch to the latest SVN version.

I changed my server a few days ago, and I now use a virtualized AMD Opteron 2800 x 2 with 2 GB of RAM. It is the commercial service called "VPS cloud 1" from OVH.

Is that not enough?
My friends use other, weaker VPSes (like the VPS Classic from OVH) and everything works fine.

Today I got more backtraces:

1)

#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:41
#1 0x0000000000411839 in check_is_pushed (cw=0x0, cl=0x9aa9b0) at /home/user/oscam-svn/oscam-cache.c:85
#2 0x000000000044d74e in cc_cache_push_chk (cl=0x9aa9b0, er=0x7fffd0136a60) at /home/user/oscam-svn/module-cccam.c:2235
#3 0x0000000000422e5a in work_thread (ptr=0x8dc250) at /home/user/oscam-svn/oscam-work.c:364
#4 0x00007ffff6f4b0ca in start_thread (arg=0x7ffff4103700) at pthread_create.c:309
#5 0x00007ffff6c8006d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

2)

#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:41
#1 0x0000000000411839 in check_is_pushed (cw=0x0, cw@entry=<error reading variable: Cannot access memory at address 0x7fffb6e76d58>, cl=0xa1cc10) at /home/user/oscam-svn/oscam-cache.c:85

3)

#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:41
#1 0x0000000000411839 in check_is_pushed (cw=0x0, cl=0xabba60) at /home/user/oscam-svn/oscam-cache.c:85
#2 0x000000000045b276 in camd35_cache_push_chk (cl=0xabba60, er=0x7ffff00f5a60) at /home/user/oscam-svn/module-camd35.c:671
#3 0x0000000000422ee9 in work_thread (ptr=0x9d1850) at /home/user/oscam-svn/oscam-work.c:371
#4 0x00007ffff6f4b0ca in start_thread (arg=0x7fffb6c45700) at pthread_create.c:309
#5 0x00007ffff6c8006d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

comment:9 by fz, 10 years ago

Another backtrace.

#0 0x00007ffff6bcf407 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff6bd07e8 in __GI_abort () at abort.c:89
#2 0x00007ffff6c0d394 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7ffff6d00030 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007ffff6c12b7e in malloc_printerr (action=1, str=0x7ffff6cfc100 "free(): invalid size", ptr=<optimized out>) at malloc.c:4996
#4 0x00007ffff6c13886 in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3840
#5 0x0000000000424452 in ll_destroy_free_data (l=0x7fffd40b0aa0) at /home/user/oscam-svn/oscam-llist.c:78
#6 0x0000000000413144 in cacheex_free_csp_lastnodes (er=0x7fffd40c7630) at /home/user/oscam-svn/module-cacheex.h:36
#7 0x0000000000414c8d in free_push_in_ecm (ecm=0x7fffd40c7630) at /home/user/oscam-svn/oscam-ecm.c:671
#8 0x00000000004143dd in cw_process () at /home/user/oscam-svn/oscam-ecm.c:377
#9 0x00007ffff6f4b0ca in start_thread (arg=0x7ffff7f5c700) at pthread_create.c:309
#10 0x00007ffff6c8006d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

comment:10 by nm1, 10 years ago

Maybe:
An ECM is still in cl->joblist while the same ECM is inside the function free_ecm,
so a segmentation fault is possible.

This part of OSCam is not thread safe.

An ECM is inside free_ecm with ecm->matching_rdr = NULL while
the ECM is still being used in another function, for example in for(ea = er->matching_rdr; ea; ea = ea->next),
so a segmentation fault is possible.

An ECM is still being processed while it is being removed from memory,
so a segmentation fault is possible (the ECM was freed at the wrong time).
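
The scenarios in this comment share one pattern: one thread releases an ECM while another thread still works with it. One common way to rule that out is to count the holders under a lock and release the data only when the last holder lets go. The Python sketch below illustrates that idea only. It is not OSCam code: the class SharedEcm, its methods, and the reader names are hypothetical, and Python itself would not segfault here, so the flag "released" stands in for freed memory.

```python
import threading


class SharedEcm:
    """Hypothetical stand-in for an ECM record shared between threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._refs = 1                      # the creator holds the first reference
        self.released = False
        self.matching_rdr = ["reader-a", "reader-b"]

    def acquire(self):
        """Take a reference before using the record; fails once it is released."""
        with self._lock:
            if self.released:
                return False
            self._refs += 1
            return True

    def release(self):
        """Drop a reference; the data is torn down only by the last holder."""
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                self.matching_rdr = None
                self.released = True


if __name__ == "__main__":
    ecm = SharedEcm()
    ecm.acquire()              # a worker starts iterating matching_rdr
    ecm.release()              # the cleanup path drops the creator's reference
    print(ecm.matching_rdr)    # still valid: the worker has not released yet
    ecm.release()              # worker done, now the data is released
    print(ecm.released)
```

Whether OSCam could adopt such a scheme in its job list and cleanup paths is a separate question that this ticket does not answer.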

comment:11 by theparasol, 10 years ago

Yes, that might be, but why does only this user have issues?
There must be something unique in this OSCam config that triggers them.
Which clienttimeout is in use?

comment:12 by fz, 10 years ago

I use:
"clienttimeout = 2000" and "fallbacktimeout = 600".

Yesterday I had a crash.
Today I tested a better configuration (of readers of users) and I have fewer crashes (r9792).

Do you think the latest SVN version (r9794) resolves this?

comment:13 by Deas, 10 years ago

Maybe you should go back to a revision before r9785, just to be sure...

comment:14 by theparasol, 10 years ago

clienttimeout 2000 means garbage is cleaned after 5 seconds. Increase it to 8000 and observe whether it makes any difference to the frequency of the crashes.

comment:15 by fz, 10 years ago

Today I ran some tests with "clienttimeout = 8000", as theparasol suggested.

Yesterday I had no crashes (with r9792). I don't know why.
Today I had some crashes with r9796, with the same config.

While collecting backtraces today, I also saw these entries in the live log:

2014/07/16 14:09:45 97B100 p WARNING: job queue reader cache3-USER1 has more than 2000 jobs! count=2001, dropped!
2014/07/16 14:09:45 97B100 p WARNING: job queue reader cache3-USER2 has more than 2000 jobs! count=2001, dropped!
2014/07/16 14:09:45 97B100 p WARNING: job queue reader cache3-USER3 has more than 2000 jobs! count=2001, dropped!
2014/07/16 14:09:45 97B100 p WARNING: job queue reader cache3-USER4 has more than 2000 jobs! count=2001, dropped!

So I think the cache counter is too small and I want to increase it, but the "max_count" option in the [cache] section is deprecated.
I would like to use this option with "max_count = 10000".

Someone in a forum said that I must change the job limit in oscam-work.c from 2000 to 2500, but I think the best solution would be a NEW CONFIGURATION VALUE in oscam.conf, so that the limit can be increased or decreased for problems like this.
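
To make the request concrete, here is a minimal Python sketch of a job queue whose limit is a parameter instead of a hard-coded 2000. It is an illustration only and not taken from oscam-work.c: the class JobQueue and its fields are hypothetical, and it assumes that the job being added is the one that gets dropped once the limit is reached, which the warning text suggests but does not spell out.

```python
from collections import deque


class JobQueue:
    """Hypothetical per-client job queue with a configurable upper bound."""

    def __init__(self, max_jobs=2000):
        self.max_jobs = max_jobs    # would come from a config value in this proposal
        self.jobs = deque()
        self.dropped = 0

    def add_job(self, job):
        """Queue a job, or drop it and count the drop when the queue is full."""
        if len(self.jobs) >= self.max_jobs:
            self.dropped += 1
            return False
        self.jobs.append(job)
        return True


if __name__ == "__main__":
    queue = JobQueue(max_jobs=3)
    accepted = [queue.add_job(n) for n in range(5)]
    print(accepted, queue.dropped)
```

A larger limit only lets more jobs wait in the queue; it does not make the consumer drain them any faster.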

I attach some backtraces:

1) Backtrace

#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:41
#1 0x0000000000411848 in check_is_pushed (cw=0x0, cl=0xa583c0) at /home/user/oscam-svn-debug/oscam-cache.c:85
#2 0x000000000045b272 in camd35_cache_push_chk (cl=0xa583c0, er=0x7fffd41a3050) at /home/user/oscam-svn-debug/module-camd35.c:671
#3 0x0000000000422ef8 in work_thread (ptr=0xa58060) at /home/user/oscam-svn-debug/oscam-work.c:371
#4 0x00007ffff6f4b0a4 in start_thread (arg=0x7fffb76e2700) at pthread_create.c:309
#5 0x00007ffff6c8004d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

2) Backtrace

#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:41
#1 0x0000000000411848 in check_is_pushed (cw=0x0, cw@entry=<error reading variable: Cannot access memory at address 0x7ffff41e0d58>, cl=0x9b5bb0) at /home/user/oscam-svn-debug/oscam-cache.c:85

3) Backtrace

#0 0x00000000004350b4 in garbage_collector () at /home/user/oscam-svn-debug/oscam-garbage.c:108
#1 0x00007ffff6f4b0a4 in start_thread (arg=0x7ffff7fd1700) at pthread_create.c:309
#2 0x00007ffff6c8004d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

4) Backtrace

#0 0x00007ffff6bcf407 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff6bd07e8 in __GI_abort () at abort.c:89
#2 0x00007ffff6c0d394 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7ffff6d00010 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007ffff6c12b6e in malloc_printerr (action=1, str=0x7ffff6d00038 "munmap_chunk(): invalid pointer", ptr=<optimized out>) at malloc.c:4996
#4 0x00000000004351c9 in garbage_collector () at /home/user/oscam-svn-debug/oscam-garbage.c:132
#5 0x00007ffff6f4b0a4 in start_thread (arg=0x7ffff7fd1700) at pthread_create.c:309
#6 0x00007ffff6c8004d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

comment:16 by theparasol, 10 years ago

Resolution: invalid
Status: new → closed

Just as I thought, you have far too many clients!
You can increase the total clients if you want, but realize that no client is interested in old or delayed ECM responses. So instead of fixing the problem you make it worse.
You have to bring down the total number of clients on the OSCam instance.
Fewer hops could work too.

For that reason I am closing this ticket.
