First, the kernel printed the following information. Part of it appeared on the serial console; dmesg showed a bit more:
BUG: Bad page map in process XXX pte:800000036fae6227 pmd:35be8c067
addr:00007f3fa75c0000 vm_flags:00200070 anon_vma:(null) mapping:(null) index:7f3fa75c0
Pid: 1312, comm: XXX Not tainted 2.6.32.27 #1
Call Trace:
[<ffffffff815a3570>] print_bad_pte+0x1e2/0x1fb
[<ffffffff811063ee>] vm_normal_page+0x6e/0x80
[<ffffffff81107117>] unmap_vmas+0x5b7/0x9f0
[<ffffffff8106edba>] ? dequeue_signal+0xda/0x170
[<ffffffff8110cb5c>] unmap_region+0xcc/0x170
[<ffffffff8110e405>] do_munmap+0x305/0x3a0
[<ffffffff8110f183>] sys_munmap+0x53/0x80
[<ffffffff8100c082>] system_call_fastpath+0x16/0x1b
Disabling lock debugging due to kernel taint
Don't read too much into the kernel version in the trace: switching to a 3.16.44 kernel reproduces the same problem.
The stack seems to carry a lot of information, but it was of little help in locating the bug, and at one point it led in the wrong direction.
Stack analysis: the stack is clearly a system call, and its user-space entry point is munmap(). Searching the program code turned up only a few call sites, and reading them revealed nothing wrong. So breakpoints were set in the live process: since the stack always appeared in a business thread, munmap() was broken in every business thread: break munmap thread idx, then break munmap thread idy, and so on. The crash did not follow every breakpoint hit; the stack only appeared after several hits. That pattern raised the first suspicion of a kernel memory problem. Then display *(0x00007f3fa75c0000) was set, so that each time the munmap() breakpoint fired, gdb tried to read the value at that address. Once an interrupt was triggered, the stack above was printed immediately (without the display, the print only appeared after continue ran through munmap()), which made a kernel memory problem all but certain.
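The breakpoint-and-display workflow above can be sketched as a gdb session. The thread ids idx/idy are the text's own placeholders, the address comes from the kernel log, and the cast to long * is my assumption (gdb normally needs a pointer type to dereference a raw address):

```gdb
# Sketch of the session described above; thread ids are placeholders.
(gdb) break munmap thread idx
(gdb) break munmap thread idy
# Re-read the faulting address at every stop:
(gdb) display *(long *)0x00007f3fa75c0000
(gdb) continue
```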
With a "trust the kernel first, doubt your own code" mindset, the first suspect was a driver, and the first driver that came to mind was netmap. Check the memory map of process XXX: cat /proc/1312/numa_maps
...
7f4020021000 default
7f40275c0000 default file=/dev/Vnetmap
7f40a8000000 default anon=3 dirty=3 N0=3
7f40a8021000 default
...
To our surprise, 0x7f40275c0000 - 0x00007f3fa75c0000 = 0x80000000: the netmap mapping address minus the faulting address (the one given to display above) is exactly 2G, every time. Completely baffled as to what kernel mechanism could produce this, we went off to read the kernel memory-management code, focusing on the mmap() system call path. A rough sketch of the mmap() flow:
do_mmap_pgoff
|
|--get_unmapped_area
|--...
|--mmap_region
|
|--find_vma_prepare
|--...
|--file->f_op->mmap
|--...
The file->f_op->mmap above is the file-specific mmap operation. Note the device file /dev/Vnetmap: when the user program opens it and passes its file descriptor fd to mmap(), the call reaches Vnetmap's custom file->f_op->mmap, which is implemented in the Vnetmap driver. The problem is there.
Analysis: the environment is a 64-bit x86 platform. Vnetmap is a driver module running in kernel mode; it maps memory allocated in the kernel, piece by piece, into the user virtual address space. It uses a variable of type int as the running count of the virtual-address offset. Once the offset exceeds 2G, the int flips sign and becomes negative, so the offset turns into -2G, which matches the observation above.
Understanding: (this part was not verified against the source code; it is pure analysis, just enough to convince myself, since there was not much time and other work to do.) The kernel memory beyond the 2G offset does get wired into a page table (erroneously), but it is not visible within any VMA. When the erroneously associated user-space region is later reallocated, the kernel finds a page-table entry already associated with a page frame where none should be, and prints the error above.
Contact: [email protected]