Tag Archives: pysam

[Solved] Python Read bam File Error: &&OSError: no BGZF EOF marker; file may be truncated

1. Read FASTA file:

(1) Method 1: Bio Library

from Bio import SeqIO

fa_seq = SeqIO.read("res/sequence1.fasta", "fasta")

seq = str(fa_seq.seq)

seqs = [fa.seq for fa in SeqIO.parse("res/multi.fasta", "fasta")]

(2) Method 2: pysam Library

fa = pysam.FastaFile(‘filename’)

fa.references

fa.lengths

seq = fa.fetch(reference=’chr21’, start=a, end=b)

seq = fa.fetch(reference=’chr21’)

2. Convert sequence to string: str (FA. SEQ)


print("G Counts: ", fa.count("G"))

print("reverse: ", fa[::-1])

print("Reverse complement: ", fa.complement())

3、pysam read bam file error:
File “pysam/libcalignmentfile.pyx”, line 742, in pysam.libcalignmentfile.AlignmentFile.__cinit__
File “pysam/libcalignmentfile.pyx”, line 952, in pysam.libcalignmentfile.AlignmentFile._open
File “pysam/libchtslib.pyx”, line 365, in pysam.libchtslib.HTSFile.check_truncation
OSError: no BGZF EOF marker; file may be truncated
The file can be read this way, i.e. by adding ignore_truncation=True:

bam = pysam.AlignmentFile(‘**.bam’, "rb", ignore_truncation=True)

4. Cram format file: it has the characteristics of high compression and has a higher compression rate than BAM. Most files in cram format may be used in the future

5. Comparison data (BAM/cram/SAM), variation data: (VCF/BCF)

6. Get each read

for read in bam:
    read.reference_name # The name of the chromosome to which the reference sequence is compared.

    read.pos # the position of the read alignment

    read.mapq # the quality value of the read comparison

    read.query_qualities # read sequence base qualities

    read.query_sequence # read sequence bases

    read.reference_length # 在reference genome上read比对的长度

7. Read cram file

cf = pysam.AlignmentFile(‘*.cram’, ‘rc’)

8. Read SAM file

samfile = pysam.AlignmentFile(‘**.sam’, ‘r’)

9. Get a region in Bam file

for r in pysam.AlignmentFile(‘*.bam’, ‘rb’).fetch(‘chr21’, 300, 310):
    pass  # This is done provided that the *.bam file is indexed