1. Background
Download the protein sequences from the UniProt database to the local machine, build a BLAST search database from that file, and then search the database with the blastp command.
2. Mistake
The steps below follow the instructions in the BLAST manual, which uses a mask file to speed up the search. The sequence file, sequences.fasta, was downloaded from UniProt in FASTA format, for example:
> sid1
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> sid2
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
………………………………………………………..
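Since -parse_seqids makes BLAST interpret the header text as a sequence ID, it can help to look at the headers before building anything. The following is only a minimal sketch: the temporary file and its three records are hypothetical stand-ins for sequences.fasta, and the duplicate-ID check is an illustrative sanity check, not a confirmed cause of the error described below.

```shell
# Hypothetical stand-in for sequences.fasta so the sketch runs on its own.
fasta=$(mktemp)
printf '>sid1\nMKVLAAGIVG\n>sid2\nMSTNPKPQRK\n>sid2\nMAAAA\n' > "$fasta"

# Number of sequences = number of header lines.
n_seqs=$(grep -c '^>' "$fasta")

# First token of each header (the part that would be read as the ID).
# uniq -d prints each duplicated ID once.
dup_ids=$(grep '^>' "$fasta" | sed 's/^> *//' | awk '{print $1}' | sort | uniq -d)

echo "sequences: $n_seqs"
echo "duplicate IDs: ${dup_ids:-none}"
```

To run the same check on the real file, set fasta=sequences.fasta instead of creating the temporary file.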
1) To create a mask file:
segmasker -in sequences.fasta -infmt fasta -outfmt maskinfo_asn1_bin \
-out refsequences.asnb -parse_seqids
2) Create database
makeblastdb -in sequences.fasta -input_type fasta -dbtype prot -out uniprot \
-title "Uniprot" -parse_seqids
3) Sequence alignment
blastp -db uniprot -query input.fasta -out output.txt -outfmt 7
Result: the search fails with the following error: BLAST Database error: Error pre-fetching sequences data
3. Solution
Do not use the -parse_seqids option when creating the mask file and the database:
segmasker -in sequences.fasta -infmt fasta -outfmt maskinfo_asn1_bin \
-out refsequences.asnb
makeblastdb -in sequences.fasta -input_type fasta -dbtype prot -out uniprot -title "Uniprot"
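
The corrected commands can be chained with the search step into one script, sketched below. The file names come from the text above. The run_pipeline function and the status variable are names introduced here for illustration, and the guard that skips the run when blastp or the input files are missing is an assumption about how one might want the script to behave.

```shell
#!/bin/sh
# Sketch of the corrected workflow (no -parse_seqids anywhere).
in=sequences.fasta
db=uniprot

run_pipeline() {
    # Each step runs only if the previous one succeeded.
    segmasker -in "$in" -infmt fasta -outfmt maskinfo_asn1_bin \
        -out refsequences.asnb &&
    makeblastdb -in "$in" -input_type fasta -dbtype prot \
        -out "$db" -title "Uniprot" &&
    blastp -db "$db" -query input.fasta -out output.txt -outfmt 7
}

# Assumption: skip quietly when BLAST+ or the input files are not present.
if command -v blastp >/dev/null 2>&1 && [ -f "$in" ] && [ -f input.fasta ]; then
    if run_pipeline; then status=ok; else status=failed; fi
else
    status=skipped
fi
echo "pipeline: $status"
```

If the script prints "pipeline: ok", output.txt should contain the tabular results instead of the pre-fetching error.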