Plink Error: Multiple instances of '_' in sample ID

Preface

I was converting a VCF file to PLINK format with the following command:

plink --vcf  snp.vcf --recode --allow-extra-chr --out test

An error occurred:

Error: Multiple instances of '_' in sample ID.
If you do not want '_' to be treated as a FID/IID delimiter, use --double-id or
--const-fid to choose a different method of converting VCF sample IDs to PLINK
IDs, or --id-delim to change the FID/IID delimiter.

Cause

There is a hint in the error message.

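By default, PLINK treats the underscore as the delimiter between the family ID (FID) and the within-family sample ID (IID) when it reads VCF sample names, and writes the two parts to the first two columns of the .ped file. A sample name with exactly one underscore is split at that underscore; a name with no underscore is used for both IDs. A name with two or more underscores cannot be split unambiguously, so PLINK stops with the error above.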
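To see which sample names trigger the error, it can help to count the underscores in each one. The following is a minimal sketch, assuming a tab-delimited, uncompressed VCF whose sample names start at column 10 of the #CHROM header line; the function name count_underscores is made up for this example.

```shell
# List each sample ID in a VCF together with the number of underscores in it.
# Sample IDs are columns 10 and onward of the header line starting with "#CHROM".
count_underscores() {
    awk -F'\t' '/^#CHROM/ {
        for (i = 10; i <= NF; i++) {
            id = $i
            n = gsub(/_/, "_", id)   # gsub returns the number of matches
            print $i "\t" n
        }
        exit                         # nothing else is needed after the header
    }' "$1"
}

# Usage: count_underscores snp.vcf
```

Any sample listed with a count of 2 or more is one that PLINK cannot split under its default settings.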

Solution

Method 1: Modify the sample name

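Replace the underscores in the sample names with another character. The sample names are on the header line that starts with #CHROM; its line number varies from file to file, so it is safer to match that line by pattern than by number (sed -i as written here is GNU sed syntax):

sed -i '/^#CHROM/s/_/-/g' snp.vcf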

Method 2: Use --id-delim

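The --id-delim parameter sets the character used to split each sample name into family ID and sample ID. The default is an underscore; pass a different character, one that occurs exactly once in every sample name, so that names containing several underscores are split correctly.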
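As a rough sketch, the conversion with a custom delimiter could be wrapped like this. It assumes your sample names look like FAMILY-SAMPLE_NAME, with exactly one hyphen each; the function name vcf_to_ped_with_delim is hypothetical, and only the PLINK flags themselves come from the error message and the original command.

```shell
# Sketch: convert a VCF to .ped/.map, splitting sample IDs at a custom delimiter.
# Assumes every sample ID contains the delimiter exactly once.
vcf_to_ped_with_delim() {
    vcf=$1      # input VCF, e.g. snp.vcf
    delim=$2    # FID/IID delimiter, e.g. -
    out=$3      # output prefix, e.g. test
    plink --vcf "$vcf" --recode --allow-extra-chr --id-delim "$delim" --out "$out"
}

# Usage: vcf_to_ped_with_delim snp.vcf - test
```

If your sample names have no character that occurs exactly once, Method 3 is the simpler choice.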

Method 3: Add the --double-id or --const-fid parameter

PLINK offers two parameters that set the family ID without splitting the sample name.

The first, --double-id, uses the whole sample name as both the family ID and the sample ID. In plant genome analysis, where pedigree information is often ignored, adding this parameter is enough:

plink --vcf snp.vcf --recode --allow-extra-chr --double-id --out test

The second, --const-fid, uses the sample name as the sample ID and sets the family ID of every sample to a constant (the default value is 0).
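To make the difference between the options concrete, here is a small illustration of how each one maps a VCF sample name to a family ID and a sample ID. It is not PLINK code, only a sketch of the behaviour described above, and the function name map_sample_id is invented for this example.

```shell
# Illustrative only: mimic how each option turns a VCF sample ID into "FID IID".
map_sample_id() {
    mode=$1
    id=$2
    delim=${3:-_}   # the default delimiter is an underscore
    case $mode in
        double-id)
            # Family ID and sample ID are both the full sample name.
            printf '%s %s\n' "$id" "$id" ;;
        const-fid)
            # Family ID is the constant 0; sample ID is the full sample name.
            printf '0 %s\n' "$id" ;;
        id-delim)
            # Count the delimiters; exactly one is required to split the name.
            n=$(printf '%s' "$id" | tr -cd "$delim" | wc -c)
            if [ "$((n))" -ne 1 ]; then
                echo "error: '$id' does not contain exactly one '$delim'" >&2
                return 1
            fi
            printf '%s %s\n' "${id%%"$delim"*}" "${id#*"$delim"}" ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1 ;;
    esac
}

# Usage:
#   map_sample_id double-id pop1_s_01
#   map_sample_id const-fid pop1_s_01
#   map_sample_id id-delim  fam1_s01
```

The real conversion with a constant family ID is the original command with the extra flag: plink --vcf snp.vcf --recode --allow-extra-chr --const-fid --out test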
