Package org.snpeff.vcf
Class VcfEntry
- java.lang.Object
  - org.snpeff.interval.Interval
    - org.snpeff.interval.Marker
      - org.snpeff.vcf.VcfEntry
- All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, java.lang.Comparable<Interval>, java.lang.Iterable<VcfGenotype>, TxtSerializable
public class VcfEntry extends Marker implements java.lang.Iterable<VcfGenotype>
A VCF entry (a line) in a VCF file.
- Author:
- pablocingolani
- See Also:
- Serialized Form
Nested Class Summary
Nested Classes (Modifier and Type, Class):
- static class VcfEntry.AlleleFrequencyType
Field Summary
Fields (Modifier and Type, Field):
- static double ALLELE_FEQUENCY_COMMON
- static double ALLELE_FEQUENCY_LOW
- protected java.lang.String[] alts
- protected java.lang.String altStr
- protected java.lang.String chromosomeName
- static java.lang.String[] EMPTY_STRING_ARRAY
- protected java.lang.String filter
- static java.lang.String FILTER_PASS
- protected java.lang.String format
- protected java.lang.String[] formatFields
- protected java.lang.String[] genotypeFields
- protected java.lang.String genotypeFieldsStr
- protected byte[] genotypeScores
- protected java.util.HashMap<java.lang.String,java.lang.String> info
- static java.util.regex.Pattern INFO_KEY_PATTERN
- protected java.lang.String infoStr
- protected java.lang.String line
- protected int lineNum
- protected java.lang.Double quality
- protected java.lang.String ref
- static java.lang.String SUB_FIELD_SEP
- protected java.util.LinkedList<Variant> variants
- static java.lang.String VCF_ALT_MISSING_REF
- static java.lang.String[] VCF_ALT_MISSING_REF_ARRAY
- static java.lang.String VCF_ALT_NON_REF
- static java.lang.String[] VCF_ALT_NON_REF_ARRAY
- static java.lang.String VCF_ALT_NON_REF_gVCF
- static java.lang.String[] VCF_ALT_NON_REF_gVCF_ARRAY
- static java.lang.String VCF_INFO_END
- static java.lang.String VCF_INFO_HETS
- static java.lang.String VCF_INFO_HOMS
- static java.lang.String VCF_INFO_NAS
- static java.lang.String VCF_INFO_PRIVATE
- protected java.util.List<VcfEffect> vcfEffects
- protected VcfFileIterator vcfFileIterator
- protected java.util.ArrayList<VcfGenotype> vcfGenotypes
- static char WITHIN_FIELD_SEP
Fields inherited from class org.snpeff.interval.Interval
chromosomeNameOri, end, id, parent, start, strandMinus
Constructor Summary
Constructors (Constructor, Description):
- VcfEntry(VcfFileIterator vcfFileIterator, java.lang.String line, int lineNum, boolean parseNow): Create an entry from a line read by a file iterator.
- VcfEntry(VcfFileIterator vcfFileIterator, Marker parent, java.lang.String chromosomeName, int start, java.lang.String id, java.lang.String ref, java.lang.String altsStr, double quality, java.lang.String filterPass, java.lang.String infoStr, java.lang.String format)
Method Summary
Methods (Modifier and Type, Method, Description):
- void addFilter(java.lang.String filterStr): Add a string to the FILTER field.
- void addFormat(java.lang.String formatName): Add a 'FORMAT' field.
- void addGenotype(java.lang.String vcfGenotypeStr): Add a genotype as a string.
- void addInfo(java.lang.String key, java.lang.String value): Add a "key=value" tuple to the INFO field.
- VcfEntry.AlleleFrequencyType alleleFrequencyType(): Categorization by allele frequency.
- java.lang.Boolean calcHetero(): Is this entry heterozygous? Infer Hom/Het if there is only one sample in the file.
- java.lang.String check(): Perform several simple checks and report problems (if any).
- static java.lang.String cleanUnderscores(java.lang.String s): Return a string without leading, trailing and duplicated underscores.
- Cds cloneShallow(): Perform a shallow clone.
- boolean compressGenotypes(): Compress genotypes into "HO/HE/NA" INFO fields.
- boolean delFilter(java.lang.String filterStr): Remove a string from the FILTER field.
- int getAltIndex(java.lang.String alt): Get the index of the matching ALT entry.
- java.lang.String[] getAlts()
- java.lang.String getAltsStr(): Create a comma-separated ALTs string.
- java.lang.String getChromosomeNameOri(): Original chromosome name (as it appeared in the VCF file).
- java.lang.String getFilter()
- java.lang.String getFormat()
- java.lang.String[] getFormatFields()
- byte[] getGenotypesScores(): Return genotypes parsed as an array of codes.
- java.lang.String getInfo(java.lang.String key): Get an INFO string.
- java.lang.String getInfo(java.lang.String key, java.lang.String allele): Get the INFO string for a specific allele.
- java.lang.String getInfo(java.lang.String key, Variant var): Get an INFO field matching a variant.
- boolean getInfoFlag(java.lang.String key): Does the entry exist?
- double getInfoFloat(java.lang.String key): Get an INFO field as a 'double' number. The VCF specification calls this data type 'Float', which is why the method name may not be intuitive.
- long getInfoInt(java.lang.String key): Get an INFO field as a 'long' number. The VCF specification calls this data type 'Integer', which is why the method name may not be intuitive.
- java.util.Set<java.lang.String> getInfoKeys(): Get all keys available in the INFO field.
- java.lang.String getInfoStr(): Get the full (unparsed) INFO field.
- java.lang.String getLine(): Original VCF line (from file).
- int getLineNum()
- int getNumberOfSamples(): Number of samples in this VCF file.
- double getQuality()
- java.lang.String getRef()
- java.lang.String getStr()
- java.util.List<VcfEffect> getVcfEffects()
- java.util.List<VcfEffect> getVcfEffects(EffFormatVersion formatVersion): Parse the 'EFF' INFO field and get a list of effects.
- VcfFileIterator getVcfFileIterator()
- VcfGenotype getVcfGenotype(int index)
- java.util.List<VcfGenotype> getVcfGenotypes()
- VcfHeaderInfo getVcfInfo(java.lang.String id): Get the VcfInfo type for a given ID.
- VcfInfoType getVcfInfoNumber(java.lang.String id): Get the INFO number for a given ID.
- boolean hasField(java.lang.String filedName)
- boolean hasGenotypes()
- boolean hasInfo(java.lang.String infoFieldName)
- boolean hasQuality()
- boolean isBiAllelic(): Is this bi-allelic (based ONLY on the number of ALTs)? WARNING: use the 'calcHetero()' method for a more precise calculation.
- boolean isCompressedGenotypes(): Do we have compressed genotypes in "HO,HE,NA" INFO fields?
- static boolean isEmpty(java.lang.String value): Does 'value' represent an EMPTY / MISSING value in a VCF field (or multiple MISSING comma-separated values)?
- boolean isFilterPass()
- boolean isMultiallelic(): Is this multi-allelic (based ONLY on the number of ALTs)? WARNING: use the 'calcHetero()' method for a more precise calculation.
- protected boolean isShowWarningIfParentDoesNotInclude(): Show an error if parent does not include child?
- boolean isSingleSnp(): Is this a VCF entry with a single SNP?
- boolean isSingleton(): Is this variant a singleton (appears in only one genotype)?
- static boolean isValidInfoKey(java.lang.String key): Make sure the INFO key matches the regular expression (as specified in VCF spec 4.3).
- static boolean isValidInfoValue(java.lang.String value): Check that this value can be added to an INFO field.
- boolean isVariant(): Is this a change, or are the ALTs actually the same as the reference?
- boolean isVariant(java.lang.String alt): Is this ALT string a variant?
- java.util.Iterator<VcfGenotype> iterator()
- int mac(): Calculate the minor allele count.
- double maf(): Calculate the minor allele frequency.
- void parse(): Parse a 'line' from a 'vcfFileIterator'.
- java.util.List<VcfLof> parseLof(): Parse LOF from VcfEntry.
- java.util.List<VcfNmd> parseNmd(): Parse NMD from VcfEntry.
- void removeInfo(java.lang.String key): Remove an INFO field.
- boolean rmInfo(java.lang.String info): Parse INFO fields.
- void setFilter(java.lang.String filter)
- void setFormat(java.lang.String format)
- void setGenotypeStr(java.lang.String genotypeFieldsStr)
- void setLineNum(int lineNum)
- java.lang.String toStr(): Convert to a simple string in "CHR:START_REF/ALTs" format.
- java.lang.String toString()
- java.lang.String toStringNoGt(): Show only the first eight fields (no genotype entries).
- VcfEntry uncompressGenotypes(): Uncompress a VCF entry having genotypes in "HO,HE,NA" fields.
- java.util.List<Variant> variants(): Create a list of variants from this VcfEntry.
- static java.lang.String vcfInfoDecode(java.lang.String str): Decode an INFO value.
- static java.lang.String vcfInfoEncode(java.lang.String str): Encode a string to be used in an 'INFO' field value. From the VCF 4.3 specification: characters with special meaning (such as the field delimiters ';' in INFO or ':' in FORMAT fields) must be represented using the capitalized percent encoding: %3A : (colon), %3B ; (semicolon), %3D = (equal sign), %25 % (percent sign), %2C , (comma), %0D CR, %0A LF, %09 TAB.
- static java.lang.String vcfInfoKeySafe(java.lang.String str): Return a string safe to be used in an 'INFO' field key.
- static java.lang.String vcfInfoValueSafe(java.lang.String str): Return a string safe to be used in an 'INFO' field value.
Methods inherited from class org.snpeff.interval.Marker
adjust, apply, applyDel, applyDup, applyIns, applyMixed, clone, codonTable, compareTo, compareToPos, distance, distanceBases, getParent, getType, idChain, idChain, idChain, includes, intersect, isAdjustIfParentDoesNotInclude, isDeferredAnalysis, minus, query, query, readTxt, serializeParse, serializeSave, shouldApply, union, variantEffect, variantEffectNonRef
Methods inherited from class org.snpeff.interval.Interval
equals, findParent, getChromosome, getChromosomeName, getChromosomeNum, getEnd, getGenome, getGenomeName, getId, getStart, getStrand, hashCode, intersects, intersects, intersects, intersects, intersectSize, isCircular, isSameChromo, isStrandMinus, isStrandPlus, isValid, setChromosomeNameOri, setEnd, setId, setParent, setStart, setStrandMinus, shiftCoordinates, size, toStringAsciiArt, toStrPos
Field Detail
-
FILTER_PASS
public static final java.lang.String FILTER_PASS
- See Also:
- Constant Field Values
-
WITHIN_FIELD_SEP
public static final char WITHIN_FIELD_SEP
- See Also:
- Constant Field Values
-
SUB_FIELD_SEP
public static final java.lang.String SUB_FIELD_SEP
- See Also:
- Constant Field Values
-
EMPTY_STRING_ARRAY
public static final java.lang.String[] EMPTY_STRING_ARRAY
-
ALLELE_FEQUENCY_COMMON
public static final double ALLELE_FEQUENCY_COMMON
- See Also:
- Constant Field Values
-
ALLELE_FEQUENCY_LOW
public static final double ALLELE_FEQUENCY_LOW
- See Also:
- Constant Field Values
-
INFO_KEY_PATTERN
public static final java.util.regex.Pattern INFO_KEY_PATTERN
-
VCF_INFO_END
public static final java.lang.String VCF_INFO_END
- See Also:
- Constant Field Values
-
VCF_ALT_NON_REF
public static final java.lang.String VCF_ALT_NON_REF
- See Also:
- Constant Field Values
-
VCF_ALT_NON_REF_gVCF
public static final java.lang.String VCF_ALT_NON_REF_gVCF
- See Also:
- Constant Field Values
-
VCF_ALT_MISSING_REF
public static final java.lang.String VCF_ALT_MISSING_REF
- See Also:
- Constant Field Values
-
VCF_ALT_NON_REF_gVCF_ARRAY
public static final java.lang.String[] VCF_ALT_NON_REF_gVCF_ARRAY
-
VCF_ALT_NON_REF_ARRAY
public static final java.lang.String[] VCF_ALT_NON_REF_ARRAY
-
VCF_ALT_MISSING_REF_ARRAY
public static final java.lang.String[] VCF_ALT_MISSING_REF_ARRAY
-
VCF_INFO_HOMS
public static final java.lang.String VCF_INFO_HOMS
- See Also:
- Constant Field Values
-
VCF_INFO_HETS
public static final java.lang.String VCF_INFO_HETS
- See Also:
- Constant Field Values
-
VCF_INFO_NAS
public static final java.lang.String VCF_INFO_NAS
- See Also:
- Constant Field Values
-
VCF_INFO_PRIVATE
public static final java.lang.String VCF_INFO_PRIVATE
- See Also:
- Constant Field Values
-
alts
protected java.lang.String[] alts
-
altStr
protected java.lang.String altStr
-
chromosomeName
protected java.lang.String chromosomeName
-
filter
protected java.lang.String filter
-
format
protected java.lang.String format
-
formatFields
protected java.lang.String[] formatFields
-
genotypeFields
protected java.lang.String[] genotypeFields
-
genotypeFieldsStr
protected java.lang.String genotypeFieldsStr
-
genotypeScores
protected byte[] genotypeScores
-
info
protected java.util.HashMap<java.lang.String,java.lang.String> info
-
infoStr
protected java.lang.String infoStr
-
line
protected java.lang.String line
-
lineNum
protected int lineNum
-
quality
protected java.lang.Double quality
-
ref
protected java.lang.String ref
-
variants
protected java.util.LinkedList<Variant> variants
-
vcfEffects
protected java.util.List<VcfEffect> vcfEffects
-
vcfFileIterator
protected VcfFileIterator vcfFileIterator
-
vcfGenotypes
protected java.util.ArrayList<VcfGenotype> vcfGenotypes
-
-
Constructor Detail
-
VcfEntry
public VcfEntry(VcfFileIterator vcfFileIterator, Marker parent, java.lang.String chromosomeName, int start, java.lang.String id, java.lang.String ref, java.lang.String altsStr, double quality, java.lang.String filterPass, java.lang.String infoStr, java.lang.String format)
-
VcfEntry
public VcfEntry(VcfFileIterator vcfFileIterator, java.lang.String line, int lineNum, boolean parseNow)
Create an entry from a line read by a file iterator.
-
-
Method Detail
-
cleanUnderscores
public static java.lang.String cleanUnderscores(java.lang.String s)
Return a string without leading, trailing and duplicated underscores
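The rule stated above (collapse repeated underscores, then trim them from both ends) can be sketched in plain Java. This is a minimal illustration of the documented contract, not SnpEff's implementation; the class name `UnderscoreCleaner` is hypothetical.

```java
// Hypothetical standalone sketch of the cleanUnderscores() contract; not SnpEff code.
final class UnderscoreCleaner {

    /** Collapse runs of '_' into a single '_', then drop the leading and trailing '_'. */
    static String cleanUnderscores(String s) {
        String collapsed = s.replaceAll("_+", "_"); // "a___b" -> "a_b"
        int start = 0;
        int end = collapsed.length();
        // After collapsing, at most one underscore can remain at each end.
        if (start < end && collapsed.charAt(start) == '_') start++;
        if (end > start && collapsed.charAt(end - 1) == '_') end--;
        return collapsed.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(cleanUnderscores("__EFF__gene___name_")); // prints EFF_gene_name
    }
}
```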
-
isEmpty
public static boolean isEmpty(java.lang.String value)
Does 'value' represent an EMPTY / MISSING value in a VCF field? (or multiple MISSING comma-separated values)
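In VCF a missing value is written as ".", and a multi-valued field can be missing in every position (for example ".,."). A minimal sketch of that check follows; treating null and the empty string as missing is an assumption of this sketch, and `VcfMissingValues` is a hypothetical name.

```java
// Hypothetical sketch of a VCF missing-value check; not SnpEff code.
final class VcfMissingValues {

    /** True for ".", for comma-separated lists made only of "." (e.g. ".,."), and for null/"" (assumption). */
    static boolean isEmpty(String value) {
        if (value == null || value.isEmpty()) return true; // assumption: treat as missing
        for (String part : value.split(",", -1)) { // limit -1 keeps empty trailing parts
            if (!part.equals(".")) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(".,."));   // prints true
        System.out.println(isEmpty("0.5,.")); // prints false
    }
}
```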
-
isValidInfoKey
public static boolean isValidInfoKey(java.lang.String key)
Make sure the INFO key matches the regular expression (as specified in VCF spec 4.3)
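The VCF 4.3 specification gives the INFO key pattern `^([A-Za-z_][0-9A-Za-z_.]*|1000G)$`, where "1000G" is a legacy exception. The sketch below checks keys against that pattern with `java.util.regex`; whether the `INFO_KEY_PATTERN` field holds exactly this expression is not shown on this page, so treat that as an assumption. `InfoKeyCheck` is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of INFO key validation; not SnpEff code.
final class InfoKeyCheck {

    // INFO key pattern as written in the VCF 4.3 specification.
    private static final Pattern KEY = Pattern.compile("^([A-Za-z_][0-9A-Za-z_.]*|1000G)$");

    static boolean isValidInfoKey(String key) {
        return key != null && KEY.matcher(key).matches();
    }

    public static void main(String[] args) {
        for (String k : new String[] {"ANN", "1000G", "1000", "AF.raw", "bad key"}) {
            System.out.println(k + " -> " + isValidInfoKey(k));
        }
    }
}
```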
-
isValidInfoValue
public static boolean isValidInfoValue(java.lang.String value)
Check that this value can be added to an INFO field.
- Returns:
- true if OK, false if invalid value
-
vcfInfoDecode
public static java.lang.String vcfInfoDecode(java.lang.String str)
Decode INFO value
-
vcfInfoEncode
public static java.lang.String vcfInfoEncode(java.lang.String str)
Encode a string to be used in an 'INFO' field value. From the VCF 4.3 specification: characters with special meaning (such as the field delimiters ';' in INFO or ':' in FORMAT fields) must be represented using the capitalized percent encoding:
- %3A : (colon)
- %3B ; (semicolon)
- %3D = (equal sign)
- %25 % (percent sign)
- %2C , (comma)
- %0D CR
- %0A LF
- %09 TAB
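The encoding table above, together with its inverse (the job of `vcfInfoDecode`), can be sketched as a character-by-character mapping. Working per character avoids double-encoding the '%' produced by earlier replacements. This is a hypothetical illustration, not SnpEff's code; accepting lowercase hex when decoding is a leniency of this sketch, and unrecognized escapes are left untouched.

```java
// Hypothetical sketch of VCF 4.3 INFO percent-encoding; not SnpEff code.
final class InfoPercentCodec {

    /** Encode only the eight characters listed in the VCF 4.3 table, using capitalized %XX. */
    static String encode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case ':': sb.append("%3A"); break;
                case ';': sb.append("%3B"); break;
                case '=': sb.append("%3D"); break;
                case '%': sb.append("%25"); break;
                case ',': sb.append("%2C"); break;
                case '\r': sb.append("%0D"); break;
                case '\n': sb.append("%0A"); break;
                case '\t': sb.append("%09"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Reverse of encode(); any other '%' sequence is copied through unchanged. */
    static String decode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int decoded = (c == '%' && i + 2 < s.length())
                    ? unescape(s.substring(i + 1, i + 3).toUpperCase(java.util.Locale.ROOT))
                    : -1;
            if (decoded >= 0) {
                sb.append((char) decoded);
                i += 2; // skip the two hex digits
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    private static int unescape(String hex) {
        switch (hex) {
            case "3A": return ':';
            case "3B": return ';';
            case "3D": return '=';
            case "25": return '%';
            case "2C": return ',';
            case "0D": return '\r';
            case "0A": return '\n';
            case "09": return '\t';
            default: return -1; // not one of the listed escapes
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("a;b=c")); // prints a%3Bb%3Dc
    }
}
```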
-
vcfInfoKeySafe
public static java.lang.String vcfInfoKeySafe(java.lang.String str)
Return a string safe to be used in an 'INFO' field key
-
vcfInfoValueSafe
public static java.lang.String vcfInfoValueSafe(java.lang.String str)
Return a string safe to be used in an 'INFO' field value
-
addFilter
public void addFilter(java.lang.String filterStr)
Add string to FILTER field
-
addFormat
public void addFormat(java.lang.String formatName)
Add a 'FORMAT' field
-
addGenotype
public void addGenotype(java.lang.String vcfGenotypeStr)
Add a genotype as a string
-
addInfo
public void addInfo(java.lang.String key, java.lang.String value)
Add a "key=value" tuple to the INFO field.
- Parameters:
key - INFO key name
value - can be null if it is a boolean (flag) field.
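At the string level, adding an INFO entry amounts to appending "key=value" (or a bare "key" for a flag) with ';' as the separator. The sketch below shows only that string manipulation; it is hypothetical, it does not replace an existing key, and treating "." as an empty INFO column is an assumption. How `VcfEntry` also updates its parsed `info` map is not covered.

```java
// Hypothetical sketch of appending to a raw INFO column; not SnpEff code.
final class InfoAppender {

    /** Append "key=value", or a bare "key" flag when value is null, to an INFO string. */
    static String addInfo(String infoStr, String key, String value) {
        String entry = (value == null) ? key : key + "=" + value;
        // Assumption: "", null and "." all mean the INFO column is currently empty.
        if (infoStr == null || infoStr.isEmpty() || infoStr.equals(".")) return entry;
        return infoStr + ";" + entry; // INFO entries are separated by ';'
    }

    public static void main(String[] args) {
        String info = addInfo(".", "DP", "14");
        info = addInfo(info, "DB", null);
        System.out.println(info); // prints DP=14;DB
    }
}
```

Values containing ';', '=' or ',' would first need the percent encoding described under `vcfInfoEncode`.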
-
alleleFrequencyType
public VcfEntry.AlleleFrequencyType alleleFrequencyType()
Categorization by allele frequency
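A categorization like this is typically a threshold comparison on the minor allele frequency. The sketch below assumes three hypothetical categories and assumed thresholds of 0.05 and 0.01; the real cut-offs are the `ALLELE_FEQUENCY_COMMON` and `ALLELE_FEQUENCY_LOW` constants, whose values this page does not list, and the inclusive (`>=`) boundaries are also an assumption.

```java
// Hypothetical sketch of allele-frequency categorization; not SnpEff code.
final class AfCategory {

    /** Hypothetical category names standing in for VcfEntry.AlleleFrequencyType. */
    enum AfType { COMMON, LOW, RARE }

    // Assumed thresholds for illustration only.
    static final double COMMON_MIN = 0.05;
    static final double LOW_MIN = 0.01;

    /** Categorize a minor allele frequency, such as the value returned by maf(). */
    static AfType categorize(double maf) {
        if (maf >= COMMON_MIN) return AfType.COMMON;
        if (maf >= LOW_MIN) return AfType.LOW;
        return AfType.RARE;
    }

    public static void main(String[] args) {
        for (double maf : new double[] {0.20, 0.02, 0.001}) {
            System.out.println(maf + " -> " + categorize(maf));
        }
    }
}
```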
-
calcHetero
public java.lang.Boolean calcHetero()
Is this entry heterozygous? Infer Hom/Het if there is only one sample in the file. Otherwise the result is null.
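The rule above (answer only when there is exactly one sample, otherwise return null) can be sketched from raw GT strings such as "0/1" or "1|1". This is a hypothetical illustration: it handles bare GT values only (no other FORMAT subfields), and returning null for a missing '.' allele is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of single-sample heterozygosity inference; not SnpEff code.
final class HeteroSketch {

    /** TRUE if heterozygous, FALSE if homozygous, null unless there is exactly one sample. */
    static Boolean calcHetero(List<String> gts) {
        if (gts.size() != 1) return null;
        String[] alleles = gts.get(0).split("[/|]"); // '/' unphased, '|' phased
        for (String a : alleles) {
            if (a.equals(".")) return null; // assumption: missing call -> unknown
        }
        for (String a : alleles) {
            if (!a.equals(alleles[0])) return Boolean.TRUE; // two different alleles
        }
        return Boolean.FALSE;
    }

    public static void main(String[] args) {
        System.out.println(calcHetero(Arrays.asList("0/1")));        // prints true
        System.out.println(calcHetero(Arrays.asList("0/1", "1/1"))); // prints null
    }
}
```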
-
check
public java.lang.String check()
Perform several simple checks and report problems (if any).
-
cloneShallow
public Cds cloneShallow()
Description copied from class: Marker
Perform a shallow clone.
- Overrides:
cloneShallow in class Marker
-
compressGenotypes
public boolean compressGenotypes()
Compress genotypes into "HO/HE/NA" INFO fields
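Judging only from the names "HO/HE/NA" and the `VCF_INFO_HOMS`, `VCF_INFO_HETS` and `VCF_INFO_NAS` constants, a plausible layout lists the indexes of homozygous-ALT, heterozygous and missing samples, leaving homozygous-REF samples implicit. The sketch below assumes exactly that layout (0-based, comma-separated indexes) and covers the reverse direction (`uncompressGenotypes`) as well. It is hypothetical, and it is lossy: phasing and specific ALT allele numbers are not preserved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of HO/HE/NA genotype compression; layout is an assumption, not SnpEff code.
final class GenotypeCompression {

    /** Map diploid GT strings to HO/HE/NA lists of 0-based sample indexes. */
    static Map<String, String> compress(List<String> gts) {
        List<String> ho = new ArrayList<>(), he = new ArrayList<>(), na = new ArrayList<>();
        for (int i = 0; i < gts.size(); i++) {
            String[] a = gts.get(i).split("[/|]");
            if (Arrays.asList(a).contains(".")) na.add(String.valueOf(i));
            else if (!a[0].equals(a[a.length - 1])) he.add(String.valueOf(i));
            else if (!a[0].equals("0")) ho.add(String.valueOf(i));
            // homozygous-REF samples are not stored
        }
        Map<String, String> info = new LinkedHashMap<>();
        if (!ho.isEmpty()) info.put("HO", String.join(",", ho));
        if (!he.isEmpty()) info.put("HE", String.join(",", he));
        if (!na.isEmpty()) info.put("NA", String.join(",", na));
        return info;
    }

    /** Rebuild unphased GT strings; samples not listed default to "0/0". */
    static List<String> uncompress(Map<String, String> info, int numSamples) {
        List<String> gts = new ArrayList<>();
        for (int i = 0; i < numSamples; i++) gts.add("0/0");
        setAll(info.get("HO"), gts, "1/1");
        setAll(info.get("HE"), gts, "0/1");
        setAll(info.get("NA"), gts, "./.");
        return gts;
    }

    private static void setAll(String indexList, List<String> gts, String gt) {
        if (indexList == null) return;
        for (String idx : indexList.split(",")) gts.set(Integer.parseInt(idx), gt);
    }

    public static void main(String[] args) {
        List<String> gts = Arrays.asList("0/0", "0/1", "1/1", "./.");
        System.out.println(compress(gts)); // prints {HO=2, HE=1, NA=3}
    }
}
```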
-
delFilter
public boolean delFilter(java.lang.String filterStr)
Remove a string from FILTER field
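In VCF, FILTER is "PASS" when all filters passed, "." when no filters were applied, and otherwise a ';'-separated list of the filters that failed. The sketch below maintains that column for `addFilter`-style and `delFilter`-style operations. It is hypothetical: replacing "PASS"/"." with the first added code and reverting to "PASS" when the last code is removed are assumptions, not documented SnpEff behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of FILTER-column bookkeeping; not SnpEff code.
final class FilterField {
    private String filter; // raw FILTER column: "PASS", "." or codes such as "q10;s50"

    FilterField(String filter) {
        this.filter = filter;
    }

    /** Add a filter code, skipping duplicates. */
    void addFilter(String code) {
        if (hasNoFailedFilters()) {
            filter = code; // assumption: first failure replaces "PASS" or "."
        } else if (!codes().contains(code)) {
            filter = filter + ";" + code;
        }
    }

    /** Remove a filter code; returns false if it was not present. */
    boolean delFilter(String code) {
        List<String> codes = codes();
        if (!codes.remove(code)) return false;
        filter = codes.isEmpty() ? "PASS" : String.join(";", codes); // "PASS" fallback is an assumption
        return true;
    }

    String getFilter() {
        return filter;
    }

    private boolean hasNoFailedFilters() {
        return filter == null || filter.isEmpty() || filter.equals(".") || filter.equals("PASS");
    }

    private List<String> codes() {
        if (hasNoFailedFilters()) return new ArrayList<>();
        return new ArrayList<>(Arrays.asList(filter.split(";")));
    }

    public static void main(String[] args) {
        FilterField f = new FilterField("PASS");
        f.addFilter("q10");
        f.addFilter("s50");
        System.out.println(f.getFilter()); // prints q10;s50
        f.delFilter("q10");
        System.out.println(f.getFilter()); // prints s50
    }
}
```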
-
getAltIndex
public int getAltIndex(java.lang.String alt)
Get the index of the matching ALT entry.
- Returns:
- -1 if not found
-
getAlts
public java.lang.String[] getAlts()
-
getAltsStr
public java.lang.String getAltsStr()
Create a comma-separated ALTs string.
-
getChromosomeNameOri
public java.lang.String getChromosomeNameOri()
Original chromosome name (as it appeared in the VCF file).
- Overrides:
getChromosomeNameOri in class Interval
-
getFilter
public java.lang.String getFilter()
-
getFormat
public java.lang.String getFormat()
-
getFormatFields
public java.lang.String[] getFormatFields()
-
getGenotypesScores
public byte[] getGenotypesScores()
Return genotypes parsed as an array of codes
-
getInfo
public java.lang.String getInfo(java.lang.String key)
Get info string
-
getInfo
public java.lang.String getInfo(java.lang.String key, java.lang.String allele)
Get the INFO string for a specific allele.
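For an INFO field declared with Number=A (one comma-separated value per ALT allele), selecting the value for one allele means finding that allele's ALT index, as `getAltIndex` does, and picking the matching entry. The sketch assumes a Number=A field; a Number=R field would also carry a REF value first. How `VcfEntry` consults the header (for example via `getVcfInfoNumber`) is not shown here, and `PerAlleleInfo` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of per-allele INFO lookup for a Number=A field; not SnpEff code.
final class PerAlleleInfo {

    /** Return the value for 'allele', or null if it is not an ALT or the value count does not match. */
    static String infoForAllele(String infoValue, String[] alts, String allele) {
        if (infoValue == null) return null;
        int idx = Arrays.asList(alts).indexOf(allele); // -1 if not found, like getAltIndex()
        String[] values = infoValue.split(",");
        if (idx < 0 || values.length != alts.length) return null;
        return values[idx];
    }

    public static void main(String[] args) {
        String[] alts = {"T", "G"};
        System.out.println(infoForAllele("0.25,0.05", alts, "G")); // prints 0.05
    }
}
```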
-
getInfo
public java.lang.String getInfo(java.lang.String key, Variant var)
Get an INFO field matching a variant.
-
getInfoFlag
public boolean getInfoFlag(java.lang.String key)
Does the entry exist?
-
getInfoFloat
public double getInfoFloat(java.lang.String key)
Get an INFO field as a 'double' number. The VCF specification calls this data type 'Float', which is why the method name may not be intuitive.
-
getInfoInt
public long getInfoInt(java.lang.String key)
Get an INFO field as a 'long' number. The VCF specification calls this data type 'Integer', which is why the method name may not be intuitive.
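The typed INFO accessors (flag, Float as `double`, Integer as `long`) can be sketched on top of a simple INFO parser. In this hypothetical sketch, flags are stored with the placeholder value "true", and the fallbacks for missing or unparseable values (NaN for floats, 0 for integers) are assumptions; SnpEff's actual return values in those cases are not documented on this page. Multi-valued fields such as "0.5,0.1" are not split here and therefore fall back.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of INFO parsing with typed accessors; not SnpEff code.
final class InfoParser {

    /** Split "DP=14;DB;AF=0.5" into key -> value; flags get the placeholder value "true". */
    static Map<String, String> parse(String infoStr) {
        Map<String, String> info = new LinkedHashMap<>();
        if (infoStr == null || infoStr.isEmpty() || infoStr.equals(".")) return info;
        for (String field : infoStr.split(";")) {
            int eq = field.indexOf('=');
            if (eq < 0) info.put(field, "true");
            else info.put(field.substring(0, eq), field.substring(eq + 1));
        }
        return info;
    }

    static boolean getInfoFlag(Map<String, String> info, String key) {
        return info.containsKey(key);
    }

    /** VCF "Float" read as a double; NaN when missing or unparseable (assumption). */
    static double getInfoFloat(Map<String, String> info, String key) {
        String v = info.get(key);
        if (v == null) return Double.NaN;
        try { return Double.parseDouble(v); } catch (NumberFormatException e) { return Double.NaN; }
    }

    /** VCF "Integer" read as a long; 0 when missing or unparseable (assumption). */
    static long getInfoInt(Map<String, String> info, String key) {
        String v = info.get(key);
        if (v == null) return 0L;
        try { return Long.parseLong(v); } catch (NumberFormatException e) { return 0L; }
    }

    public static void main(String[] args) {
        Map<String, String> info = parse("DP=14;DB;AF=0.5");
        System.out.println(getInfoInt(info, "DP") + " " + getInfoFloat(info, "AF") + " " + getInfoFlag(info, "DB"));
    }
}
```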
-
getInfoKeys
public java.util.Set<java.lang.String> getInfoKeys()
Get all keys available in the info field
-
getInfoStr
public java.lang.String getInfoStr()
Get the full (unparsed) INFO field
-
getLine
public java.lang.String getLine()
Original VCF line (from file)
-
getLineNum
public int getLineNum()
-
getNumberOfSamples
public int getNumberOfSamples()
Number of samples in this VCF file.
-
getQuality
public double getQuality()
-
getRef
public java.lang.String getRef()
-
getStr
public java.lang.String getStr()
-
getVcfEffects
public java.util.List<VcfEffect> getVcfEffects()
-
getVcfEffects
public java.util.List<VcfEffect> getVcfEffects(EffFormatVersion formatVersion)
Parse 'EFF' info field and get a list of effects
-
getVcfFileIterator
public VcfFileIterator getVcfFileIterator()
-
getVcfGenotype
public VcfGenotype getVcfGenotype(int index)
-
getVcfGenotypes
public java.util.List<VcfGenotype> getVcfGenotypes()
-
getVcfInfo
public VcfHeaderInfo getVcfInfo(java.lang.String id)
Get VcfInfo type for a given ID
-
getVcfInfoNumber
public VcfInfoType getVcfInfoNumber(java.lang.String id)
Get Info number for a given ID
-
hasField
public boolean hasField(java.lang.String filedName)
-
hasGenotypes
public boolean hasGenotypes()
-
hasInfo
public boolean hasInfo(java.lang.String infoFieldName)
-
hasQuality
public boolean hasQuality()
-
isBiAllelic
public boolean isBiAllelic()
Is this bi-allelic (based ONLY on the number of ALTs)? WARNING: use the 'calcHetero()' method for a more precise calculation.
-
isCompressedGenotypes
public boolean isCompressedGenotypes()
Do we have compressed genotypes in "HO,HE,NA" INFO fields?
-
isFilterPass
public boolean isFilterPass()
-
isMultiallelic
public boolean isMultiallelic()
Is this multi-allelic (based ONLY on the number of ALTs)? WARNING: use the 'calcHetero()' method for a more precise calculation.
-
isShowWarningIfParentDoesNotInclude
protected boolean isShowWarningIfParentDoesNotInclude()
Description copied from class: Marker
Show an error if parent does not include child?
- Overrides:
isShowWarningIfParentDoesNotInclude in class Marker
-
isSingleSnp
public boolean isSingleSnp()
Is this a VCF entry with a single SNP?
-
isSingleton
public boolean isSingleton()
Is this variant a singleton (appears in only one genotype)?
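Reading "appears in only one genotype" literally, a singleton is a variant for which exactly one sample carries a non-reference allele. The sketch below implements that reading from raw GT strings; under it a single homozygous-ALT carrier also counts, which may differ from a count-based definition (allele count of 1). `SingletonSketch` is hypothetical and missing '.' alleles are ignored.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of singleton detection; not SnpEff code.
final class SingletonSketch {

    /** True if exactly one sample's GT contains a non-reference, non-missing allele. */
    static boolean isSingleton(List<String> gts) {
        int carriers = 0;
        for (String gt : gts) {
            for (String a : gt.split("[/|]")) {
                if (!a.equals("0") && !a.equals(".")) {
                    carriers++;
                    break; // count each sample at most once
                }
            }
        }
        return carriers == 1;
    }

    public static void main(String[] args) {
        System.out.println(isSingleton(Arrays.asList("0/0", "0/1", "0/0"))); // prints true
    }
}
```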
-
isVariant
public boolean isVariant()
Is this a change, or are the ALTs actually the same as the reference?
-
isVariant
public boolean isVariant(java.lang.String alt)
Is this ALT string a variant?
-
iterator
public java.util.Iterator<VcfGenotype> iterator()
- Specified by:
iterator in interface java.lang.Iterable<VcfGenotype>
-
mac
public int mac()
Calculate the minor allele count.
-
maf
public double maf()
Calculate the minor allele frequency.
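Both statistics follow from two counts over the called alleles: the ALT allele count AC and the total number of called alleles AN. The minor allele count is min(AC, AN - AC) and the minor allele frequency divides that by AN. The sketch assumes a bi-allelic site (any non-'0' allele counts as ALT), skips missing '.' alleles, and returns 0.0 when nothing was called; these are assumptions, not SnpEff's documented behavior.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of MAC/MAF from GT strings; not SnpEff code.
final class MinorAlleleStats {

    /** Returns {altCount, calledAlleles}; '.' alleles are skipped. */
    private static int[] counts(List<String> gts) {
        int alt = 0, called = 0;
        for (String gt : gts) {
            for (String a : gt.split("[/|]")) {
                if (a.equals(".")) continue;
                called++;
                if (!a.equals("0")) alt++; // bi-allelic assumption: any non-REF allele is ALT
            }
        }
        return new int[] {alt, called};
    }

    static int mac(List<String> gts) {
        int[] c = counts(gts);
        return Math.min(c[0], c[1] - c[0]);
    }

    static double maf(List<String> gts) {
        int[] c = counts(gts);
        return c[1] == 0 ? 0.0 : (double) Math.min(c[0], c[1] - c[0]) / c[1];
    }

    public static void main(String[] args) {
        List<String> gts = Arrays.asList("0/0", "0/1", "1/1", "0/0", "./.");
        System.out.println(mac(gts) + " " + maf(gts)); // prints 3 0.375
    }
}
```

In the usage example AC = 3 and AN = 8, so MAC = min(3, 5) = 3 and MAF = 3/8.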
-
parse
public void parse()
Parse a 'line' from a 'vcfFileIterator'
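A VCF data line is tab-separated: eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), then optionally FORMAT and one column per sample. The sketch below splits a line into those parts, mirroring fields such as `alts`, `quality`, `filter`, `infoStr` and `format`. It is hypothetical: it keeps POS exactly as written (1-based), and how `VcfEntry` maps POS onto the inherited `start` coordinate is not shown on this page.

```java
import java.util.Arrays;

// Hypothetical sketch of splitting a VCF data line into columns; not SnpEff code.
final class VcfLineSketch {
    final String chrom, id, ref, filter, info, format;
    final int pos;            // POS as written in the file (1-based)
    final String[] alts;      // ALT split on ','
    final Double quality;     // null when QUAL is "."
    final String[] genotypes; // one raw string per sample column

    VcfLineSketch(String line) {
        String[] f = line.split("\t", -1);
        if (f.length < 8) throw new IllegalArgumentException("Expected at least 8 columns, got " + f.length);
        chrom = f[0];
        pos = Integer.parseInt(f[1]);
        id = f[2];
        ref = f[3];
        alts = f[4].split(",");
        quality = f[5].equals(".") ? null : Double.valueOf(f[5]);
        filter = f[6];
        info = f[7];
        format = f.length > 8 ? f[8] : null;
        genotypes = f.length > 9 ? Arrays.copyOfRange(f, 9, f.length) : new String[0];
    }

    public static void main(String[] args) {
        VcfLineSketch v = new VcfLineSketch("1\t12345\trs1\tA\tT,G\t50\tPASS\tDP=10\tGT\t0/1\t1/2");
        System.out.println(v.chrom + ":" + v.pos + " " + String.join(",", v.alts)); // prints 1:12345 T,G
    }
}
```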
-
parseLof
public java.util.List<VcfLof> parseLof()
Parse LOF from VcfEntry
-
parseNmd
public java.util.List<VcfNmd> parseNmd()
Parse NMD from VcfEntry
-
removeInfo
public void removeInfo(java.lang.String key)
Remove INFO field
-
rmInfo
public boolean rmInfo(java.lang.String info)
Parse INFO fields
-
setFilter
public void setFilter(java.lang.String filter)
-
setFormat
public void setFormat(java.lang.String format)
-
setGenotypeStr
public void setGenotypeStr(java.lang.String genotypeFieldsStr)
-
setLineNum
public void setLineNum(int lineNum)
-
toStr
public java.lang.String toStr()
Convert to a simple string in "CHR:START_REF/ALTs" format.
-
toStringNoGt
public java.lang.String toStringNoGt()
Show only the first eight fields (no genotype entries).
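The first eight VCF columns are the fixed site columns CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO, so dropping genotypes amounts to keeping columns 1 to 8 of the tab-separated line. A minimal sketch on a raw line follows; `NoGenotypeView` is hypothetical, and `VcfEntry` may rebuild these columns from its parsed fields rather than from the original text.

```java
import java.util.Arrays;

// Hypothetical sketch of stripping FORMAT and sample columns; not SnpEff code.
final class NoGenotypeView {

    /** Keep only the eight fixed columns (CHROM..INFO) of a tab-separated VCF line. */
    static String toStringNoGt(String line) {
        String[] f = line.split("\t", -1);
        int n = Math.min(8, f.length);
        return String.join("\t", Arrays.copyOf(f, n));
    }

    public static void main(String[] args) {
        System.out.println(toStringNoGt("1\t10\t.\tA\tT\t.\tPASS\tDP=3\tGT\t0/1"));
    }
}
```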
-
uncompressGenotypes
public VcfEntry uncompressGenotypes()
Uncompress a VCF entry having genotypes in "HO,HE,NA" fields.
-
variants
public java.util.List<Variant> variants()
Create a list of variants from this VcfEntry
-
-