Package org.snpeff.interval
Class Transcript
- All Implemented Interfaces:
Serializable, Cloneable, Comparable<Interval>, Iterable<Exon>, TxtSerializable
Interval for a transcript, plus related information: exons, UTRs, CDSs, etc.
- Author:
- pcingola
Field Summary
Fields inherited from class org.snpeff.interval.Interval
chromosomeNameOri, end, id, parent, start, strandMinus -
Constructor Summary
Constructors -
Method Summary
Modifier and Type Method: Description
int[] aaNumber2Pos(): Calculate chromosome position as a function of amino acid number. Note that this returns the chromosomal position of the first base of each amino acid
int aaNumber2Pos(int aaNum): Find the genomic position of the first base of amino acid 'aaNum'
void add: Add a CDS
void add: Add an intron
void add(SpliceSite spliceSite): Add a SpliceSite
void add: Add a UTR
boolean adjust(): Adjust transcript coordinates
apply: Create a new transcript after applying changes in variant
baseAt(int pos): Find base at genomic coordinate 'pos'
int baseNumber2MRnaPos(int pos): Calculate the distance from the transcript start to a position. mRNA is roughly the same as cDNA.
int baseNumberCds(int pos, boolean usePrevBaseIntron): Calculate the base number in a CDS where 'pos' maps
baseNumberCds2Codon(int cdsBaseNumber): Return a codon that includes 'cdsBaseNumber'
int[] baseNumberCds2Pos(): Calculate chromosome position as a function of CDS number
int baseNumberCds2Pos(int cdsBaseNum)
cds(): Retrieve coding sequence
cdsMarker: Create a marker of the coding region in this transcript
cloneShallow: Perform a shallow clone
int[] codonNumber2Pos(int codonNum): Return an array of 3 genomic positions where codon (amino acid) number 'codonNum' maps
boolean collapseZeroGap(): Collapses exons having gaps of zero (i.e. exons that are immediately followed by other exons)
double cpgExonBias(): Calculate CpG bias: number of CpG / expected[CpG]
int cpgExons(): Count total CpG in this transcript's exons
void createSpliceSites(int spliceSiteSize, int spliceRegionExonSize, int spliceRegionIntronMin, int spliceRegionIntronMax): Find all splice sites.
void createUpDownStream(int upDownLength): Creates a list of upstream/downstream regions (for each transcript). The upstream (downstream) region is defined as upDownLength before (after) the transcript
boolean deleteRedundant(): Deletes redundant exons (i.e. exons that are totally included in other exons)
findCds: Find a CDS that matches exactly the exon
findExon(int pos): Return an exon that intersects 'pos'
findExon: Return an exon intersecting 'marker' (first exon found)
findIntron(int pos): Return an intron overlapping position 'pos'
findUtr(int pos): Return the UTR that hits position 'pos'
findUtrs: Return the UTR that intersects 'marker' (null if not found)
boolean frameCorrection(): Correct exons based on frame information.
get3primeUtrs: Create a list of 3 prime UTRs
get3primeUtrsSorted
get5primeUtrs: Create a list of 5 prime UTRs
get5primeUtrsSorted
getBioType
getCds(): Get all CDSs
int getCdsEnd()
int getCdsStart()
getDownstream
getExons(): A more intuitive name for 'subintervals'
getFirstCodingExon: Get first coding exon
getGene()
getProteinId
String[] getTags()
getTranscriptSupportLevel
getTss(): Create a TSS marker
getUpstream
getUtrs(): Get all UTRs
getVersion
boolean hasCds()
boolean hasError(): Does this transcript have any errors?
boolean hasErrorOrWarning(): Does this transcript have any errors or warnings?
boolean hasProteinId()
boolean hasTag: Does this transcript have 'tag'?
boolean hasTags()
boolean hasTranscriptSupportLevelInfo()
boolean hasWarning(): Does this transcript have any warnings?
introns(): Get all introns (lazy init)
boolean isAaCheck()
protected boolean isAdjustIfParentDoesNotInclude(Marker parent): Adjust parent if it does not include child?
boolean isCanonical()
boolean isChecked(): Has this transcript been checked against CDS/DNA/AA sequences?
boolean isCorrected()
boolean isDnaCheck()
boolean isDownstream(int pos)
boolean isErrorProteinLength(): Check if coding length is a multiple of 3 in protein coding transcripts
boolean isErrorStartCodon(): Is the first codon a START codon?
boolean isErrorStopCodonsInCds(): Check if protein sequence has STOP codons in the middle of the coding sequence
boolean isIntron(int pos)
boolean isProteinCoding()
boolean isRibosomalSlippage()
boolean isUpstream(int pos)
boolean isUtr(int pos)
boolean isUtr
boolean isUtr3(int pos)
boolean isUtr5(int pos)
boolean isWarningStopCodon(): Is the last codon a STOP codon?
markers(): A list of all markers in this transcript
mRna(): Retrieve coding sequence AND the UTRs (mRNA = 5'UTR + CDS + 3'UTR), i.e. concatenate all exon sequences
protein(): Protein sequence (amino acid sequence produced by this transcript)
query: Query all genomic regions that intersect 'marker'
queryExon: Return the first exon that intersects 'interval' (null if not found)
boolean rankExons(): Assign ranks to exons
void reset(): Remove all intervals
void resetCache()
void resetExons()
sanityCheck(Variant variant): Perform some basic checks, return error type, if any
void serializeParse(MarkerSerializer markerSerializer): Parse a line from a serialized file
serializeSave(MarkerSerializer markerSerializer): Create a string to serialize to a file
void setAaCheck(boolean aaCheck)
void setBioType(BioType bioType)
void setCanonical(boolean canonical)
void setDnaCheck(boolean dnaCheck)
void setProteinCoding(boolean proteinCoding)
void setProteinId(String proteinId)
void setRibosomalSlippage(boolean ribosomalSlippage)
void setTags
void setTranscriptSupportLevel(TranscriptSupportLevel transcriptSupportLevel)
void setVersion(String version)
void sortCds()
spliceSites
toString()
toString(boolean full)
toStringAsciiArt(boolean full): Show a transcript as ASCII art
boolean utrFromCds(boolean verbose): Calculate UTR regions from CDSs
boolean variantEffect(Variant variant, VariantEffects variantEffects): Get some details about the effect on this transcript
Methods inherited from class org.snpeff.interval.IntervalAndSubIntervals
add, addAll, addAll, clone, containsId, get, invalidateSorted, iterator, numChilds, remove, setStrandMinus, shiftCoordinates, sorted, sortedStrand, subIntervals
Methods inherited from class org.snpeff.interval.Marker
adjust, applyDel, applyDup, applyIns, applyMixed, codonTable, compareTo, compareToPos, distance, distanceBases, getParent, getType, idChain, idChain, idChain, includes, intersect, isDeferredAnalysis, isShowWarningIfParentDoesNotInclude, minus, query, readTxt, shouldApply, union, variantEffectNonRef
Methods inherited from class org.snpeff.interval.Interval
equals, findParent, getChromosome, getChromosomeName, getChromosomeNameOri, getChromosomeNum, getEnd, getGenome, getGenomeName, getId, getStart, getStrand, hashCode, intersects, intersects, intersects, intersects, intersectSize, isCircular, isSameChromo, isStrandMinus, isStrandPlus, isValid, setChromosomeNameOri, setEnd, setId, setParent, setStart, size, toStr, toStringAsciiArt, toStrPos
Methods inherited from class java.lang.Object
equals, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Constructor Details
-
Transcript
public Transcript() -
Transcript
-
-
Method Details
-
aaNumber2Pos
public int[] aaNumber2Pos() Calculate chromosome position as a function of amino acid number. Note that this returns the chromosomal position of the first base of each amino acid. If you need the chromosomal position of each base
-
aaNumber2Pos
public int aaNumber2Pos(int aaNum) Find the genomic position of the first base of amino acid 'aaNum' -
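The mapping from an amino acid number to the genomic position of its first base can be illustrated with a minimal, standalone sketch. It is not the snpEff implementation: it assumes a plus-strand transcript, zero-based amino acid numbers, and coding segments given as sorted, inclusive `{start, end}` pairs. The class and method names (`AaToPosSketch`, `cdsBaseToPos`, `aaNumberToPos`) are hypothetical.

```java
// Hypothetical sketch, not snpEff code: plus strand only, zero-based indices.
final class AaToPosSketch {

    /** Maps a zero-based CDS base index to a genomic coordinate, or -1 if it falls outside the CDS. */
    static int cdsBaseToPos(int[][] cdsSegments, int cdsBase) {
        if (cdsBase < 0) return -1;
        int remaining = cdsBase;
        for (int[] seg : cdsSegments) {
            int len = seg[1] - seg[0] + 1; // inclusive coordinates
            if (remaining < len) return seg[0] + remaining;
            remaining -= len; // skip this segment and keep walking
        }
        return -1;
    }

    /** The first base of amino acid 'aaNum' is CDS base 3 * aaNum. */
    static int aaNumberToPos(int[][] cdsSegments, int aaNum) {
        return cdsBaseToPos(cdsSegments, 3 * aaNum);
    }
}
```

With coding segments `{100, 104}` and `{200, 209}`, amino acid 2 starts at CDS base 6, which lies one base into the second segment.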
add
Add a CDS -
add
Add an intron -
add
Add a SpliceSite -
add
Add a UTR -
adjust
public boolean adjust() Adjust transcript coordinates -
apply
Create a new transcript after applying changes in variant. Note: If this transcript is unaffected, no new transcript is created (the same transcript is returned)
- Overrides:
apply in class IntervalAndSubIntervals<Exon>
- Returns:
- The marker result after applying variant
-
baseAt
Find base at genomic coordinate 'pos' -
baseNumber2MRnaPos
public int baseNumber2MRnaPos(int pos) Calculate the distance from the transcript start to a position. mRNA is roughly the same as cDNA; strictly speaking, mRNA has a poly-A tail and a 5' cap. -
baseNumberCds
public int baseNumberCds(int pos, boolean usePrevBaseIntron) Calculate the base number in a CDS where 'pos' maps
- Parameters:
usePrevBaseIntron - When 'pos' is intronic, this method returns:
- if (usePrevBaseIntron == false) => the first base in the exon after 'pos' (i.e. first coding base after the intron)
- if (usePrevBaseIntron == true) => the last base in the exon before 'pos' (i.e. last coding base before the intron)
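The intronic behavior controlled by usePrevBaseIntron can be sketched as follows. This is an illustration under simplifying assumptions, not the snpEff code: plus strand only, every segment is fully coding, segments are sorted inclusive `{start, end}` pairs, and the result is a zero-based CDS base index. `BaseNumberCdsSketch` is a hypothetical name.

```java
// Hypothetical sketch, not snpEff code: plus strand, fully coding segments.
final class BaseNumberCdsSketch {

    /** Returns the zero-based CDS base index for 'pos', or -1 if 'pos' is outside the transcript. */
    static int baseNumberCds(int[][] segs, int pos, boolean usePrevBaseIntron) {
        int basesBefore = 0; // coding bases in segments fully to the left of 'pos'
        for (int i = 0; i < segs.length; i++) {
            int start = segs[i][0];
            int end = segs[i][1];
            if (pos < start) {
                if (i == 0) return -1; // before the first segment
                // 'pos' is in the intron between segment i-1 and segment i
                return usePrevBaseIntron ? basesBefore - 1 : basesBefore;
            }
            if (pos <= end) return basesBefore + (pos - start);
            basesBefore += end - start + 1;
        }
        return -1; // after the last segment
    }
}
```

For segments `{100, 104}` and `{200, 209}`, an intronic position such as 150 maps to index 4 (last base of the first segment) when usePrevBaseIntron is true, and to index 5 (first base of the second segment) when it is false.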
-
baseNumberCds2Codon
Return a codon that includes 'cdsBaseNumber' -
baseNumberCds2Pos
public int[] baseNumberCds2Pos() Calculate chromosome position as a function of CDS number -
baseNumberCds2Pos
public int baseNumberCds2Pos(int cdsBaseNum) -
cds
Retrieve coding sequence -
cdsMarker
Create a marker of the coding region in this transcript -
cloneShallow
Description copied from class: Marker
Perform a shallow clone
- Overrides:
cloneShallow in class IntervalAndSubIntervals<Exon>
-
codonNumber2Pos
public int[] codonNumber2Pos(int codonNum) Return an array of 3 genomic positions where codon (amino acid) number 'codonNum' maps
- Returns:
- aa2pos[0], aa2pos[1], aa2pos[2] are the coordinates (within the chromosome)
of the three bases forming codon 'codonNum'. Any aa2pos[i] = -1 means that
the corresponding base in the codon could not be mapped.
Bases in the array are sorted by chromosome position, so aa2pos[0] < aa2pos[1] < aa2pos[2]
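The return contract described above (three positions, -1 for bases that cannot be mapped, sorted by chromosome position) can be sketched in a standalone way. This is not the snpEff implementation. It assumes zero-based codon numbers and coding segments given as sorted, inclusive `{start, end}` pairs; on the minus strand, CDS bases are counted from the highest coordinate downward. All names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch, not snpEff code.
final class CodonToPosSketch {

    /** Maps a zero-based CDS base index to a genomic coordinate, or -1 if unmapped. */
    static int cdsBaseToPos(int[][] segs, boolean strandMinus, int cdsBase) {
        if (cdsBase < 0) return -1;
        int remaining = cdsBase;
        for (int i = 0; i < segs.length; i++) {
            // On the minus strand, walk segments from right to left
            int[] seg = strandMinus ? segs[segs.length - 1 - i] : segs[i];
            int len = seg[1] - seg[0] + 1;
            if (remaining < len) return strandMinus ? seg[1] - remaining : seg[0] + remaining;
            remaining -= len;
        }
        return -1;
    }

    /** Genomic positions of the three bases of codon 'codonNum', in ascending order. */
    static int[] codonNumberToPos(int[][] segs, boolean strandMinus, int codonNum) {
        int[] pos = new int[3];
        for (int i = 0; i < 3; i++) pos[i] = cdsBaseToPos(segs, strandMinus, 3 * codonNum + i);
        Arrays.sort(pos); // any unmapped (-1) entries end up first
        return pos;
    }
}
```

Note that a codon can straddle an intron: with segments `{100, 104}` and `{200, 208}` on the plus strand, codon 1 maps to 103, 104 and 200.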
-
collapseZeroGap
public boolean collapseZeroGap() Collapses exons having gaps of zero (i.e. exons that are immediately followed by other exons). Does the same for CDSs and UTRs.
- Returns:
- true if any exon in the transcript was 'collapsed'
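The idea of collapsing zero-gap intervals can be sketched on plain coordinate pairs. This is an illustration, not the snpEff code: it assumes inclusive `{start, end}` intervals already sorted by start, and treats "gap of zero" as the next interval starting exactly one base after the previous one ends. `CollapseZeroGapSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not snpEff code.
final class CollapseZeroGapSketch {

    /** Merges each interval into its predecessor when there is no gap between them. */
    static List<int[]> collapse(List<int[]> sorted) {
        List<int[]> out = new ArrayList<>();
        for (int[] iv : sorted) {
            int[] last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && iv[0] == last[1] + 1) {
                last[1] = iv[1]; // extend the previous interval
            } else {
                out.add(new int[] {iv[0], iv[1]}); // copy, so the input is not modified
            }
        }
        return out;
    }
}
```

A caller can compare the sizes of the input and output lists to learn whether anything was collapsed, mirroring the boolean return value described above.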
-
cpgExonBias
public double cpgExonBias() Calculate CpG bias: number of CpG / expected[CpG] -
cpgExons
public int cpgExons() Count total CpG in this transcript's exons -
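A CpG count and an observed/expected ratio can be sketched as below. The page does not say how expected[CpG] is computed; this sketch assumes the commonly used estimate (number of C × number of G) / sequence length, which may differ from what snpEff does. `CpgSketch` and its methods are hypothetical names.

```java
// Hypothetical sketch, not snpEff code. The 'expected' formula is an assumption.
final class CpgSketch {

    /** Counts occurrences of the dinucleotide "CG" (case-insensitive). */
    static int countCpg(String seq) {
        String s = seq.toUpperCase();
        int count = 0;
        for (int i = 0; i + 1 < s.length(); i++) {
            if (s.charAt(i) == 'C' && s.charAt(i + 1) == 'G') count++;
        }
        return count;
    }

    /** Observed CpG count divided by the assumed expectation (#C * #G / length). */
    static double cpgBias(String seq) {
        String s = seq.toUpperCase();
        long c = s.chars().filter(ch -> ch == 'C').count();
        long g = s.chars().filter(ch -> ch == 'G').count();
        if (c == 0 || g == 0) return 0.0; // avoid division by zero
        double expected = (double) c * g / s.length();
        return countCpg(s) / expected;
    }
}
```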
createSpliceSites
public void createSpliceSites(int spliceSiteSize, int spliceRegionExonSize, int spliceRegionIntronMin, int spliceRegionIntronMax) Find all splice sites. -
createUpDownStream
public void createUpDownStream(int upDownLength) Creates a list of upstream/downstream regions (for each transcript). The upstream (downstream) region is defined as upDownLength before (after) the transcript -
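How upstream and downstream regions depend on strand can be sketched with plain coordinates. This is an assumption-laden illustration, not the snpEff code: coordinates are inclusive, the region is clamped at 0 on the left, and no clamping is done at the chromosome end. `UpDownStreamSketch` is a hypothetical name.

```java
// Hypothetical sketch, not snpEff code.
final class UpDownStreamSketch {

    /**
     * Returns {upstream, downstream}, each as an inclusive {start, end} pair.
     * On the minus strand, the region after the transcript (higher coordinates) is upstream.
     */
    static int[][] upDown(int start, int end, boolean strandMinus, int upDownLength) {
        int[] before = {Math.max(0, start - upDownLength), start - 1};
        int[] after = {end + 1, end + upDownLength};
        return strandMinus ? new int[][] {after, before} : new int[][] {before, after};
    }
}
```

If the transcript starts at coordinate 0, the "before" region comes out empty (its end is smaller than its start), which a caller would need to filter out.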
deleteRedundant
public boolean deleteRedundant() Deletes redundant exons (i.e. exons that are totally included in other exons). Does the same for CDSs. Does the same for UTRs. -
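Removing intervals that are totally included in other intervals can be sketched as follows. This is not the snpEff implementation: it works on inclusive `{start, end}` pairs, and the rule for identical intervals (keep the first one) is an assumption made for the sketch. `DeleteRedundantSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not snpEff code.
final class DeleteRedundantSketch {

    /** Returns the intervals that are not totally included in another interval. */
    static List<int[]> deleteRedundant(List<int[]> intervals) {
        List<int[]> kept = new ArrayList<>();
        for (int i = 0; i < intervals.size(); i++) {
            int[] a = intervals.get(i);
            boolean redundant = false;
            for (int j = 0; j < intervals.size() && !redundant; j++) {
                if (i == j) continue;
                int[] b = intervals.get(j);
                boolean included = b[0] <= a[0] && a[1] <= b[1];
                boolean identical = b[0] == a[0] && b[1] == a[1];
                // Assumption: among identical intervals, only the first one is kept
                if (included && (!identical || j < i)) redundant = true;
            }
            if (!redundant) kept.add(a);
        }
        return kept;
    }
}
```

The quadratic loop keeps the sketch short; it is adequate for the handful of exons in a typical transcript.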
findCds
Find a CDS that matches exactly the exon -
findExon
Return an exon that intersects 'pos' -
findExon
Return an exon intersecting 'marker' (first exon found) -
findIntron
Return an intron overlapping position 'pos' -
findUtr
Return the UTR that hits position 'pos'
- Returns:
- A UTR intersecting 'pos' (null if not found)
-
findUtrs
Return the UTR that intersects 'marker' (null if not found) -
frameCorrection
public boolean frameCorrection() Correct exons based on frame information. E.g. if the frame information (from a genomic database file, such as a GTF) does not match the calculated frame, we correct the exon's boundaries to make them match.
This is performed in two stages: i) The first exon is corrected by adding a fake 5'UTR; ii) Other exons are corrected by changing the start (or end) coordinates.
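The "calculated frame" that annotated frames are compared against can be sketched as below. This only shows how an expected frame follows from the coding lengths of the preceding exons, using the GTF convention (the frame is the number of bases to skip at the start of a feature to reach the first complete codon). It does not perform the two-stage correction, and it is not the snpEff code. `FrameSketch` is a hypothetical name.

```java
// Hypothetical sketch, not snpEff code. Assumes GTF-style frame values (0, 1 or 2).
final class FrameSketch {

    /**
     * Given coding lengths of exons in transcription order, returns the expected
     * frame of each exon, assuming the first exon starts in frame 0.
     */
    static int[] expectedFrames(int[] cdsLengths) {
        int[] frames = new int[cdsLengths.length];
        int basesBefore = 0; // coding bases in all previous exons
        for (int i = 0; i < cdsLengths.length; i++) {
            frames[i] = (3 - basesBefore % 3) % 3;
            basesBefore += cdsLengths[i];
        }
        return frames;
    }
}
```

An exon whose annotated frame differs from the value computed this way is a candidate for the boundary adjustment described above.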
-
get3primeUtrs
Create a list of 3 prime UTRs -
get3primeUtrsSorted
-
get5primeUtrs
Create a list of 5 prime UTRs -
get5primeUtrsSorted
-
getBioType
-
getCds
Get all CDSs -
getCdsEnd
public int getCdsEnd() -
getCdsStart
public int getCdsStart() -
getDownstream
-
getExons
A more intuitive name for 'subintervals' -
getFirstCodingExon
Get first coding exon -
getGene
-
hasProteinId
public boolean hasProteinId() -
getProteinId
-
getTags
-
getTranscriptSupportLevel
-
getTss
Create a TSS marker -
getUpstream
-
getUtrs
Get all UTRs -
getVersion
-
setVersion
-
hasCds
public boolean hasCds() -
hasError
public boolean hasError() Does this transcript have any errors? -
hasErrorOrWarning
public boolean hasErrorOrWarning() Does this transcript have any errors or warnings? -
hasTag
Does this transcript have 'tag'? -
hasTags
public boolean hasTags() -
hasTranscriptSupportLevelInfo
public boolean hasTranscriptSupportLevelInfo() -
hasWarning
public boolean hasWarning() Does this transcript have any warnings? -
introns
Get all introns (lazy init) -
isAaCheck
public boolean isAaCheck() -
setAaCheck
public void setAaCheck(boolean aaCheck) -
isAdjustIfParentDoesNotInclude
Description copied from class: Marker
Adjust parent if it does not include child?
- Overrides:
isAdjustIfParentDoesNotInclude in class Marker
-
isCanonical
public boolean isCanonical() -
isChecked
public boolean isChecked() Has this transcript been checked against CDS/DNA/AA sequences? -
isCorrected
public boolean isCorrected() -
isDnaCheck
public boolean isDnaCheck() -
setDnaCheck
public void setDnaCheck(boolean dnaCheck) -
isDownstream
public boolean isDownstream(int pos) -
isErrorProteinLength
public boolean isErrorProteinLength() Check if coding length is a multiple of 3 in protein coding transcripts
- Returns:
- true on Error
-
isErrorStartCodon
public boolean isErrorStartCodon() Is the first codon a START codon? -
isErrorStopCodonsInCds
public boolean isErrorStopCodonsInCds() Check if protein sequence has STOP codons in the middle of the coding sequence
- Returns:
- true on Error
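The three coding-sequence checks above (length multiple of 3, START codon first, no STOP codon in the middle) can be sketched on a plain DNA string. This is not the snpEff implementation: it assumes the standard genetic code (ATG start; TAA, TAG, TGA stops), whereas a real transcript may use a different codon table. `CdsChecksSketch` is a hypothetical name; each method returns true on error.

```java
import java.util.Set;

// Hypothetical sketch, not snpEff code. Assumes the standard genetic code.
final class CdsChecksSketch {
    private static final Set<String> STOPS = Set.of("TAA", "TAG", "TGA");

    /** True (error) if the coding length is not a multiple of 3. */
    static boolean isErrorLength(String cds) {
        return cds.length() % 3 != 0;
    }

    /** True (error) if the first codon is not ATG. */
    static boolean isErrorStartCodon(String cds) {
        return cds.length() < 3 || !cds.substring(0, 3).equalsIgnoreCase("ATG");
    }

    /** True (error) if a STOP codon appears before the final codon. */
    static boolean isErrorStopCodonsInCds(String cds) {
        String s = cds.toUpperCase();
        // Codons overlapping the final three bases are skipped
        for (int i = 0; i + 3 <= s.length() - 3; i += 3) {
            if (STOPS.contains(s.substring(i, i + 3))) return true;
        }
        return false;
    }
}
```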
-
isIntron
public boolean isIntron(int pos) -
isProteinCoding
public boolean isProteinCoding() -
isRibosomalSlippage
public boolean isRibosomalSlippage() -
setRibosomalSlippage
public void setRibosomalSlippage(boolean ribosomalSlippage) -
isUpstream
public boolean isUpstream(int pos) -
isUtr
public boolean isUtr(int pos) -
isUtr
-
isUtr3
public boolean isUtr3(int pos) -
isUtr5
public boolean isUtr5(int pos) -
isWarningStopCodon
public boolean isWarningStopCodon() Is the last codon a STOP codon? -
markers
A list of all markers in this transcript
- Overrides:
markers in class IntervalAndSubIntervals<Exon>
-
mRna
Retrieve coding sequence AND the UTRs (mRNA = 5'UTR + CDS + 3'UTR), i.e. concatenate all exon sequences -
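Concatenating exon sequences into a transcript sequence can be sketched on a plain chromosome string. This is an illustration, not the snpEff code: it assumes zero-based inclusive exon coordinates sorted by start, DNA letters (T rather than U) in the output, and a reverse complement for minus-strand transcripts. `MrnaSketch` is a hypothetical name.

```java
// Hypothetical sketch, not snpEff code.
final class MrnaSketch {

    /** Reverse complement of a DNA string; unknown letters become 'N'. */
    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            switch (seq.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                case 'T': sb.append('A'); break;
                default: sb.append('N');
            }
        }
        return sb.toString();
    }

    /** Concatenates exon sequences; exons are inclusive {start, end} pairs sorted by start. */
    static String mRna(String chromosome, int[][] exons, boolean strandMinus) {
        StringBuilder sb = new StringBuilder();
        for (int[] e : exons) sb.append(chromosome, e[0], e[1] + 1);
        String plus = sb.toString();
        return strandMinus ? reverseComplement(plus) : plus;
    }
}
```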
protein
Protein sequence (amino acid sequence produced by this transcript) -
query
Query all genomic regions that intersect 'marker'
- Overrides:
query in class IntervalAndSubIntervals<Exon>
-
queryExon
Return the first exon that intersects 'interval' (null if not found) -
rankExons
public boolean rankExons() Assign ranks to exons -
reset
public void reset()
Description copied from class: IntervalAndSubIntervals
Remove all intervals
- Overrides:
reset in class IntervalAndSubIntervals<Exon>
-
resetCache
public void resetCache() -
resetExons
public void resetExons() -
sanityCheck
Perform some basic checks, return error type, if any -
serializeParse
Parse a line from a serialized file
- Specified by:
serializeParse in interface TxtSerializable
- Overrides:
serializeParse in class IntervalAndSubIntervals<Exon>
-
serializeSave
Create a string to serialize to a file
- Specified by:
serializeSave in interface TxtSerializable
- Overrides:
serializeSave in class IntervalAndSubIntervals<Exon>
-
setBioType
-
setCanonical
public void setCanonical(boolean canonical) -
setProteinCoding
public void setProteinCoding(boolean proteinCoding) -
setProteinId
-
setTags
-
setTranscriptSupportLevel
-
sortCds
public void sortCds() -
spliceSites
-
toString
-
toString
-
toStringAsciiArt
Show a transcript as ASCII art -
utrFromCds
public boolean utrFromCds(boolean verbose) Calculate UTR regions from CDSs -
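Deriving UTR regions from the coding region can be sketched as "exon parts that fall outside the CDS". This is not the snpEff implementation: it assumes inclusive coordinates, that cdsStart and cdsEnd are the lowest and highest coding coordinates regardless of strand, and that the 5' side is the low-coordinate side on the plus strand and the high-coordinate side on the minus strand. `UtrFromCdsSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not snpEff code.
final class UtrFromCdsSketch {

    /**
     * Returns exon portions outside [cdsStart, cdsEnd].
     * Each entry is {start, end, prime}, where prime is 5 or 3.
     */
    static List<int[]> utrs(int[][] exons, int cdsStart, int cdsEnd, boolean strandMinus) {
        List<int[]> out = new ArrayList<>();
        for (int[] e : exons) {
            if (e[0] < cdsStart) { // part of the exon lies left of the coding region
                out.add(new int[] {e[0], Math.min(e[1], cdsStart - 1), strandMinus ? 3 : 5});
            }
            if (e[1] > cdsEnd) { // part of the exon lies right of the coding region
                out.add(new int[] {Math.max(e[0], cdsEnd + 1), e[1], strandMinus ? 5 : 3});
            }
        }
        return out;
    }
}
```

An exon that contains the whole coding region contributes two entries, one on each side.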
variantEffect
Get some details about the effect on this transcript
- Overrides:
variantEffect in class Marker
-