Package org.snpeff.geneSets
Class GeneSets
- java.lang.Object
  - org.snpeff.geneSets.GeneSets
- All Implemented Interfaces:
java.io.Serializable, java.lang.Iterable<GeneSet>
- Direct Known Subclasses:
GeneSetsRanked
public class GeneSets extends java.lang.Object implements java.lang.Iterable<GeneSet>, java.io.Serializable
A collection of GeneSets. Genes have associated "experimental values".
- Author:
- Pablo Cingolani
- See Also:
- Serialized Form
Field Summary
Fields (Modifier and Type, Field):
- static boolean debug
- static double LOG2
- static long PRINT_SOMETHING_TIME
-
Method Summary
Methods (Modifier and Type, Method: Description):
- boolean add(java.lang.String gene): Add a gene and aliases
- boolean add(java.lang.String gene, GeneSet geneSet): Add a gene and its corresponding gene set
- void add(GeneSet geneSet): Add a gene set
- boolean addInteresting(java.lang.String gene): Add a symbol as an 'interesting' gene (to every corresponding GeneSet in this collection)
- void checkInterestingGenes(java.util.Set<java.lang.String> intGenes): Checks that every symbol ID is in the set (as 'interesting' genes)
- protected void copy(GeneSets geneSets): Copy all data from geneSets
- GeneSet disjointSet(java.util.List<GeneSet> geneSetList, int activeSets): Produce a GeneSet based on a list of GeneSets and a 'mask'
- static GeneSets factory(GoTerms goTerms): Create gene sets from GoTerms
- java.util.List<GeneSet> geneSetsSorted(): Return every GeneSet in this GeneSets as a sorted list
- java.util.List<GeneSet> geneSetsSortedSize(boolean reverse): Gene sets sorted by size (if same size, sort by name)
- int getGeneCount(): How many genes do we have?
- java.util.Set<java.lang.String> getGenes(): Get all genes in this set
- GeneSet getGeneSet(java.lang.String geneSetName): Get a gene set named 'geneSetName'
- int getGeneSetCount(): Get the number of gene sets
- java.util.HashSet<GeneSet> getGeneSetsByGene(java.lang.String gene): All gene sets that this gene belongs to
- java.util.HashMap<java.lang.String,GeneSet> getGeneSetsByName()
- java.util.HashSet<java.lang.String> getInterestingGenes()
- int getInterestingGenesCount()
- java.lang.String getLabel()
- double getValue(java.lang.String gene): Get experimental value
- java.util.HashMap<java.lang.String,java.lang.Double> getValueByGene()
- boolean hasGene(java.lang.String geneId)
- boolean hasValue(java.lang.String gene)
- boolean isInteresting(java.lang.String geneName)
- boolean isRanked()
- protected boolean isUsed(java.lang.String geneName)
- protected boolean isUsed(GeneSet gs): Is this gene set used? I.e. is there at least one gene 'used' (e.g. interesting or ranked)?
- java.util.Iterator<GeneSet> iterator(): Iterate through each GeneSet in this GeneSets
- java.util.Iterator<GeneSet> iteratorSorted(): Iterate through each GeneSet in this GeneSets, in sorted order
- java.util.Set<java.lang.String> keySet()
- java.util.List<GeneSet> listTopTerms(int numberToSelect): Select a number of GeneSets
- java.util.List<java.lang.String> loadExperimentalValues(java.lang.String fileName, boolean maskException): Reads a file with a list of genes and experimental values
- boolean loadMSigDb(java.lang.String gmtFile, boolean maskException): Read an MSigDB file and add every gene set (do not add relationships between nodes in DAG)
- void remove(GeneSet geneSet)
- void removeGeneSet(java.lang.String geneSetName): Remove a GeneSet
- void removeUnusedSets(): Remove unused gene sets
- void reset(): Reset every 'interesting' gene or ranked gene (on every single GeneSet in this GeneSets)
- void saveGseaGeneSets(java.lang.String fileName): Save a gene sets file for GSEA analysis. Format specification: http://www.broad.mit.edu/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
- void setDoNotAddIfNotInGeneSet(boolean doNotAddIfNotInGeneSet)
- void setGeneSetByName(java.util.HashMap<java.lang.String,GeneSet> geneSets)
- void setInterestingGenes(java.util.HashSet<java.lang.String> interestingGenesIdSet)
- void setValue(java.lang.String geneId, double value): Set experimental value for this gene
- void setVerbose(boolean verbose)
- java.lang.String toString()
- java.util.Collection<GeneSet> values()
-
Constructor Detail
-
GeneSets
public GeneSets()
Default constructor
-
GeneSets
public GeneSets(GeneSets geneSets)
-
GeneSets
public GeneSets(java.lang.String msigDb)
-
Method Detail
-
factory
public static GeneSets factory(GoTerms goTerms)
Create gene sets from GoTerms.
- Parameters:
goTerms - GoTerms to use
-
add
public void add(GeneSet geneSet)
Add a gene set.
- Parameters:
geneSet -
-
add
public boolean add(java.lang.String gene)
Add a gene and aliases
-
add
public boolean add(java.lang.String gene, GeneSet geneSet)
Add a gene and its corresponding gene set.
- Parameters:
gene -
geneSet -
- Returns:
-
addInteresting
public boolean addInteresting(java.lang.String gene)
Add a symbol as an 'interesting' gene (to every corresponding GeneSet in this collection).
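To illustrate the idea of marking a gene as 'interesting' in every gene set that contains it, here is a minimal sketch. It does not use snpEff's classes: MarkedSet and AddInterestingSketch are hypothetical stand-ins, and the boolean result (true if at least one set contained the gene) is an assumption about what the real method returns.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a gene set: member genes plus the subset marked 'interesting'.
final class MarkedSet {
    final Set<String> genes = new HashSet<>();
    final Set<String> interesting = new HashSet<>();
}

// Hypothetical helper; not part of snpEff.
final class AddInterestingSketch {
    private AddInterestingSketch() {}

    // Marks gene as interesting in every set that contains it.
    // Assumption: returns true if at least one set contained the gene.
    static boolean addInteresting(Map<String, MarkedSet> setsByName, String gene) {
        boolean found = false;
        for (MarkedSet set : setsByName.values()) {
            if (set.genes.contains(gene)) {
                set.interesting.add(gene);
                found = true;
            }
        }
        return found;
    }
}
```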
-
checkInterestingGenes
public void checkInterestingGenes(java.util.Set<java.lang.String> intGenes)
Checks that every symbol ID is in the set (as 'interesting' genes). Throws an exception on error.
- Parameters:
intGenes - A set of interesting genes
-
copy
protected void copy(GeneSets geneSets)
Copy all data from geneSets.
- Parameters:
geneSets -
-
disjointSet
public GeneSet disjointSet(java.util.List<GeneSet> geneSetList, int activeSets)
Produce a GeneSet based on a list of GeneSets and a 'mask'.
- Parameters:
geneSetList - A list of GeneSets
activeSets - An integer (binary mask) that specifies whether each set in the list should be taken into account or not. The operation performed is: Intersection{ GeneSets where mask_bit == 1 } - Union{ GeneSets where mask_bit == 0 }, where the minus sign '-' is a 'set minus' (set difference) operation. This operation is done for both sets in GeneSet (i.e. genes and interestingGenes).
- Returns:
- A GeneSet
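The mask arithmetic in disjointSet can be sketched on plain Set<String> values. This is an illustration under assumptions, not the snpEff implementation: bit i of activeSets is assumed to select the i-th list element (least significant bit first), a mask with no active bits is assumed to yield an empty set, and an int mask limits this sketch to 32 sets. The real method applies the operation to both genes and interestingGenes; the sketch shows one of them.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: gene sets are modeled as Set<String>, not snpEff GeneSet objects.
final class DisjointSetSketch {
    private DisjointSetSketch() {}

    // Intersection of sets whose mask bit is 1, minus the union of sets whose bit is 0.
    // Assumption: bit i (least significant first) selects sets.get(i); at most 32 sets.
    static Set<String> disjointSet(List<Set<String>> sets, int activeSets) {
        Set<String> intersection = null; // stays null until the first active set is seen
        Set<String> union = new HashSet<>();
        for (int i = 0; i < sets.size(); i++) {
            boolean active = ((activeSets >>> i) & 1) == 1;
            if (active) {
                if (intersection == null) {
                    intersection = new HashSet<>(sets.get(i));
                } else {
                    intersection.retainAll(sets.get(i));
                }
            } else {
                union.addAll(sets.get(i));
            }
        }
        Set<String> result = (intersection == null) ? new HashSet<>() : intersection;
        result.removeAll(union); // the 'set minus' step
        return result;
    }
}
```

For sets A = {a, b, c}, B = {b, c, d}, C = {c} and mask 0b011, the intersection of A and B is {b, c}; removing C leaves {b}.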
-
geneSetsSorted
public java.util.List<GeneSet> geneSetsSorted()
Return every GeneSet in this GeneSets as a sorted list.
-
geneSetsSortedSize
public java.util.List<GeneSet> geneSetsSortedSize(boolean reverse)
Gene sets sorted by size (if same size, sort by name).
- Parameters:
reverse - Reverse size sorting (does not affect name sorting)
- Returns:
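A comparator that orders by size and breaks ties by name, with reverse applied only to the size key, could look like the sketch below. Gene sets are modeled here as hypothetical name-to-size entries rather than snpEff GeneSet objects, and SortBySizeSketch is not part of the library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: gene sets are modeled as name -> size entries, not snpEff GeneSets.
final class SortBySizeSketch {
    private SortBySizeSketch() {}

    // Sort by size (descending if reverse), then by name ascending regardless of reverse.
    static List<String> namesSortedBySize(Map<String, Integer> sizeByName, boolean reverse) {
        Comparator<Map.Entry<String, Integer>> bySize = Map.Entry.<String, Integer>comparingByValue();
        if (reverse) {
            bySize = bySize.reversed(); // only the size key is reversed
        }
        Comparator<Map.Entry<String, Integer>> order =
                bySize.thenComparing(Map.Entry.<String, Integer>comparingByKey());
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(sizeByName.entrySet());
        entries.sort(order);
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            names.add(entry.getKey());
        }
        return names;
    }
}
```

With sizes B=2, A=2, C=5, D=1, ascending order gives D, A, B, C and reverse order gives C, A, B, D; A stays ahead of B in both because only the size comparison is reversed.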
-
getGeneCount
public int getGeneCount()
How many genes do we have?
- Returns:
-
getGenes
public java.util.Set<java.lang.String> getGenes()
Get all genes in this set.
- Returns:
-
getGeneSet
public GeneSet getGeneSet(java.lang.String geneSetName)
Get a gene set named 'geneSetName'.
- Parameters:
geneSetName -
- Returns:
-
getGeneSetCount
public int getGeneSetCount()
Get the number of gene sets.
- Returns:
-
getGeneSetsByGene
public java.util.HashSet<GeneSet> getGeneSetsByGene(java.lang.String gene)
All gene sets that this gene belongs to.
- Parameters:
gene -
- Returns:
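One way to answer this query without scanning every gene set is a reverse index from gene to set names, kept up to date whenever a gene is added to a set. The sketch assumes that design for illustration only: GeneIndexSketch is hypothetical, returns set names rather than GeneSet objects, and the snpEff class may be organized differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bidirectional index; set names stand in for snpEff GeneSet objects.
final class GeneIndexSketch {
    private final Map<String, Set<String>> genesBySetName = new HashMap<>();
    private final Map<String, Set<String>> setNamesByGene = new HashMap<>();

    // Records that gene belongs to setName, updating both directions of the index.
    void add(String gene, String setName) {
        genesBySetName.computeIfAbsent(setName, k -> new HashSet<>()).add(gene);
        setNamesByGene.computeIfAbsent(gene, k -> new HashSet<>()).add(setName);
    }

    // Genes in a named set (empty if the set is unknown); returns a copy.
    Set<String> getGenes(String setName) {
        return new HashSet<>(genesBySetName.getOrDefault(setName, Set.of()));
    }

    // All set names this gene belongs to (empty if the gene is unknown); returns a copy.
    Set<String> getGeneSetsByGene(String gene) {
        return new HashSet<>(setNamesByGene.getOrDefault(gene, Set.of()));
    }
}
```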
-
getGeneSetsByName
public java.util.HashMap<java.lang.String,GeneSet> getGeneSetsByName()
-
getInterestingGenes
public java.util.HashSet<java.lang.String> getInterestingGenes()
-
getInterestingGenesCount
public int getInterestingGenesCount()
-
getLabel
public java.lang.String getLabel()
-
getValue
public double getValue(java.lang.String gene)
Get experimental value.
- Parameters:
gene -
- Returns:
-
getValueByGene
public java.util.HashMap<java.lang.String,java.lang.Double> getValueByGene()
-
hasGene
public boolean hasGene(java.lang.String geneId)
-
hasValue
public boolean hasValue(java.lang.String gene)
-
isInteresting
public boolean isInteresting(java.lang.String geneName)
-
isRanked
public boolean isRanked()
-
isUsed
protected boolean isUsed(GeneSet gs)
Is this gene set used? I.e. is there at least one gene 'used' (e.g. interesting or ranked)?
- Parameters:
gs -
- Returns:
-
isUsed
protected boolean isUsed(java.lang.String geneName)
-
iterator
public java.util.Iterator<GeneSet> iterator()
Iterate through each GeneSet in this GeneSets.
- Specified by:
iterator in interface java.lang.Iterable<GeneSet>
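Because GeneSets implements Iterable<GeneSet>, callers can walk it with a for-each loop. The sketch below shows the usual pattern of delegating iterator() to the values of a name-keyed map. NamedCollection is a hypothetical stand-in, and a LinkedHashMap is used only so the example's iteration order is predictable; getGeneSetsByName returns a HashMap, so no particular order should be assumed for the real class.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a GeneSets-like container: items are stored by name
// and iteration walks the map's values.
final class NamedCollection<T> implements Iterable<T> {
    private final Map<String, T> byName = new LinkedHashMap<>(); // keeps insertion order

    void put(String name, T item) {
        byName.put(name, item);
    }

    @Override
    public Iterator<T> iterator() {
        // Read-only view, so callers cannot remove items through the iterator.
        return Collections.unmodifiableCollection(byName.values()).iterator();
    }
}
```

Any Iterable works directly in an enhanced for loop, e.g. for (GeneSet gs : geneSets) { ... } on the real class.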
-
iteratorSorted
public java.util.Iterator<GeneSet> iteratorSorted()
Iterate through each GeneSet in this GeneSets, in sorted order.
-
keySet
public java.util.Set<java.lang.String> keySet()
-
listTopTerms
public java.util.List<GeneSet> listTopTerms(int numberToSelect)
Select a number of GeneSets.
- Parameters:
numberToSelect -
- Returns:
-
loadExperimentalValues
public java.util.List<java.lang.String> loadExperimentalValues(java.lang.String fileName, boolean maskException)
Reads a file with a list of genes and experimental values. Format: "gene \t value \n" (one tab-separated gene/value pair per line).
- Parameters:
fileName -
maskException -
- Returns:
- A list of genes not found
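A parser for the "gene \t value" format might look like the following sketch. Assumptions: genes absent from a hypothetical knownGenes set are reported as "not found" and their values are skipped, blank lines are ignored, and the maskException parameter of the real method is not modeled. ExperimentalValuesSketch is not part of snpEff.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of snpEff.
final class ExperimentalValuesSketch {
    private ExperimentalValuesSketch() {}

    // Parses "gene<TAB>value" lines. Values of genes in knownGenes go into valueByGene;
    // other genes are returned as "not found" (an assumption about the real semantics).
    static List<String> load(Reader in, Set<String> knownGenes, Map<String, Double> valueByGene)
            throws IOException {
        List<String> notFound = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) continue; // skip blank lines
                String[] fields = line.split("\t");
                if (fields.length < 2) {
                    throw new IOException("Expected 'gene<TAB>value': " + line);
                }
                String gene = fields[0].trim();
                double value = Double.parseDouble(fields[1].trim()); // NumberFormatException on bad input
                if (knownGenes.contains(gene)) {
                    valueByGene.put(gene, value);
                } else {
                    notFound.add(gene);
                }
            }
        }
        return notFound;
    }
}
```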
-
loadMSigDb
public boolean loadMSigDb(java.lang.String gmtFile, boolean maskException)
Read an MSigDB file and add every gene set (do not add relationships between nodes in DAG).
- Parameters:
gmtFile -
maskException -
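MSigDB gene sets are distributed in GMT format, where each tab-separated line holds a set name, a description, and then the member genes. A minimal reader for that layout is sketched below; GmtReaderSketch and its name-to-genes map are hypothetical, and the sketch ignores maskException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of snpEff.
final class GmtReaderSketch {
    private GmtReaderSketch() {}

    // Reads GMT lines: name TAB description TAB gene1 TAB gene2 ...
    static Map<String, Set<String>> read(Reader in) throws IOException {
        Map<String, Set<String>> genesBySet = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) continue; // skip blank lines
                String[] fields = line.split("\t");
                if (fields.length < 2) {
                    throw new IOException("Malformed GMT line: " + line);
                }
                Set<String> genes = genesBySet.computeIfAbsent(fields[0], k -> new LinkedHashSet<>());
                // fields[1] is the description column; member genes start at index 2.
                for (int i = 2; i < fields.length; i++) {
                    if (!fields[i].isEmpty()) genes.add(fields[i]);
                }
            }
        }
        return genesBySet;
    }
}
```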
-
remove
public void remove(GeneSet geneSet)
-
removeGeneSet
public void removeGeneSet(java.lang.String geneSetName)
Remove a GeneSet
-
removeUnusedSets
public void removeUnusedSets()
Remove unused gene sets
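Combining this with isUsed (a set is used when at least one of its genes is interesting or ranked), removing unused sets reduces to a filter over the collection. The sketch below assumes that definition; UsageTrackedSet and RemoveUnusedSketch are hypothetical stand-ins for GeneSet and GeneSets.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a gene set that tracks its 'interesting' and 'ranked' genes.
final class UsageTrackedSet {
    final Set<String> interesting = new HashSet<>();
    final Set<String> ranked = new HashSet<>();

    // Used if at least one gene is interesting or ranked.
    boolean isUsed() {
        return !interesting.isEmpty() || !ranked.isEmpty();
    }
}

// Hypothetical helper; not part of snpEff.
final class RemoveUnusedSketch {
    private RemoveUnusedSketch() {}

    // Drops every unused set, mutating the map in place.
    static void removeUnusedSets(Map<String, UsageTrackedSet> setsByName) {
        setsByName.values().removeIf(set -> !set.isUsed());
    }
}
```

Removing through values().removeIf deletes the matching map entries in place, so no separate list of names to delete is needed.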
-
reset
public void reset()
Reset every 'interesting' gene or ranked gene (on every single GeneSet in this GeneSets)
-
saveGseaGeneSets
public void saveGseaGeneSets(java.lang.String fileName)
Save a gene sets file for GSEA analysis. Format specification: http://www.broad.mit.edu/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
- Parameters:
fileName -
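Writing the GMT layout is the mirror image of reading it: one line per gene set with the name, a description field, and the genes, all tab-separated. In the sketch, GmtWriterSketch and its description argument are hypothetical; what snpEff actually writes in the description column is not stated here.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of snpEff.
final class GmtWriterSketch {
    private GmtWriterSketch() {}

    // Writes one GMT line per set: name TAB description TAB gene1 TAB gene2 ...
    static void write(Writer out, Map<String, Set<String>> genesBySet, String description)
            throws IOException {
        for (Map.Entry<String, Set<String>> entry : genesBySet.entrySet()) {
            StringBuilder line = new StringBuilder(entry.getKey()).append('\t').append(description);
            for (String gene : entry.getValue()) {
                line.append('\t').append(gene);
            }
            out.write(line.append('\n').toString());
        }
    }
}
```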
-
setDoNotAddIfNotInGeneSet
public void setDoNotAddIfNotInGeneSet(boolean doNotAddIfNotInGeneSet)
-
setGeneSetByName
public void setGeneSetByName(java.util.HashMap<java.lang.String,GeneSet> geneSets)
-
setInterestingGenes
public void setInterestingGenes(java.util.HashSet<java.lang.String> interestingGenesIdSet)
-
setValue
public void setValue(java.lang.String geneId, double value)
Set experimental value for this gene.
- Parameters:
geneId -
value -
-
setVerbose
public void setVerbose(boolean verbose)
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
values
public java.util.Collection<GeneSet> values()