Neighborhood Explorer Documentation
Neighborhood Explorer: initial appearance
SPOKE (Scalable Precision Medicine Knowledge Engine)
is a very large network containing multiple types of biological data.
Pooling such diverse data into a single knowledge environment
allows identifying new connections, with implications for
biomedical applications like personalized medicine:
suggesting which drugs may be effective for a specific patient.
An earlier version of the network was used to suggest new uses for existing drugs
(Himmelstein et al., eLife 2017 Sep 22;6. pii: e26726).
SPOKE is a heterogeneous network,
meaning that different nodes (points) within the network
can represent different types of data.
The edges between pairs of nodes represent known connections.
Paths that follow a series of edges may connect nodes not
previously known to be related.
SPOKE is much too large and dense to comprehend visually all at once.
Although automated analyses may operate on the entire network, a graphical
interface for exploring limited areas or subsets is needed for human interaction.
The Neighborhood Explorer allows interactively
finding and viewing specific “neighborhoods” within SPOKE.
As detailed below, it allows searching SPOKE for a specific drug compound,
gene, or protein, and seeing what other nodes are in its immediate neighborhood,
connected by one or a few edges. These other nodes could be any of the
data types in the heterogeneous network, including related diseases,
side effects, pathways, and other compounds, genes, or proteins.
The search can be restricted
to certain node and edge types, and by edge value.
The resulting network is displayed with color-coding by node type,
with detailed information for any node or edge popping up on mouseover.
Sets of terminal nodes (“leaf” nodes) are grouped into
rectangles that can be collapsed to simplify the view.
The network can be extended further from any
node(s) of interest by one or more additional rounds of searching.
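The idea of a neighborhood search can be illustrated with a minimal sketch. It is not the Neighborhood Explorer's actual implementation: the toy node names, edge names, and the `neighborhood` function are hypothetical, and the graph is treated as undirected. The sketch collects every node within a given number of edges of a query node, optionally restricted to certain node and edge types.

```python
from collections import deque

# Hypothetical toy network: node id -> node type, plus typed edges.
NODE_TYPES = {
    "C1": "Compound",
    "P1": "Protein",
    "P2": "Protein",
    "G1": "Gene",
    "D1": "Disease",
}
EDGES = [
    ("C1", "binds", "P1"),
    ("P1", "interacts", "P2"),
    ("G1", "encodes", "P2"),
    ("C1", "treats", "D1"),
]


def neighborhood(query, max_path_length=1, node_types=None, edge_types=None):
    """Return {node: distance} for nodes within max_path_length edges of query.

    node_types and edge_types are optional sets; None means "allow all".
    """
    # Build an undirected adjacency list from the allowed edge types.
    adjacency = {}
    for source, edge_type, target in EDGES:
        if edge_types is not None and edge_type not in edge_types:
            continue
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, []).append(source)

    # Breadth-first search outward from the query, stopping at the depth limit.
    distances = {query: 0}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        if distances[node] == max_path_length:
            continue
        for neighbor in adjacency.get(node, []):
            if neighbor in distances:
                continue
            if node_types is not None and NODE_TYPES[neighbor] not in node_types:
                continue
            distances[neighbor] = distances[node] + 1
            queue.append(neighbor)
    return distances


print(neighborhood("C1", max_path_length=2, node_types={"Protein"}))
```

Extending the network from a node of interest corresponds, in this sketch, to calling `neighborhood` again with that node as the query and merging the results.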
Searching and Search Options
Search types, with the required query in parentheses:
- Compound (ChEMBL identifier)
– a drug or other chemical compound specified by its
identifier in ChEMBL,
for example: CHEMBL657 for diphenhydramine
- Disease (DO identifier)
– a disease or condition specified by its identifier in
the Disease Ontology (DO),
for example: DOID:3650 for lactic acidosis
- Gene (Entrez name)
– a human gene specified by its name in
Entrez Gene,
for example: DMD for human dystrophin
- Node type – all nodes of the specified type;
generally not feasible except for SARSCov2
(SARS-CoV-2 proteins, a relatively small set)
- Organism (Species name)
– for example, Lactobacillus casei
- Pathway (Pathway name)
– for example, L-tryptophan degradation IX
- Protein (UniProt identifier)
– a human protein specified by its identifier in
UniProt,
for example: P28223 for the human 5-hydroxytryptamine receptor 2A
- Protein (UniProt name)
– a human protein specified by its name in
UniProt,
for example: 5HT2A_HUMAN for the human 5-hydroxytryptamine receptor 2A
- SEA (SMILES)
– a compound specified by
SMILES string for a two-stage search:
first, a Similarity Ensemble Approach (SEA) search to predict proteins that might
bind the compound (see Keiser et al.,
Nat Biotechnol 25:197 (2007));
second, a SPOKE neighborhood search with all of the resulting proteins as queries,
as well as the initial compound.
Doing a SEA search with path-length set to zero in the
options gives only the edges from the first stage.
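Because each search type expects a differently shaped query, a quick format check can catch a mistyped identifier before a search is submitted. The sketch below is only an illustration: the `PATTERNS` table and the `looks_valid` function are hypothetical, and nothing here reflects how the Neighborhood Explorer itself validates input. The UniProt accession pattern follows the format published by UniProt, while the other patterns are assumptions inferred from the examples above.

```python
import re

# Assumed identifier shapes, keyed by search type (hypothetical table).
PATTERNS = {
    "Compound": re.compile(r"CHEMBL\d+"),
    "Disease": re.compile(r"DOID:\d+"),
    "Protein (UniProt identifier)": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
    "Protein (UniProt name)": re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}"),
}


def looks_valid(search_type, query):
    """Return True if query has the assumed shape for search_type.

    Search types without a known pattern (for example gene, pathway, or
    organism names) are always accepted.
    """
    pattern = PATTERNS.get(search_type)
    if pattern is None:
        return True
    return pattern.fullmatch(query) is not None


for search_type, query in [
    ("Compound", "CHEMBL657"),
    ("Disease", "DOID:3650"),
    ("Protein (UniProt identifier)", "P28223"),
    ("Protein (UniProt name)", "5HT2A_HUMAN"),
]:
    print(search_type, query, looks_valid(search_type, query))
```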
The identifier to enter is rarely known in advance.
Clicking the Source button shows the website corresponding to the
current search type, for example, the Disease Ontology
for diseases. The identifier(s) can be looked up at this website.
For SMILES string lookup, however, sites other than the one shown by the
Source button are recommended, such as PubChem Compound.
results from CFTR sample query (Apr 2020)
Entering an exact match for a long name, such as a pathway or species name,
is often difficult, and sometimes a broader search is desired.
For these reasons, all search types except SEA and Node type
allow entering a partial match as a regular expression.
A leading tilde symbol ~ is required to indicate
when a regular expression is being used. Examples:
- ~.*fatty acid.* in a pathway search to find all pathways with
name containing “fatty acid”
- ~Akkermansia.* in an organism search to find all taxa with name
starting with “Akkermansia”
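The tilde convention can be mimicked in a few lines of Python. This is a sketch under stated assumptions, not the server's actual matching code: the `match_names` function is hypothetical, it assumes the expression must match the whole name (which is consistent with the `.*` padding in the examples above), and the pathway names in the sample list are made up for illustration.

```python
import re


def match_names(query, names):
    """Return the names selected by a query.

    A leading '~' marks the rest of the query as a regular expression that
    must match the entire name; otherwise the name must equal the query.
    """
    if query.startswith("~"):
        pattern = re.compile(query[1:])
        return [name for name in names if pattern.fullmatch(name)]
    return [name for name in names if name == query]


# Illustrative names only; not taken from SPOKE.
pathways = [
    "fatty acid beta-oxidation",
    "L-tryptophan degradation IX",
    "superpathway of fatty acid biosynthesis",
]
organisms = ["Akkermansia muciniphila", "Lactobacillus casei"]

print(match_names("~.*fatty acid.*", pathways))
print(match_names("~Akkermansia.*", organisms))
print(match_names("Akkermansia", organisms))  # no tilde: exact match only
```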
It is very important to set options
before submitting the search,
mainly to limit the results to a reasonable number of nodes and edges.
Searches that are too broad not only take longer,
but may also return results that are impossible to view.
A good way to get a feel for a reasonable amount of results and the
corresponding option settings is to run some of the sample queries.
Checkboxes show/hide parts of the interface:
- Options – filters and other search settings:
- Node and Edge Types
- Show nodes
[several checkboxes controlling which node types to include in the results]
- Show edges
[several checkboxes controlling which edge types to include in the results]
- Node and Edge Attributes
- Filter nodes
- Compound max phase >=[minPhase]
– minimum value of the maximum clinical trial phase a compound has reached
(for treating any disease); for example, 4 to exclude compounds that have
not been approved as treatments. Max phase is part of the compound data from
ChEMBL.
- UniProt Protein source: TrEMBL (uncurated) and/or
Swiss-Prot (curated)
- Filter edges
- AnatomyCellType-expresses-Gene expression level
– include edges with expression not detected, low,
medium, and/or high according to the
Human Protein Atlas
- Compound-treats-Disease phase >=[minPhase]
– minimum value of the maximum clinical trial phase the compound
has reached as a treatment for the specific disease
- Disease-associates-Gene DISEASES sources
– type of evidence from the DISEASES
database: textmining, knowledge, and/or experiments
- Disease-associates-Gene text mining z-score >=[minZ]
- Protein-interacts-Protein score >=[minScore]
- Maximum path length <=[L]
– number of edges outward from the query node to include in the results;
1 is usually recommended, 2 if including only a few node and edge types.
The result network can be extended later.
- SEA p-Value Cutoff <=1e-[N]
– significance of protein hits from SEA search
- Graphics limit (nodes + edges) [K]
– limit at which to warn the user of too many results to render
- Sample Queries – a series of sample queries, any of which
can be clicked to perform the corresponding search.
Running these examples may change option settings.
- Legend – node-type color key,
with checkboxes to hide and show each type in the current network
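Several of the options above amount to simple numeric thresholds. The sketch below shows how three of them could be expressed: the compound max phase filter, the SEA p-value cutoff of the form 1e-[N], and the graphics limit on nodes plus edges. The `SearchOptions` class, its field names, and its default values are all hypothetical, and whether the real limit check is strict or inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SearchOptions:
    """Hypothetical container for a few of the option settings."""
    min_compound_phase: int = 0   # keep compounds with max phase >= this
    sea_exponent: int = 5         # N in "p-value <= 1e-N"
    graphics_limit: int = 1000    # warn when nodes + edges exceeds this


def keep_compound(max_phase, options):
    """Compound filter: max clinical trial phase must reach the minimum."""
    return max_phase >= options.min_compound_phase


def keep_sea_hit(p_value, options):
    """SEA filter: keep protein hits at least as significant as 1e-N."""
    return p_value <= 10.0 ** -options.sea_exponent


def too_large_to_render(node_count, edge_count, options):
    """Graphics limit: True when the result should trigger a warning."""
    return node_count + edge_count > options.graphics_limit


options = SearchOptions(min_compound_phase=4, sea_exponent=5, graphics_limit=10)
print(keep_compound(4, options), keep_compound(2, options))
print(keep_sea_hit(1e-6, options), keep_sea_hit(1e-3, options))
print(too_large_to_render(6, 5, options))
```

Raising the minimum phase or the exponent N, or lowering the path length, shrinks the result set, which is usually the quickest way to get under the graphics limit.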
Clicking Submit initiates the search, which may take several seconds
depending on the specific query and option settings.
leaf-node group selected
leaf-node group collapsed
Exploring the Network
The resulting network is displayed when the search is complete.
The query node is shown with a double border. Most edges are solid lines,
except that SEA search compound-protein binding predictions
are shown as dashed lines that vary in width by significance value.
General interactions with the network:
- Hovering the cursor over a node or edge shows its detailed information:
name, data source, etc.
- A node can be dragged to a different position in the network,
whereas starting the drag on “empty” space moves the entire network.
- Scrolling zooms the network.
- Clicking a node selects
it for subsequent button actions. Selected nodes are highlighted in yellow.
Shift-clicking a node toggles selection status or adds it to a pre-existing
selection.
- Another way to select nodes is by using the
“Find in network” text search box.
All nodes with the text string in their name or description will be selected.
- Clicking in empty space clears the selection.
GATA2 gene node selected
after extending from GATA2 and redoing layout
When numerous “leaf nodes” (those connected by only one edge)
emanate from the same central node, they are grouped into a rectangle
and can be collapsed onto that node to simplify the view.
Clicking the rectangle representing a leaf-node group
selects it for action by the Collapse button,
or if the rectangle is in the collapsed state, for re-expansion.
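Leaf-node grouping as described above can be sketched by counting edges per node and bucketing each degree-one node under its single neighbor. This is an illustrative sketch only: the `leaf_groups` function and the `min_group_size` threshold are hypothetical (the documentation says only that "numerous" leaf nodes are grouped), and the edge list is assumed to contain no duplicate edges or self-loops.

```python
from collections import defaultdict


def leaf_groups(edges, min_group_size=2):
    """Group leaf nodes (nodes with exactly one edge) by their central node.

    edges is a list of (node_a, node_b) pairs. Returns {central: [leaves]}
    for every central node with at least min_group_size leaves.
    """
    degree = defaultdict(int)
    neighbor = {}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        # Last-seen neighbor; only used for nodes whose degree ends up as 1.
        neighbor[a] = b
        neighbor[b] = a

    groups = defaultdict(list)
    for node, count in degree.items():
        if count == 1:
            groups[neighbor[node]].append(node)

    return {
        central: sorted(leaves)
        for central, leaves in groups.items()
        if len(leaves) >= min_group_size
    }


edges = [("Q", "A"), ("Q", "B"), ("Q", "C"), ("C", "D")]
print(leaf_groups(edges))
```

In this toy example `A` and `B` form a group around `Q`, while `D` is a lone leaf of `C` and falls below the assumed group-size threshold.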
Button actions on the network and the current selection:
- Extend the network by searching with the
selected node as query, using the current
option settings. Only one node should be selected.
- Delete selected nodes
- Clean – delete any nodes that cannot be reached through edges
from the double-border node(s)
- Reset View
– rescale and center the current network without changing the layout
- Redo Layout
– recalculate network layout
- Download the network as files for Cytoscape (JSON and style) or as an image (SVG or PNG)
– click to list choices:
- Documentation – open this documentation page
- Feedback – open a form to send comments or bug reports
on the Neighborhood Explorer
- SPOKE Home – show the SPOKE home page
- SPOKE Version – show SPOKE update log
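The Clean action listed among the buttons above is, in effect, a reachability test from the query (double-border) nodes. A minimal sketch follows, assuming an undirected view of the network; the `clean` function and its arguments are hypothetical and are not taken from the Neighborhood Explorer's code.

```python
def clean(nodes, edges, query_nodes):
    """Keep only nodes reachable through edges from any query node.

    nodes is an iterable of node ids, edges a list of (node_a, node_b)
    pairs, query_nodes the double-border nodes. Returns the set of kept
    nodes and the list of edges between them.
    """
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Depth-first traversal starting from every query node still present.
    reachable = set()
    stack = [q for q in query_nodes if q in adjacency]
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(adjacency[node] - reachable)

    kept_edges = [(a, b) for a, b in edges if a in reachable and b in reachable]
    return reachable, kept_edges


# After deleting a connecting node, X and Y are stranded from the query Q.
nodes = ["Q", "A", "B", "X", "Y"]
edges = [("Q", "A"), ("A", "B"), ("X", "Y")]
print(clean(nodes, edges, ["Q"]))
```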
UCSF Resource for Biocomputing, Visualization, and Informatics