Tutorial: Knowledge Graphs
In this tutorial you'll build a protein interaction network from scratch, query it for biological insights, and combine it with API data from STRING. You'll learn BioLang's graph builtins and how to use graphs for pathway analysis.
Prerequisites: Familiarity with basic BioLang syntax from Hello Genomics and Tables.
What you'll build
A gene interaction graph for the p53 tumor suppressor pathway. You'll create nodes, add weighted edges, find shortest paths, identify connected components, extract subgraphs, and combine the graph with STRING database scores.
bl run examples/tutorials/knowledge-graphs.bl
Step 1: Create a graph
Start with an empty graph and add some nodes:
# Create an empty undirected graph
let g = graph()
# Add nodes with attributes
let g = add_node(g, "TP53", {role: "tumor_suppressor", chr: "17p13.1"})
let g = add_node(g, "MDM2", {role: "oncogene", chr: "12q15"})
let g = add_node(g, "CDKN1A", {role: "cdk_inhibitor", chr: "6p21.2"})
let g = add_node(g, "BAX", {role: "pro_apoptotic", chr: "19q13.33"})
let g = add_node(g, "BCL2", {role: "anti_apoptotic", chr: "18q21.33"})
print(f"Nodes: {node_count(g)}") # 5
Step 2: Add edges
Connect genes with weighted edges representing interaction confidence:
# TP53 interactions
let g = add_edge(g, "TP53", "MDM2", {score: 0.999, type: "inhibition"})
let g = add_edge(g, "TP53", "CDKN1A", {score: 0.998, type: "activation"})
let g = add_edge(g, "TP53", "BAX", {score: 0.992, type: "activation"})
# Apoptosis regulators
let g = add_edge(g, "BAX", "BCL2", {score: 0.976, type: "inhibition"})
# MDM2 feedback
let g = add_edge(g, "MDM2", "CDKN1A", {score: 0.871, type: "inhibition"})
print(f"Edges: {edge_count(g)}") # 5
Nodes are auto-created when you add edges, so you can skip add_node()
if you don't need attributes. But adding attributes makes the graph more useful
for downstream analysis.
Step 3: Query the graph
Use graph builtins to explore relationships:
# Direct neighbors
let tp53_partners = neighbors(g, "TP53")
print(f"TP53 interacts with: {tp53_partners}")
# ["BAX", "CDKN1A", "MDM2"]
# Node degree (number of connections)
print(f"TP53 degree: {degree(g, 'TP53')}") # 3
print(f"BCL2 degree: {degree(g, 'BCL2')}") # 1
# Check if an edge exists
print(has_edge(g, "TP53", "BAX")) # true
print(has_edge(g, "BCL2", "MDM2")) # false
Step 4: Shortest paths
Find the shortest path between any two genes:
# How does BCL2 connect to MDM2?
let path = shortest_path(g, "BCL2", "MDM2")
print(f"BCL2 → MDM2: {path}")
# ["BCL2", "BAX", "TP53", "MDM2"]
# No path between disconnected components returns nil
let g2 = add_node(g, "ISOLATED_GENE", {})
let path2 = shortest_path(g, "ISOLATED_GENE", "TP53")
print(path2) # nil
Step 5: Network analysis
Analyze the global structure of the network:
# Connected components
let components = connected_components(g)
print(f"Components: {len(components)}")
components |> each(|c| print(f" {c}"))
# Get all nodes and edges as data
let all_nodes = nodes(g)
print(f"All nodes: {all_nodes}")
let edge_table = edges(g)
print(edge_table)
# Table with columns: source, target, and any edge attributes
Step 6: Extract subgraphs
Pull out a subset of the network for focused analysis:
# Extract the apoptosis subnetwork
let apoptosis_genes = ["TP53", "BAX", "BCL2"]
let sub = subgraph(g, apoptosis_genes)
print(f"Subgraph: {node_count(sub)} nodes, {edge_count(sub)} edges")
# Subgraph: 3 nodes, 2 edges (TP53-BAX, BAX-BCL2)
# The MDM2 and CDKN1A edges are excluded
Step 7: Build from data
In practice you'll build graphs from files or API results, not by hand. Here's how to construct a graph from a CSV edge list:
# requires: interactions.csv in working directory
# Suppose interactions.csv has columns: gene_a, gene_b, score
let edges_data = csv("interactions.csv")
let g = graph()
let g = edges_data |> reduce(g, |g, row| {
add_edge(g, row.gene_a, row.gene_b, {score: into(row.score, "Float")})
})
print(f"Built graph: {node_count(g)} genes, {edge_count(g)} interactions")
Step 8: Combine with STRING API
Fetch real interaction data from the STRING database and build a graph:
# requires: internet connection
# Fetch TP53 interaction network from STRING
let network = string_network(["TP53"], 9606)
# Build a graph from the API results — each record has {protein_a, protein_b, score}
let g = graph()
let g = network |> reduce(g, |g, row| {
add_edge(g, row.protein_a, row.protein_b, {
score: row.score,
nscore: row.nscore,
escore: row.escore
})
})
print(f"STRING network: {node_count(g)} proteins, {edge_count(g)} interactions")
# Find hub genes (highest degree)
let hub_genes = nodes(g)
|> map(|n| {gene: n, degree: degree(g, n)})
|> sort_by(|r| r.degree, desc: true)
|> take(5)
print("Top 5 hub genes:")
hub_genes |> each(|r| print(f" {r.gene}: {r.degree} connections"))
Step 9: Directed graphs
For regulatory networks where direction matters, create a directed graph:
# Directed graph: edges have a direction
let dg = graph(directed: true)
let dg = add_edge(dg, "TP53", "CDKN1A", {type: "activates"})
let dg = add_edge(dg, "TP53", "BAX", {type: "activates"})
let dg = add_edge(dg, "MDM2", "TP53", {type: "inhibits"})
let dg = add_edge(dg, "TP53", "MDM2", {type: "activates"})
# In directed graphs, neighbors returns outgoing connections only
let tp53_targets = neighbors(dg, "TP53")
print(f"TP53 activates/inhibits: {tp53_targets}")
# ["BAX", "CDKN1A", "MDM2"]
# MDM2 only has one outgoing edge
let mdm2_targets = neighbors(dg, "MDM2")
print(f"MDM2 targets: {mdm2_targets}")
# ["TP53"]
Step 10: Removing nodes and edges
Simulate knockouts by removing genes from the network:
# What happens if we knock out TP53?
let ko = remove_node(g, "TP53")
print(f"After TP53 knockout: {node_count(ko)} nodes, {edge_count(ko)} edges")
# How many components now?
let ko_components = connected_components(ko)
print(f"Components after knockout: {len(ko_components)}")
ko_components |> each(|c| print(f" {c}"))
# Remove a specific interaction
let g_no_feedback = remove_edge(g, "MDM2", "CDKN1A")
print(f"Removed MDM2-CDKN1A: {edge_count(g_no_feedback)} edges remain")
Complete example
Here's the full analysis as a notebook-ready script:
# p53_network.bl — Build and analyze the p53 interaction network
# Build the graph
let g = graph()
let g = add_edge(g, "TP53", "MDM2", {score: 0.999, type: "inhibition"})
let g = add_edge(g, "TP53", "CDKN1A", {score: 0.998, type: "activation"})
let g = add_edge(g, "TP53", "BAX", {score: 0.992, type: "activation"})
let g = add_edge(g, "BAX", "BCL2", {score: 0.976, type: "inhibition"})
let g = add_edge(g, "MDM2", "CDKN1A", {score: 0.871, type: "inhibition"})
let g = add_edge(g, "CDKN1A", "CDK2", {score: 0.945, type: "inhibition"})
let g = add_edge(g, "CDK2", "RB1", {score: 0.932, type: "phosphorylation"})
print(f"Network: {node_count(g)} genes, {edge_count(g)} interactions")
print(f"Components: {len(connected_components(g))}")
# Hub analysis
let hubs = nodes(g)
|> map(|n| {gene: n, deg: degree(g, n)})
|> sort_by(|r| r.deg, desc: true)
print("\nDegree ranking:")
hubs |> each(|r| print(f" {r.gene}: {r.deg}"))
# Path from cell cycle to apoptosis
let path = shortest_path(g, "CDK2", "BCL2")
print(f"\nCell cycle → apoptosis: {path}")
# TP53 knockout simulation
let ko = remove_node(g, "TP53")
let ko_comp = connected_components(ko)
print(f"\nTP53 knockout: {len(ko_comp)} disconnected components")
ko_comp |> each(|c| print(f" {c}"))
Builtin reference
| Function | Description |
|---|---|
| graph(directed?) | Create empty graph (undirected by default) |
| add_node(g, id, attrs?) | Add node with optional attributes |
| add_edge(g, src, tgt, attrs?) | Add edge (auto-creates nodes) |
| remove_node(g, id) | Remove node and its edges |
| remove_edge(g, src, tgt) | Remove specific edge |
| neighbors(g, id) | List of adjacent nodes |
| degree(g, id) | Number of connections |
| has_edge(g, src, tgt) | Check if edge exists |
| nodes(g) | List of all node IDs |
| edges(g) | Table of all edges |
| node_count(g) | Number of nodes |
| edge_count(g) | Number of edges |
| shortest_path(g, src, tgt) | BFS shortest path (list or nil) |
| connected_components(g) | List of component node lists |
| subgraph(g, node_list) | Extract induced subgraph |
What's next
- Enrichment Analysis — combine graphs with gene set enrichment
- Knowledge Graphs reference — full builtin documentation
- Querying Databases — fetch data from NCBI, Ensembl, and more