In [1]:

```
%run "boaz_utils.ipynb"
```

Often in computation we have **data** from the world, and a **question** we want to answer about these data.

To do so, we need to find a **model** for the data, and a way to translate our question into a **mathematical question about the model**

- Have map of Kingston, want to find fastest way to get from Rex Nettleford to Jamaica College.
- Instagram is trying to figure out how many followers of followers the average Jamaican has.
- Geneticist is trying to find which genes relate to a certain colon cancer.

What is perhaps most surprising is that these and any many other questions, all use the same mathematical model of a **graph**

A **graph** is just a way to store **connections** between pairs of entities:

The graph of Kingston's roads could be composed of all street intersections, with a connection between intersection $u$ and intersection $v$ if they are directly connected by a road.

The Instagram graph is composed of all Instagram users, with a connection from user $u$ to user $v$ if $u$ follows $v$.

The gene-symptom interaction graph is composed of all genes and all "symptoms" (also known as phenotypes: some observable differences in people), where gene $u$ is connected to symptom $v$ if there is a correlation between people having the gene $u$ and symptom $v$.

Mathematically, a graph is a set $V$ of **vertices** and a set $E$ of pairs of these vertices which is known as the set of **edges**. We say that a vertex $u\in V$ is connected to $v\in V$ if the pair $(u,v)$ is in $E$.

A graph where $(u,v)$ is an edge if and only if $(v,u)$ is also an edge is known as an **undirected** graph. Undirected graphs form an important special case. Otherwise if this is not promised to be the case, we say the graph is **directed**.

Sometimes the edges (or vertices) of the graph are labeled by a number, which we call a **weight**. For example in the case of the road network, we might label every road segment with the length of that segment in kilometers.

There are two main representations for graphs. We can always assume the vertices are simply identified by the numbers $0$ to $n-1$ for some $n$.

The **adjacency list representation** is an array $L$ where $L[i]$ is the list of all neighbors of the vertex $i$ (i.e., all $j$ such that $(i,j)\in E$)

The **adjacency matrix representation** is an $n\times n$ two-dimensional array $M$ (i.e., matrix) such that $M[i][j]$ equals $1$ if $j$ is a neighbor of $i$ and equals $0$ otherwise.

In [ ]:

```
# adjacency list representation, G is a list of lists
# G[i] is a list of all vertices i is connected to
for j in G[1]:
if j==7:
return True
return False
```

- If a graph has $n$ vertices and $m$ edges - how big is its adjacency list representation? how big is its adjacency matrix representation?

- Given a graph $G$ on $n$ vertices and two vertices $i,j$, how long can it take us (in the worst case) to find out if $j$ is a neighbor of $i$ when $G$ is represented in the adjacency list form? How long will it take in the adjacency matrix form?

In [2]:

```
G = [[1],[2],[3],[0]]
draw_graph(G)
```

In [4]:

```
G = [[3,1],[0,2],[1,3],[2,0]]
draw_graph(G)
```

In [3]:

```
G = [[1,2,3,4,5,6],[0],[0],[0],[0],[0],[0]]
draw_graph(G)
```

In [5]:

```
n = 20
cycle = [ [(i+1) % n, (i-1) % n] for i in range(n) ]
draw_graph(cycle)
```

In [6]:

```
def grid_neighbors(i,j,n):
if i==n-1 and j== n-1: return []
if i==n-1:
return [i*n+j+1]
if j==n-1:
return [(i+1)*n+j]
return [n*i+((j+1) % n), n*((i+1) % n)+j]
```

In [7]:

```
n = 5
grid = [ grid_neighbors(i,j,n) for i in range(n) for j in range(n) ]
```

In [8]:

```
draw_graph(grid,'grid_layout')
```

In [ ]:

```
def neighbors(G,u):
return G[u]
def isedge(G,u,v):
for i in G[u]:
if i == v:
return True
return False
def vertices(G):
return len(G)
def addedge(G,i,j):
if not isedge(G, i, j):
G[i].append(j)
def emptygraph(n):
G = []
for i in range(n):
G.append([])
return G
```

In [ ]:

```
neighbors(cycle,0)
```

In [ ]:

```
isedge(cycle,4,7)
```

In [ ]:

```
vertices(cycle)
```

In [ ]:

```
def zeros(n): return [0 for i in range(n)]
def printmatrix(M):
for L in M:
print(L)
def list2matrix(G):
n = len(vertices(G))
M = [zeros(n) for i in range(n)]
for i in range(n):
for j in range(n):
if isedge(G,i,j):
M[i][j] = 1
else:
M[i][j] = 0
return M
```

**Question:** Write function `list2matrix(G)`

that convertes a graph `G`

in adj list format to adj matrix format where `M[i][j]`

equals `1`

/`0`

based on whether `i`

is neighbor of `j`

In [ ]:

```
G = [[3,1],[0,2],[1,3],[2,0]]
M = list2matrix(G)
printmatrix(M)
```

In [ ]:

```
def list2matrix(G):
n = len(vertices(G))
M = [[0 for i in range(n)] for j in range(n)]
for i in vertices(G):
for j in vertices(G):
if isedge(G,i,j):
(M[i])[j]=1
return M
G = [[1,4],[0,2],[1,3],[2,4],[3,0]]
printmatrix(list2matrix(G))
```

**Exercise:** Write a function `undir`

that takes a graph `G`

and outputs a graph `_G`

that such that for every `i,j`

the edge `i->j`

is in `G`

if and only if both `j->i`

and `i->j`

are in `_G`

.

In [ ]:

```
def undir(G):
_G = [[] for i in vertices(G)]
for i in vertices(G):
for j in neighbors(G,i):
addedge(_G,i,j)
addedge(_G,j,i)
return _G
```

In [ ]:

```
G = [[1],[2],[0]]
undir(G)
```

In [ ]:

```
draw_graph(undir(G))
```