scRNA-seq_analysis

veghp 2019-07-08 12:22:01 +01:00
commit 82cc2d191e
188 changed files with 146184 additions and 0 deletions

@ -0,0 +1,154 @@
## ForceAtlas2 for Python
A port of Gephi's ForceAtlas2 layout algorithm to Python 2 and Python 3 (with wrappers for NetworkX and igraph). This is the fastest Python implementation available, with most of the features complete. It also supports the Barnes-Hut approximation for maximum speedup.
ForceAtlas2 is a very fast layout algorithm for force-directed graphs. It is used to spatialize a **weighted undirected** graph in 2D (the edge weight defines the strength of the connection). The implementation is based on this [paper](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0098679) and the corresponding [gephi-java-code](https://github.com/gephi/gephi/blob/master/modules/LayoutPlugin/src/main/java/org/gephi/layout/plugin/forceAtlas2/ForceAtlas2.java). It is much faster than the Fruchterman-Reingold algorithm (spring layout) in NetworkX and scales well to large numbers of nodes (>10000).
<p align="center" text-align="center">
<b>Spatialize a random Geometric Graph</b>
</p>
<p align="center">
<img width="460" height="300" src="https://raw.githubusercontent.com/bhargavchippada/forceatlas2/master/examples/geometric_graph.png" alt="Geometric Graph">
</p>
## Installation
Install from pip:

    pip install fa2

To build and install from source, run:

    python setup.py install

**Cython is highly recommended if you are building from source, as it speeds up the layout by a factor of 10-100x depending on the graph.**
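A quick way to check whether the Cython build actually took effect is to ask the import system which file `fa2.fa2util` resolves to: a compiled extension ends in a platform suffix such as `.so` or `.pyd`, while the pure-Python fallback ends in `.py`. The helper below is a minimal sketch using only the standard `importlib` machinery; the name `is_compiled_extension` is hypothetical and not part of fa2.

```python
import importlib.machinery
import importlib.util


def is_compiled_extension(module_name):
    """Return True if `module_name` resolves to a compiled extension module.

    Hypothetical helper (not part of fa2): it looks up the file the import
    system would load and compares its suffix with the platform's extension
    suffixes (e.g. '.cpython-311-x86_64-linux-gnu.so' or '.pyd').
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised when a parent package (e.g. 'fa2') cannot be imported.
        return False
    if spec is None or not spec.origin:
        return False  # not found, or a namespace package without a file
    return spec.origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


if __name__ == "__main__":
    print("fa2.fa2util compiled:", is_compiled_extension("fa2.fa2util"))
```

Importing `fa2` also prints a warning from `fa2util` when the module is uncompiled, so a check like this is mainly useful inside scripts or tests that need a boolean answer.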
### Dependencies
- numpy (adjacency matrix as complete matrix)
- scipy (adjacency matrix as sparse matrix)
- tqdm (progressbar)
- Cython (10-100x speedup)
- networkx (To use the NetworkX wrapper function, you obviously need NetworkX)
- python-igraph (To use the igraph wrapper)
<p align="center" text-align="center">
<b>Spatialize a 2D Grid</b>
</p>
<p align="center">
<img width="460" height="300" src="https://raw.githubusercontent.com/bhargavchippada/forceatlas2/master/examples/grid_graph.png" alt="Grid Graph">
</p>
## Usage

    from fa2 import ForceAtlas2

Create a ForceAtlas2 object with the appropriate settings. The ForceAtlas2 class contains three important methods:
```python
forceatlas2(G, pos, iterations)
# G is a graph in 2D numpy ndarray format (or) scipy sparse matrix format. You can set the edge weights (> 0) in the matrix
# pos is a numpy array (Nx2) of initial positions of nodes
# iterations is the number of iterations to run the algorithm
# returns a list of (x, y) pairs for each node's final position
```
```python
forceatlas2_networkx_layout(G, pos, iterations)
# G is a networkx graph. Edge weights can be set (if required) in the NetworkX graph
# pos is a dictionary, as in networkx
# iterations is the number of iterations to run the algorithm
# returns a dictionary of node positions (2D X-Y tuples) indexed by the node name
```
```python
forceatlas2_igraph_layout(G, pos, iterations, weight_attr)
# G is an igraph graph
# pos is a numpy array (Nx2) or list of initial positions of nodes (make sure the indexing matches the igraph node index)
# iterations is the number of iterations to run the algorithm
# weight_attr denotes the weight attribute's name in G.es, None by default
# returns an igraph layout
```
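The matrix-based `forceatlas2()` method expects `G` to be square and symmetric (undirected) with positive edge weights, and `pos`, if given, to be an N×2 array whose row i holds the starting position of node i. One way to build such inputs from a weighted edge list is sketched below with `scipy.sparse`; `edges_to_adjacency` is a hypothetical helper, and dropping self-loops is a choice of this sketch rather than a requirement stated by fa2.

```python
import numpy as np
import scipy.sparse


def edges_to_adjacency(edges, n_nodes):
    """Build a symmetric sparse adjacency matrix from (i, j, weight) triples.

    Hypothetical helper: every undirected edge is written to both (i, j) and
    (j, i), so the matrix is square and symmetric. Weights must be > 0;
    repeated edges are summed by the COO -> CSR conversion.
    """
    rows, cols, weights = [], [], []
    for i, j, w in edges:
        if w <= 0:
            raise ValueError("edge weights must be > 0")
        if i == j:
            continue  # skip self-loops in this sketch
        rows += [i, j]
        cols += [j, i]
        weights += [w, w]
    return scipy.sparse.coo_matrix(
        (weights, (rows, cols)), shape=(n_nodes, n_nodes)
    ).tocsr()


# Example: a weighted triangle plus one pendant node, random initial positions.
G = edges_to_adjacency([(0, 1, 1.0), (1, 2, 2.0), (0, 2, 0.5), (2, 3, 1.0)], 4)
pos = np.random.default_rng(0).random((4, 2))  # N x 2, row i = node i
```

The result can then be passed as `forceatlas2.forceatlas2(G, pos=pos, iterations=100)`. Sparse input is converted to LIL format inside `ForceAtlas2.init()`, so CSR is fine here; for a dense `numpy.ndarray`, `init()` additionally asserts that `G.T == G`.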
Below is an example of usage, which also shows the settings available in the ForceAtlas2 class.
```python
import networkx as nx
from fa2 import ForceAtlas2
import matplotlib.pyplot as plt

G = nx.random_geometric_graph(400, 0.2)

forceatlas2 = ForceAtlas2(
    # Behavior alternatives
    outboundAttractionDistribution=True,  # Dissuade hubs
    linLogMode=False,  # NOT IMPLEMENTED
    adjustSizes=False,  # Prevent overlap (NOT IMPLEMENTED)
    edgeWeightInfluence=1.0,

    # Performance
    jitterTolerance=1.0,  # Tolerance
    barnesHutOptimize=True,
    barnesHutTheta=1.2,
    multiThreaded=False,  # NOT IMPLEMENTED

    # Tuning
    scalingRatio=2.0,
    strongGravityMode=False,
    gravity=1.0,

    # Log
    verbose=True)

positions = forceatlas2.forceatlas2_networkx_layout(G, pos=None, iterations=2000)
nx.draw_networkx_nodes(G, positions, node_size=20, node_color="blue", alpha=0.4)
nx.draw_networkx_edges(G, positions, edge_color="green", alpha=0.05)
plt.axis('off')
plt.show()

# Equivalently, with igraph
import igraph
G = igraph.Graph.TupleList(G.edges(), directed=False)
layout = forceatlas2.forceatlas2_igraph_layout(G, pos=None, iterations=2000)
igraph.plot(G, layout).show()
```
You can also look at the forceatlas2.py file to better understand the ForceAtlas2 class and its functions.
## Features Completed
- **barnesHutOptimize**: Barnes Hut optimization, n<sup>2</sup> complexity to n.ln(n)
- **gravity**: Attracts nodes to the center. Prevents islands from drifting away
- **Dissuade Hubs**: Distributes attraction along outbound edges. Hubs attract less and thus are pushed to the borders
- **scalingRatio**: How much repulsion you want. More makes a sparser graph
- **strongGravityMode**: A stronger gravity view
- **jitterTolerance**: How much swinging you allow. Above 1 discouraged. Lower gives less speed and more precision
- **verbose**: Shows a progressbar of iterations completed. Also, shows time taken for different force computations
- **edgeWeightInfluence**: How much influence you give to the edges weight. 0 is "no influence" and 1 is "normal"
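The settings above all act on a small set of per-node forces. The dense NumPy sketch below computes one iteration's displacement vectors the same way `linRepulsion`, `linAttraction` and `linGravity` in `fa2util.py` do: node mass is 1 + degree, every pair repels with magnitude `scalingRatio * m_i * m_j / d`, every edge pulls its endpoints together with magnitude `w ** edgeWeightInfluence * d`, and gravity pulls each node toward the origin with magnitude `m * gravity`. It is an O(n²) illustration under those assumptions, without the Barnes-Hut approximation, hub dissuasion, strong gravity or adaptive speed, and `forceatlas2_forces` is a hypothetical name.

```python
import numpy as np


def forceatlas2_forces(A, pos, scaling_ratio=2.0, gravity=1.0, edge_weight_influence=1.0):
    """Displacement vectors (dx, dy) for one ForceAtlas2-style iteration.

    Hypothetical dense O(n^2) sketch. A: symmetric (N, N) weight matrix,
    pos: (N, 2) positions. Returns an (N, 2) array.
    """
    A = np.asarray(A, dtype=float)
    pos = np.asarray(pos, dtype=float)
    mass = 1.0 + np.count_nonzero(A, axis=1)  # mass = 1 + degree, as in init()

    diff = pos[:, None, :] - pos[None, :, :]  # diff[i, j] = pos_i - pos_j
    dist2 = np.sum(diff ** 2, axis=-1)
    norm = np.sqrt(np.sum(pos ** 2, axis=1))  # distance of each node to the origin

    with np.errstate(divide="ignore", invalid="ignore"):
        # Repulsion factor k * m_i * m_j / d^2, skipping coincident pairs (incl. i == j)
        rep = np.where(dist2 > 0, scaling_ratio * np.outer(mass, mass) / dist2, 0.0)
        # Attraction factor w_ij ** edge_weight_influence, only where an edge exists
        att = np.where(A > 0, np.abs(A) ** edge_weight_influence, 0.0)
        # Gravity factor m_i * g / |pos_i|, skipping nodes at the origin
        grav = np.where(norm > 0, mass * gravity / norm, 0.0)

    forces = np.sum(diff * rep[..., None], axis=1)   # push away from every other node
    forces -= np.sum(diff * att[..., None], axis=1)  # pull toward each neighbour
    forces -= pos * grav[:, None]                    # pull toward the origin
    return forces
```

In the library, these vectors are then scaled by the adaptive speed computed in `adjustSpeedAndApplyForces()` (the part governed by `jitterTolerance`) before the node positions are updated.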
## Documentation
You will find all the documentation in the source code
## Contributors
Contributions are highly welcome. Please submit your pull requests and become a collaborator.
## Copyright
Copyright (C) 2017 Bhargav Chippada bhargavchippada19@gmail.com.
Licensed under the GNU GPLv3.
The files are heavily based on the Java files included in Gephi (git revision 2b9a7c8) and on Max Shinn's Python port of the algorithm. The copyright information from those files is included here:
Copyright 2008-2011 Gephi
Authors : Mathieu Jacomy <mathieu.jacomy@gmail.com>
Website : http://www.gephi.org
Copyright 2011 Gephi Consortium. All rights reserved.
Portions Copyrighted 2011 Gephi Consortium.
The contents of this file are subject to the terms of either the
GNU General Public License Version 3 only ("GPL") or the Common
Development and Distribution License("CDDL") (collectively, the
"License"). You may not use this file except in compliance with
the License.
<https://github.com/mwshinn/forceatlas2-python>
Copyright 2016 Max Shinn <mws41@cam.ac.uk>
Available under the GPLv3
Also, thanks to Eugene Bosiakov <https://github.com/bosiakov/fa2l>

@ -0,0 +1,5 @@
Package downloaded from https://github.com/bhargavchippada/forceatlas2
forceatlas2.py has been modified and differs from the original script.
The modification makes it possible to return the FDG coordinates from every iteration, which is needed to create an animated force-directed graph.
It is the understanding of the person who modified the published package (Dorin-Mirel Popescu) that forceatlas2 is subject to the terms of GPL version 3, which allow modifying the original code and publishing the modified version. The original author of forceatlas2 (Mathieu Jacomy) is acknowledged. Furthermore, the modifications in this version do not pertain to the algorithm; they only add functionality for keeping all transient states, so that the evolution of the force-directed graph can be tracked and visualised as a video.
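Because the modified `forceatlas2()` appends one N×2 position array to `dataContainer` after every iteration, the stored frames can also be used to judge how long the layout keeps moving, for example when choosing how many frames to render. The sketch below, with hypothetical helper names, measures the mean per-node displacement between consecutive frames and reports the first frame where it drops below a tolerance. Note that `dataContainer` is created in `__init__` and never cleared, so calling `forceatlas2()` twice on the same object accumulates the frames of both runs.

```python
import numpy as np


def frame_displacements(frames):
    """Mean per-node displacement between consecutive frames.

    `frames` is a sequence of (N, 2) position arrays, e.g. the contents of
    ForceAtlas2.dataContainer. Returns an array of length len(frames) - 1.
    """
    stack = np.stack([np.asarray(f, dtype=float) for f in frames])  # shape (T, N, 2)
    steps = np.linalg.norm(np.diff(stack, axis=0), axis=2)  # shape (T - 1, N)
    return steps.mean(axis=1)


def first_settled_frame(frames, tol):
    """Index of the first frame whose mean displacement from the previous
    frame is below `tol`, or None if the layout never gets that still."""
    moves = frame_displacements(frames)
    below = np.flatnonzero(moves < tol)
    return int(below[0]) + 1 if below.size else None


# Usage after a run (the tolerance value is an arbitrary example):
# settled = first_settled_frame(forceatlas2.dataContainer, tol=1e-3)
```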

@ -0,0 +1 @@
from .forceatlas2 import *

@ -0,0 +1,122 @@
# Cython optimizations. Cython allows huge speed boosts by giving
# each variable a type. This file is called a "pxd extension file"
# (see the "Pure Python" section of the Cython manual). In essence,
# it provides types for function definitions and then, if cython is
# available, it uses these types to optimize normal python code. It
# is associated with the fa2util.py file.
#
# IF ANY CHANGES ARE MADE TO fa2util.py, THE CHANGES MUST BE REFLECTED
# HERE!!
#
# Copyright (C) 2017 Bhargav Chippada <bhargavchippada19@gmail.com>
#
# Available under the GPLv3
import cython


# This will substitute for the nLayout object
cdef class Node:
    cdef public double mass
    cdef public double old_dx, old_dy
    cdef public double dx, dy
    cdef public double x, y


# This is not in the original java function, but it makes it easier to
# deal with edges.
cdef class Edge:
    cdef public int node1, node2
    cdef public double weight

# Repulsion function. `n1` and `n2` should be nodes. This will
# adjust the dx and dy values of `n1` (and optionally `n2`). It does
# not return anything.
@cython.locals(xDist = cython.double,
               yDist = cython.double,
               distance2 = cython.double,
               factor = cython.double)
cdef void linRepulsion(Node n1, Node n2, double coefficient=*)

@cython.locals(xDist = cython.double,
               yDist = cython.double,
               distance2 = cython.double,
               factor = cython.double)
cdef void linRepulsion_region(Node n, Region r, double coefficient=*)

@cython.locals(xDist = cython.double,
               yDist = cython.double,
               distance = cython.double,
               factor = cython.double)
cdef void linGravity(Node n, double g)

@cython.locals(xDist = cython.double,
               yDist = cython.double,
               factor = cython.double)
cdef void strongGravity(Node n, double g, double coefficient=*)

@cython.locals(xDist = cython.double,
               yDist = cython.double,
               factor = cython.double)
cpdef void linAttraction(Node n1, Node n2, double e, bint distributedAttraction, double coefficient=*)

@cython.locals(i = cython.int,
               j = cython.int,
               n1 = Node,
               n2 = Node)
cpdef void apply_repulsion(list nodes, double coefficient)

@cython.locals(n = Node)
cpdef void apply_gravity(list nodes, double gravity, bint useStrongGravity=*)

@cython.locals(edge = Edge)
cpdef void apply_attraction(list nodes, list edges, bint distributedAttraction, double coefficient, double edgeWeightInfluence)

cdef class Region:
    cdef public double mass
    cdef public double massCenterX, massCenterY
    cdef public double size
    cdef public list nodes
    cdef public list subregions

    @cython.locals(massSumX = cython.double,
                   massSumY = cython.double,
                   n = Node,
                   distance = cython.double)
    cdef void updateMassAndGeometry(self)

    @cython.locals(n = Node,
                   leftNodes = list,
                   rightNodes = list,
                   topleftNodes = list,
                   bottomleftNodes = list,
                   toprightNodes = list,
                   bottomrightNodes = list,
                   subregion = Region)
    cpdef void buildSubRegions(self)

    @cython.locals(distance = cython.double,
                   subregion = Region)
    cdef void applyForce(self, Node n, double theta, double coefficient=*)

    @cython.locals(n = Node)
    cpdef applyForceOnNodes(self, list nodes, double theta, double coefficient=*)

@cython.locals(totalSwinging = cython.double,
               totalEffectiveTraction = cython.double,
               n = Node,
               swinging = cython.double,
               estimatedOptimalJitterTolerance = cython.double,
               minJT = cython.double,
               maxJT = cython.double,
               jt = cython.double,
               minSpeedEfficiency = cython.double,
               targetSpeed = cython.double,
               maxRise = cython.double,
               factor = cython.double,
               values = dict)
cpdef dict adjustSpeedAndApplyForces(list nodes, double speed, double speedEfficiency, double jitterTolerance)

@ -0,0 +1,326 @@
# This file allows separating the most CPU intensive routines from the
# main code. This allows them to be optimized with Cython. If you
# don't have Cython, this will run normally. However, if you use
# Cython, you'll get speed boosts from 10-100x automatically.
#
# THE ONLY CATCH IS THAT IF YOU MODIFY THIS FILE, YOU MUST ALSO MODIFY
# fa2util.pxd TO REFLECT ANY CHANGES IN FUNCTION DEFINITIONS!
#
# Copyright (C) 2017 Bhargav Chippada <bhargavchippada19@gmail.com>
#
# Available under the GPLv3
from math import sqrt


# This will substitute for the nLayout object
class Node:
    def __init__(self):
        self.mass = 0.0
        self.old_dx = 0.0
        self.old_dy = 0.0
        self.dx = 0.0
        self.dy = 0.0
        self.x = 0.0
        self.y = 0.0


# This is not in the original java code, but it makes it easier to deal with edges
class Edge:
    def __init__(self):
        self.node1 = -1
        self.node2 = -1
        self.weight = 0.0


# Here are some functions from ForceFactory.java
# =============================================

# Repulsion function. `n1` and `n2` should be nodes. This will
# adjust the dx and dy values of `n1` and `n2`.
def linRepulsion(n1, n2, coefficient=0):
    xDist = n1.x - n2.x
    yDist = n1.y - n2.y
    distance2 = xDist * xDist + yDist * yDist  # Distance squared

    if distance2 > 0:
        factor = coefficient * n1.mass * n2.mass / distance2
        n1.dx += xDist * factor
        n1.dy += yDist * factor
        n2.dx -= xDist * factor
        n2.dy -= yDist * factor


# Repulsion function. 'n' is node and 'r' is region
def linRepulsion_region(n, r, coefficient=0):
    xDist = n.x - r.massCenterX
    yDist = n.y - r.massCenterY
    distance2 = xDist * xDist + yDist * yDist

    if distance2 > 0:
        factor = coefficient * n.mass * r.mass / distance2
        n.dx += xDist * factor
        n.dy += yDist * factor


# Gravity repulsion function. For some reason, gravity was included
# within the linRepulsion function in the original gephi java code,
# which doesn't make any sense (considering a. gravity is unrelated to
# nodes repelling each other, and b. gravity is actually an
# attraction)
def linGravity(n, g):
    xDist = n.x
    yDist = n.y
    distance = sqrt(xDist * xDist + yDist * yDist)

    if distance > 0:
        factor = n.mass * g / distance
        n.dx -= xDist * factor
        n.dy -= yDist * factor


# Strong gravity force function. `n` should be a node, and `g`
# should be a constant by which to apply the force.
def strongGravity(n, g, coefficient=0):
    xDist = n.x
    yDist = n.y

    # Apply the force unless the node sits exactly at the origin
    # (distance > 0); `and` would also skip nodes lying on an axis.
    if xDist != 0 or yDist != 0:
        factor = coefficient * n.mass * g
        n.dx -= xDist * factor
        n.dy -= yDist * factor


# Attraction function. `n1` and `n2` should be nodes. This will
# adjust the dx and dy values of `n1` and `n2`. It does
# not return anything.
def linAttraction(n1, n2, e, distributedAttraction, coefficient=0):
    xDist = n1.x - n2.x
    yDist = n1.y - n2.y
    if not distributedAttraction:
        factor = -coefficient * e
    else:
        factor = -coefficient * e / n1.mass
    n1.dx += xDist * factor
    n1.dy += yDist * factor
    n2.dx -= xDist * factor
    n2.dy -= yDist * factor


# The following functions iterate through the nodes or edges and apply
# the forces directly to the node objects. These iterations are here
# instead of the main file because Python is slow with loops.
def apply_repulsion(nodes, coefficient):
    i = 0
    for n1 in nodes:
        j = i
        for n2 in nodes:
            if j == 0:
                break
            linRepulsion(n1, n2, coefficient)
            j -= 1
        i += 1
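

def apply_gravity(nodes, gravity, useStrongGravity=False):
    if not useStrongGravity:
        for n in nodes:
            linGravity(n, gravity)
    else:
        # Note: no coefficient is passed here, so strongGravity() falls back
        # to its default coefficient of 0, which makes its force factor 0.
        for n in nodes:
            strongGravity(n, gravity)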


def apply_attraction(nodes, edges, distributedAttraction, coefficient, edgeWeightInfluence):
    # Optimization, since usually edgeWeightInfluence is 0 or 1, and pow is slow
    if edgeWeightInfluence == 0:
        for edge in edges:
            linAttraction(nodes[edge.node1], nodes[edge.node2], 1, distributedAttraction, coefficient)
    elif edgeWeightInfluence == 1:
        for edge in edges:
            linAttraction(nodes[edge.node1], nodes[edge.node2], edge.weight, distributedAttraction, coefficient)
    else:
        for edge in edges:
            linAttraction(nodes[edge.node1], nodes[edge.node2], pow(edge.weight, edgeWeightInfluence),
                          distributedAttraction, coefficient)


# For Barnes Hut Optimization
class Region:
    def __init__(self, nodes):
        self.mass = 0.0
        self.massCenterX = 0.0
        self.massCenterY = 0.0
        self.size = 0.0
        self.nodes = nodes
        self.subregions = []
        self.updateMassAndGeometry()

    def updateMassAndGeometry(self):
        if len(self.nodes) > 1:
            self.mass = 0
            massSumX = 0
            massSumY = 0
            for n in self.nodes:
                self.mass += n.mass
                massSumX += n.x * n.mass
                massSumY += n.y * n.mass
            self.massCenterX = massSumX / self.mass
            self.massCenterY = massSumY / self.mass

            self.size = 0.0
            for n in self.nodes:
                distance = sqrt((n.x - self.massCenterX) ** 2 + (n.y - self.massCenterY) ** 2)
                self.size = max(self.size, 2 * distance)

    def buildSubRegions(self):
        if len(self.nodes) > 1:
            leftNodes = []
            rightNodes = []
            for n in self.nodes:
                if n.x < self.massCenterX:
                    leftNodes.append(n)
                else:
                    rightNodes.append(n)

            topleftNodes = []
            bottomleftNodes = []
            for n in leftNodes:
                if n.y < self.massCenterY:
                    topleftNodes.append(n)
                else:
                    bottomleftNodes.append(n)

            toprightNodes = []
            bottomrightNodes = []
            for n in rightNodes:
                if n.y < self.massCenterY:
                    toprightNodes.append(n)
                else:
                    bottomrightNodes.append(n)

            if len(topleftNodes) > 0:
                if len(topleftNodes) < len(self.nodes):
                    subregion = Region(topleftNodes)
                    self.subregions.append(subregion)
                else:
                    for n in topleftNodes:
                        subregion = Region([n])
                        self.subregions.append(subregion)

            if len(bottomleftNodes) > 0:
                if len(bottomleftNodes) < len(self.nodes):
                    subregion = Region(bottomleftNodes)
                    self.subregions.append(subregion)
                else:
                    for n in bottomleftNodes:
                        subregion = Region([n])
                        self.subregions.append(subregion)

            if len(toprightNodes) > 0:
                if len(toprightNodes) < len(self.nodes):
                    subregion = Region(toprightNodes)
                    self.subregions.append(subregion)
                else:
                    for n in toprightNodes:
                        subregion = Region([n])
                        self.subregions.append(subregion)

            if len(bottomrightNodes) > 0:
                if len(bottomrightNodes) < len(self.nodes):
                    subregion = Region(bottomrightNodes)
                    self.subregions.append(subregion)
                else:
                    for n in bottomrightNodes:
                        subregion = Region([n])
                        self.subregions.append(subregion)

            for subregion in self.subregions:
                subregion.buildSubRegions()

    def applyForce(self, n, theta, coefficient=0):
        if len(self.nodes) < 2:
            linRepulsion(n, self.nodes[0], coefficient)
        else:
            distance = sqrt((n.x - self.massCenterX) ** 2 + (n.y - self.massCenterY) ** 2)
            if distance * theta > self.size:
                linRepulsion_region(n, self, coefficient)
            else:
                for subregion in self.subregions:
                    subregion.applyForce(n, theta, coefficient)

    def applyForceOnNodes(self, nodes, theta, coefficient=0):
        for n in nodes:
            self.applyForce(n, theta, coefficient)


# Adjust speed and apply forces step
def adjustSpeedAndApplyForces(nodes, speed, speedEfficiency, jitterTolerance):
    # Auto adjust speed.
    totalSwinging = 0.0  # How much irregular movement
    totalEffectiveTraction = 0.0  # How much useful movement
    for n in nodes:
        swinging = sqrt((n.old_dx - n.dx) * (n.old_dx - n.dx) + (n.old_dy - n.dy) * (n.old_dy - n.dy))
        totalSwinging += n.mass * swinging
        totalEffectiveTraction += .5 * n.mass * sqrt(
            (n.old_dx + n.dx) * (n.old_dx + n.dx) + (n.old_dy + n.dy) * (n.old_dy + n.dy))

    # Optimize jitter tolerance. The 'right' jitter tolerance for
    # this network. Bigger networks need more tolerance. Denser
    # networks need less tolerance. Totally empiric.
    estimatedOptimalJitterTolerance = .05 * sqrt(len(nodes))
    minJT = sqrt(estimatedOptimalJitterTolerance)
    maxJT = 10
    jt = jitterTolerance * max(minJT,
                               min(maxJT, estimatedOptimalJitterTolerance * totalEffectiveTraction / (
                                   len(nodes) * len(nodes))))

    minSpeedEfficiency = 0.05

    # Protective against erratic behavior
    if totalSwinging / totalEffectiveTraction > 2.0:
        if speedEfficiency > minSpeedEfficiency:
            speedEfficiency *= .5
        jt = max(jt, jitterTolerance)

    if totalSwinging == 0:
        targetSpeed = float('inf')
    else:
        targetSpeed = jt * speedEfficiency * totalEffectiveTraction / totalSwinging

    if totalSwinging > jt * totalEffectiveTraction:
        if speedEfficiency > minSpeedEfficiency:
            speedEfficiency *= .7
    elif speed < 1000:
        speedEfficiency *= 1.3

    # But the speed shouldn't rise too much too quickly, since it would
    # make the convergence drop dramatically.
    maxRise = .5
    speed = speed + min(targetSpeed - speed, maxRise * speed)

    # Apply forces.
    #
    # Need to add a case if adjustSizes ("prevent overlap") is
    # implemented.
    for n in nodes:
        swinging = n.mass * sqrt((n.old_dx - n.dx) * (n.old_dx - n.dx) + (n.old_dy - n.dy) * (n.old_dy - n.dy))
        factor = speed / (1.0 + sqrt(speed * swinging))
        n.x = n.x + (n.dx * factor)
        n.y = n.y + (n.dy * factor)

    values = {}
    values['speed'] = speed
    values['speedEfficiency'] = speedEfficiency

    return values


try:
    import cython

    if not cython.compiled:
        print("Warning: uncompiled fa2util module. Compile with cython for a 10-100x speed boost.")
except ImportError:
    print("No cython detected. Install cython and compile the fa2util module for a 10-100x speed boost.")

@ -0,0 +1,250 @@
# This is the fastest python implementation of the ForceAtlas2 plugin from Gephi
# intended to be used with networkx, but is in theory independent of
# it since it only relies on the adjacency matrix. This
# implementation is based directly on the Gephi plugin:
#
# https://github.com/gephi/gephi/blob/master/modules/LayoutPlugin/src/main/java/org/gephi/layout/plugin/forceAtlas2/ForceAtlas2.java
#
# For simplicity and for keeping code in sync with upstream, I have
# reused as many of the variable/function names as possible, even when
# they are in a more java-like style (e.g. camelcase)
#
# I wrote this because I wanted an almost feature complete and fast implementation
# of ForceAtlas2 algorithm in python
#
# NOTES: Currently, this only works for weighted undirected graphs.
#
# Copyright (C) 2017 Bhargav Chippada <bhargavchippada19@gmail.com>
#
# Available under the GPLv3
import random
import time

import numpy as np
import numpy
import scipy.sparse  # plain `import scipy` does not guarantee scipy.sparse is loaded
from tqdm import tqdm

from . import fa2util


class Timer:
    def __init__(self, name="Timer"):
        self.name = name
        self.start_time = 0.0
        self.total_time = 0.0

    def start(self):
        self.start_time = time.time()

    def stop(self):
        self.total_time += (time.time() - self.start_time)

    def display(self):
        print(self.name, " took ", "%.2f" % self.total_time, " seconds")


class ForceAtlas2:
    def __init__(self,
                 # Behavior alternatives
                 outboundAttractionDistribution=False,  # Dissuade hubs
                 linLogMode=False,  # NOT IMPLEMENTED
                 adjustSizes=False,  # Prevent overlap (NOT IMPLEMENTED)
                 edgeWeightInfluence=1.0,

                 # Performance
                 jitterTolerance=1.0,  # Tolerance
                 barnesHutOptimize=True,
                 barnesHutTheta=1.2,
                 multiThreaded=False,  # NOT IMPLEMENTED

                 # Tuning
                 scalingRatio=2.0,
                 strongGravityMode=False,
                 gravity=1.0,

                 # Log
                 verbose=True):
        assert linLogMode == adjustSizes == multiThreaded == False, "You selected a feature that has not been implemented yet..."
        self.outboundAttractionDistribution = outboundAttractionDistribution
        self.linLogMode = linLogMode
        self.adjustSizes = adjustSizes
        self.edgeWeightInfluence = edgeWeightInfluence
        self.jitterTolerance = jitterTolerance
        self.barnesHutOptimize = barnesHutOptimize
        self.barnesHutTheta = barnesHutTheta
        self.scalingRatio = scalingRatio
        self.strongGravityMode = strongGravityMode
        self.gravity = gravity
        self.verbose = verbose
        # One (N, 2) array of node positions per iteration, appended by forceatlas2()
        self.dataContainer = []

    def init(self,
             G,  # a graph in 2D numpy ndarray format (or) scipy sparse matrix format
             pos=None  # Array of initial positions
             ):
        isSparse = False
        if isinstance(G, numpy.ndarray):
            # Check our assumptions
            assert G.shape == (G.shape[0], G.shape[0]), "G is not 2D square"
            assert numpy.all(G.T == G), "G is not symmetric. Currently only undirected graphs are supported"
            assert isinstance(pos, numpy.ndarray) or (pos is None), "Invalid node positions"
        elif scipy.sparse.issparse(G):
            # Check our assumptions
            assert G.shape == (G.shape[0], G.shape[0]), "G is not 2D square"
            assert isinstance(pos, numpy.ndarray) or (pos is None), "Invalid node positions"
            G = G.tolil()
            isSparse = True
        else:
            assert False, "G is not numpy ndarray or scipy sparse matrix"

        # Put nodes into a data structure we can understand
        nodes = []
        for i in range(0, G.shape[0]):
            n = fa2util.Node()
            if isSparse:
                n.mass = 1 + len(G.rows[i])
            else:
                n.mass = 1 + numpy.count_nonzero(G[i])
            n.old_dx = 0
            n.old_dy = 0
            n.dx = 0
            n.dy = 0
            if pos is None:
                n.x = random.random()
                n.y = random.random()
            else:
                n.x = pos[i][0]
                n.y = pos[i][1]
            nodes.append(n)

        # Put edges into a data structure we can understand
        edges = []
        es = numpy.asarray(G.nonzero()).T
        for e in es:  # Iterate through edges
            if e[1] <= e[0]: continue  # Avoid duplicate edges
            edge = fa2util.Edge()
            edge.node1 = e[0]  # The index of the first node in `nodes`
            edge.node2 = e[1]  # The index of the second node in `nodes`
            edge.weight = G[tuple(e)]
            edges.append(edge)

        return nodes, edges

    # Given an adjacency matrix, this function computes the node positions
    # according to the ForceAtlas2 layout algorithm. It takes the same
    # arguments that one would give to the ForceAtlas2 algorithm in Gephi.
    # Not all of them are implemented. See below for a description of
    # each parameter and whether or not it has been implemented.
    #
    # This function will return a list of X-Y coordinate tuples, ordered
    # in the same way as the rows/columns in the input matrix.
    #
    # The only reason you would want to run this directly is if you don't
    # use networkx. In this case, you'll likely need to convert the
    # output to a more usable format. If you do use networkx, use the
    # "forceatlas2_networkx_layout" function below.
    #
    # Currently, only undirected graphs are supported so the adjacency matrix
    # should be symmetric.
    def forceatlas2(self,
                    G,  # a graph in 2D numpy ndarray format (or) scipy sparse matrix format
                    pos=None,  # Array of initial positions
                    iterations=100  # Number of times to iterate the main loop
                    ):
        # Initializing, initAlgo()
        # ================================================================

        # speed and speedEfficiency describe a scaling factor of dx and dy
        # before x and y are adjusted. These are modified as the
        # algorithm runs to help ensure convergence.
        speed = 1.0
        speedEfficiency = 1.0
        nodes, edges = self.init(G, pos)
        outboundAttCompensation = 1.0
        if self.outboundAttractionDistribution:
            outboundAttCompensation = numpy.mean([n.mass for n in nodes])
        # ================================================================

        # Main loop, i.e. goAlgo()
        # ================================================================

        barneshut_timer = Timer(name="BarnesHut Approximation")
        repulsion_timer = Timer(name="Repulsion forces")
        gravity_timer = Timer(name="Gravitational forces")
        attraction_timer = Timer(name="Attraction forces")
        applyforces_timer = Timer(name="AdjustSpeedAndApplyForces step")

        # Each iteration of this loop represents a call to goAlgo().
        niters = range(iterations)
        if self.verbose:
            niters = tqdm(niters)
        for _i in niters:
            for n in nodes:
                n.old_dx = n.dx
                n.old_dy = n.dy
                n.dx = 0
                n.dy = 0

            # Barnes Hut optimization
            if self.barnesHutOptimize:
                barneshut_timer.start()
                rootRegion = fa2util.Region(nodes)
                rootRegion.buildSubRegions()
                barneshut_timer.stop()

            # Charge repulsion forces
            repulsion_timer.start()
            # parallelization should be implemented here
            if self.barnesHutOptimize:
                rootRegion.applyForceOnNodes(nodes, self.barnesHutTheta, self.scalingRatio)
            else:
                fa2util.apply_repulsion(nodes, self.scalingRatio)
            repulsion_timer.stop()

            # Gravitational forces
            gravity_timer.start()
            fa2util.apply_gravity(nodes, self.gravity, useStrongGravity=self.strongGravityMode)
            gravity_timer.stop()

            # If other forms of attraction were implemented they would be selected here.
            attraction_timer.start()
            fa2util.apply_attraction(nodes, edges, self.outboundAttractionDistribution, outboundAttCompensation,
                                     self.edgeWeightInfluence)
            attraction_timer.stop()

            # Adjust speeds and apply forces
            applyforces_timer.start()
            values = fa2util.adjustSpeedAndApplyForces(nodes, speed, speedEfficiency, self.jitterTolerance)
            speed = values['speed']
            speedEfficiency = values['speedEfficiency']
            applyforces_timer.stop()

            # Modification: keep a snapshot of all node positions after every
            # iteration so the layout can be rendered frame by frame.
            self.dataContainer.append(np.array([(n.x, n.y) for n in nodes]))

        if self.verbose:
            if self.barnesHutOptimize:
                barneshut_timer.display()
            repulsion_timer.display()
            gravity_timer.display()
            attraction_timer.display()
            applyforces_timer.display()
        # ================================================================
        return [(n.x, n.y) for n in nodes]

    # A layout for NetworkX.
    #
    # This function returns a NetworkX layout, which is really just a
    # dictionary of node positions (2D X-Y tuples) indexed by the node name.
    def forceatlas2_networkx_layout(self, G, pos=None, iterations=100):
        import networkx
        assert isinstance(G, networkx.classes.graph.Graph), "Not a networkx graph"
        assert isinstance(pos, dict) or (pos is None), "pos must be specified as a dictionary, as in networkx"
        # to_scipy_sparse_matrix was removed in NetworkX 3.0; use its replacement when available.
        if hasattr(networkx, "to_scipy_sparse_array"):
            M = networkx.to_scipy_sparse_array(G, dtype='f', format='lil')
        else:
            M = networkx.to_scipy_sparse_matrix(G, dtype='f', format='lil')
        if pos is None:
            l = self.forceatlas2(M, pos=None, iterations=iterations)
        else:
            poslist = numpy.asarray([pos[i] for i in G.nodes()])
            l = self.forceatlas2(M, pos=poslist, iterations=iterations)
        return dict(zip(G.nodes(), l))

@ -0,0 +1,75 @@
from codecs import open
from os import path
from setuptools import setup
print("Installing fa2 package (fastest forceatlas2 python implementation)\n")
here = path.abspath(path.dirname(__file__))
# Get the long description from the README file
with open(path.join(here, 'README.md'), 'r') as f:
    long_description = f.read()
print(">>>> Cython is installed?")
try:
    from Cython.Distutils import Extension
    from Cython.Build import build_ext
    USE_CYTHON = True
    print('Yes\n')
except ImportError:
    from setuptools.extension import Extension
    USE_CYTHON = False
    print('Cython is not installed; using pre-generated C files if available')
    print('Please install Cython first and try again if you face any installation problems\n')
print(">>>> Are pre-generated C files available?")
if USE_CYTHON:
    ext_modules = [Extension('fa2.fa2util', ['fa2/fa2util.py', 'fa2/fa2util.pxd'], cython_directives={'language_level' : 3})]
    cmdclass = {'build_ext': build_ext}
    opts = {"ext_modules": ext_modules, "cmdclass": cmdclass}
elif path.isfile(path.join(here, 'fa2/fa2util.c')):
    print("Yes\n")
    ext_modules = [Extension('fa2.fa2util', ['fa2/fa2util.c'])]
    cmdclass = {}
    opts = {"ext_modules": ext_modules, "cmdclass": cmdclass}
else:
    print("Pre-generated C files are not available. This library will be slow without Cython optimizations.\n")
    opts = {"py_modules": ["fa2.fa2util"]}
# Uncomment the following line if you want to install without optimizations
# opts = {"py_modules": ["fa2.fa2util"]}
print(">>>> Starting to install!\n")
setup(
    name='fa2',
    version='0.3.5',
    description='The fastest ForceAtlas2 algorithm for Python (and NetworkX)',
    long_description_content_type='text/markdown',
    long_description=long_description,
    author='Bhargav Chippada',
    author_email='bhargavchippada19@gmail.com',
    url='https://github.com/bhargavchippada/forceatlas2',
    download_url='https://github.com/bhargavchippada/forceatlas2/archive/v0.3.5.tar.gz',
    keywords=['forceatlas2', 'networkx', 'force-directed-graph', 'force-layout', 'graph'],
    packages=['fa2'],
    classifiers=[
        'Development Status :: 5 - Production/Stable',
        'Intended Audience :: Science/Research',
        'Topic :: Scientific/Engineering :: Mathematics',
        'License :: OSI Approved :: GNU General Public License v3 (GPLv3)',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 3'
    ],
    install_requires=['numpy', 'scipy', 'tqdm'],
    extras_require={
        'networkx': ['networkx'],
        'igraph': ['python-igraph']
    },
    include_package_data=True,
    **opts
)

@ -0,0 +1,108 @@
from os import mkdir
from os.path import exists
from shutil import rmtree
from fa2 import ForceAtlas2
import pandas as pd
from scipy.io import mmread
import numpy as np
import subprocess
# smaller steps by:
# - decrease barnesHutOptimize
# - decrease gravity
# number of frames
frames = 2000
# load PCA, SNN and label colours data
# the first 2 PCs from the PCA are used as initial conditions
# SNN is used for building the force directed graph
pca_data = pd.read_csv("./input/pca.csv", index_col = 0)
# read_csv(squeeze=True) was removed in pandas 2.0; DataFrame.squeeze gives the same Series
labels_col = pd.read_csv("./input/label_colours.csv", index_col = 0).squeeze("columns")
snn = mmread("./input/SNN.smm")
# set the initial positions to the first 2 PCs
positions = pca_data.values[:, 0:2]
# initialize force directed graph class instance
forceatlas2 = ForceAtlas2(outboundAttractionDistribution=False, linLogMode=False,
                          adjustSizes=False, edgeWeightInfluence=1.0,
                          jitterTolerance=1.0, barnesHutTheta = .8,
                          barnesHutOptimize=True, multiThreaded=False,
                          scalingRatio=2.0, strongGravityMode=True, gravity=1, verbose=True)
# run the force directed graph; each iteration generates the coordinates used in one frame
discard = forceatlas2.forceatlas2(G = snn, pos = positions, iterations = frames)
if exists("./input/buffers"):
    rmtree("./input/buffers")
if exists("./input/frames"):
    rmtree("./input/frames")
mkdir("./input/buffers")
mkdir("./input/frames")
for index in range(len(forceatlas2.dataContainer)):
    positions = forceatlas2.dataContainer[index]
    fname = "./input/buffers/{index}.csv".format(index = index)
    np.savetxt(fname, positions, delimiter = ",")
    print("Saving buffer: {index}".format(index = index))
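# run R to plot one PNG per buffer
# (no shell=True: with a list argument on POSIX the shell would run `Rscript` without the script name)
subprocess.call(["Rscript", "make_plots.R"])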
# assemble the frames into a video
import cv2
import os
def sortImages(imgPath):
return int(os.path.splitext(imgPath)[0])
# Arguments
dir_path = './input/frames'
ext = "png"
output = "fdg.mp4"
images = []
for f in os.listdir(dir_path):
if f.endswith(ext):
images.append(f)
images = sorted(images, key = sortImages)
legend = cv2.imread("./input/legend.png")
lH, lW, chs = legend.shape
# trim 10 px from the bottom and from the left edge, then scale the legend to 80 %
legend = legend[0:(lH-10), 10:lW]
legend = cv2.resize(legend, (0, 0), fx = .8, fy = .8)
lH, lW, chs = legend.shape
# Determine the width and height from the first image
image_path = os.path.join(dir_path, images[0])
frame = cv2.imread(image_path)
height, width, channels = frame.shape
# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'mp4v') # Be sure to use lower case
# each output frame is the plot plus the legend panel, so the writer size must be (width + lW, height)
out = cv2.VideoWriter(output, fourcc, 30.0, (width + lW, height))
for image in images:
image_path = os.path.join(dir_path, image)
frame = cv2.imread(image_path)
frame = cv2.resize(frame, (width, height))
lh1 = width + lW
template = np.zeros((height, lW, 3), dtype = frame.dtype)
frame = np.hstack((frame, template))
frame[0:lH, width:lh1, :] = legend
#cv2.putText(frame, "by Dorin-Mirel Popescu", (width - 400, height - 30), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), thickness = 2)
out.write(frame) # Write out frame to video
print(image)
# Release everything if job is finished
out.release()
print("The output video is {}".format(output))
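Two steps in the script above are easy to get wrong: the frame files must be ordered numerically (what `sortImages` does), and every frame passed to `cv2.VideoWriter` must be exactly the size the writer was opened with, otherwise OpenCV may silently skip it. A minimal NumPy-only sketch of both, assuming BGR `uint8` images as returned by `cv2.imread`; `frame_index` and `add_legend_panel` are hypothetical helper names, not part of the repository:

```python
import os

import numpy as np


def frame_index(filename):
    """'12.png' -> 12, so frames sort numerically rather than lexicographically."""
    return int(os.path.splitext(filename)[0])


def add_legend_panel(frame, legend):
    """Append a black panel as wide as `legend` to the right of `frame`
    and paste the legend into the top of that panel."""
    height, width = frame.shape[:2]
    l_height, l_width = legend.shape[:2]
    if l_height > height:
        raise ValueError("legend is taller than the frame")
    panel = np.zeros((height, l_width, frame.shape[2]), dtype=frame.dtype)
    canvas = np.hstack((frame, panel))
    canvas[:l_height, width:width + l_width] = legend
    return canvas


names = ["10.png", "2.png", "1.png"]
print(sorted(names))                   # ['1.png', '10.png', '2.png']
print(sorted(names, key=frame_index))  # ['1.png', '2.png', '10.png']
```

Opening the writer with `(canvas.shape[1], canvas.shape[0])` taken from the first composed frame keeps the writer size and the frame size in sync regardless of the legend's dimensions.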


@ -0,0 +1,42 @@
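# all paths below are relative to the directory the Python driver script runs Rscript from,
# so a setwd() to a user-specific folder is not needed (and breaks on other machines)
#setwd("~/Documents/MyTools/force_abstract_graph_2Danimation/")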
buffers.addrs <- list.files("./input/buffers/", full.names=T)
data.colours <- as.vector(read.csv("./input/label_colours.csv")$LabelCols)
################################################################################################################
################################################################################################################
################################################################################################################
library(RColorBrewer)
library(dplyr)
library(plyr)
library(Seurat)
library(ggplot2)
#c.unique <-as.vector( unique(data.colours))
#c.colours <- sample(colorRampPalette(brewer.pal(12, "Paired"))(length(c.unique)))
#data.colours <- factor(plyr::mapvalues(x=data.colours, from=c.unique, to = c.colours), levels = c.colours)
################################################################################################################
################################################################################################################
################################################################################################################
for(k in 1:length(buffers.addrs)){
buffer.addr <- buffers.addrs[k]
print(sprintf("Plotting frame %d", k))
buffer.data <- read.csv(buffer.addr, header = F)
buffer.data <- cbind(buffer.data, data.colours)
colnames(buffer.data) <- c("FDGX", "FDGY", "Colours")
limitX <- quantile(buffer.data$FDGX, c(.01, .99)) + c(-15000, 15000)
limitY <- 1.1 * quantile(buffer.data$FDGY, c(.01, .99)) + c(-15000, 15000)
plot.obj <- ggplot(data=buffer.data, aes(x = FDGX, y = FDGY))
plot.obj <- plot.obj + geom_point(show.legend=F, size = 1.5, color = as.vector(buffer.data$Colours))
plot.obj <- plot.obj + scale_color_manual(values=as.vector(buffer.data$Colours))
plot.obj <- plot.obj + theme(plot.background = element_rect(fill = "black"))
plot.obj <- plot.obj + scale_x_continuous(limits = limitX, expand = c(0, 0))
plot.obj <- plot.obj + scale_y_continuous(limits = limitY, expand = c(0, 0))
plot.obj <- plot.obj + theme(axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank())
fname <- file.path("./input/frames", sub(pattern="\\.csv$", replacement=".png", x=basename(buffer.addr)))
png(fname, width = 2000, height = 2000)
print(plot.obj)
dev.off()
}
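`make_plots.R` recomputes the axis limits for every frame from the 1st and 99th percentiles of the coordinates, so a few extreme cells do not dictate the plotting range, and then adds a fixed margin of 15000 layout units; the y quantiles are additionally multiplied by 1.1 before the margin is added. Because the limits are recomputed per frame, the view follows the layout as it spreads out. The same arithmetic as a hypothetical Python helper (`frame_limits` is not part of the repository), assuming NumPy's default quantile method, which is the same linear interpolation as R's default `quantile(type = 7)`:

```python
import numpy as np


def frame_limits(x, y, pad=15000.0, y_scale=1.1):
    """Axis limits in the style of make_plots.R.

    x limits: 1st-99th percentile of x, widened by `pad` on each side.
    y limits: 1st-99th percentile of y multiplied by `y_scale`, then widened by `pad`.
    """
    qx = np.quantile(x, [0.01, 0.99])  # default linear interpolation, like R's type 7
    qy = np.quantile(y, [0.01, 0.99])
    margin = np.array([-pad, pad])
    return qx + margin, y_scale * qy + margin
```

The margin is an absolute value in layout coordinates rather than a fraction of the range, so it may need retuning when the layout's scale changes (for example with a different `scalingRatio` or dataset size).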


@ -0,0 +1,71 @@
# import libraries
library(Seurat)
library(plyr)
library(ggplot2) # ggplot() for the legend
library(Matrix)  # writeMM()
seurat.obj.addr <- "../../seurat_data/liver_immune.RDS"
# a plotting function for indexed legend; special modifications for current script
plot.indexed.legend <- function(label.vector, color.vector, ncols = 2, left.limit = 3.4, symbol.size = 8, text.size = 10){
if (length(label.vector) != length(color.vector)){
stop("number of labels is different from number of colors\nAdvice: learn to count!")
}
if (ncols > length(label.vector)){
stop("You cannot have more columns than labels\nSolution: Learn to count")
}
indices.vector <- 1:length(label.vector)
label.no <- length(label.vector)
nrows <- ceiling(label.no / ncols)
legend.frame <- data.frame(X = rep(0, label.no), Y = rep(0, label.no), CS = color.vector, Txt = label.vector)
for (i in 1:label.no){
col.index <- floor((i - 1) / nrows) + 1
row.index <- 15 - ((i - 1) %% nrows + 1)
legend.frame[i, 1] <- (col.index - 1) * 2
legend.frame[i, 2] <- row.index
}
plot.obj <- ggplot(data = legend.frame, aes(x = X, y = Y))
plot.obj <- plot.obj + geom_point(size = symbol.size, colour = color.vector)
plot.obj <- plot.obj + scale_x_continuous(limits = c(0, left.limit)) + theme_void()
plot.obj <- plot.obj + annotate("text", x=legend.frame$X+.1, y = legend.frame$Y, label=legend.frame$Txt, hjust = 0, size = text.size, colour = "white")
plot.obj <- plot.obj + theme(panel.background = element_rect(fill='black'))
return(plot.obj)
}
# load the seurat object
print("Loading the data ... ")
seurat.obj <- readRDS(seurat.obj.addr)
cell.type.to.colour <- read.csv("./liver_cell_type_colours.csv")
seurat.obj <- SetAllIdent(object=seurat.obj, id="cell.labels")
################################################
print("saving pca data ...")
pca.data <- seurat.obj@dr$pca@cell.embeddings
write.csv(pca.data, "./input/pca.csv")
################################################
print("Computing and saving SNN graph ...")
seurat.obj <- BuildSNN(object=seurat.obj, reduction.type="pca", dims.use=1:20, plot.SNN=F, force.recalc=T)
writeMM(obj=seurat.obj@snn, file="./input/SNN.smm")
labels <- as.vector(seurat.obj@ident)
labels.unique <- unique(labels)
filter.key <- cell.type.to.colour$CellTypes %in% labels.unique
cell.labels <- cell.type.to.colour$CellTypes[filter.key]
cell.colours <- cell.type.to.colour$Colours[filter.key]
labels.cols <- mapvalues(x=labels, from=as.vector(cell.labels), to=as.vector(cell.colours))
write.csv(data.frame(LabelCols = labels.cols), "./input/label_colours.csv")
png("./input/legend.png", width = 1000, height = 800)
legend.plt <- plot.indexed.legend(label.vector=cell.labels, color.vector=cell.colours, left.limit=3.6, text.size=10, ncols=2, symbol.size = 15)
print(legend.plt)
dev.off()
print("End")
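`BuildSNN` turns the first 20 PCs into a shared-nearest-neighbour (SNN) graph, and that sparse matrix is what ForceAtlas2 later uses as its weighted adjacency matrix. As an illustration of the idea only (not a reproduction of Seurat's implementation or its exact defaults), the sketch below weights each pair of cells by the Jaccard overlap of their k-nearest-neighbour sets and prunes weak edges; `snn_graph`, `k=10` and the `1/15` cut-off are assumptions chosen for the example:

```python
import numpy as np
from scipy import sparse


def snn_graph(embedding, k=10, prune=1 / 15):
    """Shared-nearest-neighbour graph with Jaccard edge weights (illustrative sketch)."""
    n = embedding.shape[0]
    if k > n:
        raise ValueError("k cannot exceed the number of cells")
    # Brute-force squared Euclidean distances: O(n^2) memory, fine for small examples only.
    sq = np.sum(embedding ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * embedding @ embedding.T
    # Force each cell to be its own nearest neighbour, then take the k closest.
    np.fill_diagonal(dist, -np.inf)
    knn = np.argsort(dist, axis=1)[:, :k]
    # Binary kNN membership: adj[i, j] = 1 if j is among the k neighbours of i.
    rows = np.repeat(np.arange(n), k)
    adj = sparse.csr_matrix((np.ones(n * k), (rows, knn.ravel())), shape=(n, n))
    # shared[i, j] = number of neighbours that cells i and j have in common (symmetric).
    shared = (adj @ adj.T).tocoo()
    # Jaccard index of two k-element sets with s common members: s / (2k - s).
    weights = shared.data / (2 * k - shared.data)
    keep = weights >= prune
    return sparse.csr_matrix(
        (weights[keep], (shared.row[keep], shared.col[keep])), shape=(n, n)
    )
```

Because `adj @ adj.T` is symmetric, so is the result, which matters since ForceAtlas2 treats the graph as undirected. The returned matrix could be saved with `scipy.io.mmwrite`, the Python counterpart of the `writeMM` call above.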


@ -0,0 +1,11 @@
#!/bin/bash
#$ -cwd
#$ -N prepare_input
#$ -V
#$ -l h_rt=23:59:59
#$ -l h_vmem=400G
Rscript prepare_input.R
echo "End on `date`"


@ -0,0 +1,112 @@
# labels have been updated, should remove the part that overwrites cell labels
# must create functions that handle the formation of the FDG animation:
# - data writer
# - plotting that takes dimension parameters
library(plyr)
library(RColorBrewer)
library(Seurat)
library(ggplot2) # ggplot() for the legend
library(Matrix)  # writeMM()
seurat.addr <- "../../data/test_yolk_sac_subset.RDS"
seurat.obj <- readRDS(seurat.addr)
cell.type.to.colour <- read.csv("../../resources/test_yolk_sac_fdg_colour_key.csv")
print("Checking for doublets:")
print(table(seurat.obj@meta.data$doublets))
# a plotting function for indexed legend
plot.indexed.legend <- function(label.vector, color.vector, ncols = 2, left.limit = 3.4, symbol.size = 8, text.size = 10, padH = 1, padV = 1, padRight = 0){
if (length(label.vector) != length(color.vector)){
stop("number of labels is different from number of colors\nAdvice: learn to count!")
}
if (ncols > length(label.vector)){
stop("You cannot have more columns than labels\nSolution: Learn to count")
}
indices.vector <- 1:length(label.vector)
label.no <- length(label.vector)
nrows <- ceiling(label.no / ncols)
legend.frame <- data.frame(X = rep(0, label.no), Y = rep(0, label.no), CS = color.vector, Txt = label.vector)
legend.frame$X <- rep(1:ncols, each=nrows)[1:nrow(legend.frame)]
legend.frame$Y <- rep(nrows:1, times = ncols)[1:nrow(legend.frame)]
Xrange <- range(legend.frame$X)
Yrange <- range(legend.frame$Y)
plot.obj <- ggplot(data = legend.frame, aes(x = X, y = Y))
plot.obj <- plot.obj + geom_point(size = symbol.size, colour = color.vector)
plot.obj <- plot.obj + scale_x_continuous(limits = c(Xrange[1] - padRight, Xrange[2] + padH))
plot.obj <- plot.obj + scale_y_continuous(limits = c(Yrange[1] - padV, Yrange[2] + padV))
plot.obj <- plot.obj + theme_void()
plot.obj <- plot.obj + annotate("text", x=legend.frame$X, y = legend.frame$Y, label = indices.vector, size = text.size)
plot.obj <- plot.obj + annotate("text", x=legend.frame$X+.1, y = legend.frame$Y, label=legend.frame$Txt, hjust = 0, size = text.size, colour = "white")
plot.obj <- plot.obj + theme(panel.background = element_rect(fill='black'))
return(plot.obj)
}
pca.data <- seurat.obj@dr$pca@cell.embeddings
write.csv(pca.data, "./input/pca.csv")
seurat.obj <- BuildSNN(object=seurat.obj, reduction.type="pca", dims.use=1:20, plot.SNN=F,force.recalc=T)
writeMM(obj=seurat.obj@snn, file="./input/SNN.smm")
labels <- as.vector(seurat.obj@meta.data$cell.labels)
labels.unique <- unique(labels)
print("printing cell.type.to.colour")
print(cell.type.to.colour)
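# use the colour key if one was loaded; is.na() on a data frame returns a matrix,
# which is not a valid if() condition (an error since R 4.2)
if(is.data.frame(cell.type.to.colour) && nrow(cell.type.to.colour) > 0){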
cell.labels <- as.vector(cell.type.to.colour$CellTypes)
cell.colours <- as.vector(cell.type.to.colour$Colours)
filter.key <- cell.labels %in% labels.unique
cell.labels <- cell.labels[filter.key]
cell.colours <- cell.colours[filter.key]
}else{
cell.labels <- labels.unique
set.seed(100)
cell.colours <- sample(colorRampPalette(brewer.pal(12, "Paired"))(length(labels.unique)))
}
print("printing cell.labels")
print(cell.labels)
print("printing cell.colours")
print(cell.colours)
labels.cols <- mapvalues(x=labels, from=cell.labels, to=cell.colours)
write.csv(data.frame(LabelCols = labels.cols), "./input/label_colours.csv")
png("./input/legend.png", width = 1000, height = 700)
legend.plt <- plot.indexed.legend(label.vector=cell.labels, color.vector=cell.colours, ncols=2, left.limit=0, symbol.size=17, text.size=10, padH=.9, padV=.6)
print(legend.plt)
dev.off()
print("ended beautifully")
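The second `plot.indexed.legend` fills the legend column by column: `rep(1:ncols, each = nrows)` gives the column of each label and `rep(nrows:1, times = ncols)` its row, counted down from the top, and both are cut to the number of labels. A Python transcription of that placement, with `legend_grid` as a hypothetical name:

```python
import math


def legend_grid(n_labels, ncols=2):
    """Return (column, row) positions for n_labels legend entries, filled column by column."""
    if ncols > n_labels:
        raise ValueError("more columns than labels")
    nrows = math.ceil(n_labels / ncols)
    # rep(1:ncols, each = nrows): each column index repeated once per row
    xs = [col for col in range(1, ncols + 1) for _ in range(nrows)]
    # rep(nrows:1, times = ncols): rows counted down so the first label sits at the top
    ys = [row for _ in range(ncols) for row in range(nrows, 0, -1)]
    # [1:n_labels] in R: drop the unused slots at the bottom of the last column
    return list(zip(xs[:n_labels], ys[:n_labels]))


print(legend_grid(5, ncols=2))  # [(1, 3), (1, 2), (1, 1), (2, 3), (2, 2)]
```

With an odd number of labels the last column is one entry shorter, which is why the positions are truncated rather than padded.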


@ -0,0 +1,11 @@
#!/bin/bash
#$ -cwd
#$ -N write_data
#$ -V
#$ -l h_rt=23:59:59
#$ -l h_vmem=400G
Rscript write_data.R
echo "End on `date`"