How to use graph-tool on a mounted drive using WSL
Two of the biggest pains of being a data scientist are problems installing a specific package or tool, and problems accessing data. This post walks you through installing the Python network library graph-tool on a Windows machine using Windows Subsystem for Linux (WSL), and running a Jupyter notebook on a network drive.
The post covers the following steps:
- Install Ubuntu with WSL
- Mount a network drive from your local machine to Ubuntu
- Install and configure Miniconda as the Python environment and package manager on Ubuntu
- Install graph-tool (and the hSBM-TM package)
- Set up Jupyter Notebook to run from WSL
Motivation
Many a battle has been fought trying to make the graph-tool library accessible on Windows. Two things make the library attractive for data scientists working with network graphs in Python: 1) the core data structures and algorithms are implemented in C++, which boosts performance significantly on large networks compared to other libraries; 2) the library provides many advanced methods for manipulating and analysing networks, including sophisticated statistical methods such as Stochastic Block Modelling (SBM).
However, the library is also notoriously difficult to install on Windows. Having done it once with Docker, we recently faced the challenge of making it work on another Windows machine, and this time opted for the Linux installation, using Ubuntu on WSL. To top it off, we had to conduct our analysis on a network drive to comply with GDPR.
Install WSL
Install Ubuntu for WSL through the Microsoft Store (formerly the Windows Store): https://www.microsoft.com/en-us/p/ubuntu-2004-lts/9n6svws3rx71
After installing Ubuntu, run it and a new terminal opens. This is how you interact with the operating system. The first time you run it, you will be prompted to set a username and password.
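To double-check that the terminal you opened is Ubuntu running under WSL, you can look for "microsoft" in the kernel version string. This is a sketch that assumes WSL kernels identify themselves that way in /proc/version (they do for both WSL1 and WSL2); the function name is_wsl is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the kernel version string mentions "microsoft"
# (case-insensitive), which is how WSL kernels identify themselves.
is_wsl() {
  local version_file="${1:-/proc/version}"   # overridable, e.g. for testing
  grep -qsi microsoft "$version_file"
}

if is_wsl; then
  echo "Running under WSL"
else
  echo "Not running under WSL"
fi
```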
Mount the network drive
By default, your system's local C: drive is mounted in WSL at /mnt/c, through which you can access your local files.
To access a network drive that is already connected to Windows (here the S: drive), two simple steps are needed. First, create a folder to hold the mounted content. In the Ubuntu terminal, create a new folder in /mnt (the mount folder):
sudo mkdir /mnt/S
Then mount the drive from Windows into WSL. For this to work, you must first have opened the S: drive in Windows File Explorer.
sudo mount -t drvfs S: /mnt/S
You have to mount the S-drive every time you start a new WSL instance.
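The two steps above can be wrapped in a small function so that re-running it is harmless. This is a sketch rather than part of the original setup: mount_point_for and mount_windows_drive are hypothetical names, and it assumes the mountpoint utility (part of util-linux, present on Ubuntu) to detect an existing mount.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a Windows drive letter ("S" or "S:") into its WSL mount point.
mount_point_for() {
  local letter="${1%:}"            # drop a trailing colon, if present
  printf '/mnt/%s\n' "$letter"
}

# Hypothetical helper: create the mount point and mount the drive unless it is already mounted.
mount_windows_drive() {
  local letter="${1%:}"
  local target
  target="$(mount_point_for "$letter")"
  sudo mkdir -p "$target"
  if mountpoint -q "$target"; then
    echo "$target is already mounted"
  else
    sudo mount -t drvfs "${letter}:" "$target"
  fi
}
```

For example, mount_windows_drive S mounts S: at /mnt/S. The drive still has to be opened in Windows File Explorer first.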
Install graph tool
I found the easiest way to install the graph-tool library (https://graph-tool.skewed.de/) on Linux to be through conda.
To install conda in WSL, download the Miniconda installer with the following command:
curl -sL "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > "Miniconda3.sh"
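Run the installer, answering yes when it asks whether to initialize conda (otherwise conda will not be available in new terminals), then remove it: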
# Install
bash Miniconda3.sh
# Remove
rm Miniconda3.sh
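If you prefer to install without the interactive prompts, the Miniconda installer has a batch mode. The sketch below assumes the installer's -b (batch mode, licence accepted) and -p (install prefix) options and the default ~/miniconda3 prefix used elsewhere in this post; install_miniconda is a hypothetical name. Because batch mode does not modify your shell startup files, it runs conda init bash afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unattended Miniconda install (sketch).
install_miniconda() {
  local prefix="${1:-$HOME/miniconda3}"   # same default location as the interactive installer
  local installer
  installer="$(mktemp)"
  curl -fsSL "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" -o "$installer" \
    || { rm -f "$installer"; return 1; }
  # -b: batch mode (no prompts, licence accepted); -p: installation prefix
  bash "$installer" -b -p "$prefix" || { rm -f "$installer"; return 1; }
  rm -f "$installer"
  # Batch mode leaves ~/.bashrc untouched, so add conda's shell hook explicitly.
  "$prefix/bin/conda" init bash
}
```

After running install_miniconda, restart WSL so the new shell configuration is picked up.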
Close the Ubuntu terminal, open cmd.exe (Windows Command Prompt), and shut down WSL:
wsl --shutdown
Open the Ubuntu terminal again and update conda:
conda update conda
Create a new conda environment named gt and install graph-tool, Jupyter and pandas from conda-forge:
conda create --name gt -c conda-forge graph-tool jupyter pandas
Not all dependencies were installed correctly, so I had to install some of them manually:
sudo apt-get update
sudo apt install libgtk-3-0
You can activate the newly created conda environment with
conda activate gt
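A quick way to confirm that graph-tool installed correctly is to import it inside the environment. This sketch assumes the environment is called gt and uses conda run, which executes a command in a named environment without activating it; check_graph_tool is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: import graph-tool inside a conda environment and print its version.
# Fails (non-zero exit status) if the import fails.
check_graph_tool() {
  local env_name="${1:-gt}"
  conda run -n "$env_name" python -c 'import graph_tool; print(graph_tool.__version__)'
}
```

Running check_graph_tool should print a version string; an ImportError instead points to a missing dependency such as the GTK libraries above.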
Install the hSBM-TM functions
In this section we walk through downloading the extended topic-modelling utility for graph-tool's hierarchical stochastic block models.
To make the hSBM-TM module available in the environment, we install it in the folder where the conda environment looks for packages. The path should only vary with the environment name and Python version:
cd ~/miniconda3/envs/ENVIRONMENT/lib/pythonVERSION/site-packages
In our case, it was:
cd ~/miniconda3/envs/gt/lib/python3.10/site-packages
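If you cannot find the path, start Python inside the environment, import a module such as graph_tool, and print graph_tool.__file__. The site-packages folder is the parent of the graph_tool folder in the path it shows.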
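You can also let Python print the site-packages folder directly. The sketch below assumes the environment's python is first on your PATH (i.e. the environment is activated) and uses the standard library's sysconfig module; site_packages_dir is a hypothetical name, and the PYTHON variable only exists so another interpreter can be substituted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the directory where the current Python installs pure-Python packages.
site_packages_dir() {
  "${PYTHON:-python}" -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])'
}

# Usage inside the activated environment:
#   cd "$(site_packages_dir)"
```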
When you are inside this folder, you can download the hSBM package through the following command.
git clone https://github.com/martingerlach/hSBM_Topicmodel.git
Set up Jupyter Notebook
We have to make a few changes to how Jupyter Notebook works so that it runs properly on the Windows system hosting WSL.
First, generate the notebook configuration file
jupyter notebook --generate-config
Then open the file
nano ~/.jupyter/jupyter_notebook_config.py
Uncomment the following line (remove the “#” at the start of the line) and set it to False:
c.NotebookApp.use_redirect_file = False
For convenience, also consider uncommenting the following line and changing the default port:
c.NotebookApp.port = 8889
To save and exit nano, press Ctrl+X, then Y (yes) to save, then Enter.
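If you would rather not edit the file by hand, the two settings can be appended from the shell. Since the configuration file is plain Python, an assignment at the end of the file takes effect even if an earlier line sets the same option. This is a sketch: append_once and configure_notebook are hypothetical names, and the default path assumes the file generated above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file unless that exact line is already present.
append_once() {
  local file="$1" line="$2"
  mkdir -p "$(dirname "$file")"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: apply the two Jupyter Notebook settings described above.
configure_notebook() {
  local config="${1:-$HOME/.jupyter/jupyter_notebook_config.py}"
  append_once "$config" 'c.NotebookApp.use_redirect_file = False'
  append_once "$config" 'c.NotebookApp.port = 8889'
}
```

Running configure_notebook twice leaves a single copy of each line.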
Then tell WSL which Windows browser to open by adding it to your shell configuration file:
nano ~/.bashrc
Scroll to the bottom of the file and add the following
# Specify the path of the windows browser
export BROWSER='/mnt/c/Program Files (x86)/Google/Chrome/Application/chrome.exe'
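If you are unsure where your browser is installed, a small loop can pick the first path that exists. The candidate paths below are common default install locations and are assumptions; adjust them to your machine. first_existing is a hypothetical name, and the snippet would go in ~/.bashrc in place of the fixed export line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first argument that is an existing regular file.
first_existing() {
  local candidate
  for candidate in "$@"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Candidate browser locations (assumed default install paths; adjust as needed).
if browser="$(first_existing \
  '/mnt/c/Program Files (x86)/Google/Chrome/Application/chrome.exe' \
  '/mnt/c/Program Files/Google/Chrome/Application/chrome.exe' \
  '/mnt/c/Program Files (x86)/Microsoft/Edge/Application/msedge.exe')"; then
  export BROWSER="$browser"
fi
```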
Use the package
Activate the conda environment
conda activate gt
Open a notebook. If you did not change the default port in jupyter_notebook_config.py, manually specifying a different port than usual may be helpful, so that you can also have a notebook server running on Windows at the same time. If you did change the default, you can leave out the --port 8889 argument:
jupyter notebook --port 8889
Import the package from the folder like so:
from hSBM_Topicmodel.sbmtm import sbmtm
Examples of usage can be found in the tutorial notebook: https://github.com/martingerlach/hSBM_Topicmodel/blob/master/TopSBM-tutorial.ipynb
Deactivate a conda environment with
conda deactivate
Extra
If you want to make your life easier, you can create a bash script in your desired folder that automates activating your environment and mounting the network drive.
nano setup.sh
And then paste the following
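#!/bin/bash
# Meant to be sourced (source setup.sh) so that activation and cd affect the current shell.

# Activate conda environment (quoted, so an unset variable does not break the test)
if [ "${CONDA_DEFAULT_ENV:-}" = "gt" ]
then
    echo "Environment is already activated"
else
    echo "Activating environment"
    # conda activate replaces the deprecated "source activate"
    conda activate gt
fi

# Mount drive unless it is already listed in /proc/mounts
if grep -qs "/mnt/S " /proc/mounts
then
    echo "Drive is already mounted"
else
    echo "Mounting drive"
    sudo mkdir -p /mnt/S
    sudo mount -t drvfs S: /mnt/S
fi

# Change to project folder
cd /mnt/S/Name_of_project_folder/ || echo "Project folder not found"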
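Run the script with source (rather than bash setup.sh), so that the environment activation and the change of folder apply to your current shell: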
source setup.sh