Managing Your Python Environment

Overview

Teaching: 45 min
Exercises: 30 min

Questions

What Python packages are available to me?

How do I control the packages that are available to me?

Objectives

Learn how to create, select and manage conda environments.

When you run Python from the command line (whether from a terminal window on your laptop or within the Hopper Desktop in your browser), certain functions and libraries will already be installed. Many more functions and libraries are available that are not part of the default setup on Hopper.

If you run Python from a JupyterLab session, you may see a clue that there are options. In the upper right corner of your workspace, there will be a small circle, with a string to its left. The string is the name of the kernel that your notebook is running. Jupyter dynamically creates a kernel that meets the specifications of a particular environment. Click on the string, and you will see some alternatives listed. Different environments can include different versions of the same software, or completely different functions and libraries. What do you do when there is a certain Python function or library you want to use, but the import command returns a message that the function or library is not available?
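Before changing anything, it can help to find out which of the packages you need are missing from the environment you are currently running. The following is a minimal sketch using only the Python standard library; the function name missing_packages and the example package names are placeholders chosen for illustration, and the check only handles top-level import names.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example names only; replace them with the packages your own script imports.
wanted = ["json", "xarray", "metpy"]
print("Not available here:", missing_packages(wanted))
```

An empty list means every package in the list can be imported from the kernel or interpreter you ran this in.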

Personalizing your environment

There are two main ways to maintain environments:

Python virtual environments

On Hopper, the default system is environment modules, which play a role similar to Python virtual environments. In this approach, commands and libraries are packaged into separate modules that can be individually loaded. This system of modules is much broader than just Python - it is a general system for making software available on Linux/Unix systems. Recall that when we used the ncks command to peer inside NetCDF files, we had to load a module called nco. That did not involve Python at all.

The module system requires system administrators to install software before you can use it. If the software you want is not already available on the system, you will have to submit a request in the same way you would request any other assistance from ORC: by sending an email to orchelp@gmu.edu.

Conda environments

A more customizable and independent way to maintain environments specific to Python and a number of other programming languages is to use Conda. Conda is an open-source package management system and environment management system that keeps the versions of the various packages (a package is a set of related software functions and/or libraries) within an environment consistent with each other in the vast majority of cases. Occasionally it fails to ensure consistency - an occurrence that is usually the fault of the software engineers maintaining one or more of the packages. But such problems are rare, and are outweighed by the benefits of customizability.

It is common to start a new environment for each project (e.g., each manuscript you submit to a peer-reviewed journal). This is done to maintain backward compatibility. You do not want to find that the scripts you used to make the figures for a paper are broken when you come back to revise the manuscript, because a software update for one of your packages changed the function names, arguments or parameters. This does happen, especially when a package changes its major version number, e.g., from v0.5 to v1.0. By keeping an environment frozen with all the Python scripts and notebooks that are working correctly, you can ensure that they will still work correctly weeks, months, even years later.
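One lightweight way to see what "frozen" means in practice is to record the versions of the key packages a project depends on. The sketch below uses importlib.metadata from the standard library (Python 3.8 or later); the helper name installed_versions and the package names are illustrative assumptions, and note that it expects distribution names, which occasionally differ from import names.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example names only; list the packages your figures or analysis rely on.
for name, version in installed_versions(["xarray", "numpy"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Saving this output alongside a manuscript's scripts gives you a quick record to compare against if something stops working later.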

Setting up Conda on your Hopper account

From a terminal session on Hopper, edit your .bashrc file and make sure you are not loading the Python module - we will be invoking versions of Python from our Conda environments from now on, not from modules.

#module load python

The hash # at the start of the line will comment out that line so it is not executed.
Save your edited .bashrc file, and then reinitialize your shell:

$ source ~/.bashrc

Then we will need to unload the Python module if it is loaded. To see your loaded modules, type:

$ module list

If you see a module there starting with python, unload it:

$ module unload python

Now we are ready to run Conda. From the terminal command line on Hopper, type two commands:

$ module load anaconda3
$ conda init bash

Once you open a new terminal session (or reinitialize your shell by sourcing your .bashrc file again), you should see something different in your command line prompt. Before, it looked something like this:

[yourname@hopper1 ~]$

But now it looks like this:

(base) [yourname@hopper1 ~]$

That extra word in parentheses is your Conda environment. By default it is base.

Let’s have a look at the environments available to you:

$ conda env list

You will see a list of Conda environment names, along with their paths - all (probably) residing under the /opt first-level directory on the cluster.

If you look closely, there is one called clim680. We are not going to use that one. Why? Because it is owned by opsadmin and not us, which means we cannot modify it. The whole point of Conda environments is that they are customizable. Instead, we will each create our own copy of that environment, and add some missing packages that we will need for this class.

Making a new environment

There are several ways to make a new environment in Conda. One way is to create and share .yaml files (YAML originally stood for “yet another markup language”, and now officially stands for “YAML Ain’t Markup Language”) that document all the packages and versions in an environment, so that they can be duplicated by downloading and installing all the right files.

Since we have the clim680 environment on our system already, we will clone it to a new name and location under our home directory, and then update it to suit our needs. Let’s name the new environment for this class clim_data:

$ conda create --clone clim680 --name clim_data

It will take some time to complete this command. Conda will verify that all of the packages are compatible with one another before installing the copy in our home directory.

A little longer than a few minutes later...

Once complete, you can see that your new environment is listed among the others:

$ conda env list

Furthermore, you will see that your new environment’s path is under your home directory, not a system directory. It belongs to you!
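If the list of environments is long, you can let Python pick out the ones you own. The sketch below assumes the output of conda env list --json, which contains an "envs" key holding a list of environment paths; the function name envs_under and both sample paths are hypothetical stand-ins, not the actual locations on Hopper.

```python
import json
from pathlib import PurePosixPath


def envs_under(json_text, root):
    """Return the environment paths from `conda env list --json` that sit under root."""
    root = PurePosixPath(root)
    paths = [PurePosixPath(p) for p in json.loads(json_text)["envs"]]
    return [str(p) for p in paths if root in p.parents]


# Hypothetical sample standing in for real `conda env list --json` output.
sample = json.dumps(
    {
        "envs": [
            "/opt/sw/conda/envs/clim680",
            "/home/yourname/.conda/envs/clim_data",
        ]
    }
)
print(envs_under(sample, "/home/yourname"))
```

With real output, you would pass the text produced by the command and your own home directory as the root.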

Choosing and updating environments

From the command line in your bash shell, we can change the active environment with the activate command in Conda:

$ conda activate clim_data

Your new environment is the active environment. If we install new packages now, they will go into this environment as updates.
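To confirm from inside Python which environment you are actually running in, you can inspect the interpreter itself. This is a small sketch under the assumption that the interpreter belongs to a named Conda environment, in which case the last component of sys.prefix is the environment name; for the base environment it is the name of the Conda installation directory instead. The function name describe_interpreter is our own.

```python
import sys
from pathlib import Path


def describe_interpreter():
    """Return the running interpreter's path and the environment directory it lives in."""
    prefix = Path(sys.prefix)
    return {
        "executable": sys.executable,
        "prefix": str(prefix),
        # For a named Conda environment this is usually the environment name.
        "environment_name": prefix.name,
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

Running this in a notebook cell is also a quick way to check which environment a Jupyter kernel is using.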

Let’s install a few packages that we will need later in this class:

* cftime and cfgrib will expand the capabilities of xarray to read and interpret self-describing data files.
* metpy is a handy Python package of functions for Atmospheric Science, which includes the tracking and conversion of units.
* esmpy is a Python package for functions used in the Earth System Modeling Framework (ESMF) - also very useful for Climate Science.

$ conda install -c conda-forge cftime cfgrib metpy esmpy

Here, conda-forge is the name of the main community channel for software in the Conda universe - it is the most complete and most likely to have all the things we need. You can set up a prioritized list of channels to search for packages. The Conda documentation shows you how. Some specialty packages are only available from specific channels, and may not be on the most popular ones.

You will notice that Conda will do more than simply locate and install these specific packages. It also determines other necessary packages, called dependencies, and flags them for downloading as well. It may also determine that versions of packages already installed need to change - to either newer or older versions. This is all part of Conda’s effort to maintain consistency so that everything will work smoothly.

Conda will also ask you whether you want to proceed with the changes. Read the proposed changes before answering - you may notice that it is proposing to make a change you do not want, e.g., it might propose to downgrade an important package to an older version that is lacking important functionality.

Proceed ([y]/n)?

In this case, enter y and proceed. Again, you will have to wait several minutes for the verification and execution of the transaction.

Because you have installed additional packages, your clim_data environment is now different from the clim680 environment you cloned.

Making your new environment accessible to JupyterLab

The last setup step is making your new environment available as a choice when you are running JupyterLab. To do that, we need to make ipython, the interactive version of Python upon which Jupyter and other interactive interfaces such as Spyder are based, aware of your new environment.

From your command line, be sure the clim_data environment is still the active one, and type:

$ ipython kernel install --user --name=clim_data

You could give it a different name: this is the name of the kernel you will see listed among the choices in JupyterLab. However, it is usually less confusing to keep the name of the kernel the same as the name of its associated environment.
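If you want to check which kernels have been registered for your account, and which Python each one launches, you can read the kernel.json files directly. This sketch assumes the usual per-user location on Linux (~/.local/share/jupyter/kernels), which can differ if JUPYTER_DATA_DIR is set; the function name list_user_kernels is our own.

```python
import json
import os
from pathlib import Path


def list_user_kernels(kernels_dir):
    """Map each kernel directory name to the interpreter its kernel.json launches."""
    kernels_dir = Path(kernels_dir)
    if not kernels_dir.is_dir():
        return {}
    kernels = {}
    for spec_file in sorted(kernels_dir.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        argv = spec.get("argv") or [None]
        # The first argv entry is the program started for this kernel.
        kernels[spec_file.parent.name] = argv[0]
    return kernels


# Assumed default per-user kernel location on Linux.
default_dir = os.path.expanduser("~/.local/share/jupyter/kernels")
for name, interpreter in list_user_kernels(default_dir).items():
    print(f"{name}: {interpreter}")
```

For a kernel installed from the clim_data environment, the interpreter path should point inside that environment's directory.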

A few other useful Conda commands

We can list the versions of all Python packages we have access to in the active environment with the command:

$ conda list

This list is very long. We can list specific packages, or use wildcards to find packages whose name contains a particular string.

$ conda list xarray

# Name                    Version                   Build  Channel
xarray                    0.20.1                   pypi_0    pypi

You can also search the contents of a channel for packages, also using wildcards if you want:

$ conda search -c conda-forge metpy

# Name                       Version           Build  Channel
metpy                          0.3.0          py27_0  conda-forge
metpy                          0.3.0          py27_1  conda-forge
.
.
.
metpy                          1.2.0    pyhd8ed1ab_0  conda-forge
metpy                          1.3.0    pyhd8ed1ab_0  conda-forge
metpy                          1.3.1    pyhd8ed1ab_0  conda-forge

Specific versions or ranges of versions of packages can be queried:

$ conda search -c conda-forge "metpy>=1.0"

# Name                       Version           Build  Channel
metpy                            1.0    pyhd8ed1ab_0  conda-forge
metpy                          1.0.1    pyhd8ed1ab_0  conda-forge
metpy                          1.1.0    pyhd8ed1ab_0  conda-forge
metpy                          1.2.0    pyhd8ed1ab_0  conda-forge
metpy                          1.3.0    pyhd8ed1ab_0  conda-forge
metpy                          1.3.1    pyhd8ed1ab_0  conda-forge

You can create a catalog of the current state of this environment by writing it out to a .yaml file:

$ conda env export > clim_data.yaml

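Later, if you change the environment (e.g., by installing more packages), re-run the export command above to refresh the file. Conversely, if you edit the .yaml file itself, you can apply its contents to the environment: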

$ conda env update -f clim_data.yaml

You can share a .yaml file with colleagues so that they will have the same environment, and thus the same functionality, as you. This can be crucial if you are sharing a Python script or notebook. Your colleague may find your notebook does not work properly in any of their environments. Sharing the environment that matches the script or notebook is a great way to ensure functionality.

If someone shares an environment file with you, you can make a Conda environment from it with the env create command:

$ conda env create --file <colleague's_file>.yaml

The name of the new environment is not necessarily the same as the filename - the first line of the .yaml file names the environment, e.g.:

name: hopper_test

Be sure that the name is different from any of your existing environments! If it is not, you can edit the .yaml file and change the name before creating the new environment.
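Checking the name before you create the environment can be done by eye, or with a few lines of Python. The sketch below reads the top-level name: line without requiring a YAML library; the function name environment_name and the sample file contents are illustrative assumptions, and a file with unusual formatting (such as a quoted name) may need a proper YAML parser.

```python
def environment_name(yaml_text):
    """Return the value of the top-level `name:` key, or None if it is absent."""
    for line in yaml_text.splitlines():
        # Top-level keys start in the first column, so indented lines are skipped.
        if line.startswith("name:"):
            return line.split(":", 1)[1].strip()
    return None


# Hypothetical contents standing in for a colleague's environment file.
sample = "name: hopper_test\nchannels:\n  - conda-forge\ndependencies:\n  - python\n"
print(environment_name(sample))
```

You can then compare the result against the names shown by conda env list before running env create.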

Do I have to do this every time?

Whether you are running Python from the command line, in a Jupyter session, or in a batch job submitted with Slurm, you will want to ensure the correct environment is active before you start running. But the initialization and installation steps described above do not need to be repeated.
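For scripts that run unattended, such as batch jobs, one option is to have the script itself stop early if the wrong environment is active. This is a sketch that relies on the CONDA_DEFAULT_ENV variable, which conda activate sets to the name of the active environment (or to its path, if the environment was activated by path); the function names are our own.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active Conda environment, or None if none is active."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def require_env(expected, environ=None):
    """Raise RuntimeError unless the expected Conda environment is active."""
    active = active_conda_env(environ)
    if active != expected:
        raise RuntimeError(
            f"Expected Conda environment {expected!r}, but {active!r} is active."
        )


print("Active Conda environment:", active_conda_env())
# Demonstration with a stand-in mapping instead of the real shell environment.
require_env("clim_data", {"CONDA_DEFAULT_ENV": "clim_data"})
```

In a real script you would call require_env("clim_data") with no second argument, near the top, so that a job started from the wrong environment fails immediately with a clear message.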

We’ve given you just enough information to get a better environment setup for this class, but there’s lots more if you are interested. The conda User’s Guide can help.

Additionally, there is the Conda cheat sheet - a handy two-page guide to common commands and syntax.

Key Points

Conda environments allow you to install and manage Python packages to suit your needs.

Conda environments let you keep a stable set of software versions for your projects, so that code remains functional and backwards-compatible.
