Package Name Normalization for Pip Installing

Name normalization

Name normalization happens when pip installs a Python package. When pip searches for a package, the dash '-', underscore '_', and period '.' are treated as interchangeable according to certain rules.

Find packages

For example, suppose we install a package named aaa.bbb_ccc with the following command.

pip install aaa.bbb_ccc

pip will search for the package aaa.bbb_ccc on PyPI. The “best” match for the requirements is selected (see pip guide and source code for details). Loosely speaking, the “best” match is the newest version of the package.

Matching wheel names

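  • The package name aaa.bbb_ccc is transformed to aaa-bbb-ccc by calling canonicalize_name (source code):
# extracted from pip source code
import re

# Any run of '-', '_' or '.' counts as a single separator.
_canonicalize_regex = re.compile(r"[-_.]+")

def canonicalize_name(name):
    # Replace each separator run with one '-' and lowercase the result.
    return _canonicalize_regex.sub("-", name).lower()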
  • The wheel name is transformed in the same way (source code)
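As a small illustration of the rule above, the sketch below runs several spellings of the same name through canonicalize_name. The function body repeats the snippet quoted from pip; the list of spellings is made up for this example and is not taken from any real project.

```python
import re

# Same pattern as the snippet quoted from pip above.
_canonicalize_regex = re.compile(r"[-_.]+")


def canonicalize_name(name):
    """Collapse runs of '-', '_' and '.' into one '-' and lowercase the result."""
    return _canonicalize_regex.sub("-", name).lower()


# Hypothetical spellings of one project name (illustration only).
spellings = ["aaa.bbb_ccc", "AAA-BBB-CCC", "aaa__bbb..ccc", "Aaa_.-Bbb_Ccc"]
normalized = {s: canonicalize_name(s) for s in spellings}

for original, canonical in normalized.items():
    print(f"{original!r} -> {canonical!r}")
```

Every spelling maps to aaa-bbb-ccc, which is why a requirement and a wheel file name can differ in separators or letter case and still match.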

Matching tarball names

  • aaa.bbb_ccc is converted to aaa.bbb-ccc by calling safe_name (source code):
# extracted from pip source code
import re

def safe_name(name):
    """Convert an arbitrary string to a standard distribution name

    Any runs of non-alphanumeric/. characters are replaced with a single '-'.
    """
    # '.' is in the allowed set, so periods survive; '_' is not, so it becomes '-'.
    return re.sub('[^A-Za-z0-9.]+', '-', name)
  • '_' in the tarball name is replaced by '-' (source code)
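To make the difference between the two rules concrete, the sketch below applies safe_name and canonicalize_name to the same inputs. Both function bodies are copied from the snippets quoted in this post; the sample names are hypothetical.

```python
import re


def safe_name(name):
    """Replace runs of characters other than letters, digits and '.' with '-'."""
    return re.sub("[^A-Za-z0-9.]+", "-", name)


def canonicalize_name(name):
    """Replace runs of '-', '_' and '.' with '-' and lowercase the result."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Hypothetical names (illustration only).
names = ["aaa.bbb_ccc", "aaa_bbb_ccc", "Aaa.Bbb__Ccc"]
comparison = [(n, safe_name(n), canonicalize_name(n)) for n in names]

for name, safe, canonical in comparison:
    print(f"{name!r}: safe_name -> {safe!r}, canonicalize_name -> {canonical!r}")
```

safe_name keeps periods and letter case, so aaa.bbb_ccc becomes aaa.bbb-ccc, while canonicalize_name also folds periods and case and yields aaa-bbb-ccc.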

Package name convention in PEP 8

PEP 8 recommends short, simple package names and discourages underscores in them.

# extracted from PEP 8
Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
Kun Liu
Data Scientist