Caching module locations in Python

The misery of importing modules in Python

Whenever Python executes an import statement, it (more or less) traverses the directories in sys.path to locate the module. This takes time, particularly when the directories are not in the filesystem cache (e.g. at system startup) and when you have a lot of modules installed. I noticed because I had calendarserver installed on my laptop (using zope.interface below stems from looking into why it needs 10 seconds to start at boot).
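
To illustrate where the time goes, here is a heavily simplified sketch of that traversal (the real machinery also handles compiled and extension modules, zip archives, package __path__ handling and so on):

    import os
    import sys

    def locate_module(name):
        # Roughly what the default import machinery does for a
        # top-level module: probe candidate files in every sys.path
        # entry until one exists. Each miss costs a stat() call,
        # which is slow when the directory is not in the FS cache.
        for directory in sys.path:
            for candidate in (os.path.join(directory, name, '__init__.py'),
                              os.path.join(directory, name + '.py')):
                if os.path.exists(candidate):
                    return candidate
        raise ImportError('No module named %s' % name)

    print(locate_module('os'))  # one stat per miss adds up on a cold cache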

What to do?

In Debian (or other distributions) the contents of most of those directories (e.g. those in /usr/lib/*) are written to only in ways defined by the distribution, i.e. by the package manager. Thus it would seem advantageous to cache the lookups.

Here is a small hackish proof of concept: dirscan.py (run as root if you dare...) creates files .tvmodulecache listing all the possible import targets it finds, for directories in sys.path that are under /usr/lib or /usr/share. importlookupcache.py then installs import hooks so that import uses the cache when possible. To benefit from the cache, supply -mimportlookupcache as a command line parameter to python (before the name of the script you are running).
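
The hook side can be done with a PEP 302 meta path finder, which Python consults before the normal sys.path scan. The following is a minimal sketch of the idea, not the actual importlookupcache.py: it assumes the cache has already been read into a dictionary mapping module names to the directory containing them (the .tvmodulecache format and its loading are left out, and the paths in the example dictionary are made up). It uses the Python 2 imp-based protocol current at the time of writing:

    import imp
    import sys

    class CachedPathFinder(object):
        def __init__(self, cache):
            self.cache = cache  # module name -> directory containing it

        def find_module(self, fullname, path=None):
            directory = self.cache.get(fullname)
            if directory is None:
                return None  # not cached: fall back to the usual search
            try:
                # search only the cached directory instead of all of sys.path
                self._found = imp.find_module(fullname.rsplit('.', 1)[-1],
                                              [directory])
            except ImportError:
                return None
            return self

        def load_module(self, fullname):
            file, pathname, description = self._found
            try:
                return imp.load_module(fullname, file, pathname, description)
            finally:
                if file:
                    file.close()

    # the cache would be loaded from the .tvmodulecache files; a literal
    # dictionary with invented paths stands in here
    cache = {'zope': '/usr/lib/python2.5/site-packages',
             'zope.interface': '/usr/lib/python2.5/site-packages/zope'}
    sys.meta_path.insert(0, CachedPathFinder(cache))

(Stashing the find_module result on the finder between the two calls is not thread-safe, in keeping with the proof-of-concept spirit.)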

Does this speed things up?

A bit, when the caches are cold. As a totally unscientific experiment I ran (only once, no averaging) time python -c "import zope.interface" against time python -mimportlookupcache -c "import zope.interface". (FS cache dropped with sync ; echo 3 | sudo tee /proc/sys/vm/drop_caches.)

After dropping the FS cache: raw: 5.509s, with import cache: 3.130s.
There is some regression when the directories are already in the FS cache: raw: 0.085s, with import cache: 0.123s.

Quite likely, the cache organization and the lookup code could be improved. Also, the implementation probably has bugs for more exotic uses of the import statement.


Copyright 2009 Thomas Viehmann