Library Database API#
This page describes the internal API of beets’ core database features. It doesn’t exhaustively document the API, but is aimed at giving an overview of the architecture to orient anyone who wants to dive into the code.
The Library object is the central repository for data in beets. It represents
a database containing songs, which are Item instances, and groups of items,
which are Album instances.
The Library Class#
The Library is typically instantiated as a singleton. A single invocation of
beets usually has only one Library. It’s powered by dbcore.Database
under the hood, which handles the SQLite abstraction, something like a very
minimal ORM. The library is also responsible for handling queries to retrieve
stored objects.
Overview#
You can add new items or albums to the library via the Library.add()
and Library.add_album() methods.
You may also query the library for items and albums using the
Library.items(), Library.albums(), Library.get_item()
and Library.get_album() methods.
Any modifications to the library must go through a Transaction object,
which you can get using the Library.transaction() context manager.
Model Classes#
The two model entities in beets libraries, Item and Album, share a base
class, LibModel, that provides common functionality. That class itself
specialises beets.dbcore.Model which provides an ORM-like abstraction.
To get or change the metadata of a model (an item or album), either access its
attributes (e.g., print(album.year) or album.year = 2012) or use the
dict-like interface (e.g. item['artist']).
Model base#
Models use dirty-flags to track when the object’s metadata goes out of sync with
the database. The dirty dictionary maps field names to booleans indicating
whether the field has been written since the object was last synchronized (via
load or store) with the database. This logic is implemented in the model base
class LibModel and is inherited by both Item and Album.
We provide CRUD-like methods for interacting with the database:
The base class beets.dbcore.Model has a dict-like interface, so
normal the normal mapping API is supported:
Item#
Each Item object represents a song or track. (We use the more generic term
item because, one day, beets might support non-music media.) An item can either
be purely abstract, in which case it’s just a bag of metadata fields, or it can
have an associated file (indicated by item.path).
In terms of the underlying SQLite database, items are backed by a single table
called items with one column per metadata fields. The metadata fields currently
in use are listed in library.py in Item._fields.
To read and write a file’s tags, we use the MediaFile library. To make changes
to either the database or the tags on a file, you update an item’s fields (e.g.,
item.title = "Let It Be") and then call item.write().
Items also track their modification times (mtimes) to help detect when they become out of sync with on-disk metadata, mainly to speed up the update (which needs to check whether the database is in sync with the filesystem). This feature turns out to be sort of complicated.
For any Item, there are two mtimes: the on-disk mtime (maintained by the OS)
and the database mtime (maintained by beets). Correspondingly, there is on-disk
metadata (ID3 tags, for example) and DB metadata. The goal with the mtime is to
ensure that the on-disk and DB mtimes match when the on-disk and DB metadata are
in sync; this lets beets do a quick mtime check and avoid rereading files in
some circumstances.
Specifically, beets attempts to maintain the following invariant:
If the on-disk metadata differs from the DB metadata, then the on-disk mtime must be greater than the DB mtime.
As a result, it is always valid for the DB mtime to be zero (assuming that real
disk mtimes are always positive). However, whenever possible, beets tries to set
db_mtime = disk_mtime at points where it knows the metadata is synchronized.
When it is possible that the metadata is out of sync, beets can then just set
db_mtime = 0 to return to a consistent state.
This leads to the following implementation policy:
On every write of disk metadata (
Item.write()), the DB mtime is updated to match the post-write disk mtime.Same for metadata reads (
Item.read()).On every modification to DB metadata (
item.field = ...), the DB mtime is reset to zero.
Album#
An Album is a collection of Items in the database. Every item in the database
has either zero or one associated albums (accessible via item.album_id). An
item that has no associated album is called a singleton. Changing fields on an
album (e.g. album.year = 2012) updates the album itself and also changes the
same field in all associated items.
An Album object keeps track of album-level metadata, which is (mostly) a
subset of the track-level metadata. The album-level metadata fields are listed
in Album._fields. For those fields that are both item-level and album-level
(e.g., year or albumartist), every item in an album should share the
same value. Albums use an SQLite table called albums, in which each column
is an album metadata field.
Note
The Album.items() method is not inherited from
LibModel.items() for historical reasons.
Transactions#
The Library class provides the basic methods necessary to access and
manipulate its contents. To perform more complicated operations atomically, or
to interact directly with the underlying SQLite database, you must use a
transaction (see this blog post for motivation). For example
lib = Library()
with lib.transaction() as tx:
items = lib.items(query)
lib.add_album(list(items))
The Transaction class is a context manager that provides a
transactional interface to the underlying SQLite database. It is responsible for
managing the transaction’s lifecycle, including beginning, committing, and
rolling back the transaction if an error occurs.
Migrations#
The database layer includes a first-class migration system for data changes that must happen alongside schema evolution. This keeps compatibility work explicit, testable, and isolated from normal query and model code.
Each database subclass declares its migrations in _migrations as pairs of a
migration class and the model classes it applies to. During startup, the
database creates required tables and columns first, then executes configured
migrations.
Migration completion is tracked in a dedicated migrations table keyed by
migration name and table name. This means each migration runs at most once per
table, so large one-time data rewrites can be safely coordinated across
restarts.
The migration name is derived from the migration class name. Because that name
is the persisted identity in the migrations table, renaming a released
migration class changes its identity and can cause the migration to run again.
Treat migration class names as stable once shipped.
For example, MultiGenreFieldMigration becomes multi_genre_field. After
it runs for the items table, beets records a row equivalent to:
name = "multi_genre_field", table_name = "items"
Common use cases include:
Backfilling a newly introduced canonical field from older data.
Normalizing legacy free-form values into a structured representation.
Splitting mixed-content legacy fields into cleaned primary content plus auxiliary metadata stored as flexible attributes.
To add a migration:
Create a
beets.dbcore.db.Migrationsubclass.Implement the table-specific data rewrite logic in
_migrate_data.Register the migration in the database subclass
_migrationslist for the target models.
In practice, migrations should be idempotent and conservative: gate behavior on the current schema when needed, keep writes transactional, and batch large updates so startup remains predictable for real libraries.
Queries#
To access albums and items in a library, we use Queries. In
beets, the Query abstract base class represents a criterion that
matches items or albums in the database. Every subclass of Query must
implement two methods, which implement two different ways of identifying
matching items/albums.
The clause() method should return an SQLite WHERE clause that matches
appropriate albums/items. This allows for efficient batch queries.
Correspondingly, the match(item) method should take an Item object and
return a boolean, indicating whether or not a specific item matches the
criterion. This alternate implementation allows clients to determine whether
items that have already been fetched from the database match the query.
There are many different types of queries. Just as an example,
FieldQuery determines whether a certain field matches a certain value
(an equality query). AndQuery (like its abstract superclass,
CollectionQuery) takes a set of other query objects and bundles them
together, matching only albums/items that match all constituent queries.
Beets has a human-writable plain-text query syntax that can be parsed into
Query objects. Calling AndQuery.from_strings parses a list of query
parts into a query object that can then be used with Library objects.