Simplify store: replace pair mappings with explicit groups #11

New issue

Closed

opened 2026-05-17 06:20:08 +00:00 by victor · 0 comments

victor commented

2026-05-17 06:20:08 +00:00

Owner

Problem

The current store models duplicate relationships as a flat list of track-to-track pairs (a graph of edges). Groups are only ever reconstructed at query time by computing the transitive closure of those edges. This produces several complexities:

group_undecided does graph traversal in application code to reassemble groups from pairs.
The "reopen" invariant is maintained by a consistency SQL trigger that fires when a new pair connects to an already-decided group — subtle and hard to reason about.
Queries like decided_groups_with_rating_mismatch join across multiple tables and re-derive group membership.
The excluded flag on pairs (from split_track) is a workaround for pairs that are edges in the graph but should not be treated as same-group.

Proposed change

Replace the pairs/decisions tables with an explicit groups model:

CREATE TABLE groups (
    id          INTEGER PRIMARY KEY,
    canonical_id TEXT    -- NULL = undecided
);

CREATE TABLE group_members (
    group_id    INTEGER NOT NULL REFERENCES groups(id) ON DELETE CASCADE,
    track_id    TEXT    NOT NULL,
    match_method TEXT,          -- how this track was linked (fuzzy/mbid/manual)
    PRIMARY KEY (group_id, track_id)
);

Key behaviours that change:

find: instead of upserting pairs, upsert group members directly. Two tracks that match go into the same group (merge if they were in separate groups).
evaluate/review: set canonical_id on the group.
sync: upserting a track that already belongs to a group leaves the group intact; removing a track that is the canonical reopens the group (sets canonical_id = NULL).
split_track: remove a member from a group (and optionally form a new singleton group).
export: read group_members WHERE group_id IN (SELECT id FROM groups WHERE canonical_id IS NOT NULL) AND track_id != canonical_id.

Queries become direct SELECTs rather than in-memory graph traversal.

Migration

Since the user's collection is already marked, a SQLite migration must be provided:

Read all existing pairs (with excluded and decided flags).
Reconstruct groups via transitive closure (same logic as current group_undecided).
For each group, determine canonical from the decisions table.
Insert into groups + group_members.
Drop old tables.

Tests

Integration test that seeds a realistic mix of decided/undecided/excluded pairs into the old schema, runs the migration, and then asserts:
- decided_duplicate_ids() returns the same IDs as before.
- undecided_pairs() (or equivalent group query) returns the same groups as before.
- decided_groups_with_rating_mismatch() returns the same result.
Extend the existing Navidrome integration tests: run a full sync → find → evaluate cycle, snapshot the store state, run the migration, assert the snapshot is identical.

## Problem The current store models duplicate relationships as a flat list of track-to-track **pairs** (a graph of edges). Groups are only ever reconstructed at query time by computing the transitive closure of those edges. This produces several complexities: - `group_undecided` does graph traversal in application code to reassemble groups from pairs. - The \"reopen\" invariant is maintained by a consistency SQL trigger that fires when a new pair connects to an already-decided group — subtle and hard to reason about. - Queries like `decided_groups_with_rating_mismatch` join across multiple tables and re-derive group membership. - The `excluded` flag on pairs (from `split_track`) is a workaround for pairs that are edges in the graph but should not be treated as same-group. ## Proposed change Replace the pairs/decisions tables with an explicit **groups** model: ```sql CREATE TABLE groups ( id INTEGER PRIMARY KEY, canonical_id TEXT -- NULL = undecided ); CREATE TABLE group_members ( group_id INTEGER NOT NULL REFERENCES groups(id) ON DELETE CASCADE, track_id TEXT NOT NULL, match_method TEXT, -- how this track was linked (fuzzy/mbid/manual) PRIMARY KEY (group_id, track_id) ); ``` Key behaviours that change: - **`find`**: instead of upserting pairs, upsert group members directly. Two tracks that match go into the same group (merge if they were in separate groups). - **`evaluate`/`review`**: set `canonical_id` on the group. - **`sync`**: upserting a track that already belongs to a group leaves the group intact; removing a track that is the canonical reopens the group (sets `canonical_id = NULL`). - **`split_track`**: remove a member from a group (and optionally form a new singleton group). - **`export`**: read `group_members WHERE group_id IN (SELECT id FROM groups WHERE canonical_id IS NOT NULL) AND track_id != canonical_id`. Queries become direct SELECTs rather than in-memory graph traversal. ## Migration Since the user's collection is already marked, a **SQLite migration** must be provided: 1. Read all existing pairs (with `excluded` and `decided` flags). 2. Reconstruct groups via transitive closure (same logic as current `group_undecided`). 3. For each group, determine canonical from the decisions table. 4. Insert into `groups` + `group_members`. 5. Drop old tables. ## Tests - Integration test that seeds a realistic mix of decided/undecided/excluded pairs into the **old schema**, runs the migration, and then asserts: - `decided_duplicate_ids()` returns the same IDs as before. - `undecided_pairs()` (or equivalent group query) returns the same groups as before. - `decided_groups_with_rating_mismatch()` returns the same result. - Extend the existing Navidrome integration tests: run a full `sync → find → evaluate` cycle, snapshot the store state, run the migration, assert the snapshot is identical.

victor closed this issue

2026-05-20 18:37:48 +00:00

victor referenced this issue from a commit

2026-05-21 06:02:06 +00:00

Replace pair mappings with explicit groups (issue #11)

victor referenced this issue from a commit

2026-05-21 06:02:06 +00:00

Merge issue #11: Replace pair mappings with explicit groups

victor referenced this issue from a commit

2026-05-21 06:02:06 +00:00

Update integration tests to use undecided_groups API