Collection Audit SOP: Agent SOP for collection audit # Collection Audit > Agent SOP for detecting and fixing naming, tagging, and convention violations. Scan audio files for convention violations and fix them with user approval. ### Agent prompt [Section titled “Agent prompt”](#agent-prompt) Paste into your agent to start: ```plaintext Audit my collection for naming and tagging issues. ``` ## Constraints * **Never rename imported files.** Before any rename, check `search_tracks(path="...")` — if the file is in Rekordbox, defer the rename. Rekordbox cannot update paths via XML import (creates duplicates). * **Never manually resolve with `resolution="fixed"`** — that value is reserved for the scan engine’s auto-resolution. * Resolution options: `accepted_as_is`, `deferred`, `wont_fix`. * **Read-only by default.** No files modified until user approves a fix plan. * **Never delete audio files.** Renaming and tag editing only. * **WAV dual-tag rule.** `write_file_tags` handles this automatically. ## Prerequisites * reklawdbox MCP server connected * No external tools required — all tag operations use `audit_state` and `write_file_tags` ## Steps ### 0. Determine scope **Default: full collection.** Use the `content_roots` from `read_library` as the scan scope. If the user specified a narrower scope in their prompt (an artist, directory, or subset), use that instead. **Always skip `GENRE_SET`** — genre issues are handled by the Genre Classification and Genre Audit workflows. ### 0b. (Optional) Structural health scans Before the tag audit, consider running these scans for structural library issues: * `scan_broken_links()` — tracks in Rekordbox whose audio files are missing from disk * `scan_orphan_files()` — audio files on disk not imported into Rekordbox * `scan_duplicates()` — duplicate tracks by metadata or file hash * `scan_playlist_coverage()` — tracks not assigned to any playlist These are independent of the tag audit and can be run at any time. Address any findings before or after the tag audit. ### 1. Scan ```plaintext audit_state(scan, scope="/path/", skip_issue_types=["GENRE_SET"]) ``` Review summary: `files_in_scope`, `new_issues` by type, `total_open`. If `total_open` = 0 → skip to Final Report. ### 2. Review and triage Query issues by type: ```plaintext audit_state(query_issues, scope="/path/", status="open", issue_type="WAV_TAG3_MISSING") ``` Categorize by safety tier: | Category | Safety tier | Issue types | Action | | -------------- | ----------- | ------------------------------------------------------ | -------------------------- | | Auto-fixable | Safe | `WAV_TAG3_MISSING`, `WAV_TAG_DRIFT`, `ARTIST_IN_TITLE` | → Step 3 | | Rename-fixable | Rename-safe | `ORIGINAL_MIX_SUFFIX`, `TECH_SPECS_IN_DIR` | → Step 3 (if not imported) | | Needs review | Review | All others | → Step 4 | Present triage summary to user. Ask user: “Start with safe auto-fixes?“ ### 3. Fix safe issues Present planned writes for approval, then apply: ```plaintext write_file_tags(writes=[{path: "...", tags: {artist: "...", title: "..."}}]) ``` Re-scan to verify fixes (scan auto-marks resolved issues as `fixed`): ```plaintext audit_state(scan, scope="/path/") ``` Never manually resolve with `resolution="fixed"` — that’s reserved for the scan engine. **WAV\_TAG\_DRIFT and Original suffixes:** When the drift is caused by one tag layer having an `(Original)`, `(Original Mix)`, or `(Original Version)` suffix that the other lacks, strip the suffix from both layers rather than syncing it. These are store-generated filler — the shorter title is correct. More specific Original variants like `(Original Club Mix)` or `(Original Demo Version)` should be kept as-is. ### 4. Review-tier issues Present each issue type in batches. **Empty/missing tags** (`EMPTY_ARTIST`, `EMPTY_TITLE`, `MISSING_TRACK_NUM`, `MISSING_ALBUM`, `MISSING_YEAR`): * Check the `detail` field — it may be `null` for empty/missing-tag issues. Infer fix from filename and parent directory structure. * Use `lookup_discogs()` / `lookup_beatport()` / `lookup_bandcamp()` / `lookup_musicbrainz()` for missing album/year * Ask user to confirm each fix **Filename issues** (`BAD_FILENAME`, `MISSING_YEAR_IN_DIR`, `FILENAME_TAG_DRIFT`): * For `FILENAME_TAG_DRIFT`: compare `detail.filename` vs `detail.tag` values (see handling below) * For imported files needing rename: defer with note about manual Rekordbox relocate **FILENAME\_TAG\_DRIFT — handling:** On macOS, `/` is the only forbidden filename character. The scanner normalises `/` to `-` before comparison, so `AC/DC` in a tag won’t flag drift against `AC-DC` in a filename. All other characters (`: ? " * | < >`) are valid on macOS — if a download tool substituted them, the file should be renamed to match the tag. Workflow: 1. Read the `detail` JSON — contains `filename` and `tag` values for each drifted field 2. If the file is **not imported** into Rekordbox, rename to match the tag 3. If the file **is imported**, defer with a note about manual Rekordbox relocate, or accept as-is 4. If the drift is a genuinely different name (not just character substitution), look up via `lookup_discogs` / `lookup_beatport` / `lookup_bandcamp` / `lookup_musicbrainz` 5. If ambiguous after lookup, flag for user review — do not guess **Genre tags** (`GENRE_SET`, if included): * Per file: keep / clear / migrate to comments * When migrating: read existing comment first via `read_file_tags`, prepend genre, never overwrite * `write_file_tags(writes=[{path: "...", tags: {comment: "Genre: Deep House | ", genre: null}}])` **No-tag files** (`NO_TAGS`): * Infer metadata from parent directory name, companion files, filename * Present inferred values to user for confirmation before writing For items user wants to skip: ```plaintext audit_state(resolve_issues, issue_ids=[...], resolution="accepted_as_is", note="...") audit_state(resolve_issues, issue_ids=[...], resolution="deferred", note="...") audit_state(resolve_issues, issue_ids=[...], resolution="wont_fix", note="...") ``` ### 5. Verification scan ```plaintext audit_state(scan, scope="/path/") ``` If `total_open` > 0, return to Step 2 or confirm remaining items are intentionally deferred. ### 6. Final report ```plaintext audit_state(get_summary, scope="/path/") ``` Present: scope, files scanned, pass rate, fixes by type, deferred items, next steps (Rekordbox import, genre classification SOP). *** ## Issue Type Reference | Issue type | Safety tier | Detection | Fix method | | --------------------- | ----------- | ------------------------------------------------------------------------- | -------------------------------------------------------------- | | `EMPTY_ARTIST` | Review | `artist` field empty/null | Parse from filename | | `EMPTY_TITLE` | Review | `title` field empty/null | Parse from filename | | `MISSING_TRACK_NUM` | Review | `track` field empty/null | Parse from filename | | `MISSING_ALBUM` | Review | `album` field empty/null | Directory name or enrichment lookup | | `MISSING_YEAR` | Review | `year` field empty/null | Enrichment lookup | | `ARTIST_IN_TITLE` | Safe | `title` starts with `{artist} -` | Strip prefix | | `WAV_TAG3_MISSING` | Safe | WAV file with `tag3_missing` non-empty | Copy from tag 2 | | `WAV_TAG_DRIFT` | Safe | WAV `id3v2` and `riff_info` values differ | Sync to tag 2 (but see Original-suffix note in Step 3) | | `GENRE_SET` | Review | `genre` field non-empty | User decides keep/clear/migrate | | `NO_TAGS` | Review | All 14 tag fields empty/null | Infer from filename/dir | | `BAD_FILENAME` | Review | Filename doesn’t match canonical | User review | | `ORIGINAL_MIX_SUFFIX` | Rename-safe | Filename contains `(Original)`, `(Original Mix)`, or `(Original Version)` | Strip suffix from filename and tags (defer rename if imported) | | `TECH_SPECS_IN_DIR` | Rename-safe | Directory contains `[FLAC]`, `[WAV]`, etc. | Strip from dir name | | `MISSING_YEAR_IN_DIR` | Review | Album directory missing `(YYYY)` suffix | Enrichment lookup | | `FILENAME_TAG_DRIFT` | Review | Filename artist/title disagrees with tags | Rename to match tag (defer if imported) |