PROJECT.md

This file is project-specific. Only include information directly related to the concrete project — goals, current state, architecture decisions, known issues, and tasks.

Origin

Curator is a fork of the former Tagger project. The tagging, filtering and hardlink-tree parts are inherited and keep working as before. On top of that, Curator becomes a full movie library manager (Filmotéka).

Core idea

Curator manages a personal movie library based on two folders:

  • Pool — the managed repository of video files. This is the single source of truth. Curator manages the pool itself (insert/remove file), so files are never moved by hand. The pool has exactly two top-level folders: Filmy (movies — tag-based tree) and Seriály (series — a "copy-as-is" folder mirrored 1:1 into the output; see Design decisions). Every file lives here exactly once.
  • Filmotéka (output) — a generated, browsable directory tree made only of hardlinks into the pool (the same mechanism as today's hardlink manager). It is fully disposable: deleting the Filmotéka folder loses nothing, because it can always be regenerated from the pool.
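
The disposability of the output can be illustrated with a small sketch. It is not Curator's actual HardlinkManager code: `mirror_as_hardlinks` and `demo` are hypothetical names, and the sketch assumes the pool and the output sit on the same filesystem (a requirement for hardlinks).

```python
import os
import shutil
import tempfile
from pathlib import Path


def mirror_as_hardlinks(src_root: Path, dst_root: Path) -> int:
    """Recreate src_root's directory hierarchy under dst_root, hardlinking every file."""
    linked = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        target_dir = dst_root / Path(dirpath).relative_to(src_root)
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            target = target_dir / name
            if not target.exists():
                # A hardlink is a second name for the same data; nothing is copied.
                os.link(Path(dirpath) / name, target)
                linked += 1
    return linked


def demo() -> dict:
    """Mirror a tiny pool, delete the output, and check that the pool is untouched."""
    base = Path(tempfile.mkdtemp())
    pool, output = base / "pool", base / "output"
    episode = pool / "Seriály" / "Show" / "S01" / "E01.mkv"
    episode.parent.mkdir(parents=True)
    episode.write_bytes(b"video-bytes")

    linked = mirror_as_hardlinks(pool / "Seriály", output / "Seriály")
    mirrored = output / "Seriály" / "Show" / "S01" / "E01.mkv"
    same_file = os.path.samefile(episode, mirrored)

    shutil.rmtree(output)  # throw the whole output away
    pool_intact = episode.read_bytes() == b"video-bytes"
    shutil.rmtree(base)
    return {"linked": linked, "same_file": same_file, "pool_intact": pool_intact}
```

Removing the output only removes the extra names; the data stays reachable through the pool path, which is why the pool can remain the single source of truth.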

Workflow

  1. The user configures two folders: the pool and the Filmotéka output.
  2. The user picks a video file via "Open file".
  3. Curator opens a dialog to fill in basic info — at minimum the title/name and a ČSFD link.
  4. Curator renames the file and moves it into the managed pool, and writes a metadata file describing it.
  5. From the pool, Curator generates the Filmotéka — a complex tree of hardlinks built from each file's tags/metadata (like the current hardlink manager, but driven by the pool).
  6. Deleting the Filmotéka has no effect on the pool; the tree is regenerated on demand.
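
Steps 4 and 5 can be sketched roughly as follows. This is an illustration under assumptions, not the real FileManager/HardlinkManager API: `import_movie`, `generate_tree` and the two-entry `CATEGORY_ROOTS` map are hypothetical, tags are assumed to be "Category/Value" strings, and writing the metadata file is left out.

```python
from __future__ import annotations

import os
import shutil
import tempfile
from pathlib import Path

# Hypothetical category -> output folder map; "" places tag folders at the output root.
CATEGORY_ROOTS = {"Žánr": "", "Rok": "Dle roku"}


def import_movie(source: Path, pool_dir: Path, title: str, move: bool = False) -> Path:
    """Copy (default) or move a video into pool/Filmy, renamed to Title.ext."""
    filmy = pool_dir / "Filmy"
    filmy.mkdir(parents=True, exist_ok=True)
    target = filmy / f"{title}{source.suffix}"
    if target.exists():
        raise FileExistsError(target)
    if move:
        shutil.move(str(source), str(target))
    else:
        shutil.copy2(source, target)
    return target


def generate_tree(pooled: dict[Path, list[str]], output_dir: Path) -> list[Path]:
    """Hardlink each pooled file under one folder per known 'Category/Value' tag."""
    created = []
    for path, tags in pooled.items():
        for tag in tags:
            category, _, value = tag.partition("/")
            if category not in CATEGORY_ROOTS:
                continue  # categories without a folder mapping produce no links
            folder = output_dir / CATEGORY_ROOTS[category] / value
            folder.mkdir(parents=True, exist_ok=True)
            link = folder / path.name
            if not link.exists():
                os.link(path, link)
            created.append(link)
    return created


def demo() -> list[str]:
    """Import one file and list the output-relative links generated for it."""
    base = Path(tempfile.mkdtemp())
    source = base / "downloads" / "the.matrix.1999.mkv"
    source.parent.mkdir()
    source.write_bytes(b"video-bytes")
    pooled = import_movie(source, base / "pool", "Matrix")
    links = generate_tree({pooled: ["Žánr/Akční", "Rok/1999"]}, base / "output")
    rel = sorted(link.relative_to(base / "output").as_posix() for link in links)
    shutil.rmtree(base)
    return rel
```

In this sketch one imported file ends up with two names in the output, `Akční/Matrix.mkv` and `Dle roku/1999/Matrix.mkv`, both pointing at the single copy in `pool/Filmy`.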

Current state

  • Inherited from Tagger: Tag, TagManager, File (sidecar metadata), FileManager (folder scan, filtering, ignore patterns), 3-level config, HardlinkManager (create/sync/cleanup), pytest suite.
  • Rename Tagger → Curator done across code, spec, config filenames (.Curator.!gtag / .Curator.!ftag) and tests.
  • PySide6 GUI (src/ui/qt_app.py) reframed around the Filmotéka workflow is the entry point; the old tkinter src/ui/gui.py is retained for reference.
  • Pool + Filmotéka wired up: global config holds pool_dir / filmoteka_dir; FileManager creates Filmy/Seriály, imports movies (copy → Title.ext), loads the pool, and the GUI generates the Filmotéka tree via HardlinkManager.
  • File carries title + csfd_link. Pool metadata lives in a unified index (<pool>/.Curator.!index, see pool_index.py); File writes there when an index is injected, and still falls back to per-file .!tag sidecars for arbitrary (non-pool) folders.
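
A unified index of this kind might look like the sketch below. The on-disk format of .Curator.!index is not described in these notes, so the JSON layout, the pool-relative keys and the `PoolIndex` methods shown here are all assumptions made for illustration; pool_index.py may differ.

```python
from __future__ import annotations

import json
import shutil
import tempfile
from pathlib import Path

INDEX_NAME = ".Curator.!index"  # filename from the notes; the JSON layout is assumed


class PoolIndex:
    """One metadata file for the whole pool, keyed by pool-relative path."""

    def __init__(self, pool_dir: Path) -> None:
        self.path = pool_dir / INDEX_NAME
        self.entries: dict[str, dict] = {}
        if self.path.exists():
            self.entries = json.loads(self.path.read_text(encoding="utf-8"))

    def put(self, key: str, **metadata) -> None:
        """Create or update the entry for one pooled file, then persist."""
        self.entries.setdefault(key, {}).update(metadata)
        self.save()

    def rename(self, old_key: str, new_key: str) -> None:
        """Move an entry to a new key, e.g. after the file was renamed."""
        self.entries[new_key] = self.entries.pop(old_key)
        self.save()

    def save(self) -> None:
        text = json.dumps(self.entries, ensure_ascii=False, indent=2)
        self.path.write_text(text, encoding="utf-8")


def demo() -> dict:
    """Write an entry, rename its key, and read the index back from disk."""
    pool = Path(tempfile.mkdtemp())
    index = PoolIndex(pool)
    # The link below is a placeholder, not a real ČSFD URL.
    index.put("Filmy/Matrix.mkv", title="Matrix", csfd_link="https://example.invalid/matrix")
    index.rename("Filmy/Matrix.mkv", "Filmy/The Matrix.mkv")
    reloaded = PoolIndex(pool).entries  # a fresh instance reads the same file
    shutil.rmtree(pool)
    return reloaded
```

Keying by pool-relative path is only safe because Curator itself performs every rename and removal, so the key and the file cannot drift apart behind its back.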

GUI decision

The GUI was reframed around the Filmotéka (not kept as a generic tagger) and rewritten in PySide6: Pool/Filmotéka setup, Import movie, tag-filter sidebar, movie table, and one-click Filmotéka generation.

Design decisions

  • Metadata storage: one unified metadata file for the whole pool (a central index), not per-file sidecars. Justified because Curator owns the pool and files are never moved manually, so it is not exposed to path drift.
  • Import dialog: multi-file. The user picks several videos at once and each gets its own row with a Title and a ČSFD link (more rows can be added from the dialog). Links can be entered by hand or auto-filled with "Najít ČSFD odkazy" ("Find ČSFD links"), which cleans each filename into a search query and fills in the first ČSFD search hit; existing links are kept. A single copy/move toggle decides whether the sources are copied (default) or moved into the pool. Each file is renamed to Title.ext. When a ČSFD link is given, Curator fetches the movie and automatically assigns Žánr (genre) / Rok (year) / Země původu (country of origin) / Hodnocení (rating, ten-point band) tags; further tags can be added via the UI. Directors and the first 10 actors are fetched and cached too, but deliberately not turned into tags or folders (there would be too many).
  • Genres / countries: a movie can have multiple genres and, for a co-production, multiple countries of origin (ČSFD writes them slash-separated, e.g. "USA / Velká Británie"). Each becomes its own tag, so the film appears under every matching genre and country branch in the Filmotéka (multiple hardlinks).
  • Pool layout: two top-level folders — Filmy and Seriály. Movies are the first target; the Seriály branch follows the "copy-as-is" rule below.
  • Copy-as-is folders (Seriály): a subfolder inside the pool can be marked as copy / as-is. For such a folder Curator does not build a tag-based tree; instead it mirrors the exact directory hierarchy from the pool into the Filmotéka output, with the files materialized as hardlinks into the pool. So pool/Seriály/... is cloned 1:1 into output/Seriály/... (same structure, hardlinked files). This is how Seriály work.
  • File naming: imported movies are renamed to Title.ext (no year in the filename; year lives in metadata/tags).
  • Import copy vs move: by default the original file is copied into the pool (non-destructive); the import dialog also offers a move option that relocates the source into the pool instead.
  • Filmotéka tree layout: driven by a category → root-folder map (FILMOTEKA_CATEGORY_ROOTS). Genre folders sit directly at the output root (output/Akční/film, …), next to the copy-as-is mirrors (Seriály) and three grouping folders: Dle roku ("by year", output/Dle roku/<rok>/film), Dle země původu ("by country of origin", output/Dle země původu/<země>/film) and Dle hodnocení ("by rating"). Every film entry in these folders is a hardlink into the pool. HardlinkManager supports an empty root (tag folders placed directly at the output root) and restricts obsolete-link cleanup to the tag tree's own top-level folders, so mirrors are never touched.
  • Tag schema (config-driven, not hard-coded): the categories, their ČSFD source field + transform, and their Filmotéka folder mapping all live in tag_schema in the global config (default config.DEFAULT_TAG_SCHEMA, edited via Nastavení → Tag schéma…, i.e. Settings → Tag schema…). Both apply_csfd_tags (which fields → tags) and the Filmotéka layout (FileManager.filmoteka_category_roots) read from it, so adding a category or changing a folder rule needs no code change. A category can be made filter-only (no folders) by setting its filmoteka_root to null. The transform (e.g. decade_band) shapes only the folder name, while tags keep the exact value (rating → tag Hodnocení/90, folder Dle hodnocení/90–100 %); it is applied at Filmotéka generation via filmoteka_category_transforms.
  • Per-category filename template (filename_template in a schema entry): only the hardlink names inside that category's folders are rendered from the movie's metadata (File.name_context: title/year/rating/ext/stem/filename plus any free-form attributes), e.g. a Kolekce (collection) category with "{collection_sort} - {title}{ext}". Other folders and the pool file keep the plain name. Applied via filmoteka_category_filename_templates.
  • Free-form per-movie attributes (File.attributes, set in the GUI): arbitrary key → value metadata stored in the index and merged into name_context, so custom fields like collection_sort can drive filename templates.
  • Tag provenance (ČSFD vs user): each file records which tags came from ČSFD (csfd_tags). Re-fetching regenerates only those; user-added tags are kept, so changing a movie's ČSFD link refreshes ČSFD tags without losing manual ones.
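
Several of the decisions above (one tag per slash-separated value, folder-only transforms, filter-only categories, per-category filename templates, ČSFD tag provenance) can be sketched together. Everything in the block is an assumption for illustration: the `TAG_SCHEMA` entries and their key names other than filmoteka_root and filename_template, the "Viděno" category, the helper functions, and the reading of decade_band as "90" → "90–100 %". The real definitions live in config.DEFAULT_TAG_SCHEMA and FileManager.

```python
from __future__ import annotations

# Hypothetical schema entries, loosely modelled on the description above.
TAG_SCHEMA = [
    {"category": "Žánr", "filmoteka_root": ""},
    {"category": "Země původu", "filmoteka_root": "Dle země původu"},
    {"category": "Hodnocení", "filmoteka_root": "Dle hodnocení", "transform": "decade_band"},
    {"category": "Kolekce", "filmoteka_root": "Kolekce",
     "filename_template": "{collection_sort} - {title}{ext}"},
    {"category": "Viděno", "filmoteka_root": None},  # filter-only: no folders
]


def split_values(raw: str) -> list[str]:
    """Split a slash-separated ČSFD field into one value per tag."""
    return [part.strip() for part in raw.split("/") if part.strip()]


def decade_band(value: str) -> str:
    """Map a rating such as '90' to a ten-point folder name (assumed format)."""
    low = min(int(value) // 10 * 10, 90)
    return f"{low}–{low + 10} %"


def link_paths(tags: list[str], context: dict[str, str]) -> list[str]:
    """Return output-relative hardlink paths for one movie."""
    by_category = {entry["category"]: entry for entry in TAG_SCHEMA}
    paths = []
    for tag in tags:
        category, _, value = tag.partition("/")
        entry = by_category.get(category)
        if entry is None or entry["filmoteka_root"] is None:
            continue  # unknown or filter-only category: no folder
        if entry.get("transform") == "decade_band":
            value = decade_band(value)  # changes the folder name, not the tag
        name = entry.get("filename_template", "{title}{ext}").format(**context)
        parts = [entry["filmoteka_root"], value, name]
        paths.append("/".join(p for p in parts if p))
    return paths


def refresh_csfd_tags(all_tags: list[str], csfd_tags: list[str],
                      fetched: list[str]) -> tuple[list[str], list[str]]:
    """Replace only the ČSFD-provided tags; user-added tags are kept."""
    user_tags = [t for t in all_tags if t not in csfd_tags]
    return user_tags + fetched, list(fetched)


def demo() -> list[str]:
    context = {"title": "Film", "ext": ".mkv", "collection_sort": "01"}
    tags = [f"Země původu/{c}" for c in split_values("USA / Velká Británie")]
    tags += ["Žánr/Akční", "Hodnocení/90", "Kolekce/Trilogie", "Viděno/Ano"]
    return link_paths(tags, context)
```

In this sketch a co-production gets one link per country, the rating tag stays Hodnocení/90 while its folder is banded, the Kolekce link alone uses the templated name, and the filter-only category produces no link at all.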

Tasks

(no open tasks — see Done)

Done

  • Pool-root and Filmotéka-output folder settings in the global config
  • Filmy / Seriály top-level folder handling in the pool
  • "Import movie" dialog (Title + ČSFD link), copy into pool/Filmy as Title.ext
  • Rename a pooled movie from the app (FileManager.rename_movie): renames the file in pool/Filmy and moves its metadata to the new index key
  • Remove-from-pool (delete file + its metadata)
  • Generate the Filmotéka hardlink tree from the pool (Rok / Žánr / Země původu / Hodnocení)
  • Filmotéka fully regenerable from the pool alone (delete output = no loss)
  • GUI reframed around the Filmotéka and rewritten in PySide6
  • Seriály "copy-as-is" mirror: pool/Seriály cloned 1:1 into the output as hardlinks (HardlinkManager.mirror_as_is), wired into Filmotéka generation
  • Fixed media_utils missing subprocess import
  • Unified pool metadata index (pool_index.py): one .Curator.!index per pool; File reads/writes it when injected, FileManager uses it for the pool
  • Configurable copy-as-is folders (copyasis_folders in global config, editable from the GUI); each is mirrored 1:1 during Filmotéka generation (Seriály default)
  • README.md written (overview, concepts, workflow, run/build instructions)
  • ČSFD scraping (csfd.py, ported from Tagger devel): File.apply_csfd_tags fetches a movie and assigns Žánr / Rok / Země původu tags (cached in metadata); wired into the GUI (auto-fetch on import with a ČSFD link, plus "Načíst tagy z ČSFD"). Parsing updated for current ČSFD HTML and verified live against Matrix (film/9499); HTTPS uses the OS cert store via truststore (corporate SSL)
  • ČSFD Anubis anti-bot wall handled: csfd.py detects the proof-of-work challenge page, solves it (SHA-256 PoW matching the bundled worker JS) and replays the request via a shared requests.Session, so Žánr / Rok / Země původu tags load again (this fixes the "nalezeno 1 film, načteno 0 tagů" symptom, i.e. "found 1 film, loaded 0 tags"). Verified live (Matrix 1999)
  • Removed the inherited Tagger predefined tags: DEFAULT_TAGS is now empty (no Hodnocení / Barva categories) and new files no longer get an automatic Stav/Nové tag. Tags now come from ČSFD (Žánr / Rok / Země původu) and manual edits. Note: Hodnocení is still listed in FILMOTEKA_CATEGORIES, so that branch is simply empty until something assigns a Hodnocení tag again
  • Fixed template cruft: src/constants.py made consistent (Curator values, get_version/get_debug_mode API) and test_constants.py aligned; removed the imported tagger/ devel dump