# PROJECT.md
This file is project-specific. Only include information directly related to the concrete project — goals, current state, architecture decisions, known issues, and tasks.
## Origin
Curator is a fork of the former **Tagger** project. The tagging, filtering and
hardlink-tree parts are inherited and keep working as before. On top of that,
Curator becomes a full **movie library manager (Filmotéka)**.
## Core idea
Curator manages a personal movie library based on two folders:
- **Pool** — the managed repository of video files. This is the **single source
of truth**. Curator manages the pool itself (inserting and removing files), so files are
never moved by hand. The pool has exactly two top-level folders: **Filmy**
(movies — tag-based tree) and **Seriály** (series — a "copy-as-is" folder
mirrored 1:1 into the output; see Design decisions). Every file lives here
exactly once.
- **Filmotéka (output)** — a generated, browsable directory tree made only of
**hardlinks** into the pool (the same mechanism as the inherited `HardlinkManager`).
It is fully disposable: deleting the Filmotéka folder loses nothing, because
it can always be regenerated from the pool.
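Because the Filmotéka holds only hardlinks, regenerating it never copies video data. Below is a minimal sketch of the 1:1 mirror used for a copy-as-is folder such as `Seriály`. It assumes the pool and the output are on the same filesystem, since hardlinks cannot cross volumes. `mirror_as_hardlinks` is a hypothetical helper, not the project's `HardlinkManager.mirror_as_is`.

```python
import os
from pathlib import Path


def mirror_as_hardlinks(src_root: Path, dst_root: Path) -> int:
    """Clone src_root's directory hierarchy into dst_root, hardlinking every file.

    Hypothetical helper for illustration. Returns the number of new links.
    Requires src_root and dst_root to live on the same filesystem.
    """
    created = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = Path(dirpath).relative_to(src_root)
        target_dir = dst_root / rel
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            src = Path(dirpath) / name
            dst = target_dir / name
            if dst.exists():
                if os.path.samefile(src, dst):
                    continue  # already a hardlink to the same pool file
                dst.unlink()  # stale entry left over from an older generation
            os.link(src, dst)
            created += 1
    return created
```

Running it a second time creates nothing new. Deleting the output folder only removes extra links, so the pool files keep their data.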
### Workflow
1. The user configures two folders: the **pool** and the **Filmotéka output**.
2. The user picks a video file via "Open file".
3. Curator opens a dialog to fill in basic info — at minimum the **title/name**
and a **ČSFD link**.
4. Curator **renames** the file to `Title.ext`, **copies** it into the managed
pool (or moves it, if the user chooses) and records its **metadata** in the
pool index.
5. From the pool, Curator **generates the Filmotéka** — a complex tree of
hardlinks built from each file's tags/metadata (like the current hardlink
manager, but driven by the pool).
6. Deleting the Filmotéka has no effect on the pool; the tree is regenerated on
demand.
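Step 4 of the workflow can be sketched as follows. The sketch assumes the title is already safe to use as a filename, meaning it contains no `/`, `:` or similar characters. `import_movie` is a hypothetical helper, not `FileManager`'s real API, and it leaves out writing the metadata.

```python
import shutil
from pathlib import Path


def import_movie(source: Path, pool_dir: Path, title: str, move: bool = False) -> Path:
    """Place `source` into <pool>/Filmy as `Title.ext`; copy by default, move on request.

    Hypothetical helper: assumes `title` is already a valid filename.
    """
    films = pool_dir / "Filmy"
    films.mkdir(parents=True, exist_ok=True)
    target = films / f"{title}{source.suffix}"  # keep the original extension
    if target.exists():
        raise FileExistsError(target)  # never silently overwrite a pooled movie
    if move:
        shutil.move(str(source), str(target))
    else:
        shutil.copy2(source, target)  # non-destructive default; keeps timestamps
    return target
```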
## Current state
- Inherited from Tagger: `Tag`, `TagManager`, `File` (sidecar metadata),
`FileManager` (folder scan, filtering, ignore patterns), 3-level config,
`HardlinkManager` (create/sync/cleanup), pytest suite.
- The Tagger → Curator rename is complete across code, spec, config filenames
(`.Curator.!gtag` / `.Curator.!ftag`) and tests.
- **PySide6 GUI** (`src/ui/qt_app.py`) reframed around the Filmotéka workflow is
the entry point; the old tkinter `src/ui/gui.py` is retained for reference.
- **Pool + Filmotéka wired up:** global config holds `pool_dir` / `filmoteka_dir`;
`FileManager` creates `Filmy`/`Seriály`, imports movies (copy → `Title.ext`),
loads the pool, and the GUI generates the Filmotéka tree via `HardlinkManager`.
- `File` carries `title` + `csfd_link`. **Pool metadata lives in a unified index**
(`<pool>/.Curator.!index`, see `pool_index.py`); `File` writes there when an
index is injected, and still falls back to per-file `.!tag` sidecars for
arbitrary (non-pool) folders.
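`pool_index.py` itself is not reproduced here. The following is a minimal sketch of what a unified index can look like. It assumes a JSON file that maps each pool-relative path to its metadata dict; the real on-disk format of `.Curator.!index` may differ. The hypothetical `PoolIndex` class also shows how renaming a pooled movie (`FileManager.rename_movie`) can re-key a single entry instead of touching any sidecar.

```python
import json
import os
from pathlib import Path


class PoolIndex:
    """Hypothetical sketch of one metadata index per pool.

    Assumes JSON storage keyed by the file's path relative to the pool root.
    """

    FILENAME = ".Curator.!index"

    def __init__(self, pool_dir: Path):
        self.path = pool_dir / self.FILENAME
        self.entries: dict[str, dict] = {}
        if self.path.exists():
            self.entries = json.loads(self.path.read_text(encoding="utf-8"))

    def get(self, rel_path: str) -> dict:
        return self.entries.get(rel_path, {})

    def set(self, rel_path: str, meta: dict) -> None:
        self.entries[rel_path] = meta
        self.save()

    def rename(self, old: str, new: str) -> None:
        # Move the metadata to the new key after the pooled file is renamed.
        self.entries[new] = self.entries.pop(old)
        self.save()

    def remove(self, rel_path: str) -> None:
        self.entries.pop(rel_path, None)
        self.save()

    def save(self) -> None:
        # Write a temporary file, then atomically replace the index,
        # so a crash mid-write cannot leave a truncated index behind.
        tmp = self.path.with_name(self.path.name + ".tmp")
        data = json.dumps(self.entries, ensure_ascii=False, indent=2)
        tmp.write_text(data, encoding="utf-8")
        os.replace(tmp, self.path)
```

The atomic replace matters more for a shared index than for per-file sidecars, because one corrupt index file would affect the whole pool.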
### GUI decision
The GUI was **reframed around the Filmotéka** (not kept as a generic tagger) and
**rewritten in PySide6**: Pool/Filmotéka setup, Import movie, tag-filter sidebar,
movie table, and one-click Filmotéka generation.
## Design decisions
- **Metadata storage:** one **unified metadata file** for the whole pool (a
central index), not per-file sidecars. Justified because Curator owns the pool
and files are never moved manually, so it is not exposed to path drift.
- **Import dialog:** **multi-file**. The user picks several videos at once and
gives each its own **Title** + **ČSFD link** (one row per file; more rows can be
added from the dialog), or has them auto-filled with **"Najít ČSFD odkazy"**
("Find ČSFD links"), which cleans each filename into a search query and fills
in the first ČSFD search hit (existing links are kept). A single **copy/move**
toggle decides whether the sources are copied (default) or moved into the pool.
Each file is renamed to `Title.ext`. When a ČSFD link is given, Curator fetches
the movie and assigns Žánr / Rok / Země původu / Hodnocení tags automatically
(the rating is grouped into ten-point bands only in the Filmotéka folder names);
further tags can be added via the UI. Directors and the first 10 actors are
fetched and cached too, but **deliberately not turned into tags/folders** (there
would be too many).
- **Genres / countries:** a movie can have **multiple genres** and, for a
co-production, **multiple countries of origin** (ČSFD writes them
slash-separated, e.g. "USA / Velká Británie"). Each becomes its own tag, so the
film appears under every matching genre and country branch in the Filmotéka
(multiple hardlinks).
- **Pool layout:** two top-level folders — **Filmy** and **Seriály**. Movies are
the first target; the Seriály branch follows the "copy-as-is" rule below.
- **Copy-as-is folders (Seriály):** a subfolder inside the pool can be marked as
**copy / as-is**. For such a folder Curator does **not** build a tag-based tree;
instead it **mirrors the exact directory hierarchy** from the pool into the
Filmotéka output, with the files materialized as **hardlinks** into the pool.
So `pool/Seriály/...` is cloned 1:1 into `output/Seriály/...` (same structure,
hardlinked files). This is how Seriály work.
- **File naming:** imported movies are renamed to **`Title.ext`** (no year in the
filename; year lives in metadata/tags).
- **Import copy vs move:** by default the original file is **copied** into the
pool (non-destructive); the import dialog also offers a **move** option that
relocates the source into the pool instead.
- **Filmotéka tree layout:** driven by a category → root-folder map
(`FILMOTEKA_CATEGORY_ROOTS`). The **genre folders sit directly** at the output
root (`output/Akční/film`, …), next to the copy-as-is mirrors (**Seriály**) and
three grouping folders: **`Dle roku`** (`output/Dle roku/<rok>/film`),
**`Dle země původu`** (`output/Dle země původu/<země>/film`) and
**`Dle hodnocení`**. Each `film` entry is a hardlink into the pool.
`HardlinkManager` supports an empty root (tag folders placed directly at the
output root) and restricts obsolete-link cleanup to the tag tree's own top-level
folders, so the mirrors are never touched.
- **Tag schema (config-driven, not hard-coded):** the categories, their ČSFD
source field + transform, and their Filmotéka folder mapping all live in
`tag_schema` in the global config (default `config.DEFAULT_TAG_SCHEMA`, edited
via *Nastavení → Tag schéma…*). Both `apply_csfd_tags` (which fields → tags)
and the Filmotéka layout (`FileManager.filmoteka_category_roots`) read from it,
so adding a category or changing a folder rule needs no code change. A category
can be made filter-only (no folders) by setting its `filmoteka_root` to null.
The `transform` (e.g. `decade_band`) shapes only the **folder name**; tags keep
the **exact value** (rating → tag `Hodnocení/90`, folder `Dle hodnocení/90–100 %`).
The transform is applied at Filmotéka generation via `filmoteka_category_transforms`.
- **Per-category filename template** (`filename_template` in a schema entry): the
hardlink name **inside that category's folders only** is rendered from the
movie's metadata (`File.name_context`: title/year/rating/ext/stem/filename plus
any free-form attributes), e.g. a Kolekce category with `"{collection_sort} - {title}{ext}"`.
Other folders and the pool file keep the plain name; applied via
`filmoteka_category_filename_templates`.
- **Free-form per-movie attributes** (`File.attributes`, set in the GUI): arbitrary
`key → value` metadata stored in the index and merged into `name_context`, so
custom fields like `collection_sort` can drive filename templates.
- **Tag provenance (ČSFD vs user):** each file records which tags came from ČSFD
(`csfd_tags`). Re-fetching regenerates only those; user-added tags are kept, so
changing a movie's ČSFD link refreshes ČSFD tags without losing manual ones.
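The genre/country rule and the tag-provenance rule above combine into a single refresh step. The minimal sketch below assumes two things: tags are stored as `Category/Value` strings (as in `Hodnocein/90`), and the fetched ČSFD fields arrive as raw text per category. `split_csfd_values` and `refresh_csfd_tags` are hypothetical names, not the project's `File.apply_csfd_tags`.

```python
def split_csfd_values(raw: str) -> list[str]:
    """Split a slash-separated ČSFD field ("USA / Velká Británie") into values."""
    return [part.strip() for part in raw.split("/") if part.strip()]


def refresh_csfd_tags(
    tags: set[str], csfd_tags: set[str], fetched: dict[str, str]
) -> tuple[set[str], set[str]]:
    """Regenerate only ČSFD-derived tags and keep user-added ones.

    `fetched` maps a category (e.g. "Země původu") to its raw ČSFD text.
    Returns (all tags, new set of ČSFD tags to record as provenance).
    """
    new_csfd = {
        f"{category}/{value}"  # one tag per genre / country, so multiple hardlinks
        for category, raw in fetched.items()
        for value in split_csfd_values(raw)
    }
    user_tags = tags - csfd_tags  # everything ČSFD did not produce belongs to the user
    return user_tags | new_csfd, new_csfd
```

In this sketch, a manual tag that happens to equal a ČSFD tag cannot be told apart from it, so it is dropped if a later fetch no longer yields that value.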
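The layout rules above (category roots, an empty root, filter-only categories, folder-name transforms and per-category filename templates) can be sketched as one function that turns a movie's tags into hardlink paths. Everything here is an assumption for illustration:

- The `SCHEMA` dict only mirrors the key names from the prose (`filmoteka_root`, `transform`, `filename_template`).
- The `Kolekce` and `Stav` entries are hypothetical.
- The `90–100 %` band label is a guess at the exact format.
- `context` stands in for `File.name_context`, with a free-form `collection_sort` attribute merged in.

```python
from pathlib import PurePosixPath

# Hypothetical schema fragment: key names follow the prose, the structure is assumed.
SCHEMA = {
    "Žánr": {"filmoteka_root": ""},  # empty root: genre folders at the output root
    "Rok": {"filmoteka_root": "Dle roku"},
    "Země původu": {"filmoteka_root": "Dle země původu"},
    "Hodnocení": {"filmoteka_root": "Dle hodnocení", "transform": "decade_band"},
    "Kolekce": {
        "filmoteka_root": "Kolekce",
        "filename_template": "{collection_sort} - {title}{ext}",
    },
    "Stav": {"filmoteka_root": None},  # filter-only: no folders
}


def decade_band(value: str) -> str:
    """Map an exact rating such as "90" to a ten-point folder band ("90–100 %")."""
    low = min(int(value) // 10 * 10, 90)  # assumption: 100 % joins the top band
    return f"{low}–{low + 10} %"


TRANSFORMS = {"decade_band": decade_band}


def link_paths(
    output: PurePosixPath, tags: set[str], context: dict[str, str]
) -> list[PurePosixPath]:
    """Return every hardlink path a movie gets in the Filmotéka."""
    paths = []
    for tag in sorted(tags):
        category, _, value = tag.partition("/")
        rule = SCHEMA.get(category)
        if rule is None or rule["filmoteka_root"] is None:
            continue  # unknown or filter-only category: no folder
        transform = TRANSFORMS.get(rule.get("transform"))
        folder = transform(value) if transform else value  # the tag itself stays exact
        template = rule.get("filename_template")
        name = template.format_map(context) if template else context["filename"]
        # Joining an empty root leaves the folder directly under the output root.
        paths.append(output / rule["filmoteka_root"] / folder / name)
    return paths
```

`str.format_map` raises `KeyError` for a missing key. A template that uses `{collection_sort}` therefore assumes every movie in that category has that attribute set.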
## Tasks
(No open tasks; see Done.)
## Done
- Pool-root and Filmotéka-output folder settings in the global config
- Filmy / Seriály top-level folder handling in the pool
- "Import movie" dialog (Title + ČSFD link), copy into pool/Filmy as Title.ext
- Rename a pooled movie from the app (`FileManager.rename_movie`): renames the
file in pool/Filmy and moves its metadata to the new index key
- Remove-from-pool (delete file + its metadata)
- Generate the Filmotéka hardlink tree from the pool (Rok / Žánr / Země původu /
Hodnocení)
- Filmotéka fully regenerable from the pool alone (delete output = no loss)
- GUI reframed around the Filmotéka and rewritten in PySide6
- Seriály "copy-as-is" mirror: pool/Seriály cloned 1:1 into the output as
hardlinks (`HardlinkManager.mirror_as_is`), wired into Filmotéka generation
- Fixed a missing `subprocess` import in `media_utils`
- Unified pool metadata index (`pool_index.py`): one `.Curator.!index` per pool;
`File` reads/writes it when injected, `FileManager` uses it for the pool
- Configurable copy-as-is folders (`copyasis_folders` in global config, editable
from the GUI); each is mirrored 1:1 during Filmotéka generation (Seriály default)
- README.md written (overview, concepts, workflow, run/build instructions)
- ČSFD scraping (`csfd.py`, ported from Tagger devel): `File.apply_csfd_tags`
fetches a movie and assigns Žánr / Rok / Země původu tags (cached in metadata); wired
into the GUI (auto-fetch on import with a ČSFD link, plus "Načíst tagy z ČSFD").
Parsing updated for current ČSFD HTML and verified live against Matrix
(film/9499); HTTPS uses the OS cert store via `truststore` (corporate SSL)
- ČSFD Anubis anti-bot wall handled: `csfd.py` detects the proof-of-work
challenge page, solves it (SHA-256 PoW matching the bundled worker JS) and
replays via a shared `requests.Session`, so Žánr / Rok / Země původu tags load again
(fixing the "nalezeno 1 film, načteno 0 tagů" / "found 1 film, loaded 0 tags" symptom).
Verified live (Matrix 1999)
- Removed the inherited Tagger predefined tags: `DEFAULT_TAGS` is now empty
(no Hodnocení ⭐ / Barva categories) and new files no longer get an automatic
`Stav/Nové` tag. Tags now come from ČSFD (Žánr / Rok / Země původu) and manual edits.
Note: `Hodnocení` is still listed in `FILMOTEKA_CATEGORIES`, so that branch is
simply empty until something assigns a Hodnocení tag again
- Fixed template cruft: `src/constants.py` made consistent (Curator values,
`get_version`/`get_debug_mode` API) and `test_constants.py` aligned; removed
the imported `tagger/` devel dump
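The Anubis entry above mentions solving a SHA-256 proof of work. The minimal sketch below covers only the core search. It assumes the commonly used Anubis scheme: hash the challenge string concatenated with a decimal nonce, and accept the first digest whose hex form starts with `difficulty` zeros. `solve_pow` is a hypothetical name. Extracting the challenge from the page and submitting the answer through the shared `requests.Session` are left out, because those details depend on the Anubis version.

```python
import hashlib
from itertools import count


def solve_pow(challenge: str, difficulty: int) -> tuple[int, str]:
    """Brute-force the smallest nonce whose SHA-256(challenge + nonce) hex digest
    starts with `difficulty` zero characters. Returns (nonce, hex digest)."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
    raise AssertionError("unreachable: count() never ends")
```

Each extra zero multiplies the expected work by 16, which is why the challenge stays cheap for a single client but costly for mass scraping.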