# PROJECT.md
This file is project-specific. Only include information directly related to the concrete project — goals, current state, architecture decisions, known issues, and tasks.
## Origin
Curator is a fork of the former **Tagger** project. The tagging, filtering and
hardlink-tree parts are inherited and keep working as before. On top of that,
Curator becomes a full **movie library manager (Filmotéka)**.
## Core idea
Curator manages a personal movie library based on two folders:
- **Pool** — the managed repository of video files. This is the **single source
of truth**. Curator manages the pool itself (insert/remove file), so files are
never moved by hand. The pool has exactly two top-level folders: **Filmy**
(movies — tag-based tree) and **Seriály** (series — a "copy-as-is" folder
mirrored 1:1 into the output; see Design decisions). Every file lives here
exactly once.
- **Filmotéka (output)** — a generated, browsable directory tree made only of
**hardlinks** into the pool (the same mechanism as today's hardlink manager).
It is fully disposable: deleting the Filmotéka folder loses nothing, because
it can always be regenerated from the pool.
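
A minimal sketch of why the output is disposable, assuming the pool and the output sit on the same filesystem (hardlinks cannot cross volumes). The helper `link_into_output` and the file names are hypothetical and are not part of Curator's code:

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_into_output(pool_file: Path, output_dir: Path) -> Path:
    """Create a hardlink to pool_file inside output_dir and return its path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    link = output_dir / pool_file.name
    if not link.exists():
        os.link(pool_file, link)  # requires pool and output on the same volume
    return link


def demo() -> bool:
    """Return True if the pool file survives deleting the whole output tree."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pool_file = root / "pool" / "Filmy" / "Matrix.mkv"  # hypothetical file
        pool_file.parent.mkdir(parents=True)
        pool_file.write_bytes(b"video data")

        link = link_into_output(pool_file, root / "output" / "Akční")
        same_file = os.path.samefile(pool_file, link)  # both names, one inode

        shutil.rmtree(root / "output")  # throw the generated tree away
        return same_file and pool_file.read_bytes() == b"video data"


if __name__ == "__main__":
    print(demo())
```

Removing a hardlink only drops one name for the data; the pool's own name keeps the file alive.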
### Workflow
1. The user configures two folders: the **pool** and the **Filmotéka output**.
2. The user picks a video file via "Open file".
3. Curator opens a dialog to fill in basic info — at minimum the **title/name**
and a **ČSFD link**.
4. Curator **renames** the file and **moves** it into the managed pool, and
writes a **metadata file** describing it.
5. From the pool, Curator **generates the Filmotéka** — a complex tree of
hardlinks built from each file's tags/metadata (like the current hardlink
manager, but driven by the pool).
6. Deleting the Filmotéka has no effect on the pool; the tree is regenerated on
demand.
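
Step 4 of the workflow could look roughly like the sketch below. `import_movie` is a hypothetical stand-in, not the real `FileManager` API, and it assumes the title is already a valid file name (no sanitizing is shown):

```python
import shutil
import tempfile
from pathlib import Path


def import_movie(source: Path, pool_dir: Path, title: str, move: bool = False) -> Path:
    """Copy (default) or move `source` into pool/Filmy as Title.ext."""
    filmy = pool_dir / "Filmy"
    filmy.mkdir(parents=True, exist_ok=True)
    target = filmy / f"{title}{source.suffix}"  # keep the original extension
    if target.exists():
        # Assumption: a title clash is an error rather than an overwrite.
        raise FileExistsError(f"already in pool: {target}")
    if move:
        shutil.move(str(source), str(target))
    else:
        shutil.copy2(source, target)  # non-destructive: the source stays put
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "some.download.mkv"
        src.write_bytes(b"video data")
        print(import_movie(src, Path(tmp) / "pool", "Matrix"))
```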
## Current state
- Inherited from Tagger: `Tag`, `TagManager`, `File` (sidecar metadata),
`FileManager` (folder scan, filtering, ignore patterns), 3-level config,
`HardlinkManager` (create/sync/cleanup), pytest suite.
- The Tagger → Curator rename is complete across code, spec, config filenames
(`.Curator.!gtag` / `.Curator.!ftag`) and tests.
- The **PySide6 GUI** (`src/ui/qt_app.py`), reframed around the Filmotéka
workflow, is the entry point; the old tkinter `src/ui/gui.py` is retained for
reference.
- **Pool + Filmotéka wired up:** global config holds `pool_dir` / `filmoteka_dir`;
`FileManager` creates `Filmy`/`Seriály`, imports movies (copy → `Title.ext`),
loads the pool, and the GUI generates the Filmotéka tree via `HardlinkManager`.
- `File` carries `title` + `csfd_link`. **Pool metadata lives in a unified index**
(`<pool>/.Curator.!index`, see `pool_index.py`); `File` writes there when an
index is injected, and still falls back to per-file `.!tag` sidecars for
arbitrary (non-pool) folders.
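
The index-or-sidecar behaviour can be sketched as follows. The `PoolIndex` class, the JSON format, the file-name key and the `<name>.!tag` sidecar naming are all assumptions made for illustration; the real `pool_index.py` may differ:

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class PoolIndex:
    """Hypothetical unified index: one JSON file mapping file name -> metadata."""

    def __init__(self, pool_dir: Path) -> None:
        self.path = pool_dir / ".Curator.!index"
        self.entries: dict = {}
        if self.path.exists():
            self.entries = json.loads(self.path.read_text(encoding="utf-8"))

    def set(self, key: str, metadata: dict) -> None:
        """Store metadata under `key` and persist the whole index."""
        self.entries[key] = metadata
        text = json.dumps(self.entries, ensure_ascii=False, indent=2)
        self.path.write_text(text, encoding="utf-8")


def save_metadata(file_path: Path, metadata: dict,
                  index: Optional[PoolIndex] = None) -> Path:
    """Write to the injected index if there is one, else to a per-file sidecar."""
    if index is not None:
        index.set(file_path.name, metadata)
        return index.path
    sidecar = file_path.with_name(file_path.name + ".!tag")  # assumed naming
    sidecar.write_text(json.dumps(metadata, ensure_ascii=False), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pool = Path(tmp)
        print(save_metadata(pool / "Matrix.mkv", {"title": "Matrix"}, PoolIndex(pool)))
```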
### GUI decision
The GUI was **reframed around the Filmotéka** (not kept as a generic tagger) and
**rewritten in PySide6**: Pool/Filmotéka setup, Import movie, tag-filter sidebar,
movie table, and one-click Filmotéka generation.
## Design decisions
- **Metadata storage:** one **unified metadata file** for the whole pool (a
central index), not per-file sidecars. This is justified because Curator owns
the pool and files are never moved manually, so the index is not exposed to
path drift.
- **Import dialog:** **multi-file**. Pick several videos at once and give each
its own **Title** + **ČSFD link** (one row per file; more can be added from the
dialog), or fill the links automatically with **"Najít ČSFD odkazy"** ("Find
ČSFD links"), which cleans each filename into a query and fills in the first
ČSFD search hit; existing links are kept. A single **copy/move** toggle decides
whether the sources are copied (default) or moved into the pool. Each file is
renamed to `Title.ext`. When a ČSFD link is given, Curator fetches the movie and
automatically assigns Žánr (genre) / Rok (year) / Země původu (country of
origin) / Hodnocení (rating, ten-point band) tags; further tags can be added via
the UI. Directors and the first 10 actors are fetched and cached too, but
**deliberately not turned into tags/folders** (there would be too many).
- **Genres / countries:** a movie can have **multiple genres** and, for a
co-production, **multiple countries of origin** (ČSFD writes them
slash-separated, e.g. "USA / Velká Británie"). Each becomes its own tag, so the
film appears under every matching genre and country branch in the Filmotéka
(multiple hardlinks).
- **Pool layout:** two top-level folders — **Filmy** and **Seriály**. Movies are
the first target; the Seriály branch follows the "copy-as-is" rule below.
- **Copy-as-is folders (Seriály):** a subfolder inside the pool can be marked as
**copy / as-is**. For such a folder Curator does **not** build a tag-based tree;
instead it **mirrors the exact directory hierarchy** from the pool into the
Filmotéka output, with the files materialized as **hardlinks** into the pool.
So `pool/Seriály/...` is cloned 1:1 into `output/Seriály/...` (same structure,
hardlinked files). This is how Seriály work.
- **File naming:** imported movies are renamed to **`Title.ext`** (no year in the
filename; year lives in metadata/tags).
- **Import copy vs move:** by default the original file is **copied** into the
pool (non-destructive); the import dialog also offers a **move** option that
relocates the source into the pool instead.
- **Filmotéka tree layout:** driven by a category → root-folder map
(`FILMOTEKA_CATEGORY_ROOTS`). At the output root sit the **genre folders
directly** (`output/Akční/film`, …), next to the copy-as-is mirrors
(**Seriály**), plus three grouping folders: **`Dle roku`**
(`output/Dle roku/<rok>/film`), **`Dle země původu`**
(`output/Dle země původu/<země>/film`) and **`Dle hodnocení`**. Each `film`
entry is a hardlink into the pool.
`HardlinkManager` supports an empty root (tag folders placed directly at the
output root) and restricts obsolete cleanup to the tag-tree's own top-level
folders so mirrors are never touched.
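
The tree layout and the multi-genre / multi-country rule above can be sketched as a pure path computation. The map `CATEGORY_ROOTS` below is only a guess at the shape of `FILMOTEKA_CATEGORY_ROOTS`, and `split_slash_list` and `link_paths` are hypothetical helpers, not Curator functions:

```python
from pathlib import PurePosixPath

# Assumed shape of the category -> root-folder map; an empty root places the
# tag folders directly at the output root.
CATEGORY_ROOTS = {
    "Žánr": "",
    "Rok": "Dle roku",
    "Země původu": "Dle země původu",
    "Hodnocení": "Dle hodnocení",
}


def split_slash_list(raw: str) -> list[str]:
    """Split a slash-separated ČSFD value such as 'USA / Velká Británie'."""
    return [part.strip() for part in raw.split("/") if part.strip()]


def link_paths(file_name: str, tags: dict[str, list[str]]) -> list[str]:
    """Return one output-relative hardlink path per (category, tag value)."""
    paths = []
    for category, root in CATEGORY_ROOTS.items():
        for value in tags.get(category, []):
            # PurePosixPath drops the empty root segment used for genres.
            paths.append(str(PurePosixPath(root, value, file_name)))
    return paths


if __name__ == "__main__":
    tags = {
        "Žánr": ["Akční", "Sci-Fi"],
        "Rok": ["1999"],
        "Země původu": split_slash_list("USA / Velká Británie"),
    }
    for path in link_paths("Matrix.mkv", tags):
        print(path)
```

A film with two genres and two countries therefore yields five link paths here: two at the output root, one under `Dle roku` and two under `Dle země původu`.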
## Tasks
No open tasks (see Done).
## Done
- Pool-root and Filmotéka-output folder settings in the global config
- Filmy / Seriály top-level folder handling in the pool
- "Import movie" dialog (Title + ČSFD link), copy into pool/Filmy as Title.ext
- Rename a pooled movie from the app (`FileManager.rename_movie`): renames the
file in pool/Filmy and moves its metadata to the new index key
- Remove-from-pool (delete file + its metadata)
- Generate the Filmotéka hardlink tree from the pool (Rok / Žánr / Země původu /
Hodnocení)
- Filmotéka fully regenerable from the pool alone (delete output = no loss)
- GUI reframed around the Filmotéka and rewritten in PySide6
- Seriály "copy-as-is" mirror: pool/Seriály cloned 1:1 into the output as
hardlinks (`HardlinkManager.mirror_as_is`), wired into Filmotéka generation
- Fixed `media_utils` missing `subprocess` import
- Unified pool metadata index (`pool_index.py`): one `.Curator.!index` per pool;
`File` reads/writes it when injected, `FileManager` uses it for the pool
- Configurable copy-as-is folders (`copyasis_folders` in global config, editable
from the GUI); each is mirrored 1:1 during Filmotéka generation (Seriály default)
- README.md written (overview, concepts, workflow, run/build instructions)
- ČSFD scraping (`csfd.py`, ported from Tagger devel): `File.apply_csfd_tags`
fetches a movie and assigns Žánr / Rok / Země původu tags (cached in metadata);
wired into the GUI (auto-fetch on import with a ČSFD link, plus "Načíst tagy z
ČSFD", i.e. "Load tags from ČSFD"). Parsing updated for current ČSFD HTML and
verified live against Matrix (film/9499); HTTPS uses the OS cert store via
`truststore` (corporate SSL)
- ČSFD Anubis anti-bot wall handled: `csfd.py` detects the proof-of-work
challenge page, solves it (SHA-256 PoW matching the bundled worker JS) and
replays via a shared `requests.Session`, so Žánr / Rok / Země původu tags load
again. This fixes the "nalezeno 1 film, načteno 0 tagů" ("found 1 film, loaded
0 tags") symptom. Verified live (Matrix 1999)
- Removed the inherited Tagger predefined tags: `DEFAULT_TAGS` is now empty
(no Hodnocení ⭐ / Barva categories) and new files no longer get an automatic
`Stav/Nové` tag. Tags now come from ČSFD (Žánr / Rok / Země původu) and manual edits.
Note: `Hodnocení` is still listed in `FILMOTEKA_CATEGORIES`, so that branch is
simply empty until something assigns a Hodnocení tag again
- Fixed template cruft: `src/constants.py` made consistent (Curator values,
`get_version`/`get_debug_mode` API) and `test_constants.py` aligned; removed
the imported `tagger/` devel dump
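
The Seriály copy-as-is mirror described in the design decisions can be sketched with `os.walk` and `os.link`. This is an illustrative stand-in named `mirror_tree`, not the actual `HardlinkManager.mirror_as_is` implementation, and it assumes source and target are on the same filesystem:

```python
import os
import tempfile
from pathlib import Path


def mirror_tree(source_dir: Path, target_dir: Path) -> int:
    """Recreate source_dir's hierarchy under target_dir with hardlinked files.

    Returns the number of hardlinks created; existing links are left alone,
    so running it again is a no-op.
    """
    created = 0
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        rel = Path(dirpath).relative_to(source_dir)
        (target_dir / rel).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            link = target_dir / rel / name
            if not link.exists():
                os.link(Path(dirpath) / name, link)
                created += 1
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        episode = Path(tmp) / "pool" / "Seriály" / "Show" / "S01" / "E01.mkv"
        episode.parent.mkdir(parents=True)
        episode.write_bytes(b"video data")
        src = Path(tmp) / "pool" / "Seriály"
        print(mirror_tree(src, Path(tmp) / "output" / "Seriály"))
```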