doesitarm/docs/research/pagefind-feature-parity-2026-03-15.md
ThatGuySam e5f28b16ee docs(research): add pagefind feature parity memo
Capture user-visible parity requirements for a future Pagefind migration.

This keeps the earlier viability memo focused on engine fit and documents the recommended adapter approach, carry-over patterns, and remaining prototype risks around ranking and title highlighting.
2026-03-15 13:03:57 -05:00

# Pagefind Feature Parity For doesitarm
Date: 2026-03-15
## Scope
Read alongside `docs/research/pagefind-viability-2026-03-15.md`.
This memo investigates how a Pagefind migration could preserve the current
Stork-backed search UX in `doesitarm`. It focuses on user-visible behavior, not
on whether Pagefind is viable in the abstract.
## Short Answer
Yes, most of the current search experience can be carried over without the user
feeling a major regression, but only if Pagefind is treated as the search core
under a custom Vue adapter.
Recommended parity path:
1. Keep the current server-rendered initial lists, pagination links, summary
block, and page chrome exactly as they are.
2. Replace only the "user has started searching/filtering" path with Pagefind's
JavaScript API.
3. Build the Pagefind index from the existing sitemap/listing data, not from an
HTML crawl.
4. Use Pagefind `filters` for status/category/type scoping.
5. Use Pagefind `meta` only for simple scalar fields needed in result rendering.
6. Reattach richer card UI state such as `searchLinks` from a local
URL-or-slug keyed map instead of trying to force arrays into Pagefind
metadata.
The one place where a prototype may still change the implementation choice is
search quality. If `addCustomRecord()` does not rank app-name and alias matches
well enough, the next-best parity option is to generate virtual HTML records via
`addHTMLFile()` so Pagefind can use `h1` weighting and `data-pagefind-*`
attributes.
## Current UX Contract In The Repo
From `components/search-stork.vue`, `helpers/stork/toml.js`, and the scoped
Astro pages:
- The page initially shows the existing paginated list from the API when the
user has not typed anything yet.
- Search is search-as-you-type, with loading placeholders while results are
pending.
- The UI exposes quick status chips.
- Scoped pages such as `/kind/...` and `/games` inject base filters so the same
component behaves like "search within this slice".
- Empty results on a scoped page show a "Search Everything" escape hatch.
- Query results show highlighted snippets and a detail link.
- Non-query cards can also show timestamps and auxiliary action buttons such as
benchmark/performance links.
- The current Stork index injects synthetic searchable tokens for `status_*`,
category, and route type, in addition to title/content/description/aliases and
tags.
- Stork also post-filters query results so every typed term must be present
somewhere in the returned title/URL/excerpts.
Parity therefore means more than "can users search". It also means:
- Can they search globally and within a scoped page?
- Can they click status chips?
- Can they still get good snippets and stable detail URLs?
- Can the initial browse mode remain unchanged?
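
The "every typed term must be present" rule in the contract above is the piece most likely to need re-implementation on the client. A minimal sketch of that check follows, restated against a generic result shape. The `{ title, url, excerpts }` shape and all function names are illustrative assumptions, not code copied from `components/search-stork.vue`.

```javascript
// Sketch only: a Stork-style "all terms must appear somewhere" post-filter.
// The result shape { title, url, excerpts } is a hypothetical stand-in.

function normalize(text) {
  return String(text).toLowerCase();
}

// Split a raw query into lowercase terms, dropping empty pieces.
function splitTerms(query) {
  return normalize(query).split(/\s+/).filter(Boolean);
}

// True when every term occurs in at least one of the haystack strings.
function matchesAllTerms(query, haystacks) {
  const joined = haystacks.map(normalize).join(' ');
  return splitTerms(query).every((term) => joined.includes(term));
}

// Keep only results whose title, URL, or excerpts cover every typed term.
function filterByAllTerms(query, results) {
  return results.filter((result) =>
    matchesAllTerms(query, [result.title, result.url, ...(result.excerpts || [])])
  );
}
```

An empty query has no terms, so this sketch lets every result through. Whether that matches the current component's behavior would need checking against the repo.
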
## What The Evidence Says
Confirmed from Pagefind docs and repo activity:
- The Node API supports `addCustomRecord()` with `url`, `content`, `language`,
optional flat `meta`, optional flat `filters`, and optional flat `sort`.
- The Node API also supports `addHTMLFile()` for virtual HTML pages and
`writeFiles()` / `getFiles()` for writing the bundle to `/pagefind/`.
- The browser API is intended for custom search interfaces, not just the stock
widget.
- `pagefind.init()` can be called on focus, and `pagefind.preload()` /
`pagefind.debouncedSearch()` exist specifically to reduce first-search
latency.
- `result.data()` returns `url`, `excerpt`, `meta`, and related result data.
The docs explicitly say `excerpt` is safe to use as `innerHTML`, while
`content` and `meta` are raw.
- The JS API supports filter-only browsing by calling
`pagefind.search(null, { filters: ... })`.
- The JS API can also return filter counts via `pagefind.filters()`, plus
remaining-result counts on subsequent searches.
- Filtering defaults to AND semantics, and compound `any` / `all` / `none` /
`not` logic is available.
- Sorting can be applied at search time, but records missing a sort value are
omitted when that sort is active.
- Highlighting on destination pages is supported via `highlightParam` and
`pagefind-highlight.js`.
- Historical GitHub issues `#198` and `#277` asked for direct non-HTML input;
both are now closed, and the current docs document that capability.
- The latest stable release is `v1.4.0`, published on 2025-09-01.
- Issue `#574` about the `npx` wrapper on `ubuntu-latest` is still open as of
2026-03-15, so a pinned dependency or direct binary path is safer than a
casual CLI swap.
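
To make the Node API points above concrete, here is a hedged sketch of turning one listing into an `addCustomRecord()` payload and writing the bundle. The listing fields (`name`, `aliases`, `description`, `tags`, `status`, `category`, `endpoint`, `lastUpdated`) and the helper names are assumptions for illustration. The `index` argument stands in for the object returned by Pagefind's `createIndex()`, so the sketch can be exercised without the package installed.

```javascript
// Sketch only: listing field names are hypothetical, not the repo's real schema.

// Build a flat custom record: searchable text in `content`,
// display-only scalars in `meta`, scoping values in `filters`.
function toCustomRecord(listing) {
  const content = [
    listing.name,
    ...(listing.aliases || []),
    listing.description,
    ...(listing.tags || []),
  ].filter(Boolean).join('\n');

  const meta = { title: listing.name };
  if (listing.lastUpdated) meta.lastUpdated = listing.lastUpdated; // ISO string

  const filters = {};
  if (listing.status) filters.status = [listing.status];
  if (listing.category) filters.category = [listing.category];

  return { url: listing.endpoint, content, language: 'en', meta, filters };
}

// Add every listing to the index, then write the bundle to disk.
// `index` is expected to expose addCustomRecord() and writeFiles().
async function buildIndex(index, listings, outputPath) {
  for (const listing of listings) {
    const { errors } = await index.addCustomRecord(toCustomRecord(listing));
    if (errors.length > 0) throw new Error(errors.join('; '));
  }
  return index.writeFiles({ outputPath });
}
```

In a real build script, `index` would come from `createIndex()` in the `pagefind` package, and the script would call `close()` once the files are written, as the Node API docs describe.
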
Community signal:
- In the main HN discussion for Pagefind's launch, the maintainer explicitly
said multi-word query merging is built in.
- Another HN commenter reported that deploying Pagefind was "pleasingly easy"
and the result was "reasonably nippy".
- Zach Leatherman's `pagefind-search` component is a concrete GitHub example of
treating Pagefind as a customizable UI layer with explicit fallback content
and controlled asset loading.
## Feature Mapping
| Current user-visible feature | Carry-over path with Pagefind | Confidence | Notes |
| --- | --- | --- | --- |
| Search-as-you-type | `pagefind.debouncedSearch()` or manual debounce + `pagefind.preload()` | High | This is native to the JS API. |
| Lazy first-load behavior | `pagefind.init()` on focus, or rely on first search | High | This matches the current deferred Stork load pattern. |
| Scoped search pages | Keep current initial page data, then call `pagefind.search(term-or-null, { filters })` | High | Better fit than the current synthetic token approach. |
| Quick status chips | Map chips to `filters.status` values | High | Pagefind filters are cleaner than indexing `status_native` into content. |
| Empty-state "Search Everything" | Clear base filters and rerun, or keep current link to `/` | High | User-visible behavior is easy to preserve. |
| Highlighted excerpts | Render `result.data().excerpt` | High | Officially documented and safe as `innerHTML`. |
| Highlighted title text | No first-class JS API equivalent was found in the docs | Medium | Likely plain title unless we add client-side emphasis ourselves. |
| Detail links | Use `result.data().url` | High | Direct match. |
| Relative timestamp text | Put timestamp in `meta`, or join from local listing data | High | `meta` is string-only, so store ISO strings if using metadata. |
| Benchmark / Performance buttons | Join from local listing data keyed by URL or slug | High | Inference: better than encoding arrays as metadata strings. |
| Status / category / type scoping | Use Pagefind `filters`, not fake searchable tokens | High | Cleaner and more explicit than the current Stork trick. |
| "Every typed term must match somewhere" behavior | Likely client-side post-filter using returned raw `content` if needed | Medium | Current Stork behavior is explicit; Pagefind query semantics need a parity check in a prototype. |
| Result ordering that favors app names and aliases | Start with `addCustomRecord()` content shaping; fall back to `addHTMLFile()` if needed | Medium | Custom-record metadata appears display-oriented, not ranking-oriented. |
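
The scoped-page and status-chip rows above both reduce to building one `filters` object and passing `null` as the term when nothing has been typed. A small sketch of that mapping follows. The function names and the `baseFilters` / `activeStatuses` inputs are assumptions; only the `{ any: [...] }` form and the `null` term come from the Pagefind filtering docs.

```javascript
// Sketch only: helper names and input shapes are hypothetical.

// Merge a scoped page's base filters with the currently active status chips.
// One chip becomes a plain value; several chips become an explicit OR.
function toPagefindFilters(baseFilters, activeStatuses) {
  const filters = { ...baseFilters };
  if (activeStatuses.length === 1) {
    filters.status = activeStatuses[0];
  } else if (activeStatuses.length > 1) {
    filters.status = { any: activeStatuses };
  }
  return filters;
}

// Produce the [term, options] pair for pagefind.search().
// A blank query becomes null so Pagefind does a filter-only lookup.
function toSearchArgs(query, baseFilters, activeStatuses) {
  const trimmed = query.trim();
  const term = trimmed === '' ? null : trimmed;
  return [term, { filters: toPagefindFilters(baseFilters, activeStatuses) }];
}
```

For example, `toSearchArgs('', { kind: 'games' }, ['native'])` yields a `null` term with `kind` and `status` filters, which corresponds to "browse this slice, native only".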
## Options
### 1. Custom Vue adapter over `addCustomRecord()` output
This is the lowest-risk parity path.
Why it works:
- It matches the repo's existing data-first indexing model.
- It preserves the current page shell and only swaps the query engine.
- It uses Pagefind features the way they are documented today:
`meta` for display fields, `filters` for scoping, `sort` for explicit sort
options.
Tradeoffs:
- `meta` is documented as returned display metadata; nothing in the docs
indicates that it influences ranking.
- Complex card state such as `searchLinks` does not fit naturally into flat
string metadata.
- The docs do not show title-highlight ranges in the JS API, so exact title
highlighting may need custom client logic.
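
If plain titles turn out not to be acceptable, the custom client logic mentioned above could be as small as the following sketch. It does literal, case-insensitive substring emphasis on the title and escapes everything else, so it will not mirror Pagefind's stemming or Stork's title ranges. The function names are assumptions.

```javascript
// Sketch only: literal substring emphasis, not stemming-aware.

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every occurrence of a typed term in <mark>, escaping all title text.
function emphasizeTitle(title, query) {
  const terms = query.trim().split(/\s+/).filter(Boolean).map(escapeRegExp);
  if (terms.length === 0) return escapeHtml(title);
  // Longest terms first so a longer term wins over its own prefix.
  terms.sort((a, b) => b.length - a.length);
  const pattern = new RegExp(`(${terms.join('|')})`, 'gi');
  // split() with a capture group places the matched pieces at odd indexes.
  return String(title)
    .split(pattern)
    .map((part, i) => (i % 2 === 1 ? `<mark>${escapeHtml(part)}</mark>` : escapeHtml(part)))
    .join('');
}
```

Because the output is escaped before the `<mark>` tags are added, it can be rendered with `v-html` in the same way as the Pagefind excerpt.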
### 2. Custom Vue adapter over generated virtual HTML via `addHTMLFile()`
This is the "higher parity if ranking is off" option.
Why it might be worth it:
- Pagefind documents default weighting for HTML headings.
- Pagefind documents `data-pagefind-weight`, `data-pagefind-meta`,
`data-pagefind-filter`, and `data-pagefind-sort`.
- If app-name, alias, and status text need finer relevance tuning than a plain
custom record gives, virtual HTML gives more levers.
Tradeoffs:
- More adapter code.
- Harder to justify unless a real query corpus shows ranking problems.
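
For comparison with option 1, this is roughly what a generated virtual page could look like. The sketch relies only on attributes named above (`data-pagefind-weight`, `data-pagefind-meta`, `data-pagefind-filter`) plus `data-pagefind-body`. The listing fields, the helper name, and the alias weight of `5` are assumptions that a prototype would need to tune.

```javascript
// Sketch only: listing fields and the weight value are hypothetical.

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Render one listing as a tiny HTML document for addHTMLFile().
// The app name goes in <h1> so heading weighting applies; aliases get
// an explicit weight; status is exposed as a filter value.
function toVirtualHtml(listing) {
  const aliases = (listing.aliases || [])
    .map((alias) => `<li data-pagefind-weight="5">${escapeHtml(alias)}</li>`)
    .join('');
  return [
    '<!doctype html>',
    '<html lang="en"><body>',
    `<main data-pagefind-body data-pagefind-filter="status:${escapeHtml(listing.status)}">`,
    `<h1 data-pagefind-meta="title">${escapeHtml(listing.name)}</h1>`,
    `<ul>${aliases}</ul>`,
    `<p>${escapeHtml(listing.description || '')}</p>`,
    '</main>',
    '</body></html>',
  ].join('\n');
}
```

The resulting string would be passed as `content`, together with the listing's `url`, to `index.addHTMLFile()`.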
### 3. Replace the current component with the stock Pagefind UI
Not recommended.
Why:
- It discards the current browse-first behavior and scoped-page behavior.
- It loses the current empty-state copy and action-button treatment.
- It would make parity depend on overriding Pagefind's UI instead of preserving
the repo's existing search component contract.
## What Works
- Keep "no query" mode exactly as it is today and switch to Pagefind only after
the user types or toggles a filter.
- Build one Pagefind record per listing/detail route, using the same sitemap
payloads already feeding the Stork pipeline.
- Put searchable text in `content`, starting with the fields users most expect
to match: app name, aliases, support text, description, tags, and any status
phrasing users already see.
- Put render-only scalar fields in `meta`, such as title, slug, status label,
last-updated ISO timestamp, and short display text.
- Use `filters` for `status`, `category`, `kind`, and other scoped-page
constraints.
- Build a parallel local result-decoration map keyed by URL or slug so Pagefind
results can be decorated with the same `searchLinks`, timestamps, or other
card chrome without turning the index into a transport format for the whole
listing object.
- Call `pagefind.filters()` once if you want the chip row to expose counts or
disabled states.
Inference:
For `doesitarm`, this "Pagefind for retrieval + local map for decoration"
split is probably the cleanest way to preserve the current UI without bloating
Pagefind metadata or weakening search semantics.
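
The retrieval/decoration split can be sketched as a plain `Map` lookup. The `endpoint`, `searchLinks`, and `lastUpdated` field names and the trailing-slash normalization are assumptions; the `url`, `excerpt`, and `meta` fields are the ones `result.data()` is documented to return.

```javascript
// Sketch only: listing field names and URL normalization are hypothetical.

// Treat "/app/figma" and "/app/figma/" as the same key.
function normalizeUrl(url) {
  return url.length > 1 ? url.replace(/\/+$/, '') : url;
}

// Build the local lookup once from the listing data the page already has.
function buildDecorationMap(listings) {
  const map = new Map();
  for (const listing of listings) {
    map.set(normalizeUrl(listing.endpoint), {
      searchLinks: listing.searchLinks || [],
      lastUpdated: listing.lastUpdated || null,
    });
  }
  return map;
}

// Combine one loaded Pagefind result with its local card extras.
function decorateResult(resultData, decorationMap) {
  const extras = decorationMap.get(normalizeUrl(resultData.url))
    || { searchLinks: [], lastUpdated: null };
  return {
    url: resultData.url,
    title: resultData.meta.title,
    excerpt: resultData.excerpt,
    ...extras,
  };
}
```

Results with no entry in the map still render, just without action buttons or timestamps, which keeps a stale map from breaking the card list.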
## What To Avoid
- Do not replace the entire search component with the stock Pagefind UI if the
goal is parity.
- Do not assume `meta` alone is enough for search quality. Metadata is clearly
documented as returned result data; searchable content still needs to live in
`content`.
- Do not try to stuff arrays or nested structures like `searchLinks` into flat
Pagefind metadata if the same information already exists in local page data.
- Do not apply Pagefind sort options to sparse fields unless every record has a
value, because missing sort keys are omitted from sorted results.
- Do not assume the `npx pagefind` wrapper is production-safe on Ubuntu CI
without pinning and testing.
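
The sparse-sort warning above can be enforced at build time instead of remembered. A minimal sketch, assuming records shaped like the `addCustomRecord()` payload with an optional flat `sort` object; the helper names are hypothetical.

```javascript
// Sketch only: guards against sort keys that some records lack.

// Return the candidate keys that every record provides as a non-empty string.
function usableSortKeys(records, candidateKeys) {
  return candidateKeys.filter((key) =>
    records.every((record) =>
      record.sort && typeof record.sort[key] === 'string' && record.sort[key] !== ''
    )
  );
}

// Drop any sort key that is not present on every record, so activating a
// sort at search time cannot silently hide results.
function stripSparseSorts(records, candidateKeys) {
  const keep = new Set(usableSortKeys(records, candidateKeys));
  return records.map((record) => {
    const sort = {};
    for (const [key, value] of Object.entries(record.sort || {})) {
      if (keep.has(key)) sort[key] = value;
    }
    return { ...record, sort };
  });
}
```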
## Recommendation
Recommended implementation order:
1. Keep the current Astro pages and initial list rendering exactly as-is.
2. Build a Pagefind prototype with `addCustomRecord()` from the existing sitemap
payloads.
3. Map the current scoped-page `baseFilters` to real Pagefind filters.
4. Add a thin Pagefind adapter inside the current Vue component rather than
replacing the component.
5. Use a local `listingByUrl` or `listingBySlug` map to reattach rich card UI
fields.
6. Compare a real query set against Stork, especially app-name, alias, and
multi-term searches.
7. Only if ranking quality is weaker than expected, move the prototype from
`addCustomRecord()` to generated `addHTMLFile()` records with weighted
markup.
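
Steps 3 to 5 could come together in a thin adapter along these lines. This is a sketch under stated assumptions: `createSearchAdapter`, its options, and the `mode` values are hypothetical names, and `pagefind` is whatever object the component gets from loading `/pagefind/pagefind.js`. The calls used on it (`search`, `debouncedSearch`, `result.data()`) are the documented browser API ones.

```javascript
// Sketch only: adapter name, options, and return shape are hypothetical.

function createSearchAdapter({ pagefind, decorationMap, debounceMs = 300, maxResults = 20 }) {
  return async function runSearch(query, baseFilters, userFilters) {
    const term = query.trim();
    const userHasFilters = Object.keys(userFilters).length > 0;

    // Nothing typed and no chip toggled: stay on the server-rendered list.
    if (term === '' && !userHasFilters) return { mode: 'browse', results: [] };

    const filters = { ...baseFilters, ...userFilters };
    const search = term === ''
      ? await pagefind.search(null, { filters })
      : await pagefind.debouncedSearch(term, { filters }, debounceMs);

    // debouncedSearch resolves to null when a newer call superseded this one.
    if (search === null) return { mode: 'stale', results: [] };

    // Load only the first page of result data, then reattach local card state.
    const loaded = await Promise.all(
      search.results.slice(0, maxResults).map((result) => result.data())
    );
    const results = loaded.map((data) => ({
      url: data.url,
      title: data.meta.title,
      excerpt: data.excerpt,
      ...(decorationMap.get(data.url) || {}),
    }));
    return { mode: 'search', results };
  };
}
```

The component would ignore `stale` responses and keep showing its loading placeholders, and it would fall back to the existing paginated list whenever the mode is `browse`.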
Why this is the best default:
- It preserves the current user-visible experience in the cheapest way.
- It uses Pagefind features where Pagefind is strongest: retrieval, snippets,
filtering, counts, and static bundle delivery.
- It avoids forcing Pagefind to become the canonical source of every UI field.
## Missing Information
- I did not find a documented JS API for title highlight ranges equivalent to
the Stork title-range behavior.
- I did not find clear documentation on exact multi-term query semantics beyond
Pagefind supporting multi-word queries in practice.
- I did not find a high-signal Stack Overflow thread that added more than the
official docs for this migration.
- The Lobsters URL surfaced during search no longer resolved, so I did not use
it as evidence.
## Recommended Next Inspection Steps
1. Build a small Pagefind prototype against 100-200 representative listings.
2. Test 25-50 real queries from the current site vocabulary:
app names, aliases, status words, category words, and mixed multi-term
queries.
3. Decide whether status chips should stay effectively single-select, matching
current behavior, or become explicit OR filters within the same filter key.
4. Verify whether plain title rendering is acceptable, or whether custom
client-side title emphasis is needed.
5. Measure first-search latency on mobile before removing Stork.
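
For step 2, one way to turn the query comparison into numbers is a top-k overlap score per query. This is only one possible metric, and the input shape (`{ query, stork, pagefind }` with ordered URL arrays) is an assumption about how the prototype would capture results.

```javascript
// Sketch only: a simple top-k overlap metric, not an established repo tool.

// Fraction of the baseline's top-k URLs that also appear in the candidate's top-k.
function overlapAtK(baselineUrls, candidateUrls, k) {
  const baseline = new Set(baselineUrls.slice(0, k));
  if (baseline.size === 0) return 1; // nothing expected, nothing missed
  const candidate = new Set(candidateUrls.slice(0, k));
  let hits = 0;
  for (const url of candidate) {
    if (baseline.has(url)) hits += 1;
  }
  return hits / baseline.size;
}

// Summarize each query: overlap score and whether the first result agrees.
function compareQuerySet(cases, k) {
  return cases.map(({ query, stork, pagefind }) => ({
    query,
    overlap: overlapAtK(stork, pagefind, k),
    topMatch: stork[0] === pagefind[0],
  }));
}
```

Queries with a low overlap or a mismatched top result are the ones worth inspecting by hand before deciding whether the `addHTMLFile()` fallback is needed.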
## Source Links
- Pagefind Node API docs:
https://pagefind.app/docs/node-api/
- Pagefind browser API docs:
https://pagefind.app/docs/api/
- Pagefind filtering docs:
https://pagefind.app/docs/filtering/
- Pagefind JS API filtering docs:
https://pagefind.app/docs/js-api-filtering/
- Pagefind sorting docs:
https://pagefind.app/docs/sorts/
- Pagefind JS API sorting docs:
https://pagefind.app/docs/js-api-sorting/
- Pagefind metadata docs:
https://pagefind.app/docs/metadata/
- Pagefind JS API metadata docs:
https://pagefind.app/docs/js-api-metadata/
- Pagefind weighting docs:
https://pagefind.app/docs/weighting/
- Pagefind ranking docs:
https://pagefind.app/docs/ranking/
- Pagefind highlighting docs:
https://pagefind.app/docs/highlighting/
- Pagefind sub-results docs:
https://pagefind.app/docs/sub-results/
- Pagefind latest release `v1.4.0` (published 2025-09-01):
https://github.com/Pagefind/pagefind/releases/tag/v1.4.0
- Pagefind issue `#198` ("Manually defining content, without passing HTML"):
https://github.com/Pagefind/pagefind/issues/198
- Pagefind issue `#277` ("Can pagefind pull its data from a json index file?"):
https://github.com/Pagefind/pagefind/issues/277
- Pagefind issue `#574` (`ubuntu-latest` / `npx` wrapper failure, still open on
2026-03-15):
https://github.com/Pagefind/pagefind/issues/574
- HN discussion for Pagefind launch:
https://news.ycombinator.com/item?id=32290634
- HN item API for the same discussion:
https://hn.algolia.com/api/v1/items/32290634
- Zach Leatherman's `pagefind-search` web component:
https://github.com/zachleat/pagefind-search