doesitarm/docs/research/pagefind-feature-parity-2026-03-15.md
ThatGuySam e5f28b16ee docs(research): add pagefind feature parity memo
Capture user-visible parity requirements for a future Pagefind migration.

This keeps the earlier viability memo focused on engine fit and documents the recommended adapter approach, carry-over patterns, and remaining prototype risks around ranking and title highlighting.
2026-03-15 13:03:57 -05:00

Pagefind Feature Parity For doesitarm

Date: 2026-03-15

Scope

Read alongside docs/research/pagefind-viability-2026-03-15.md.

Investigate how a Pagefind migration could preserve the current Stork-backed search UX in doesitarm, focusing on user-visible behavior rather than on whether Pagefind is viable in the abstract.

Short Answer

Yes, most of the current search experience can be carried over without the user feeling a major regression, but only if Pagefind is treated as the search core under a custom Vue adapter.

Recommended parity path:

  1. Keep the current server-rendered initial lists, pagination links, summary block, and page chrome exactly as they are.
  2. Replace only the "user has started searching/filtering" path with Pagefind's JavaScript API.
  3. Build the Pagefind index from the existing sitemap/listing data, not from an HTML crawl.
  4. Use Pagefind filters for status/category/type scoping.
  5. Use Pagefind meta only for simple scalar fields needed in result rendering.
  6. Reattach richer card UI state such as searchLinks from a local URL-or-slug keyed map instead of trying to force arrays into Pagefind metadata.

The one place where a prototype may still change the implementation choice is search quality. If addCustomRecord() does not rank app-name and alias matches well enough, the next-best parity option is to generate virtual HTML records via addHTMLFile() so Pagefind can use h1 weighting and data-pagefind-* attributes.

Current UX Contract In The Repo

From components/search-stork.vue, helpers/stork/toml.js, and the scoped Astro pages:

  • The page initially shows the existing paginated list from the API when the user has not typed anything yet.
  • Search is search-as-you-type, with loading placeholders while results are pending.
  • The UI exposes quick status chips.
  • Scoped pages such as /kind/... and /games inject base filters so the same component behaves like "search within this slice".
  • Empty results on a scoped page show a "Search Everything" escape hatch.
  • Query results show highlighted snippets and a detail link.
  • Non-query cards can also show timestamps and auxiliary action buttons such as benchmark/performance links.
  • The current Stork index injects synthetic searchable tokens for status_*, category, and route type, in addition to title/content/description/aliases and tags.
  • Stork also post-filters query results so every typed term must be present somewhere in the returned title/URL/excerpts.

That means parity is not just "can users search", but:

  • can they search globally and within a scoped page
  • can they click status chips
  • can they still get good snippets and stable detail URLs
  • can the initial browse mode remain unchanged
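
The "every typed term must be present" rule from the Stork post-filter could be approximated on the client after a Pagefind result has been loaded. A minimal sketch, assuming each loaded result carries the url, excerpt, and meta.title fields that result.data() returns; the function name and the whitespace tokenization are illustrative and not taken from components/search-stork.vue:

```javascript
// Hypothetical helper: keep a result only if every typed term appears
// somewhere in its title, URL, or excerpt (case-insensitive substring).
function matchesEveryTerm(query, result) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  // Excerpts contain <mark> tags around hits; strip tags before comparing.
  const excerptText = (result.excerpt || '').replace(/<[^>]*>/g, '');
  const haystack = [
    (result.meta && result.meta.title) || '',
    result.url || '',
    excerptText,
  ].join(' ').toLowerCase();
  // An empty query has no terms, so every result passes.
  return terms.every((term) => haystack.includes(term));
}
```

Because Pagefind matches on stemmed words, a plain substring check like this may drop results that Pagefind considers valid hits, so it is worth checking against real queries before enabling it.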

What The Evidence Says

Confirmed from Pagefind docs and repo activity:

  • The Node API supports addCustomRecord() with url, content, language, optional flat meta, optional flat filters, and optional flat sort.
  • The Node API also supports addHTMLFile() for virtual HTML pages and writeFiles() / getFiles() for writing the bundle to /pagefind/.
  • The browser API is intended for custom search interfaces, not just the stock widget.
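  • pagefind.init() can be called on focus to start loading early, pagefind.preload() exists to reduce first-search latency by fetching index chunks ahead of the search, and pagefind.debouncedSearch() drops superseded calls while the user is still typing.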
  • result.data() returns url, excerpt, meta, and related result data. The docs explicitly say excerpt is safe to use as innerHTML, while content and meta are raw.
  • The JS API supports filter-only browsing by calling pagefind.search(null, { filters: ... }).
  • The JS API can also return filter counts via pagefind.filters(), plus remaining-result counts on subsequent searches.
  • Filtering defaults to AND semantics, and compound any / all / none / not logic is available.
  • Sorting can be applied at search time, but records missing a sort value are omitted when that sort is active.
  • Highlighting on destination pages is supported via highlightParam and pagefind-highlight.js.
  • Historical GitHub issues #198 and #277 asked for direct non-HTML input; both are now closed, and the current docs document that capability.
  • The latest stable release is v1.4.0, published on 2025-09-01.
  • Issue #574 about the npx wrapper on ubuntu-latest is still open as of 2026-03-15, so a pinned dependency or direct binary path is safer than a casual CLI swap.
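
The browser-side points above can be combined into one small search helper. This is a sketch under stated assumptions: the pagefind object is injected (for example the module returned by a dynamic import of /pagefind/pagefind.js), and the helper name, the limit parameter, and the returned shape are illustrative rather than anything in the repo:

```javascript
// Hypothetical adapter helper around the documented browser API.
// `pagefind` is passed in so the helper can be exercised with a stub.
async function searchListings(pagefind, term, filters = {}, limit = 20) {
  // A null term plus filters is the documented filter-only browse mode.
  const normalizedTerm = term && term.trim() ? term.trim() : null;
  const search = await pagefind.search(normalizedTerm, { filters });
  // Each result loads its data lazily, so only load the first page.
  const firstPage = search.results.slice(0, limit);
  const loaded = await Promise.all(firstPage.map((result) => result.data()));
  // Keep only the fields the card UI needs; excerpt is the one field
  // the docs describe as safe for innerHTML.
  return loaded.map(({ url, excerpt, meta }) => ({ url, excerpt, meta }));
}
```

For search-as-you-type, pagefind.debouncedSearch(term, options, timeoutMs) could be swapped in for pagefind.search(); it resolves to null when a newer call supersedes it, so the caller would need to ignore null results.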

Community signal:

  • In the main HN discussion for Pagefind's launch, the maintainer explicitly said multi-word query merging is built in.
  • Another HN commenter reported that deploying Pagefind was "pleasingly easy" and the result was "reasonably nippy".
  • Zach Leatherman's pagefind-search component is a concrete GitHub example of treating Pagefind as a customizable UI layer with explicit fallback content and controlled asset loading.

Feature Mapping

| Current user-visible feature | Carry-over path with Pagefind | Confidence | Notes |
| --- | --- | --- | --- |
| Search-as-you-type | pagefind.debouncedSearch() or manual debounce + pagefind.preload() | High | This is native to the JS API. |
| Lazy first-load behavior | pagefind.init() on focus, or rely on first search | High | This matches the current deferred Stork load pattern. |
| Scoped search pages | Keep current initial page data, then call pagefind.search(term-or-null, { filters }) | High | Better fit than the current synthetic token approach. |
| Quick status chips | Map chips to filters.status values | High | Pagefind filters are cleaner than indexing status_native into content. |
| Empty-state "Search Everything" | Clear base filters and rerun, or keep current link to / | High | User-visible behavior is easy to preserve. |
| Highlighted excerpts | Render result.data().excerpt | High | Officially documented and safe as innerHTML. |
| Highlighted title text | No first-class JS API equivalent was found in the docs | Medium | Likely plain title unless we add client-side emphasis ourselves. |
| Detail links | Use result.data().url | High | Direct match. |
| Relative timestamp text | Put timestamp in meta, or join from local listing data | High | meta is string-only, so store ISO strings if using metadata. |
| Benchmark / Performance buttons | Join from local listing data keyed by URL or slug | High | Inference: better than encoding arrays as metadata strings. |
| Status / category / type scoping | Use Pagefind filters, not fake searchable tokens | High | Cleaner and more explicit than the current Stork trick. |
| "Every typed term must match somewhere" behavior | Likely client-side post-filter using returned raw content if needed | Medium | Current Stork behavior is explicit; Pagefind query semantics need a parity check in a prototype. |
| Result ordering that favors app names and aliases | Start with addCustomRecord() content shaping; fall back to addHTMLFile() if needed | Medium | Custom-record metadata appears display-oriented, not ranking-oriented. |
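
The scoped-page and status-chip rows both reduce to building one Pagefind filters object. A minimal sketch, assuming hypothetical filter keys (status, kind) and a hypothetical helper name; the any form is the documented way to get OR behavior within a single filter key:

```javascript
// Hypothetical helper: merge a scoped page's base filters with the
// active status chips into a Pagefind `filters` object.
function buildFilters(baseFilters = {}, activeStatuses = []) {
  const filters = { ...baseFilters };
  if (activeStatuses.length === 1) {
    // Single-select chip: plain value, ANDed with the base filters.
    filters.status = activeStatuses[0];
  } else if (activeStatuses.length > 1) {
    // Multi-select chips: `any` makes the status key an OR.
    filters.status = { any: [...activeStatuses] };
  }
  return filters;
}

// "Search Everything" is then just the same call without base filters:
// buildFilters({}, activeStatuses)
```

Keeping this as a pure function means the chip behavior can be changed from single-select to OR without touching the code that calls pagefind.search().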

Options

1. Custom Vue adapter over addCustomRecord() output

This is the lowest-risk parity path.

Why it works:

  • It matches the repo's existing data-first indexing model.
  • It preserves the current page shell and only swaps the query engine.
  • It uses Pagefind features the way they are documented today: meta for display fields, filters for scoping, sort for explicit sort options.

Tradeoffs:

  • meta is for returned metadata, not clearly for ranking.
  • Complex card state such as searchLinks does not fit naturally into flat string metadata.
  • The docs do not show title-highlight ranges in the JS API, so exact title highlighting may need custom client logic.
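
If plain titles turn out not to be acceptable, the custom client logic mentioned in the last tradeoff could look roughly like this. It is a sketch that assumes simple case-insensitive term matching, which will not always agree with Pagefind's stemmed matching; all function names are illustrative:

```javascript
// Escape text before it is placed inside HTML.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Escape user input before it is placed inside a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wrap each typed term found in the title in <mark>.
function highlightTitle(title, query) {
  const terms = query.trim().split(/\s+/).filter(Boolean).map(escapeRegExp);
  if (terms.length === 0) return escapeHtml(title);
  // Longest terms first so a longer term wins over its own prefix.
  terms.sort((a, b) => b.length - a.length);
  const pattern = new RegExp(`(${terms.join('|')})`, 'gi');
  // split() with one capture group alternates: text, match, text, match...
  return title
    .split(pattern)
    .map((part, i) => (i % 2 === 1 ? `<mark>${escapeHtml(part)}</mark>` : escapeHtml(part)))
    .join('');
}
```

The output is escaped, so it can be rendered with v-html in the same way as the Pagefind excerpt.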

2. Custom Vue adapter over generated virtual HTML via addHTMLFile()

This is the "higher parity if ranking is off" option.

Why it might be worth it:

  • Pagefind documents default weighting for HTML headings.
  • Pagefind documents data-pagefind-weight, data-pagefind-meta, data-pagefind-filter, and data-pagefind-sort.
  • If app-name, alias, and status text need finer relevance tuning than a plain custom record gives, virtual HTML gives more levers.

Tradeoffs:

  • More adapter code.
  • Harder to justify unless a real query corpus shows ranking problems.
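
For a sense of how much adapter code this option implies, here is a sketch of generating one virtual HTML page per listing. The listing fields (name, aliases, status, description), the weight value, and the helper names are assumptions for illustration; only the h1 weighting and the data-pagefind-* attributes come from the Pagefind docs:

```javascript
// Escape text before it is placed inside HTML.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Hypothetical helper: render a listing as a minimal HTML document.
function listingToVirtualHtml(listing) {
  const aliases = (listing.aliases || []).join(', ');
  return [
    '<html lang="en"><body>',
    // The h1 gets Pagefind's default heading weight and becomes the title.
    `<h1 data-pagefind-meta="title">${escapeHtml(listing.name)}</h1>`,
    // Assumed weight: boost aliases above ordinary body text.
    `<p data-pagefind-weight="5">${escapeHtml(aliases)}</p>`,
    // The filter value is taken from the element's text.
    `<p data-pagefind-filter="status">${escapeHtml(listing.status)}</p>`,
    `<p>${escapeHtml(listing.description || '')}</p>`,
    '</body></html>',
  ].join('\n');
}

// Build-time usage with the Node API (not executed here):
// await index.addHTMLFile({ url: listing.url, content: listingToVirtualHtml(listing) });
```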

3. Replace the current component with the stock Pagefind UI

Not recommended.

Why:

  • It discards the current browse-first behavior and scoped-page behavior.
  • It loses the current empty-state copy and action-button treatment.
  • It would make parity depend on overriding Pagefind's UI instead of preserving the repo's existing search component contract.

What Works

  • Keep "no query" mode exactly as it is today and switch to Pagefind only after the user types or toggles a filter.
  • Build one Pagefind record per listing/detail route, using the same sitemap payloads already feeding the Stork pipeline.
  • Put searchable text in content, starting with the fields users most expect to match: app name, aliases, support text, description, tags, and any status phrasing users already see.
  • Put render-only scalar fields in meta, such as title, slug, status label, last-updated ISO timestamp, and short display text.
  • Use filters for status, category, kind, and other scoped-page constraints.
  • Build a parallel local result-decoration map keyed by URL or slug so Pagefind results can be decorated with the same searchLinks, timestamps, or other card chrome without turning the index into a transport format for the whole listing object.
  • Call pagefind.filters() once if you want the chip row to expose counts or disabled states.
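
The content / meta / filters split described above can be sketched as a single mapping function. The listing field names below are hypothetical stand-ins for whatever the sitemap payloads actually contain; the record shape (url, content, language, flat string meta, filters as string arrays) follows the documented addCustomRecord() input:

```javascript
// Drop entries whose value is missing so meta and filters stay flat strings.
function compact(fields) {
  return Object.entries(fields).filter(([, value]) => typeof value === 'string' && value !== '');
}

// Hypothetical mapper from a listing object to an addCustomRecord() input.
function listingToRecord(listing) {
  // Searchable text lives in content, most important fields first.
  const content = [
    listing.name,
    ...(listing.aliases || []),
    listing.statusText,
    listing.description,
    ...(listing.tags || []),
  ].filter(Boolean).join('\n');

  return {
    url: listing.url,
    content,
    language: 'en',
    // Render-only scalars; lastUpdated is assumed to already be an ISO string.
    meta: Object.fromEntries(compact({
      title: listing.name,
      slug: listing.slug,
      lastUpdated: listing.lastUpdated,
    })),
    // Scoping fields; Pagefind expects an array of strings per key.
    filters: Object.fromEntries(compact({
      status: listing.status,
      category: listing.category,
      kind: listing.kind,
    }).map(([key, value]) => [key, [value]])),
  };
}

// Build-time usage with the Node API (not executed here):
// const { index } = await pagefind.createIndex();
// for (const listing of listings) await index.addCustomRecord(listingToRecord(listing));
// await index.writeFiles({ outputPath: 'dist/pagefind' });
```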

Inference: For doesitarm, this "Pagefind for retrieval + local map for decoration" split is probably the cleanest way to preserve the current UI without bloating Pagefind metadata or weakening search semantics.
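
A sketch of that split, assuming the local listing data exposes url, searchLinks, and lastUpdated fields and that Pagefind result URLs differ from listing URLs at most by a trailing slash, query string, or hash (both assumptions would need checking in the prototype):

```javascript
// Normalize a URL path so "/app/x/" and "/app/x" map to the same key.
function normalizePath(url) {
  const path = url.split(/[?#]/)[0];
  return path.length > 1 ? path.replace(/\/+$/, '') : path;
}

// Hypothetical: build the local decoration map once from listing data.
function buildDecorationMap(listings) {
  const byUrl = new Map();
  for (const listing of listings) {
    byUrl.set(normalizePath(listing.url), {
      searchLinks: listing.searchLinks || [],
      lastUpdated: listing.lastUpdated || null,
    });
  }
  return byUrl;
}

// Hypothetical: attach card-only fields to a loaded Pagefind result.
function decorateResult(result, byUrl) {
  const extra = byUrl.get(normalizePath(result.url)) || { searchLinks: [], lastUpdated: null };
  return { ...result, ...extra };
}
```

Results with no matching listing still render; they simply come back without action buttons or a timestamp.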

What To Avoid

  • Do not replace the entire search component with the stock Pagefind UI if the goal is parity.
  • Do not assume meta alone is enough for search quality. Metadata is clearly documented as returned result data; searchable content still needs to live in content.
  • Do not try to stuff arrays or nested structures like searchLinks into flat Pagefind metadata if the same information already exists in local page data.
  • Do not apply Pagefind sort options to sparse fields unless every record has a value, because missing sort keys are omitted from sorted results.
  • Do not assume the npx pagefind wrapper is production-safe on Ubuntu CI without pinning and testing.
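
The sparse-sort warning is easy to enforce at build time. A small sketch, assuming records shaped like addCustomRecord() inputs with an optional flat sort object; the helper name is illustrative:

```javascript
// Hypothetical build-time guard: list the URLs of records that would be
// omitted from results when the given sort key is active.
function findRecordsMissingSort(records, sortKey) {
  return records
    .filter((record) => {
      const value = record.sort ? record.sort[sortKey] : undefined;
      return typeof value !== 'string' || value === '';
    })
    .map((record) => record.url);
}
```

Failing the index build when this returns a non-empty list would keep a sort option from silently hiding listings.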

Recommendation

Recommended implementation order:

  1. Keep the current Astro pages and initial list rendering exactly as-is.
  2. Build a Pagefind prototype with addCustomRecord() from the existing sitemap payloads.
  3. Map the current scoped-page baseFilters to real Pagefind filters.
  4. Add a thin Pagefind adapter inside the current Vue component rather than replacing the component.
  5. Use a local listingByUrl or listingBySlug map to reattach rich card UI fields.
  6. Compare a real query set against Stork, especially app-name, alias, and multi-term searches.
  7. Only if ranking quality is weaker than expected, move the prototype from addCustomRecord() to generated addHTMLFile() records with weighted markup.

Why this is the best default:

  • It preserves the current user-visible experience in the cheapest way.
  • It uses Pagefind features where Pagefind is strongest: retrieval, snippets, filtering, counts, and static bundle delivery.
  • It avoids forcing Pagefind to become the canonical source of every UI field.
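
One piece the thin adapter would likely need is a load-once wrapper, so that Pagefind is imported and initialized on first focus and reused afterwards. A sketch, assuming the loader is a function that resolves to the Pagefind module; createLazyLoader is a made-up name:

```javascript
// Hypothetical helper: load and init Pagefind at most once, on demand.
function createLazyLoader(load) {
  let pending = null;
  return function ensureLoaded() {
    if (pending === null) {
      pending = Promise.resolve()
        .then(load)
        .then(async (pagefind) => {
          await pagefind.init();
          return pagefind;
        })
        .catch((error) => {
          // Forget the failed attempt so the next call can retry.
          pending = null;
          throw error;
        });
    }
    return pending;
  };
}

// Usage inside the component (not executed here):
// const ensurePagefind = createLazyLoader(() => import('/pagefind/pagefind.js'));
// input.addEventListener('focus', () => { ensurePagefind().catch(() => {}); }, { once: true });
```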

Missing Information

  • I did not find a documented JS API for title highlight ranges equivalent to the Stork title-range behavior.
  • I did not find clear documentation on exact multi-term query semantics beyond Pagefind supporting multi-word queries in practice.
  • I did not find a high-signal Stack Overflow thread that added more than the official docs for this migration.
  • The Lobsters URL surfaced during search no longer resolved, so I did not use it as evidence.

Next Steps

  1. Build a small Pagefind prototype against 100-200 representative listings.
  2. Test 25-50 real queries from the current site vocabulary: app names, aliases, status words, category words, and mixed multi-term queries.
  3. Decide whether status chips should stay effectively single-select, matching current behavior, or become explicit OR filters within the same filter key.
  4. Verify whether plain title rendering is acceptable, or whether custom client-side title emphasis is needed.
  5. Measure first-search latency on mobile before removing Stork.
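
For the query comparison step, a simple top-k overlap score is one way to turn "ranking is weaker than expected" into a number. This is a sketch with an assumed metric and made-up names, not a method the repo already uses; it takes the ordered result URLs from each engine for the same query:

```javascript
// Hypothetical metric: share of top-k URLs that both engines return,
// plus whether they agree on the first result.
function compareTopK(storkUrls, pagefindUrls, k = 5) {
  const storkTop = storkUrls.slice(0, k);
  const pagefindTop = pagefindUrls.slice(0, k);
  const pagefindSet = new Set(pagefindTop);
  const shared = storkTop.filter((url) => pagefindSet.has(url)).length;
  const denominator = Math.max(storkTop.length, pagefindTop.length);
  return {
    // Two empty result lists count as full agreement.
    overlap: denominator === 0 ? 1 : shared / denominator,
    sameFirst: storkTop.length > 0 && storkTop[0] === pagefindTop[0],
  };
}
```

Running this over the 25-50 test queries and sorting by lowest overlap would surface the queries most worth inspecting by hand.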