mirror of https://github.com/ThatGuySam/doesitarm.git synced 2026-05-18 06:44:46 -07:00

ThatGuySam e5f28b16ee docs(research): add pagefind feature parity memo

Capture user-visible parity requirements for a future Pagefind migration.

This keeps the earlier viability memo focused on engine fit and documents the recommended adapter approach, carry-over patterns, and remaining prototype risks around ranking and title highlighting.

2026-03-15 13:03:57 -05:00


Pagefind Feature Parity For doesitarm

Date: 2026-03-15

Scope

Read alongside docs/research/pagefind-viability-2026-03-15.md.

Investigate how a Pagefind migration could preserve the current Stork-backed search UX in doesitarm, focusing on user-visible behavior rather than on whether Pagefind is viable in the abstract.

Short Answer

Yes, most of the current search experience can be carried over without the user feeling a major regression, but only if Pagefind is treated as the search core under a custom Vue adapter.

Recommended parity path:

Keep the current server-rendered initial lists, pagination links, summary block, and page chrome exactly as they are.
Replace only the "user has started searching/filtering" path with Pagefind's JavaScript API.
Build the Pagefind index from the existing sitemap/listing data, not from an HTML crawl.
Use Pagefind filters for status/category/type scoping.
Use Pagefind meta only for simple scalar fields needed in result rendering.
Reattach richer card UI state such as searchLinks from a local URL-or-slug keyed map instead of trying to force arrays into Pagefind metadata.

The one place where a prototype may still change the implementation choice is search quality. If addCustomRecord() does not rank app-name and alias matches well enough, the next-best parity option is to generate virtual HTML records via addHTMLFile() so Pagefind can use h1 weighting and data-pagefind-* attributes.

Current UX Contract In The Repo

From components/search-stork.vue, helpers/stork/toml.js, and the scoped Astro pages:

The page initially shows the existing paginated list from the API when the user has not typed anything yet.
Search runs as the user types, with loading placeholders while results are pending.
The UI exposes quick status chips.
Scoped pages such as /kind/... and /games inject base filters so the same component behaves like "search within this slice".
Empty results on a scoped page show a "Search Everything" escape hatch.
Query results show highlighted snippets and a detail link.
Non-query cards can also show timestamps and auxiliary action buttons such as benchmark/performance links.
The current Stork index injects synthetic searchable tokens for status_*, category, and route type, in addition to title/content/description/aliases and tags.
Stork also post-filters query results so every typed term must be present somewhere in the returned title/URL/excerpts.
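
The every-term rule in the last item can be sketched as a small predicate. This is an illustrative reconstruction, not the code in components/search-stork.vue; the function name `matchesEveryTerm` and the plain case-insensitive substring match are assumptions.

```javascript
// Hypothetical helper: returns true when every whitespace-separated term in
// `query` appears (case-insensitively) somewhere in the given text fields.
function matchesEveryTerm(query, fields) {
  const haystack = fields
    .filter((field) => typeof field === "string")
    .join(" ")
    .toLowerCase();

  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);

  // An empty query has no terms, so it matches everything.
  return terms.every((term) => haystack.includes(term));
}

// Example input: title, URL, and excerpt text of one returned result.
const exampleFields = ["Adobe Photoshop", "/app/photoshop/", "Runs natively on Apple Silicon"];
console.log(matchesEveryTerm("photoshop native", exampleFields)); // true
console.log(matchesEveryTerm("photoshop rosetta", exampleFields)); // false
```

A Pagefind adapter could apply the same predicate to the title, URL, and raw content of each result if the prototype shows looser matching than Stork.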

That means parity is not just "can users search", but also:

Can they search globally and within a scoped page?
Can they click status chips?
Can they still get good snippets and stable detail URLs?
Can the initial browse mode remain unchanged?

What The Evidence Says

Confirmed from Pagefind docs and repo activity:

The Node API supports addCustomRecord() with url, content, language, optional flat meta, optional flat filters, and optional flat sort.
The Node API also supports addHTMLFile() for virtual HTML pages, writeFiles() for writing the bundle to disk (for example to /pagefind/), and getFiles() for retrieving the same files in memory.
The browser API is intended for custom search interfaces, not just the stock widget.
pagefind.init() can be called on focus, and pagefind.preload() / pagefind.debouncedSearch() exist specifically to reduce first-search latency.
result.data() returns url, excerpt, meta, content, and related result data such as sub_results. The docs explicitly say excerpt is safe to use as innerHTML, while content and meta are raw.
The JS API supports filter-only browsing by calling pagefind.search(null, { filters: ... }).
The JS API can also return filter counts via pagefind.filters(), plus remaining-result counts on subsequent searches.
Filtering defaults to AND semantics, and compound any / all / none / not logic is available.
Sorting can be applied at search time, but records missing a sort value are omitted when that sort is active.
Highlighting on destination pages is supported via highlightParam and pagefind-highlight.js.
Historical GitHub issues #198 and #277 asked for direct non-HTML input; both are now closed, and the current docs document that capability.
The latest stable release is v1.4.0, published on 2025-09-01.
Issue #574 about the npx wrapper on ubuntu-latest is still open as of 2026-03-15, so a pinned dependency or direct binary path is safer than a casual CLI swap.
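
A minimal sketch of how the browser-API points above could fit together in an adapter. The function name `runSearch`, the returned object shape, and the page size of 10 are assumptions. The `pagefind` argument stands for the module loaded from the generated bundle; it is passed in so the flow can be exercised with a stub.

```javascript
// `pagefind` is assumed to be the module from the generated bundle,
// for example `await import("/pagefind/pagefind.js")`.
async function runSearch(pagefind, term, filters = {}) {
  // An empty query becomes null, which asks Pagefind for filter-only results.
  const query = term && term.trim() ? term.trim() : null;

  const search = query === null
    ? await pagefind.search(null, { filters })
    : await pagefind.debouncedSearch(query, { filters });

  // debouncedSearch() resolves to null when a newer call superseded this one.
  if (search === null) return null;

  // Load data for the first page only; each data() call fetches a fragment.
  const firstPage = await Promise.all(
    search.results.slice(0, 10).map((result) => result.data())
  );

  return firstPage.map((data) => ({
    url: data.url,
    title: data.meta.title, // raw text: escape before rendering
    excerptHtml: data.excerpt, // documented as safe for innerHTML
  }));
}
```

A `null` return means "ignore this response", so the Vue component can keep showing its loading placeholders until the newest call resolves.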

Community signal:

In the main HN discussion for Pagefind's launch, the maintainer explicitly said multi-word query merging is built in.
Another HN commenter reported that deploying Pagefind was "pleasingly easy" and the result was "reasonably nippy".
Zach Leatherman's pagefind-search component is a concrete GitHub example of treating Pagefind as a customizable UI layer with explicit fallback content and controlled asset loading.

Feature Mapping

| Current user-visible feature | Carry-over path with Pagefind | Confidence | Notes |
| --- | --- | --- | --- |
| Search-as-you-type | `pagefind.debouncedSearch()` or manual debounce + `pagefind.preload()` | High | This is native to the JS API. |
| Lazy first-load behavior | `pagefind.init()` on focus, or rely on first search | High | This matches the current deferred Stork load pattern. |
| Scoped search pages | Keep current initial page data, then call `pagefind.search(term-or-null, { filters })` | High | Better fit than the current synthetic token approach. |
| Quick status chips | Map chips to `filters.status` values | High | Pagefind filters are cleaner than indexing `status_native` into content. |
| Empty-state "Search Everything" | Clear base filters and rerun, or keep current link to `/` | High | User-visible behavior is easy to preserve. |
| Highlighted excerpts | Render `result.data().excerpt` | High | Officially documented and safe as `innerHTML`. |
| Highlighted title text | No first-class JS API equivalent was found in the docs | Medium | Likely plain title unless we add client-side emphasis ourselves. |
| Detail links | Use `result.data().url` | High | Direct match. |
| Relative timestamp text | Put timestamp in `meta`, or join from local listing data | High | `meta` is string-only, so store ISO strings if using metadata. |
| Benchmark / Performance buttons | Join from local listing data keyed by URL or slug | High | Inference: better than encoding arrays as metadata strings. |
| Status / category / type scoping | Use Pagefind `filters`, not fake searchable tokens | High | Cleaner and more explicit than the current Stork trick. |
| "Every typed term must match somewhere" behavior | Likely client-side post-filter using returned raw `content` if needed | Medium | Current Stork behavior is explicit; Pagefind query semantics need a parity check in a prototype. |
| Result ordering that favors app names and aliases | Start with `addCustomRecord()` content shaping; fall back to `addHTMLFile()` if needed | Medium | Custom-record metadata appears display-oriented, not ranking-oriented. |
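
The scoped-page and status-chip rows above both reduce to building one filters object. A minimal sketch follows; `buildSearchArgs`, the `baseFilters` shape, and the filter keys `kind` and `status` are illustrative assumptions, not names taken from the repo.

```javascript
// Hypothetical inputs: `baseFilters` is what a scoped page injects
// (for example { kind: "games" }); `activeStatus` is the selected chip or null.
function buildSearchArgs({ query = "", baseFilters = {}, activeStatus = null }) {
  const filters = { ...baseFilters };
  if (activeStatus) filters.status = activeStatus;

  const trimmed = query.trim();
  const term = trimmed === "" ? null : trimmed;

  // Base filters alone do not trigger a search: until the user types or
  // picks a chip, the server-rendered browse list stays in place.
  const shouldSearch = term !== null || activeStatus !== null;

  return { shouldSearch, term, options: { filters } };
}

console.log(buildSearchArgs({ baseFilters: { kind: "games" }, activeStatus: "native" }));
```

When `shouldSearch` is true, `term` and `options` can be passed straight to `pagefind.search(term, options)`; a `null` term yields filter-only browsing.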

Options

1. Custom Vue adapter over `addCustomRecord()` output

This is the lowest-risk parity path.

Why it works:

It matches the repo's existing data-first indexing model.
It preserves the current page shell and only swaps the query engine.
It uses Pagefind features the way they are documented today: meta for display fields, filters for scoping, sort for explicit sort options.

Tradeoffs:

meta is for returned metadata, not clearly for ranking.
Complex card state such as searchLinks does not fit naturally into flat string metadata.
The docs do not show title-highlight ranges in the JS API, so exact title highlighting may need custom client logic.
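
If plain titles turn out not to be acceptable, the custom client logic mentioned in the last tradeoff could be as small as the sketch below. It uses literal, case-insensitive term matching, so it will not mirror whatever stemming Pagefind applies to the query. All function names are hypothetical.

```javascript
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wraps case-insensitive occurrences of each query term in <mark>,
// escaping all title text because Pagefind meta values are raw.
function highlightTitle(title, query) {
  const terms = query.split(/\s+/).filter(Boolean).map(escapeRegExp);
  if (terms.length === 0) return escapeHtml(title);

  const pattern = new RegExp(`(${terms.join("|")})`, "gi");

  // split() with a capturing group puts the matched text at odd indexes.
  return title
    .split(pattern)
    .map((part, i) => (i % 2 === 1 ? `<mark>${escapeHtml(part)}</mark>` : escapeHtml(part)))
    .join("");
}

console.log(highlightTitle("Final Cut Pro <beta>", "cut pro"));
// → Final <mark>Cut</mark> <mark>Pro</mark> &lt;beta&gt;
```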

2. Custom Vue adapter over generated virtual HTML via `addHTMLFile()`

This is the "higher parity if ranking is off" option.

Why it might be worth it:

Pagefind documents default weighting for HTML headings.
Pagefind documents data-pagefind-weight, data-pagefind-meta, data-pagefind-filter, and data-pagefind-sort.
If app-name, alias, and status text need finer relevance tuning than a plain custom record gives, virtual HTML gives more levers.

Tradeoffs:

More adapter code.
Harder to justify unless a real query corpus shows ranking problems.
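
A sketch of what generating one virtual HTML record might look like. The listing shape (`url`, `name`, `aliases`, `status`, `description`) and the weight value of 5 are assumptions for illustration; `index` is assumed to come from the Node API's `createIndex()`.

```javascript
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical listing shape: { url, name, aliases, status, description }.
function toVirtualHtml(listing) {
  const aliases = (listing.aliases || []).map(escapeHtml).join(", ");
  return [
    "<html lang='en'><body>",
    // The h1 picks up Pagefind's default heading weighting.
    `<h1 data-pagefind-meta="title">${escapeHtml(listing.name)}</h1>`,
    // Illustrative weight: aliases rank above body text.
    `<p data-pagefind-weight="5">${aliases}</p>`,
    `<p data-pagefind-filter="status">${escapeHtml(listing.status)}</p>`,
    `<p>${escapeHtml(listing.description)}</p>`,
    "</body></html>",
  ].join("\n");
}

// Adds one listing to an existing Pagefind index as a virtual HTML page.
async function addVirtualPage(index, listing) {
  return index.addHTMLFile({ url: listing.url, content: toVirtualHtml(listing) });
}
```

Keeping `toVirtualHtml()` pure makes the markup easy to snapshot-test before any ranking comparison is run.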

3. Replace the current component with the stock Pagefind UI

Not recommended.

Why:

It discards the current browse-first behavior and scoped-page behavior.
It loses the current empty-state copy and action-button treatment.
It would make parity depend on overriding Pagefind's UI instead of preserving the repo's existing search component contract.

What Works

Keep "no query" mode exactly as it is today and switch to Pagefind only after the user types or toggles a filter.
Build one Pagefind record per listing/detail route, using the same sitemap payloads already feeding the Stork pipeline.
Put searchable text in content, starting with the fields users most expect to match: app name, aliases, support text, description, tags, and any status phrasing users already see.
Put render-only scalar fields in meta, such as title, slug, status label, last-updated ISO timestamp, and short display text.
Use filters for status, category, kind, and other scoped-page constraints.
Build a parallel local result-decoration map keyed by URL or slug so Pagefind results can be decorated with the same searchLinks, timestamps, or other card chrome without turning the index into a transport format for the whole listing object.
Call pagefind.filters() once if you want the chip row to expose counts or disabled states.
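
The record-building items above could be sketched as follows. The listing field names (`name`, `aliases`, `tags`, `statusLabel`, `lastUpdated`, and so on) are hypothetical stand-ins for whatever the sitemap payloads provide. The `pagefind` argument is the Node package, passed in so the function can be tested without it.

```javascript
// Hypothetical listing shape; real field names come from the sitemap payloads.
function toPagefindRecord(listing) {
  return {
    url: listing.url,
    language: "en",
    // Everything that should be searchable lives in content.
    content: [
      listing.name,
      ...(listing.aliases || []),
      listing.description,
      ...(listing.tags || []),
    ].filter(Boolean).join(" "),
    // meta values are plain strings, used only for rendering.
    meta: {
      title: listing.name,
      status: listing.statusLabel,
      lastUpdated: new Date(listing.lastUpdated).toISOString(),
    },
    // filter values are arrays of strings.
    filters: {
      status: [listing.status],
      category: [listing.category],
    },
  };
}

// `pagefind` is assumed to be the imported "pagefind" Node module.
async function buildIndex(pagefind, listings, outputPath) {
  const { index } = await pagefind.createIndex();
  for (const listing of listings) {
    await index.addCustomRecord(toPagefindRecord(listing));
  }
  await index.writeFiles({ outputPath });
}
```

Listing the app name and aliases first in `content` is only a guess at helping relevance; whether it matters is one of the things the prototype would need to measure.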

Inference: For doesitarm, this "Pagefind for retrieval + local map for decoration" split is probably the cleanest way to preserve the current UI without bloating Pagefind metadata or weakening search semantics.
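
A minimal sketch of that split, assuming results have already been loaded with `result.data()` and that each local listing carries `url`, `searchLinks`, and `lastUpdated` (hypothetical field names). The trailing-slash normalisation is likewise an assumption about how URLs might differ between the two sources.

```javascript
// Normalises URLs so "/app/foo" and "/app/foo/" resolve to the same key.
function normalizeUrl(url) {
  return url.replace(/\/+$/, "") || "/";
}

function buildListingMap(listings) {
  return new Map(listings.map((listing) => [normalizeUrl(listing.url), listing]));
}

// `results` are already-loaded result.data() objects from Pagefind.
function decorateResults(results, listingByUrl) {
  return results.map((data) => {
    const listing = listingByUrl.get(normalizeUrl(data.url));
    return {
      url: data.url,
      title: data.meta.title,
      excerptHtml: data.excerpt,
      // Fall back to empty card chrome when the map has no entry.
      searchLinks: listing ? listing.searchLinks : [],
      lastUpdated: listing ? listing.lastUpdated : null,
    };
  });
}
```

The fallback branch keeps a card renderable even if the index and the local map drift apart between builds.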

What To Avoid

Do not replace the entire search component with the stock Pagefind UI if the goal is parity.
Do not assume meta alone is enough for search quality. Metadata is clearly documented as returned result data; searchable content still needs to live in content.
Do not try to stuff arrays or nested structures like searchLinks into flat Pagefind metadata if the same information already exists in local page data.
Do not apply Pagefind sort options to sparse fields unless every record has a value, because missing sort keys are omitted from sorted results.
Do not assume the npx pagefind wrapper is production-safe on Ubuntu CI without pinning and testing.
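
The sparse-sort warning above can be checked at build time before a sort option is exposed in the UI. A small sketch, assuming records shaped like the objects passed to `addCustomRecord()`; the helper name is hypothetical.

```javascript
// Returns the URLs of records that lack a non-empty value for `sortKey`.
function findRecordsMissingSort(records, sortKey) {
  return records
    .filter((record) => {
      const value = record.sort ? record.sort[sortKey] : undefined;
      return value === undefined || value === null || value === "";
    })
    .map((record) => record.url);
}

const sampleRecords = [
  { url: "/app/a/", sort: { updated: "2026-03-01" } },
  { url: "/app/b/", sort: {} },
  { url: "/app/c/" },
];
console.log(findRecordsMissingSort(sampleRecords, "updated")); // [ '/app/b/', '/app/c/' ]
```

A non-empty result means enabling that sort would silently drop those listings from sorted results.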

Recommendation

Recommended implementation order:

Keep the current Astro pages and initial list rendering exactly as-is.
Build a Pagefind prototype with addCustomRecord() from the existing sitemap payloads.
Map the current scoped-page baseFilters to real Pagefind filters.
Add a thin Pagefind adapter inside the current Vue component rather than replacing the component.
Use a local listingByUrl or listingBySlug map to reattach rich card UI fields.
Compare a real query set against Stork, especially app-name, alias, and multi-term searches.
Only if ranking quality is weaker than expected, move the prototype from addCustomRecord() to generated addHTMLFile() records with weighted markup.
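
For the Stork comparison step, one simple way to put a number on "weaker than expected" is top-k overlap between the two engines' ranked URLs. This metric is a suggestion, not something the memo's sources prescribe, and the function names are hypothetical.

```javascript
// Fraction of the first k baseline URLs that also appear in the
// first k candidate URLs (order within the top k is ignored).
function overlapAtK(baselineUrls, candidateUrls, k) {
  const baseline = baselineUrls.slice(0, k);
  if (baseline.length === 0) return 1;
  const candidate = new Set(candidateUrls.slice(0, k));
  const shared = baseline.filter((url) => candidate.has(url)).length;
  return shared / baseline.length;
}

// True when both engines agree on the first result.
function topResultMatches(baselineUrls, candidateUrls) {
  return baselineUrls.length > 0 && baselineUrls[0] === candidateUrls[0];
}

const storkUrls = ["/app/zoom/", "/app/zoom-rooms/", "/app/skype/"];
const pagefindUrls = ["/app/zoom/", "/app/skype/", "/app/teams/"];
console.log(overlapAtK(storkUrls, pagefindUrls, 2)); // 0.5
console.log(topResultMatches(storkUrls, pagefindUrls)); // true
```

Top-result agreement is probably the more important signal for app-name and alias queries, where users expect the exact app first.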

Why this is the best default:

It preserves the current user-visible experience in the cheapest way.
It uses Pagefind features where Pagefind is strongest: retrieval, snippets, filtering, counts, and static bundle delivery.
It avoids forcing Pagefind to become the canonical source of every UI field.

Missing Information

I did not find a documented JS API for title highlight ranges equivalent to the Stork title-range behavior.
I did not find clear documentation on exact multi-term query semantics beyond Pagefind supporting multi-word queries in practice.
I did not find a high-signal Stack Overflow thread that added more than the official docs for this migration.
The Lobsters URL surfaced during search no longer resolved, so I did not use it as evidence.

Recommended Next Inspection Steps

Build a small Pagefind prototype against 100-200 representative listings.
Test 25-50 real queries from the current site vocabulary: app names, aliases, status words, category words, and mixed multi-term queries.
Decide whether status chips should stay effectively single-select, matching current behavior, or become explicit OR filters within the same filter key.
Verify whether plain title rendering is acceptable, or whether custom client-side title emphasis is needed.
Measure first-search latency on mobile before removing Stork.
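
For the status-chip decision above, the two behaviors differ only in the shape of the filter value. The sketch below assumes a filter key named `status` and uses the compound `any` form as described in the JS API filtering docs; it is worth confirming against the pinned Pagefind version.

```javascript
// Single-select sends one value; multi-select asks for an explicit OR
// within the same filter key via the compound `any` form.
function statusFilter(selectedStatuses) {
  if (selectedStatuses.length === 0) return {};
  if (selectedStatuses.length === 1) return { status: selectedStatuses[0] };
  return { status: { any: selectedStatuses } };
}

console.log(JSON.stringify(statusFilter(["native"])));
// {"status":"native"}
console.log(JSON.stringify(statusFilter(["native", "rosetta"])));
// {"status":{"any":["native","rosetta"]}}
```

Passing a plain array instead of `{ any: [...] }` would fall back to the default AND semantics, which for a single-valued status field would match nothing.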

Source Links

Pagefind Node API docs: https://pagefind.app/docs/node-api/
Pagefind browser API docs: https://pagefind.app/docs/api/
Pagefind filtering docs: https://pagefind.app/docs/filtering/
Pagefind JS API filtering docs: https://pagefind.app/docs/js-api-filtering/
Pagefind sorting docs: https://pagefind.app/docs/sorts/
Pagefind JS API sorting docs: https://pagefind.app/docs/js-api-sorting/
Pagefind metadata docs: https://pagefind.app/docs/metadata/
Pagefind JS API metadata docs: https://pagefind.app/docs/js-api-metadata/
Pagefind weighting docs: https://pagefind.app/docs/weighting/
Pagefind ranking docs: https://pagefind.app/docs/ranking/
Pagefind highlighting docs: https://pagefind.app/docs/highlighting/
Pagefind sub-results docs: https://pagefind.app/docs/sub-results/
Pagefind latest release v1.4.0 (published 2025-09-01): https://github.com/Pagefind/pagefind/releases/tag/v1.4.0
Pagefind issue #198 ("Manually defining content, without passing HTML"): https://github.com/Pagefind/pagefind/issues/198
Pagefind issue #277 ("Can pagefind pull its data from a json index file?"): https://github.com/Pagefind/pagefind/issues/277
Pagefind issue #574 (ubuntu-latest / npx wrapper failure, still open on 2026-03-15): https://github.com/Pagefind/pagefind/issues/574
HN discussion for Pagefind launch: https://news.ycombinator.com/item?id=32290634
HN item API for the same discussion: https://hn.algolia.com/api/v1/items/32290634
Zach Leatherman's pagefind-search web component: https://github.com/zachleat/pagefind-search
