mirror of https://github.com/ThatGuySam/doesitarm.git synced 2026-05-18 06:44:46 -07:00

ThatGuySam 9cb77a8ab4 docs(plan): de-risk discoverability work with an ease-first rollout

This adds a repo-local plan for the public discoverability and dataset work so
future implementation can proceed in small, reviewable stages. The rollout is
ordered by ease of implementation and each stage begins with a research pass to
re-check current crawler, schema, and search guidance before code changes land.

Constraint: User requested an ease-first rollout with research checkpoints at every stage
Rejected: Leverage-first ordering | larger evidence-page rewrite would delay smaller foundational wins
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Re-check external docs at the start of each stage before implementing because crawler and structured-data guidance can shift
Tested: git diff --cached --check; manual review of staged plan content
Not-tested: pnpm build; pnpm test (docs-only plan change)

2026-04-04 16:57:25 -05:00

16 KiB

Raw Blame History

Original Prompt

Audit this app for recommend changes from this research ranked by highest leverage. Score each for ease of implementation.

Okay, let's put all of these into a plan. Let's put the easiest to implement towards the top, but also let's include in the plan at every step to research what we're about to do and look for anything helpful, anything that would be helpful for the implementation.

Goal

Turn doesitarm into a stronger public compatibility knowledge product by improving its discoverability, machine-readable surfaces, and evidence quality, while keeping the rollout ordered by easiest implementation first and requiring a fresh research pass before each implementation stage.

Non-Goals

Rebuild the entire site or search experience in the first pass.
Move the full discovery/ingestion pipeline into D1 as part of this plan.
Publish raw crawls, downloaded binaries, or private operational data.
Treat llms.txt or any single AI-specific surface as the primary strategy.
Optimize for every possible search feature before the canonical app pages are stronger for humans.

Repo Findings

App, formula, and game detail pages all flow through the same basic listing template: src/pages/app/[...appPath].astro, src/pages/formula/[...formulaPath].astro, src/pages/game/[...gamePath].astro, src/components/default-listing.astro.
Those pages are still closer to status/listing pages than evidence pages. The main body currently emphasizes the status line, related links, device support, videos, bundles, and a generic last-updated footer, but not a clear "how we know" or methodology layer.
The head system is centralized in helpers/config-node.js, but it does not currently emit canonical links and uses generic descriptions for listing pages.
The listing head model is centralized in helpers/listing-page.js, but the current title/description logic is generic and partially hardcoded.
Structured data support exists, but it is effectively video-only: helpers/listing-page.js builds JSON-LD for videos and benchmark pages, and helpers/config-node.js only emits JSON-LD for tv and benchmarks routes.
The app pages already have a useful public JSON surface under static/api/, but there is no explicit dataset landing page or formal public snapshot surface documented for humans and machines.
The build already generates sitemaps and a search index: build-lists.js.
robots.txt is currently a blanket allow with only a sitemap declaration: static/robots.txt.
I did not find a checked-in llms.txt, llms-full.txt, or public plain-text / Markdown export surface.

External Research

Use docs/research/desktop-app-compatibility-data-strategy-2026-04-04.md as the strategy anchor for this plan.
Google’s current guidance says AI search eligibility still largely depends on normal Search fundamentals, which means the public app pages and metadata still matter more than any AI-only affordance.
Google’s helpful-content, spam-policy, and reviews guidance implies that thin template pages are risky, while evidence-backed, comparison-friendly, and user-helpful pages are more durable.
OpenAI and Anthropic now expose separate search, training, and user-triggered crawlers, which makes bot-specific crawl policy a practical implementation surface.
llms.txt and AI-friendly Markdown exports appear useful as secondary surfaces, but not as the primary lever.
Dataset markup is useful only when the site actually publishes a real dataset landing page and download surface.

Decision

Execute this work in ease-first order, not leverage-first order.

Rationale:

The highest-leverage item is the evidence-page upgrade, but it is not the easiest thing to implement cleanly.
A sequence of smaller early wins can improve crawl quality, metadata, and machine-readable surfaces before the larger content-model work lands.
Each stage should begin with a small research/inspection pass so the implementation reflects current docs, current schema requirements, and any repo changes that happened after this plan was written.

Rollout Plan

1. Strengthen head metadata and canonical URL support

Research and confirm: review current Astro/head implementation, Google guidance for canonical URLs, and whether any existing route types need exceptions before editing the head system.
Implement: add canonical links, ensure og:url / social URL values use the canonical page URL, and upgrade page title/description generation so app pages include more useful fact signals than the current generic copy.
Scope: helpers/config-node.js, helpers/listing-page.js, and the page entrypoints that feed headOptions.
Stop condition: the site emits canonical links and noticeably better per-page metadata without changing page structure yet.

2. Add a public methodology layer and improve source labeling on app pages

Research and confirm: review Google’s helpful-content and review-style guidance again, then inspect the current app page layout to identify the smallest meaningful “how we know” addition that improves user trust without requiring a full template rewrite.
Implement: add a methodology page or methodology section, link to it from app pages, and stop hiding source context behind generic labels such as View.
Implement: expose clearer source/evidence language on app pages, especially around: official source, verification source, last verified, and update/report flows.
Scope: src/components/default-listing.astro, src/components/listing-parts/related-links.astro, src/components/listing-parts/last-updated.astro, plus a new methodology page in src/pages/.
Stop condition: every app page has a clearer “how we know” path even before the full evidence-page upgrade lands.

3. Add a bot-policy matrix and update crawl controls

Research and confirm: re-check current OpenAI and Anthropic crawler docs plus any search-engine bot requirements immediately before changing crawler policy.
Implement: define the intended policy for: Googlebot, OAI-SearchBot, GPTBot, ChatGPT-User, Claude-SearchBot, ClaudeBot, and Claude-User.
Implement: update robots.txt accordingly and add a short repo-local note documenting the reasoning so the bot policy is not rediscovered from scratch later.
Scope: static/robots.txt and a small supporting doc under docs/ or docs/plans/.
Stop condition: the site has explicit AI/search crawler policy rather than a wildcard default.

4. Add `llms.txt` and a lightweight plain-text export surface

Research and confirm: review the latest llms.txt / llms-full.txt conventions and current examples before implementing anything. Re-check whether there are now better conventions for docs maps, plain-text exports, or per-page Markdown.
Implement: add a minimal llms.txt that points to the most useful public surfaces: homepage, categories, methodology, dataset/download page, and canonical app surfaces.
Implement: decide the smallest useful plain-text export surface: either a global llms-full.txt, a compact machine-readable summary, or a small number of high-value Markdown/text views.
Scope: static/ and whichever build helpers are needed to produce a stable low-cost text surface.
Stop condition: the project exposes a minimal AI-friendly discovery surface without yet attempting a full text-export architecture.

5. Expand structured data beyond videos

Research and confirm: re-read current Google structured-data docs and Schema.org guidance before implementation to avoid stale or invalid markup assumptions.
Implement: remove the current “video-only emit” restriction from the head system where appropriate and add truthful app-level structured data for app, formula, and game pages.
Implement: keep markup aligned with visible page content. Do not invent commerce fields or unsupported review fields just to chase rich results.
Scope: helpers/config-node.js, helpers/listing-page.js, helpers/structured-data.js, plus tests around structured-data output.
Stop condition: app-like pages emit valid, truthful JSON-LD, while video pages keep their existing structured-data support.

6. Publish a dataset/download surface

Research and confirm: re-check Google Dataset guidance, Dataset / DataDownload expectations, and practical licensing/provenance requirements before publishing anything as a dataset.
Implement: add a dataset landing page that explains what can be downloaded, how often it updates, what fields are included, and how the data should be cited or used.
Implement: promote one or more existing JSON outputs into formal public snapshot assets, and decide whether CSV should also be emitted in the first pass.
Scope: build-lists.js, new pages in src/pages/, and any helper code needed to expose a stable snapshot surface.
Stop condition: the site offers a real public dataset entrypoint instead of only implicit API files.

7. Upgrade canonical app pages from status pages to evidence pages

Research and confirm: inspect the live app-page composition, compare it with the research memo’s recommendations, and review any new examples of strong evidence-heavy product or compatibility pages before rewriting the template.
Implement: extend app pages so they include a clearer evidence summary, environment distinctions, verification method, source attribution, and a more explicit “what changed” or “status history” path where data exists.
Implement: define the smallest new content blocks that materially increase usefulness: status summary, evidence summary, environment/workaround summary, and change notes.
Scope: src/components/default-listing.astro plus new listing-part components and any supporting listing helpers.
Stop condition: the canonical app pages answer more of the real user question than the current template does.

8. Reduce index bloat and make indexing intentional

Research and confirm: re-read Google crawling/indexing guidance before touching indexability rules, especially around internal search, faceted navigation, and low-value pages.
Implement: decide which surfaces should remain indexable by default: canonical app pages, category pages, dataset pages, and high-intent comparison pages.
Implement: identify low-value or confusing surfaces that may need noindex, canonical consolidation, or sitemap exclusion.
Scope: build-lists.js, helpers/config-node.js, search/category pages, and any sitemap/index-generation helpers.
Stop condition: the site’s crawl/index policy reflects page value instead of “everything is indexable because it exists.”

Validation Gates

Metadata/canonical stage: inspect page source for /, one /app/..., one /formula/..., one /game/..., and one /tv/... page and confirm canonical plus improved title and description output.
Methodology/source-label stage: manually review one app page and confirm that evidence/source links are more legible and that a methodology path is present.
Bot-policy stage: validate generated robots.txt content against the intended bot matrix and confirm the file still advertises the sitemap.
llms.txt stage: fetch and inspect the generated files and confirm they point to live public surfaces that are actually useful.
Structured-data stage: extend or unskip structured-data tests and validate one app page plus one video page.
Dataset stage: confirm the landing page exists, download links work, and the documented schema matches the shipped snapshot fields.
Evidence-page stage: manually review several representative pages: native app, non-native app, formula, and game.
Index-bloat stage: inspect page source, sitemap output, and any noindex directives to confirm only intended pages are affected.
End-to-end gate: pnpm build pnpm test plus manual review of rendered HTML on representative routes.

Deliverables

A repo-local implementation plan in public-discoverability-and-dataset-plan.md
Improved page metadata and canonical handling
A methodology surface linked from public pages
An explicit crawler-policy matrix and updated robots.txt
llms.txt and a lightweight AI-friendly text surface
App-level structured data beyond videos
A dataset/download landing page and public snapshot surface
Stronger evidence-oriented app pages
A more intentional indexing strategy

Risks And Open Questions

Ease-first order is not identical to leverage-first order. The plan delays the largest content-model improvement so smaller foundation changes can land first.
The current public JSON shape may not be sufficient for a strong evidence-page upgrade without adding new derived fields or exposing more of the existing data model.
Bot-policy decisions are partly product decisions, not only technical ones. The safest implementation is not necessarily the best business decision.
llms.txt conventions are still moving, so the implementation should remain lightweight and easy to change.
Dataset publication raises questions about license, attribution, rate limits, and whether CSV/Parquet should be included now or later.
Structured data can easily drift away from visible page truth if the page templates and data model are not upgraded together.
Search/category pages may currently be valuable enough to keep indexed; the “index bloat” stage needs traffic and usefulness review before it removes anything from search.

16 KiB Raw Blame History Unescape Escape