ThatGuySam 1248c705b0 docs(plan): add discovery and deploy follow-up research
Capture the next discovery, security, compatibility-data, and dual-deploy planning work, and ignore local Vercel/env state that should not be committed. This keeps the operational research with the repo while avoiding accidental local-config churn.

Constraint: Must not alter production runtime behavior
Rejected: Fold research notes into the runtime fix commit | obscures the user-facing app-test correction with planning-only material
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep .omx local state untracked even when committing broad workspace updates
Tested: Document review only
Not-tested: No runtime verification required for docs and ignore rules
2026-04-04 15:38:39 -05:00


Public Repo Security And Monorepo Patterns For doesitarm

Tease: The safest version of this plan keeps doesitarm public, but treats credentials, imports, downloaded app artifacts, and privileged automation as private operational surfaces.

Lede: For doesitarm on 2026-04-04, the best-fit pattern is a Kriasoft-style public monorepo with clear apps/, packages/, db/, and infra/ boundaries, plus hardened GitHub Actions, GitHub-hosted runners for public workflows, D1 local development via Wrangler, and private storage for secrets, backups, and quarantined artifacts.

Why it matters:

  • The current repo is about to add higher-risk surfaces: D1, automated app discovery, archive downloading, scheduled jobs, and more Cloudflare automation.
  • In a public repo, CI/CD mistakes matter as much as application code mistakes. Workflow files, tokens, logs, and runner choices become part of the threat model.
  • The current repo already has one immediate security problem: a workflow prints secret-derived files to CI logs.

Go deeper:

  • Keep the code public; keep secrets, raw data, and operational state private.
  • Refactor toward a monorepo shape early so new ingestion, scanner, D1, and infra code do not spread across a flat root.
  • Adopt OSS-friendly GitHub hardening: read-only default GITHUB_TOKEN, pinned actions, CODEOWNERS on workflow/infra/db paths, secret scanning, private vulnerability reporting, and no self-hosted runners for public PRs.

Date: 2026-04-04

Scope

Research security considerations and common open-source repository patterns for a setup like doesitarm:

  • public GitHub repository
  • Cloudflare Workers and D1
  • scheduled automation
  • automated downloading and scanning of third-party app archives
  • prospective monorepo refactor in the style of kriasoft/react-starter-kit

This memo is intended to drive updates to app-discovery-d1-automation.md.

Short Answer

Do not move the whole repo private.

Instead:

  1. Keep the application and infrastructure code public.
  2. Move secrets, imported raw data, D1 operational state, downloaded artifacts, quarantined samples, and any sensitive fixtures to private systems.
  3. Refactor into a monorepo early, using a Kriasoft-style structure adapted to this repo's existing pnpm/Netlify/Astro/Workers setup.
  4. Harden GitHub Actions before expanding automation.

Best-fit recommendation:

  • Public monorepo with apps/, packages/, db/, infra/, scripts/, and docs/
  • GitHub-hosted runners for public workflows
  • GitHub environment secrets with required reviewers for production deploys
  • Cloudflare D1 local development and tests via Wrangler --local, preview_database_id, and test harnesses like unstable_dev()/Miniflare
  • Private object storage or equivalent for raw app archives, import dumps, and quarantine material

Inference: This is the right fit because the repo is open source and community-facing, but the risky parts are operational, not architectural. Public code is compatible with good security here; public credentials and public operational data are not.

What The Repo Already Knows

  • The repo is currently flat-rooted, not organized as a workspace monorepo.
  • There is no checked-in D1 configuration or local D1 bootstrap yet.
  • There is Cloudflare deployment automation in deploy-cloudflare-workers.yml.
  • That workflow currently decodes secret-backed .env / wrangler.toml files and prints them with cat, which is a real security issue in CI logs.
  • The site build still depends on remote/env-backed feeds such as SCANS_SOURCE, COMMITS_SOURCE, HOMEBREW_SOURCE, GAMES_SOURCE, and VFUNCTIONS_URL.
  • The scanner and planned discovery pipeline will process untrusted third-party files, including archive formats like ZIP, DMG, and PKG.
  • .env is ignored at the root, and per-worker wrangler.toml files are already ignored in worker subdirectories.

What The Evidence Says

1. Public repos can stay public if the operational boundary is private

GitHub's own docs assume public repositories will use:

  • repository or environment secrets
  • restricted organization secret access
  • private vulnerability reporting
  • automatic secret scanning on public repos

That is strong evidence that the normal pattern is not "make the repo private"; it is "keep sensitive operational material out of the repo and out of logs."

2. Default GitHub Actions posture should be least privilege

GitHub recommends:

  • minimum required GITHUB_TOKEN permissions
  • default repository token permission set to read-only
  • escalating permissions only per job
  • using a GitHub App token if a job needs more than GITHUB_TOKEN can provide

This matches what open-source repos increasingly do for deploy, release, and cross-repo automation.
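A repo-level check can enforce this before review. The sketch below assumes workflow files have already been parsed from YAML into plain objects (parsing is not shown). `auditPermissions`, `Workflow`, and `Job` are hypothetical names, not part of any GitHub API. It flags a missing or write-granting top-level `permissions` block and still allows narrow per-job escalation.

```typescript
// Hypothetical shapes for a workflow that has already been parsed from YAML.
type Level = "read" | "write" | "none";
type Permissions = "read-all" | "write-all" | Record<string, Level>;

interface Job {
  permissions?: Permissions;
}

interface Workflow {
  permissions?: Permissions;
  jobs: Record<string, Job>;
}

// Returns human-readable findings; an empty array means the workflow passes.
function auditPermissions(name: string, wf: Workflow): string[] {
  const findings: string[] = [];
  const top = wf.permissions;

  // The workflow-wide default should be explicit and grant no write scopes.
  if (top === undefined) {
    findings.push(`${name}: no top-level permissions (falls back to the repo default)`);
  } else if (top === "write-all") {
    findings.push(`${name}: top-level permissions are write-all`);
  } else if (typeof top === "object") {
    for (const [scope, level] of Object.entries(top)) {
      if (level === "write") findings.push(`${name}: top-level ${scope}: write`);
    }
  }

  // Narrow per-job escalation is allowed; blanket write-all on a job is not.
  for (const [jobId, job] of Object.entries(wf.jobs)) {
    if (job.permissions === "write-all") {
      findings.push(`${name}/${jobId}: job permissions are write-all`);
    }
  }
  return findings;
}
```

For example, a deploy workflow with top-level `contents: read` and a single job that adds `deployments: write` produces no findings.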

3. Secrets are still easy to leak through logs and workflow behavior

GitHub's secure-use docs explicitly warn that:

  • redaction is not guaranteed for transformed values
  • structured blobs like JSON/YAML are poor secret formats
  • sensitive values that are not stored as secrets, such as values derived from a secret, should be masked explicitly with ::add-mask::
  • a secret exposed in logs should be rotated and the affected logs deleted

For doesitarm, this directly applies to the current workflow that prints secret-derived config files into CI output.
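One way to apply this to the decode-and-cat step, sketched under the assumption that the decoded secret is a plain KEY=VALUE file: register every value with `::add-mask::` before anything can print it, and drop the `cat` entirely. `maskCommandsForEnv` and `emitMasks` are hypothetical helper names; `::add-mask::` is the real GitHub Actions workflow command.

```typescript
// Builds ::add-mask:: commands for each value in a decoded KEY=VALUE config
// (e.g. a .env reconstructed from a base64 secret). The decoded values are not
// the same string as the stored secret, so automatic redaction may miss them.
function maskCommandsForEnv(decoded: string): string[] {
  const commands: string[] = [];
  for (const rawLine of decoded.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    // Strip one pair of surrounding quotes so the mask matches what tools print.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    if (value.length > 0) commands.push(`::add-mask::${value}`);
  }
  return commands;
}

// Writes the mask commands to stdout; in a workflow step this must run before
// any other output that could contain the values.
function emitMasks(decoded: string): void {
  for (const cmd of maskCommandsForEnv(decoded)) console.log(cmd);
}
```

Masking is a backstop, not a fix: structured blobs remain poor secret formats, so splitting the bundle into individual secrets is still the cleaner end state.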

4. Public repos should avoid self-hosted runners for untrusted PRs

GitHub explicitly recommends self-hosted runners only with private repositories, because forks of public repositories can run dangerous code on them through pull requests.

For this repo, that means:

  • do not put public PR workflows on a local machine or other long-lived self-hosted runner
  • do not run untrusted archive-processing jobs on a self-hosted runner that also holds production credentials
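A small audit can catch self-hosted PR jobs in review. The sketch assumes parsed workflow objects and only handles the string or list forms of `runs-on` (not runner groups). `selfHostedPrJobs` and its types are hypothetical. It conservatively treats both `pull_request` and `pull_request_target` as untrusted triggers.

```typescript
// Hypothetical parsed-workflow shapes; `on` can be a string, a list, or a map.
type Trigger = string | string[] | Record<string, unknown>;

interface RunnerJob {
  "runs-on": string | string[];
}

interface RunnerWorkflow {
  on: Trigger;
  jobs: Record<string, RunnerJob>;
}

// Events whose workflows can be influenced by fork pull requests.
const UNTRUSTED_TRIGGERS: string[] = ["pull_request", "pull_request_target"];

function triggerNames(on: Trigger): string[] {
  if (typeof on === "string") return [on];
  if (Array.isArray(on)) return on;
  return Object.keys(on);
}

// Returns the ids of jobs that would run PR-triggered work on a self-hosted runner.
function selfHostedPrJobs(wf: RunnerWorkflow): string[] {
  const prTriggered = triggerNames(wf.on).some((t) => UNTRUSTED_TRIGGERS.includes(t));
  if (!prTriggered) return [];
  return Object.entries(wf.jobs)
    .filter(([, job]) => {
      const runsOn = job["runs-on"];
      const labels = Array.isArray(runsOn) ? runsOn : [runsOn];
      return labels.includes("self-hosted");
    })
    .map(([id]) => id);
}
```

If workflows are parsed with a YAML 1.1 parser, check how it reads the bare `on:` key, since YAML 1.1 treats `on` as a boolean.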

5. pull_request_target remains a common footgun

GitHub Security Lab's Preventing pwn requests guidance is still the clearest implementation reference:

  • pull_request_target plus checking out/building PR code is dangerous
  • untrusted PR code should run in an unprivileged pull_request workflow
  • privileged follow-up actions should happen through workflow_run with carefully handled artifacts

HN discussion around real workflow exploits reinforces the same point: the problem is not theoretical.
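The combination to block is a pull_request_target workflow that checks out the PR head. A partial detector, assuming parsed workflow objects: it only inspects `actions/checkout` steps and their `ref` input, so `git checkout` inside `run:` scripts or a fork `repository:` input would still need human review. `pwnRequestRisks` and its types are hypothetical.

```typescript
interface Step {
  uses?: string;
  with?: Record<string, string>;
}

interface PrtWorkflow {
  on: string | string[] | Record<string, unknown>;
  jobs: Record<string, { steps?: Step[] }>;
}

// Expressions that point actions/checkout at fork-controlled code.
const PR_HEAD_PATTERNS = [
  /github\.event\.pull_request\.head\./,
  /github\.head_ref/,
  /refs\/pull\/.*\/merge/,
];

function hasTrigger(on: PrtWorkflow["on"], name: string): boolean {
  if (typeof on === "string") return on === name;
  if (Array.isArray(on)) return on.includes(name);
  return name in on;
}

// Returns "job#stepIndex" for each checkout of PR code under pull_request_target.
function pwnRequestRisks(wf: PrtWorkflow): string[] {
  if (!hasTrigger(wf.on, "pull_request_target")) return [];
  const risks: string[] = [];
  for (const [jobId, job] of Object.entries(wf.jobs)) {
    (job.steps ?? []).forEach((step, i) => {
      const isCheckout = step.uses?.startsWith("actions/checkout@") ?? false;
      const ref = step.with?.ref ?? "";
      if (isCheckout && PR_HEAD_PATTERNS.some((p) => p.test(ref))) {
        risks.push(`${jobId}#${i}`);
      }
    });
  }
  return risks;
}
```

The same checkout under a plain pull_request trigger is not flagged, because that workflow runs without repository secrets or a write token for fork PRs.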

6. Common OSS hardening patterns for GitHub workflows are now well-defined

GitHub secure-use guidance and OpenSSF best-practice guidance converge on:

  • pin actions to full commit SHAs
  • restrict allowed actions where possible
  • guard .github/workflows/ with CODEOWNERS
  • keep default branch protected
  • require reviews and passing checks
  • use code scanning / dependency review / secret scanning / Dependabot
  • use private vulnerability reporting for public repos

These are standard public-repo practices, not enterprise-only overkill.
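SHA pinning is easy to check mechanically. This sketch scans raw workflow YAML text rather than parsing it and reports any `uses:` reference whose ref is not a full 40-character hex commit SHA; local `./` actions and `docker://` references are skipped. `findUnpinnedActions` is a hypothetical helper, and a line-based regex will miss unusual YAML layouts.

```typescript
// Matches `uses: owner/repo[/path]@ref` lines in workflow YAML (also `- uses:`).
const USES_LINE = /^\s*(?:-\s*)?uses:\s*["']?([^"'\s#]+)["']?/;
const FULL_SHA = /^[0-9a-f]{40}$/;

interface UnpinnedUse {
  line: number; // 1-based line number in the workflow file
  ref: string; // the full `owner/repo@ref` target as written
}

// Reports action references that are not pinned to a full commit SHA.
function findUnpinnedActions(yamlText: string): UnpinnedUse[] {
  const results: UnpinnedUse[] = [];
  yamlText.split(/\r?\n/).forEach((text, i) => {
    const match = USES_LINE.exec(text);
    if (!match) return;
    const target = match[1];
    // Local actions and container images are out of scope for this check.
    if (target.startsWith("./") || target.startsWith("docker://")) return;
    const at = target.lastIndexOf("@");
    const ref = at === -1 ? "" : target.slice(at + 1);
    if (!FULL_SHA.test(ref)) results.push({ line: i + 1, ref: target });
  });
  return results;
}
```

Pinned lines often keep the readable tag as a trailing comment (`@<sha> # v4`); the regex stops at whitespace, so the comment does not affect the result.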

7. Cloudflare D1 already supports local-first development and tests

Cloudflare's D1 docs explicitly support:

  • wrangler dev local mode
  • preview_database_id
  • wrangler d1 migrations apply --local
  • test setups using Miniflare and unstable_dev()

That means D1 does not require a private repo or remote-only workflow. It fits the "run locally on this machine, then automate" plan well.
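For the local-first flow, it can help to keep the bootstrap commands as data so scripts can verify they never reach remote D1. The subcommands and flags used here (`wrangler d1 migrations apply --local`, `wrangler d1 execute --local --file`, `wrangler dev`) are real Wrangler CLI usage; the database name, the seed path, and the helpers `localD1Bootstrap` and `assertLocalOnly` are hypothetical.

```typescript
// Local-only D1 bootstrap expressed as argv arrays for `npx wrangler ...`.
type Argv = string[];

function localD1Bootstrap(dbName: string, seedFile?: string): Argv[] {
  const steps: Argv[] = [
    // Apply the checked-in migrations to local D1 state.
    ["wrangler", "d1", "migrations", "apply", dbName, "--local"],
  ];
  if (seedFile) {
    // Load optional seed data into the same local database.
    steps.push(["wrangler", "d1", "execute", dbName, "--local", `--file=${seedFile}`]);
  }
  // Start the Worker; recent Wrangler versions run `dev` locally by default.
  steps.push(["wrangler", "dev"]);
  return steps;
}

// Guard for dev/CI scripts: a local bootstrap should never target remote D1.
function assertLocalOnly(steps: Argv[]): void {
  for (const argv of steps) {
    if (argv.includes("--remote")) {
      throw new Error(`remote flag in local bootstrap: ${argv.join(" ")}`);
    }
  }
}
```

Keeping `--local` explicit in data, rather than buried in shell strings, makes the local/remote distinction visible in diffs.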

8. Cloudflare Workflows and observability make Cloudflare a credible later home for ingestion

Cloudflare now positions Workflows as durable multi-step execution with retries, persisted state, and debugging, and Workers Logs and Traces provide native observability. That is enough evidence to treat Cloudflare as a viable later landing zone for scheduled ingestion and scan orchestration.

Inference: GitHub Actions is still the easier first scheduler because it is already in the repo, but Cloudflare Workflows has matured enough to stay in the plan as a serious later option.

9. Kriasoft's monorepo shape is a good architectural fit, but not every exact convention should be copied blindly

kriasoft/react-starter-kit is a public monorepo with:

  • apps/
  • packages/
  • db/
  • docs/
  • infra/
  • scripts/

It also documents a public template env pattern where committed .env contains placeholders/defaults and .env.local contains real credentials.

That shape is a strong fit for doesitarm, but I would adapt the env pattern slightly for safety and clarity:

  • keep a committed public template file such as .env.example
  • keep real credentials in .env.local, .dev.vars, GitHub environment secrets, and Cloudflare secrets

Inference: Kriasoft's folder layout is the part worth copying directly. The exact env-file naming should follow the least-confusing safe convention for this repo.
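A check can keep the committed template honest. This sketch assumes simple KEY=VALUE files without multi-line values and a hypothetical placeholder convention (empty, `<angle-bracketed>`, or words like `changeme`); `parseEnv` and `checkTemplate` are illustrative names.

```typescript
// Parses simple KEY=VALUE lines, ignoring blanks and # comments.
function parseEnv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    vars.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  return vars;
}

// Hypothetical placeholder convention: empty, <angle-bracketed>, or a stock word.
const PLACEHOLDER = /^$|^<[^>]+>$|^(changeme|example|placeholder)$/i;

interface TemplateReport {
  nonPlaceholder: string[]; // template keys whose values look real
  missingLocally: string[]; // template keys absent from the local file
}

// Compares the committed template (.env.example) with a local file (.env.local).
function checkTemplate(exampleText: string, localText: string): TemplateReport {
  const example = parseEnv(exampleText);
  const local = parseEnv(localText);
  const nonPlaceholder: string[] = [];
  const missingLocally: string[] = [];
  example.forEach((value, key) => {
    if (!PLACEHOLDER.test(value)) nonPlaceholder.push(key);
    if (!local.has(key)) missingLocally.push(key);
  });
  return { nonPlaceholder, missingLocally };
}
```

Non-secret defaults in the template, such as a local port, would need an allowlist under this convention.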

Common Open-Source Patterns That Fit doesitarm

Public code, private state

Keep public:

  • app code
  • scanner code
  • D1 schema and migrations
  • workflow definitions
  • docs and plans

Keep private:

  • deploy credentials and tokens
  • raw Google Sheets exports or database backups
  • downloaded app archives
  • quarantine samples
  • private test fixtures that would create redistribution or abuse risk
  • operational dashboards and alert destinations

Workspace monorepo with clear trust boundaries

Best-fit structure for doesitarm:

  • apps/web/ — Astro site and app-test UI
  • apps/default-worker/ — current doesitarm-default
  • apps/analytics-worker/ — current workers/analytics
  • apps/ingest/ or apps/discovery/ — CLI/admin surface for discovery jobs
  • packages/scanner-core/ — shared scan engine and file-format logic
  • packages/source-runners/ — Homebrew/GitHub/download-page source runners
  • packages/data-model/ — shared D1 schema types, DTOs, validation
  • packages/site-build/ — list/build/export helpers
  • db/ — D1 migrations, seeds, import scripts, local test DB helpers
  • infra/ — Wrangler config, deploy config, policy docs
  • scripts/ — repo automation
  • docs/ — plans, research, operational docs
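Because `.github/workflows/`, `infra/`, and `db/` are the protected surfaces in this layout, their CODEOWNERS entries can be generated from one list. A sketch, with `@maintainer-handle` as a hypothetical placeholder owner and `codeownersFor` as an illustrative helper:

```typescript
// Root-anchored directories whose changes should require maintainer review.
const PROTECTED_PATHS: string[] = ["/.github/workflows/", "/infra/", "/db/"];

// Produces CODEOWNERS file content: one "path owner..." rule per line.
function codeownersFor(paths: string[], owners: string[]): string {
  if (owners.length === 0) throw new Error("at least one owner is required");
  const header = "# Changes to workflows, infra, and database code need maintainer review.";
  const rules = paths.map((p) => `${p} ${owners.join(" ")}`);
  return [header, ...rules].join("\n") + "\n";
}
```

The output could live in `.github/CODEOWNERS`; a root-anchored path ending in `/` covers everything under that directory. Ownership only blocks merges when branch protection requires code-owner review.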

Repo template files, not repo secrets

Common OSS pattern:

  • commit .env.example or placeholder-only .env
  • ignore .env.local, .dev.vars, and .wrangler/
  • keep Cloudflare secrets in Workers secrets / GitHub environment secrets
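A quick check that these ignore rules actually exist, assuming a single root `.gitignore`. It compares whole lines only (no glob evaluation) and treats `.wrangler` and `.wrangler/` as equivalent; `missingIgnores` is a hypothetical helper.

```typescript
const REQUIRED_IGNORES: string[] = [".env.local", ".dev.vars", ".wrangler/"];

// Drop a trailing slash so ".wrangler" and ".wrangler/" compare equal.
const norm = (p: string): string => p.replace(/\/$/, "");

// Returns required entries that do not appear as lines in the .gitignore text.
function missingIgnores(gitignoreText: string, required: string[] = REQUIRED_IGNORES): string[] {
  const entries = new Set(
    gitignoreText
      .split(/\r?\n/)
      .map((l) => l.trim())
      .filter((l) => l !== "" && !l.startsWith("#"))
      .map(norm),
  );
  return required.filter((r) => !entries.has(norm(r)));
}
```

Unanchored entries (no leading `/`) are preferable in a monorepo because they also match copies inside workspace directories such as `apps/web/`; this check reports a root-anchored `/.env.local` as missing for that reason.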

Hardened GitHub Actions for public forks

Common OSS pattern:

  • default permissions: { contents: read }
  • explicit per-job escalation only
  • require approval for fork PR workflows where appropriate
  • no self-hosted runners for public PRs
  • no pull_request_target workflows that checkout/build PR code

Supply-chain hygiene for workflows

Common OSS pattern:

  • pin actions to full SHAs
  • restrict allowed actions
  • Dependabot for action updates
  • CodeQL / code scanning for workflow vulnerabilities
  • OpenSSF Scorecard for ongoing hygiene checks

Disclosure and scanning defaults

Common OSS pattern:

  • enable private vulnerability reporting
  • enable secret scanning and push protection
  • keep a SECURITY.md policy

What Works

  • Keeping the repo public while moving secrets and sensitive data out of git
  • Refactoring to a monorepo before adding more D1/discovery complexity
  • Treating workflow files, infra/, and db/ as protected surfaces with CODEOWNERS
  • Using GitHub-hosted runners for public CI and scheduled jobs
  • Using environment-specific secrets with required reviewers for production deployment jobs
  • Using D1 local mode and local migrations as part of normal development
  • Using Cloudflare Logs/Traces or equivalent observability for scheduled jobs
  • Storing raw archives and quarantine material in private object storage rather than in the repo

What To Avoid

  • Do not move the whole repo private as a substitute for secrets hygiene
  • Do not keep the current workflow behavior that prints secret-derived files to CI logs
  • Do not use self-hosted runners for public PR workflows
  • Do not run archive downloads/extraction in privileged workflows that also have deploy credentials
  • Do not combine pull_request_target with explicit PR checkout/build steps
  • Do not keep adding discovery/D1/worker code into the current flat root
  • Do not commit raw import dumps, app archives, or structured secret blobs
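As defense in depth for the extraction rule, an archive-processing entry point can refuse to start when deploy credentials are visible in its environment. `CLOUDFLARE_API_TOKEN` (read by Wrangler) and `NETLIFY_AUTH_TOKEN` (read by the Netlify CLI) are real variable names; `DEPLOY_ENV_B64` and `assertUnprivileged` are hypothetical. The primary control is still not passing those secrets to the job at all.

```typescript
// Credentials that must never be visible to jobs handling untrusted archives.
const PRIVILEGED_VARS: string[] = [
  "CLOUDFLARE_API_TOKEN",
  "NETLIFY_AUTH_TOKEN",
  "DEPLOY_ENV_B64", // hypothetical name for a base64-encoded secret bundle
];

// Throws if any privileged variable is set to a non-empty value.
function assertUnprivileged(env: Record<string, string | undefined>): void {
  const present = PRIVILEGED_VARS.filter((name) => (env[name] ?? "") !== "");
  if (present.length > 0) {
    throw new Error(
      `refusing to download/extract archives with credentials in env: ${present.join(", ")}`,
    );
  }
}
```

In Node this would be called as `assertUnprivileged(process.env)` at the top of the scanner job, before any download starts.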

Recommendation

For doesitarm, the strongest next-step package is:

  1. Refactor toward a Kriasoft-style monorepo shape adapted to pnpm.
  2. Add a security-hardening stage before expanding automation.
  3. Keep the repo public.
  4. Keep secrets, raw operational data, and archive/quarantine material private.
  5. Start scheduled discovery on GitHub-hosted runners with hardened workflows.
  6. Keep Cloudflare Workflows as a second-phase target for durable ingestion.

Immediate high-priority actions to capture in the plan:

  1. Remove secret printing from deploy-cloudflare-workers.yml and rotate affected secrets.
  2. Add repo policy and tooling for:
    • read-only default GITHUB_TOKEN
    • pinned actions
    • CODEOWNERS for .github/workflows/, infra/, and db/
    • secret scanning / push protection
    • private vulnerability reporting
  3. Add ignored local-secret files for the new D1/Workers workflow:
    • .env.local
    • .dev.vars
    • .wrangler/
  4. Keep public PR CI on GitHub-hosted runners only.
  5. Store raw archives/import snapshots outside the repo.
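Item 5 can be backed by a pre-commit or CI check over changed paths, for example the output of `git diff --cached --name-only`. The blocked extensions and the `quarantine/` and `db/dumps/` directories are assumptions to adapt; `.sql` is deliberately allowed so D1 migrations can still be committed. `blockedPaths` is a hypothetical helper.

```typescript
// Assumed lists: archive formats from the scanner pipeline plus local DB files.
const BLOCKED_EXTENSIONS: string[] = [".zip", ".dmg", ".pkg", ".sqlite", ".sqlite3"];
const BLOCKED_DIRS: string[] = ["quarantine/", "db/dumps/"];

// Returns the paths (as given) that should not be committed.
function blockedPaths(paths: string[]): string[] {
  return paths.filter((p) => {
    const lower = p.toLowerCase();
    const badExt = BLOCKED_EXTENSIONS.some((ext) => lower.endsWith(ext));
    // Match the directory at the repo root or nested inside a workspace.
    const badDir = BLOCKED_DIRS.some((dir) => lower.startsWith(dir) || lower.includes(`/${dir}`));
    return badExt || badDir;
  });
}
```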

Missing Information

  • Whether the future ingestion runtime is expected to stay GitHub-first or eventually move fully to Cloudflare Workers/Workflows.
  • Whether there are legal or vendor-policy constraints around storing downloaded app archives long term.
  • Whether the monorepo refactor should keep Netlify as-is or consolidate more runtime surfaces onto Cloudflare.

Source Quality Notes

  • Highest-confidence sources in this memo are GitHub Docs, GitHub Security Lab, OpenSSF, Cloudflare Docs, and the Kriasoft repository itself.
  • HN/Lobsters did not surface a materially better competing pattern in this pass; the most useful HN signal reinforced GitHub Security Lab's warning on pull_request_target.
  • The recommendation to keep the repo public but move operational data private is a synthesis from official guidance plus this repo's current shape and risk surface.