Capture the next discovery, security, compatibility-data, and dual-deploy planning work, and ignore local Vercel/env state that should not be committed. This keeps the operational research with the repo while avoiding accidental local-config churn. Constraint: Must not alter production runtime behavior Rejected: Fold research notes into the runtime fix commit | obscures the user-facing app-test correction with planning-only material Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep .omx local state untracked even when committing broad workspace updates Tested: Document review only Not-tested: No runtime verification required for docs and ignore rules
17 KiB
Public Repo Security And Monorepo Patterns For doesitarm
Tease: The safest version of this plan keeps doesitarm public, but treats credentials, imports, downloaded app artifacts, and privileged automation as private operational surfaces.
Lede: For doesitarm on 2026-04-04, the best-fit pattern is a Kriasoft-style public monorepo with clear apps/, packages/, db/, and infra/ boundaries, plus hardened GitHub Actions, GitHub-hosted runners for public workflows, D1 local development via Wrangler, and private storage for secrets, backups, and quarantined artifacts.
Why it matters:
- The current repo is about to add higher-risk surfaces: D1, automated app discovery, archive downloading, scheduled jobs, and more Cloudflare automation.
- In a public repo, CI/CD mistakes matter as much as application code mistakes. Workflow files, tokens, logs, and runner choices become part of the threat model.
- The current repo already has one immediate security problem: a workflow prints secret-derived files to CI logs.
Go deeper:
- Keep the code public; keep secrets, raw data, and operational state private.
- Refactor toward a monorepo shape early so new ingestion, scanner, D1, and infra code do not spread across a flat root.
- Adopt OSS-friendly GitHub hardening: read-only default
GITHUB_TOKEN, pinned actions, CODEOWNERS on workflow/infra/db paths, secret scanning, private vulnerability reporting, and no self-hosted runners for public PRs.
Date: 2026-04-04
Scope
Research security considerations and common open-source repository patterns for a
setup like doesitarm:
- public GitHub repository
- Cloudflare Workers and D1
- scheduled automation
- automated downloading and scanning of third-party app archives
- prospective monorepo refactor in the style of
kriasoft/react-starter-kit
This memo is intended to drive updates to app-discovery-d1-automation.md.
Short Answer
Do not move the whole repo private.
Instead:
- Keep the application and infrastructure code public.
- Move secrets, imported raw data, D1 operational state, downloaded artifacts, quarantined samples, and any sensitive fixtures to private systems.
- Refactor into a monorepo early, using a Kriasoft-style structure adapted to this repo's existing pnpm/Netlify/Astro/Workers setup.
- Harden GitHub Actions before expanding automation.
Best-fit recommendation:
- Public monorepo with
apps/,packages/,db/,infra/,scripts/, anddocs/ - GitHub-hosted runners for public workflows
- GitHub environment secrets with required reviewers for production deploys
- Cloudflare D1 local development and tests via Wrangler
--local,preview_database_id, and test harnesses likeunstable_dev()/Miniflare - Private object storage or equivalent for raw app archives, import dumps, and quarantine material
Inference: This is the right fit because the repo is open source and community-facing, but the risky parts are operational, not architectural. Public code is compatible with good security here; public credentials and public operational data are not.
What The Repo Already Knows
- The repo is currently flat-rooted, not organized as a workspace monorepo.
- There is no checked-in D1 configuration or local D1 bootstrap yet.
- There is Cloudflare deployment automation in deploy-cloudflare-workers.yml.
- That workflow currently decodes secret-backed
.env/wrangler.tomlfiles and prints them withcat, which is a real security issue in CI logs. - The site build still depends on remote/env-backed feeds such as
SCANS_SOURCE,COMMITS_SOURCE,HOMEBREW_SOURCE,GAMES_SOURCE, andVFUNCTIONS_URL. - The scanner and planned discovery pipeline will process untrusted third-party files, including archive formats like ZIP, DMG, and PKG.
.envis ignored at the root, and per-workerwrangler.tomlfiles are already ignored in worker subdirectories.
What The Evidence Says
1. Public repos can stay public if the operational boundary is private
GitHub's own docs assume public repositories will use:
- repository or environment secrets
- restricted organization secret access
- private vulnerability reporting
- automatic secret scanning on public repos
That is strong evidence that the normal pattern is not "make the repo private"; it is "keep sensitive operational material out of the repo and out of logs."
2. Default GitHub Actions posture should be least privilege
GitHub recommends:
- minimum required
GITHUB_TOKENpermissions - default repository token permission set to read-only
- escalating permissions only per job
- using a GitHub App token if a job needs more than
GITHUB_TOKENcan provide
This matches what open-source repos increasingly do for deploy, release, and cross-repo automation.
3. Secrets are still easy to leak through logs and workflow behavior
GitHub's secure-use docs explicitly warn that:
- redaction is not guaranteed for transformed values
- structured blobs like JSON/YAML are poor secret formats
- non-secret values should be masked explicitly with
::add-mask:: - exposed secrets in logs should trigger deletion/rotation
For doesitarm, this directly applies to the current workflow that prints
secret-derived config files into CI output.
4. Public repos should avoid self-hosted runners for untrusted PRs
GitHub explicitly recommends self-hosted runners only with private repositories, because forks of public repositories can run dangerous code on them through pull requests.
For this repo, that means:
- do not put public PR workflows on a local machine or other long-lived self-hosted runner
- do not run untrusted archive-processing jobs on a self-hosted runner that also holds production credentials
5. pull_request_target remains a common footgun
GitHub Security Lab's Preventing pwn requests guidance is still the clearest
implementation reference:
pull_request_targetplus checking out/building PR code is dangerous- untrusted PR code should run in an unprivileged
pull_requestworkflow - privileged follow-up actions should happen through
workflow_runwith carefully handled artifacts
HN discussion around real workflow exploits reinforces the same point: the problem is not theoretical.
6. Common OSS hardening patterns for GitHub workflows are now well-defined
GitHub secure-use guidance and OpenSSF best-practice guidance converge on:
- pin actions to full commit SHAs
- restrict allowed actions where possible
- guard
.github/workflows/withCODEOWNERS - keep default branch protected
- require reviews and passing checks
- use code scanning / dependency review / secret scanning / Dependabot
- use private vulnerability reporting for public repos
These are standard public-repo practices, not enterprise-only overkill.
7. Cloudflare D1 already supports local-first development and tests
Cloudflare's D1 docs explicitly support:
wrangler devlocal modepreview_database_idwrangler d1 migrations apply --local- test setups using Miniflare and
unstable_dev()
That means D1 does not require a private repo or remote-only workflow. It fits the "run locally on this machine, then automate" plan well.
8. Cloudflare Workflows and observability make Cloudflare a credible later home for ingestion
Cloudflare Workflows now position themselves as durable multi-step execution with retries, persisted state, and debugging. Workers Logs and Traces provide native observability. That is enough evidence to treat Cloudflare as a viable later landing zone for scheduled ingestion and scan orchestration.
Inference: GitHub Actions is still the easier first scheduler because it is already in the repo, but Cloudflare Workflows has matured enough to stay in the plan as a serious later option.
9. Kriasoft's monorepo shape is a good architectural fit, but not every exact convention should be copied blindly
kriasoft/react-starter-kit is a public monorepo with:
apps/packages/db/docs/infra/scripts/
It also documents a public template env pattern where committed .env
contains placeholders/defaults and .env.local contains real credentials.
That shape is a strong fit for doesitarm, but I would adapt the env pattern
slightly for safety and clarity:
- keep a committed public template file such as
.env.example - keep real credentials in
.env.local,.dev.vars, GitHub environment secrets, and Cloudflare secrets
Inference: Kriasoft's folder layout is the part worth copying directly. The exact env-file naming should follow the least-confusing safe convention for this repo.
Common Open-Source Patterns That Fit doesitarm
Public code, private state
Keep public:
- app code
- scanner code
- D1 schema and migrations
- workflow definitions
- docs and plans
Keep private:
- deploy credentials and tokens
- raw Google Sheets exports or database backups
- downloaded app archives
- quarantine samples
- private test fixtures that would create redistribution or abuse risk
- operational dashboards and alert destinations
Workspace monorepo with clear trust boundaries
Best-fit structure for doesitarm:
apps/web/— Astro site and app-test UIapps/default-worker/— currentdoesitarm-defaultapps/analytics-worker/— currentworkers/analyticsapps/ingest/orapps/discovery/— CLI/admin surface for discovery jobspackages/scanner-core/— shared scan engine and file-format logicpackages/source-runners/— Homebrew/GitHub/download-page source runnerspackages/data-model/— shared D1 schema types, DTOs, validationpackages/site-build/— list/build/export helpersdb/— D1 migrations, seeds, import scripts, local test DB helpersinfra/— Wrangler config, deploy config, policy docsscripts/— repo automationdocs/— plans, research, operational docs
Repo template files, not repo secrets
Common OSS pattern:
- commit
.env.exampleor placeholder-only.env - ignore
.env.local,.dev.vars, and.wrangler/ - keep Cloudflare secrets in Workers secrets / GitHub environment secrets
Hardened GitHub Actions for public forks
Common OSS pattern:
- default
permissions: { contents: read } - explicit per-job escalation only
- require approval for fork PR workflows where appropriate
- no self-hosted runners for public PRs
- no
pull_request_targetworkflows that checkout/build PR code
Supply-chain hygiene for workflows
Common OSS pattern:
- pin actions to full SHAs
- restrict allowed actions
- Dependabot for action updates
- CodeQL / code scanning for workflow vulnerabilities
- OpenSSF Scorecards for ongoing hygiene checks
Disclosure and scanning defaults
Common OSS pattern:
- enable private vulnerability reporting
- enable secret scanning and push protection
- keep a
SECURITY.mdpolicy
What Works
- Keeping the repo public while moving secrets and sensitive data out of git
- Refactoring to a monorepo before adding more D1/discovery complexity
- Treating workflow files,
infra/, anddb/as protected surfaces withCODEOWNERS - Using GitHub-hosted runners for public CI and scheduled jobs
- Using environment-specific secrets with required reviewers for production deployment jobs
- Using D1 local mode and local migrations as part of normal development
- Using Cloudflare Logs/Traces or equivalent observability for scheduled jobs
- Storing raw archives and quarantine material in private object storage rather than in the repo
What To Avoid
- Do not move the whole repo private as a substitute for secrets hygiene
- Do not keep the current workflow behavior that prints secret-derived files to CI logs
- Do not use self-hosted runners for public PR workflows
- Do not run archive downloads/extraction in privileged workflows that also have deploy credentials
- Do not combine
pull_request_targetwith explicit PR checkout/build steps - Do not keep adding discovery/D1/worker code into the current flat root
- Do not commit raw import dumps, app archives, or structured secret blobs
Recommendation
For doesitarm, the strongest next-step package is:
- Refactor toward a Kriasoft-style monorepo shape adapted to pnpm.
- Add a security-hardening stage before expanding automation.
- Keep the repo public.
- Keep secrets, raw operational data, and archive/quarantine material private.
- Start scheduled discovery on GitHub-hosted runners with hardened workflows.
- Keep Cloudflare Workflows as a second-phase target for durable ingestion.
Immediate high-priority actions to capture in the plan:
- Remove secret printing from deploy-cloudflare-workers.yml and rotate affected secrets.
- Add repo policy and tooling for:
- read-only default
GITHUB_TOKEN - pinned actions
CODEOWNERSfor.github/workflows/,infra/, anddb/- secret scanning / push protection
- private vulnerability reporting
- read-only default
- Add ignored local-secret files for the new D1/Workers workflow:
.env.local.dev.vars.wrangler/
- Keep public PR CI on GitHub-hosted runners only.
- Store raw archives/import snapshots outside the repo.
Missing Information
- Whether the future ingestion runtime is expected to stay GitHub-first or eventually move fully to Cloudflare Workers/Workflows.
- Whether there are legal or vendor-policy constraints around storing downloaded app archives long term.
- Whether the monorepo refactor should keep Netlify as-is or consolidate more runtime surfaces onto Cloudflare.
Source Links
- GitHub Docs,
GITHUB_TOKENleast-privilege and GitHub App escalation: https://docs.github.com/en/actions/tutorials/authenticate-with-github_token - GitHub Docs, secrets in Actions, fork-secret behavior, environment reviewers, OIDC, and masking: https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions
- GitHub Docs, secure use reference, pinning actions, CODEOWNERS, code scanning, Dependabot, and Scorecards: https://docs.github.com/en/actions/reference/security/secure-use
- GitHub Docs, self-hosted runner warning for public repositories: https://docs.github.com/en/actions/how-tos/manage-runners/self-hosted-runners/add-runners
- GitHub Docs, limiting self-hosted runners in organizations: https://docs.github.com/en/organizations/managing-organization-settings/disabling-or-limiting-github-actions-for-your-organization
- GitHub Docs, approval requirements for fork PR workflows: https://docs.github.com/en/actions/managing-workflow-runs-and-deployments/managing-workflow-runs/approving-workflow-runs-from-public-forks
- GitHub Docs, repository Actions settings and fork workflow controls: https://docs.github.com/github/administering-a-repository/managing-repository-settings/disabling-or-limiting-github-actions-for-a-repository
- GitHub Docs, secret scanning for public repositories: https://docs.github.com/github/administering-a-repository/about-token-scanning
- GitHub Docs, enabling secret scanning / push protection: https://docs.github.com/en/code-security/how-tos/secure-your-secrets/detect-secret-leaks/enabling-secret-scanning-for-your-repository
- GitHub Docs, enabling push protection: https://docs.github.com/en/code-security/secret-scanning/enabling-secret-scanning-features/enabling-push-protection-for-your-repository
- GitHub Docs, private vulnerability reporting: https://docs.github.com/en/code-security/security-advisories/working-with-repository-security-advisories/configuring-private-vulnerability-reporting-for-a-repository
- GitHub Security Lab,
pull_request_target/workflow_runguidance: https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/ - OpenSSF GitHub configuration best practices: https://best.openssf.org/SCM-BestPractices/github/
- Kriasoft React Starter Kit: https://github.com/kriasoft/react-starter-kit
- Cloudflare D1 local development: https://developers.cloudflare.com/d1/best-practices/local-development/
- Cloudflare Workers observability: https://developers.cloudflare.com/workers/observability/
- Cloudflare Workers logs: https://developers.cloudflare.com/workers/observability/logs/
- Cloudflare Workers traces: https://developers.cloudflare.com/workers/observability/traces/
- Cloudflare Workflows overview: https://developers.cloudflare.com/workflows/
Source Quality Notes
- Highest-confidence sources in this memo are GitHub Docs, GitHub Security Lab, OpenSSF, Cloudflare Docs, and the Kriasoft repository itself.
- HN/Lobsters did not surface a materially better competing pattern in this
pass; the most useful HN signal reinforced GitHub Security Lab's warning on
pull_request_target. - The recommendation to keep the repo public but move operational data private is a synthesis from official guidance plus this repo's current shape and risk surface.