# Package Registry Security

Package registries (npm, PyPI, crates.io, Maven Central, RubyGems, NuGet) are the distribution layer for software dependencies. They are critical infrastructure for the software supply chain and a primary target for attackers, because a single malicious package can propagate to millions of downstream projects.

## Trust model

Most registries operate on an open-publish model: anyone can create an account and upload packages under any unclaimed name. This keeps the open-source ecosystem moving quickly but creates fundamental security challenges:

- **No identity verification** for publishers (though npm now requires 2FA for maintainers of popular packages)
- **Names are first-come-first-served**, enabling [[Namesquatting]]
- **No link between package name and source code**: a package named `react-utils` has no verified connection to React
- **Metadata is self-declared**: repository links, descriptions, and keywords are unverified, enabling [[Starjacking]]

## Registry-level defenses

### Name policies

- **Similarity checks**: npm blocks new names too close to existing popular packages (hamming distance, edit distance)
- **Name normalization**: PyPI treats hyphens, underscores, periods, and case as equivalent (`my-package` = `my_package` = `My_Package`)
- **Scoped namespaces**: npm's `@scope/package` prevents collisions; PyPI lacks true namespaces
- **Limitation**: these defenses don't catch [[Slopsquatting]], because hallucinated names are often dissimilar to any existing package

### Provenance and signing

- **npm provenance attestations**: link a package version to a specific CI build and source commit (Sigstore-based)
- **PyPI Trusted Publishers**: a project authorizes specific CI/CD workflows to publish using short-lived OIDC tokens instead of long-lived API tokens
- **Sigstore**: keyless signing infrastructure used by npm, PyPI, and others
- **Limitation**: adoption is still low; most packages have no provenance attestation

### Malware scanning

- npm, PyPI, and others run automated malware scans on uploads
- Socket.dev analyzes package behavior (network calls, filesystem access, eval usage) rather than just matching CVEs
- **Limitation**: scans catch known patterns but miss novel techniques

## Consumer-side defenses

1. **Private registry proxies** (Artifactory, Verdaccio, Nexus): cache approved packages and block direct access to public registries
2. **Lockfiles**: pin exact versions and integrity hashes; review lockfile diffs when dependencies change
3. **[[Software Composition Analysis (SCA)]]**: scan dependencies continuously
4. **Allowlists**: restrict which packages and publishers your CI/CD can install
5. **Namespace reservation**: publish placeholder packages for internal names on public registries (prevents [[Dependency Confusion]])

## Open problems

- No universal namespace ownership (anyone can publish `company-auth` on npm)
- Cross-registry confusion: a PyPI package and an npm package can share a name but be completely different things
- AI agents that install packages autonomously bypass human review gates
- No standardized way to revoke or deprecate malicious packages across registries

## Related

- [[Software Supply Chain Security]]
- [[Namesquatting]]
- [[Typosquatting]]
- [[Slopsquatting]]
- [[Dependency Confusion]]
- [[Starjacking]]
- [[Software Composition Analysis (SCA)]]
- [[Least Privilege Principle]]
- [[Zero Trust Security]]
- [[AI Skill Supply Chain Security]]
- [[Attack surface]]
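
## Example: name policy check

The name policies described under Registry-level defenses can be sketched in a few lines of Python. This is an illustrative sketch, not any registry's actual implementation. `normalize_name` follows the normalization rule from PEP 503; `edit_distance`, `check_new_name`, and the `max_distance` threshold are hypothetical names and values chosen for the example.

```python
import re


def normalize_name(name: str) -> str:
    """Normalize a project name as PEP 503 describes:
    collapse runs of '-', '_' and '.' into a single '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def check_new_name(candidate, existing, max_distance=1):
    """Hypothetical registry-side check for a newly requested name.

    Returns ("collision", name) if the candidate normalizes to an existing
    name, ("too_similar", name) if it is within max_distance edits of one,
    and ("ok", None) otherwise.
    """
    normalized = normalize_name(candidate)
    by_normalized = {normalize_name(name): name for name in existing}
    if normalized in by_normalized:
        return ("collision", by_normalized[normalized])
    # Sorted iteration keeps the result deterministic.
    for other, original in sorted(by_normalized.items()):
        if edit_distance(normalized, other) <= max_distance:
            return ("too_similar", original)
    return ("ok", None)
```

With `existing = ["requests", "my-package"]`, the candidate `My_Package` is reported as a collision and `requets` as too similar, while a plausible-sounding invented name such as `fast-json-helper-utils` passes. That last case is the limitation noted above: a similarity check has nothing to compare a hallucinated name against.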