# DuckDB
DuckDB is an in-process analytical (OLAP) database, often described as the analytical counterpart to [[SQLite]]. Created at CWI in Amsterdam by Mark Raasveldt and Hannes Mühleisen and first released in 2019, it embeds directly inside the host application through client APIs (Python, R, Node.js, Java, Rust, C++) or runs as a standalone CLI. It runs columnar queries on local files (CSV, Parquet, JSON, Arrow) and stores its own data in a single columnar file. No server, no daemon, no configuration.
## Positioning
- **Like [[SQLite]]**: zero-configuration, single-file, embedded, freely usable (DuckDB is MIT-licensed; SQLite is in the public domain), with a stable public API
- **Unlike [[SQLite]]**: column-oriented and vectorized, designed for analytical (read-heavy, aggregation-heavy) workloads instead of transactional (point-lookup, single-row update) workloads
- **Like Pandas**: lives in the same process as your analysis code; no network round-trips
- **Unlike Pandas**: backed by a real query optimizer, parallel execution, and out-of-core processing for datasets larger than RAM
The shorthand: SQLite is for OLTP, DuckDB is for OLAP. They are siblings, not competitors.
## Why It Took Off
Analysts and data scientists were caught between two worlds: lightweight tools (Pandas, R data frames) that struggle once the data no longer fits comfortably in memory, and heavyweight warehouses (Snowflake, BigQuery, Redshift) that bring infrastructure, latency, and cost they didn't want for exploratory work. DuckDB filled the gap: you can run SQL over a multi-gigabyte Parquet file on a laptop, often in seconds, using much the same query you'd run in production.
## Core Capabilities
- **Vectorized columnar execution** with parallel multi-core query plans
- **Native readers** for Parquet, CSV, and JSON, plus Arrow integration; Iceberg, Delta Lake, and remote HTTP/S3 access are provided through extensions
- **Direct query** of in-memory Pandas/Polars/Arrow data frames, zero-copy where possible
- **Streaming and out-of-core**: works on datasets larger than RAM via spilling to disk
- **Extensions**: full-text search, spatial, HTTP, JSON, vector similarity, and many community extensions
- **Stable storage format** with backwards compatibility (since v1.0 in 2024)
## Common Use Cases
- Local data exploration without spinning up a warehouse
- ETL and data transformation pipelines (often as a Pandas/Spark replacement for medium data)
- Embedded analytics inside applications (web apps, notebooks, CLIs)
- Querying remote Parquet/Iceberg lakes directly without a separate query engine
- The "small data" backend for dashboards and BI prototypes
## Trade-offs
- Single-writer, like SQLite: only one process can open a database file for writing at a time, so it is not designed for high-concurrency write workloads
- No built-in network protocol; embedded-only by design (the project explicitly does not want to become a server)
- Younger than SQLite; ecosystem is growing but smaller than the SQL warehouse incumbents
## References
- Official site: https://duckdb.org/
- Documentation: https://duckdb.org/docs/
- GitHub: https://github.com/duckdb/duckdb
## Related
- [[SQLite]]
- [[Database]]
- [[Database Management Systems (DBMS)]]
- [[Relational Databases (RDBMS)]]
- [[SQL]]
- [[PostgreSQL]]