Gemini - DeveloPassion

# Gemini

Google's main AI model series, "for the agentic era".

Gemini 2+ models are multi-modal [[Large Language Models (LLMs)]]. They can handle a full range of multi-modal inputs: text, documents, images, video, and audio. They also support streaming, which enables them to receive, analyze, understand, and react to various kinds of inputs (including live video!).

They're also able to return bounding boxes for objects within an image, which enables spatial-understanding scenarios (see the spatial understanding video in the references).

They can also read, write, and execute code. Gemini 2.5 Pro is especially good at it. Importantly, Gemini 2.5 Pro also has a HUGE context window (1M tokens!), which sets it apart.

There are different variants of Gemini available (see the model variants link in the references). Gemini is the LLM behind [[NotebookLM]].

## References

- Model variants: https://ai.google.dev/gemini-api/docs/models
- AI Studio: https://aistudio.google.com
- Realtime streaming: https://aistudio.google.com/live
- Simon Willison's first insights: https://simonwillison.net/2024/Dec/11/gemini-2/
- Introduction videos
  - https://www.youtube.com/watch?v=7RqFLp0TqV0
  - https://www.youtube.com/watch?v=qE673AY-WEI
- Multi-modal Live API demos
  - https://www.youtube.com/watch?v=J_q7JY1XxFE
  - https://www.youtube.com/watch?v=J62TUCRapR8
  - https://www.youtube.com/watch?v=n8Dz2GA2hDc
  - https://www.youtube.com/watch?v=9hE5-98ZeCg
- Native tool use: https://www.youtube.com/watch?v=EVzeutiojWs
- Native audio output: https://www.youtube.com/watch?v=qE673AY-WEI
- Spatial understanding: https://www.youtube.com/watch?v=-XmoDzDMqj4
- Behind the scenes: https://www.youtube.com/watch?v=L7dw799vu5o

## Related

- [[Google AI Studio]]
- [[NotebookLM]]
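
## Example: using returned bounding boxes

The bounding boxes mentioned above are only useful once they are mapped back onto the image. Below is a minimal sketch of that step. It assumes the model returns each box as `[ymin, xmin, ymax, xmax]` with values normalized to a 0-1000 range; that layout is an assumption to check against the current Gemini API docs. The helper name `to_pixel_box` is hypothetical and not part of any SDK.

```python
from typing import Sequence, Tuple

# Assumption: coordinates are normalized to the range 0..1000.
NORMALIZED_MAX = 1000


def to_pixel_box(
    box: Sequence[float], image_width: int, image_height: int
) -> Tuple[int, int, int, int]:
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel
    coordinates, returned as (left, top, right, bottom)."""
    if len(box) != 4:
        raise ValueError("expected exactly four coordinates")
    ymin, xmin, ymax, xmax = box
    # Scale the x values by the image width and the y values by its height.
    left = round(xmin * image_width / NORMALIZED_MAX)
    top = round(ymin * image_height / NORMALIZED_MAX)
    right = round(xmax * image_width / NORMALIZED_MAX)
    bottom = round(ymax * image_height / NORMALIZED_MAX)
    return left, top, right, bottom


if __name__ == "__main__":
    # A box covering the middle of a 1000x500 image.
    print(to_pixel_box([100, 200, 500, 800], 1000, 500))
```

The `(left, top, right, bottom)` order matches what most image libraries expect for cropping or drawing rectangles, so the result can be passed along without reordering.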