# Gemini 2.0 Flash
Google's latest AI model (announced December 2024), built "for the agentic era".
What's interesting about Gemini 2.0 is that it's a multi-modal [[Large Language Models (LLMs)|LLM]]: it can handle a full range of inputs, including text, documents, images, video, and audio. It also supports streaming, which enables it to receive, understand, and react to various kinds of input in real time (including live video!).
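A minimal sketch of a mixed text-and-image request using the `google-generativeai` Python SDK; the model identifier `gemini-2.0-flash-exp`, the API key placeholder, and the file name are assumptions, so adjust them to your setup:

```python
import google.generativeai as genai
import PIL.Image

# Placeholder key; use your own Gemini API key.
genai.configure(api_key="YOUR_API_KEY")

# "gemini-2.0-flash-exp" was the experimental identifier at launch.
model = genai.GenerativeModel("gemini-2.0-flash-exp")

# A single request can mix modalities: here, text plus an image.
image = PIL.Image.open("photo.jpg")
response = model.generate_content(
    ["Describe what is happening in this photo.", image]
)
print(response.text)
```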
Gemini 2.0 is also able to return bounding boxes for objects within an image, which enables spatial-understanding scenarios such as object detection and localization.
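A hedged sketch of how this can look in practice: Google's examples have the model return coordinates as `[ymin, xmin, ymax, xmax]` normalized to a 0-1000 scale, but the exact response shape can vary, so real code should parse defensively. The image file and prompt wording here are illustrative assumptions:

```python
import json
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-exp")

image = PIL.Image.open("street.jpg")
response = model.generate_content([
    "Return bounding boxes for every car in this image as JSON: "
    '[{"label": ..., "box_2d": [ymin, xmin, ymax, xmax]}]. '
    "Coordinates are normalized to 0-1000.",
    image,
])

# Strip markdown code fences if the model wrapped its JSON in them.
raw = response.text.strip().removeprefix("```json").removesuffix("```")
width, height = image.size
for obj in json.loads(raw):
    ymin, xmin, ymax, xmax = obj["box_2d"]
    # Scale the 0-1000 normalized coordinates back to pixel space.
    print(obj["label"], (xmin * width // 1000, ymin * height // 1000,
                         xmax * width // 1000, ymax * height // 1000))
```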
It can also read, write, and (via its built-in code execution tool) run code.
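A minimal sketch of enabling code execution, assuming the `tools="code_execution"` flag that the `google-generativeai` SDK documents for earlier Gemini models also applies here:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Enabling the built-in code execution tool lets the model write and run
# Python in a sandbox and feed the result back into its answer.
model = genai.GenerativeModel("gemini-2.0-flash-exp", tools="code_execution")

response = model.generate_content(
    "What is the sum of the first 50 prime numbers? "
    "Write and run code to compute it."
)
# The response interleaves the generated code and its execution result.
print(response.text)
```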
## References
- Announcement: https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/
- Realtime streaming: https://aistudio.google.com/live
- Simon Willison's first insights: https://simonwillison.net/2024/Dec/11/gemini-2/
- Introduction videos
- https://www.youtube.com/watch?v=7RqFLp0TqV0
- https://www.youtube.com/watch?v=qE673AY-WEI
- Multi-modal Live API demos
- https://www.youtube.com/watch?v=J_q7JY1XxFE
- https://www.youtube.com/watch?v=J62TUCRapR8
- https://www.youtube.com/watch?v=n8Dz2GA2hDc
- https://www.youtube.com/watch?v=9hE5-98ZeCg
- Native tool use: https://www.youtube.com/watch?v=EVzeutiojWs
- Native audio output: https://www.youtube.com/watch?v=qE673AY-WEI
- Spatial understanding: https://www.youtube.com/watch?v=-XmoDzDMqj4
- Behind the scenes: https://www.youtube.com/watch?v=L7dw799vu5o