"Open-weight" (or "open-source") just means the company gives away the actual AI for free, so anyone can download it & run it on their own computers. The opposite is a "closed" model, which you can only rent through an app or website; you never get the thing itself; you can just use it on their website. Most of the launches below are the free, downloadable kind.
(1) DiffusionGemma: Google bets text shouldn't be typed left to right
Most AI writes the way you text: one word, then the next, then the next, forever. Google's DiffusionGemma (released June 10) drafts a whole chunk of text at once, starting with a blurry mess and sharpening it into clear sentences, the way an old Polaroid photo slowly develops.
Why? Because it is up to 4x faster. To make that concrete: it pumps out over 1,000 words' worth of text per second on a high-end data-center graphics card (an NVIDIA H100), and 700+ per second on a top gaming card (an RTX 5090, the kind serious gamers own). And it's small enough to fit on that gaming card: the model is built so only a fraction of its "brain" switches on for any given task, which keeps it light. DiffusionGemma is free to download right now on Hugging Face.
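One way to see where the speed can come from is to count how many times the model has to run. The sketch below is a toy comparison, not DiffusionGemma's actual algorithm: the block size and the number of sharpening steps are made-up values chosen for illustration.

```python
import math


def left_to_right_passes(num_tokens: int) -> int:
    """A word-by-word writer runs the model once per token."""
    return num_tokens


def block_drafting_passes(num_tokens: int, block_size: int, refine_steps: int) -> int:
    """A chunk-at-a-time writer runs the model refine_steps times per block."""
    blocks = math.ceil(num_tokens / block_size)
    return blocks * refine_steps


# Hypothetical settings, NOT published DiffusionGemma values.
tokens = 1024
sequential = left_to_right_passes(tokens)
drafted = block_drafting_passes(tokens, block_size=256, refine_steps=32)

print(sequential, drafted)
```

Fewer passes does not translate one-to-one into speed, because each pass over a whole block does more work than a pass that produces a single word. The point is only that the number of passes stops growing with every extra word.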
The catch nobody's putting on the marketing slide
The trade-off: DiffusionGemma's writing quality is actually lower than that of Google's normal model. This isn't a "best model ever"; it is a fast tool for specific jobs such as editing code on the fly or producing rapid drafts. Someone even tuned it to solve Sudoku puzzles, which normal AI is bad at (each square depends on squares it hasn't filled in yet, whereas this model considers the whole grid at once). Those are the real use cases; it is not going to replace any chatbots.
(2) Kimi K2.7-Code: a huge coding model that overthinks less
A Chinese company called Moonshot AI dropped Kimi K2.7-Code on June 12, and the size is intimidating: 1 trillion "parameters," the internal knobs a model tunes as it learns. A trillion is enormous. Here's the clever part: it only switches on about 32 billion of those knobs at a time (think of a massive company where only the relevant specialists clock in for each task), so it's far cheaper to run than its giant size suggests. It can also read images and video, and it can hold roughly a couple of long novels' worth of text in its working memory at once.
K2.7-Code does its job using about 30% less "thinking" than the previous version: it reaches good answers with less rambling, and that saves you money. On a standard coding test it scores 62 out of 100, versus around 67–69 for the top closed models. Still behind the leaders, but closing in, and it costs roughly 5x less to use. (For the technically curious: that's about $0.95 to feed it a million words of input and $4.00 for a million words of output.)
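The "only the relevant specialists clock in" idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not Moonshot's routing code: the eight "specialists," their scores, and the choice of two per task are all invented here. Only the 1 trillion and 32 billion figures come from the text above.

```python
def pick_specialists(scores: list[float], k: int) -> list[int]:
    """Return the indices of the k highest-scoring specialists."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion, from the article
ACTIVE_PARAMS = 32_000_000_000     # about 32 billion, from the article

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Hypothetical relevance scores for 8 specialists on one task.
scores = [0.1, 0.7, 0.05, 0.9, 0.3, 0.2, 0.6, 0.4]
chosen = pick_specialists(scores, k=2)

print(f"{active_fraction:.1%} of the model is active")
print("specialists on duty:", chosen)
```

With the article's numbers, roughly 3% of the model does the work on any given step, which is why the running cost tracks the 32 billion figure rather than the trillion.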
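To turn those prices into a bill, here is a small calculator. The two rates are the ones quoted above; the job size is hypothetical, and applying the 30% "less thinking" figure to all billed output is an assumption made for illustration. The article quotes prices per million words, while APIs usually bill per token, so the code just calls them "units."

```python
INPUT_PRICE_PER_MILLION = 0.95    # USD, as quoted in the article
OUTPUT_PRICE_PER_MILLION = 4.00   # USD, as quoted in the article


def job_cost(input_units: int, output_units: int) -> float:
    """Return the cost in USD of one job at the quoted rates."""
    input_cost = input_units / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_units / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical job: read 200,000 units of code, write 50,000 units back.
cost = job_cost(200_000, 50_000)

# Same job if the model produced 30% less output (assumption, see above).
trimmed_cost = job_cost(200_000, 35_000)

print(f"${cost:.2f} vs ${trimmed_cost:.2f}")
```

Because output is priced about four times higher than input, trimming the model's own rambling moves the bill more than trimming what you feed it.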
The "you can't really run it at home" reality check
"Free to download" doesn't mean "free to run." Even a shrunk-down version of this model is a 340GB file and needs a small server's worth of memory, and computer memory got pricey in 2026. So for nearly everyone, the real win is renting it cheaply ($19/month for the basic plan), not hosting it in your closet. What you are really buying is a cheaper option and the freedom to switch, not a server in your garage.
Still, if you want to download Kimi K2.7-Code yourself, it is available on Hugging Face.
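The 340GB figure mentioned above is easier to picture with some back-of-the-envelope arithmetic. This sketch assumes file size is simply parameters times bits per parameter, which ignores metadata and any parts stored at higher precision. The 16-bit baseline is an assumption about a typical unshrunk format, not a published number for this model.

```python
def file_size_gb(params: int, bits_per_param: float) -> float:
    """Approximate file size in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def implied_bits_per_param(params: int, size_gb: float) -> float:
    """Work backwards from a file size to bits stored per parameter."""
    return size_gb * 1e9 * 8 / params


PARAMS = 1_000_000_000_000  # 1 trillion, from the article

full_size = file_size_gb(PARAMS, 16)             # assumed 16-bit baseline
shrunk_bits = implied_bits_per_param(PARAMS, 340)  # 340GB, from the article

print(f"16-bit size: about {full_size:.0f} GB")
print(f"340 GB implies about {shrunk_bits:.2f} bits per parameter")
```

Under these assumptions, the shrunk-down file stores fewer than 3 bits per parameter, and it still will not fit in the memory of an ordinary home computer.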
(3) MiniMax M3: a million tokens without a giant bill
MiniMax M3 (June 1) is another big free model, and its party trick is memory. It can take in roughly a million "tokens" at once; that is 700,000+ words, or several full-length books, in a single go. Normally, holding that much text would cost a fortune in computing power, because the AI re-reads everything against everything else.
MiniMax's fix (they call it "sparse attention") is basically letting the AI use a table of contents instead of re-reading the whole library every time it answers. The result: it handles that million-token memory at roughly one-twentieth the computing cost of its previous version, and runs about 10–15x faster on long documents. It can also see images and video and even operate a computer. On a coding test the company ran itself, it scored 59%, beating a couple of big closed names on that particular test.
One caveat: as with other Chinese-hosted models, anything you send through its app is subject to Chinese law. If you handle sensitive data, think twice before pasting it into the chat.
If you want to try MiniMax M3, it is available on Hugging Face, and MiniMax also maintains an official GitHub repository for it.
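To make the table-of-contents idea from this section concrete, here is a toy count of how many comparisons each approach needs. It is a sketch of one generic way to do sparse attention, not MiniMax's actual design: the chapter size and the number of chapters read per word are invented, and the result is not meant to reproduce the one-twentieth figure, which compares against MiniMax's previous version.

```python
import math


def reread_everything(n_tokens: int) -> int:
    """Every token is compared against every token."""
    return n_tokens * n_tokens


def table_of_contents(n_tokens: int, chapter_size: int, chapters_read: int) -> int:
    """Each token scans one summary per chapter, then reads a few chapters fully."""
    n_chapters = math.ceil(n_tokens / chapter_size)
    per_token = n_chapters + chapters_read * chapter_size
    return n_tokens * per_token


n = 1_000_000  # roughly a million tokens, from the article

# Hypothetical settings: 1,000-token chapters, 16 chapters read in full.
full = reread_everything(n)
sparse = table_of_contents(n, chapter_size=1_000, chapters_read=16)

print(f"full: {full:,} comparisons")
print(f"sparse: {sparse:,} comparisons")
```

The full approach grows with the square of the text length, so doubling the text quadruples the work. The table-of-contents approach grows far more slowly, which is what makes a million-token memory affordable.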
(4) Nex-N2: the AI that knows when to keep its mouth shut
Nex-N2 comes in two sizes (a bigger "Pro" and a smaller "Mini"), and its main trick is knowing when to think hard. Lots of AIs "think out loud" on every single question, which is slow and expensive. Nex-N2 decides for itself when a question actually needs deep thought and when a quick answer will do, and that saves about 20% on running costs without making it dumber. Handy, and it's free to use right now on a few platforms.
(5) GLM-5.2: the launch that landed one day after a ban
A Chinese company called Z.ai announced GLM-5.2 on June 13, just one day after the US government blocked global access to one of Anthropic's most powerful Claude models. Z.ai's founder called the block "deeply regrettable," then quietly pointed everyone toward his free alternative.
GLM-5.2 can also hold about a million words in memory, and it is built for coding. It went live instantly for paying subscribers, with the free download promised a week later. The awkward part for buyers: the company published zero test scores at launch. None. So for now, treat it as "available and interesting," not "proven"; independent results are still pending.
Building with these right now? Avidclan Technologies can help: our AI experts have delivered many projects and can support yours too.
Video, motion, and 3D got genuinely wild
This is where June got fun. A pile of research projects, mostly from universities with a few from big companies, shipped models that take a flat photo or an everyday video and give back something you can move through, relight, or animate.
3D worlds from a single photo
(1) World Tracing: it imagines what the camera couldn't see
World Tracing, from a startup called World Labs and the University of Illinois (with one of the inventors of a famous 3D-imaging technique on the team), turns a single photo into a 3D scene, and here is the cool part: it guesses the parts you can't see. The back of a chair. The wall behind a lamp. Stuff the camera never captured. It works on single objects, full rooms, and short video clips. There is a free demo online you can poke at.
arXiv: World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible
(2) MoVerse: a walkable world running on a gaming PC
MoVerse takes one photo and builds a 3D space you can actually roam around in, smoothly, in real time, running on a single high-end gaming graphics card, not a data center. How do they do this? By doing the slow part (building the world) once, then making the fun part (walking around) cheap and instant.
(3) AnchorWorld: puts you inside the scene, first-person
AnchorWorld flips the camera around to you. It creates a first-person world (think GoPro view) that responds to how a real person moves, and you can even change the scene by typing instructions.
(4) Surflo: turns your messy snapshots into one clean 3D model
Surflo solves the annoying part of 3D scanning: you took a bunch of photos from random angles, no special setup, and you want one clean 3D model. Surflo stitches those random shots into a single coherent object, and the more photos you add, the more the gaps fill in neatly instead of clashing.
Motion and characters you can actually control
(1) StreamForce: a "physics joystick" for video
StreamForce lets you steer a video as it is being made: you can give an object a push or add a gust of wind instead of planning every frame in advance. And it figured out real-world physics on its own: a glass of milk slides more slowly than an empty glass under the same push, things bounce believably, and friction matters. Nobody programmed those rules in. It just learned them. It runs smoothly enough for real-time tinkering.
(2) SCAIL-2: animates characters without the usual rigging
Most tools that make a character copy someone's movement rely on a stick-figure skeleton as a middle step, which gets glitchy in busy scenes. SCAIL-2 skips that middleman and copies motion directly from one video onto a character, handling crowds, character swaps, and body-shape changes more cleanly. Fewer weird, twisted-limb mistakes.
(3) Flex4DHuman and VideoMDM: cheaper ways to capture human motion
Two projects cut the cost of capturing how people move. Flex4DHuman rebuilds a moving person as a full 3D-plus-time model from ordinary video - no expensive motion-capture suits or depth sensors. And VideoMDM learns realistic 3D human motion purely by watching normal 2D videos, then lets you generate new movement from a simple text description. The pricey mocap studio just got a little less essential.
3D objects, digital twins, and longer videos
(1) MeshFlow: usable 3D models in about a second
MeshFlow, from Meta and a Hong Kong university, builds clean, artist-quality 3D objects (the kind games and animators actually use) in about one second, roughly 18x faster than older methods. It builds the whole object at once instead of piece by piece. Feed it a text prompt, a photo, or a rough point-cloud scan, and out comes proper geometry. There is a free demo.
(2) WorldString: digital twins that actually bend and move
WorldString, from a group of top universities and a chip maker, makes "digital twins" of real objects - not frozen scans, but copies that move realistically. It captures how a robot hand's joints work, how a body kicks a ball, how a squishy earphone deforms. A controllable virtual copy, not a statue.
(3) MilliVid: long AI videos that stop falling apart
MilliVid, from MIT and Toyota's research arm, fixes a real headache: AI videos tend to drift into nonsense after a few seconds, with faces changing and scenes morphing. MilliVid keeps things consistent for much longer by sketching the big picture first (where everything is) and then adding the fine detail (textures, surfaces). The unglamorous fix the whole field needed.
Voice and speech translation
(1) Gemini 3.5 Live Translate: translation without the awkward pauses
Google's Gemini 3.5 Live Translate (June 9) does near-instant speech translation inside Google Translate and Google Meet. It works continuously as you talk, so conversations don't stall into that stilted walkie-talkie back-and-forth. It handles 70+ languages and even keeps your own voice and tone, so the translation still sounds like you, not a robot reading subtitles.
(2) dots.tts: a small voice AI that can whisper and stutter on cue
dots.tts, from a Chinese social-media company's lab, is a small, free text-to-speech model: it turns written words into spoken audio. It can copy a new voice from just a short sample, speak 24 languages, and add genuinely human touches - whispering, stuttering, emotion. It scores at or near the top on standard voice-quality tests, and there is a free demo.
Image generation: Princeton's "show your work" recipe
(1) i1: not the prettiest, but the most honest
i1, from Princeton, is an AI that makes images from text prompts. It is competitive with other free image models, but that is not why it matters. Princeton released the full training data, the code, and the entire recipe - which is rare, because most "free" image AIs hide exactly how they were built. If you actually want to learn how a modern image generator is made from the ground up, this is the open textbook.
Quick reference: every tool and where to find it
| Tool | What it does | Made by | Official link |
| --- | --- | --- | --- |
| DiffusionGemma | Fast text writing | Google | link |
| Kimi K2.7-Code | Coding AI | Moonshot AI | link |
| MiniMax M3 | Big-memory AI | MiniMax | link |
| Nex-N2 | Efficient reasoning AI | Nex AGI | link |
| GLM-5.2 | Coding AI | Z.ai | link |
| Gemini 3.5 Live Translate | Live speech translation | Google | link |
| dots.tts | Text-to-speech | RedNote lab | link |
| SCAIL-2 | Character animation | Tsinghua / Z.ai | link |
| WorldString | Moving digital twins | Tsinghua / NVIDIA | link |
| StreamForce | Steerable video | Northeastern et al. | link |
| World Tracing | 3D from one photo | World Labs / UIUC | link |
| Flex4DHuman | 3D person from video | - | link |
| VideoMDM | 3D motion from 2D video | Technion / NVIDIA | link |
| Surflo | 3D from random photos | Polytechnique / Berkeley | link |
| MoVerse | Walkable world from a photo | Orange Team / Youku | link |
| AnchorWorld | First-person worlds | Kuaishou / Tsinghua | link |
| MeshFlow | Fast 3D objects | Meta / HKUST | link |
| MilliVid | Longer consistent video | MIT / Toyota | link |
| OSCAR | Robot practice simulator | Peking U / NVIDIA | link |
| Agents' Last Exam | AI job-readiness test | Berkeley | link |
| Arbor | Automated researcher | Renmin University | link |
| Luma Agents | All-in-one creative assistant | Luma AI | link |
| i1 | Image generator (fully open) | Princeton | link |
So that sums up the month. Free, downloadable AIs, most of them from China, are shipping faster than anyone can test them; "turn one photo into a 3D world" went from sci-fi to crowded almost overnight; and the price gap with the big-name closed models keeps shrinking even when the quality gap does not.
FREQUENTLY ASKED QUESTIONS (FAQs)
