FLUX.2: Frontier Visual Intelligence

294 points by meetpateltech 17 hours ago

vunderba 13 hours ago

Updating the GenAI comparison website is starting to feel a bit Sisyphean with all the new models coming out lately, but the results are in for the Flux 2 Pro Editing model!

https://genai-showdown.specr.net/image-editing

It scored slightly higher than BFL's Kontext model, coming in around the middle of the pack at 6 / 12 points.

I’ll also be introducing an additional numerical metric soon, so we can add more nuance to how we evaluate model quality as they continue to improve.

If you're solely interested in seeing how Flux 2 Pro stacks up against the Nano Banana Pro, and another Black Forest model (Kontext), see here:

https://genai-showdown.specr.net/image-editing?models=km,nbp...

Note: It should be called out that BFL seems to support a more formalized JSON structure for more granular edits so I'm wondering if accuracy would improve using it.

echelon 6 hours ago

How much energy does BFL have to keep playing this game against Google and ByteDance (SeeDream)?
If their new fancy model is only middle of the pack, and they're not as open source as the Chinese Qwen image models (or ByteDance / Alibaba / Lightricks video models), what's the point?
It's not just prompt adherence, the image quality of Flux models has been pretty bad. Plastic skin, inhumanly chiseled chins, that general faux "AI" aura.
Indeed, the Flux samples in your test suite that "pass" look God-awful. It might "pass" from a technical standpoint, but there's no way I'd choose Flux to solve my workflows. It looks bad.
(I wonder if they lack people on their data team with good aesthetic taste. It may be as simple as that.)
I think this company is struggling. They're pinned between Google and the Chinese. It's a tough, unenviable spot to be in.
I think a lot of the foundation model companies in media are having a really hard time: RunwayML, PikaLabs, LumaLabs. Some of them have pivoted hard away from solving media for everyone. I don't think they can beat the deep-pocketed hyperscalers or the Chinese ecosystem.
BFL just raised a massive round, so what do I know? I just can't help but feel that even though Runway raised similar money, they're struggling really hard now. And I would really not want to be fighting against Google who is already ahead in the game.
- vunderba 5 hours ago
  
  Sadly, I tend to agree. I'm rooting for BFL, but the results from this latest model (the Pro version, of all things) have just been a bit disappointing. Google’s release of NB Pro last week certainly didn’t help either, since it set the bar so incredibly high.
  Flux 2 Pro only scored a single point higher than the Kontext models they released over half a year ago.
  The text-to-image side was even more frustrating. It often felt like it was actively fighting me, as evidenced by the high number of re-rolls required before it passed some of the tests (Cubed⁵, for example).
- latentspacer 5 hours ago
  
  i may be wrong, but it doesn't seem like BFL is struggling to me. they were apparently founded in august 2024, and have already signed $100M+ revenue deals with customers like meta (https://www.bloomberg.com/news/articles/2025-09-09/meta-to-p...)
  in fact, it seems like BFL has benefited a lot by becoming the go-to alternative for big enterprise customers who don't want to be dependent on google
  - echelon 5 hours ago
    
    Wow, I didn't hear about this. That's impressive, and kudos to the team.
    That's why they raised the massive round, then.
    But this just leads to more questions - I have to wonder if and for how long this is just going to be to plug in a gap for Meta's own AI product offering. At some point they'll want to build their own in-house models or perhaps just acquire BFL. Zuckerberg would not be printing AI data centers if that wasn't the case.
    From a PLG standpoint, Flux isn't really what graphics designers are choosing for their work. The generations look worse than OpenAI's "piss filter". But aesthetics might not be the play the team is going after.
    Hopefully they don't just raise all of this dry powder energy and burn it trying to race Google. They should start listening to designers and get in their good graces if their intent is to build tools for art and graphics design work.
    A good press release would consist of lots of good looking images and a video of workflows that save artists time. This press release doesn't connect with graphics designers at all and it reads as if they aren't even the audience.
    If it's something else, more "enterprise", that BFL is after, then maybe I don't know the strategy or game plan.
    
    latentspacer 2 hours ago
    
    idk it seems pretty clear BFL’s target market is developers not graphic designers. and for developers at scale like Meta and Adobe, it’s pretty incredible a tiny startup like BFL has become the primary alternative to Google with 1/100th of the resources within 12 months of their founding, doing hundreds of millions of revenue
    the Chinese models are great, but no serious enterprise developer is going to bet their image workloads at scale in production on Chinese models if the market evolves anything like past developer infrastructure

tmikaeld an hour ago

Just an FYI, the open source version FLUX.2-DEV cannot be used commercially.

https://huggingface.co/black-forest-labs/FLUX.2-dev/blob/mai...

spyder 15 hours ago

Great, especially that they still have an open-weight variant of this new model too. But what happened to their work on their unreleased SOTA video model? did it stop being SOTA, others got ahead, and they folded the project, or what? YT video about it: https://youtu.be/svIHNnM1Pa0?t=208 They even removed the page of that: https://bfl.ai/up-next/

liuliu 14 hours ago

As a startup, they pivoted and focused on image models (they are model providers, and image models often have more use cases than video models, not to mention they continue to have bigger image dataset moat, not video).
- echelon 6 hours ago
  
  > bigger image dataset moat
  If they have so much data, then why do Flux model outputs look so God-awful bad?
  They have plastic skin, weird chins, and have that "AI" aura. Not the good AI aura, mind you. The cheap automated YouTube video kind that you immediately skip.
  Flux 2 seems to suffer from the exact same problems.
  Midjourney is ancient. Their CEO is off trying to build a 3D volume and dating companion or some nonsense and leaving the product without guidance and much change. It almost feels abandoned. But even so, Midjourney has 10,000x better aesthetics despite having terrible prompt adherence and control. Midjourney images are dripping with magazine spread or Pulitzer aesthetics. It's why Zuckerberg went to them to license their model instead of quasi "open source" BFL.
  Even SDXL looks better, and that's a literal dinosaur.
  Most of the amazing things you see on social media either come from Midjourney or SDXL. To this day.
  - SV_BubbleTime 4 hours ago
    
    >Even SDXL looks better, and that's a literal dinosaur.
    I’m not saying you are wrong in effect, but for reference, SDXL was released just slightly over 2 years ago, and it took about a year to have great fine-tunes.
andersa 13 hours ago

I heard a possibly unsubstantiated rumor that they had a major failed training run with the video model and canceled the project.
- latentspacer 2 hours ago
  
  lol, unless I’m wrong, that is not how model development works
  a ‘major training run’ only becomes major after you sample from it iteratively every few thousand steps, check it's good, fix your pipeline, then continue
  almost by design, major training runs don’t fail
  if I had to guess, like most labs, they’ve probably had to reallocate more time and energy to their image models than expected since the AI image editing market has exploded in size this year, and will do video later
- qoez 13 hours ago
  
  Makes no sense since they should have checkpoints earlier in the run that they could restart from and they should have regular checks that keep track if a model has exploded etc.
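
A minimal sketch of the safeguard described here (periodic checkpoints plus a divergence check), assuming a recorded loss curve stands in for a real training loop. The function name, the checkpoint interval, and the spike threshold are all hypothetical and not anything BFL has described.

```python
import math


def find_restart_point(losses, checkpoint_every=3, spike_factor=3.0):
    """Walk a loss curve and return (last_checkpoint_step, diverged_at).

    A step counts as diverged if the loss is NaN/inf or exceeds
    `spike_factor` times the loss recorded at the last checkpoint.
    `diverged_at` is None when the whole curve looks healthy.
    """
    last_ckpt_step, last_ckpt_loss = 0, losses[0]
    for step, loss in enumerate(losses):
        if not math.isfinite(loss) or loss > spike_factor * last_ckpt_loss:
            # The run has "exploded": resume from the last saved checkpoint.
            return last_ckpt_step, step
        if step > 0 and step % checkpoint_every == 0:
            # Stand-in for saving model and optimizer state to disk.
            last_ckpt_step, last_ckpt_loss = step, loss
    return last_ckpt_step, None


curve = [2.0, 1.8, 1.6, 1.5, 1.4, 1.3, 9.0, float("nan")]
restart_step, diverged_at = find_restart_point(curve)
print(f"diverged at step {diverged_at}, restart from step {restart_step}")
```

A check like this only catches numerical blow-ups. It says nothing about a run that stays stable but never reaches its quality targets.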
  - embedding-shape 12 hours ago
    
    I didn't read "major failed training run" as in "the process crashed and we lost all data" but more like "After spending N weeks on training, we still didn't achieve our target(s)", which could be considered "failing" as well.
    
    echelon 6 hours ago
    
    They could have done what Lightricks did with LTX-1 - build almost embarrassingly small models in the open and iteratively improve from learning.
    LTX's first model felt two years behind SOTA when it launched, but they viewed it as a success and kept going.
    The investment initially is low and can scale with confidence.
    BFL goes radio silent and then drops stuff. Now they're dropping stuff that is clearly middle of the pack.
  - observationist 9 hours ago
    
    There's always a possibility that something implicit to the early model structure causes it to explode later, even if it's a well known, otherwise stable architecture, and you do everything right. A cosmic bit flip at the start of a training run can cascade into subtle instability and eventual total failure, and part of the hard decision making they have to do includes knowing when to start over.
    I'd take it with a grain of salt; these people are chainsaw jugglers and know what they're doing, so any sort of major hiccup was probably planned for. They'd have plan b and c, at a minimum, and be ready to switch - the work isn't deterministic, so you have to be ready for failures. (If you sense an imminent failure, don't grab the spinny part of the chainsaw, let it fall and move on.)
echelon 14 hours ago

Image models are more fundamentally important at this stage than video models.
Almost all of the control in image-to-video comes through an image. And image models still need a lot of work and innovation.
On a real physical movie set, think about all of the work that goes into setting the stage. The set dec, the makeup, the lighting, the framing, the blocking. All the work before calling "action". That's what image models do and must do in the starting frame.
We can get way more influence out of manipulating images than video. There are lots of great video models and it's highly competitive. We still have so much need on the image side.
When you do image-to-video, yes you control evolution over time. But the direction is actually lower in terms of degrees of freedom. You expect your actors or explosions to do certain reasonable things. But those 1024x1024xRGB pixels (or higher) have way more degrees of freedom.
Image models have more control surface area. You exercise control over more parameters. In video, staying on rails or certain evolutionary paths is fine. Mistakes can not just be okay, they can be welcome.
It also makes sense that most of the work and iteration goes into generating images. It's a faster workflow with more immediate feedback and productivity. Video is expensive and takes much longer. Images are where the designer or director can influence more of the outcomes with rapidity.
Image models still need way more stylistic control, pose control (not just ControlNets for limbs, but facial expressions, eyebrows, hair - everything), sets, props, consistent characters and locations and outfits. Text layout, fonts, kerning, logos, design elements, ...
We still don't have models that look as good as Midjourney. Midjourney is 100x more beautiful than anything else - it's like a magazine photoshoot or dreamy Instagram feed. But it has the most lackluster and awful control of any model. It's a 2021-era model with 2030-level aesthetics. You can't place anything where you want it, you can't reuse elements, you can't have consistent sets... But it looks amazing. Flux looks like plastic, Imagen looks cartoony, and OpenAI GPT Image looks sepia and stuck in the 90's. These models need to compete on aesthetics and control and reproducibility.
That's a lot of work. Video is a distraction from this work.
- cubefox 13 hours ago
  
  Hot take: text-to-image models should be biased toward photorealism. This is because if I type in "a cat playing piano", I want to see something that looks like a 100% real cat playing a 100% real piano. Because, unless specified otherwise, a "cat" is trivially something that looks like an actual cat. And a real cat looks photorealistic. Not like a painting, or cartoon, or 3D render, or some fake almost-realistic-but-clearly-wrong "AI style".
  - 85392_school 13 hours ago
    
    FYI: photorealism is art that imitates photos, and I see the term misused a lot both in comments and prompts (where you'll actually get subideal results if you say "photorealism" instead of describing the camera that "shot" it!)
    
    cubefox 11 hours ago
    
    I meant it here in the sense of "as indistinguishable from a photo as the model can make it".
    
    echelon 6 hours ago
    
    "style" is apt for many reasons.
    I've heard chairs of animation departments say they feel like this puts film departments under them as a subset rather than the other way around. It's a funny twist of fate, given that the tables turned on them ages ago.
    Photorealistic models are just learning the rules of camera optics and physics. In other "styles", the models learn how to draw Pixar shaded volumes, thick lines, or whatever rules and patterns and aesthetics you teach.
    Different styles can reinforce one another across stylistic boundaries and mixed data sets can make the generalization better (at the cost of excelling in one domain).
    "Real life", it seems, might just be a filter amongst many equally valid interpretations.
  - minimaxir 13 hours ago
    
    As Midjourney has demonstrated, the median user of AI image generation wants those aesthetic dreamy images.
    
    cubefox 11 hours ago
    
    I think it's more likely this is just a niche that Midjourney has occupied.
    
    loudmax 11 hours ago
    
    If Midjourney is a niche, then what is the broader market for AI image generation?
    Porn, obviously, though if you look at what's popular on civitai.com, a lot of it isn't photo-realistic. That might change as photo-realistic models are fully out of the uncanny valley.
    Presumably personalized advertising, but this isn't something we've seen much of yet. Maybe this is about to explode into the mainstream.
    Perhaps stock-photo type images for generic non-personalized advertising? This seems like a market with a lot of reach, but not much depth.
    There might be demand for photos of family vacations that didn't actually happen, or removing erstwhile in-laws from family photos after a divorce. That all seems a bit creepy.
    I could see some useful applications in education, like "Draw a picture to help me understand the role of RNA." But those don't need to be photo-realistic.
    I'm sure people will come up with more and better uses for AI-generated images, but it's not obvious to me there will be more demand for images that are photo-realistic, rather than images that look like illustrations.
    
    echelon 10 hours ago
    
    > If Midjourney is a niche, then what is the broader market for AI image generation?
    Midjourney is one aesthetically pleasing data point in a wide spectrum of possibilities and market solutions.
    Creator economy is huge and is outgrowing Hollywood and the Music Industry combined.
    There's all sorts of use cases in marketing, corporate, internal comms.
    There are weird new markets. A lot of people simply subscribe to Midjourney for "art therapy" (a legit term) and use it as a social media replacement.
    The giants are testing whether an infinite scroll of 100% AI content can beat human social media. Jury's out, but it might start to chip away at Instagram and TikTok.
    Corporate wants certain things. Disney wants to fine tune. They're hiring companies like MoonValley to deliver tailored solutions.
    Adobe is building tools for agencies and designers. They are only starting to deliver competent models (see their conference videos), and they're going about this a very different way.
    ChatGPT gets the social trend. Ghibli. Sora memes.
    > Porn, obviously, though if you look at what's popular on civitai.com, a lot of it isn't photo-realistic.
    Civitai is circling the drain. Even before the unethical and religious Visa blacklisting, the company was unable to steer itself to a Series A. Stable Diffusion and local models are still way too hard for 99.99% of people and will never see the same growth as a Midjourney or OpenAI that have zero sharp edges and that anyone in the world can use. I'm fairly certain an "OnlyFans but AI" will arise and make billions of dollars. But it has to be so easy a trucker who doesn't learn to code can use it from their 11-year-old Toshiba.
    > Presumably personalized advertising, but this isn't something we've seen much of yet.
    Carvana pioneered this almost five years ago. I'll try to find the link. This isn't going to really take off though. It's creepy and people hate ads. Carvana's use case was clever and endearing though.
    
    echelon 6 hours ago
    
    Carvana AI ads:
    https://www.businesswire.com/news/home/20230509005451/en/Car...
    
    cubefox 10 hours ago
    
    Well, as I said, if I type "cat", the most reasonable interpretation of that text string is a perfectly realistic cat.
    If I want an "illustration" I can type in "illustration of a cat". Though of course that's still quite unspecific. There are countless possible unrealistic styles for pictures (e.g. line art, manga, oil painting, vector art etc), and the reasonable thing is that the users should specify which of these countless unrealistic styles they want, if they want one. If I just type in "cat" and the model gives me, say, a water color picture of a cat, it is highly improbable that this style happens to be actually what I wanted.
    
    observationist 8 hours ago
    
    If I want a badly drawn, salad fingers inspired scrawl of a mangy cat, it should be possible. If I want a crisp, xkcd depiction of a cat, it should capture the vibe, which might be different from a stick fighters depiction of a cat, or "what would it look like if George Washington, using microsoft paint for the first time, right after stepping out of the time machine, tried to draw a cat"
    I think we'll probably need a few more hardware generations before it becomes feasible to use chatgpt 5 level models with integrated image generation. The underlying language model and its capabilities, the RL regime, and compute haven't caught up to the chat models yet, although nano-banana is certainly doing something right.
    

minimaxir 13 hours ago

I just finished my Flux 2 testing (focusing on the Pro variant here: https://replicate.com/black-forest-labs/flux-2-pro). Overall, it's a tough sell to use Flux 2 over Nano Banana for the same use cases, but even if Nano Banana didn't exist it's only an iterative improvement over Flux 1.1 Pro.

Some notes:

- Running my nuanced Nano Banana prompts through Flux 2, Flux 2 definitely has better prompt adherence than Flux 1.1, but in all cases the image quality was worse/more obviously AI generated.

- The prompting guide for Flux 2 (https://docs.bfl.ai/guides/prompting_guide_flux2) encourages JSON prompting by default, which is new for an image generation model that has the text encoder to support it. It also encourages hex color prompting, which I've verified works.

- Prompt upsampling is an option, but it's one that's pushed in the documentation (https://github.com/black-forest-labs/flux2/blob/main/docs/fl...). This does allow the model to deductively reason, e.g. if asked to generate an image of a Fibonacci implementation in Python it will fail hilariously if prompt upsampling is disabled, but get somewhere if it's enabled: https://x.com/minimaxir/status/1993361220595044793

- The Flux 2 API will flag anything tangentially related to IP as sensitive even at its lowest sensitivity level, which is different from Flux 1.1 API. If you enable prompt upsampling, it won't get flagged, but the results are...unexpected. https://x.com/minimaxir/status/1993365968605864010

- Costwise and generation-speed-wise, Flux 2 Pro is on par with Nano Banana, and adding an image as an input pushes the cost of Flux 2 Pro higher than Nano Banana. The cost discrepancy increases if you try to utilize the advertised multi-image reference feature.

- Testing Flux 1.1 vs. Flux 2 generations does not result in objective winners, particularly around more abstract generations.
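
To make the JSON and hex color prompting note above concrete, here is a rough sketch of assembling a structured prompt as a JSON string. The key names (`scene`, `subjects`, `color_palette`) and the helper `build_json_prompt` are illustrative assumptions, not the schema from the BFL guide. Check the linked prompting guide for the fields the model actually expects.

```python
import json
import re

# Six-digit hex colors such as "#1F3A5F".
HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def build_json_prompt(scene, subjects, palette):
    """Serialize a structured prompt; the key names are illustrative only."""
    bad = [color for color in palette if not HEX_COLOR.match(color)]
    if bad:
        raise ValueError(f"not 6-digit hex colors: {bad}")
    prompt = {"scene": scene, "subjects": subjects, "color_palette": palette}
    return json.dumps(prompt, indent=2)


prompt_text = build_json_prompt(
    scene="a cat playing an upright piano in a sunlit living room",
    subjects=[{"description": "tabby cat", "position": "center"}],
    palette=["#1F3A5F", "#F2C14E"],
)
print(prompt_text)
```

The resulting string would be sent as the ordinary text prompt. Validating the hex codes up front avoids spending a paid generation on a malformed palette.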

loudmax 11 hours ago

The fact that you have the possibility of running Flux locally might be enough of an argument to sway the balance for some cases. For example, if you've already set up a workflow and Google jacks up the price, or changes the API, you have no choice but to go along. If BFL does the same, you at least have the option of running locally.
- minimaxir 10 hours ago
  
  Those cases imply commercial workflows that are prohibited with the open-weights model without purchasing a license.
  I am curious to see how the Apache 2.0 distilled variant performs but it's still unlikely that the economics will favor it unless you have a specific niche use case: the engineering effort needed to scale up image inference for these large models isn't zero cost.
- BoorishBears 6 hours ago
  
  Their testing was for the Pro model, which you cannot host locally, and is already not price competitive with Google's offering for the capabilities.
- echelon 6 hours ago
  
  You can run Alibaba's Qwen(Edit) locally too, and the company isn't as weird with its license, weights, or training set.
  I personally prefer Qwen's performance here. I'm waiting to see other folks' takes.
  The Qwen folks are also a lot more transparent, spend time community building, and iterate on releases much more rapidly. In the open rather than behind closed doors.
  I don't like how secretive BFL is.
vunderba 13 hours ago

I've re-run my benchmark with the Flux 2 Pro model and found that in some cases the higher resolution models (I believe Flux 2 Pro handles 4k) can actually backfire on some of the tests because it'll introduce the equivalent of an almost ESRGAN style upscale which may add in unwanted additional details. (See the Constanza test in particular).
https://genai-showdown.specr.net/image-editing
- minimaxir 13 hours ago
  
  That Constanza test result is baffling.
  - vunderba 11 hours ago
    
    Agreed - I was quite surprised. Even though it's a bog-standard 1024x1024 image, the somewhat low quality nature of a TV still provides for an interesting challenge. All the BFL models (Kontext Max and Flux 2 Pro) seemed to struggle hard with it.
babaganoosh89 11 hours ago

Flux 2 Dev is not IP censored
- minimaxir 11 hours ago
  
  Do you have generations contradicting that? The HF repo for the open-weights Flux 2 Dev says that IP filters are in place (and implies it's a violation of the license to do as such)
  EDIT: Seeing a few generations on /r/StableDiffusion generating IP from the open weights model.

jakozaur 14 hours ago

FLUX.1 Pro Kontext was one of the best artistic models, still great at instruction following compared to MidJourney V7.

See my third comparison in Nano Banana blog post: https://quesma.com/blog/nano-banana-pro-intelligence-with-to...

542458 16 hours ago

> Run FLUX.2 [dev] on GeForce RTX GPUs for local experimentation with an optimized fp8 reference implementation of FLUX.2 [dev], created in collaboration with NVIDIA and ComfyUI.

Glad to see that they're sticking with open weights.

That said, Flux 1.x was 12B params, right? So this is about 3x as large plus a 24B text encoder (unless I'm misunderstanding), so it might be a significant challenge for local use. I'll be looking forward to the distill version.

minimaxir 15 hours ago

Looking at the file sizes on the open weights version (https://huggingface.co/black-forest-labs/FLUX.2-dev/tree/mai...), the 24B text encoder is 48GB, the generation model itself is 64GB, which roughly tracks with it being the 32B parameters mentioned.
Downloading over 100GB of model weights is a tough sell for the local-only hobbyists.
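
The file sizes above line up with simple arithmetic, assuming the weights are stored at 16 bits (2 bytes) per parameter. That storage format is an inference from the numbers, not something the thread states. A quick sketch, with a hypothetical helper name:

```python
# Bytes per parameter for a few common storage formats.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp8": 1.0}


def weights_gb(params_billions, dtype="bf16"):
    """Approximate weight size in GB (10^9 bytes), ignoring file metadata."""
    return params_billions * BYTES_PER_PARAM[dtype]


text_encoder_gb = weights_gb(24)   # 48.0
generator_gb = weights_gb(32)      # 64.0
print(f"total download: ~{text_encoder_gb + generator_gb:.0f} GB")
```

Under the same assumption, an fp8 copy of the 32B generation model would be about half that size, roughly 32 GB.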
- zamadatix 14 hours ago
  
  100 GB is less than a game download, it's actually running it that's a tough sell. That said, the linked blog post seems to say the optimized model is both smaller and greatly improved the streaming approach from system RAM, so maybe it is actually reasonably usable on a single 4090/5090 type setup (I'm not at home to test).
- BadBadJellyBean 15 hours ago
  
  Never mind the download size. Who has the VRAM to run it?
  - pixelpoet 14 hours ago
    
    I do, 2x Strix Halo machines ready to go.
- _ache_ 15 hours ago
  
  Even a 5090 can't handle that. You have to use multiple GPUs.
  So the only option will be [klein] on a single GPU... maybe? Since we don't have much information.
  - Sharlin 13 hours ago
    
    As far as I know, no open-weights image gen tech supports multi-GPU workflows except in the trivial sense that you can generate two images in parallel. The model either fits into the VRAM of a single card or it doesn’t. A 5ish-bit quantization of a 32Gw model would be usable by owners of 24GB cards, and very likely someone will create one.
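
A back-of-the-envelope check of the "5ish-bit" figure, assuming only the 32B generation model has to sit in VRAM and reserving a hypothetical 4 GB for activations and overhead. Both function names and the headroom value are assumptions for illustration. Real quantization formats also store scales, so actual files come out slightly larger.

```python
def quantized_gb(params_billions, bits_per_weight):
    """Approximate weight size in GB at a given average bits per weight."""
    return params_billions * bits_per_weight / 8


def max_bits_per_weight(params_billions, vram_gb, headroom_gb=4.0):
    """Largest average bits per weight that fits after reserving headroom."""
    return (vram_gb - headroom_gb) * 8 / params_billions


print(quantized_gb(32, 5))           # 20.0 GB of weights at 5 bits
print(max_bits_per_weight(32, 24))   # 5.0 bits fits a 24 GB card
```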
- crest 11 hours ago
  
  The download is a trivial onetime cost and so is storing it on a direct attached NVMe SSD. The expensive part is getting a GPU with 64GB of memory.

xnx 16 hours ago

Good to see there's some competition to Nano Banana Pro. Other players are important for keeping the price of the leaders in check.

mlnj 14 hours ago

Also happy to see European players doing it.

minimaxir 16 hours ago

Text encoder is Mistral-Small-3.2-24B-Instruct-2506 (which is multimodal) as opposed to the weird choice to use CLIP and T5 in the original FLUX, so that's a good start albeit kinda big for a model intended to be open weight. BFL likely should have held off the release until their Apache 2.0 distilled model was released in order to better differentiate from Nano Banana/Nano Banana Pro.

The pricing structure on the Pro variant is...weird:

> Input: We charge $0.015 for each megapixel on the input (i.e. reference images for editing)

> Output: The first megapixel is charged $0.03 and then each subsequent MP will be charged $0.015
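
The quoted pricing can be written out as a small calculator. This is a sketch under one assumption the quote does not settle: sizes are passed in as whole megapixels, so any rounding of fractional megapixels is left to the caller. The function name is hypothetical.

```python
INPUT_PER_MP = 0.015      # USD per input (reference image) megapixel
OUTPUT_FIRST_MP = 0.03    # USD for the first output megapixel
OUTPUT_EXTRA_MP = 0.015   # USD for each subsequent output megapixel


def flux2_pro_cost(output_mp, input_mps=()):
    """Estimated USD cost of one Flux 2 Pro request.

    output_mp: whole megapixels in the generated image (at least 1).
    input_mps: whole megapixels of each reference image, if any.
    """
    if output_mp < 1:
        raise ValueError("output must be at least one megapixel")
    output_cost = OUTPUT_FIRST_MP + OUTPUT_EXTRA_MP * (output_mp - 1)
    input_cost = INPUT_PER_MP * sum(input_mps)
    return round(output_cost + input_cost, 4)


print(flux2_pro_cost(1))           # text-to-image, 1 MP output
print(flux2_pro_cost(1, (1,)))     # edit with one 1 MP reference image
print(flux2_pro_cost(4, (1, 1)))   # 4 MP output with two references
```

The first output megapixel costs twice as much as every later one, so the per-megapixel price falls as resolution goes up.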

woadwarrior01 15 hours ago

> BFL likely should have held off the release until their Apache 2.0 distilled model was released in order to better differentiate from Nano Banana/Nano Banana Pro.
Qwen-Image-Edit-2511 is going to be released next week. And it will be Apache 2.0 licensed. I suspect that was one of the factors in the decision to release FLUX.2 this week.
- minimaxir 15 hours ago
  
  Fair point.
kouteiheika 15 hours ago

> as opposed to the weird choice to use CLIP and T5 in the original FLUX
Yeah, CLIP here was essentially useless. You can even completely zero the weights through which the CLIP input is ingested by the model and it barely changes anything.
beernet 16 hours ago

Nice catch. Looks like engineers tried to take care of the GTM part as well and (surprise!) messed it up. In any case, the biggest loser here is Europe once again.
throwaway314155 16 hours ago

> as opposed to the weird choice to use CLIP and T5 in the original FLUX
This method was used in tons of image generation models. Not saying it's superior or even a good idea, but it definitely wasn't "weird".

Yokohiii 15 hours ago

18gb 4 bit quant via diffusers. "low vram setup" :)

visioninmyblood 14 hours ago

The model looks good for an open source model. I want to see how these models are trained. maybe they have a base model from academic datasets and quickly fine-tune with models like nano banana pro or something? That could be the game for such models. But great to see an open source model competing with the big players.

anjneymidha 14 hours ago

they released a research post on how the new model's VAE was trained here: https://bfl.ai/research/representation-comparison
- E-Reverance 14 hours ago
  
  Surprised there wasn't any mention of Equilibrium Matching [1] in the future work section
  [1] https://raywang4.github.io/equilibrium_matching/
- visioninmyblood 14 hours ago
  
  great this is more on the technical details. it is great but would be great to see the data. I know they will not expose such information but would be great to have a visibility into the datasets and how the data was sourced.

AmazingTurtle 15 hours ago

I ran "family guy themed cyberpunk 2077 ingame screenshot, peter griffin as main character, third person view, view of character from the back" on both nano banana pro and bfl flux 2 pro. The results were staggering. The google model aligned better with the cyberpunk ingame scene, flux was too "realistic"

Yokohiii 11 hours ago

i think they focus their dataset on photography. flux 1 dev was never really great at artistic style, mostly locking you into a somewhat generic style. my little flux 2 pro testing does seem to verify that. but with lora ecosystem and enough time to fiddle flux 1 dev is probably still the best if you want creative stylistic results.

notrealyme123 15 hours ago

> The FLUX.2 - VAE is available on HF under an Apache 2.0 license.

anyone found this? To me the link doesn't lead to the model

minimaxir 14 hours ago

There is no repo for the VAE on Hugging Face yet which implies it's not up yet: https://huggingface.co/black-forest-labs/models?sort=created
- abidlinker 14 hours ago
  
  It's here: https://huggingface.co/black-forest-labs/FLUX.2-dev/tree/mai...
  - minimaxir 14 hours ago
    
    That's a subfolder of the non-Apache 2.0 repo so it can't be used as if it was, for now.

cedar-v an hour ago

Rather niche.

geooff_ 15 hours ago

Their published benchmarks leave a lot to be desired. I would be interested in seeing their multi-image performance vs. Nano Banana. I just finished up benchmarking Image Editing models and while Nano Banana is the clear winner for one-shot editing it's not great at few-shot.

minimaxir 15 hours ago

The issue with testing multi-image with Flux is that it's expensive due to its pricing scheme ($0.015 per input image for Flux 2 Pro, $0.06 per input image for Flux 2 Flex: https://bfl.ai/pricing?category=flux.2) while the cost of adding additional images is negligible in Nano Banana ($0.000387 per image).
In the case of Flux 2 Pro, adding just one image increases the total cost to be greater than a Nano Banana generation.
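
A rough sketch of how the gap grows with the number of reference images. The per-input prices are the ones quoted above and the $0.03 Flux 2 Pro base is the first-output-megapixel price from BFL's pricing. The $0.039 base price for a Nano Banana generation is an assumption that is not stated in this thread.

```python
FLUX2_PRO_BASE = 0.03             # first output megapixel
FLUX2_PRO_PER_INPUT = 0.015       # per (roughly 1 MP) input image
NANO_BANANA_BASE = 0.039          # ASSUMPTION: not stated in the thread
NANO_BANANA_PER_INPUT = 0.000387  # per input image, as quoted above


def edit_cost(base, per_input, n_inputs):
    """Cost in USD of one generation with n_inputs reference images."""
    return base + per_input * n_inputs


for n in range(5):
    flux = edit_cost(FLUX2_PRO_BASE, FLUX2_PRO_PER_INPUT, n)
    banana = edit_cost(NANO_BANANA_BASE, NANO_BANANA_PER_INPUT, n)
    print(f"{n} refs: Flux 2 Pro ${flux:.4f} vs Nano Banana ${banana:.4f}")
```

Under these numbers Flux 2 Pro is cheaper with no reference images and more expensive as soon as one is added.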

bossyTeacher 12 hours ago

Genuine question, does anyone use any of these text to image models regularly for non trivial tasks? I am curious to know how they get used. It literally seems like there is a new model reaching the top 3 every week

ThrowawayTestr 9 hours ago

I use them to generate very niche porn

DeathArrow 14 hours ago

We probably won't be able to run it on regular PCs, even with a 5090. So I am curious how good the results will be using a quantized version.

shikon7 an hour ago

You can run it with a 5090 and the standard ComfyUI template, it just offloads some parts to RAM. Image generation takes about a minute for sizes like 1024x1024.

echelon 15 hours ago

> Launch Partners

Wow, the Krea relationship soured? These are both a16z companies and they've worked on private model development before. Krea.1 was supposed to be something to compete with Midjourney aesthetics and get away from the plastic-y Flux models with artificial skin tones, weird chins, etc.

This list of partners includes all of Krea's competitors: HiggsField (current aggregator leader), Freepik, "Open"Art, ElevenLabs (which now has an aggregator product), Leonardo.ai, Lightricks, etc. but Krea is absent. Really strange omission.

I wonder what happened.

dvrp 9 hours ago

They messed up. We (Krea) were also surprised.
They put our logo after we pointed it out.
Nice eye!

DeathArrow 15 hours ago

If this is still a diffusion model, I wonder how well it compares with NanoBanana.

liuliu 9 hours ago

There is no reason to believe Gemini Image is not a diffusion model. In fact, the generated results suggest it at least has a VAE and very likely is a diffusion model variant. (Most likely a transfusion model).

eric-p7 16 hours ago

Yes yes very impressive.

But can it still turn my screen orange?


beernet 16 hours ago

Oh, looks like someone had to release something very quickly after Google came for their lunch. Their little 15 mins is over already for BFL as it seems.

whywhywhywhy 16 hours ago

comparing a closed image model to an open one is like comparing a compiled closed source app to raw source code.
it's pointless to compare in pure output when one is set in stone and the other can be built upon.
- beernet 15 hours ago
  
  Did you guys even check the licence? Not sure what is "open source" about that. Open weights at the very best, yet highly restrictive
  - gunalx 10 hours ago
    
    Yep, definitely this. They should get credit for open weights, and for being transparent about it not being open source though. People should stop being this confused when the messaging is pretty clear.
timmmmmmay 16 hours ago

yeah except I can download this and run it on my computer, whereas Nano Banana is a service that Google will suddenly discontinue the instant they get bored with it