ALLTaken 2 days ago

Not just being impressed that every paper coming out is SOTA, but also that they lead the way in being open source in the pure definition of OSS, even with permissive licensing.

Let's not confuse the company with the country by over-fitting a narrative. Popular media reinforces hatred, or anything that sponsors them, especially toward weaker groups. Fewer repercussions and more clicks/money to be made, I guess.

While politicians may hate each other, scientists love to work with other aspiring scientists who have similar ambitions, and the only competition is in achieving measurable success and what that reward means to the greater public.

Without any bias: it's genuinely admirable when companies release their sources to enable faster scientific progress cycles. It's ironic that this company is dedicated to finance, yet shares its progress, while non-profits and companies dedicated purely to AI lock away all knowledge of their findings.

Are there other companies like DeepSeek that you know of that commonly release great papers? I am following Mistral already, but I'd love to enrich my sources of publications that I consume. Highly appreciated!

  • wood_spirit 2 days ago

    When OpenAI surged ahead, Meta ended up giving away its incredibly expensive-to-make Llama model to reduce OpenAI's valuation.

    Is DeepSeek's openness in part meant to undercut the big American tech companies?

    • ALLTaken 2 days ago

      Correlation isn't causation; I hate to say it, but here it really applies. Facebook, aka Meta, has always been very open source. Let's not talk about the license though. :)

      Why do you imply malice in OSS companies, or in for-profit companies open-sourcing their models and source code?

      • mwigdahl 2 days ago

        Personally I don't impute any malice whatsoever -- these are soulless corporate entities -- but a for-profit company with fiduciary duty to shareholders releasing expensive, in-house-developed intellectual property for free certainly deserves some scrutiny.

        I tend to believe this is a "commoditize your complement" strategy on Meta's part, myself. No idea what Deepseek's motivation is, but it wouldn't surprise me if it was a similar strategy.

        • eidifikwn24 2 days ago

          In its ideal form, the sum of every participant commoditising their complements is how competition should benefit everyone — albeit at the expense of excess returns

        • astrange a day ago

          Companies basically don't have fiduciary duties to shareholders. Also, Zuck has all the votes and can do whatever he wants.

          • ALLTaken a day ago

            This, I think, is closer to the truth: despite all fiduciary duty, there can be an executive who just wants his way. I admire being bold. OSS is, in my opinion, a "co-operation request", and co-operation is, in game theory, a winning move.

      • throwaway314155 2 days ago

        Meta is decidedly not an "OSS company" no matter how much they put out.

        • SXX 2 days ago

          In this case there are very few truly "OSS companies" except for Red Hat and few other Linux distribution maintainers. Even companies centered around open source, like Gitlab, usually generate most of their revenue from proprietary products or use licenses like the BSL.

          • throwaway314155 2 days ago

            > In this case there are very few truly "OSS companies" except for Red Hat and few other Linux distribution maintainers.

            Okay then. Fine by me.

            > Gitlab

            Perfect example. They have OSS offerings. They are not an OSS _company_.

            This also serves to exclude the hundreds of VC-backed "totally open source, 100% not going to enshittify this when our investors come asking for returns" companies. Which, again, I'm fine with.

            The business model of the purist OSS company has not been found to be terribly successful. Nevertheless, it _is_ one that holds a sort of moral high ground, at least. I would prefer to leave definitions as they are, so as to keep that distinction (of having the moral high ground) crystal clear.

            Does that make sense?

    • phoronixrly 2 days ago

      If only totalitarian nation states used their subjects' money to undermine the dominance of US-based software vendors by releasing open-source alternatives created with slave labour... Oh wait, it can't work because software patents are here to the rescue again ... Wait, open source is communism? Always has been. /s

  • Febra33 2 days ago

    > Let's not confuse the company with the country

    What's wrong with China? They're wonderful in the OSS ecosystem.

    • ALLTaken a day ago

      I didn't want to be politically correct, but also not insensitive. Many countries produce great things, but if we measure these countries rigorously, just a few stand out. Unfortunately, from here on it gets messy, political, unsubstantiated, or backed by data that is inherently biased due to selection criteria and weighting.

      It's very difficult to be truly unbiased and neutral, and that's not my goal; I just think this is a common thought that needs to be challenged. Associating the products/results of scientists, quants, engineers, and the companies that employ them with an entire nation is inherently simplistic.

      In that case, why did the CIA/NSA develop Tor and make it OSS? If the governments in the UK/France/Turkey are so brutally against encryption, why does the USA release safe encryption products?

      If the world were absolute, we would absolutely be doomed, and I hope to be part of a world where freedom of thought, individual responsibility, constructive cooperation, and a mesh of companies can work and produce value from and with each other permissionlessly. A world where copyright/patents are no longer needed, because a stronger framework supports the individual contributor and also companies. Leftist, rightist, and centrist views of how an economy should look are flawed, because they introduce ideologies into a mathematical, non-linear, partially closed but mostly open system.

      Every idealistic concept shouldn't be believed, but explored. Hating one system over another is also flawed, because it doesn't produce data and forces hypothesis testing without following the conclusions through. An economy is too complex for one man to design. It shouldn't be put into a canvas of restricted operations; circuits would need to be developed locally. If we empower small communities and allow changes to be made more quickly with less bureaucracy, this seemingly grand introduction of chaos leads to the emergence of a larger stability of the whole. We are so far away from that, man...

    • echelon 2 days ago

      It varies on a company-by-company basis. BOOX, for instance, is a notorious GPL violator.

      There's also significant alpha in releasing open weights models. You get to slow down the market leaders to make sure they don't have runaway success. It reduces moats, slows funding, creates a wealth of competition, reduces margin. It's a really smart move if you want to make sure there's a future where you can compete with Google, OpenAI, etc. There's even a chance it makes those companies bleed a little. The value chain moves to differently shaped companies (tools, infra) leaving space for consumer and product to not necessarily be won by the "labs" companies.

      • ALLTaken a day ago

        If you look at releasing "everything" from the perspective of a quant, and purely so, then the objective of dominating a metric relevant to the quant is obviously the motive. But it's impossible to prove, and a very strong assumption with little to no data. If DeepSeek's parent company traded on the data and release of DeepSeek, with quant models targeting affected firms with shorts before the release, then that's a whole new level of wow, and honestly, great funds do that. But this is too big and bold a move to underpin motive.

        But, to be frank, believing one man could achieve such a feat alone is inspiring.

  • refulgentis 2 days ago

    I love open source and the general good vibes you're bringing, but... this isn't SOTA, or close, even on the paper's own terms (i.e. excluding models released in the last 6 months, including their own, which is a strange, yet understandable, choice given the results they report).

    Quickest way to show this:

    - Table 2, top of page 7

    - Gemma 2 27B, 0 interventions, has 94.1/56.6/60.2

    - Gemma 2 27B, with all their interventions, has 86/64/69.

    - Gemma 2 27B, with all their interventions, sampled 32 times, is at 90.4/67.2/70.3.

    - Gemma 2 27B came out in...June 2024. :/

    Quick heuristics employed here:

    - What models did they compare against? (This isn't strictly an issue; the big screaming tell is "What models did they compare against, compared to their last N papers?")

    - How quickly does the paper have to move towards N samples, and how big does N get before they're happy enough to conclude? (32). How much does that improve performance on their chosen metric? (1.8%)
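    Aside: that "sampled 32 times" step is standard best-of-N / self-consistency sampling, i.e. draw N completions and keep the majority answer. A minimal sketch, with a stubbed model call and a made-up 60% single-sample accuracy (both purely illustrative, not numbers from the paper):

```python
import random
from collections import Counter

def sample_model(prompt: str) -> str:
    """Stand-in for a real LLM call: a noisy answerer that returns the
    correct answer ("42") about 60% of the time, else a distractor."""
    return "42" if random.random() < 0.6 else random.choice(["41", "43"])

def best_of_n(prompt: str, n: int = 32) -> str:
    """Sample n completions and return the majority answer. Even modest
    per-sample accuracy compounds: the majority over 32 draws is correct
    far more often than any single draw, which is why sampled-32 numbers
    can look much better than single-shot ones."""
    answers = [sample_model(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

random.seed(0)
print(best_of_n("What is 6 * 7?"))  # majority answer over 32 noisy samples
```

    The point of the heuristic above is that a paper leaning on large N for its headline numbers is extracting exactly this kind of gain from sampling rather than from the model itself.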

    • ALLTaken 15 hours ago

      Good vibes, I mean, yeah, we need more breakthroughs, and AI isn't here to take our jobs if WE can own the AI too, and not just a super-corporation.

      I think what we are all really excited about is finally having AI at home, unchained and freed from a central SaaS controlling everything the AI is ever going to tell you.

      So, 6-7 years ago Google had these AI chats internally and never intended to release them, a friendly Googler told me.

      Then ChatGPT came along and locked you into their SaaS. That was fantastic in the beginning, but the more you used the AI, the more helpless you felt next to anyone who might have access to an unfiltered AI at OpenAI that uses the full power of the model. Then came the jailbreaks, and accounts being banned for using them.

      Then came the freedom of Llama and DeepSeek and waves of others. It rolled onto your laptop real quick, and this freedom is priceless! Something we should be really thankful happened, and a reason to support more OSS.

      Google and Facebook would never share their trove of data with us, and very few people have enough storage and compute to even attempt to replicate them. But their data dominance doesn't protect them anymore. Once the models became intelligent enough to slurp up large chunks of the web, they became a better search, a better teacher, and a better experience than sponsored results: ads for internal Google/Bing products listed first, then SEO websites, and somewhere, hidden, what we were really looking for. Or often... just deleted for copyright and other reasons.

resters 2 days ago

DeepSeek R1 is by far the best at writing prose of any model, including Grok-3, GPT-4o, o1-pro, o3, claude, etc.

Paste in a snippet from a book and ask the model to continue the story in the style of the snippet. It's surprising how bad most of the models are.

Grok-3 comes in a close second, likely because it is actually DeepSeek R1 with a few mods behind the scenes.

  • vessenes 2 days ago

    why do you think that grok 3 is deepseek, out of curiosity?

    • azinman2 2 days ago

      Yes, that's a pretty giant accusation, especially given they're buying boatloads of GPUs and have previous versions as well (it's not like they started with 3).

      • resters 2 days ago

        1) Grok-2 was akin to GPT-3.5

        2) Grok-3 came out a month after DeepSeek R1 was open sourced. I think Grok-3 is DeepSeek R1 with some added params and about a month of training on the giant cluster, possibly with a bit of in-house secret sauce added to the model or training methodology.

        What are the chances that XAI just happened to have a thinking model close to as good as revolutionary DeepSeek but happened to launch it 30 days later?

        It was both smart and pragmatic for xAI to simply use the best available open source stuff and layer their own stuff on top of it. Imagine they doubled the parameter count and trained it for 30 days; that would not even use half of the GPU power!

        • vessenes 2 days ago

          > What are the chances that XAI just happened to have a thinking model close to as good as revolutionary DeepSeek but happened to launch it 30 days later?

          Extremely, extremely good. That was in fact the real point of the DeepSeek paper: it was extremely cheap to turn a frontier(ish?) model into a reasoning model. There is nothing suspicious about this timeline from an MLOps point of view.

          In fact, DeepSeek themselves, in a sort of victory lap, released six OTHER models from other providers fine-tuned with reasoning as part of the initial drop.

    • resters 2 days ago

      I replied to the child of your comment

  • gmerc a day ago

    If it was, Elon is even more stupid than he lets on, because:

    DS3: 5M training run
    Grok3: 400M training run

    for a 2% difference in the benchmarks.

mentalgear 2 days ago

Happy to see DeepSeek using the correct (and much more idiomatic) term "inference-time scaling", instead of the grotesque construction "test-time compute" that OpenAI came up with.

ftbsqcfjm 2 days ago

Interesting work on open-ending language models to foster imagination and narrative generation. The idea of role-playing as different characters is novel. I wonder how well it would generalize to non-fantasy domains and if the lack of grounding could lead to hallucinations. Excited to see where this research goes!

  • NitpickLawyer 2 days ago

    > The idea of role-playing as different characters is novel.

    It is not. I remember Karpathy being really excited about the "1 million gpt personas" dataset and highlighted it as a way to avoid reward hacking in RLAIF. That was 3-6 months ago I believe.

    Of course paper / code / weights beats idea, and it's exciting to see how far this can go.

bilsbie 2 days ago

Any idea why I lost interest in DeepSeek? I used it and Grok 3 a whole bunch when they first came out, but now I've fallen back to Claude for everything.

  • manmal 2 days ago

    For coding, I'm finding Claude's responses the most to-the-point and on-task, while many other models try to extrapolate or lecture or patronize. DeepSeek is pretty good though. Maybe it's the high latency (probably due to prompt processing)?

    • xrortrad 29 minutes ago

      I personally have no good reason why I don't always ask Claude and DeepSeek the same prompt.

      Thinking about it more, I think a big part of it is that it feels like correct answers are not the limiting factor for me. Claude is good enough, and my problem is more what to do with all these correct answers. I am also naturally biased toward a model if I'm paying for it.

  • UltraSane 2 days ago

    Claude is love. Claude is life.