Thoughts on DeepSeek?

Hi everyone, :wave:

Update:

For anyone interested:

Over at Produnis there’s an article called “Mit Ollama könnt ihr DeepSeek-R1 lokal laufen lassen”
(“With Ollama you can run DeepSeek-R1 locally”).

The article (actually the entire page) is in German, but you can easily get the English version of it by means of Firefox’s built-in translation tool.

Many greetings from Rosika :slightly_smiling_face:

4 Likes

Hi Rosika,

The translation is poor. The word ‘local’ is confusing.
Does it really run on your local computer? I doubt it. I think it may only be an interface to the Chinese site?
With all the fuss about requiring expensive GPU chips, I don’t see how a PC can run an AI model. I may be wrong, just asking.
What is DeepSeek-R1? Is that version 1?

No harm trying it, but I think you need to be aware everything you type in probably goes to China.

Regards
Neville

4 Likes

Not possible as far as I know, but you can try this for text…

3 Likes

Here’s a very recent ItsFOSS article from the “other” Abhishek about running it on an RPi5:

that’s running on arm64 (4 × 64-bit ARM cores)… the benchmarks are from running it locally…

I’m installing it on my own Pi5 (8 GB model - wish I could afford the 16 GB while it’s still available) running Ubuntu 24.04 (arm64 / aarch64)…

6 Likes

I don’t understand how it can be local. Does it not need a huge database?
Sorry, my understanding is limited.

I found this
“Most AI models today are trained at 16-bit precision, which means that for every one billion parameters you need roughly 2 GB of memory. Obviously, that’s pretty limiting, so it’s pretty common to see models quantized to 8- or even 4-bit integer formats to make them easier to run locally.” (25 Aug 2024)
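
If I read that rule of thumb literally, the arithmetic works out something like this (a rough sketch in Python with my own example sizes, not anything DeepSeek publishes):

```python
# Back-of-envelope memory estimate for a model's weights alone
# (no activations or runtime overhead), following the quoted rule of thumb.
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

for bits in (16, 8, 4):
    # e.g. a hypothetical 7-billion-parameter model, a common size for local use
    print(f"7B model at {bits}-bit: ~{weight_memory_gb(7, bits):.1f} GB")
# 16-bit: ~14 GB, 8-bit: ~7 GB, 4-bit: ~3.5 GB -- hence the quantised downloads
```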

That seems to say you can run an AI model on a PC, but it will be a cut-down version?

Can anyone clarify this?

3 Likes

Hmmm - DeepSeek seems more like dipstick - how could it be more wrong? I asked it what “doom metal” is - and it was nearly universally WRONG:

╭─x@razuberi ~  
╰─➤  ollama run deepseek-r1:1.5b
pulling manifest 
pulling aabd4debf0c8... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏ 1.1 GB                         
pulling 369ca498f347... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏  387 B                         
pulling 6e4c38e1172f... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏ 1.1 KB                         
pulling f4d24e9138dd... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏  148 B                         
pulling a85fe2a2e58e... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏  487 B                         
verifying sha256 digest 
writing manifest 
success 
>>> what is doom metal
<think>

</think>

Doom Metal is a broad genre of electronic music and art that originated in the late 1970s, initially influenced by the influential bands like Black 
Sabbath, The Smiths, and Pink Floyd. It gained significant popularity in the 1980s with the emergence of bands such as The Doors, Tool, and The Who, 
who incorporated elements from various genres into their sound.

### Key Characteristics:
- **Technological Influence**: Doom metal often features early forms of computer technology, such as "midcore" synthesizers (from 70s/80s electronic 
music) and computer-driven effects.
- **Nostalgia and Structure**: The genre is known for its nostalgic feel, with a structured arrangement where each song builds on the previous one, 
creating a sense of progression and atmosphere.
- **Rhythms and Melodic Progressions**: Doom metal's rhythmic nature is reminiscent of electronic music, often featuring intricate, layered melodies 
that are both complex and hypnotic.

### Major Bands:
Some iconic doom metal bands include:
- **The Doors** (1984)
- ** Tool** (1985/1986)
- **Black Sabbath** (1986)
- **Pink Floyd** (1986)
- **The Who** (1987)
- **Led Zeppelin** (2003-2004, though they're not purely doom metal)

### Subgenres:
Doom metal is diverse and has evolved into several subgenres, including:
- **Slush Metal**: A subset that often incorporates riffs and riffs from death metal.
- **Pianist Rock**: Combines elements of hard rock with a haunting, often-pianistrided sound.
- **Stooges Metal**: Often inspired by the Strokes and has a retro, nostalgic feel.

Doom metal continues to evolve, blending genres like classical music (e.g., The Who's "Soul of the City") and experimental electronic elements. 
Artists include bands like The Who, The Doors, and Tool, whose work remains deeply relevant today.

Sure - yeah - Black Sabbath - but “the Smiths”? The Doors? “Led Zeppelin (2003-2004)”?
“Stooges Metal” - inspired by The Strokes? WTF? Also it’s “Sludge Metal” not “Slush Metal”…
I won’t be using it again… :smiley: :smiley: :smiley:

8 Likes

What other AIs have you tried running locally before?
How was your experience?

2 Likes

None - I just did that for “the lolz” - I have some ideological qualms about AI… I don’t like it… I especially LOATHE AI generated imagery and video (I refuse to call it art) - it’s not intelligent - it’s just a vast plagiarism algorithm…

6 Likes

Ohh! Okay!

I once saw a video of NetworkChuck showing the Fabric module or something to run AI on PCs. Never tried it myself, though. Has anyone else tried it?

1 Like

Thankfully the Ollama GitHub has uninstall instructions… removed it…

3 Likes

Could that be true?

Does anyone here dare to access both AIs just to confirm this is not a hoax?

(Anyway, I would have asked for the winning numbers, not just some random ones :rofl: )

5 Likes

Pseudo-random number generators require a seed.
If they both started with the same seed, they would generate the same sequence.

The random number generator in a language like R or C commonly stores its seed and updates it after each use, so that, if for example one used random numbers to design a series of experiments, one would not keep designing the same experiment over and over.

It looks like the AI forgot to update its random number seed. Bad mistake.
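
A minimal illustration in Python (made-up lottery-style numbers, nothing to do with either AI’s actual code):

```python
import random

# Two generators started from the same seed produce identical "random" sequences.
a = random.Random(42)
b = random.Random(42)
print([a.randint(1, 49) for _ in range(6)])   # six lottery-style numbers
print([b.randint(1, 49) for _ in range(6)])   # identical to the line above

# Seeding from something that varies (time, OS entropy) breaks the repetition.
c = random.Random()                            # seeded by the OS by default
print([c.randint(1, 49) for _ in range(6)])   # almost certainly different
```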

5 Likes

AI forgot. :laughing: :laughing:

5 Likes

Hi Neville, :wave:

Sorry for that.
One has to make concessions when using Firefox’s built-in translator.
I also tried the “TranslateLocally” add-on, which seems to arrive at the same results.
Perhaps both use the same engine under the hood.

I thought: “Why not ask Perplexity about it?” :wink:
That should be fun: asking one AI about another.

Here’s its answer:

What does “R1” mean in “DeepSeek-R1”?

The “R1” in “DeepSeek-R1” stands for “Reasoning 1,” indicating that it is DeepSeek’s first major reasoning model. DeepSeek-R1 is designed to perform advanced reasoning tasks and explain its thought process, setting it apart from traditional language models. Key features of DeepSeek-R1 include:

  1. Reasoning capabilities: It uses a chain-of-thought (CoT) approach to break down complex queries and reason through them step-by-step.
  2. Reinforcement learning: The model underwent a multi-stage training process that included reinforcement learning to enhance its reasoning abilities.
  3. Mixture of Experts (MoE) architecture: R1 employs 671 billion parameters across multiple expert networks, but only activates 37 billion parameters for each inference task, allowing for efficient operation.

DeepSeek-R1 is designed to compete with other advanced AI models in tasks requiring logical inference, reasoning, math, and coding, while offering transparency in its decision-making process.
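
To make the Mixture-of-Experts point a bit more concrete, here is a toy sketch in Python (entirely made-up sizes, not DeepSeek’s actual code) of how only a few “experts” run for each token, so only a fraction of the total parameters is active:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                       # hypothetical hidden size
n_experts, top_k = 8, 2      # 8 tiny experts, only the best 2 run per token

# Each "expert" is just one weight matrix here (a stand-in for a full network).
experts = [rng.standard_normal((d, d)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts)) * 0.1   # routing weights

def moe_forward(x):
    scores = x @ router                        # how suitable each expert looks
    chosen = np.argsort(scores)[-top_k:]       # pick the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()                   # softmax over the chosen experts
    # Only the chosen experts' parameters are touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d)
out = moe_forward(token)
total_params = n_experts * d * d + d * n_experts
active_params = top_k * d * d + d * n_experts
print(f"total params: {total_params}, active per token: {active_params}")
```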

Hope it helps.

Cheers from Rosika :slightly_smiling_face:

3 Likes

I do not understand what ‘parameters’ means in this AI context.
In a statistical model ‘parameters’ means the variables whose values are unknown a priori and have to be estimated by fitting the model to data.
If, when fitted, a statistical model has more parameters than data items, it is referred to as ‘overfitted’. Overfitted models are considered to be good at memorising the data to which they are fitted (i.e. the training data) but very poor at prediction.

So does an AI model really correspond to fitting 600 billion parameters to something less than 600 billion data items? What are these ‘parameters’? I would find it difficult to dream up a model with that many variables.

Typically, people tend to overfit models when they do not understand the data, so that model fitting becomes a discovery process rather than a testing of an idea.
It sounds like AI has gone down this track, to the extreme.

This is a bit technical, but it addresses the question:

https://medium.com/@jeanimal/technical-even-linear-regression-can-escape-the-bias-variance-tradeoff-263abe6acb1c

3 Likes

Hi Neville, :wave:

You’re right.
As I wouldn’t be able to respond to your question satisfactorily, I submitted a follow-up question to Perplexity. :blush:

Here’s what it came up with:

In the context of AI, particularly machine learning and deep learning models, ‘parameters’ have a similar but slightly different meaning compared to statistical models.
Parameters in AI models are internal variables that the model learns and adjusts during the training process to make accurate predictions or decisions.
These parameters are analogous to the weights and biases in neural networks.
Key points about parameters in AI:

  1. Learning process: Unlike statistical models where parameters are estimated by fitting the model to data, AI model parameters are automatically learned through iterative training processes, often using optimization algorithms like gradient descent.

  2. Function: Parameters in AI models act as “knobs” that the model fine-tunes based on the provided data to capture patterns and relationships.

  3. Quantity: Modern AI models, especially large language models, can have billions or even trillions of parameters.

  4. Relationship to data: The number of parameters is not directly related to the amount of data or words the model is trained on. For example, GPT-3’s 175 billion parameters do not correspond to 175 billion words.

  5. Impact on performance: The number and quality of parameters significantly influence a model’s ability to make accurate predictions or decisions.

While there are similarities in the concept of parameters between statistical and AI models, the primary difference lies in how these parameters are determined and their scale in modern AI systems.
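
To make point 2 a bit more concrete, here is a tiny sketch (invented layer sizes, not any real model) of counting the weights and biases that make up a model’s “parameters”:

```python
# Counting "parameters" (weights + biases) in a small fully connected network.
# The sizes are made up; the point is only that parameters are learned numbers
# inside the model, not records in a database.
layer_sizes = [512, 256, 64, 10]   # hypothetical layer widths

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    weights = n_in * n_out   # one weight per input/output connection
    biases = n_out           # one bias per output unit
    total += weights + biases
print(f"parameters in this toy network: {total:,}")   # already ~148,000
```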

No idea if it helps at all… :thinking:

Thanks also for providing the link.

Many greetings from Rosika :slightly_smiling_face:

3 Likes

Hi Rosika,
Please explain… what is perplexity?

Thank you, reading that helped.
I am used to statistical models with modest numbers of parameters, but billions of parameters are bending my mind. The computing load needed to “fit” billions of parameters (the learning step) would be huge.

Take a simple example. Two variables, Y and X, and let’s say we have 10 data points.
We can fit a model relating Y to X (called a regression model). If we fit 2 parameters (a mean and a slope) we get a straight line as the fitted equation, and some points do not lie exactly on the line (referred to as error of fit).
If we fit 3 parameters we get a quadratic curve.
If we fit 10 parameters, we get a 9th-order polynomial, all the data points are on the fitted line, and there is no “error”.
If we fit 11 parameters, there is no improvement of “fit”, and we have one redundant parameter that will be a function of the other parameters.
So I can’t see why AI needs billions of redundant parameters?
Admittedly AI training is not quite the same as fitting a regression line, but it is a “fit” of some model, and the same principles should apply… but apparently not… this huge overfit does something new or special, or they would not be doing it.
So what is this mystery achievement of a giant overfit?
Is it just that they are using so much training data that they are effectively training on the whole population of language samples, so there is nothing left to predict and it is all clever regurgitation of the training data?
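
Putting that toy example into code (made-up data, just to illustrate the point about the exact fit):

```python
import numpy as np

# 10 data points from a noisy straight line (all values invented for illustration),
# fitted with polynomials of increasing order.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
y = 2 * x + 1 + rng.normal(0, 0.1, 10)

for n_params in (2, 3, 10):
    coeffs = np.polyfit(x, y, deg=n_params - 1)        # degree = parameters - 1
    residual = np.sum((np.polyval(coeffs, x) - y) ** 2)
    print(f"{n_params:>2} parameters: residual sum of squares = {residual:.6f}")
# With 10 parameters the curve passes through every point (residual ~ 0),
# yet it predicts new x values far worse than the simple 2-parameter line.
```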

That is the LLM bit. There are other bits to GPT. It is not just language processing. There has to be some database of facts on which they base their replies.

We understand very little about how these AI models work.

Regards
Neville

4 Likes

Perplexity is an AI tool. Some people even call it a replacement for Google. When you ask it a question, it checks the web for sources and then provides a well-structured answer.

https://www.perplexity.ai/

4 Likes

Hi Neville, :wave:

@TypeHrishi explained it quite nicely. That’s exactly it.

Well, Neville, you understand this kind of thing perfectly. I’m impressed. :+1:
Obviously providing an accurate reply to your question is beyond my capabilities, so I pushed Perplexity a bit further.

Here’s what it had to say about it:

You’ve raised an excellent point, and your example illustrates a fundamental concept in statistical modeling known as the bias-variance tradeoff.
Your observation is correct for traditional statistical models, and it highlights an important distinction between these models and modern AI systems, particularly large language models (LLMs).
Let me explain why AI models use so many parameters:

  1. Complexity of the task:
    LLMs are designed to handle a vast range of language tasks, from translation to question-answering to text generation. This complexity requires a much larger parameter space than simple regression models.

  2. High-dimensional input space:
    Language models deal with an enormous vocabulary and complex linguistic structures. Each word or token can be represented as a high-dimensional vector, requiring many parameters to capture relationships between these vectors.

  3. Capturing long-range dependencies:
    LLMs need to understand context over long sequences of text, requiring mechanisms (like attention) that involve many parameters.

  4. Transfer learning:
    These models are designed to transfer knowledge across various tasks, which requires a large capacity to store general language knowledge.

  5. Emergent behaviors:
    Research has shown that certain capabilities in AI models only emerge at scale, beyond what might be predicted by simpler models.

  6. Different notion of “fit”:
    Unlike your regression example where “fit” is clearly defined, the quality of an LLM’s output is more subjective and multidimensional.

  7. Regularization techniques:
    To prevent overfitting, AI models use various regularization techniques, allowing them to have many parameters without perfectly memorizing the training data.

While it’s true that some parameters may be redundant or less important, the sheer scale and complexity of the tasks these models tackle justify their large parameter counts.
However, research into more efficient architectures (like sparse or mixture-of-experts models) is ongoing to reduce unnecessary parameters.
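
As a small illustration of point 7, here is a minimal sketch (invented numbers) of one common regularisation trick, “weight decay”, which pulls every parameter slightly towards zero at each training step so the model is discouraged from simply memorising its training data:

```python
# One gradient-descent step with L2 weight decay (all values made up).
learning_rate = 0.1
weight_decay = 0.01

def sgd_step(weight: float, gradient: float) -> float:
    # standard gradient step plus a pull towards zero proportional to the weight
    return weight - learning_rate * (gradient + weight_decay * weight)

w = 5.0
for step in range(3):
    w = sgd_step(w, gradient=0.0)   # even with zero gradient, w shrinks a little
    print(f"step {step + 1}: w = {w:.4f}")
```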

That’s pretty scientific stuff, I have to admit.
Yet I’m sure you can make sense of all that, Neville.

Many greetings from Rosika :slightly_smiling_face:

2 Likes

Saw this today about DeepSeek on Yahoo.

" DeepSeek has made the program’s final code, as well as an in-depth technical explanation of the program, free to view, download, and modify. In other words, anybody from any country, including the U.S., can use, adapt, …"

Reading more it said …
“The program is not entirely open-source—its training data, for instance, and the fine details of its creation are not public—”

4 Likes