Qwen Max new AI model

Rosika · February 26, 2025, 4:47pm

Hi all,

It seems to be getting more and more difficult for me to keep track of the series of recently emerging AI models.

I just stumbled across this article:

https://www.makeuseof.com/alibaba-launched-qwen-ai-chatbot-better-than-deepseek/

Although it is a Chinese model (again), the author states that Qwen Max is better than DeepSeek.

Some key points:

free to use once you’ve created an account via email, Google, or GitHub
Qwen’s AI models are also open-source
you can find them on GitHub or HuggingFace
Qwen2.5-Max is a 72-billion parameter Mixture-of-Experts (MoE) model, supports 29 languages, and is trained on over 20 trillion tokens
you can combine two models together

Qwen’s website is the only place to officially access the AI model, which means running it in a highly isolated environment (firejail --private [...]) would be the best option, I think.

More details can be found in the article.

Many greetings from Rosika

nevj · February 26, 2025, 10:52pm

It is time we understood this.
What exactly is an AI model?
What does it model? My understanding of ‘model’ is a mathematical calculation that emulates some real-world phenomenon. What is an AI model emulating?
Why are some models different from others?
How do they differ? Does the maths differ, or are they pointed at different phenomena?

Can anyone shed light on this or point to some useful links?

Rosika · February 28, 2025, 2:05pm

@nevj :

Hi Neville,

sorry for the belated reply.

Here's my attempt to come up with something useful.
I submitted your questions to Perplexity in order to get some quick answers.
I was thinking: why not ask one AI about something fundamental to AIs in general?

O.K., here's some information:

An AI model is a program that analyzes datasets to find patterns and make predictions, essentially emulating aspects of human intelligence and decision-making [4][7]. In the context of artificial intelligence, these models are designed to replicate cognitive functions such as learning, problem-solving, and pattern recognition.
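To make "finding patterns and making predictions" a bit more concrete, here is a minimal sketch in Python. It is only a toy: a straight line fitted by least squares, which is far simpler than the models discussed in this thread. The data (hours studied vs. test score) and the function names `fit_line` and `predict` are made up for illustration.

```python
# Toy "model": fit y = a*x + b to example data, then predict an unseen input.
# All numbers below are hypothetical and chosen only for illustration.

def fit_line(xs, ys):
    """Return slope a and intercept b that minimize the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

def predict(a, b, x):
    """Apply the learned pattern to a new input."""
    return a * x + b

# "Training data": hours studied vs. test score (hypothetical).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

a, b = fit_line(hours, scores)
print(f"learned pattern: score = {a:.1f} * hours + {b:.1f}")
print(f"prediction for 6 hours: {predict(a, b, 6.0):.1f}")
```

The two numbers `a` and `b` are this model's "parameters". Large language models work with billions of them, but the basic idea of adjusting parameters until the output matches the training data is the same.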

AI models differ in several key ways:

Architecture:
Models can have different underlying mathematical structures. For example, some use deep neural networks (DNNs) that mimic the human brain’s interconnected neurons, while others may use decision trees or regression algorithms [4].

Purpose:
Different models are optimized for specific tasks. In the Qwen family:

Qwen-VL focuses on vision-language tasks

Qwen-Audio specializes in audio processing

Qwen-Coder is tailored for programming assistance

Qwen-Math targets mathematical problem-solving

Scale:
Models can vary in size, measured by parameters. For instance, Mistral Large 2 has 123 billion parameters, while others may have more or fewer.

Training Data:
Models are trained on different datasets, which influences their capabilities and biases. Qwen models, for example, are pre-trained on extensive multilingual and multimodal datasets [3].

Training Techniques:
Some models undergo additional training like Supervised Fine-Tuning (SFT) or Reinforcement Learning from Human Feedback (RLHF) to enhance specific abilities.

Specialization:
While some models are general-purpose, others are highly specialized. For example, QwQ-32B-Preview focuses on advancing AI reasoning capabilities, while QvQ-72B-Preview emphasizes visual reasoning.

The mathematical foundations may be similar across models, but they differ in how these foundations are applied, scaled, and optimized for specific phenomena or tasks. For instance, Qwen2.5-Max explores large-scale Mixture of Experts (MoE) architecture, which allows for more efficient processing of vast amounts of data.
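As a rough illustration of the Mixture-of-Experts idea mentioned above, the following Python sketch routes one input vector to the two best-scoring "experts" out of eight and mixes their outputs. Everything here is an assumption made for illustration: the sizes `DIM`, `NUM_EXPERTS` and `TOP_K`, the random weights, and the use of plain linear layers as experts. It says nothing about how Qwen2.5-Max is actually built, and real MoE layers add things like load balancing that are left out.

```python
import math
import random

random.seed(0)

DIM = 4          # size of the input vector (hypothetical)
NUM_EXPERTS = 8  # total experts in the layer (hypothetical)
TOP_K = 2        # experts actually run per input (hypothetical)

def random_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Each expert is a small linear layer; the router scores every expert per input.
experts = [random_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]
router = random_matrix(NUM_EXPERTS, DIM)

def moe_forward(x):
    scores = matvec(router, x)
    # Keep only the TOP_K highest-scoring experts; the rest are never computed.
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    weights = softmax([scores[i] for i in chosen])
    output = [0.0] * DIM
    for w, i in zip(weights, chosen):
        expert_out = matvec(experts[i], x)
        output = [o + w * e for o, e in zip(output, expert_out)]
    return output, chosen

x = [0.5, -1.0, 0.25, 2.0]
y, used = moe_forward(x)

# The router always runs; only TOP_K of the experts do.
total_params = NUM_EXPERTS * DIM * DIM + NUM_EXPERTS * DIM
active_params = TOP_K * DIM * DIM + NUM_EXPERTS * DIM
print("experts used:", used)
print("parameters total:", total_params, "active for this input:", active_params)
```

The point of the design is visible in the last two numbers: the layer stores 160 parameters in total, but only 64 of them are used for any single input. That is why an MoE model can be very large overall while keeping the work per input comparatively small.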

In essence, AI models are emulating various aspects of human cognition, from language understanding to visual processing to logical reasoning. The differences between models reflect the diverse and complex nature of intelligence itself, with each model designed to capture and replicate specific facets of cognitive ability.

As for links, here's what Perplexity used for creating its answers:

Would this be of any help at all?

Many greetings from Rosika

P.S.:

Here are some follow-up questions Perplexity itself suggested:

How do AI models learn from experiences?

What are the main differences between machine learning and deep learning models?

How do reinforcement learning models differ from other AI models?

What are some real-world applications of AI models like Qwen2.5-Plus?

How do AI models handle multilingual tasks?

So we could dig deeper…

nevj · February 28, 2025, 10:26pm

Hi Rosika,
That is a really helpful outline… thank you.
It is modelling the way our brain works.
It is a data-based model… what you train it with determines its capabilities.

I once worked on a simple version of such a thing… trying to train a program to class wool from images of wool staples. I used a statistical technique called discriminant functions.
It would have been interesting to try these modern AI modelling techniques on the wool classing problem.
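For readers who have not met discriminant functions, here is a minimal Python sketch of a two-class linear discriminant. It is not a reconstruction of the original wool project: the two features (fibre diameter and staple length), the class names "fine" and "coarse", and all the numbers are hypothetical and only there to show the technique.

```python
# Two-class linear discriminant on two features.
# Hypothetical features per sample: (fibre diameter in microns, staple length in mm).

def mean_vector(samples):
    n = len(samples)
    return [sum(s[j] for s in samples) / n for j in range(2)]

def scatter(samples, mean):
    """2x2 within-class scatter matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in samples:
        d = [x[0] - mean[0], x[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_discriminant(class_a, class_b):
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    # Invert the pooled 2x2 scatter matrix by hand.
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Discriminant direction: w = Sw^-1 (mean_a - mean_b).
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold: projection of the midpoint between the two class means.
    mid = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold

def classify(w, threshold, x):
    score = w[0] * x[0] + w[1] * x[1]
    return "fine" if score > threshold else "coarse"

# Hypothetical training samples for the two classes.
fine = [(18.0, 80.0), (19.0, 85.0), (17.5, 78.0), (18.5, 90.0)]
coarse = [(28.0, 110.0), (30.0, 100.0), (27.0, 120.0), (29.0, 105.0)]

w, t = fit_discriminant(fine, coarse)
print(classify(w, t, (18.2, 82.0)))
print(classify(w, t, (29.5, 108.0)))
```

The discriminant boils each sample down to one weighted score and compares it with a threshold. A neural network does something related, but learns many such weighted combinations stacked in layers, which is what lets it work on raw images instead of hand-measured features.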

I would like to try and build a simplified AI model that we could all take part in training and using. It would have to be small, i.e. able to run on a PC. That rules out language models. Any ideas?

Regards
Neville

Rosika · March 1, 2025, 1:53pm

Hi Neville,

You're very welcome. I'm glad you found the findings helpful.

Well, you know much more about things like that than I do.
You can make sense of it all. That's to be admired.

Yes, I see what you mean. Those AI models should have been around much earlier. You would have benefited a lot from them.

That's interesting. I'll look around a bit. But it's a rather demanding topic.

Cheers from Rosika