Qwen Max new AI model

Hi all, :wave:

It seems it’s getting ever more difficult for me to keep track of the string of recently emerging AI models. :wink:

I just stumbled across this article:

https://www.makeuseof.com/alibaba-launched-qwen-ai-chatbot-better-than-deepseek/

It’s (again) a Chinese model, and the author states that Qwen Max is better than DeepSeek.

Some key points:

  • free to use once you’ve created an account via email, Google, or GitHub
  • Qwen’s AI models are also open-source
  • you can find them on GitHub or HuggingFace
  • Qwen2.5-Max is a 72-billion parameter Mixture-of-Experts (MoE) model, supports 29 languages, and is trained on over 20 trillion tokens
  • you can combine two models together

Qwen’s website is the only place to officially access the AI model, so I think accessing it from a highly isolated environment (firejail --private [...]) would be the best option.

More details can be found in the article.

Many greetings from Rosika :slightly_smiling_face:

It is time we understood this.
What exactly is an AI model?
What does it model? My understanding of ‘model’ is a mathematical calculation that emulates some real-world phenomenon. What is an AI model emulating?
Why are some models different from others?
How do they differ? Does the maths differ, or are they pointed at different phenomena?

Can anyone shed light on this or point to some useful links?

@nevj :

Hi Neville, :wave:

sorry for the belated reply. :neutral_face:

Here’s my attempt to come up with something useful.
I submitted your questions to Perplexity to get some quick answers.
I was thinking: why not ask one AI about something fundamental to AIs in general? :blush:

O.K., here’s some information:

An AI model is a program that analyzes datasets to find patterns and make predictions, essentially emulating aspects of human intelligence and decision-making [4][7]. In the context of artificial intelligence, these models are designed to replicate cognitive functions such as learning, problem-solving, and pattern recognition.
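To make “find patterns and make predictions” a bit more concrete, here is a minimal Python sketch of about the simplest possible model: an ordinary least-squares straight-line fit. The data points are made up purely for illustration; the “pattern” this model learns is nothing more than a slope and an intercept, which it then uses to predict new values.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Summed cross-deviations over summed squared x-deviations gives the slope
    # (the 1/n factors of covariance and variance cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the learned slope and intercept to predict y for a new x."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data that happens to follow y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
model = fit_line(xs, ys)
print(model)               # (2.0, 1.0)
print(predict(model, 10))  # 21.0
```

Large AI models do the same two things (fit parameters to data, then predict), just with billions of parameters instead of two.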

AI models differ in several key ways:

  1. Architecture:
    Models can have different underlying mathematical structures. For example, some use deep neural networks (DNNs), which are loosely inspired by the human brain’s interconnected neurons, while others may use decision trees or regression algorithms [4].

  2. Purpose:
    Different models are optimized for specific tasks. In the Qwen family:

      • Qwen-VL focuses on vision-language tasks
      • Qwen-Audio specializes in audio processing
      • Qwen-Coder is tailored for programming assistance
      • Qwen-Math targets mathematical problem-solving

  3. Scale:
    Models can vary in size, measured by parameters. For instance, Mistral Large 2 has 123 billion parameters, while others may have more or fewer.

  4. Training Data:
    Models are trained on different datasets, which influences their capabilities and biases. Qwen models, for example, are pre-trained on extensive multilingual and multimodal datasets [3].

  5. Training Techniques:
    Some models undergo additional training like Supervised Fine-Tuning (SFT) or Reinforcement Learning from Human Feedback (RLHF) to enhance specific abilities.

  6. Specialization:
    While some models are general-purpose, others are highly specialized. For example, QwQ-32B-Preview focuses on advancing AI reasoning capabilities, while QVQ-72B-Preview emphasizes visual reasoning.
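Since scale is measured in parameters, here is a small worked sketch of where such numbers come from in the simplest case: a fully connected (dense) neural network, where each layer has one weight per input-output connection plus one bias per output. The layer sizes below are hypothetical and chosen only for illustration. Real large language models use more complicated building blocks (embeddings, attention, and in MoE models many expert blocks), so this is not how the 123-billion figure above is derived.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input-output connection
        total += n_out         # one bias per output neuron
    return total


# Hypothetical small network: 784 inputs (e.g. a 28x28 image),
# 128 hidden neurons, 10 outputs.
print(dense_param_count([784, 128, 10]))  # 101770
```

That is 784 × 128 + 128 = 100,480 for the first layer and 128 × 10 + 10 = 1,290 for the second.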

The mathematical foundations may be similar across models, but they differ in how these foundations are applied, scaled, and optimized for specific phenomena or tasks. For instance, Qwen2.5-Max is a large-scale Mixture of Experts (MoE) model: a routing network sends each input token to only a few “expert” sub-networks, so the model can have a very large total parameter count while using only a fraction of it for any given token.
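Qwen’s own MoE implementation isn’t described here, so the following is only a generic toy sketch of top-k gating as MoE layers are commonly described: score every expert, keep the k best, and mix only their outputs. The four “experts” and the gate weights are made up for illustration; in a real model both the experts and the router are learned neural networks, and routing happens separately for every token.

```python
import math


def softmax(values):
    """Numerically stable softmax: turns scores into weights that sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x, experts, gate_weights, k=2):
    """Toy top-k Mixture-of-Experts layer for a single input vector x.

    experts: list of callables, each mapping a vector to a vector of the same length.
    gate_weights: one weight vector per expert; its dot product with x is that
    expert's gate score (a stand-in for a learned router).
    Returns (output_vector, indices_of_chosen_experts).
    """
    # 1. Score every expert with the (hypothetical) linear gate.
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    # 2. Keep only the k highest-scoring experts.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # 3. Softmax over the chosen scores gives the mixing weights.
    mix = softmax([scores[i] for i in chosen])
    # 4. Evaluate only the chosen experts and sum their weighted outputs.
    out = [0.0] * len(x)
    for weight, i in zip(mix, chosen):
        y = experts[i](x)
        out = [o + weight * y_j for o, y_j in zip(out, y)]
    return out, chosen


# Four made-up "experts" and gate weights, purely for illustration.
experts = [
    lambda v: [2 * a for a in v],  # doubles its input
    lambda v: [a + 1 for a in v],  # adds one
    lambda v: [-a for a in v],     # negates
    lambda v: [0.0 for _ in v],    # outputs zeros
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]

out, chosen = moe_layer([1.0, 2.0], experts, gate_weights, k=2)
print(chosen)  # [1, 3]: only these two experts were run
print(out)
```

The unselected experts are never called, which is where the efficiency of MoE comes from: total size grows with the number of experts, but the work per input grows only with k.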

In essence, AI models are emulating various aspects of human cognition, from language understanding to visual processing to logical reasoning. The differences between models reflect the diverse and complex nature of intelligence itself, with each model designed to capture and replicate specific facets of cognitive ability.

As for links, here’s what Perplexity used to create its answers:

Would this be of any help at all?

Many greetings from Rosika :slightly_smiling_face:

P.S.:

Here are some follow-up questions Perplexity itself suggested:

  • How do AI models learn from experience?

  • What are the main differences between machine learning and deep learning models?

  • How do reinforcement learning models differ from other AI models?

  • What are some real-world applications of AI models like Qwen2.5-Plus?

  • How do AI models handle multilingual tasks?

So we could dig deeper…

Hi Rosika,
That is a really helpful outline… thank you.
It is modelling the way our brain works.
It is a data-based model… what you train it with determines its capabilities.

I once worked on a simple version of such a thing… trying to train a program to class wool from images of wool staples. I used a statistical technique called discriminant functions.
It would have been interesting to try these modern AI modelling techniques on the wool classing problem.
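For readers who haven’t met discriminant functions, here is a minimal Python sketch of Fisher’s linear discriminant for two classes. Which variant and which image features were used for the wool classing isn’t stated, so the two features and all the numbers below are hypothetical. The idea: find the direction w = S_W⁻¹(m₁ − m₀), where m₀ and m₁ are the class means and S_W is the pooled within-class scatter matrix, project each sample onto w, and classify by which side of a threshold the projection falls.

```python
def mean(rows):
    """Component-wise mean of a list of 2-D feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter(rows, m):
    """2x2 within-class scatter matrix: sum of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_lda(class0, class1):
    """Train a two-class Fisher discriminant; returns a classify(x) -> 0/1 function."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    # Invert the 2x2 pooled scatter matrix.
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]

    def project(x):
        return w[0] * x[0] + w[1] * x[1]

    # Threshold halfway between the projected class means (assumes equal priors).
    threshold = (project(m0) + project(m1)) / 2

    def classify(x):
        return 1 if project(x) > threshold else 0

    return classify


# Hypothetical measurements (two features per staple image), invented for illustration.
class0 = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.4], [1.2, 2.6]]
class1 = [[4.0, 4.5], [4.5, 4.0], [5.0, 5.2], [4.2, 5.0]]
classify = fisher_lda(class0, class1)
print(classify([1.4, 2.1]), classify([4.8, 4.6]))  # 0 1
```

Modern neural networks differ mainly in learning non-linear boundaries and their own features from raw pixels, rather than relying on hand-chosen measurements like these.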

I would like to try and build a simplified AI model that we could all take part in training and using. It would have to be small… i.e. able to run on a PC. That rules out language models. Any ideas?
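As one possible starting point for such a PC-sized model (only a suggestion, not something settled in this thread), here is a minimal Python sketch of Rosenblatt’s perceptron: a single artificial “neuron” that learns one weight per input, plus a bias, from labelled examples. The training set here is just the logical AND function; in a shared project anyone could add labelled examples and retrain. On linearly separable data like this the perceptron rule is guaranteed to converge; harder problems need a multi-layer network.

```python
def classify(weights, bias, x):
    """Fire (1) if the weighted sum of inputs plus bias is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, lr=1.0, max_epochs=1000):
    """Rosenblatt's perceptron learning rule.

    samples: list of feature vectors; labels: list of 0/1 targets.
    Returns (weights, bias). Stops early once a full pass makes no mistakes.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(samples, labels):
            error = target - classify(weights, bias, x)  # -1, 0 or +1
            if error != 0:
                mistakes += 1
                # Nudge the decision boundary toward the misclassified example.
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:
            break
    return weights, bias


# Training data: the logical AND of two inputs.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
print([classify(weights, bias, x) for x in samples])  # [0, 0, 0, 1]
```

With zero-initialised weights the learning rate only rescales the solution, so lr=1.0 keeps the arithmetic exact on integer inputs without changing any prediction.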

Regards
Neville

Hi Neville, :wave:

You’re very welcome. I’m glad you found the findings helpful.

Well, you know much more about things like that than I do.
You can make sense of it all. That’s to be admired. :+1:

Yes, I see what you mean. Those AI models should have been around much earlier. You would have benefited a lot from them.

That’s interesting. I’ll look around a bit. But it’s a rather demanding topic. :thinking:

Cheers from Rosika :slightly_smiling_face:
