From China to the World: DeepSeek Enters Ordinary Homes, Bringing AI Within Everyone's Reach
https://www.zaobao.com.sg/lifestyle/feature/story20250221-5898165
(Translated from Chinese by Cici AI app)
February 21, 2025
Huang Shaowei (黄少伟)
Senior Reporter, Lianhe Zaobao Supplement
The emergence of DeepSeek, a Chinese artificial intelligence company, has sent shockwaves across the globe. Tech experts are dissecting the reasons for DeepSeek's success, applauding its efforts in lowering the barriers to entry for high-tech solutions, making AI readily available for small and medium-sized enterprises, and even individuals. This, they believe, can help countries like Singapore achieve scientific and economic progress with fewer resources.
On January 20, 2025, a previously unknown Chinese AI startup, DeepSeek, chose to launch its open-source reasoning model R1 on the same day US President Trump was inaugurated. The performance of R1 rivals that of the o1 model developed by global AI giant OpenAI. DeepSeek's sudden arrival has shaken up the global AI race, prompting countries to reassess China's rising influence and potential in artificial intelligence.
Reasoning models, as the name suggests, are large language models capable of reasoning. When faced with complex tasks, they work toward an answer through multiple reasoning steps, and their performance can be improved by spending more computing resources during post-training or at inference time, when the model is actually answering. Reasoning models are therefore seen as a new direction for the development of large language models.
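To make the idea of "more compute at inference time" concrete, here is a minimal sketch of one simple and well-known approach: sample several answers and keep the most common one. It is an illustration only and is not claimed to be how R1 or o1 work internally. The function `noisy_solver` is a hypothetical stand-in for a model that is right 60% of the time, and all the numbers are made up.

```python
import random
from collections import Counter

def majority_vote(answers):
    """Return the answer that appears most often."""
    return Counter(answers).most_common(1)[0][0]

def noisy_solver(rng, correct=42, accuracy=0.6):
    """Hypothetical stand-in for a model: right `accuracy` of the time."""
    if rng.random() < accuracy:
        return correct
    # Otherwise return one of several different wrong answers.
    return correct + rng.choice([-3, -2, -1, 1, 2, 3])

def solve(rng, samples):
    """Spend more compute by sampling several answers and voting."""
    return majority_vote([noisy_solver(rng) for _ in range(samples)])

def accuracy(samples, trials=2000, seed=0):
    """Fraction of trials in which the voted answer is correct."""
    rng = random.Random(seed)
    return sum(solve(rng, samples) == 42 for _ in range(trials)) / trials

for n in (1, 5, 25):
    print(n, "samples per question ->", accuracy(n))
```

In this toy setting, accuracy rises as more samples are drawn per question, which is the trade the paragraph above describes: better answers in exchange for more computation.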
Vladislav Tushkanov, team manager at Kaspersky’s AI Technology Research Center, says, "Reasoning models actually originated with the o1 model released by OpenAI last December. However, the o1 model is closed-source, and only paying users have access to it. DeepSeek R1, on the other hand, is free for users and even allows them to see its reasoning process. This has garnered considerable attention."
What are the benefits of an open-source reasoning model? Tushkanov responds: "You can examine the reasoning process, helping you better correct problems. If the model provides an incorrect answer, you can identify where the error occurred. Also, if the model's reasoning performs well, you can transfer the knowledge to smaller models, a process we call distillation, making deployment more convenient."
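As a rough illustration of distillation, the sketch below uses one classic formulation: the small "student" model is trained to match the large "teacher" model's softened probabilities over possible next tokens. This is an assumption chosen for illustration, not a description of DeepSeek's own recipe; other variants simply train the student on text generated by the teacher. The logits and the temperature value are made up.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature = softer."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [4.0, 1.0, 0.5]        # made-up scores over three candidate tokens
good_student = [3.8, 1.1, 0.4]   # already close to the teacher
poor_student = [0.5, 1.0, 4.0]   # prefers the wrong token

print("good student loss:", distillation_loss(teacher, good_student))
print("poor student loss:", distillation_loss(teacher, poor_student))
```

Training pushes this loss toward zero, so the student that already agrees with the teacher scores far lower than the one that disagrees.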
Li Boyang, Associate Professor at the School of Computer Science and Engineering at Nanyang Technological University, says, "Reasoning with large language models is a difficult technical problem. Not only did this unknown startup, DeepSeek, successfully implement it, but its reasoning accuracy is also comparable to that of OpenAI, the world's leading AI company. Secondly, DeepSeek claims to have completed model training with just $6 million and 2,000 NVIDIA H800 GPUs, showcasing its model efficiency. In contrast, OpenAI's GPT-4 is estimated to have cost between $80 million and $100 million to train."
US restrictions on AI chips have forced DeepSeek to devise innovative engineering solutions that significantly reduce the cost of model training and inference. One of the main innovations is to bypass CUDA (NVIDIA's general-purpose parallel computing platform for GPUs, used to handle complex AI computations) and program the GPU in a lower-level language instead, allowing DeepSeek engineers to control GPU instruction execution more precisely and improve GPU utilization.
Li Boyang uses an air conditioner as an analogy: "Everyone uses a remote control to adjust the air conditioner. Pressing a button on the remote control can adjust the temperature by one degree, but it does not provide the ability to adjust it by half a degree. To achieve precise adjustments in half-degree increments, you need to directly control the internal components of the air conditioner. DeepSeek has bypassed the 'remote control' and directly connected to the air conditioner's internal system, using a lower-level programming language to send instructions to the GPU, leading to higher efficiency. This method is technically challenging."
DeepSeek has also adopted a Mixture of Experts (MoE) architecture, which combines multiple "experts" (smaller sub-models), each responsible for handling different types of data or tasks. The advantage of MoE is that each expert can focus on its own area of expertise, which improves overall efficiency.
Anthony K.H. Tung (邓锦浩), Professor at the Department of Computer Science, School of Computing, National University of Singapore, explains: "There's a Chinese proverb: Three cobblers are smarter than one Zhuge Liang. DeepSeek has many cobblers, 256 to be exact. When answering questions, it doesn't have all 256 experts work together. Instead, the question is passed to eight experts, and they jointly give an answer. It's a divide-and-conquer model, and it doesn't need a very fast graphics card for training."
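The "eight out of 256" idea can be sketched in a few lines. The figures 256 and 8 come from the quote above; everything else is an assumption made for illustration, including the random router scores (a real model learns them), the toy experts, and the use of a softmax to weight the chosen experts. It is not DeepSeek's actual routing code.

```python
import math
import random

NUM_EXPERTS = 256  # number of experts quoted in the article
TOP_K = 8          # experts consulted per question, per the article

def softmax(scores):
    """Normalise scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_scores, k=TOP_K):
    """Pick the k highest-scoring experts and normalise their weights."""
    top = sorted(range(len(router_scores)),
                 key=lambda i: router_scores[i], reverse=True)[:k]
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, router_scores, k=TOP_K):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in route(router_scores, k))

# Toy experts (hypothetical): expert i just multiplies its input by i.
experts = [lambda x, c=i: c * x for i in range(NUM_EXPERTS)]
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router

chosen = route(scores)
print(len(chosen), "of", NUM_EXPERTS, "experts used")
print("output:", moe_forward(1.0, experts, scores))
```

Only the eight selected experts are ever called, so the other 248 cost nothing for this input. That is where the savings Tung describes come from.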
Tung is also the Head of Urban Sustainable Development AI at the National University of Singapore's AI Institute. He believes that DeepSeek's emergence has driven the democratization of AI: "I've always been worried that small and medium-sized enterprises don't have the resources to use AI. Previously, large language models required expensive equipment and talent. DeepSeek is open-source, free for anyone to use, and users with some technical knowledge can customize it. The distilled model can be used on everyday devices like phones or computers."
He also believes that DeepSeek can promote local scientific development: "We don't have as many resources as American tech companies. DeepSeek allows us to pursue scientific and economic progress with fewer resources."
Tung laughs and says: "You don't need a butcher knife to kill a chicken. ChatGPT is like using a butcher knife to kill a chicken, requiring a large processor for everything. DeepSeek makes your device smaller, consumes less electricity, and is more portable. Previously, we had no choice, but now we do."
Daniel Kahneman, a renowned American psychologist, categorizes human thinking patterns into System 1 and System 2. System 1 is an intuitive, unconscious thinking system; System 2 is a controlled, conscious thinking system.
Li Boyang says: "So far, the AI technologies we've built, like large language models, are very similar to System 1. However, logical reasoning and mathematical abilities require System 2. While DeepSeek surpasses previous systems in this area, it's far from perfect. For example, when a multiplication involves more digits than two two-digit numbers, DeepSeek gives incorrect answers. This kind of simple arithmetic, which humans can do easily, cannot be executed correctly by DeepSeek. The lack of System 2 capabilities is a common problem for large language models, not just for DeepSeek but also for OpenAI's so-called reasoning models."
Many people are concerned about the safety of AI technology, especially in terms of personal data protection and privacy. DeepSeek's app was recently pulled from app stores in South Korea over privacy concerns.
Tushkanov says, "People should distinguish between the DeepSeek model and the DeepSeek chatbot service. The cool thing about the DeepSeek model is that it's open-source. Basically, anyone can download it to their computer and run it entirely locally. By running it only on your own hardware, you can avoid the leakage of personal data and privacy."
On the other hand, DeepSeek also offers a chatbot service. Like ChatGPT, Google Gemini and other cloud services, it carries the same advantages and risks. Tushkanov says, "Data leakage is possible. For example, researchers found a security vulnerability in a database used by DeepSeek, which was quickly patched by DeepSeek."
Many users have attempted to ask DeepSeek about sensitive political issues, such as the "June Fourth" incident, Taiwan's sovereignty, Tibet, and Xinjiang. DeepSeek either refuses to answer or provides answers consistent with the Chinese government's stance, sparking widespread discussion.
Regarding this, Tushkanov says: "Every company must comply with the laws of its country. Different AI service companies have different legal limitations. This is not a technical issue or related to safety."
DeepSeek's emergence has had a significant impact on OpenAI. Perhaps under pressure, OpenAI quickly launched its o3-mini reasoning model on January 31, the first time it had made a reasoning model available to free users.
Sam Altman, CEO of OpenAI, followed up quickly, announcing on February 13th that OpenAI would release the GPT-5 model in the coming months and give free ChatGPT users unlimited access to it. GPT-5 will integrate the o1 and o3 reasoning models with the GPT series models, creating a new system that "can automatically choose thinking and non-thinking functions, suitable for various tasks."
Several US technology companies have also begun using DeepSeek models. Microsoft announced that it would deploy DeepSeek-R1 on its Azure cloud service. In addition, a simplified version of DeepSeek-R1 has been added to the model catalog on Microsoft's Azure AI Foundry and GitHub, allowing developers to run it on their personal computers.
NVIDIA's developer website has also listed DeepSeek-R1 in its "Most Popular Models" category, and the model is available through NVIDIA NIM microservices. The website calls DeepSeek-R1 "a state-of-the-art and efficient large language model" that excels in reasoning, mathematics, and coding.
In addition, Amazon Web Services (AWS) has allowed users to deploy the "powerful and cost-effective" DeepSeek-R1 model on its two AI service platforms.
Across the Pacific, DeepSeek's ecosystem in China is expanding rapidly. Tencent, a Chinese tech giant, confirmed on February 16th that it was running a gray-scale test of DeepSeek integration in its messaging app WeChat. A gray-scale test involves releasing a product or feature to a specific group of users before its official launch, then gradually expanding the user base to identify and fix any issues.
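A gray-scale test of this kind is often implemented as a percentage rollout. The sketch below shows one common way to do it, assuming users are assigned to stable buckets by hashing their ID. The function name `in_rollout`, the feature name "ai-search" and the user IDs are all hypothetical, and nothing here is claimed to reflect how Tencent runs its tests.

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Return True if this user falls inside the first `percent` of buckets.

    Hashing gives each user a stable bucket from 0 to 9999, so raising
    `percent` only ever adds users and never removes anyone already in.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000
    return bucket < percent * 100

users = [f"user-{n}" for n in range(10_000)]  # made-up user IDs
for percent in (1, 10, 50, 100):
    enabled = sum(in_rollout(u, "ai-search", percent) for u in users)
    print(f"{percent:>3}% rollout -> {enabled} of {len(users)} users")
```

Because the bucket depends only on the user ID and the feature name, a user who sees the feature at 10% still sees it at 50%, which keeps the experience consistent as the test widens.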
Reportedly, WeChat users can access the "AI Search" feature in the top search bar of their chat window and use the DeepSeek-R1 model for free. The AI search function not only integrates information sources within the Tencent ecosystem, such as WeChat Public Accounts and Video Accounts, but also supports web searches, providing users with more comprehensive answers.
Following WeChat, Baidu Search announced on the same day that it would fully integrate the deep search functions of DeepSeek and its own Wenxin large language model. Subsequently, the Wenxin Agent Platform, which is designed for developers to create various AI products, announced that it would also fully integrate DeepSeek.
More than 200 Chinese companies have so far announced their integration with DeepSeek, including Huawei, Alibaba and JD.com, covering industries such as telecommunications, cloud computing, chips, finance, automobiles, and mobile phones.