Watch out for anyone who rules Apple out of the AI race. The company may have arrived late, and it may have little to show today, but its evolution over the last three years reveals three interesting things. First, Apple does have its own AI models. Second, those models are far behind the best from OpenAI and Anthropic's Claude in performance. Third, that may not matter at all.
Three years of evolution. The technical documents Apple has published in recent years trace a series of highly relevant changes. In 2024 its initial proposal was limited to small models of about 3 billion parameters (3B), specialized in basic tasks such as generating Genmojis or summarizing text. In 2025 the company launched its MLX framework to the developer community to make it easier to integrate and use local models. Now, in 2026, it proposes a hybrid infrastructure built on a simple principle:
- Simple requests: they run on small local models on the device, without even needing an internet connection
- Complex requests: the system delegates the task to the cloud, where it is processed privately through Private Cloud Compute
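The routing principle above can be sketched as a small dispatcher. This is a hypothetical illustration, not Apple's API. The `Request` fields, the `needs_world_knowledge` flag and the word-count threshold are assumptions chosen only to make the split between simple and complex requests concrete.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    ON_DEVICE = "on-device model"
    PRIVATE_CLOUD = "Private Cloud Compute"


# Hypothetical cutoff: prompts longer than this are treated as "complex".
MAX_LOCAL_PROMPT_WORDS = 200


@dataclass
class Request:
    prompt: str
    # Hypothetical flag: set when the task needs more than a small local model offers.
    needs_world_knowledge: bool = False


def route(req: Request) -> Backend:
    """Keep simple requests on the device and send complex ones to the private cloud."""
    too_long = len(req.prompt.split()) > MAX_LOCAL_PROMPT_WORDS
    if req.needs_world_knowledge or too_long:
        return Backend.PRIVATE_CLOUD
    return Backend.ON_DEVICE


if __name__ == "__main__":
    print(route(Request("Summarize this notification")))
    print(route(Request("Plan a two-week trip", needs_world_knowledge=True)))
```

A real system would presumably base the complexity decision on a model or classifier rather than a word count. The sketch only shows that the decision is made per request, before any inference runs.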
A (maybe) great idea: NAND can help. The most relevant milestone of Apple's new approach lies in the design of its AFM 3 Core Advanced model. Today's phones face a major bottleneck when running capable (large) AI models, because they have very limited memory (12 GB on some iPhones). To fit a model with 20 billion parameters (20B), Apple has decided to store it in the device's internal flash storage (the SSD), not in RAM.
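A back-of-the-envelope calculation shows why 20B parameters do not fit comfortably in 12 GB of RAM. The article does not say how the weights are quantized, so the bit widths below are assumptions. The calculation counts only the raw weights and ignores activations, caches and everything else the phone keeps in memory.

```python
def weight_gb(num_params: float, bits_per_param: float) -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PHONE_RAM_GB = 12           # RAM figure cited for some iPhones
TOTAL_PARAMS = 20e9         # AFM 3 Core Advanced, all experts
ACTIVE_PARAMS = (1e9, 4e9)  # parameters active for a given prompt

# Assumed bit widths per weight (the article does not state the quantization).
for bits in (16, 8, 4):
    total = weight_gb(TOTAL_PARAMS, bits)
    low, high = (weight_gb(p, bits) for p in ACTIVE_PARAMS)
    print(
        f"{bits:>2}-bit: all weights {total:.1f} GB "
        f"({total / PHONE_RAM_GB:.0%} of RAM), "
        f"active subset {low:.1f}-{high:.1f} GB"
    )
```

Even under an aggressive 4-bit assumption, the full model would need about 10 GB, most of a 12 GB phone's RAM. The 1B to 4B active parameters would need only 0.5 to 2 GB, which is what makes keeping the bulk of the model on flash attractive.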

In the AFM 3 Core Advanced model, the “experts” live on the phone's SSD. They are preselected and loaded into RAM to be used dynamically, which optimizes model execution.
Experts per prompt, not per token. The model then applies a set of pruning techniques (Instruction-Following Pruning, or IFP) so that only between 1 and 4 billion parameters are active, sparsely. This is somewhat like what Mixture-of-Experts (MoE) models do. The difference is that Apple selects these experts at the start of each prompt rather than token by token, which lets it sidestep the phone's NAND storage, whose bandwidth is far lower than that of its RAM.
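A toy count of flash reads illustrates the benefit of choosing experts once per prompt. This sketch does not implement IFP. It only contrasts two selection schedules under simplifying assumptions: a random stand-in for the router, a hypothetical pool of 32 experts, and RAM with room for exactly the 4 active experts.

```python
import random

NUM_EXPERTS = 32  # hypothetical pool of experts stored on flash
TOP_K = 4         # hypothetical number of experts that fit in RAM at once


def pick_experts(rng: random.Random) -> frozenset[int]:
    """Random stand-in for a learned router: choose TOP_K distinct experts."""
    return frozenset(rng.sample(range(NUM_EXPERTS), TOP_K))


def flash_reads_per_token(num_tokens: int, seed: int = 0) -> int:
    """Classic MoE schedule: a new expert set for every token."""
    rng = random.Random(seed)
    resident: frozenset[int] = frozenset()
    reads = 0
    for _ in range(num_tokens):
        chosen = pick_experts(rng)
        reads += len(chosen - resident)  # experts not in RAM must come from flash
        resident = chosen                # RAM only has room for the active set
    return reads


def flash_reads_per_prompt(num_tokens: int, seed: int = 0) -> int:
    """Per-prompt schedule: select once, load once, reuse for every token."""
    rng = random.Random(seed)
    chosen = pick_experts(rng)  # selection happens before generation starts
    resident: frozenset[int] = frozenset()
    reads = 0
    for _ in range(num_tokens):
        reads += len(chosen - resident)
        resident = chosen
    return reads


if __name__ == "__main__":
    print("per-token reads: ", flash_reads_per_token(100))
    print("per-prompt reads:", flash_reads_per_prompt(100))
```

With per-token selection, almost every token pulls new experts in from flash (about 3.5 of 4 per token on average under these random choices). Per-prompt selection pays that cost only once. Real routers are far less random than this stand-in, so the sketch likely exaggerates the gap.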
Privacy as a banner. If Apple's approach stood out for anything from the start, it was privacy, which comes built in when using local models. If a request is complex, though, the system redirects it to the AI models in Apple's cloud, Private Cloud Compute (PCC). Unlike other platforms, such as those of OpenAI or Anthropic, conversations with Apple's AI are encrypted and, according to the company, completely private. The data is not shared with third parties (not even Apple can see it), and it is not used to train its models.
Five models with help from Gemini. Apple is obsessed with total control over its products, but this time it had to give ground and partner with Google so that its Gemini models could “show” Apple the way. The result is a third generation of models developed in collaboration with the Mountain View company. There are five models in total:
- AFM 3 Core: dense model with 3B parameters
- AFM 3 Core Advanced: sparse model with 20B parameters, of which 1B to 4B are active depending on the task
- AFM 3 Cloud: a powerful yet efficient and fast model that runs in Apple's cloud
- ADM 3 Cloud (Image): for generating and editing images; it powers these features as well as the new Image Playground
- AFM 3 Cloud Pro: Apple's most powerful cloud model, aimed at autonomous agents. It was trained on Google TPUs and runs on Nvidia GPUs within Google Cloud infrastructure
Performance, an unknown. Other companies usually publish results on well-known benchmarks when they present their models, but Apple has not done so. Instead, it shows “human preference” metrics that compare user satisfaction with its models against competing models. It also compares the models with their own previous versions, which says little about what to expect from them.
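A pairwise human-preference score is typically a win rate over head-to-head comparisons. The sketch below uses hypothetical vote labels and counts ties as half a win, which is one common convention. The article does not describe Apple's exact scoring method.

```python
from collections import Counter

VALID_VOTES = {"A", "B", "tie"}


def win_rate(votes: list[str]) -> float:
    """Share of head-to-head comparisons won by model A; a tie counts as half a win."""
    counts = Counter(votes)
    unknown = set(counts) - VALID_VOTES
    if unknown:
        raise ValueError(f"unexpected vote labels: {sorted(unknown)}")
    if not votes:
        raise ValueError("no votes to score")
    return (counts["A"] + 0.5 * counts["tie"]) / len(votes)


# Hypothetical ratings: model A is a new version, model B the previous one.
votes = ["A"] * 55 + ["B"] * 30 + ["tie"] * 15
print(f"win rate of A over B: {win_rate(votes):.3f}")
```

A high win rate against a previous version means raters preferred the new model more often. It says nothing about how either version fares against other companies' models, which is the limitation described in the paragraph above.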
But Apple is not in the race for the best model. In 2025 it did compare its models with others of the time (Qwen-3-4B on device; GPT-4o and Llama 4 Scout in the cloud), and they seemed to hold up reasonably well against those options. They can be expected to trail the latest models from OpenAI, Anthropic or Google itself, and it is unclear how they compare with the new Chinese open-weight models. One thing seems clear: Apple is not very interested in having its own Mythos, at least for now. Its goal is different.

Apple's 2026 models are “preferred” over those from 2025. That is logical, but useless for understanding how good these models are compared with the competition.
But integration matters. Apple's big ace for making up this capability gap is that its models have full access to the user's OS, apps and hardware. AFM models are integrated with the iPhone's camera sensors, notification history and local app permissions. This enables useful tasks that an LLM “disconnected” from the hardware will hardly be able to replicate. Here, the integration of the models with the device's hardware and software is (or aims to be) fundamental.
Beware of mediocrity. This focus on integration and privacy is especially striking and sets Apple apart from its competitors, but it carries risks. The main one is that the product is held back by capabilities that trail the competition's. If the local models fall short and the cloud models are not reliable either, Apple risks ending up with an AI that is secure and private but technically mediocre in its answers. Siri has long been criticized for being particularly dumb, and Siri AI must eradicate precisely that perception.