Salvatore Sanfilippo, who built the Redis non-relational database, has created an inference engine for the DeepSeek V4 Flash open source LLM. Called DwarfStar 4, it runs on Apple Mac gear and Linux.

He posted the code, ds4.c, on GitHub under an MIT open source license. It is written mostly in C and Apple’s Objective-C, and had been forked 533 times at press time.

In astronomy, a dwarf star is a star that has collapsed into itself, creating a denser object in a smaller space. Likewise, DwarfStar condenses the 284 billion parameter DeepSeek LLM into a form that can run on a home Mac (via the Metal GPU API) and, more recently, on Linux (through NVIDIA’s CUDA).

“A nice thing about DeepSeek V4 Flash locally is that it’s a big enough model that you can have it explain [expletive] to you and it won’t completely lie to you,” wrote Flask creator Armin Ronacher, in an X message praising the work.

“Big models on small computers is the way!” further enthused CVS AI VP Dan Woods, in an X message.

An Inference Engine for a Specific Model

A lone master craftsman of software development, Sanfilippo has long undertaken projects with practical utility, bucking the committee-driven development checklists of enterprise software.

Sanfilippo originally created Redis as a way to get around MySQL’s limitation of writing each new entry to disk, which was deadly for tracking real-time Web activity. Redis in effect acted as an in-memory database, and it eventually found a huge user base with latency-sensitive concerns such as Instagram. 

So, not surprisingly, Sanfilippo took the path rarely taken in building an open source inference engine. LLMs are a moving target, in that they are constantly being updated. This project takes a “deliberately narrow bet,” he wrote, by focusing on full support for a single version of a model, and for a single platform, one with at least 128GB of memory.
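A back-of-the-envelope estimate helps show why the memory floor sits that high. The sketch below is an illustration, not a figure from the project: it assumes the weights dominate memory use and are stored at one uniform number of bits per parameter, and the bit widths it tries are hypothetical. Only the 284 billion parameter count and the 128GB figure come from the reporting here.

```python
# Rough weight-memory estimate for a quantized model.
# Assumption: one uniform bits-per-weight figure; real quantization
# formats mix widths and carry extra metadata.

PARAMS = 284e9     # parameter count cited for DeepSeek V4 Flash
MEMORY_GB = 128    # minimum memory cited for DwarfStar's target machine


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return the storage needed for the weights, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


def max_bits_per_weight(params: float, memory_gb: float) -> float:
    """Return the largest average bit width that fits in memory_gb."""
    return memory_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for bits in (16, 8, 4, 2):  # hypothetical bit widths
        print(f"{bits:>2} bits/weight: {weight_gb(PARAMS, bits):6.1f} GB")
    print(f"budget: {max_bits_per_weight(PARAMS, MEMORY_GB):.2f} bits/weight")
```

Under these assumptions, 16-bit weights would need 568GB and 4-bit weights 142GB, so fitting in 128GB implies an average of roughly 3.6 bits per weight or less, before setting aside any room for the KV cache or the operating system.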

DwarfStar borrows some logic from the open source llama.cpp inferencing software for small form factors. But whereas llama.cpp was designed for a wide range of hardware, DwarfStar targets Apple silicon specifically. “It is intentionally narrow,” its GitHub page states. (Even the Linux version is not optimized to the level of the Mac version.)

The open source GGML (Georgi Gerganov Machine Learning) library, now managed by Hugging Face, provided a blueprint on how to run LLMs on small form-factor hardware.

The State of Open Source AI

Along with Meta’s Llama, DeepSeek is perhaps the premier open source large language model, offering smarts similar to those of ChatGPT, Google Gemini, Anthropic’s Claude and other commercial offerings. The Chinese company behind the namesake LLM focused on architectural efficiency as a way to work around the U.S. export ban on NVIDIA GPUs, and/or just the paucity of GPUs in general.

In the best style of open source, DeepSeek does not offer a stock inference engine for its models (which consist mainly of weights and reference code), but rather lets the open source community roll its own depending on specific needs. vLLM appears to be the favorite for general LLM serving. Hugging Face offers a Python-based option for DeepSeek specifically. SGLang provides an engine for larger, industrial-scale deployments.

Shrinkage Will Occur

DwarfStar crams DeepSeek’s V4 Flash onto home hardware. The inference engine does this by selectively managing 13 billion parameters at a time and by applying aggressive KV cache compression. Also playing a part, at least on the Apple hardware, are the SSDs, which offer fast response times.
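One common way for a model to touch only a fraction of its parameters per token is mixture-of-experts routing, in which a small router picks a few expert sub-networks for each token. The sketch below assumes that is what “13 billion parameters at a time” refers to; the mechanism is not spelled out here, and the expert count, router scores and function names are hypothetical rather than taken from ds4.c.

```python
import math

TOTAL_PARAMS = 284e9   # cited for DeepSeek V4 Flash
ACTIVE_PARAMS = 13e9   # cited as handled "at a time"


def softmax(scores: list[float]) -> list[float]:
    """Convert raw router scores into probabilities."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(scores: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    probs = softmax(scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


if __name__ == "__main__":
    # Hypothetical router scores for 8 experts; only 2 run for this token.
    scores = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.9]
    for expert, weight in route_top_k(scores, k=2):
        print(f"expert {expert}: weight {weight:.3f}")
    print(f"active share: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

In a design like this, the unselected experts are never evaluated for that token, so their weights do not have to sit in fast memory at that moment. With the figures cited above, roughly 4.6% of the parameters are in play for any one token.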

This compact form factor is a victory all on its own. But there are also a number of beneficial second order effects, according to Sanfilippo.

For one, this inference engine offers shorter “thinking” sessions, sometimes a fifth the length of those from other models. This opens up the possibility of deeper inquiries with DeepSeek that may not even be possible with other models.

DeepSeek’s generous million-token context window provides an edge when it comes to obscure topics. The GitHub page offers Italian television shows as an example.

AI for the Home Hobbyist

In a way, Sanfilippo operates in a similar vein to HashiCorp co-founder Mitchell Hashimoto, who helped create Terraform, Vault and other cloud native tools. Hashimoto left HashiCorp and independently built what is now a popular open source terminal, Ghostty.  

Now Sanfilippo, like Hashimoto, has used his post-enterprise time to build a minimalist utility that solves a very specific problem: in this case, a front end for an open source model that can run in the developer’s own environment.

His work points to the path that open source adherents must take if they want to rescue AI from the grips of corporate control.