Tue, 02 Dec 2025

duckdb-mlpack 0.0.5: Added kmeans, version helpers, documentation

A new release of the still-recent duckdb extension for mlpack, the C++ header-only library for machine learning, was merged into the duckdb community extensions repo today, and has been updated at its duckdb ‘mlpack’ extension page.

This release 0.0.5 adds one new method: kmeans clustering. We also added two version accessors for both mlpack and armadillo. We found during the work on random forests (added in 0.0.4) that the multithreaded random number generation was not quite right in the respective upstream codes. This has by now been corrected in armadillo 15.2.2 as well as the trunk version of mlpack so if you build with those, and set a seed, then your forests and classification will be stable across reruns. We added a second state variable mlpack_silent that can be used to suppress even the minimal prediction quality summary some methods show, and expanded the documentation.

For more details, see the repo for code, issues and more, and the extension page for more about this duckdb community extension.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

/code/duckdb-mlpack | permanent link