LM-Kit.NET 2026.5.4

Released: May 24, 2026

Updates in 2026.5.4

Features

  • Stream-based model loading (LMKit.Model): new LM(Stream) loads GGUF files and LMK archives directly from a stream, and LM.LoadEncryptedFromStream(...) does the same for encrypted GGUF models (.lmke). Neither path extracts the model to disk.
  • Markdown attachment support (LMKit.Data.Attachment, LM-Kit.Server): .md and .markdown files are now recognized as plain-text attachments (text/markdown) and are handled identically to .txt files end to end.
  • Multi-Token Prediction (MTP) self-speculative decoding: a new generation accelerator for models trained with MTP heads. MTP uses a lightweight draft head inside the model to propose several tokens per main-model forward pass, then verifies them in a single batched decode, delivering roughly 2× generation throughput with no accuracy loss. Under greedy decoding the output is identical to standard decoding, and on checkpoints without MTP heads the feature is a zero-cost no-op. New public surface:
    • LM.LoadingOptions.EnableMultiTokenPrediction (bool, default true): controls whether MTP head tensors are loaded into VRAM at model load time. Set to false to skip the heads and save a few hundred MiB to ~1 GiB of VRAM when you know you will not use MTP on this LM instance.
    • LM.HasMultiTokenPrediction (bool): runtime capability check that returns true when the loaded model declares MTP heads and those heads were loaded.
  • Improved translation, text rewriting, and text correction engines (LMKit.Translation, LMKit.TextGeneration): higher output quality, better fidelity to the source text, and improved handling of long-form inputs across the TextTranslation, TextRewriting, and TextCorrection pipelines.
  • Broader GPU coverage in the Vulkan backend: the Vulkan runtime now detects, and offloads work onto, a wider range of GPUs, including additional integrated and discrete devices that previously fell back to CPU. This improves performance on mixed-vendor fleets and on machines without CUDA-capable hardware.
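To illustrate what stream-based loading means at the file-format level, the Python sketch below reads the fixed GGUF header (magic, version, tensor count, metadata key/value count) from any readable binary stream, so an in-memory buffer works as well as a file on disk. This is not LM-Kit's loader, which is a .NET API; the helpers read_gguf_header and _read_exact are hypothetical, the sketch assumes a little-endian GGUF file of version 2 or later, and it does not handle LMK archives or encrypted .lmke files.

```python
import io
import struct
from typing import BinaryIO, Dict

GGUF_MAGIC = b"GGUF"


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes; a short read from a buffered stream means EOF."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_gguf_header(stream: BinaryIO) -> Dict[str, int]:
    """Parse the fixed GGUF header from any readable binary stream.

    Assumes a little-endian GGUF file, version 2 or later (64-bit counts).
    Nothing is written to disk: the stream can be an open file or a memory buffer.
    """
    magic = _read_exact(stream, 4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF stream (magic={magic!r})")
    (version,) = struct.unpack("<I", _read_exact(stream, 4))
    if version < 2:
        raise ValueError("GGUF v1 (32-bit counts) is not handled by this sketch")
    tensor_count, kv_count = struct.unpack("<QQ", _read_exact(stream, 16))
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


if __name__ == "__main__":
    # Build a fake header in memory to show that no file is needed.
    fake = io.BytesIO(GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24))
    print(read_gguf_header(fake))
```

A real loader continues past this header into the metadata and tensor data, but the principle is the same: everything is consumed sequentially from the stream rather than from an extracted file.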
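The draft-and-verify loop behind MTP self-speculative decoding, and the reason it is lossless under greedy decoding, can be sketched in Python with toy stand-in models. This is a generic greedy speculative-decoding sketch, not LM-Kit's implementation: main_model and draft_model are hypothetical deterministic functions standing in for the main model and its MTP head, and the verify step evaluates each prefix separately where a real engine would score all drafted positions in one batched forward pass.

```python
from typing import Callable, List, Sequence, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def greedy_decode(main: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one main-model forward pass per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(main(seq))
    return seq


def speculative_greedy_decode(main: NextToken, draft: NextToken, prompt: List[int],
                              n_new: int, k: int = 3) -> Tuple[List[int], int]:
    """Draft up to k tokens, then keep the prefix the main model agrees with.

    Returns the generated sequence and the number of verification rounds
    (each round stands for one batched main-model forward pass).
    """
    seq = list(prompt)
    target = len(prompt) + n_new
    rounds = 0
    while len(seq) < target:
        rounds += 1
        # Draft phase: the cheap head proposes tokens autoregressively.
        proposal: List[int] = []
        ctx = list(seq)
        for _ in range(min(k, target - len(seq))):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)
        # Verify phase: compare each draft token with the main model's greedy choice.
        accepted: List[int] = []
        for token in proposal:
            expected = main(seq + accepted)
            accepted.append(expected)  # always keep the main model's token
            if expected != token:
                break  # first mismatch: discard the remaining draft tokens
        else:
            # All draft tokens matched, so the same pass yields one bonus token.
            accepted.append(main(seq + accepted))
        seq.extend(accepted)
    return seq[:target], rounds  # trim a possible bonus token past the target


# Hypothetical toy models for the demo (not real LMs).
def main_model(seq: Sequence[int]) -> int:
    return (sum(seq) * 31 + 7) % 50


def draft_model(seq: Sequence[int]) -> int:
    guess = main_model(seq)
    return (guess + 1) % 50 if len(seq) % 4 == 0 else guess  # wrong at some positions


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast, rounds = speculative_greedy_decode(main_model, draft_model, prompt, n_new=20)
    print(fast == greedy_decode(main_model, prompt, 20), rounds)
```

Every token the loop keeps is the main model's own greedy choice for that position, so the result matches plain greedy decoding regardless of draft quality; the draft only changes how many tokens each verification round commits. With a perfect draft and k=3, for example, 12 new tokens take 3 rounds instead of 12.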