LM-Kit.NET
An enterprise-grade .NET SDK for building LLM applications.
Published by LM-Kit.
Available for purchase through ComponentSource since 2025.
Released: May 6, 2026
- Metadata-only loading for encrypted models (LMKit.Cryptography, LMKit.Model, LMKit.Hardware): LM.LoadEncrypted now honors LoadingOptions.LoadTensors = false, mirroring the plaintext metadata-only path. Only the metadata block is decrypted, no tensor bytes are read, and the resulting LM exposes architecture, vocabulary, context length, layer count, and other GGUF metadata. Use this for fast catalog inspection or pre-flight checks on protected .lmke containers (see the metadata-only loading sketch after this list).

- MemoryEstimation.FitParameters overload for encrypted containers (LMKit.Hardware): the new FitParameters(string encryptedPath, GgufEncryptionScheme scheme, string password, ...) overload runs the native fit estimator against an encrypted GGUF without ever materializing tensor bytes. The existing FitParameters(LM model, ...) overload also now works on models loaded via LM.LoadEncrypted and reuses the metadata cached at load time, so callers do not need to re-supply the password. Tensor data is never decrypted during estimation (see the fit-estimation sketch after this list).

- LM.IsEncrypted property (LMKit.Model): true when the instance was loaded via LM.LoadEncrypted. Lets downstream code branch on encryption state without inspecting the file path.

- LM.DeviceConfiguration.AutoFitToVram property (LMKit.Model): controls whether the model loader automatically retries with progressively fewer GPU layers when the first load attempt fails because the model does not fit in available VRAM. Default is true. When enabled, the runtime walks GpuLayerCount down, placing the remaining layers in system memory, until the model loads or the entire model is on CPU. Set to false to restore the previous behavior of failing loudly on insufficient VRAM (see the device-configuration sketch after this list).

- Automatic GPU-layer fallback on insufficient VRAM (LMKit.Model): when a model load fails because the model does not fit in the GPU's available VRAM, the loader now automatically retries with progressively fewer GPU layers, placing the remaining layers in system memory, until the model loads or the entire model is on CPU. This replaces the previous behavior, where insufficient VRAM produced an immediate exception. The fallback is gated by the new DeviceConfiguration.AutoFitToVram flag (default true); set it to false to restore the previous fail-loud behavior.

- Context-size pre-flight check (LMKit.Model, LMKit.TextGeneration): before allocating a new inference context, the runtime now estimates the KV-cache and compute-buffer cost for the requested context size and compares it to the device's currently free VRAM. If the projection exceeds free memory, the context size is reduced up front, avoiding a doomed allocation attempt. On an actual allocation failure during creation, the runtime additionally retries with progressively smaller context sizes before throwing.

- Richer context-allocation failure diagnostics (LMKit.Exceptions): when the runtime cannot allocate an inference context even after its built-in retries, the thrown RuntimeException now includes the device's free VRAM at failure time and a hint to either set DeviceConfiguration.GpuLayerCount = 0 for CPU-only loading or shrink the requested context size (see the exception-handling sketch after this list).

- LM.DeviceConfiguration.ForceCpuMode property (LMKit.Model): this property is functionally duplicated by setting GpuLayerCount = 0, which routes the entire model and KV cache to system memory. Callers who set ForceCpuMode = true should set GpuLayerCount = 0 instead; the device-configuration sketch after this list shows the replacement pattern.
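Metadata-only loading sketch. A minimal example of inspecting a protected container without decrypting tensor data. The exact LM.LoadEncrypted parameter list, the GgufEncryptionScheme member name, and the file path are assumptions for illustration; only LoadingOptions.LoadTensors and LM.IsEncrypted come from the notes above.

```csharp
using System;
using LMKit.Model;

// Metadata-only inspection of a protected .lmke container.
var options = new LoadingOptions
{
    LoadTensors = false // decrypt the metadata block only; no tensor bytes are read
};

LM model = LM.LoadEncrypted(
    "models/example.lmke",           // hypothetical container path
    GgufEncryptionScheme.Aes256,     // assumed enum member name
    "container-password",            // placeholder password
    options);

Console.WriteLine(model.IsEncrypted); // true: instance came from LoadEncrypted
// Architecture, vocabulary, context length, layer count, and other GGUF
// metadata are now readable without ever touching tensor data.
```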
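Fit-estimation sketch. Both FitParameters paths described above, under the assumption that FitParameters is constructed with the signatures quoted in the release note; the trailing parameters elided as "..." in the note are left out here, and the path, scheme member, and password are placeholders.

```csharp
using LMKit.Hardware;
using LMKit.Model;

// Path A: run the native fit estimator directly against the encrypted
// GGUF. Tensor bytes are never materialized or decrypted.
var fromFile = new MemoryEstimation.FitParameters(
    "models/example.lmke",           // hypothetical container path
    GgufEncryptionScheme.Aes256,     // assumed enum member name
    "container-password");           // further optional parameters elided ("...")

// Path B: reuse a model already loaded via LM.LoadEncrypted; the metadata
// cached at load time is reused, so no password is re-supplied.
MemoryEstimation.FitParameters EstimateFor(LM model) =>
    new MemoryEstimation.FitParameters(model);
```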
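Device-configuration sketch. The VRAM-fallback knobs from the entries above; only AutoFitToVram and GpuLayerCount are taken from the notes, while the assumption that these settings apply to subsequent loads and context creation on an existing LM instance is mine.

```csharp
using LMKit.Model;

// Configure VRAM-fallback behavior on an LM instance.
static void ConfigureVramPolicy(LM model, bool cpuOnly)
{
    if (cpuOnly)
    {
        // Routes the entire model and KV cache to system memory;
        // preferred over the deprecated-style ForceCpuMode = true pattern.
        model.DeviceConfiguration.GpuLayerCount = 0;
        return;
    }

    // Default is true: on insufficient VRAM the loader retries with
    // progressively fewer GPU layers, spilling the rest to system memory,
    // until the model loads or runs entirely on CPU.
    model.DeviceConfiguration.AutoFitToVram = true;

    // Set to false to restore the previous fail-loud behavior:
    // model.DeviceConfiguration.AutoFitToVram = false;
}
```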
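Exception-handling sketch. Catching the enriched allocation failure. RuntimeException and the remediation hints are from the note above; the context-creation call itself is a placeholder, and any exception members beyond Message are not specified in the notes.

```csharp
using System;
using LMKit.Exceptions;
using LMKit.Model;

static void CreateContextOrFallBack(LM model)
{
    try
    {
        // ... create the inference context here (placeholder) ...
    }
    catch (RuntimeException ex)
    {
        // The message now reports the device's free VRAM at failure time
        // and hints at either GpuLayerCount = 0 (CPU-only loading) or a
        // smaller requested context size.
        Console.Error.WriteLine(ex.Message);

        // One possible remediation: force CPU-only mode and retry.
        model.DeviceConfiguration.GpuLayerCount = 0;
        // ... retry context creation ...
    }
}
```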