LUMINIUM Ultimate Cube — 425M

A 425.8M parameter hybrid convolutional-attention language model built through layer surgery, collective-consciousness distillation, and cognitive-cube steering — distilled from an 8-model GPU cluster.

hybrid conv-attention cognitive-cube layer-surgery 128K context 10 languages Apache 2.0

View on Hugging Face → Download Q5_K_M (297 MB) Base: LFM2-350M

At a glance

Parameters

425.8M

expanded from 354.5M

Architecture

12 conv + 8 GQA

20 layers total

Context

128K

tokens

Speed (Q5_K_M)

164 tok/s

AMD Radeon VII

Speed (bf16)

128 tok/s

AMD Radeon VII

Domain pass

16 / 16

math · logic · code · safety · …

Overview

Model ID

mambiux/Luminium-Gixel-Cube-v1

Author

mambiux (with Claude Opus 4.6)

Base model

LiquidAI/LFM2-350M

License

Apache 2.0

Pipeline

text-generation

Languages

Key features

Hybrid conv-attention architecture
128K context window
Layer surgery (DARE+TIES merging)
Cognitive-cube steering
Multi-turn coherence
28–64% faster than base
Natural conversational tone
Self-awareness signals

Architecture & surgery

Expanded from LFM2-350M (16 → 20 layers) using cross-model architectural surgery. Four new layers were inserted via DARE + TIES merging, then the result was fine-tuned on a 45-source balanced curriculum distilled from an 8-model GPU cluster. Cognitive-cube steering positions the model in a 3D cognitive space, with inverse-distance weighting toward 8 specialist corner models.

Cognitive Cube — 8 corner models

Corner	Model	Params	Role
fwd · dexo · up	Server 1 (Qwen3-80B)	80B	predict — structured · abstract
fwd · dexo · down	Server 2 (Qwen3.6 MoE)	35B	predict — structured · concrete
fwd · levo · up	Server 3 (granite-4.1)	8B	predict — creative · abstract
fwd · levo · down	Server 4 (LFM2-12B)	12B	predict — creative · concrete
back · dexo · up	Server 5 (Qwen3.6-A3B)	35B	reflect — structured · abstract
back · dexo · down	Server 6 (Qwen3.5-9B)	9B	reflect — structured · concrete
back · levo · up	Server 7 (lumina-lexiR1-8B)	8B	reflect — creative · abstract
back · levo · down	Server 8 (Berthier-24B)	24B	reflect — creative · concrete

Available formats

Format	Size	Best for
model.safetensors (bf16)	1.6 GB	Fine-tuning · research
LUMINIUM-ULTIMATE-CUBE.gguf (bf16)	815 MB	Full-precision inference
LUMINIUM-ULTIMATE-CUBE-Q5_K_M.gguf	297 MB	Production · edge devices

Training

Method

LoRA (PEFT) fine-tuning

LoRA rank / α

16 / 32

Learning rate

3e-5

Epochs

Batch size

8 (effective 16, grad-accum)

Max seq length

768

Precision

bfloat16

Hardware

AMD Radeon VII (16 GB HBM2, ROCm 6.2)

Training data — 18,593 records · 45 sources

General instruction — 3,499
Reasoning & CoT — 5,995
Agentic & tool use — 2,573
Code — 599
Knowledge — 2,495
Feedback & editing — 1,341
Analytical reasoning — 900
+ 11 specialized categories

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "mambiux/Luminium-Gixel-Cube-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "mambiux/Luminium-Gixel-Cube-v1",
    trust_remote_code=True,
)

llama.cpp

llama-server -m LUMINIUM-ULTIMATE-CUBE-Q5_K_M.gguf \
  --host 0.0.0.0 --port 8877 -c 4096 -ngl 99

Limitations

Inconsistent self-identification
Basic routing classification
Occasionally verbose on simple Q&A
Edge-case logic drift in quantized version
Requires trust_remote_code=True
Conv layers not supported by all backends

Citation

BibTeX

@misc{luminium2026,
  title  = {LUMINIUM ULTIMATE CUBE: Cognitive Cube Steering and Collective
            Consciousness Distillation for Small Language Models},
  author = {mbx and Claude Opus 4.6},
  year   = {2026},
  note   = {Built on LiquidAI/LFM2-350M with layer surgery, 8-model
            collective distillation, and geometric cognitive steering}
}