LUMINIUM Ultimate Cube logo

LUMINIUM Ultimate Cube — 425M

A 425.8M parameter hybrid convolutional-attention language model built through layer surgery, collective-consciousness distillation, and cognitive-cube steering — distilled from an 8-model GPU cluster.

hybrid conv-attention cognitive-cube layer-surgery 128K context 10 languages Apache 2.0

At a glance

Parameters
425.8M
expanded from 354.5M
Architecture
12 conv + 8 GQA
20 layers total
Context
128K
tokens
Speed (Q5_K_M)
164 tok/s
AMD Radeon VII
Speed (bf16)
128 tok/s
AMD Radeon VII
Domain pass
16 / 16
math · logic · code · safety · …

Overview

Model ID
mambiux/Luminium-Gixel-Cube-v1
Author
mambiux (with Claude Opus 4.6)
Base model
LiquidAI/LFM2-350M
License
Apache 2.0
Pipeline
text-generation
Languages
10

Key features

  • Hybrid conv-attention architecture
  • 128K context window
  • Layer surgery (DARE+TIES merging)
  • Cognitive-cube steering
  • Multi-turn coherence
  • 28–64% faster than base
  • Natural conversational tone
  • Self-awareness signals

Architecture & surgery

Expanded from LFM2-350M (16 → 20 layers) using cross-model architectural surgery. Four new layers were inserted via DARE + TIES merging, then the result was fine-tuned on a 45-source balanced curriculum distilled from an 8-model GPU cluster. Cognitive-cube steering positions the model in a 3D cognitive space, with inverse-distance weighting toward 8 specialist corner models.

Cognitive Cube — 8 corner models

CornerModelParamsRole
fwd · dexo · upServer 1 (Qwen3-80B)80Bpredict — structured · abstract
fwd · dexo · downServer 2 (Qwen3.6 MoE)35Bpredict — structured · concrete
fwd · levo · upServer 3 (granite-4.1)8Bpredict — creative · abstract
fwd · levo · downServer 4 (LFM2-12B)12Bpredict — creative · concrete
back · dexo · upServer 5 (Qwen3.6-A3B)35Breflect — structured · abstract
back · dexo · downServer 6 (Qwen3.5-9B)9Breflect — structured · concrete
back · levo · upServer 7 (lumina-lexiR1-8B)8Breflect — creative · abstract
back · levo · downServer 8 (Berthier-24B)24Breflect — creative · concrete

Available formats

FormatSizeBest for
model.safetensors (bf16)1.6 GBFine-tuning · research
LUMINIUM-ULTIMATE-CUBE.gguf (bf16)815 MBFull-precision inference
LUMINIUM-ULTIMATE-CUBE-Q5_K_M.gguf297 MBProduction · edge devices

Training

Method
LoRA (PEFT) fine-tuning
LoRA rank / α
16 / 32
Learning rate
3e-5
Epochs
3
Batch size
8 (effective 16, grad-accum)
Max seq length
768
Precision
bfloat16
Hardware
AMD Radeon VII (16 GB HBM2, ROCm 6.2)

Training data — 18,593 records · 45 sources

  • General instruction — 3,499
  • Reasoning & CoT — 5,995
  • Agentic & tool use — 2,573
  • Code — 599
  • Knowledge — 2,495
  • Feedback & editing — 1,341
  • Analytical reasoning — 900
  • + 11 specialized categories

Usage

Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "mambiux/Luminium-Gixel-Cube-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "mambiux/Luminium-Gixel-Cube-v1",
    trust_remote_code=True,
)
llama.cpp
llama-server -m LUMINIUM-ULTIMATE-CUBE-Q5_K_M.gguf \
  --host 0.0.0.0 --port 8877 -c 4096 -ngl 99

Limitations

  • Inconsistent self-identification
  • Basic routing classification
  • Occasionally verbose on simple Q&A
  • Edge-case logic drift in quantized version
  • Requires trust_remote_code=True
  • Conv layers not supported by all backends

Citation

BibTeX
@misc{luminium2026,
  title  = {LUMINIUM ULTIMATE CUBE: Cognitive Cube Steering and Collective
            Consciousness Distillation for Small Language Models},
  author = {mbx and Claude Opus 4.6},
  year   = {2026},
  note   = {Built on LiquidAI/LFM2-350M with layer surgery, 8-model
            collective distillation, and geometric cognitive steering}
}