You want uncensored local models that don't flinch at your scenes—and don't collapse into nonsense after the refusal circuit is stripped away. Reddit's noise is loud, but the reality is narrower. We've dug through model cards, third-party benchmarks, and our own use. This is what still works in May 2026.
Quick picks
| You want... | Download this | VRAM | Refusal rate | Method |
|---|---|---|---|---|
| All-rounder for writing & RP | Huihui-Qwen3.6-35B-A3B | ~14GB | Low | Huihui abliteration |
| Long-form fiction, transparent docs | wangzhang/Qwen3.6-27B-v2 | ~17GB | ~10% | Abliterix |
| Modest GPU — 3060 / MacBook | heretic-org/IBM-granite-4.1-8b | ~5GB | ~1.2% | Heretic v1.2.0 |
| Gemma instead of Qwen | DuoNeural/Gemma-4-26B-A4B | ~13GB | Low | Rep engineering |
| Workstation powerhouse | darkc0de/XORTRON-2026.3 | 123B dense | ~2% | Heretic v1.2.0 |
| Don't want to run locally | Grok via xAI API | Cloud | Never | Native uncensored |
All Hugging Face links are live as of May 2026.
The models
Huihui-Qwen3.6-35B-A3B-abliterated
This is the one. 35B total, ~3B active at inference — heavy lifting without the VRAM drain of a dense model. ~14GB on UD-Q4_K_M, runs clean on 16GB+ GPUs. Community benchmarks clock it at 101 tokens/sec on an RTX 3090. Qwen 3.6 (April 2026) finally squashed the looping bug that plagued Qwen 3.x — no more mid-scene melt-downs.
It handles 256K context, multi-modal input, and stays coherent across long roleplay sessions. Won't flinch when your prompt goes where standard models refuse. Refuses almost nothing worth refusing.
Best for: NSFW roleplay with deep context, boundary-testing creative writing, private uncensored chat, light coding that triggers standard safety filters.
Skip if: You're doing heavy agentic coding where dense-model coherence wins the day — or your GPU's under ~14GB.
wangzhang/Qwen3.6-27B-abliterated-v2
HF: wangzhang
Dense 27B, full parameter engagement. ~17GB at Q4_K_M — clean fit on a 24GB card. If you've got the VRAM and want long-form fiction that actually holds together, this is your pick. The model card is the most transparent doc in the abliteration space.
Uses two-pass orthogonal projection (DeepRefusal-peel) and LoRA rank-1 steering. The authors call it straight:
"Many abliterated models claim near-perfect scores. We urge the community to treat these numbers with skepticism unless the methodology is fully documented."
That is honesty you do not see often. Independent forensic analysis puts KL divergence at 0.024 — surgically clean. ~10% refusal rate — honest, not faked to zero.
Best for: Long-form dark fiction, complex NSFW roleplay, technical writing in security/systems domains.
Skip if: Your GPU cannot run 27B dense at Q4+, or you want lighter VRAM overhead.
heretic-org/IBM-granite-4.1-8b-heretic
HF: heretic-org · GGUF quants on the same page
This one is for everyone with a RTX 3060 (12GB), a MacBook, or anything with 8GB+ RAM. We ran it on a 12GB 3060 — zero drama.
Heretic v1.2.0 SOM. Refusal rate ~1.2%, KL divergence 0.029. Model card documents the method.
Best for: Quick uncensored responses on modest hardware, short roleplay and character chat.
Skip if: You need deep long-form prose, complex coding, or heavy reasoning.
DuoNeural/Gemma-4-26B-A4B-Abliterated
HF: DuoNeural
~13GB at Q3_K_M, fits 16GB+ GPUs. The only major Gemma abliteration in the wild as of May 2026.
Best for: Writers who prefer Gemma voice, multilingual writing.
Skip if: Q3_K_M quality loss bothers you.
darkc0de/XORTRON.CriminalComputing.LARGE.2026.3
HF: darkc0de
123B dense. Multi-GPU or Mac Studio only. ~2% refusal rate. The ceiling for local workstations.
Does abliteration reduce output quality?
Abliteration does not retrain the model — it surgically removes the refusal mechanism from the weights.
- Heretic — most precise. ~1,826 specific tensors edited, minimal collateral.
- HauhauCS — the butcher. 6x the KL divergence.
- Huihui — inconsistent. Low KL on big models, demolishes small ones.
Hardware: what fits where
| Your GPU | VRAM | What you can run |
|---|---|---|
| RTX 3060 | 12GB | Heretic 8B, Josiefied 4-8B |
| RTX 4090 | 24GB | wangzhang 27B, Huihui 35B at Q5 |
| Mac Studio (M3 Max) | 64-192GB | Up to 123B |
API access
Grok via xAI API
Natively uncensored. April 2026 refresh, 1M context. xAI retired Grok 4.1 Fast on May 15, 2026 — all requests auto-redirect to Grok 4.3. Update your model slug to grok-4.3.
Specialist providers
- abliteration.ai — OpenAI-compatible, $20-$50/month. Site
- Unfil AI — Pay-as-you-go from $0.90/M. Site
- Venice AI — Privacy-first, Venice: Uncensored free tier. Site
Bottom line
- Huihui-Qwen3.6-35B-A3B — Best balance of capability and refusal removal. 16GB+ GPU. — HF
- wangzhang/Qwen3.6-27B-v2 — Dense 27B, most transparent model card. 24GB+. — HF
- Heretic 8B (IBM Granite) — Best small uncensored model. Fits a 3060. — HF
- Grok 4.3 via xAI API — Native uncensored, zero setup. — docs.x.ai
- DuoNeural/Gemma-4-26B-A4B — Gemma abliteration for 16GB+. — HF
- darkc0de/XORTRON 123B — Workstation ceiling. — HF
Every model listed has a verified Hugging Face page as of May 2026.