Download the PHP package ksanyok/text-humanize without Composer
On this page you can find all versions of the php package ksanyok/text-humanize. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Download ksanyok/text-humanize
More information about ksanyok/text-humanize
Files in ksanyok/text-humanize
Package text-humanize
Short Description Zero-dependency PHP library for algorithmic text humanization — transforms machine-generated text into natural prose
License proprietary
Homepage https://github.com/ksanyok/TextHumanize
Informations about the package text-humanize
[](https://www.python.org/downloads/) []() [](https://www.php.net/) [](https://github.com/ksanyok/TextHumanize/actions/workflows/ci.yml) [](https://github.com/ksanyok/TextHumanize/actions/workflows/ci.yml) []() [](https://pypi.org/project/texthumanize/) [-blue.svg)](LICENSE)
**235,000+ lines of code** · **122 Python modules** · **38-stage pipeline** · **25 languages + universal** · **2,105 tests** **3 proprietary technologies:** PHANTOM™ (gradient-guided internal score optimization) · ASH™ (adaptive signature humanization) · SentenceValidator™ (interstage quality gate) [Quick Start](#-quick-start) · [Proprietary Technologies](#-proprietary-technologies) · [Before & After](#-before--after-examples) · [Features](#-feature-matrix) · [Benchmarks](#-performance--benchmarks) · [AI Detection](#-ai-detection-engine) · [API Reference](#-api-reference) · [Documentation](https://ksanyok.github.io/TextHumanize/) · [Live Demo](https://texthumanize.link/) · [License](#-license--pricing)
Table of Contents
- Why TextHumanize?
- Proprietary Technologies
- Installation
- Quick Start
- Private Offline Workflow
- Before & After Examples
- Feature Matrix
- Comparison with Competitors
- Processing Pipeline
- AI Detection Engine
- API Reference
- Profiles & Presets
- Language Support
- NLP Infrastructure
- SEO Mode
- Readability Metrics
- Paraphrasing Engine
- Tone Analysis & Adjustment
- Watermark Detection & Cleaning
- Content Spinning
- Coherence Analysis
- Morphological Engine
- Stylistic Fingerprinting
- Auto-Tuner
- Plugin System
- Using Individual Modules
- CLI Reference
- REST API Server
- Async API
- Performance & Benchmarks
- Architecture
- TypeScript / JavaScript Port
- PHP Library
- Testing & Quality
- Security & Limits
- Responsible Use
- For Business & Enterprise
- FAQ & Troubleshooting
- What's New in v0.28.4
- Contributing
- Limitations
- Support the Project
- License & Pricing
TextHumanize is a pure-algorithmic text processing engine that transforms AI-generated drafts into clearer, more natural prose. Three proprietary technologies — PHANTOM™ (gradient-guided optimization against TextHumanize's own detector), ASH™ (adaptive signature humanization), and SentenceValidator™ (interstage quality control) — drive a 38-stage pipeline that reduces built-in AI-like style signals while preserving meaning. No neural networks, no API keys, no internet — just 235K+ lines of finely tuned rules, dictionaries, and statistical methods.
Honest note: TextHumanize is a style-normalization tool, not an AI-detection bypass tool. It reduces AI-like patterns (formulaic connectors, uniform sentence length, bureaucratic vocabulary) but does not guarantee that processed text will pass external AI detectors. Quality of humanization varies by language and text type. See Limitations below.
Built-in toolkit: AI Detection (3 detectors) · Paraphrasing · Tone Analysis · Watermark Cleaning · Content Spinning · Coherence Analysis · Readability Scoring · Stylistic Fingerprinting · Auto-Tuner · Perplexity Analysis · Plagiarism Detection · Grammar Check · Morphology Engine · Neural LM · Async API · SSE Streaming
Platforms: Python (full — 122 modules) · TypeScript/JavaScript (core) · PHP (full)
For business: SaaS integration · REST API with SSE streaming · Docker deployment · Bulk processing · Custom dictionaries · On-prem enterprise · White-label ready
Languages: 🇬🇧 EN · 🇷🇺 RU · 🇺🇦 UK · 🇩🇪 DE · 🇫🇷 FR · 🇪🇸 ES · 🇵🇱 PL · 🇧🇷 PT · 🇮🇹 IT · �🇱 NL · 🇸🇪 SV · 🇨🇿 CS · 🇷🇴 RO · 🇭🇺 HU · 🇩🇰 DA · 🇸🇦 AR · 🇨🇳 ZH · 🇯🇵 JA · 🇰🇷 KO · 🇹🇷 TR · 🇮🇳 HI · 🇻🇳 VI · 🇹🇭 TH · 🇮🇩 ID · 🇮🇱 HE · 🌍 any language via universal processor
🚀 Why TextHumanize?
Problem: Machine-generated text has uniform sentence lengths, bureaucratic vocabulary, formulaic connectors, and low stylistic diversity — reducing readability, engagement, and brand authenticity.
Solution: TextHumanize algorithmically normalizes text style while preserving meaning. Configurable intensity, deterministic output, full change reports. No cloud APIs, no rate limits, no data leaks.
| Advantage | Details | |
|---|---|---|
| 🚀 | Blazing fast | 300–500 ms for a paragraph; full article in 1–2 seconds |
| 🔒 | 100% private | All processing is local — your text never leaves your machine |
| 🎯 | Precise control | Intensity 0–100, 9 profiles, 5 style presets, keyword preservation, max change ratio |
| 🌍 | 25 languages | Deep support for EN/RU/UK/DE; dictionaries for 25 languages; statistical processor for any other |
| 📦 | Zero dependencies | Pure Python stdlib — no pip packages, no model downloads, starts in <100 ms |
| 🔁 | Reproducible | Seed-based PRNG — same input + same seed = identical output |
| 🧠 | 3-layer AI detection | 18-metric heuristic + 35-feature logistic regression + MLP neural detector — no ML framework required |
| 🔌 | Plugin system | Register custom hooks at any of 38 pipeline stages |
| 📊 | Full analytics | Readability (6 indices), coherence, plagiarism, stylometric fingerprint, content health score |
| 🎭 | Tone control | Analyze and adjust formality across 7 levels |
| 📚 | 2,944 dictionary entries | EN 1,733 + RU 1,345 + UK 1,042 + DE 874 + FR 718 + ES 749 + more |
| 🏢 | Enterprise-ready | Dual license, 2,105+ tests, CI/CD, REST API, Docker, on-prem deployment |
| 🛡️ | Secure by design | Input limits, zero network calls, linear-time regex, no eval/exec |
| 📝 | Full auditability | Every call returns change_ratio, quality_score, similarity, explain() report |
� Proprietary Technologies
TextHumanize includes three original, proprietary technologies not found in any other open-source library:
PHANTOM™ — Gradient-Guided Text Optimization Engine
phantom.py — 2,943 lines | An open-source text naturalizer that uses numerical gradient optimization against TextHumanize's own AI detector.
- ORACLE computes numerical gradients through the MLP detector via central differences (~70 forward passes, ~1.4ms), producing per-feature contribution analysis and ranked gap reports
- SURGEON executes 32 feature-targeted surgical text operations guided by Oracle gradients — rank-based magnitude scheduling focuses effort on highest-impact features first
- FORGE runs an iterative optimization loop with combined score tracking, stall detection, adaptive budget escalation, text expansion limits, and post-iteration cleanup
- Result: 100% internal pass rate on TextHumanize's built-in detector benchmark (15/15 texts across EN, RU, UK). Processing time: 0.7–1.4s. External detectors use different models and are not guaranteed.
ASH™ — Adaptive Signature Humanization
ash_engine.py + signature_transfer.py + perplexity_sculptor.py | Statistically transforms text to match real human writing signatures.
- Human Profiles — statistical fingerprints of real human writing per language (sentence length distribution, vocabulary richness, burstiness patterns, punctuation habits)
- Signature Transfer — morphs AI text's statistical signature toward the target human profile
- Perplexity Sculpting — adjusts word-level perplexity to match human perplexity distribution curves
- Metric Gaps — identifies and systematically closes the gap between AI and human writing on 35+ features
SentenceValidator™ — Interstage Quality Gate
sentence_validator.py — 350 lines | Catches and eliminates artifacts between pipeline stages in real-time.
- 10 checks per sentence: duplicate words (
the the), broken contractions (do n't), orphaned punctuation, double conjunctions (and and), dangling conjunctions, unterminated parentheses, triple+ character repeats, fragment chains, conjunction chains, empty sentences - 7 validation checkpoints between pipeline stages — catches artifacts the moment they appear
- Language-aware — recognizes conjunctions in EN, RU, UK, DE, FR, ES
- Final sanitization — post-pipeline cleanup removes residual artifacts that survive all stages
�📦 Installation
From source:
Tip: Pin your version for production:
pip install texthumanize==0.28.4
PHP / TypeScript
⚡ Quick Start
Private Offline Workflow
For privacy-sensitive content, use the local audit -> safe cleanup -> strict humanize -> audit pattern. It keeps processing offline, preserves critical terms, and records review metrics without using cloud APIs.
The example uses backend="local", quality_gate="strict", minimal=True,
brand/identifier preservation, and a socket guard that raises if any code tries
to open a network connection. See the full
Private Offline Workflow guide.
All Features at a Glance
🔄 Before & After Examples
English
Before (AI-generated, AI score: 94%):
Furthermore, it is important to note that the implementation of cloud computing facilitates the optimization of business processes. Additionally, the utilization of microservices constitutes a significant advancement. Moreover, the integration of artificial intelligence into the workflow enhances decision-making processes and contributes to overall organizational efficiency.
After (TextHumanize, profile="web", intensity=60, AI score: 23%):
Also, importantly, the implementation of cloud computing helps the tuning of business processes. Up a major advancement, additionally, the use of microservices makes. And, the merge of artificial intelligence into the workflow enhances decision-making processes; and, contributes to overall organizational speed.
Russian
Before (AI score: 80%):
Необходимо отметить, что данная методология обеспечивает существенное повышение эффективности рабочих процессов. Кроме того, внедрение инновационных технологий способствует оптимизации функционирования организации. Более того, использование искусственного интеллекта позволяет значительно улучшить процесс принятия решений.
After (AI score: 5%):
Важно — что данная метод даёт существенное повышение эффективности рабочих процессов! Впрочем, смотрите, внедрение инновационных технологий помогает оптимизации функционирования организации, значительно, к тому же, использование искусственного интеллекта позволяет улучшить процесс принятия решений.
Ukrainian
Before (AI score: 75%):
Необхідно зазначити, що дана методологія забезпечує суттєве підвищення ефективності робочих процесів. Крім того, впровадження інноваційних технологій сприяє оптимізації функціонування організації. Більш того, використання штучного інтелекту дозволяє значно покращити процес прийняття рішень.
After (AI score: 17%):
Важливо, що ця метод дає суттєве підвищення ефективності робочих процесів; в принципі, впровадження інноваційних технологій веде до оптимізації функціонування організації. До того ж, використання штучного інтелекту дає змогу сильно покращити процес прийняття рішень.
AI Score Reduction Summary
| Language | Before | After | Reduction | Mode |
|---|---|---|---|---|
| English | 94% | 2% | -92pp | web/70 |
| English | 94% | 23% | -71pp | web/60 |
| Russian | 80% | 5% | -75pp | web/50 |
| Ukrainian | 75% | 17% | -58pp | web/50 |
Built-in AI detector scores. Results measured with TextHumanize's 3-layer ensemble (heuristic + statistical + MLP neural). External detectors may produce different results.
Profile Comparison (EN, intensity=50)
| Profile | Change Ratio | Quality | AI Score After |
|---|---|---|---|
web |
0.50 | 0.20 | 27% 🟢 |
chat |
0.61 | 0.20 | 27% 🟢 |
marketing |
0.48 | 0.25 | 27% 🟢 |
seo |
0.48 | 0.25 | 33% 🟢 |
formal |
0.48 | 0.24 | 29% 🟢 |
academic |
0.48 | 0.24 | 29% 🟢 |
Input AI score: 94% — all profiles bring it below 35%.
🧩 Feature Matrix
| Category | Feature | Python | JS | PHP |
|---|---|---|---|---|
| Core | humanize() — 38-stage pipeline |
✅ | ✅ | ✅ |
humanize_batch() — parallel processing |
✅ | — | ✅ | |
humanize_chunked() — large text support |
✅ | — | ✅ | |
humanize_ai() — three-tier AI + rules |
✅ | — | — | |
humanize_until_human() — iterative |
✅ | — | — | |
humanize_sentences() — per-sentence |
✅ | — | — | |
humanize_stream() — streaming |
✅ | — | — | |
humanize_variants() — N output variants |
✅ | — | — | |
analyze() — artificiality scoring |
✅ | ✅ | ✅ | |
explain() — change report |
✅ | — | ✅ | |
| AI Detection | detect_ai() — 3-layer ensemble |
✅ | ✅ | ✅ |
detect_ai_batch() — batch detection |
✅ | — | — | |
detect_ai_sentences() — per-sentence |
✅ | — | — | |
detect_ai_mixed() — mixed content |
✅ | — | — | |
StatisticalDetector — 35-feature LR |
✅ | — | — | |
NeuralAIDetector — MLP (pure Python) |
✅ | — | — | |
| NLP | paraphrase() — syntactic transforms |
✅ | — | ✅ |
POSTagger — rule-based POS (4 langs) |
✅ | — | — | |
HMMTagger — Viterbi HMM tagger |
✅ | — | — | |
CJKSegmenter — zh/ja/ko segmentation |
✅ | — | — | |
SyntaxRewriter — 8+ sentence transforms |
✅ | — | — | |
WordLanguageModel — perplexity (14 langs) |
✅ | — | — | |
NeuralPerplexity — LSTM char-level LM |
✅ | — | — | |
CollocEngine — PMI scoring + replacement guard |
✅ | — | — | |
MorphologyEngine — 4 languages |
✅ | — | — | |
WordVec — lightweight word vectors |
✅ | — | — | |
| Tone | analyze_tone() — formality analysis |
✅ | — | ✅ |
adjust_tone() — 7-level adjustment |
✅ | — | ✅ | |
| Watermarks | detect_watermarks() — 6 types |
✅ | — | ✅ |
clean_watermarks() — removal |
✅ | — | ✅ | |
| Spinning | spin() / spin_variants() |
✅ | — | ✅ |
| Analysis | analyze_coherence() — paragraph flow |
✅ | — | ✅ |
full_readability() — 6 indices |
✅ | — | ✅ | |
check_grammar() — rule-based (9 langs) |
✅ | — | — | |
uniqueness_score() — plagiarism check |
✅ | — | — | |
content_health() — composite 0–100 |
✅ | — | — | |
semantic_similarity() — TF-IDF cosine |
✅ | — | — | |
sentence_readability() — per-sentence |
✅ | — | — | |
| Stylistic fingerprinting | ✅ | — | — | |
| Quality | BenchmarkSuite — 6-dimension scoring |
✅ | — | — |
FingerprintRandomizer — anti-detection |
✅ | — | — | |
QualityGate — CI/CD content check |
✅ | — | — | |
| Advanced | Style presets (5 personas) | ✅ | — | — |
| Auto-Tuner (feedback loop) | ✅ | — | — | |
| AI backend (OpenAI/Ollama/OSS) | ✅ | — | — | |
| Custom dictionary overlays | ✅ | — | — | |
| Domain dictionaries (SaaS/ecommerce/etc.) | ✅ | — | — | |
| Dictionary trainer (corpus) | ✅ | — | — | |
| Neural network training loop | ✅ | — | — | |
| Dashboard (HTML reports) | ✅ | — | — | |
| Plugin system | ✅ | — | ✅ | |
| REST API (OpenAPI + SSE) | ✅ | — | — | |
| SSE streaming | ✅ | — | — | |
| CLI (15+ commands) | ✅ | — | — | |
| Languages | Full dictionary support | 14 | 2 | 14 |
| Universal processor | ✅ | ✅ | ✅ |
⚔️ Comparison with Competitors
vs. Online Humanizers & GPT/LLM Rewriting
| Criterion | TextHumanize | Online Humanizers | GPT/LLM Rewriting |
|---|---|---|---|
| Works offline | ✅ | ❌ | ❌ |
| Privacy | ✅ 100% local | ❌ Third-party servers | ❌ Cloud API |
| Speed | ~300 ms/paragraph | 2–10 sec (network) | ~500 chars/sec |
| Cost per 1M chars | $0 | $10–50/month | $15–60 (GPT-4) |
| API key required | No | Yes | Yes |
| Deterministic | ✅ Seed-based | ❌ | ❌ |
| Languages | 25 + universal | 1–3 | 10+ but expensive |
| Built-in AI detector | ✅ 3-layer ensemble | ❌ or basic | ❌ |
| Max change control | ✅ max_change_ratio |
❌ | ❌ Unpredictable |
| Open source | ✅ | ❌ | ❌ |
| Self-hosted | ✅ Docker / pip | ❌ | ❌ |
| Audit trail | ✅ explain() |
❌ | ❌ |
vs. Other Open-Source Libraries
| Feature | TextHumanize | Typical Alternatives |
|---|---|---|
| Pipeline stages | 38 | 2–4 |
| Languages | 25 + universal | 1–2 |
| AI detection | ✅ 3-layer (18 + 35 + MLP) | ❌ |
| Python tests | 2,105 | 10–50 |
| Codebase size | 235,000+ lines | 500–2K |
| Platforms | Python + JS + PHP | Single |
| Plugin system | ✅ | ❌ |
| Tone analysis | ✅ 7 levels | ❌ |
| REST API | ✅ OpenAPI + SSE | ❌ |
| Readability metrics | ✅ 6 indices | 0–1 |
| Morphological engine | ✅ 4 languages | ❌ |
| Neural components | MLP + LSTM + HMM | ❌ |
| Content spinning | ✅ spintax | ❌ |
| Stylistic fingerprinting | ✅ | ❌ |
| Grammar checker | ✅ 9 languages | ❌ |
| Plagiarism detection | ✅ n-gram | ❌ |
vs. AI Detectors (GPTZero, Originality.ai)
| Feature | TextHumanize | GPTZero | Originality.ai |
|---|---|---|---|
| Price | Free | From $10/mo | From $14.95/mo |
| Works offline | ✅ | ❌ | ❌ |
| Self-hosted | ✅ | ❌ | ❌ |
| Per-sentence detection | ✅ | ✅ | ✅ |
| Mixed-content detection | ✅ | ✅ | ❌ |
| Combined humanize + detect | ✅ | ❌ | ❌ |
| Custom training | ✅ dict_trainer |
❌ | ❌ |
| API | ✅ REST + SSE | ✅ REST | ✅ REST |
| Batch detection | ✅ | ✅ (paid) | ✅ (paid) |
| CI/CD quality gate | ✅ quality_gate.py |
❌ | ❌ |
🔧 Processing Pipeline (38 Stages)
Adaptive intensity: Auto-reduces processing for already-natural text. Graduated retry: Retries at lower intensity if change ratio exceeds the limit. SentenceValidator™: 7 interstage checkpoints catch artifacts between stages (10 checks per sentence). Tier system: Tier 1 languages (EN/RU/UK/DE) get all 38 stages. Tier 2 (FR/ES/IT/PL/PT/NL/SV/CS/RO/HU/DA) get ~30. Tier 3 (AR/ZH/JA/KO/TR/HI/VI/TH/ID/HE) get ~20 + universal.
🧠 AI Detection Engine
Three independent detectors combined into a single score:
Architecture
18 Heuristic Metrics
| # | Metric | What It Measures |
|---|---|---|
| 1 | Entropy | Character/word-level Shannon entropy |
| 2 | Burstiness | Sentence/paragraph length variability (humans vary, AI doesn't) |
| 3 | Vocabulary | TTR, MATTR, Yule's K, hapax legomena ratio |
| 4 | Zipf | Fit to Zipf's law distribution |
| 5 | Stylometry | Function word patterns, punctuation fingerprint |
| 6 | AI Patterns | Formulaic phrases ("it is important to note", "furthermore") |
| 7 | Punctuation | Punctuation distribution profile |
| 8 | Coherence | Paragraph uniformity (too-uniform = AI) |
| 9 | Grammar | Grammatical "perfection" level (too-perfect = AI) |
| 10 | Openings | Sentence-opening diversity |
| 11 | Readability | Consistency of readability scores across sentences |
| 12 | Rhythm | Syllable patterns, sentence length rhythm |
| 13 | Perplexity | N-gram predictability |
| 14 | Discourse | Discourse structure (topic sentences, markers) |
| 15 | Semantic Repetition | Cross-paragraph semantic overlap |
| 16 | Entity | Specificity of named entities and examples |
| 17 | Voice | Passive vs. active voice ratio |
| 18 | Topic Sentence | Topic-sentence-per-paragraph pattern |
35-Feature Statistical Detector (Logistic Regression)
| Category | Features |
|---|---|
| Lexical (4) | Type-token ratio, hapax ratio, avg word length, word length variance |
| Sentence (3) | Mean sentence length, length variance, length skewness |
| Vocabulary (3) | Yule's K, Simpson's diversity, vocabulary richness |
| N-gram (3) | Bigram/trigram repetition rates, unique bigram ratio |
| Entropy (3) | Character entropy, word entropy, bigram entropy |
| Burstiness (2) | Sentence burstiness, vocabulary burstiness |
| Structural (3) | Paragraph count, avg paragraph length, list/bullet ratio |
| Punctuation (5) | Comma, semicolon, dash, question, exclamation rates |
| AI Pattern (1) | AI pattern rate (strongest single feature, weight −2.10) |
| Perplexity (2) | Word frequency rank variance, Zipf fit residual |
| Readability (2) | Syllables/word, Flesch score normalized |
| Discourse (3) | Starter diversity, conjunction rate, transition word rate |
| Rhythm (1) | Consecutive length difference variance |
Neural MLP Detector
Feed-forward neural network entirely in pure Python (no PyTorch, no TensorFlow). Pre-trained weights shipped as compressed JSON (54 KB).
Verdicts
| Score | Verdict | Meaning |
|---|---|---|
| < 35% | human_written |
Likely written by a human |
| 35–65% | mixed |
Mixed content or uncertain |
| ≥ 65% | ai_generated |
Likely AI-generated |
Detection Modes
📖 API Reference
humanize(text, lang, **kwargs) → HumanizeResult
| Parameter | Type | Default | Description |
|---|---|---|---|
text |
str |
— | Input text (max 1 MB) |
lang |
str |
— | Language code: en, ru, uk, de, etc. |
profile |
str |
"web" |
Processing profile: chat, web, seo, docs, formal, academic, marketing, social, email, plus intent aliases seo_article, landing_page, product_description, support_reply, legal, social_post |
intensity |
int |
50 |
Aggressiveness 0–100 |
seed |
int |
None |
PRNG seed for reproducibility |
preserve |
dict |
{} |
Protect code, URLs, email, dates, prices, ids, quotes, named entities, brand terms |
minimal |
bool |
False |
Only humanize AI-flagged sentences |
max_change_ratio |
float |
None |
Maximum allowed proportion of change (0.0–1.0) |
constraints |
dict |
{} |
Advanced constraints (keep_keywords, etc.) |
quality_gate |
str |
None |
Use "strict" to rollback on similarity, grammar, or readability regression |
backend |
str |
None |
LLM backend: "openai", "ollama", "oss", "auto" |
Returns HumanizeResult:
| Field | Type | Description |
|---|---|---|
.text |
str |
Processed text |
.change_ratio |
float |
Proportion of text changed (0.0–1.0) |
.quality_score |
float |
Quality metric |
.similarity |
float |
Semantic similarity to original |
.metrics_after["humanize_explain"] |
dict |
Top 5 change reasons, top 5 remaining risks, sentence-level risk deltas |
.metrics_after["anti_overhumanize"] |
dict |
Final guard report for stacked fillers, repeated discourse markers, and excessive ! / ? punctuation |
.stages |
list |
Stages applied with timing |
Other Humanization Modes
detect_ai(text, lang) → dict
| Field | Description |
|---|---|
score |
AI probability (0.0–1.0) |
verdict |
"human_written", "mixed", or "ai_generated" |
confidence |
Confidence level (0.0–1.0) |
metrics |
Individual metric scores (18 heuristic + 35 statistical) |
combined_score |
Weighted average of all detectors |
Other Core Functions
| Function | Description |
|---|---|
analyze(text, lang) |
Returns AnalysisReport with artificiality score, sentence stats |
explain(result) |
Human-readable change report |
paraphrase(text, lang) |
Syntactic paraphrasing (voice transforms, connector shuffling) |
analyze_tone(text, lang) |
Tone analysis (formality, style) |
adjust_tone(text, target, lang) |
Adjust formality to 7 levels |
detect_ai_explain(text, lang) |
Explainable AI detector report with spans and suggested actions |
audit_report(text, lang) |
Combined AI + watermark audit JSON |
detect_watermarks(text) |
Detect 6 types of invisible watermarks |
clean_watermarks(text) |
Remove all detected watermarks |
watermark_report(text, lang) |
Unified Unicode + statistical watermark report |
spin(text, lang) |
Generate a single spun variant |
spin_variants(text, count, lang) |
Generate N spun variants |
analyze_coherence(text, lang) |
Paragraph flow analysis |
full_readability(text, lang) |
6 readability indices |
build_author_profile(text, lang) |
Stylometric fingerprint |
compare_fingerprint(text, profile) |
Compare text to an author profile |
anonymize_style(text, lang) |
Stylometric anonymization |
check_grammar(text, lang) |
Grammar check (9 languages) |
uniqueness_score(text) |
N-gram uniqueness |
content_health(text, lang) |
Composite quality score 0–100 |
🎭 Profiles & Style Presets
Processing Profiles
| Profile | Use Case | Sentence Length | Colloquialisms | Default Intensity |
|---|---|---|---|---|
chat |
Messaging, social media | 8–18 words | High | 80 |
web |
Blog posts, articles | 10–22 words | Medium | 60 |
seo |
SEO content (keyword-safe) | 12–25 words | None | 40 |
docs |
Technical documentation | 12–28 words | None | 50 |
formal |
Legal, official | 15–30 words | None | 30 |
academic |
Research papers | 15–30 words | None | 25 |
marketing |
Sales, promo copy | 8–20 words | Medium | 70 |
social |
Social media posts | 6–15 words | High | 85 |
email |
Business emails | 10–22 words | Medium | 50 |
Style Presets (5 Personas)
| Preset | Sentences | Vocabulary | Style |
|---|---|---|---|
🎓 student |
Short–medium | Simple | Conversational, informal |
✍️ copywriter |
Varied (short bursts + long) | Dynamic | Energetic, varied rhythm |
🔬 scientist |
Long, complex | Technical | Formal, precise, cautious hedging |
📰 journalist |
Medium, diverse | Clear | Neutral, fact-oriented |
💬 blogger |
Short, punchy | Informal | Questions, exclamations, personal |
Intensity Levels
| Range | Effect | Use Case |
|---|---|---|
| 0–20 | Minimal — typography and watermarks only | Already-natural text |
| 21–40 | Light — connectors and basic synonym swap | SEO, formal content |
| 41–60 | Moderate — structure + paraphrasing | Blog posts, web content |
| 61–80 | Aggressive — syntax rewriting + entropy | Chat, social media |
| 81–100 | Maximum — all transforms at full power | Heavy AI text |
🌍 Language Support
Language Tiers
| Tier | Languages | Detection | Humanization | Syntax Rewriting |
|---|---|---|---|---|
| 1 | EN, RU, UK, DE | ✅ Full | ✅ Full 38-stage | ✅ |
| 2 | FR, ES, IT, PL, PT | ✅ Good | ✅ 15-stage | ❌ |
| 3 | AR, ZH, JA, KO, TR | ✅ Basic | ✅ 10-stage + universal | ❌ |
| 0 | Any other language | ✅ Statistical | ✅ Universal processor | ❌ |
Dictionary Coverage
| Language | Code | Synonyms | Bureaucratic | AI Connectors | Sentence Starters | Colloquial | Collocations |
|---|---|---|---|---|---|---|---|
| English | en |
431 | 645 | 152 | 75 | 127 | 1,578 |
| Russian | ru |
269 | 486 | 100 | 73 | 102 | 408 |
| Ukrainian | uk |
243 | 338 | 75 | 46 | 86 | 38 |
| German | de |
138 | 361 | 65 | 54 | 88 | 125 |
| French | fr |
141 | 224 | 61 | 49 | 86 | 128 |
| Spanish | es |
166 | 230 | 60 | 49 | 78 | 126 |
| Polish | pl |
159 | 247 | 60 | 46 | 78 | 34 |
| Portuguese | pt |
163 | 204 | 60 | 51 | 79 | 36 |
| Italian | it |
168 | 231 | 63 | 49 | 79 | 38 |
| Arabic | ar |
126 | 139 | 65 | 40 | 59 | — |
| Chinese | zh |
127 | 137 | 51 | 38 | 59 | — |
| Japanese | ja |
120 | 123 | 66 | 41 | 59 | — |
| Korean | ko |
118 | 120 | 67 | 39 | 59 | — |
| Turkish | tr |
119 | 122 | 67 | 43 | 59 | — |
Universal processor works for any language using statistical methods — burstiness injection, perplexity normalization, sentence length variation, punctuation diversification.
🧬 NLP Infrastructure
TextHumanize includes a full NLP stack — all implemented in pure Python with zero external dependencies:
| Module | Component | Description |
|---|---|---|
pos_tagger.py |
POS Tagger (1,917 lines) | Rule-based part-of-speech tagger with suffix/prefix rules for EN/RU/UK/DE |
hmm_tagger.py |
HMM Tagger (642 lines) | Viterbi-decoding Hidden Markov Model for POS tagging |
cjk_segmenter.py |
CJK Segmenter (1,277 lines) | Forward/backward max-match Chinese, particle-stripping Korean, character-type Japanese |
morphology.py |
Morphology Engine (811 lines) | Suffix-based stemming and inflection for RU/UK/EN/DE |
collocation_engine.py |
Collocation Engine (224 lines) | PMI-based collocation scoring for context-aware synonym selection |
word_lm.py |
Word Language Model (435 lines) | Bigram/trigram with compressed frequency data for 25 languages |
neural_lm.py |
Neural Char-Level LM (391 lines) | LSTM-based character language model for perplexity scoring |
neural_engine.py |
Neural Primitives (610 lines) | Feed-forward net, LSTM cell, embeddings, HMM, layer norm, GELU — all in stdlib |
neural_paraphraser.py |
Seq2Seq Paraphraser (752 lines) | Encoder-decoder with Bahdanau attention for neural paraphrasing |
word_embeddings.py |
Word Vectors (399 lines) | Hash-based + cluster embeddings, cosine similarity, nearest neighbors |
sentence_split.py |
Smart Splitter (338 lines) | Abbreviation-aware sentence splitting (Mr./Dr./URLs/decimals) |
lang_detect.py |
Language Detector (328 lines) | Character trigram profiling for 25 languages |
context.py |
Contextual Synonyms (320 lines) | Word sense disambiguation via context windows and topic detection |
grammar.py |
Grammar Checker (360 lines) | Rule-based grammar for 9 languages (agreement, articles, punctuation) |
Total NLP infrastructure: ~8,800 lines of code, zero pip dependencies.
🔍 SEO Mode
TextHumanize includes a dedicated SEO workflow to humanize content without harming search rankings:
| Feature | How It Works |
|---|---|
| Keyword preservation | preserve and keep_keywords lists are never modified |
| Low intensity | SEO profile defaults to 40% — gentle transformations |
| No keyword stuffing | Does not add or repeat keywords |
| Structure preservation | Heading hierarchy (H1–H6) preserved |
| Meta-safe | Avoids changing first-paragraph introductions (critical for SEO) |
| Max change control | max_change_ratio=0.3 ensures minimal disruption |
📊 Readability Metrics
full_readability() returns 6 reading metrics:
| Index | Range | What It Measures |
|---|---|---|
| Flesch Reading Ease | 0–100 | Higher = easier (60–70 is ideal for web) |
| Flesch-Kincaid Grade | 0–18 | US school grade level |
| Coleman-Liau Index | 0–18 | Based on characters (not syllables) |
| Automated Readability Index | 0–14 | Character and word counts |
| SMOG Grade | 0–18 | Polysyllabic word density |
| Gunning Fog | 0–20 | Complex words + sentence length |
Grade interpretation:
| Grade | Audience |
|---|---|
| 5–6 | General public, social media |
| 7–8 | Web content, blog posts |
| 9–10 | Magazine articles |
| 11–12 | Academic papers |
| 13+ | Technical/legal documents |
✍️ Paraphrasing Engine
Rule-based syntactic paraphrasing — no LLM, no API, deterministic:
| Transform | Example |
|---|---|
| Active → Passive | "The team built the app" → "The app was built by the team" |
| Passive → Active | "The report was written by John" → "John wrote the report" |
| Clause reordering | "After analyzing data, we decided…" → "We decided… after analyzing data" |
| Nominalization reversal | "The implementation of X" → "Implementing X" |
| Connector shuffling | "Furthermore, X. Additionally, Y." → "What's more, X. Also, Y." |
| MWE decomposition | "take into account" → "consider" |
| Hedging injection | "X is true" → "X appears to be true" |
| Perspective rotation | "Users need X" → "X is needed by users" |
🎭 Tone Analysis & Adjustment
7-level formality scale with marker-based detection:
| Level | Name | Example Markers |
|---|---|---|
| 1 | slang |
"ya", "gonna", "lol", contractions |
| 2 | casual |
"pretty much", "kind of", first person |
| 3 | neutral |
Balanced register |
| 4 | professional |
"regarding", "in accordance with" |
| 5 | formal |
"henceforth", "notwithstanding" |
| 6 | academic |
"thus", "consequently", passive voice |
| 7 | legal |
"hereinafter", "whereas", "pursuant to" |
🛡️ Watermark Detection & Cleaning
Detects and removes 6 types of invisible text watermarks:
| Type | How It Hides | Detection Method |
|---|---|---|
| Zero-width characters | U+200B, U+200C, U+200D, U+FEFF | Unicode category scanning |
| Homoglyph substitution | Latin 'a' → Cyrillic 'а' | Confusable character mapping |
| Invisible Unicode | U+2060, U+2061–U+2064 | Codepoint range check |
| Directional markers | RTL/LTR overrides | Bidirectional control detection |
| Soft hyphens | U+00AD | Pattern matching |
| Tag characters | U+E0001–U+E007F | Unicode block scanning |
🔄 Content Spinning
Generate multiple unique variants with spintax support:
The spinner uses language-pack synonyms, contextual substitution, and sentence restructuring to produce each variant.
🔗 Coherence Analysis
Measure paragraph-level text flow:
| Metric | What It Measures |
|---|---|
| Paragraph similarity | TF-IDF cosine between adjacent paragraphs |
| Transition quality | Presence and appropriateness of connective phrases |
| Topic continuity | Keyword overlap between sections |
| Reference chains | Pronoun and entity co-reference tracking |
🔠 Morphological Engine
Rule-based morphology for 4 languages — lemmatization, inflection, declension:
| Language | Operations | Suffix Rules |
|---|---|---|
| Russian | Lemmatization, declension, conjugation | 200+ suffix patterns |
| Ukrainian | Lemmatization, declension | 180+ suffix patterns |
| English | Lemmatization, pluralization | 150+ rules |
| German | Lemmatization, compound splitting | 120+ rules |
🎨 Stylistic Fingerprinting
Extract and compare author stylometric profiles:
Fingerprint dimensions: Mean sentence length, length variance, vocabulary richness, function word distribution, punctuation profile, discourse marker usage, passive voice ratio, average word length.
🎛️ Auto-Tuner (Feedback Loop)
Automatically optimize intensity and profile based on feedback:
The tuner uses Bayesian-like optimization to find ideal (intensity, profile) combinations for your content type.
🔌 Plugin System
Register custom hooks at any of 20 pipeline stages:
Available hook points: watermark → segmentation → typography → debureaucratization → structure → repetitions → liveliness → paraphrasing → syntax_rewriting → tone → universal → naturalization → paraphrase_engine → sentence_restructuring → entropy_injection → readability → grammar → coherence → validation → restore
🧪 Using Individual Modules
Every module is independently importable:
💻 CLI Reference
CLI Flags
| Flag | Description |
|---|---|
-l, --lang |
Language code (required) |
-p, --profile |
Processing profile |
-i, --intensity |
Intensity 0–100 |
-o, --output |
Output file path |
--seed |
PRNG seed for reproducibility |
--keep |
Comma-separated keywords to preserve |
--brand |
Brand terms to never modify |
--max-change |
Maximum change ratio (0.0–1.0) |
--analyze |
Print analysis report |
--explain |
Print change explanation |
--detect-ai |
Run AI detection |
--audit |
Combined AI + watermark audit JSON |
--paraphrase |
Paraphrase mode |
--tone |
Adjust tone to target level |
--tone-analyze |
Analyze current tone |
--watermarks |
Detect watermarks |
--watermark-report |
Unified watermark JSON report |
--quality-gate |
off or strict post-processing guard |
--fail-under-quality |
Exit with code 2 if quality_score or benchmark average is below threshold |
--minimal / --only-flagged |
Only humanize AI-flagged sentences |
--spin |
Spin mode |
--variants N |
Number of spin variants |
--coherence |
Coherence analysis |
--readability |
Readability metrics |
--api |
Start REST API server |
--port |
API server port (default: 8080) |
--verbose |
Detailed output |
--report |
Save JSON report, or HTML when the path ends with .html |
--json |
JSON output format |
🌐 REST API Server
Zero-dependency HTTP server with rate limiting and CORS:
For FastAPI deployments, see examples/fastapi_integration.py. It includes
request body limits, text and batch size limits, per-request timeouts,
structured error envelopes with request ids, and /v1/humanize/batch.
OpenAPI 3.1 schema is available at GET /openapi.json for client generation,
contract tests, and API gateway import.
Endpoints
| Method | Endpoint | Description |
|---|---|---|
POST |
/humanize |
Full humanization |
POST |
/detect-ai |
AI detection (single or batch) |
POST |
/analyze |
Text metrics |
POST |
/paraphrase |
Paraphrase text |
POST |
/tone/analyze |
Tone analysis |
POST |
/tone/adjust |
Tone adjustment |
POST |
/watermarks/detect |
Detect watermarks |
POST |
/watermarks/clean |
Remove watermarks |
POST |
/spin |
Content spinning |
POST |
/spin/variants |
Spin N variants |
POST |
/coherence |
Coherence analysis |
POST |
/readability |
Readability metrics |
POST |
/sse/humanize |
SSE streaming humanization |
GET |
/health |
Health check |
GET |
/openapi.json |
OpenAPI 3.1 schema |
GET |
/ |
API documentation index |
OPTIONS |
* |
CORS preflight |
Rate limit: 10 req/s per IP, burst 20 · Max body: 5 MB
Example
Python Client
⚡ Async API
Native asyncio support for all public functions:
📈 Performance & Benchmarks
All benchmarks on Apple Silicon (M-series), Python 3.12, single thread, after warm-up. See the public Benchmark Methodology for corpus labels, quality dimensions, latency reporting rules, and detector limitations.
Speed
| Function | Text Size | Avg Latency |
|---|---|---|
humanize() |
~30 words | ~5 s |
humanize() |
~80 words | ~10 s |
humanize(phantom=True) |
~80 words | ~12 s |
detect_ai() |
~30 words | ~1 s |
detect_ai() |
~80 words | ~3 s |
paraphrase() |
~80 words | < 1 ms |
analyze_tone() |
~80 words | < 1 ms |
analyze() |
~80 words | ~80 ms |
AI Score Reduction
Properties
| Property | Value |
|---|---|
| Cold start | < 100 ms |
| LRU cache hit | 11× faster than cold |
| External network calls | 0 (offline-first) |
| Deterministic (same seed) | ✅ Always |
| Pipeline timeout | 30 s (configurable) |
| API rate limiting | 10 req/s per IP, burst 20 |
| Max input size | 1 MB |
| Memory per call | 4–200 KB |
Run benchmarks yourself:
🏗️ Architecture
Design principles:
| Principle | Implementation |
|---|---|
| Modular | Each stage is a standalone class; every module is independently importable |
| Zero dependencies | Pure Python stdlib — no pip packages at all |
| Declarative rules | Language packs are data-only (dicts), no logic in lang files |
| Idempotent | Running the pipeline twice won't double-transform text |
| Safe defaults | Works out-of-the-box with sensible profiles |
| Lazy imports | PEP 562 lazy loading — only imports what you use |
| Deterministic | Seed-based PRNG for reproducible output |
| Extensible | Plugin hooks at 38 stages, custom dictionaries, AI backend |
🟦 TypeScript / JavaScript Port
Core TextHumanize functionality in TypeScript for Node.js and browsers:
| Feature | Status |
|---|---|
humanize() |
✅ |
detectAi() |
✅ |
analyze() |
✅ |
| Language packs: EN, RU | ✅ |
| Universal processor | ✅ |
🐘 PHP Library
Full-featured PHP port with Composer support:
| Feature | Status |
|---|---|
| All 25 language packs | ✅ |
humanize(), humanize_batch(), humanize_chunked() |
✅ |
detect_ai(), analyze(), explain() |
✅ |
paraphrase(), analyze_tone(), adjust_tone() |
✅ |
detect_watermarks(), clean_watermarks() |
✅ |
spin(), spin_variants() |
✅ |
analyze_coherence(), full_readability() |
✅ |
| Plugin system | ✅ |
| 223 PHPUnit tests | ✅ |
✅ Testing & Quality
| Platform | Tests | Status |
|---|---|---|
| Python (pytest, 3.9–3.13) | 2,144 | ✅ All passing |
| PHP (PHPUnit, 8.1–8.3) | 223 | ✅ All passing |
| TypeScript (Jest) | 28 | ✅ All passing |
| Total | 2,395 | ✅ |
CI/CD: Every push triggers Python 3.9–3.13 + PHP 8.1–8.3 matrix, ruff lint, mypy type check, pytest with coverage ≥ 70%.
Core-language regressions: EN/RU/UK fixture packs verify protected tokens, cross-language leakage, and language-aware cleanup of over-humanized output.
Collocation guard: word-level replacements now keep strong local collocations intact, so natural phrases such as "heavy rain" are not weakened by context-free shorter synonyms.
Domain dictionaries: SaaS, ecommerce, fintech, legal, education, real
estate, and healthcare terms are auto-detected or explicitly protected via
preserve={"domains": [...]}.
🛡️ Security & Limits
| Aspect | Implementation |
|---|---|
| Input limits | 1 MB text, 5 MB API body |
| Network calls | Zero. No telemetry, no analytics, no phone-home |
| Dependencies | Zero. Pure stdlib only |
| Regex safety | All patterns are linear-time; no user input compiled to regex |
| Reproducibility | Seed-based PRNG, deterministic output |
| No eval/exec | No dynamic code execution |
| Rate limiting | Token bucket (API): 10 req/s, burst 20 |
| Sandboxing | Resource limits documented for production deployment |
Threat Model
| Threat | Mitigation |
|---|---|
| Data exfiltration | Zero network calls — impossible |
| ReDoS | All regex patterns audited for linear-time complexity |
| Memory exhaustion | 1 MB input limit, streaming for large texts |
| Model poisoning | Weights are read-only compressed JSON; no runtime training by default |
| Dependency supply chain | Zero pip dependencies — nothing to compromise |
Responsible Use
TextHumanize is built for style normalization, readability improvement, privacy-preserving audits, and internal AI-like/watermark risk checks. It does not guarantee passing external AI detectors, and its detector scores should be treated as internal quality signals rather than universal authorship verdicts.
Use it for content you own or are authorized to edit. Do not use it to misrepresent authorship, bypass required disclosure, remove provenance signals from third-party content, or submit work in contexts where AI assistance is prohibited.
Recommended production safeguards:
- show before/after diffs and
change_ratioto reviewers; - enable
quality_gate="strict"for sensitive content; - review
metrics_after["anti_overhumanize"]when high-intensity profiles are used; - use
minimal=True/--only-flaggedwhen only risky spans need edits; - preserve brand terms, named entities, numbers, URLs, quotes, and code;
- require manual review for legal, medical, financial, academic, and policy content;
- use neutral customer-facing language: "internal style and watermark risk signals", not "guaranteed detector bypass".
See the full Responsible Use guide.
🏢 For Business & Enterprise
| Requirement | How TextHumanize Delivers |
|---|---|
| Predictability | Seed-based PRNG — same input + seed = identical output |
| Privacy | 100% local. Zero network calls. No data leaves your server |
| Auditability | Every call returns change_ratio, quality_score, similarity, explain() report |
| Integration | Python SDK · JS SDK · PHP SDK · CLI · REST API · Docker · SSE streaming |
| Reliability | 2,356 tests across 3 platforms, CI/CD with ruff + mypy |
| No vendor lock-in | Zero dependencies. No cloud APIs, no API keys, no rate limits |
| Language coverage | 25 language packs + universal processor for any language |
| Self-hosted | Docker image, pip install, on-premise deployment |
| Content quality gate | quality_gate.py for CI/CD pipeline integration |
| Custom training | Train from your own corpus with dict_trainer and training.py |
| Brand safety | Keyword preservation, brand term protection, max change control |
Processing Modes
| Mode | Description | Use Case |
|---|---|---|
humanize() |
Full 38-stage pipeline | General-purpose normalization |
humanize_batch() |
Parallel processing (N workers) | Bulk content processing |
humanize_chunked() |
Split + process + rejoin | Documents > 10K chars |
humanize_until_human() |
Iterative (loop until built-in target score) | High-quality output |
humanize_stream() |
SSE paragraph streaming | Real-time UI |
humanize_ai() |
Rules + LLM backend (OpenAI/Ollama) | Maximum quality |
Docker Deployment
❓ FAQ & Troubleshooting
Q: Does TextHumanize guarantee passing GPTZero / Originality.ai / Turnitin? No. TextHumanize is a style normalization tool. It reduces AI-like patterns but does not guarantee bypassing external AI detectors. See Limitations.
Q: What's the best profile for reducing AI-like style signals?
chat with intensity 60–80 gives the largest reduction on TextHumanize's built-in detector benchmark. For professional content, try web at 70. External detector outcomes vary and should be verified separately.
Q: How do I preserve keywords (e.g., for SEO)?
Use constraints={"keep_keywords": ["keyword1", "keyword2"]} or
preserve={"brand_terms": ["BrandName"]}. By default, TextHumanize also protects
URLs, email, code, Markdown/HTML, dates, prices, versions, order ids, exact quotes,
and multi-token named entities.
Q: Can I use this for commercial projects? Yes, with a commercial license. See License & Pricing.
Q: Does it work offline? Does it send data to the internet? 100% offline. Zero network calls. Not even a health check ping. All processing is local.
Q: Why is the first call slower? The first call loads language packs and initializes caches. Subsequent calls are 11× faster via LRU cache.
Q: Can I train it on my own data?
Yes — dict_trainer.py trains custom dictionaries from your corpus, and training.py can retrain the neural detector/LM.
Q: How do I add support for a new language?
Create a language pack in texthumanize/lang/your_lang.py following the existing pattern (15 required sections). Or use the universal processor which works with any language automatically.
Q: Can I use individual modules (e.g., just POS tagger) without the full pipeline? Yes. Every module is independently importable. See Using Individual Modules.
Q: Is there a GUI? Try the Live Demo. For local use, the REST API + SSE streaming integrates easily with any frontend.
Q: How deterministic is it?
100% deterministic when using the same seed. Same input + same seed + same version = byte-identical output.
Q: What Python versions are supported? 3.9, 3.10, 3.11, 3.12, and 3.13 — all tested in CI.
🆕 What's New in v0.28.4
Explainable audit and safer humanization (0.28.4)
- Explainable AI detector reports —
detect_ai_explain()returns calibrated score, confidence interval, metric contributions, highlighted spans, sentence report, mixed-content shares, and suggested actions. - Unified watermark forensics —
watermark_report()covers invisible Unicode, homoglyphs, fullwidth/math lookalikes, and statistical watermark hypotheses with p-value/z-score evidence. - Promopilot-ready audit JSON —
audit_report()combines AI and watermark findings in a stable schema for product integrations and batch workflows. - Stricter quality controls —
quality_gate="strict"can reject risky rewrites, whileminimal=True/--only-flaggedchanges only flagged fragments. - Anti-overhumanize final guard — high-intensity output now trims stacked fillers, repeated discourse markers, and excessive expressive punctuation before returning.
- Better short commercial copy coverage — golden-set regression tests now cover landing, product, and support copy patterns.
- CI and release hardening — GitHub CI is green across Python 3.9-3.13, PHP 8.1-8.3, TypeScript/JavaScript, and docs; local release coverage is 80.09%.
Previous release readiness (0.28.3)
- GitHub community checklist completed — added Code of Conduct, Security Policy, issue templates, quality-report template, and pull request template.
- Release metadata sync — Python, PHP, and TypeScript package versions are aligned for PyPI, Packagist, and source installs.
- Safer release verification — version checks now validate package manifests plus README/CHANGELOG release references before publication.
Previous patch fixes (0.28.2)
- PHP HTML wrapper compatibility — internal orphan cleanup no longer strips external wrapper tokens like
THZ_APP_HTML_*, preventing broken restore in client wrappers. - HTML + keep_keywords flows now humanize properly —
THZ_KEYWORD_*/THZ_BRAND_*placeholders are treated as inline-safe, so structure/naturalization stages are not skipped. - Connector replacement after protected tags — connector rewrites now work even when a line starts with inline placeholders.
- Ukrainian naturalization hardening — added dedicated
ukreplacements/boosters and removed English fallback for non-English naturalization to avoid mixed-language artifacts.
Previous highlights (0.28.0)
Web Platform — Auth, Payments & Freemium (NEW)
- User registration with email/password + Google OAuth 2.0
- Multi-tier pricing — Free / Starter $29 / Pro $79 / Business $199/month
- API key management — create, revoke, and group keys per plan (1 / 5 / 20 keys)
- Monobank Acquiring payment integration with webhook activation
- Admin panel — user management, plan overrides, payment history, usage stats
- Freemium gates — guests limited to 3 requests/day; text results blurred after 500 chars
- API authentication — Bearer token support with session-based guest quota tracking
- Expanded API docs — authentication guide, per-plan rate limits, PAYG billing docs, error codes (401/429)
- Competitor comparison table added to Pricing page (vs Quillbot, Undetectable.ai, StealthGPT)
SentenceValidator™ — Interstage Quality Gate (v0.28.0)
sentence_validator.py(350 lines) — sentence-level artifact detection running at 7 checkpoints between pipeline stages- 10 artifact checks per sentence: duplicate words, broken contractions, orphaned punctuation, double conjunctions, dangling conjunctions, unterminated parens, triple+ repeats, fragment chains, conjunction chains, empty sentences
- Final sanitization in
run()method catches post-loop residual artifacts
Stats
- 2,105 tests · 122 modules · 235,000+ lines · 25 languages · 38-stage pipeline
🤝 Contributing
See CONTRIBUTING.md for development setup, testing, and PR guidelines.
Areas for contribution: New language packs · Improved synonym dictionaries · Better grammar rules · Performance optimizations · Additional integrations
Starter tasks with acceptance criteria are listed in the Good First Issues guide.
See CONTRIBUTORS.md for the full list of contributors.
⚠️ Limitations
TextHumanize is a style normalization tool. Please be aware of realistic expectations:
| Aspect | Current State | Notes |
|---|---|---|
| EN humanization | Reduces AI markers by 71–92% | Built-in detector; 94% → 2–23% |
| RU humanization | Reduces AI markers by 75% | Built-in detector; 80% → 5% |
| UK humanization | Reduces AI markers by 58% | Built-in detector; 75% → 17% |
| External AI detectors | Not a reliable bypass | GPTZero, Originality.ai use different models |
| Short texts (< 50 words) | Limited effect | Not enough context for meaningful transformation |
| Performance | 300–500 ms per paragraph | Fast enough for batch; not sub-millisecond |
| Built-in AI detector | Heuristic + statistical + neural | Useful for internal scoring; not equivalent to commercial detectors |
| Higher intensity | ≠ always lower AI score | Some transforms at high intensity may create new patterns |
What TextHumanize does well:
- ✅ Removes formulaic connectors ("furthermore", "it is important to note")
- ✅ Varies sentence length to add human-like burstiness
- ✅ Replaces bureaucratic vocabulary with simpler alternatives
- ✅ Deterministic, reproducible results with seed control
- ✅ 100% offline, no data leaks, zero dependencies
- ✅ Full audit trail with every call
What TextHumanize does NOT do:
- ❌ Guarantee passing external AI detectors (GPTZero, Originality.ai, Turnitin)
- ❌ Rewrite text at the semantic level (it's rule-based, not LLM-based)
- ❌ Handle domain-specific jargon (medical, legal, etc.) without custom dictionaries
💛 Support the Project
If TextHumanize saves you time or money, consider supporting development:
📄 License & Pricing
TextHumanize uses a dual license model:
| Use Case | License | Monthly |
|---|---|---|
| Personal / Academic / Open-source | Free License | Free |
| Commercial — 1 dev, 1 project | Indie | $29/mo |
| Commercial — up to 5 devs | Startup | $79/mo |
| Commercial — up to 20 devs | Business | $199/mo |
| Enterprise / On-prem / SLA / White-label | Enterprise | Contact us |
All commercial licenses include full source code, all updates, priority email support, and access to PHANTOM™ + ASH™ proprietary technologies. 100% offline — no data leaves your server, no per-request fees, no cloud lock-in. Monthly billing, cancel any time.
Documentation · Live Demo · PyPI · GitHub · Issues · Discussions · Commercial License
All versions of text-humanize with dependencies
ext-mbstring Version *
ext-json Version *