Download the PHP package ksanyok/text-humanize without Composer

On this page you can find all versions of the php package ksanyok/text-humanize. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.


Information about the package text-humanize

# TextHumanize

### The most advanced open-source text naturalization engine

**Transform AI-generated text into clearer, more natural prose with proprietary PHANTOM™, ASH™, and SentenceValidator™ technologies**

**Reduce built-in AI-like style signals · 25 languages · 38-stage adaptive pipeline · 100% offline · Zero dependencies**

**External AI detector results are not guaranteed.** TextHumanize improves style, readability, and internal risk signals; it is not a bypass guarantee.
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-3776AB.svg?logo=python&logoColor=white)](https://www.python.org/downloads/) [![TypeScript](https://img.shields.io/badge/TypeScript-5.x-3178C6.svg?logo=typescript&logoColor=white)]() [![PHP 8.1+](https://img.shields.io/badge/php-8.1+-777BB4.svg?logo=php&logoColor=white)](https://www.php.net/)    [![CI](https://github.com/ksanyok/TextHumanize/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/ksanyok/TextHumanize/actions/workflows/ci.yml) [![Tests](https://img.shields.io/badge/tests-2105%20passed-2ea44f.svg?logo=pytest&logoColor=white)](https://github.com/ksanyok/TextHumanize/actions/workflows/ci.yml)    [![Zero Dependencies](https://img.shields.io/badge/dependencies-zero-brightgreen.svg)]() [![PyPI](https://img.shields.io/pypi/v/texthumanize.svg?logo=pypi&logoColor=white)](https://pypi.org/project/texthumanize/) [![License](https://img.shields.io/badge/license-Dual%20(Free%20%2B%20Commercial)-blue.svg)](LICENSE)
**235,000+ lines of code** · **122 Python modules** · **38-stage pipeline** · **25 languages + universal** · **2,105 tests**

**3 proprietary technologies:** PHANTOM™ (gradient-guided internal score optimization) · ASH™ (adaptive signature humanization) · SentenceValidator™ (interstage quality gate)

[Quick Start](#-quick-start) · [Proprietary Technologies](#-proprietary-technologies) · [Before & After](#-before--after-examples) · [Features](#-feature-matrix) · [Benchmarks](#-performance--benchmarks) · [AI Detection](#-ai-detection-engine) · [API Reference](#-api-reference) · [Documentation](https://ksanyok.github.io/TextHumanize/) · [Live Demo](https://texthumanize.link/) · [License](#-license--pricing)



TextHumanize is a pure-algorithmic text processing engine that transforms AI-generated drafts into clearer, more natural prose. Three proprietary technologies — PHANTOM™ (gradient-guided optimization against TextHumanize's own detector), ASH™ (adaptive signature humanization), and SentenceValidator™ (interstage quality control) — drive a 38-stage pipeline that reduces built-in AI-like style signals while preserving meaning. No neural networks, no API keys, no internet — just 235K+ lines of finely tuned rules, dictionaries, and statistical methods.

Honest note: TextHumanize is a style-normalization tool, not an AI-detection bypass tool. It reduces AI-like patterns (formulaic connectors, uniform sentence length, bureaucratic vocabulary) but does not guarantee that processed text will pass external AI detectors. Quality of humanization varies by language and text type. See Limitations below.

Built-in toolkit: AI Detection (3 detectors) · Paraphrasing · Tone Analysis · Watermark Cleaning · Content Spinning · Coherence Analysis · Readability Scoring · Stylistic Fingerprinting · Auto-Tuner · Perplexity Analysis · Plagiarism Detection · Grammar Check · Morphology Engine · Neural LM · Async API · SSE Streaming

Platforms: Python (full — 122 modules) · TypeScript/JavaScript (core) · PHP (full)

For business: SaaS integration · REST API with SSE streaming · Docker deployment · Bulk processing · Custom dictionaries · On-prem enterprise · White-label ready

Languages: 🇬🇧 EN · 🇷🇺 RU · 🇺🇦 UK · 🇩🇪 DE · 🇫🇷 FR · 🇪🇸 ES · 🇵🇱 PL · 🇧🇷 PT · 🇮🇹 IT · 🇳🇱 NL · 🇸🇪 SV · 🇨🇿 CS · 🇷🇴 RO · 🇭🇺 HU · 🇩🇰 DA · 🇸🇦 AR · 🇨🇳 ZH · 🇯🇵 JA · 🇰🇷 KO · 🇹🇷 TR · 🇮🇳 HI · 🇻🇳 VI · 🇹🇭 TH · 🇮🇩 ID · 🇮🇱 HE · 🌍 any language via universal processor


🚀 Why TextHumanize?

Problem: Machine-generated text has uniform sentence lengths, bureaucratic vocabulary, formulaic connectors, and low stylistic diversity — reducing readability, engagement, and brand authenticity.

Solution: TextHumanize algorithmically normalizes text style while preserving meaning. Configurable intensity, deterministic output, full change reports. No cloud APIs, no rate limits, no data leaks.

| Advantage | Details |
|---|---|
| 🚀 Blazing fast | 300–500 ms for a paragraph; full article in 1–2 seconds |
| 🔒 100% private | All processing is local; your text never leaves your machine |
| 🎯 Precise control | Intensity 0–100, 9 profiles, 5 style presets, keyword preservation, max change ratio |
| 🌍 25 languages | Deep support for EN/RU/UK/DE; dictionaries for 25 languages; statistical processor for any other |
| 📦 Zero dependencies | Pure Python stdlib: no pip packages, no model downloads, starts in <100 ms |
| 🔁 Reproducible | Seed-based PRNG: same input + same seed = identical output |
| 🧠 3-layer AI detection | 18-metric heuristic + 35-feature logistic regression + MLP neural detector, no ML framework required |
| 🔌 Plugin system | Register custom hooks at any of 38 pipeline stages |
| 📊 Full analytics | Readability (6 indices), coherence, plagiarism, stylometric fingerprint, content health score |
| 🎭 Tone control | Analyze and adjust formality across 7 levels |
| 📚 2,944 dictionary entries | EN 1,733 + RU 1,345 + UK 1,042 + DE 874 + FR 718 + ES 749 + more |
| 🏢 Enterprise-ready | Dual license, 2,105+ tests, CI/CD, REST API, Docker, on-prem deployment |
| 🛡️ Secure by design | Input limits, zero network calls, linear-time regex, no eval/exec |
| 📝 Full auditability | Every call returns change_ratio, quality_score, similarity, explain() report |
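
The reproducibility property listed above (same input plus same seed gives identical output) is what you get when every random choice goes through a private, seeded generator instead of global randomness. The following is a minimal sketch of that idea, not TextHumanize's code: `pick_synonyms` and the toy `SYNONYMS` table are hypothetical names introduced only for illustration.

```python
import random

# Toy synonym table for illustration only; not TextHumanize's dictionaries.
SYNONYMS = {
    "furthermore": ["also", "besides", "what's more"],
    "utilize": ["use", "apply", "employ"],
}


def pick_synonyms(words, seed):
    """Replace known words using a private generator seeded by the caller."""
    rng = random.Random(seed)  # isolated from the global random state
    return [rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words]


words = ["furthermore", "we", "utilize", "caching"]
first = pick_synonyms(words, seed=42)
second = pick_synonyms(words, seed=42)
print(first == second)  # the two runs make identical choices
```

Keeping the generator local to the call also means that unrelated code calling `random.random()` cannot shift the sequence between runs.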

Proprietary Technologies

TextHumanize includes three original, proprietary technologies not found in any other open-source library:

PHANTOM™ — Gradient-Guided Text Optimization Engine

phantom.py — 2,943 lines | An open-source text naturalizer that uses numerical gradient optimization against TextHumanize's own AI detector.

ASH™ — Adaptive Signature Humanization

ash_engine.py + signature_transfer.py + perplexity_sculptor.py | Statistically transforms text to match real human writing signatures.

SentenceValidator™ — Interstage Quality Gate

sentence_validator.py — 350 lines | Catches and eliminates artifacts between pipeline stages in real-time.


📦 Installation

From source:

Tip: Pin your version for production: pip install texthumanize==0.28.4

PHP / TypeScript

⚡ Quick Start

Private Offline Workflow

For privacy-sensitive content, use the local audit -> safe cleanup -> strict humanize -> audit pattern. It keeps processing offline, preserves critical terms, and records review metrics without using cloud APIs.

The example uses backend="local", quality_gate="strict", minimal=True, brand/identifier preservation, and a socket guard that raises if any code tries to open a network connection. See the full Private Offline Workflow guide.
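
The original example is not reproduced on this page. As a rough illustration of the socket guard idea only, the sketch below temporarily replaces `socket.socket` so that any attempt to create a socket raises. The name `no_network` is hypothetical and is not claimed to match the guard used in the project's guide.

```python
import socket
from contextlib import contextmanager


@contextmanager
def no_network():
    """Make any attempt to create a socket raise while the block is active."""
    original = socket.socket

    def blocked(*args, **kwargs):
        raise RuntimeError("network access is disabled in offline mode")

    socket.socket = blocked
    try:
        yield
    finally:
        socket.socket = original  # always restore the real class


with no_network():
    try:
        socket.socket()
    except RuntimeError as exc:
        print(f"blocked: {exc}")
```

A guard like this only catches code that looks up `socket.socket` at call time; modules that captured a reference to the class before the guard was installed are not affected.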

All Features at a Glance


🔄 Before & After Examples

English

Before (AI-generated, AI score: 94%):

Furthermore, it is important to note that the implementation of cloud computing facilitates the optimization of business processes. Additionally, the utilization of microservices constitutes a significant advancement. Moreover, the integration of artificial intelligence into the workflow enhances decision-making processes and contributes to overall organizational efficiency.

After (TextHumanize, profile="web", intensity=60, AI score: 23%):

Also, importantly, the implementation of cloud computing helps the tuning of business processes. Up a major advancement, additionally, the use of microservices makes. And, the merge of artificial intelligence into the workflow enhances decision-making processes; and, contributes to overall organizational speed.

Russian

Before (AI score: 80%):

Необходимо отметить, что данная методология обеспечивает существенное повышение эффективности рабочих процессов. Кроме того, внедрение инновационных технологий способствует оптимизации функционирования организации. Более того, использование искусственного интеллекта позволяет значительно улучшить процесс принятия решений.

After (AI score: 5%):

Важно — что данная метод даёт существенное повышение эффективности рабочих процессов! Впрочем, смотрите, внедрение инновационных технологий помогает оптимизации функционирования организации, значительно, к тому же, использование искусственного интеллекта позволяет улучшить процесс принятия решений.

Ukrainian

Before (AI score: 75%):

Необхідно зазначити, що дана методологія забезпечує суттєве підвищення ефективності робочих процесів. Крім того, впровадження інноваційних технологій сприяє оптимізації функціонування організації. Більш того, використання штучного інтелекту дозволяє значно покращити процес прийняття рішень.

After (AI score: 17%):

Важливо, що ця метод дає суттєве підвищення ефективності робочих процесів; в принципі, впровадження інноваційних технологій веде до оптимізації функціонування організації. До того ж, використання штучного інтелекту дає змогу сильно покращити процес прийняття рішень.

AI Score Reduction Summary

| Language | Before | After | Reduction | Mode |
|---|---|---|---|---|
| English | 94% | 2% | -92pp | web/70 |
| English | 94% | 23% | -71pp | web/60 |
| Russian | 80% | 5% | -75pp | web/50 |
| Ukrainian | 75% | 17% | -58pp | web/50 |

Built-in AI detector scores. Results measured with TextHumanize's 3-layer ensemble (heuristic + statistical + MLP neural). External detectors may produce different results.

Profile Comparison (EN, intensity=50)

| Profile | Change Ratio | Quality | AI Score After |
|---|---|---|---|
| web | 0.50 | 0.20 | 27% 🟢 |
| chat | 0.61 | 0.20 | 27% 🟢 |
| marketing | 0.48 | 0.25 | 27% 🟢 |
| seo | 0.48 | 0.25 | 33% 🟢 |
| formal | 0.48 | 0.24 | 29% 🟢 |
| academic | 0.48 | 0.24 | 29% 🟢 |

Input AI score: 94% — all profiles bring it below 35%.


🧩 Feature Matrix

Category Feature Python JS PHP
Core humanize() — 38-stage pipeline
humanize_batch() — parallel processing
humanize_chunked() — large text support
humanize_ai() — three-tier AI + rules
humanize_until_human() — iterative
humanize_sentences() — per-sentence
humanize_stream() — streaming
humanize_variants() — N output variants
analyze() — artificiality scoring
explain() — change report
AI Detection detect_ai() — 3-layer ensemble
detect_ai_batch() — batch detection
detect_ai_sentences() — per-sentence
detect_ai_mixed() — mixed content
StatisticalDetector — 35-feature LR
NeuralAIDetector — MLP (pure Python)
NLP paraphrase() — syntactic transforms
POSTagger — rule-based POS (4 langs)
HMMTagger — Viterbi HMM tagger
CJKSegmenter — zh/ja/ko segmentation
SyntaxRewriter — 8+ sentence transforms
WordLanguageModel — perplexity (14 langs)
NeuralPerplexity — LSTM char-level LM
CollocEngine — PMI scoring + replacement guard
MorphologyEngine — 4 languages
WordVec — lightweight word vectors
Tone analyze_tone() — formality analysis
adjust_tone() — 7-level adjustment
Watermarks detect_watermarks() — 6 types
clean_watermarks() — removal
Spinning spin() / spin_variants()
Analysis analyze_coherence() — paragraph flow
full_readability() — 6 indices
check_grammar() — rule-based (9 langs)
uniqueness_score() — plagiarism check
content_health() — composite 0–100
semantic_similarity() — TF-IDF cosine
sentence_readability() — per-sentence
Stylistic fingerprinting
Quality BenchmarkSuite — 6-dimension scoring
FingerprintRandomizer — anti-detection
QualityGate — CI/CD content check
Advanced Style presets (5 personas)
Auto-Tuner (feedback loop)
AI backend (OpenAI/Ollama/OSS)
Custom dictionary overlays
Domain dictionaries (SaaS/ecommerce/etc.)
Dictionary trainer (corpus)
Neural network training loop
Dashboard (HTML reports)
Plugin system
REST API (OpenAPI + SSE)
SSE streaming
CLI (15+ commands)
Languages Full dictionary support 14 2 14
Universal processor

⚔️ Comparison with Competitors

vs. Online Humanizers & GPT/LLM Rewriting

Criterion TextHumanize Online Humanizers GPT/LLM Rewriting
Works offline
Privacy ✅ 100% local ❌ Third-party servers ❌ Cloud API
Speed ~300 ms/paragraph 2–10 sec (network) ~500 chars/sec
Cost per 1M chars $0 $10–50/month $15–60 (GPT-4)
API key required No Yes Yes
Deterministic ✅ Seed-based
Languages 25 + universal 1–3 10+ but expensive
Built-in AI detector ✅ 3-layer ensemble ❌ or basic
Max change control max_change_ratio ❌ Unpredictable
Open source
Self-hosted ✅ Docker / pip
Audit trail explain()

vs. Other Open-Source Libraries

Feature TextHumanize Typical Alternatives
Pipeline stages 38 2–4
Languages 25 + universal 1–2
AI detection ✅ 3-layer (18 + 35 + MLP)
Python tests 2,105 10–50
Codebase size 235,000+ lines 500–2K
Platforms Python + JS + PHP Single
Plugin system
Tone analysis ✅ 7 levels
REST API ✅ OpenAPI + SSE
Readability metrics ✅ 6 indices 0–1
Morphological engine ✅ 4 languages
Neural components MLP + LSTM + HMM
Content spinning ✅ spintax
Stylistic fingerprinting
Grammar checker ✅ 9 languages
Plagiarism detection ✅ n-gram

vs. AI Detectors (GPTZero, Originality.ai)

Feature TextHumanize GPTZero Originality.ai
Price Free From $10/mo From $14.95/mo
Works offline
Self-hosted
Per-sentence detection
Mixed-content detection
Combined humanize + detect
Custom training dict_trainer
API ✅ REST + SSE ✅ REST ✅ REST
Batch detection ✅ (paid) ✅ (paid)
CI/CD quality gate quality_gate.py

🔧 Processing Pipeline (38 Stages)

Adaptive intensity: Auto-reduces processing for already-natural text.

Graduated retry: Retries at lower intensity if change ratio exceeds the limit.

SentenceValidator™: 7 interstage checkpoints catch artifacts between stages (10 checks per sentence).

Tier system: Tier 1 languages (EN/RU/UK/DE) get all 38 stages. Tier 2 (FR/ES/IT/PL/PT/NL/SV/CS/RO/HU/DA) get ~30. Tier 3 (AR/ZH/JA/KO/TR/HI/VI/TH/ID/HE) get ~20 + universal.
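
The graduated retry described above can be pictured as a loop that lowers intensity until the output fits the change budget. This is a sketch under stated assumptions: `toy_transform` is a stand-in for a real pipeline, the word-level `change_ratio` based on `difflib` is one possible definition rather than the library's, and the step size of 10 is arbitrary.

```python
import difflib


def change_ratio(before, after):
    """Word-level share of the text that differs (0.0 means identical)."""
    matcher = difflib.SequenceMatcher(None, before.split(), after.split())
    return 1.0 - matcher.ratio()


def toy_transform(text, intensity):
    """Stand-in pipeline: uppercases a share of words proportional to intensity."""
    words = text.split()
    n = round(len(words) * intensity / 100)
    return " ".join(w.upper() if i < n else w for i, w in enumerate(words))


def humanize_with_retry(text, intensity, max_change_ratio, step=10):
    """Lower the intensity until the result stays within the change budget."""
    while intensity > 0:
        candidate = toy_transform(text, intensity)
        if change_ratio(text, candidate) <= max_change_ratio:
            return candidate, intensity
        intensity -= step
    return text, 0  # nothing acceptable: return the input unchanged


result, used = humanize_with_retry(
    "one two three four five six seven eight nine ten",
    intensity=80,
    max_change_ratio=0.35,
)
print(used, result)
```

Returning the intensity that was finally used makes the fallback visible to the caller, which fits the audit-oriented reporting the README describes.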


🧠 AI Detection Engine

Three independent detectors combined into a single score:

Architecture

18 Heuristic Metrics

| # | Metric | What It Measures |
|---|---|---|
| 1 | Entropy | Character/word-level Shannon entropy |
| 2 | Burstiness | Sentence/paragraph length variability (humans vary, AI doesn't) |
| 3 | Vocabulary | TTR, MATTR, Yule's K, hapax legomena ratio |
| 4 | Zipf | Fit to Zipf's law distribution |
| 5 | Stylometry | Function word patterns, punctuation fingerprint |
| 6 | AI Patterns | Formulaic phrases ("it is important to note", "furthermore") |
| 7 | Punctuation | Punctuation distribution profile |
| 8 | Coherence | Paragraph uniformity (too-uniform = AI) |
| 9 | Grammar | Grammatical "perfection" level (too-perfect = AI) |
| 10 | Openings | Sentence-opening diversity |
| 11 | Readability | Consistency of readability scores across sentences |
| 12 | Rhythm | Syllable patterns, sentence length rhythm |
| 13 | Perplexity | N-gram predictability |
| 14 | Discourse | Discourse structure (topic sentences, markers) |
| 15 | Semantic Repetition | Cross-paragraph semantic overlap |
| 16 | Entity | Specificity of named entities and examples |
| 17 | Voice | Passive vs. active voice ratio |
| 18 | Topic Sentence | Topic-sentence-per-paragraph pattern |
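
Two of the metrics above are easy to compute by hand, which helps when interpreting a score. The sketch below shows word-level Shannon entropy and a simple burstiness measure. Defining burstiness as the coefficient of variation of sentence lengths is an assumption made for illustration; the README does not state the exact formula TextHumanize uses.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(tokens):
    """Shannon entropy in bits of the token frequency distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def burstiness(text):
    """Coefficient of variation of sentence lengths (0.0 means uniform)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


uniform = "One two three. Four five six. Seven eight nine."
varied = "Short. This one is a much longer sentence with many more words."
print(burstiness(uniform), burstiness(varied))
print(shannon_entropy(varied.lower().split()))
```

Text with sentences of equal length scores zero on this measure, which matches the table's remark that uniform length is a machine-like signal.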

35-Feature Statistical Detector (Logistic Regression)

| Category | Features |
|---|---|
| Lexical (4) | Type-token ratio, hapax ratio, avg word length, word length variance |
| Sentence (3) | Mean sentence length, length variance, length skewness |
| Vocabulary (3) | Yule's K, Simpson's diversity, vocabulary richness |
| N-gram (3) | Bigram/trigram repetition rates, unique bigram ratio |
| Entropy (3) | Character entropy, word entropy, bigram entropy |
| Burstiness (2) | Sentence burstiness, vocabulary burstiness |
| Structural (3) | Paragraph count, avg paragraph length, list/bullet ratio |
| Punctuation (5) | Comma, semicolon, dash, question, exclamation rates |
| AI Pattern (1) | AI pattern rate (strongest single feature, weight −2.10) |
| Perplexity (2) | Word frequency rank variance, Zipf fit residual |
| Readability (2) | Syllables/word, Flesch score normalized |
| Discourse (3) | Starter diversity, conjunction rate, transition word rate |
| Rhythm (1) | Consecutive length difference variance |
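
To make the feature-plus-logistic-regression idea concrete, here is a small sketch that extracts three of the lexical features named above and passes them through a logistic function. The weights and bias are invented for illustration and are not the shipped model's values; tokenization is a plain whitespace split, so punctuation stays attached to words.

```python
import math
from collections import Counter


def lexical_features(text):
    """Compute three lexical features from whitespace-separated tokens."""
    words = text.lower().split()
    n = len(words) or 1  # avoid division by zero on empty input
    counts = Counter(words)
    return {
        "type_token_ratio": len(counts) / n,
        "hapax_ratio": sum(1 for c in counts.values() if c == 1) / n,
        "avg_word_length": sum(len(w) for w in words) / n,
    }


# Hypothetical weights, chosen only to show the mechanics.
WEIGHTS = {"type_token_ratio": -1.5, "hapax_ratio": -0.8, "avg_word_length": 0.4}
BIAS = 0.0


def logistic_score(features):
    """Map a weighted feature sum to a probability-like value in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


features = lexical_features("the cat saw the dog")
print(features, logistic_score(features))
```

A real detector would normalize each feature before weighting it; that step is omitted here to keep the example short.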

Neural MLP Detector

Feed-forward neural network entirely in pure Python (no PyTorch, no TensorFlow). Pre-trained weights shipped as compressed JSON (54 KB).

Verdicts

| Score | Verdict | Meaning |
|---|---|---|
| < 35% | human_written | Likely written by a human |
| 35–65% | mixed | Mixed content or uncertain |
| ≥ 65% | ai_generated | Likely AI-generated |
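
The thresholds above translate directly into a small function. The sketch below also shows a weighted average of detector scores; the detector names and equal weights in the demo are assumptions, since the README does not publish the ensemble weights.

```python
def verdict(score_percent):
    """Map a 0-100 AI score to the verdict labels from the table above."""
    if score_percent < 35:
        return "human_written"
    if score_percent < 65:
        return "mixed"
    return "ai_generated"


def combine(scores, weights):
    """Weighted average of per-detector scores (each in 0.0-1.0)."""
    total = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total


# Illustrative inputs: detector names and equal weights are hypothetical.
scores = {"heuristic": 0.9, "statistical": 0.6, "neural": 0.3}
weights = {"heuristic": 1.0, "statistical": 1.0, "neural": 1.0}
combined = combine(scores, weights)
print(combined, verdict(combined * 100))
```

Note that a score of exactly 65 falls into `ai_generated`, because the table gives that band as "≥ 65%".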

Detection Modes


📖 API Reference

humanize(text, lang, **kwargs) → HumanizeResult

| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | | Input text (max 1 MB) |
| lang | str | | Language code: en, ru, uk, de, etc. |
| profile | str | "web" | Processing profile: chat, web, seo, docs, formal, academic, marketing, social, email, plus intent aliases seo_article, landing_page, product_description, support_reply, legal, social_post |
| intensity | int | 50 | Aggressiveness 0–100 |
| seed | int | None | PRNG seed for reproducibility |
| preserve | dict | {} | Protect code, URLs, email, dates, prices, ids, quotes, named entities, brand terms |
| minimal | bool | False | Only humanize AI-flagged sentences |
| max_change_ratio | float | None | Maximum allowed proportion of change (0.0–1.0) |
| constraints | dict | {} | Advanced constraints (keep_keywords, etc.) |
| quality_gate | str | None | Use "strict" to roll back on similarity, grammar, or readability regression |
| backend | str | None | LLM backend: "openai", "ollama", "oss", "auto" |

Returns HumanizeResult:

| Field | Type | Description |
|---|---|---|
| .text | str | Processed text |
| .change_ratio | float | Proportion of text changed (0.0–1.0) |
| .quality_score | float | Quality metric |
| .similarity | float | Semantic similarity to original |
| .metrics_after["humanize_explain"] | dict | Top 5 change reasons, top 5 remaining risks, sentence-level risk deltas |
| .metrics_after["anti_overhumanize"] | dict | Final guard report for stacked fillers, repeated discourse markers, and excessive ! / ? punctuation |
| .stages | list | Stages applied with timing |

Other Humanization Modes

detect_ai(text, lang) → dict

| Field | Description |
|---|---|
| score | AI probability (0.0–1.0) |
| verdict | "human_written", "mixed", or "ai_generated" |
| confidence | Confidence level (0.0–1.0) |
| metrics | Individual metric scores (18 heuristic + 35 statistical) |
| combined_score | Weighted average of all detectors |

Other Core Functions

| Function | Description |
|---|---|
| analyze(text, lang) | Returns AnalysisReport with artificiality score, sentence stats |
| explain(result) | Human-readable change report |
| paraphrase(text, lang) | Syntactic paraphrasing (voice transforms, connector shuffling) |
| analyze_tone(text, lang) | Tone analysis (formality, style) |
| adjust_tone(text, target, lang) | Adjust formality to 7 levels |
| detect_ai_explain(text, lang) | Explainable AI detector report with spans and suggested actions |
| audit_report(text, lang) | Combined AI + watermark audit JSON |
| detect_watermarks(text) | Detect 6 types of invisible watermarks |
| clean_watermarks(text) | Remove all detected watermarks |
| watermark_report(text, lang) | Unified Unicode + statistical watermark report |
| spin(text, lang) | Generate a single spun variant |
| spin_variants(text, count, lang) | Generate N spun variants |
| analyze_coherence(text, lang) | Paragraph flow analysis |
| full_readability(text, lang) | 6 readability indices |
| build_author_profile(text, lang) | Stylometric fingerprint |
| compare_fingerprint(text, profile) | Compare text to an author profile |
| anonymize_style(text, lang) | Stylometric anonymization |
| check_grammar(text, lang) | Grammar check (9 languages) |
| uniqueness_score(text) | N-gram uniqueness |
| content_health(text, lang) | Composite quality score 0–100 |

🎭 Profiles & Style Presets

Processing Profiles

| Profile | Use Case | Sentence Length | Colloquialisms | Default Intensity |
|---|---|---|---|---|
| chat | Messaging, social media | 8–18 words | High | 80 |
| web | Blog posts, articles | 10–22 words | Medium | 60 |
| seo | SEO content (keyword-safe) | 12–25 words | None | 40 |
| docs | Technical documentation | 12–28 words | None | 50 |
| formal | Legal, official | 15–30 words | None | 30 |
| academic | Research papers | 15–30 words | None | 25 |
| marketing | Sales, promo copy | 8–20 words | Medium | 70 |
| social | Social media posts | 6–15 words | High | 85 |
| email | Business emails | 10–22 words | Medium | 50 |

Style Presets (5 Personas)

| Preset | Sentences | Vocabulary | Style |
|---|---|---|---|
| 🎓 student | Short–medium | Simple | Conversational, informal |
| ✍️ copywriter | Varied (short bursts + long) | Dynamic | Energetic, varied rhythm |
| 🔬 scientist | Long, complex | Technical | Formal, precise, cautious hedging |
| 📰 journalist | Medium, diverse | Clear | Neutral, fact-oriented |
| 💬 blogger | Short, punchy | Informal | Questions, exclamations, personal |

Intensity Levels

| Range | Effect | Use Case |
|---|---|---|
| 0–20 | Minimal: typography and watermarks only | Already-natural text |
| 21–40 | Light: connectors and basic synonym swap | SEO, formal content |
| 41–60 | Moderate: structure + paraphrasing | Blog posts, web content |
| 61–80 | Aggressive: syntax rewriting + entropy | Chat, social media |
| 81–100 | Maximum: all transforms at full power | Heavy AI text |

🌍 Language Support

Language Tiers

Tier Languages Detection Humanization Syntax Rewriting
1 EN, RU, UK, DE ✅ Full ✅ Full 38-stage
2 FR, ES, IT, PL, PT ✅ Good ✅ 15-stage
3 AR, ZH, JA, KO, TR ✅ Basic ✅ 10-stage + universal
0 Any other language ✅ Statistical ✅ Universal processor

Dictionary Coverage

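| Language | Code | Synonyms | Bureaucratic | AI Connectors | Sentence Starters | Colloquial | Collocations |
|---|---|---|---|---|---|---|---|
| English | en | 431 | 645 | 152 | 75 | 127 | 1,578 |
| Russian | ru | 269 | 486 | 100 | 73 | 102 | 408 |
| Ukrainian | uk | 243 | 338 | 75 | 46 | 86 | 38 |
| German | de | 138 | 361 | 65 | 54 | 88 | 125 |
| French | fr | 141 | 224 | 61 | 49 | 86 | 128 |
| Spanish | es | 166 | 230 | 60 | 49 | 78 | 126 |
| Polish | pl | 159 | 247 | 60 | 46 | 78 | 34 |
| Portuguese | pt | 163 | 204 | 60 | 51 | 79 | 36 |
| Italian | it | 168 | 231 | 63 | 49 | 79 | 38 |
| Arabic | ar | 126 | 139 | 65 | 40 | 59 | |
| Chinese | zh | 127 | 137 | 51 | 38 | 59 | |
| Japanese | ja | 120 | 123 | 66 | 41 | 59 | |
| Korean | ko | 118 | 120 | 67 | 39 | 59 | |
| Turkish | tr | 119 | 122 | 67 | 43 | 59 | |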

Universal processor works for any language using statistical methods — burstiness injection, perplexity normalization, sentence length variation, punctuation diversification.


🧬 NLP Infrastructure

TextHumanize includes a full NLP stack — all implemented in pure Python with zero external dependencies:

| Module | Component | Description |
|---|---|---|
| pos_tagger.py | POS Tagger (1,917 lines) | Rule-based part-of-speech tagger with suffix/prefix rules for EN/RU/UK/DE |
| hmm_tagger.py | HMM Tagger (642 lines) | Viterbi-decoding Hidden Markov Model for POS tagging |
| cjk_segmenter.py | CJK Segmenter (1,277 lines) | Forward/backward max-match Chinese, particle-stripping Korean, character-type Japanese |
| morphology.py | Morphology Engine (811 lines) | Suffix-based stemming and inflection for RU/UK/EN/DE |
| collocation_engine.py | Collocation Engine (224 lines) | PMI-based collocation scoring for context-aware synonym selection |
| word_lm.py | Word Language Model (435 lines) | Bigram/trigram with compressed frequency data for 25 languages |
| neural_lm.py | Neural Char-Level LM (391 lines) | LSTM-based character language model for perplexity scoring |
| neural_engine.py | Neural Primitives (610 lines) | Feed-forward net, LSTM cell, embeddings, HMM, layer norm, GELU, all in stdlib |
| neural_paraphraser.py | Seq2Seq Paraphraser (752 lines) | Encoder-decoder with Bahdanau attention for neural paraphrasing |
| word_embeddings.py | Word Vectors (399 lines) | Hash-based + cluster embeddings, cosine similarity, nearest neighbors |
| sentence_split.py | Smart Splitter (338 lines) | Abbreviation-aware sentence splitting (Mr./Dr./URLs/decimals) |
| lang_detect.py | Language Detector (328 lines) | Character trigram profiling for 25 languages |
| context.py | Contextual Synonyms (320 lines) | Word sense disambiguation via context windows and topic detection |
| grammar.py | Grammar Checker (360 lines) | Rule-based grammar for 9 languages (agreement, articles, punctuation) |

Total NLP infrastructure: ~8,800 lines of code, zero pip dependencies.


🔍 SEO Mode

TextHumanize includes a dedicated SEO workflow to humanize content without harming search rankings:

| Feature | How It Works |
|---|---|
| Keyword preservation | Terms in the preserve and keep_keywords lists are never modified |
| Low intensity | SEO profile defaults to intensity 40 for gentle transformations |
| No keyword stuffing | Does not add or repeat keywords |
| Structure preservation | Heading hierarchy (H1–H6) preserved |
| Meta-safe | Avoids changing first-paragraph introductions (critical for SEO) |
| Max change control | max_change_ratio=0.3 ensures minimal disruption |

📊 Readability Metrics

full_readability() returns 6 reading metrics:

| Index | Range | What It Measures |
|---|---|---|
| Flesch Reading Ease | 0–100 | Higher = easier (60–70 is ideal for web) |
| Flesch-Kincaid Grade | 0–18 | US school grade level |
| Coleman-Liau Index | 0–18 | Based on characters (not syllables) |
| Automated Readability Index | 0–14 | Character and word counts |
| SMOG Grade | 0–18 | Polysyllabic word density |
| Gunning Fog | 0–20 | Complex words + sentence length |

Grade interpretation:

| Grade | Audience |
|---|---|
| 5–6 | General public, social media |
| 7–8 | Web content, blog posts |
| 9–10 | Magazine articles |
| 11–12 | Academic papers |
| 13+ | Technical/legal documents |

✍️ Paraphrasing Engine

Rule-based syntactic paraphrasing — no LLM, no API, deterministic:

| Transform | Example |
|---|---|
| Active → Passive | "The team built the app" → "The app was built by the team" |
| Passive → Active | "The report was written by John" → "John wrote the report" |
| Clause reordering | "After analyzing data, we decided…" → "We decided… after analyzing data" |
| Nominalization reversal | "The implementation of X" → "Implementing X" |
| Connector shuffling | "Furthermore, X. Additionally, Y." → "What's more, X. Also, Y." |
| MWE decomposition | "take into account" → "consider" |
| Hedging injection | "X is true" → "X appears to be true" |
| Perspective rotation | "Users need X" → "X is needed by users" |

🎭 Tone Analysis & Adjustment

7-level formality scale with marker-based detection:

| Level | Name | Example Markers |
|---|---|---|
| 1 | slang | "ya", "gonna", "lol", contractions |
| 2 | casual | "pretty much", "kind of", first person |
| 3 | neutral | Balanced register |
| 4 | professional | "regarding", "in accordance with" |
| 5 | formal | "henceforth", "notwithstanding" |
| 6 | academic | "thus", "consequently", passive voice |
| 7 | legal | "hereinafter", "whereas", "pursuant to" |

🛡️ Watermark Detection & Cleaning

Detects and removes 6 types of invisible text watermarks:

| Type | How It Hides | Detection Method |
|---|---|---|
| Zero-width characters | U+200B, U+200C, U+200D, U+FEFF | Unicode category scanning |
| Homoglyph substitution | Latin 'a' → Cyrillic 'а' | Confusable character mapping |
| Invisible Unicode | U+2060, U+2061–U+2064 | Codepoint range check |
| Directional markers | RTL/LTR overrides | Bidirectional control detection |
| Soft hyphens | U+00AD | Pattern matching |
| Tag characters | U+E0001–U+E007F | Unicode block scanning |
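
Most of the watermark types above come down to matching specific code points. The sketch below finds and strips the invisible characters listed in the table with a single character class. The bidirectional ranges (U+202A–U+202E and U+2066–U+2069) are an assumption about which "directional markers" are meant, homoglyph substitution is not covered, and the function names are hypothetical rather than the library's.

```python
import re

# Code points from the table above; the bidi ranges are an assumption.
INVISIBLE = re.compile(
    "["
    "\u200b\u200c\u200d\ufeff"    # zero-width characters
    "\u2060-\u2064"               # word joiner and invisible operators
    "\u00ad"                      # soft hyphen
    "\u202a-\u202e\u2066-\u2069"  # bidi embeddings, overrides, isolates
    "\U000e0001-\U000e007f"       # tag characters
    "]"
)


def find_invisible(text):
    """Return (index, code point label) for every invisible character found."""
    return [(m.start(), f"U+{ord(m.group()):04X}") for m in INVISIBLE.finditer(text)]


def strip_invisible(text):
    """Remove every matched invisible character from the text."""
    return INVISIBLE.sub("", text)


dirty = "Hel\u200blo wor\u00adld\U000e0041"
print(find_invisible(dirty))
print(strip_invisible(dirty))
```

Reporting positions before stripping keeps an audit trail, which is useful when the cleaned text must be reviewed against the original.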

🔄 Content Spinning

Generate multiple unique variants with spintax support:

The spinner uses language-pack synonyms, contextual substitution, and sentence restructuring to produce each variant.
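
For readers unfamiliar with spintax, the sketch below expands the common `{option|option}` notation, including nested groups, by repeatedly resolving the innermost braces. It assumes the widely used brace-and-pipe syntax and a hypothetical `spin_spintax` helper; it does not show how TextHumanize's own `spin()` selects synonyms.

```python
import random
import re

# Matches a brace group that contains no nested braces.
INNERMOST = re.compile(r"\{([^{}]*)\}")


def spin_spintax(template, seed=None):
    """Expand spintax by resolving innermost groups until none remain."""
    rng = random.Random(seed)
    while True:
        match = INNERMOST.search(template)
        if match is None:
            return template
        choice = rng.choice(match.group(1).split("|"))
        template = template[:match.start()] + choice + template[match.end():]


template = "{Hello|Hi} {world|{there|everyone}}"
print(spin_spintax(template, seed=1))
print(spin_spintax(template, seed=2))
```

Passing a seed makes each variant repeatable, in line with the deterministic behavior described elsewhere in this README.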


🔗 Coherence Analysis

Measure paragraph-level text flow:

| Metric | What It Measures |
|---|---|
| Paragraph similarity | TF-IDF cosine between adjacent paragraphs |
| Transition quality | Presence and appropriateness of connective phrases |
| Topic continuity | Keyword overlap between sections |
| Reference chains | Pronoun and entity co-reference tracking |
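
The paragraph similarity metric can be approximated in a few lines. The sketch below computes cosine similarity between adjacent paragraphs using raw term counts. Note the simplification: IDF weighting is omitted, so this is a term-frequency cosine rather than the TF-IDF variant named in the table, and the function names are illustrative.

```python
import math
import re
from collections import Counter


def term_vector(paragraph):
    """Bag-of-words term counts for one paragraph."""
    return Counter(re.findall(r"\w+", paragraph.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adjacent_similarity(text):
    """Similarity of each paragraph to the one that follows it."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    vectors = [term_vector(p) for p in paragraphs]
    return [cosine(x, y) for x, y in zip(vectors, vectors[1:])]


doc = "cats chase mice\n\ncats chase birds\n\nstock markets fell"
print(adjacent_similarity(doc))
```

A sharp drop between two neighbors, as in the last pair here, is the kind of signal a coherence report would flag as a topic break.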

🔠 Morphological Engine

Rule-based morphology for 4 languages — lemmatization, inflection, declension:

| Language | Operations | Suffix Rules |
|---|---|---|
| Russian | Lemmatization, declension, conjugation | 200+ suffix patterns |
| Ukrainian | Lemmatization, declension | 180+ suffix patterns |
| English | Lemmatization, pluralization | 150+ rules |
| German | Lemmatization, compound splitting | 120+ rules |

🎨 Stylistic Fingerprinting

Extract and compare author stylometric profiles:

Fingerprint dimensions: Mean sentence length, length variance, vocabulary richness, function word distribution, punctuation profile, discourse marker usage, passive voice ratio, average word length.


🎛️ Auto-Tuner (Feedback Loop)

Automatically optimize intensity and profile based on feedback:

The tuner uses Bayesian-like optimization to find ideal (intensity, profile) combinations for your content type.


🔌 Plugin System

Register custom hooks at any of 20 pipeline stages:

Available hook points: watermark, segmentation, typography, debureaucratization, structure, repetitions, liveliness, paraphrasing, syntax_rewriting, tone, universal, naturalization, paraphrase_engine, sentence_restructuring, entropy_injection, readability, grammar, coherence, validation, restore
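
The original registration example is not included on this page. As a generic illustration of how stage hooks can work, the sketch below keeps a list of callables per stage name and runs them in stage order. `HookRegistry` and its methods are hypothetical and are not TextHumanize's plugin API; only the stage names are taken from the list above.

```python
from collections import defaultdict


class HookRegistry:
    """Run user callables at named stages, in the order the stages are declared."""

    def __init__(self, stages):
        self._stages = tuple(stages)
        self._hooks = defaultdict(list)

    def register(self, stage, func):
        """Attach a text -> text callable to a known stage."""
        if stage not in self._stages:
            raise ValueError(f"unknown stage: {stage}")
        self._hooks[stage].append(func)

    def run(self, text):
        """Pass the text through every registered hook, stage by stage."""
        for stage in self._stages:
            for func in self._hooks[stage]:
                text = func(text)
        return text


registry = HookRegistry(["typography", "tone", "restore"])
registry.register("restore", lambda t: t + "!")  # registered first, runs last
registry.register("typography", str.strip)
print(registry.run("  hello  "))
```

Execution order follows the declared stage order rather than registration order, so a plugin cannot accidentally run before the stage it depends on.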


🧪 Using Individual Modules

Every module is independently importable:


💻 CLI Reference

CLI Flags

| Flag | Description |
|---|---|
| -l, --lang | Language code (required) |
| -p, --profile | Processing profile |
| -i, --intensity | Intensity 0–100 |
| -o, --output | Output file path |
| --seed | PRNG seed for reproducibility |
| --keep | Comma-separated keywords to preserve |
| --brand | Brand terms to never modify |
| --max-change | Maximum change ratio (0.0–1.0) |
| --analyze | Print analysis report |
| --explain | Print change explanation |
| --detect-ai | Run AI detection |
| --audit | Combined AI + watermark audit JSON |
| --paraphrase | Paraphrase mode |
| --tone | Adjust tone to target level |
| --tone-analyze | Analyze current tone |
| --watermarks | Detect watermarks |
| --watermark-report | Unified watermark JSON report |
| --quality-gate | off or strict post-processing guard |
| --fail-under-quality | Exit with code 2 if quality_score or benchmark average is below threshold |
| --minimal / --only-flagged | Only humanize AI-flagged sentences |
| --spin | Spin mode |
| --variants N | Number of spin variants |
| --coherence | Coherence analysis |
| --readability | Readability metrics |
| --api | Start REST API server |
| --port | API server port (default: 8080) |
| --verbose | Detailed output |
| --report | Save JSON report, or HTML when the path ends with .html |
| --json | JSON output format |

🌐 REST API Server

Zero-dependency HTTP server with rate limiting and CORS:

For FastAPI deployments, see examples/fastapi_integration.py. It includes request body limits, text and batch size limits, per-request timeouts, structured error envelopes with request ids, and /v1/humanize/batch.

OpenAPI 3.1 schema is available at GET /openapi.json for client generation, contract tests, and API gateway import.

Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /humanize | Full humanization |
| POST | /detect-ai | AI detection (single or batch) |
| POST | /analyze | Text metrics |
| POST | /paraphrase | Paraphrase text |
| POST | /tone/analyze | Tone analysis |
| POST | /tone/adjust | Tone adjustment |
| POST | /watermarks/detect | Detect watermarks |
| POST | /watermarks/clean | Remove watermarks |
| POST | /spin | Content spinning |
| POST | /spin/variants | Spin N variants |
| POST | /coherence | Coherence analysis |
| POST | /readability | Readability metrics |
| POST | /sse/humanize | SSE streaming humanization |
| GET | /health | Health check |
| GET | /openapi.json | OpenAPI 3.1 schema |
| GET | / | API documentation index |
| OPTIONS | * | CORS preflight |

Rate limit: 10 req/s per IP, burst 20 · Max body: 5 MB
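
A limit expressed as a sustained rate plus a burst allowance is commonly implemented with a token bucket. The sketch below uses the figures quoted above (10 requests per second, burst of 20) with an injectable clock so it can be tested without sleeping. Whether the bundled server actually uses a token bucket is an assumption, and `TokenBucket` is a hypothetical name.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate=10.0, burst=20, clock=time.monotonic):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def allow(self):
        """Refill based on elapsed time, then spend one token if available."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per client IP gives the per-IP behavior described above.
buckets = {}


def allow_request(ip):
    bucket = buckets.setdefault(ip, TokenBucket())
    return bucket.allow()


print(sum(allow_request("203.0.113.7") for _ in range(25)))
```

In a long-running server the `buckets` dictionary would also need periodic cleanup of idle clients, which is left out here for brevity.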

Example

Python Client


⚡ Async API

Native asyncio support for all public functions:
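
The project's own async example is not reproduced on this page. As a generic pattern, a synchronous text function can be exposed to asyncio code by running it in a worker thread. In the sketch below, `slow_sync_transform` is a stand-in for any blocking call, and the use of `asyncio.to_thread` is an assumption about one reasonable approach rather than a description of TextHumanize's async API.

```python
import asyncio


def slow_sync_transform(text):
    """Stand-in for a CPU-bound, synchronous text function."""
    return text.strip().capitalize()


async def transform_async(text):
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(slow_sync_transform, text)


async def transform_many(texts):
    # Process several inputs concurrently; gather preserves input order.
    return await asyncio.gather(*(transform_async(t) for t in texts))


if __name__ == "__main__":
    print(asyncio.run(transform_many(["  hello ", "world"])))
```

`asyncio.to_thread` requires Python 3.9 or later, which matches the minimum version shown in the project badges.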


📈 Performance & Benchmarks

All benchmarks on Apple Silicon (M-series), Python 3.12, single thread, after warm-up. See the public Benchmark Methodology for corpus labels, quality dimensions, latency reporting rules, and detector limitations.

Speed

| Function | Text Size | Avg Latency |
|---|---|---|
| humanize() | ~30 words | ~5 s |
| humanize() | ~80 words | ~10 s |
| humanize(phantom=True) | ~80 words | ~12 s |
| detect_ai() | ~30 words | ~1 s |
| detect_ai() | ~80 words | ~3 s |
| paraphrase() | ~80 words | < 1 ms |
| analyze_tone() | ~80 words | < 1 ms |
| analyze() | ~80 words | ~80 ms |

AI Score Reduction

Properties

| Property | Value |
|---|---|
| Cold start | < 100 ms |
| LRU cache hit | 11× faster than cold |
| External network calls | 0 (offline-first) |
| Deterministic (same seed) | ✅ Always |
| Pipeline timeout | 30 s (configurable) |
| API rate limiting | 10 req/s per IP, burst 20 |
| Max input size | 1 MB |
| Memory per call | 4–200 KB |
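
How the library's cache is implemented is not described here, but the effect of an LRU cache on repeated inputs can be sketched with `functools.lru_cache`. The `normalize` function and the cache size of 256 are stand-ins chosen for illustration.

```python
from functools import lru_cache

calls = 0  # counts how often the function body actually runs


@lru_cache(maxsize=256)
def normalize(text: str) -> str:
    """Stand-in for an expensive, pure text transformation (hypothetical)."""
    global calls
    calls += 1
    return " ".join(text.split())


normalize("a  b")
normalize("a  b")  # served from the cache; the body does not run again
print(calls, normalize.cache_info().hits)
```

Caching like this is only safe when the function is pure, which is one reason deterministic, seed-driven output matters.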

Run benchmarks yourself:


🏗️ Architecture

Design principles:

| Principle | Implementation |
|---|---|
| Modular | Each stage is a standalone class; every module is independently importable |
| Zero dependencies | Pure Python stdlib; no pip packages at all |
| Declarative rules | Language packs are data-only (dicts), no logic in lang files |
| Idempotent | Running the pipeline twice won't double-transform text |
| Safe defaults | Works out-of-the-box with sensible profiles |
| Lazy imports | PEP 562 lazy loading: only imports what you use |
| Deterministic | Seed-based PRNG for reproducible output |
| Extensible | Plugin hooks at 38 stages, custom dictionaries, AI backend |
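
Seed-based determinism usually comes down to giving each call its own `random.Random(seed)` instead of touching the global generator. A minimal sketch, assuming a toy synonym table and a hypothetical `swap_synonyms` helper (neither comes from TextHumanize):

```python
import random

# Toy synonym table, for illustration only.
SYNONYMS = {"utilize": ["use", "apply"], "commence": ["begin", "start"]}


def swap_synonyms(text: str, seed: int) -> str:
    """Replace known words with a synonym picked by a seeded generator."""
    rng = random.Random(seed)  # private generator: no shared global state
    words = []
    for word in text.split():
        options = SYNONYMS.get(word.lower())
        words.append(rng.choice(options) if options else word)
    return " ".join(words)


first = swap_synonyms("We utilize tools and commence work", seed=42)
second = swap_synonyms("We utilize tools and commence work", seed=42)
print(first == second)  # same input and seed give the same output
```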

🟦 TypeScript / JavaScript Port

Core TextHumanize functionality in TypeScript for Node.js and browsers:

Feature Status
humanize()
detectAi()
analyze()
Language packs: EN, RU
Universal processor

🐘 PHP Library

Full-featured PHP port with Composer support:

Feature Status
All 25 language packs
humanize(), humanize_batch(), humanize_chunked()
detect_ai(), analyze(), explain()
paraphrase(), analyze_tone(), adjust_tone()
detect_watermarks(), clean_watermarks()
spin(), spin_variants()
analyze_coherence(), full_readability()
Plugin system
223 PHPUnit tests

✅ Testing & Quality

| Platform | Tests | Status |
|---|---|---|
| Python (pytest, 3.9–3.13) | 2,144 | ✅ All passing |
| PHP (PHPUnit, 8.1–8.3) | 223 | ✅ All passing |
| TypeScript (Jest) | 28 | ✅ All passing |
| Total | 2,395 | |

CI/CD: Every push triggers a Python 3.9–3.13 and PHP 8.1–8.3 test matrix, ruff linting, mypy type checking, and pytest with coverage ≥ 70%.

Core-language regressions: EN/RU/UK fixture packs verify protected tokens, cross-language leakage, and language-aware cleanup of over-humanized output.

Collocation guard: word-level replacements now keep strong local collocations intact, so natural phrases such as "heavy rain" are not weakened by context-free shorter synonyms.

Domain dictionaries: SaaS, ecommerce, fintech, legal, education, real estate, and healthcare terms are auto-detected or explicitly protected via preserve={"domains": [...]}.


🛡️ Security & Limits

| Aspect | Implementation |
|---|---|
| Input limits | 1 MB text, 5 MB API body |
| Network calls | Zero. No telemetry, no analytics, no phone-home |
| Dependencies | Zero. Pure stdlib only |
| Regex safety | All patterns are linear-time; no user input compiled to regex |
| Reproducibility | Seed-based PRNG, deterministic output |
| No eval/exec | No dynamic code execution |
| Rate limiting | Token bucket (API): 10 req/s, burst 20 |
| Sandboxing | Resource limits documented for production deployment |
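
A token bucket with a refill rate of 10 tokens per second and a capacity of 20 produces the limits stated above. The class below is a generic sketch of the technique with an injectable clock for testing; the `TokenBucket` name and its interface are illustrative and are not taken from the TextHumanize server.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token bucket: `rate` tokens/s refill, `burst` capacity."""

    def __init__(self, rate: float = 10.0, burst: int = 20,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so an initial burst passes
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether the request passes."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Fake clock so the demo is deterministic.
fake_now = [0.0]
bucket = TokenBucket(clock=lambda: fake_now[0])
burst_allowed = sum(bucket.allow() for _ in range(25))     # 20 pass, 5 rejected
fake_now[0] += 1.0                                         # one second later
refilled_allowed = sum(bucket.allow() for _ in range(25))  # 10 more pass
print(burst_allowed, refilled_allowed)
```

For a per-IP limit, a server would typically keep one such bucket per client address, for example in a dictionary keyed by IP.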

Threat Model

| Threat | Mitigation |
|---|---|
| Data exfiltration | Zero network calls, so there is no network path for data to leave the host |
| ReDoS | All regex patterns audited for linear-time complexity |
| Memory exhaustion | 1 MB input limit, streaming for large texts |
| Model poisoning | Weights are read-only compressed JSON; no runtime training by default |
| Dependency supply chain | Zero pip dependencies, so there are no third-party packages to compromise |

Responsible Use

TextHumanize is built for style normalization, readability improvement, privacy-preserving audits, and internal AI-like/watermark risk checks. It does not guarantee passing external AI detectors, and its detector scores should be treated as internal quality signals rather than universal authorship verdicts.

Use it for content you own or are authorized to edit. Do not use it to misrepresent authorship, bypass required disclosure, remove provenance signals from third-party content, or submit work in contexts where AI assistance is prohibited.

Recommended production safeguards:

See the full Responsible Use guide.


🏢 For Business & Enterprise

| Requirement | How TextHumanize Delivers |
|---|---|
| Predictability | Seed-based PRNG: same input + seed = identical output |
| Privacy | 100% local. Zero network calls. No data leaves your server |
| Auditability | Every call returns change_ratio, quality_score, similarity, explain() report |
| Integration | Python SDK · JS SDK · PHP SDK · CLI · REST API · Docker · SSE streaming |
| Reliability | 2,395 tests across 3 platforms, CI/CD with ruff + mypy |
| No vendor lock-in | Zero dependencies. No cloud APIs, no API keys, no rate limits |
| Language coverage | 25 language packs + universal processor for any language |
| Self-hosted | Docker image, pip install, on-premise deployment |
| Content quality gate | quality_gate.py for CI/CD pipeline integration |
| Custom training | Train from your own corpus with dict_trainer and training.py |
| Brand safety | Keyword preservation, brand term protection, max change control |

Processing Modes

| Mode | Description | Use Case |
|---|---|---|
| humanize() | Full 38-stage pipeline | General-purpose normalization |
| humanize_batch() | Parallel processing (N workers) | Bulk content processing |
| humanize_chunked() | Split + process + rejoin | Documents > 10K chars |
| humanize_until_human() | Iterative (loop until built-in target score) | High-quality output |
| humanize_stream() | SSE paragraph streaming | Real-time UI |
| humanize_ai() | Rules + LLM backend (OpenAI/Ollama) | Maximum quality |
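
The split, process, rejoin idea behind chunked processing can be sketched as follows. This is a simplified illustration that splits on blank lines and packs paragraphs up to a character budget; the `process_chunked` helper, the 10,000-character default, and the `process` callback are assumptions, not the behavior of `humanize_chunked()` itself.

```python
from typing import Callable, List


def split_paragraphs(text: str, max_chars: int) -> List[str]:
    """Pack whole paragraphs into chunks of at most max_chars when possible."""
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # budget exceeded: close the current chunk
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def process_chunked(text: str, process: Callable[[str], str],
                    max_chars: int = 10_000) -> str:
    """Split on paragraph boundaries, process each chunk, rejoin in order."""
    return "\n\n".join(process(c) for c in split_paragraphs(text, max_chars))


doc = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
print(split_paragraphs(doc, max_chars=40))
print(process_chunked(doc, str.upper, max_chars=40))
```

A single paragraph longer than the budget is kept as one oversized chunk in this sketch; a production splitter would also need a sentence-level fallback.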

Docker Deployment


❓ FAQ & Troubleshooting

Q: Does TextHumanize guarantee passing GPTZero / Originality.ai / Turnitin?
No. TextHumanize is a style normalization tool. It reduces AI-like patterns but does not guarantee bypassing external AI detectors. See Limitations.

Q: What's the best profile for reducing AI-like style signals?
The chat profile with intensity 60–80 gives the largest reduction on TextHumanize's built-in detector benchmark. For professional content, try the web profile at intensity 70. External detector outcomes vary and should be verified separately.

Q: How do I preserve keywords (e.g., for SEO)?
Use constraints={"keep_keywords": ["keyword1", "keyword2"]} or preserve={"brand_terms": ["BrandName"]}. By default, TextHumanize also protects URLs, email addresses, code, Markdown/HTML, dates, prices, versions, order ids, exact quotes, and multi-token named entities.

Q: Can I use this for commercial projects?
Yes, with a commercial license. See License & Pricing.

Q: Does it work offline? Does it send data to the internet?
It is 100% offline and makes zero network calls, not even a health check ping. All processing is local.

Q: Why is the first call slower?
The first call loads language packs and initializes caches. Subsequent calls that hit the LRU cache are 11× faster.

Q: Can I train it on my own data?
Yes. dict_trainer.py trains custom dictionaries from your corpus, and training.py can retrain the neural detector/LM.

Q: How do I add support for a new language?
Create a language pack in texthumanize/lang/your_lang.py following the existing pattern (15 required sections). Alternatively, use the universal processor, which works with any language automatically.

Q: Can I use individual modules (e.g., just POS tagger) without the full pipeline?
Yes. Every module is independently importable. See Using Individual Modules.

Q: Is there a GUI?
Try the Live Demo. For local use, the REST API + SSE streaming integrates easily with any frontend.

Q: How deterministic is it?
100% deterministic when using the same seed. Same input + same seed + same version = byte-identical output.

Q: What Python versions are supported?
3.9, 3.10, 3.11, 3.12, and 3.13, all tested in CI.


🆕 What's New in v0.28.4

Explainable audit and safer humanization (0.28.4)

Previous release readiness (0.28.3)

Previous patch fixes (0.28.2)

Previous highlights (0.28.0)

Web Platform — Auth, Payments & Freemium (NEW)

SentenceValidator™ — Interstage Quality Gate (v0.28.0)

Stats


🤝 Contributing

See CONTRIBUTING.md for development setup, testing, and PR guidelines.

Areas for contribution: New language packs · Improved synonym dictionaries · Better grammar rules · Performance optimizations · Additional integrations

Starter tasks with acceptance criteria are listed in the Good First Issues guide.

See CONTRIBUTORS.md for the full list of contributors.


⚠️ Limitations

TextHumanize is a style normalization tool. Keep expectations realistic:

| Aspect | Current State | Notes |
|---|---|---|
| EN humanization | Reduces AI score by 71–92 percentage points | Built-in detector; 94% → 2–23% |
| RU humanization | Reduces AI score by 75 percentage points | Built-in detector; 80% → 5% |
| UK humanization | Reduces AI score by 58 percentage points | Built-in detector; 75% → 17% |
| External AI detectors | Not a reliable bypass | GPTZero, Originality.ai use different models |
| Short texts (< 50 words) | Limited effect | Not enough context for meaningful transformation |
| Performance | 300–500 ms per paragraph | Fast enough for batch; not sub-millisecond |
| Built-in AI detector | Heuristic + statistical + neural | Useful for internal scoring; not equivalent to commercial detectors |
| Higher intensity | ≠ always lower AI score | Some transforms at high intensity may create new patterns |
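
The before/after pairs in the Notes column can be read either as an absolute drop in percentage points or as a relative reduction. A quick calculation for the RU row (80% → 5%) shows that the quoted figure of 75 corresponds to the absolute drop:

```python
def reductions(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in points, relative drop in percent)."""
    absolute = before - after
    relative = absolute / before * 100
    return absolute, relative


absolute, relative = reductions(before=80, after=5)
print(absolute, relative)  # 75 93.75
```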

What TextHumanize does well:

What TextHumanize does NOT do:


💛 Support the Project

If TextHumanize saves you time or money, consider supporting development:

PayPal


📄 License & Pricing

TextHumanize uses a dual license model:

| Use Case | License | Monthly |
|---|---|---|
| Personal / Academic / Open-source | Free License | Free |
| Commercial: 1 dev, 1 project | Indie | $29/mo |
| Commercial: up to 5 devs | Startup | $79/mo |
| Commercial: up to 20 devs | Business | $199/mo |
| Enterprise / On-prem / SLA / White-label | Enterprise | Contact us |

All commercial licenses include full source code, all updates, priority email support, and access to PHANTOM™ + ASH™ proprietary technologies. 100% offline — no data leaves your server, no per-request fees, no cloud lock-in. Monthly billing, cancel any time.



Documentation · Live Demo · PyPI · GitHub · Issues · Discussions · Commercial License


All versions of text-humanize with dependencies

Requires:
php >=8.1
ext-mbstring *
ext-json *