PHP download

Download the PHP package ksanyok/text-humanize without Composer

On this page you can find all versions of the php package ksanyok/text-humanize. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.

Table of contents
Download ksanyok/text-humanize
More information about ksanyok/text-humanize
Files in ksanyok/text-humanize

Vendor ksanyok
Package text-humanize
Short Description Zero-dependency PHP library for algorithmic text humanization — transforms machine-generated text into natural prose
License proprietary
Homepage https://github.com/ksanyok/TextHumanize

Keywords typography nlp multilingual readability text-processing natural-language text-naturalization text-humanization

FAQ

After the download, add one include: require_once('vendor/autoload.php');. After that, import the classes with use statements.

Example:

If you use only one package, a project is not needed. But if you use more than one package, you cannot import the classes with use statements without a project.

In general, it is recommended to always use a project to download your libraries. An application normally needs more than one library.

Some PHP packages are not free to download and are therefore hosted in private repositories. Credentials are needed to access such packages. If a package comes from a private repository, enter the credentials in the auth.json textarea.

Some hosting environments offer no terminal or SSH access, so Composer cannot be used there.
Composer can be complicated to use, especially for beginners.
Composer needs a lot of resources, which are not always available on simple web hosting.
If you are using private repositories, you don't need to share your credentials. You can set everything up on our site and then give your team members a simple download link.
Simplify your Composer build process. Use our command-line tool to download the vendor folder as a binary. This makes your build faster, and you don't need to expose your credentials for private repositories.


Example code of ksanyok/text-humanize

Information about the package text-humanize

# TextHumanize

### The most advanced open-source text naturalization engine

**Transform AI-generated text into clearer, more natural prose, with proprietary PHANTOM™, ASH™, and SentenceValidator™ technologies**

**Reduce built-in AI-like style signals · 25 languages · 38-stage adaptive pipeline · 100% offline · Zero dependencies**

**External AI detector results are not guaranteed.** TextHumanize improves style, readability, and internal risk signals; it is not a bypass guarantee.
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-3776AB.svg?logo=python&logoColor=white)](https://www.python.org/downloads/) [![TypeScript](https://img.shields.io/badge/TypeScript-5.x-3178C6.svg?logo=typescript&logoColor=white)]() [![PHP 8.1+](https://img.shields.io/badge/php-8.1+-777BB4.svg?logo=php&logoColor=white)](https://www.php.net/) [![CI](https://github.com/ksanyok/TextHumanize/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/ksanyok/TextHumanize/actions/workflows/ci.yml) [![Tests](https://img.shields.io/badge/tests-2105%20passed-2ea44f.svg?logo=pytest&logoColor=white)](https://github.com/ksanyok/TextHumanize/actions/workflows/ci.yml) [![Zero Dependencies](https://img.shields.io/badge/dependencies-zero-brightgreen.svg)]() [![PyPI](https://img.shields.io/pypi/v/texthumanize.svg?logo=pypi&logoColor=white)](https://pypi.org/project/texthumanize/) [![License](https://img.shields.io/badge/license-Dual%20(Free%20%2B%20Commercial)-blue.svg)](LICENSE)
**235,000+ lines of code** · **122 Python modules** · **38-stage pipeline** · **25 languages + universal** · **2,105 tests**

**3 proprietary technologies:** PHANTOM™ (gradient-guided internal score optimization) · ASH™ (adaptive signature humanization) · SentenceValidator™ (interstage quality gate)

[Quick Start](#-quick-start) · [Proprietary Technologies](#-proprietary-technologies) · [Before & After](#-before--after-examples) · [Features](#-feature-matrix) · [Benchmarks](#-performance--benchmarks) · [AI Detection](#-ai-detection-engine) · [API Reference](#-api-reference) · [Documentation](https://ksanyok.github.io/TextHumanize/) · [Live Demo](https://texthumanize.link/) · [License](#-license--pricing)

Why TextHumanize?
Proprietary Technologies
Installation
Quick Start
Private Offline Workflow
Before & After Examples
Feature Matrix
Comparison with Competitors
Processing Pipeline
AI Detection Engine
API Reference
Profiles & Presets
Language Support
NLP Infrastructure
SEO Mode
Readability Metrics
Paraphrasing Engine
Tone Analysis & Adjustment
Watermark Detection & Cleaning
Content Spinning
Coherence Analysis
Morphological Engine
Stylistic Fingerprinting
Auto-Tuner
Plugin System
Using Individual Modules
CLI Reference
REST API Server
Async API
Performance & Benchmarks
Architecture
TypeScript / JavaScript Port
PHP Library
Testing & Quality
Security & Limits
Responsible Use
For Business & Enterprise
FAQ & Troubleshooting
What's New in v0.28.4
Contributing
Limitations
Support the Project
License & Pricing

TextHumanize is a pure-algorithmic text processing engine that transforms AI-generated drafts into clearer, more natural prose. Three proprietary technologies — PHANTOM™ (gradient-guided optimization against TextHumanize's own detector), ASH™ (adaptive signature humanization), and SentenceValidator™ (interstage quality control) — drive a 38-stage pipeline that reduces built-in AI-like style signals while preserving meaning. No neural networks, no API keys, no internet — just 235K+ lines of finely tuned rules, dictionaries, and statistical methods.

Honest note: TextHumanize is a style-normalization tool, not an AI-detection bypass tool. It reduces AI-like patterns (formulaic connectors, uniform sentence length, bureaucratic vocabulary) but does not guarantee that processed text will pass external AI detectors. Quality of humanization varies by language and text type. See Limitations below.

Built-in toolkit: AI Detection (3 detectors) · Paraphrasing · Tone Analysis · Watermark Cleaning · Content Spinning · Coherence Analysis · Readability Scoring · Stylistic Fingerprinting · Auto-Tuner · Perplexity Analysis · Plagiarism Detection · Grammar Check · Morphology Engine · Neural LM · Async API · SSE Streaming

Platforms: Python (full — 122 modules) · TypeScript/JavaScript (core) · PHP (full)

For business: SaaS integration · REST API with SSE streaming · Docker deployment · Bulk processing · Custom dictionaries · On-prem enterprise · White-label ready

Languages: 🇬🇧 EN · 🇷🇺 RU · 🇺🇦 UK · 🇩🇪 DE · 🇫🇷 FR · 🇪🇸 ES · 🇵🇱 PL · 🇧🇷 PT · 🇮🇹 IT · 🇳🇱 NL · 🇸🇪 SV · 🇨🇿 CS · 🇷🇴 RO · 🇭🇺 HU · 🇩🇰 DA · 🇸🇦 AR · 🇨🇳 ZH · 🇯🇵 JA · 🇰🇷 KO · 🇹🇷 TR · 🇮🇳 HI · 🇻🇳 VI · 🇹🇭 TH · 🇮🇩 ID · 🇮🇱 HE · 🌍 any language via universal processor

🚀 Why TextHumanize?

Problem: Machine-generated text has uniform sentence lengths, bureaucratic vocabulary, formulaic connectors, and low stylistic diversity — reducing readability, engagement, and brand authenticity.

Solution: TextHumanize algorithmically normalizes text style while preserving meaning. Configurable intensity, deterministic output, full change reports. No cloud APIs, no rate limits, no data leaks.

	Advantage	Details
🚀	Blazing fast	300–500 ms for a paragraph; full article in 1–2 seconds
🔒	100% private	All processing is local — your text never leaves your machine
🎯	Precise control	Intensity 0–100, 9 profiles, 5 style presets, keyword preservation, max change ratio
🌍	25 languages	Deep support for EN/RU/UK/DE; dictionaries for 25 languages; statistical processor for any other
📦	Zero dependencies	Pure Python stdlib — no pip packages, no model downloads, starts in <100 ms
🔁	Reproducible	Seed-based PRNG — same input + same seed = identical output
🧠	3-layer AI detection	18-metric heuristic + 35-feature logistic regression + MLP neural detector — no ML framework required
🔌	Plugin system	Register custom hooks at any of 38 pipeline stages
📊	Full analytics	Readability (6 indices), coherence, plagiarism, stylometric fingerprint, content health score
🎭	Tone control	Analyze and adjust formality across 7 levels
📚	2,944 dictionary entries	EN 1,733 + RU 1,345 + UK 1,042 + DE 874 + FR 718 + ES 749 + more
🏢	Enterprise-ready	Dual license, 2,105+ tests, CI/CD, REST API, Docker, on-prem deployment
🛡️	Secure by design	Input limits, zero network calls, linear-time regex, no eval/exec
📝	Full auditability	Every call returns `change_ratio`, `quality_score`, `similarity`, `explain()` report

Proprietary Technologies

TextHumanize includes three original, proprietary technologies not found in any other open-source library:

PHANTOM™ — Gradient-Guided Text Optimization Engine

phantom.py — 2,943 lines | An open-source text naturalizer that uses numerical gradient optimization against TextHumanize's own AI detector.

ORACLE computes numerical gradients through the MLP detector via central differences (~70 forward passes, ~1.4ms), producing per-feature contribution analysis and ranked gap reports
SURGEON executes 32 feature-targeted surgical text operations guided by Oracle gradients — rank-based magnitude scheduling focuses effort on highest-impact features first
FORGE runs an iterative optimization loop with combined score tracking, stall detection, adaptive budget escalation, text expansion limits, and post-iteration cleanup
Result: 100% internal pass rate on TextHumanize's built-in detector benchmark (15/15 texts across EN, RU, UK). Processing time: 0.7–1.4s. External detectors use different models and are not guaranteed.
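The central-difference step attributed to ORACLE can be illustrated with a small, self-contained sketch. It is not the code in phantom.py: `toy_score` and its weights are hypothetical stand-ins for a detector, and the function names are chosen for illustration only. Central differences cost two forward passes per feature, which would be consistent with the quoted ~70 passes if the detector takes 35 input features (an assumption).

```python
from typing import Callable, List


def central_difference_gradient(
    score: Callable[[List[float]], float],
    features: List[float],
    eps: float = 1e-4,
) -> List[float]:
    """Estimate d(score)/d(feature_i) using two forward passes per feature."""
    grad = []
    for i in range(len(features)):
        plus = list(features)
        minus = list(features)
        plus[i] += eps
        minus[i] -= eps
        grad.append((score(plus) - score(minus)) / (2 * eps))
    return grad


def rank_features(grad: List[float]) -> List[int]:
    """Order feature indices by absolute gradient, largest first."""
    return sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)


def toy_score(x: List[float]) -> float:
    """Hypothetical stand-in for a detector: a fixed linear score."""
    weights = [0.5, -2.0, 0.1]
    return sum(w * v for w, v in zip(weights, x))


grad = central_difference_gradient(toy_score, [1.0, 2.0, 3.0])
order = rank_features(grad)  # feature 1 has the largest influence here
```

Ranking by absolute gradient mirrors the idea of spending effort on the highest-impact features first; how the real engine maps a ranked feature to a text operation is not shown here.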

ASH™ — Adaptive Signature Humanization

ash_engine.py + signature_transfer.py + perplexity_sculptor.py | Statistically transforms text to match real human writing signatures.

Human Profiles — statistical fingerprints of real human writing per language (sentence length distribution, vocabulary richness, burstiness patterns, punctuation habits)
Signature Transfer — morphs AI text's statistical signature toward the target human profile
Perplexity Sculpting — adjusts word-level perplexity to match human perplexity distribution curves
Metric Gaps — identifies and systematically closes the gap between AI and human writing on 35+ features

SentenceValidator™ — Interstage Quality Gate

sentence_validator.py — 350 lines | Catches and eliminates artifacts between pipeline stages in real-time.

10 checks per sentence: duplicate words (the the), broken contractions (do n't), orphaned punctuation, double conjunctions (and and), dangling conjunctions, unterminated parentheses, triple+ character repeats, fragment chains, conjunction chains, empty sentences
7 validation checkpoints between pipeline stages — catches artifacts the moment they appear
Language-aware — recognizes conjunctions in EN, RU, UK, DE, FR, ES
Final sanitization — post-pipeline cleanup removes residual artifacts that survive all stages
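A few of the per-sentence checks listed above can be sketched with regular expressions. This is a minimal illustration covering five of the ten checks, not the contents of sentence_validator.py; the function name and the issue labels are hypothetical.

```python
import re
from typing import List

# Same word twice in a row, e.g. "the the" or "and and".
DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)
# A contraction split by whitespace, e.g. "do n't".
BROKEN_CONTRACTION = re.compile(r"\b\w+\s+n't\b")
# The same letter three or more times in a row, e.g. "sooo".
TRIPLE_REPEAT = re.compile(r"([^\W\d_])\1{2,}")


def check_sentence(sentence: str) -> List[str]:
    """Return labels for the artifacts found in one sentence."""
    if not sentence.strip():
        return ["empty_sentence"]
    issues = []
    if DUPLICATE_WORD.search(sentence):
        issues.append("duplicate_word")
    if BROKEN_CONTRACTION.search(sentence):
        issues.append("broken_contraction")
    if sentence.count("(") != sentence.count(")"):
        issues.append("unbalanced_parentheses")
    if TRIPLE_REPEAT.search(sentence):
        issues.append("triple_repeat")
    return issues
```

Running such a check after each stage, rather than once at the end, makes it possible to attribute an artifact to the stage that introduced it.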

📦 Installation

From source:

Tip: Pin your version for production: pip install texthumanize==0.28.4

PHP / TypeScript

⚡ Quick Start

Private Offline Workflow

For privacy-sensitive content, use the local audit -> safe cleanup -> strict humanize -> audit pattern. It keeps processing offline, preserves critical terms, and records review metrics without using cloud APIs.

The example uses backend="local", quality_gate="strict", minimal=True, brand/identifier preservation, and a socket guard that raises if any code tries to open a network connection. See the full Private Offline Workflow guide.

All Features at a Glance

🔄 Before & After Examples

English

Before (AI-generated, AI score: 94%):

Furthermore, it is important to note that the implementation of cloud computing facilitates the optimization of business processes. Additionally, the utilization of microservices constitutes a significant advancement. Moreover, the integration of artificial intelligence into the workflow enhances decision-making processes and contributes to overall organizational efficiency.

After (TextHumanize, profile="web", intensity=60, AI score: 23%):

Also, importantly, the implementation of cloud computing helps the tuning of business processes. Up a major advancement, additionally, the use of microservices makes. And, the merge of artificial intelligence into the workflow enhances decision-making processes; and, contributes to overall organizational speed.

Russian

Before (AI score: 80%):

Необходимо отметить, что данная методология обеспечивает существенное повышение эффективности рабочих процессов. Кроме того, внедрение инновационных технологий способствует оптимизации функционирования организации. Более того, использование искусственного интеллекта позволяет значительно улучшить процесс принятия решений.

After (AI score: 5%):

Важно — что данная метод даёт существенное повышение эффективности рабочих процессов! Впрочем, смотрите, внедрение инновационных технологий помогает оптимизации функционирования организации, значительно, к тому же, использование искусственного интеллекта позволяет улучшить процесс принятия решений.

Ukrainian

Before (AI score: 75%):

Необхідно зазначити, що дана методологія забезпечує суттєве підвищення ефективності робочих процесів. Крім того, впровадження інноваційних технологій сприяє оптимізації функціонування організації. Більш того, використання штучного інтелекту дозволяє значно покращити процес прийняття рішень.

After (AI score: 17%):

Важливо, що ця метод дає суттєве підвищення ефективності робочих процесів; в принципі, впровадження інноваційних технологій веде до оптимізації функціонування організації. До того ж, використання штучного інтелекту дає змогу сильно покращити процес прийняття рішень.

AI Score Reduction Summary

Language	Before	After	Reduction	Mode
English	94%	2%	-92pp	web/70
English	94%	23%	-71pp	web/60
Russian	80%	5%	-75pp	web/50
Ukrainian	75%	17%	-58pp	web/50

Built-in AI detector scores. Results measured with TextHumanize's 3-layer ensemble (heuristic + statistical + MLP neural). External detectors may produce different results.

Profile Comparison (EN, intensity=50)

Profile	Change Ratio	Quality	AI Score After
`web`	0.50	0.20	27% 🟢
`chat`	0.61	0.20	27% 🟢
`marketing`	0.48	0.25	27% 🟢
`seo`	0.48	0.25	33% 🟢
`formal`	0.48	0.24	29% 🟢
`academic`	0.48	0.24	29% 🟢

Input AI score: 94% — all profiles bring it below 35%.

🧩 Feature Matrix

Category	Feature	Python	JS	PHP
Core	`humanize()` — 38-stage pipeline	✅	✅	✅
	`humanize_batch()` — parallel processing	✅	—	✅
	`humanize_chunked()` — large text support	✅	—	✅
	`humanize_ai()` — three-tier AI + rules	✅	—	—
	`humanize_until_human()` — iterative	✅	—	—
	`humanize_sentences()` — per-sentence	✅	—	—
	`humanize_stream()` — streaming	✅	—	—
	`humanize_variants()` — N output variants	✅	—	—
	`analyze()` — artificiality scoring	✅	✅	✅
	`explain()` — change report	✅	—	✅
AI Detection	`detect_ai()` — 3-layer ensemble	✅	✅	✅
	`detect_ai_batch()` — batch detection	✅	—	—
	`detect_ai_sentences()` — per-sentence	✅	—	—
	`detect_ai_mixed()` — mixed content	✅	—	—
	`StatisticalDetector` — 35-feature LR	✅	—	—
	`NeuralAIDetector` — MLP (pure Python)	✅	—	—
NLP	`paraphrase()` — syntactic transforms	✅	—	✅
	`POSTagger` — rule-based POS (4 langs)	✅	—	—
	`HMMTagger` — Viterbi HMM tagger	✅	—	—
	`CJKSegmenter` — zh/ja/ko segmentation	✅	—	—
	`SyntaxRewriter` — 8+ sentence transforms	✅	—	—
	`WordLanguageModel` — perplexity (14 langs)	✅	—	—
	`NeuralPerplexity` — LSTM char-level LM	✅	—	—
	`CollocEngine` — PMI scoring + replacement guard	✅	—	—
	`MorphologyEngine` — 4 languages	✅	—	—
	`WordVec` — lightweight word vectors	✅	—	—
Tone	`analyze_tone()` — formality analysis	✅	—	✅
	`adjust_tone()` — 7-level adjustment	✅	—	✅
Watermarks	`detect_watermarks()` — 6 types	✅	—	✅
	`clean_watermarks()` — removal	✅	—	✅
Spinning	`spin()` / `spin_variants()`	✅	—	✅
Analysis	`analyze_coherence()` — paragraph flow	✅	—	✅
	`full_readability()` — 6 indices	✅	—	✅
	`check_grammar()` — rule-based (9 langs)	✅	—	—
	`uniqueness_score()` — plagiarism check	✅	—	—
	`content_health()` — composite 0–100	✅	—	—
	`semantic_similarity()` — TF-IDF cosine	✅	—	—
	`sentence_readability()` — per-sentence	✅	—	—
	Stylistic fingerprinting	✅	—	—
Quality	`BenchmarkSuite` — 6-dimension scoring	✅	—	—
	`FingerprintRandomizer` — anti-detection	✅	—	—
	`QualityGate` — CI/CD content check	✅	—	—
Advanced	Style presets (5 personas)	✅	—	—
	Auto-Tuner (feedback loop)	✅	—	—
	AI backend (OpenAI/Ollama/OSS)	✅	—	—
	Custom dictionary overlays	✅	—	—
	Domain dictionaries (SaaS/ecommerce/etc.)	✅	—	—
	Dictionary trainer (corpus)	✅	—	—
	Neural network training loop	✅	—	—
	Dashboard (HTML reports)	✅	—	—
	Plugin system	✅	—	✅
	REST API (OpenAPI + SSE)	✅	—	—
	SSE streaming	✅	—	—
	CLI (15+ commands)	✅	—	—
Languages	Full dictionary support	14	2	14
	Universal processor	✅	✅	✅

⚔️ Comparison with Competitors

vs. Online Humanizers & GPT/LLM Rewriting

Criterion	TextHumanize	Online Humanizers	GPT/LLM Rewriting
Works offline	✅	❌	❌
Privacy	✅ 100% local	❌ Third-party servers	❌ Cloud API
Speed	~300 ms/paragraph	2–10 sec (network)	~500 chars/sec
Cost per 1M chars	$0	$10–50/month	$15–60 (GPT-4)
API key required	No	Yes	Yes
Deterministic	✅ Seed-based	❌	❌
Languages	25 + universal	1–3	10+ but expensive
Built-in AI detector	✅ 3-layer ensemble	❌ or basic	❌
Max change control	✅ `max_change_ratio`	❌	❌ Unpredictable
Open source	✅	❌	❌
Self-hosted	✅ Docker / pip	❌	❌
Audit trail	✅ `explain()`	❌	❌

vs. Other Open-Source Libraries

Feature	TextHumanize	Typical Alternatives
Pipeline stages	38	2–4
Languages	25 + universal	1–2
AI detection	✅ 3-layer (18 + 35 + MLP)	❌
Python tests	2,105	10–50
Codebase size	235,000+ lines	500–2K
Platforms	Python + JS + PHP	Single
Plugin system	✅	❌
Tone analysis	✅ 7 levels	❌
REST API	✅ OpenAPI + SSE	❌
Readability metrics	✅ 6 indices	0–1
Morphological engine	✅ 4 languages	❌
Neural components	MLP + LSTM + HMM	❌
Content spinning	✅ spintax	❌
Stylistic fingerprinting	✅	❌
Grammar checker	✅ 9 languages	❌
Plagiarism detection	✅ n-gram	❌

vs. AI Detectors (GPTZero, Originality.ai)

Feature	TextHumanize	GPTZero	Originality.ai
Price	Free	From $10/mo	From $14.95/mo
Works offline	✅	❌	❌
Self-hosted	✅	❌	❌
Per-sentence detection	✅	✅	✅
Mixed-content detection	✅	✅	❌
Combined humanize + detect	✅	❌	❌
Custom training	✅ `dict_trainer`	❌	❌
API	✅ REST + SSE	✅ REST	✅ REST
Batch detection	✅	✅ (paid)	✅ (paid)
CI/CD quality gate	✅ `quality_gate.py`	❌	❌

🔧 Processing Pipeline (38 Stages)

Adaptive intensity: Auto-reduces processing for already-natural text.
Graduated retry: Retries at lower intensity if change ratio exceeds the limit.
SentenceValidator™: 7 interstage checkpoints catch artifacts between stages (10 checks per sentence).
Tier system: Tier 1 languages (EN/RU/UK/DE) get all 38 stages. Tier 2 (FR/ES/IT/PL/PT/NL/SV/CS/RO/HU/DA) get ~30. Tier 3 (AR/ZH/JA/KO/TR/HI/VI/TH/ID/HE) get ~20 + universal.
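The graduated-retry idea (lower the intensity until the change ratio fits the limit) can be sketched as follows. Everything here is an assumption made for illustration: `toy_transform` is a hypothetical stand-in for the pipeline, the step size of 10 is arbitrary, and the change ratio is approximated at word level with `difflib.SequenceMatcher`, which may differ from how the library measures it.

```python
from difflib import SequenceMatcher
from typing import Callable, Tuple


def change_ratio(original: str, candidate: str) -> float:
    """Share of word-level content that differs (0.0 = identical)."""
    matcher = SequenceMatcher(None, original.split(), candidate.split())
    return 1.0 - matcher.ratio()


def humanize_with_retry(
    text: str,
    transform: Callable[[str, int], str],
    intensity: int,
    max_change_ratio: float,
    step: int = 10,
) -> Tuple[str, int]:
    """Retry at lower intensity until the change ratio is within the limit."""
    current = intensity
    while current > 0:
        candidate = transform(text, current)
        if change_ratio(text, candidate) <= max_change_ratio:
            return candidate, current
        current -= step
    return text, 0  # nothing acceptable: fall back to the original text


def toy_transform(text: str, intensity: int) -> str:
    """Hypothetical transform: uppercases the first `intensity` percent of words."""
    words = text.split()
    n = len(words) * intensity // 100
    return " ".join([w.upper() for w in words[:n]] + words[n:])


sample = "one two three four five six seven eight nine ten"
result, used = humanize_with_retry(sample, toy_transform, 50, 0.35)
```

With this toy transform, intensities 50 and 40 change too much of the sample, so the loop settles on 30.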

🧠 AI Detection Engine

Three independent detectors combined into a single score:

Architecture

18 Heuristic Metrics

#	Metric	What It Measures
1	Entropy	Character/word-level Shannon entropy
2	Burstiness	Sentence/paragraph length variability (humans vary, AI doesn't)
3	Vocabulary	TTR, MATTR, Yule's K, hapax legomena ratio
4	Zipf	Fit to Zipf's law distribution
5	Stylometry	Function word patterns, punctuation fingerprint
6	AI Patterns	Formulaic phrases ("it is important to note", "furthermore")
7	Punctuation	Punctuation distribution profile
8	Coherence	Paragraph uniformity (too-uniform = AI)
9	Grammar	Grammatical "perfection" level (too-perfect = AI)
10	Openings	Sentence-opening diversity
11	Readability	Consistency of readability scores across sentences
12	Rhythm	Syllable patterns, sentence length rhythm
13	Perplexity	N-gram predictability
14	Discourse	Discourse structure (topic sentences, markers)
15	Semantic Repetition	Cross-paragraph semantic overlap
16	Entity	Specificity of named entities and examples
17	Voice	Passive vs. active voice ratio
18	Topic Sentence	Topic-sentence-per-paragraph pattern
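Two of the metrics in the table, entropy and burstiness, have simple textbook forms. The sketch below uses word-level Shannon entropy and the coefficient of variation of sentence lengths; the library's exact definitions and normalization are not documented here, so treat these as illustrative choices.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev
from typing import List, Sequence


def shannon_entropy(tokens: Sequence[str]) -> float:
    """Shannon entropy in bits of the token frequency distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sentence_lengths(text: str) -> List[int]:
    """Word counts per sentence, using a naive split on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(lengths: Sequence[int]) -> float:
    """Coefficient of variation of sentence lengths (std / mean)."""
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)
```

Text whose sentences are all about the same length yields a burstiness near zero, which is the uniformity signal the table associates with machine-generated prose.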

35-Feature Statistical Detector (Logistic Regression)

Category	Features
Lexical (4)	Type-token ratio, hapax ratio, avg word length, word length variance
Sentence (3)	Mean sentence length, length variance, length skewness
Vocabulary (3)	Yule's K, Simpson's diversity, vocabulary richness
N-gram (3)	Bigram/trigram repetition rates, unique bigram ratio
Entropy (3)	Character entropy, word entropy, bigram entropy
Burstiness (2)	Sentence burstiness, vocabulary burstiness
Structural (3)	Paragraph count, avg paragraph length, list/bullet ratio
Punctuation (5)	Comma, semicolon, dash, question, exclamation rates
AI Pattern (1)	AI pattern rate (strongest single feature, weight −2.10)
Perplexity (2)	Word frequency rank variance, Zipf fit residual
Readability (2)	Syllables/word, Flesch score normalized
Discourse (3)	Starter diversity, conjunction rate, transition word rate
Rhythm (1)	Consecutive length difference variance
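As an illustration of the feature-extraction side, the four lexical features in the first row can be computed with the standard library alone. This is a sketch, not the library's extractor: the tokenization is a simple letter-run regex, and the hapax ratio is taken relative to the total token count (other definitions divide by the number of distinct words).

```python
import re
from collections import Counter
from typing import Dict


def lexical_features(text: str) -> Dict[str, float]:
    """Type-token ratio, hapax ratio, and word-length mean and variance."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return {
            "type_token_ratio": 0.0,
            "hapax_ratio": 0.0,
            "avg_word_length": 0.0,
            "word_length_variance": 0.0,
        }
    counts = Counter(words)
    lengths = [len(w) for w in words]
    avg = sum(lengths) / len(lengths)
    variance = sum((n - avg) ** 2 for n in lengths) / len(lengths)
    hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "type_token_ratio": len(counts) / len(words),
        "hapax_ratio": hapax / len(words),
        "avg_word_length": avg,
        "word_length_variance": variance,
    }
```

A logistic-regression detector would feed a vector of such features through learned weights and a sigmoid; the trained weights themselves are not reproduced here.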

Neural MLP Detector

Feed-forward neural network entirely in pure Python (no PyTorch, no TensorFlow). Pre-trained weights shipped as compressed JSON (54 KB).

Verdicts

Score	Verdict	Meaning
< 35%	`human_written`	Likely written by a human
35–65%	`mixed`	Mixed content or uncertain
≥ 65%	`ai_generated`	Likely AI-generated
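The verdict thresholds in the table translate directly into a small mapping function. The sketch below also shows a weighted average of detector scores; the function names and the idea of passing weights explicitly are assumptions, since the ensemble's actual weights are not given here.

```python
from typing import Sequence


def combine_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of detector scores (weights are caller-supplied)."""
    if len(scores) != len(weights) or not scores:
        raise ValueError("scores and weights must be non-empty and equal length")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def verdict(score: float) -> str:
    """Map a 0.0-1.0 AI probability to the verdict labels in the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score < 0.35:
        return "human_written"
    if score < 0.65:
        return "mixed"
    return "ai_generated"
```

Both boundaries follow the table: a score of exactly 35% falls in `mixed`, and exactly 65% is already `ai_generated`.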

Detection Modes

📖 API Reference

`humanize(text, lang, **kwargs) → HumanizeResult`

Parameter	Type	Default	Description
`text`	`str`	—	Input text (max 1 MB)
`lang`	`str`	—	Language code: `en`, `ru`, `uk`, `de`, etc.
`profile`	`str`	`"web"`	Processing profile: `chat`, `web`, `seo`, `docs`, `formal`, `academic`, `marketing`, `social`, `email`, plus intent aliases `seo_article`, `landing_page`, `product_description`, `support_reply`, `legal`, `social_post`
`intensity`	`int`	`50`	Aggressiveness 0–100
`seed`	`int`	`None`	PRNG seed for reproducibility
`preserve`	`dict`	`{}`	Protect code, URLs, email, dates, prices, ids, quotes, named entities, brand terms
`minimal`	`bool`	`False`	Only humanize AI-flagged sentences
`max_change_ratio`	`float`	`None`	Maximum allowed proportion of change (0.0–1.0)
`constraints`	`dict`	`{}`	Advanced constraints (`keep_keywords`, etc.)
`quality_gate`	`str`	`None`	Use `"strict"` to rollback on similarity, grammar, or readability regression
`backend`	`str`	`None`	LLM backend: `"openai"`, `"ollama"`, `"oss"`, `"auto"`

Returns HumanizeResult:

Field	Type	Description
`.text`	`str`	Processed text
`.change_ratio`	`float`	Proportion of text changed (0.0–1.0)
`.quality_score`	`float`	Quality metric
`.similarity`	`float`	Semantic similarity to original
`.metrics_after["humanize_explain"]`	`dict`	Top 5 change reasons, top 5 remaining risks, sentence-level risk deltas
`.metrics_after["anti_overhumanize"]`	`dict`	Final guard report for stacked fillers, repeated discourse markers, and excessive `!` / `?` punctuation
`.stages`	`list`	Stages applied with timing

Other Humanization Modes

`detect_ai(text, lang) → dict`

Field	Description
`score`	AI probability (0.0–1.0)
`verdict`	`"human_written"`, `"mixed"`, or `"ai_generated"`
`confidence`	Confidence level (0.0–1.0)
`metrics`	Individual metric scores (18 heuristic + 35 statistical)
`combined_score`	Weighted average of all detectors

Other Core Functions

Function	Description
`analyze(text, lang)`	Returns `AnalysisReport` with artificiality score, sentence stats
`explain(result)`	Human-readable change report
`paraphrase(text, lang)`	Syntactic paraphrasing (voice transforms, connector shuffling)
`analyze_tone(text, lang)`	Tone analysis (formality, style)
`adjust_tone(text, target, lang)`	Adjust formality to 7 levels
`detect_ai_explain(text, lang)`	Explainable AI detector report with spans and suggested actions
`audit_report(text, lang)`	Combined AI + watermark audit JSON
`detect_watermarks(text)`	Detect 6 types of invisible watermarks
`clean_watermarks(text)`	Remove all detected watermarks
`watermark_report(text, lang)`	Unified Unicode + statistical watermark report
`spin(text, lang)`	Generate a single spun variant
`spin_variants(text, count, lang)`	Generate N spun variants
`analyze_coherence(text, lang)`	Paragraph flow analysis
`full_readability(text, lang)`	6 readability indices
`build_author_profile(text, lang)`	Stylometric fingerprint
`compare_fingerprint(text, profile)`	Compare text to an author profile
`anonymize_style(text, lang)`	Stylometric anonymization
`check_grammar(text, lang)`	Grammar check (9 languages)
`uniqueness_score(text)`	N-gram uniqueness
`content_health(text, lang)`	Composite quality score 0–100

🎭 Profiles & Style Presets

Processing Profiles

Profile	Use Case	Sentence Length	Colloquialisms	Default Intensity
`chat`	Messaging, social media	8–18 words	High	80
`web`	Blog posts, articles	10–22 words	Medium	60
`seo`	SEO content (keyword-safe)	12–25 words	None	40
`docs`	Technical documentation	12–28 words	None	50
`formal`	Legal, official	15–30 words	None	30
`academic`	Research papers	15–30 words	None	25
`marketing`	Sales, promo copy	8–20 words	Medium	70
`social`	Social media posts	6–15 words	High	85
`email`	Business emails	10–22 words	Medium	50

Style Presets (5 Personas)

Preset	Sentences	Vocabulary	Style
🎓 `student`	Short–medium	Simple	Conversational, informal
✍️ `copywriter`	Varied (short bursts + long)	Dynamic	Energetic, varied rhythm
🔬 `scientist`	Long, complex	Technical	Formal, precise, cautious hedging
📰 `journalist`	Medium, diverse	Clear	Neutral, fact-oriented
💬 `blogger`	Short, punchy	Informal	Questions, exclamations, personal

Intensity Levels

Range	Effect	Use Case
0–20	Minimal — typography and watermarks only	Already-natural text
21–40	Light — connectors and basic synonym swap	SEO, formal content
41–60	Moderate — structure + paraphrasing	Blog posts, web content
61–80	Aggressive — syntax rewriting + entropy	Chat, social media
81–100	Maximum — all transforms at full power	Heavy AI text

🌍 Language Support

Language Tiers

Tier	Languages	Detection	Humanization	Syntax Rewriting
1	EN, RU, UK, DE	✅ Full	✅ Full 38-stage	✅
2	FR, ES, IT, PL, PT	✅ Good	✅ 15-stage	❌
3	AR, ZH, JA, KO, TR	✅ Basic	✅ 10-stage + universal	❌
0	Any other language	✅ Statistical	✅ Universal processor	❌

Dictionary Coverage

Language	Code	Synonyms	Bureaucratic	AI Connectors	Sentence Starters	Colloquial	Collocations
English	`en`	431	645	152	75	127	1,578
Russian	`ru`	269	486	100	73	102	408
Ukrainian	`uk`	243	338	75	46	86	38
German	`de`	138	361	65	54	88	125
French	`fr`	141	224	61	49	86	128
Spanish	`es`	166	230	60	49	78	126
Polish	`pl`	159	247	60	46	78	34
Portuguese	`pt`	163	204	60	51	79	36
Italian	`it`	168	231	63	49	79	38
Arabic	`ar`	126	139	65	40	59	—
Chinese	`zh`	127	137	51	38	59	—
Japanese	`ja`	120	123	66	41	59	—
Korean	`ko`	118	120	67	39	59	—
Turkish	`tr`	119	122	67	43	59	—

Universal processor works for any language using statistical methods — burstiness injection, perplexity normalization, sentence length variation, punctuation diversification.

🧬 NLP Infrastructure

TextHumanize includes a full NLP stack — all implemented in pure Python with zero external dependencies:

Module	Component	Description
`pos_tagger.py`	POS Tagger (1,917 lines)	Rule-based part-of-speech tagger with suffix/prefix rules for EN/RU/UK/DE
`hmm_tagger.py`	HMM Tagger (642 lines)	Viterbi-decoding Hidden Markov Model for POS tagging
`cjk_segmenter.py`	CJK Segmenter (1,277 lines)	Forward/backward max-match Chinese, particle-stripping Korean, character-type Japanese
`morphology.py`	Morphology Engine (811 lines)	Suffix-based stemming and inflection for RU/UK/EN/DE
`collocation_engine.py`	Collocation Engine (224 lines)	PMI-based collocation scoring for context-aware synonym selection
`word_lm.py`	Word Language Model (435 lines)	Bigram/trigram with compressed frequency data for 25 languages
`neural_lm.py`	Neural Char-Level LM (391 lines)	LSTM-based character language model for perplexity scoring
`neural_engine.py`	Neural Primitives (610 lines)	Feed-forward net, LSTM cell, embeddings, HMM, layer norm, GELU — all in stdlib
`neural_paraphraser.py`	Seq2Seq Paraphraser (752 lines)	Encoder-decoder with Bahdanau attention for neural paraphrasing
`word_embeddings.py`	Word Vectors (399 lines)	Hash-based + cluster embeddings, cosine similarity, nearest neighbors
`sentence_split.py`	Smart Splitter (338 lines)	Abbreviation-aware sentence splitting (Mr./Dr./URLs/decimals)
`lang_detect.py`	Language Detector (328 lines)	Character trigram profiling for 25 languages
`context.py`	Contextual Synonyms (320 lines)	Word sense disambiguation via context windows and topic detection
`grammar.py`	Grammar Checker (360 lines)	Rule-based grammar for 9 languages (agreement, articles, punctuation)

Total NLP infrastructure: ~8,800 lines of code, zero pip dependencies.

🔍 SEO Mode

TextHumanize includes a dedicated SEO workflow to humanize content without harming search rankings:

Feature	How It Works
Keyword preservation	`preserve` and `keep_keywords` lists are never modified
Low intensity	SEO profile defaults to 40% — gentle transformations
No keyword stuffing	Does not add or repeat keywords
Structure preservation	Heading hierarchy (H1–H6) preserved
Meta-safe	Avoids changing first-paragraph introductions (critical for SEO)
Max change control	`max_change_ratio=0.3` ensures minimal disruption

📊 Readability Metrics

full_readability() returns 6 reading metrics:

Index	Range	What It Measures
Flesch Reading Ease	0–100	Higher = easier (60–70 is ideal for web)
Flesch-Kincaid Grade	0–18	US school grade level
Coleman-Liau Index	0–18	Based on characters (not syllables)
Automated Readability Index	0–14	Character and word counts
SMOG Grade	0–18	Polysyllabic word density
Gunning Fog	0–20	Complex words + sentence length
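The first two indices have well-known closed forms: Flesch Reading Ease is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words), and Flesch-Kincaid Grade is 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below implements both for English. The syllable counter is a rough vowel-group heuristic chosen for illustration; it is an assumption and not necessarily how `full_readability()` counts syllables.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English heuristic: one syllable per run of vowels, minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    if words == 0 or sentences == 0:
        raise ValueError("need at least one word and one sentence")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    if words == 0 or sentences == 0:
        raise ValueError("need at least one word and one sentence")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def reading_ease_of(text: str) -> float:
    """Flesch Reading Ease of raw text, using the naive counters above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), len(sentences), syllables)
```

Very short or very dense passages can fall outside the nominal 0–100 range, so callers that need a bounded value should clamp the result.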

Grade interpretation:

Grade	Audience
5–6	General public, social media
7–8	Web content, blog posts
9–10	Magazine articles
11–12	Academic papers
13+	Technical/legal documents

✍️ Paraphrasing Engine

Rule-based syntactic paraphrasing — no LLM, no API, deterministic:

Transform	Example
Active → Passive	"The team built the app" → "The app was built by the team"
Passive → Active	"The report was written by John" → "John wrote the report"
Clause reordering	"After analyzing data, we decided…" → "We decided… after analyzing data"
Nominalization reversal	"The implementation of X" → "Implementing X"
Connector shuffling	"Furthermore, X. Additionally, Y." → "What's more, X. Also, Y."
MWE decomposition	"take into account" → "consider"
Hedging injection	"X is true" → "X appears to be true"
Perspective rotation	"Users need X" → "X is needed by users"

🎭 Tone Analysis & Adjustment

7-level formality scale with marker-based detection:

Level	Name	Example Markers
1	`slang`	"ya", "gonna", "lol", contractions
2	`casual`	"pretty much", "kind of", first person
3	`neutral`	Balanced register
4	`professional`	"regarding", "in accordance with"
5	`formal`	"henceforth", "notwithstanding"
6	`academic`	"thus", "consequently", passive voice
7	`legal`	"hereinafter", "whereas", "pursuant to"

🛡️ Watermark Detection & Cleaning

Detects and removes 6 types of invisible text watermarks:

Type	How It Hides	Detection Method
Zero-width characters	U+200B, U+200C, U+200D, U+FEFF	Unicode category scanning
Homoglyph substitution	Latin 'a' → Cyrillic 'а'	Confusable character mapping
Invisible Unicode	U+2060, U+2061–U+2064	Codepoint range check
Directional markers	RTL/LTR overrides	Bidirectional control detection
Soft hyphens	U+00AD	Pattern matching
Tag characters	U+E0001–U+E007F	Unicode block scanning
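Most of the rows above reduce to codepoint membership tests. The following sketch detects and strips five of the six types using only the code points named in the table plus the common bidirectional controls U+200E, U+200F and U+202A–U+202E (that set is an assumption). Homoglyph detection is left out because it needs a confusables map and script context. The function names are hypothetical and deliberately differ from the library's.

```python
from collections import Counter
from typing import Optional

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}
INVISIBLE = {"\u2060", "\u2061", "\u2062", "\u2063", "\u2064"}
BIDI_CONTROLS = {
    "\u200e", "\u200f", "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
}
SOFT_HYPHEN = "\u00ad"


def classify(ch: str) -> Optional[str]:
    """Return the hidden-character type of `ch`, or None if it is visible."""
    if ch in ZERO_WIDTH:
        return "zero_width"
    if ch in INVISIBLE:
        return "invisible"
    if ch in BIDI_CONTROLS:
        return "directional"
    if ch == SOFT_HYPHEN:
        return "soft_hyphen"
    if 0xE0001 <= ord(ch) <= 0xE007F:
        return "tag"
    return None


def detect_hidden(text: str) -> Counter:
    """Count hidden characters by type."""
    return Counter(kind for kind in map(classify, text) if kind is not None)


def strip_hidden(text: str) -> str:
    """Remove every character that `classify` flags."""
    return "".join(ch for ch in text if classify(ch) is None)


sample = "Hel\u200blo\u00ad wor\u2060ld\U000E0041"
```

Stripping U+200D unconditionally also breaks emoji sequences and some scripts that use the joiner legitimately, so a production cleaner would need to be more selective.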

🔄 Content Spinning

Generate multiple unique variants with spintax support:

The spinner uses language-pack synonyms, contextual substitution, and sentence restructuring to produce each variant.
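Spintax is commonly written as `{option one|option two}`, with nesting allowed. The sketch below expands such a template with a seeded random generator, so the same seed always yields the same variant. It assumes that common brace-and-pipe syntax and does not reproduce the synonym or restructuring logic described above; `expand_spintax` is a hypothetical name.

```python
import random
import re

# Innermost group: braces that contain no other braces.
INNER_GROUP = re.compile(r"\{([^{}]*)\}")


def expand_spintax(template: str, seed: int) -> str:
    """Resolve {a|b|c} groups, innermost first, using a seeded RNG."""
    rng = random.Random(seed)

    def pick(match: "re.Match[str]") -> str:
        return rng.choice(match.group(1).split("|"))

    text = template
    while INNER_GROUP.search(text):
        text = INNER_GROUP.sub(pick, text)
    return text


template = "{Hello|Hi} {world|there}"
variant = expand_spintax(template, seed=7)
```

Because each pass removes at least one brace pair, the loop terminates as soon as no groups remain; unbalanced braces are left in the output untouched.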

🔗 Coherence Analysis

Measure paragraph-level text flow:

Metric	What It Measures
Paragraph similarity	TF-IDF cosine between adjacent paragraphs
Transition quality	Presence and appropriateness of connective phrases
Topic continuity	Keyword overlap between sections
Reference chains	Pronoun and entity co-reference tracking
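
The first metric in the table, TF-IDF cosine between adjacent paragraphs, can be sketched with the standard library alone. The tokenizer, the smoothed IDF formula, and the name `adjacent_similarity` are assumptions chosen for this illustration; the library's own weighting may differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def adjacent_similarity(paragraphs: list[str]) -> list[float]:
    """Return the TF-IDF cosine similarity for each pair of neighbouring paragraphs."""
    docs = [Counter(tokenize(p)) for p in paragraphs]
    n = len(docs)
    # Document frequency: in how many paragraphs each term occurs.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF, so terms present in every paragraph keep a positive weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
            sum(w * w for w in b.values())
        )
        return dot / norm if norm else 0.0

    return [cosine(vectors[i], vectors[i + 1]) for i in range(n - 1)]
```

A low value between two neighbouring paragraphs suggests an abrupt topic change; a value near 1 suggests heavy repetition.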

🔠 Morphological Engine

Rule-based morphology for 4 languages — lemmatization, inflection, declension:

Language	Operations	Suffix Rules
Russian	Lemmatization, declension, conjugation	200+ suffix patterns
Ukrainian	Lemmatization, declension	180+ suffix patterns
English	Lemmatization, pluralization	150+ rules
German	Lemmatization, compound splitting	120+ rules

🎨 Stylistic Fingerprinting

Extract and compare author stylometric profiles:

Fingerprint dimensions: Mean sentence length, length variance, vocabulary richness, function word distribution, punctuation profile, discourse marker usage, passive voice ratio, average word length.
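
Four of the listed dimensions are simple enough to compute directly. The sketch below derives mean sentence length, length variance, vocabulary richness (as a type-token ratio), and average word length. The sentence splitter, the choice of population variance, and the name `fingerprint` are assumptions for illustration only.

```python
import re


def fingerprint(text: str) -> dict[str, float]:
    """Compute a small stylometric profile of the text."""
    # Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        return {
            "mean_sentence_length": 0.0,
            "sentence_length_variance": 0.0,
            "vocabulary_richness": 0.0,
            "avg_word_length": 0.0,
        }
    lengths = [len(re.findall(r"\w+", s)) for s in sentences]
    mean = sum(lengths) / len(lengths)
    # Population variance of sentence lengths, measured in words.
    variance = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    return {
        "mean_sentence_length": mean,
        "sentence_length_variance": variance,
        "vocabulary_richness": len(set(words)) / len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words),
    }
```

Two profiles can then be compared dimension by dimension, for example with an absolute difference per key.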

🎛️ Auto-Tuner (Feedback Loop)

Automatically optimize intensity and profile based on feedback:

The tuner uses Bayesian-like optimization to find ideal (intensity, profile) combinations for your content type.

🔌 Plugin System

Available hook points: watermark → segmentation → typography → debureaucratization → structure → repetitions → liveliness → paraphrasing → syntax_rewriting → tone → universal → naturalization → paraphrase_engine → sentence_restructuring → entropy_injection → readability → grammar → coherence → validation → restore

🧪 Using Individual Modules

Every module is independently importable:

💻 CLI Reference

CLI Flags

Flag	Description
`-l`, `--lang`	Language code (required)
`-p`, `--profile`	Processing profile
`-i`, `--intensity`	Intensity 0–100
`-o`, `--output`	Output file path
`--seed`	PRNG seed for reproducibility
`--keep`	Comma-separated keywords to preserve
`--brand`	Brand terms to never modify
`--max-change`	Maximum change ratio (0.0–1.0)
`--analyze`	Print analysis report
`--explain`	Print change explanation
`--detect-ai`	Run AI detection
`--audit`	Combined AI + watermark audit JSON
`--paraphrase`	Paraphrase mode
`--tone`	Adjust tone to target level
`--tone-analyze`	Analyze current tone
`--watermarks`	Detect watermarks
`--watermark-report`	Unified watermark JSON report
`--quality-gate`	`off` or `strict` post-processing guard
`--fail-under-quality`	Exit with code 2 if `quality_score` or benchmark average is below threshold
`--minimal` / `--only-flagged`	Only humanize AI-flagged sentences
`--spin`	Spin mode
`--variants N`	Number of spin variants
`--coherence`	Coherence analysis
`--readability`	Readability metrics
`--api`	Start REST API server
`--port`	API server port (default: 8080)
`--verbose`	Detailed output
`--report`	Save JSON report, or HTML when the path ends with `.html`
`--json`	JSON output format

🌐 REST API Server

Zero-dependency HTTP server with rate limiting and CORS:

For FastAPI deployments, see examples/fastapi_integration.py. It includes request body limits, text and batch size limits, per-request timeouts, structured error envelopes with request ids, and /v1/humanize/batch.

OpenAPI 3.1 schema is available at GET /openapi.json for client generation, contract tests, and API gateway import.

Endpoints

Method	Endpoint	Description
`POST`	`/humanize`	Full humanization
`POST`	`/detect-ai`	AI detection (single or batch)
`POST`	`/analyze`	Text metrics
`POST`	`/paraphrase`	Paraphrase text
`POST`	`/tone/analyze`	Tone analysis
`POST`	`/tone/adjust`	Tone adjustment
`POST`	`/watermarks/detect`	Detect watermarks
`POST`	`/watermarks/clean`	Remove watermarks
`POST`	`/spin`	Content spinning
`POST`	`/spin/variants`	Spin N variants
`POST`	`/coherence`	Coherence analysis
`POST`	`/readability`	Readability metrics
`POST`	`/sse/humanize`	SSE streaming humanization
`GET`	`/health`	Health check
`GET`	`/openapi.json`	OpenAPI 3.1 schema
`GET`	`/`	API documentation index
`OPTIONS`	`*`	CORS preflight

Rate limit: 10 req/s per IP, burst 20 · Max body: 5 MB

Example

Python Client
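
The original client example is not reproduced on this page. As a stand-in, here is a minimal sketch that builds a `POST /humanize` request with the standard library. The JSON field names `text` and `lang` are assumptions (the request schema is not shown here; consult `GET /openapi.json` on a running server), and the base URL assumes the default port 8080 from the CLI table.

```python
import json
import urllib.request


def build_humanize_request(
    text: str, lang: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /humanize endpoint."""
    # "text" and "lang" are hypothetical field names used for illustration.
    body = json.dumps({"text": text, "lang": lang}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/humanize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it against a running server, pass the result to `urllib.request.urlopen()` and decode the response body as JSON.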

⚡ Async API

Native asyncio support for all public functions:

📈 Performance & Benchmarks

All benchmarks on Apple Silicon (M-series), Python 3.12, single thread, after warm-up. See the public Benchmark Methodology for corpus labels, quality dimensions, latency reporting rules, and detector limitations.

Speed

Function	Text Size	Avg Latency
`humanize()`	~30 words	~5 s
`humanize()`	~80 words	~10 s
`humanize(phantom=True)`	~80 words	~12 s
`detect_ai()`	~30 words	~1 s
`detect_ai()`	~80 words	~3 s
`paraphrase()`	~80 words	< 1 ms
`analyze_tone()`	~80 words	< 1 ms
`analyze()`	~80 words	~80 ms

AI Score Reduction

Properties

Property	Value
Cold start	< 100 ms
LRU cache hit	11× faster than cold
External network calls	0 (offline-first)
Deterministic (same seed)	✅ Always
Pipeline timeout	30 s (configurable)
API rate limiting	10 req/s per IP, burst 20
Max input size	1 MB
Memory per call	4–200 KB

Run benchmarks yourself:

🏗️ Architecture

Design principles:

Principle	Implementation
Modular	Each stage is a standalone class; every module is independently importable
Zero dependencies	Pure Python stdlib — no pip packages at all
Declarative rules	Language packs are data-only (dicts), no logic in lang files
Idempotent	Running the pipeline twice won't double-transform text
Safe defaults	Works out-of-the-box with sensible profiles
Lazy imports	PEP 562 lazy loading — only imports what you use
Deterministic	Seed-based PRNG for reproducible output
Extensible	Plugin hooks at 38 stages, custom dictionaries, AI backend

🟦 TypeScript / JavaScript Port

Core TextHumanize functionality in TypeScript for Node.js and browsers:

Feature	Status
`humanize()`	✅
`detectAi()`	✅
`analyze()`	✅
Language packs: EN, RU	✅
Universal processor	✅

🐘 PHP Library

Full-featured PHP port with Composer support:

Feature	Status
All 25 language packs	✅
`humanize()`, `humanize_batch()`, `humanize_chunked()`	✅
`detect_ai()`, `analyze()`, `explain()`	✅
`paraphrase()`, `analyze_tone()`, `adjust_tone()`	✅
`detect_watermarks()`, `clean_watermarks()`	✅
`spin()`, `spin_variants()`	✅
`analyze_coherence()`, `full_readability()`	✅
Plugin system	✅
223 PHPUnit tests	✅

✅ Testing & Quality

Platform	Tests	Status
Python (pytest, 3.9–3.13)	2,144	✅ All passing
PHP (PHPUnit, 8.1–8.3)	223	✅ All passing
TypeScript (Jest)	28	✅ All passing
Total	2,395	✅

CI/CD: Every push triggers Python 3.9–3.13 + PHP 8.1–8.3 matrix, ruff lint, mypy type check, pytest with coverage ≥ 70%.

Core-language regressions: EN/RU/UK fixture packs verify protected tokens, cross-language leakage, and language-aware cleanup of over-humanized output.

Collocation guard: word-level replacements now keep strong local collocations intact, so natural phrases such as "heavy rain" are not weakened by context-free shorter synonyms.
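
One way to picture the collocation guard is a synonym replacer that checks the following word before substituting. The sketch below is an assumption-laden illustration: the synonym table, the collocation set, and the name `replace_with_guard` are all hypothetical, and it only looks at the word to the right.

```python
import re

# Hypothetical data for illustration; not the library's dictionaries.
SYNONYMS = {"heavy": "big"}
COLLOCATIONS = {("heavy", "rain")}


def replace_with_guard(text: str) -> str:
    """Replace words by synonyms unless they start a protected collocation."""
    # re.split with a capturing group keeps separators, so words sit at
    # even indices and separators at odd indices.
    tokens = re.split(r"(\W+)", text)
    for i in range(0, len(tokens), 2):
        word = tokens[i].lower()
        following = tokens[i + 2].lower() if i + 2 < len(tokens) else ""
        if word in SYNONYMS and (word, following) not in COLLOCATIONS:
            tokens[i] = SYNONYMS[word]
    return "".join(tokens)
```

With this data, "Heavy rain" is left alone while "heavy workload" becomes "big workload". The sketch does not restore capitalization of replaced words.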

Domain dictionaries: SaaS, ecommerce, fintech, legal, education, real estate, and healthcare terms are auto-detected or explicitly protected via preserve={"domains": [...]}.

🛡️ Security & Limits

Aspect	Implementation
Input limits	1 MB text, 5 MB API body
Network calls	Zero. No telemetry, no analytics, no phone-home
Dependencies	Zero. Pure stdlib only
Regex safety	All patterns are linear-time; no user input compiled to regex
Reproducibility	Seed-based PRNG, deterministic output
No eval/exec	No dynamic code execution
Rate limiting	Token bucket (API): 10 req/s, burst 20
Sandboxing	Resource limits documented for production deployment
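
The rate-limiting row describes a token bucket with a refill rate of 10 requests per second and a burst capacity of 20. A minimal sketch of that algorithm follows; the class name, the injectable clock, and the per-instance design are assumptions, not the server's actual code.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second up to `burst` tokens."""

    def __init__(
        self,
        rate: float = 10.0,
        burst: int = 20,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

For per-IP limiting, a server would keep one bucket per client address, for example in a dictionary keyed by IP.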

Threat Model

Threat	Mitigation
Data exfiltration	Zero network calls — impossible
ReDoS	All regex patterns audited for linear-time complexity
Memory exhaustion	1 MB input limit, streaming for large texts
Model poisoning	Weights are read-only compressed JSON; no runtime training by default
Dependency supply chain	Zero pip dependencies — nothing to compromise

Responsible Use

TextHumanize is built for style normalization, readability improvement, privacy-preserving audits, and internal AI-like/watermark risk checks. It does not guarantee passing external AI detectors, and its detector scores should be treated as internal quality signals rather than universal authorship verdicts.

Use it for content you own or are authorized to edit. Do not use it to misrepresent authorship, bypass required disclosure, remove provenance signals from third-party content, or submit work in contexts where AI assistance is prohibited.

Recommended production safeguards:

show before/after diffs and change_ratio to reviewers;
enable quality_gate="strict" for sensitive content;
review metrics_after["anti_overhumanize"] when high-intensity profiles are used;
use minimal=True / --only-flagged when only risky spans need edits;
preserve brand terms, named entities, numbers, URLs, quotes, and code;
require manual review for legal, medical, financial, academic, and policy content;
use neutral customer-facing language: "internal style and watermark risk signals", not "guaranteed detector bypass".

See the full Responsible Use guide.

🏢 For Business & Enterprise

Requirement	How TextHumanize Delivers
Predictability	Seed-based PRNG — same input + seed = identical output
Privacy	100% local. Zero network calls. No data leaves your server
Auditability	Every call returns `change_ratio`, `quality_score`, `similarity`, `explain()` report
Integration	Python SDK · JS SDK · PHP SDK · CLI · REST API · Docker · SSE streaming
Reliability	2,395 tests across 3 platforms, CI/CD with ruff + mypy
No vendor lock-in	Zero dependencies. No cloud APIs, no API keys, no rate limits
Language coverage	25 language packs + universal processor for any language
Self-hosted	Docker image, pip install, on-premise deployment
Content quality gate	`quality_gate.py` for CI/CD pipeline integration
Custom training	Train from your own corpus with `dict_trainer` and `training.py`
Brand safety	Keyword preservation, brand term protection, max change control

Processing Modes

Mode	Description	Use Case
`humanize()`	Full 38-stage pipeline	General-purpose normalization
`humanize_batch()`	Parallel processing (N workers)	Bulk content processing
`humanize_chunked()`	Split + process + rejoin	Documents > 10K chars
`humanize_until_human()`	Iterative (loop until built-in target score)	High-quality output
`humanize_stream()`	SSE paragraph streaming	Real-time UI
`humanize_ai()`	Rules + LLM backend (OpenAI/Ollama)	Maximum quality
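
The "split + process + rejoin" idea behind chunked processing can be sketched as grouping paragraphs into size-limited chunks, transforming each chunk, and joining the results. The function `process_chunked`, its `transform` callback, and the paragraph-boundary splitting rule are assumptions for illustration; they do not describe how `humanize_chunked()` is implemented.

```python
from typing import Callable


def process_chunked(
    text: str, transform: Callable[[str], str], max_chars: int = 10_000
) -> str:
    """Group paragraphs into chunks of at most max_chars, transform each, rejoin."""
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for paragraph in text.split("\n\n"):
        # 2 accounts for the "\n\n" separator inside a chunk.
        added = len(paragraph) + (2 if current else 0)
        if current and size + added > max_chars:
            chunks.append(current)
            current, size = [paragraph], len(paragraph)
        else:
            current.append(paragraph)
            size += added
    if current:
        chunks.append(current)
    return "\n\n".join(transform("\n\n".join(chunk)) for chunk in chunks)
```

Splitting only at paragraph boundaries keeps sentences intact; a single paragraph longer than `max_chars` stays in one oversized chunk in this sketch.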

Docker Deployment

❓ FAQ & Troubleshooting

Q: Does TextHumanize guarantee passing GPTZero / Originality.ai / Turnitin? No. TextHumanize is a style normalization tool. It reduces AI-like patterns but does not guarantee bypassing external AI detectors. See Limitations.

Q: What's the best profile for reducing AI-like style signals? The `chat` profile at intensity 60–80 gives the largest reduction on TextHumanize's built-in detector benchmark. For professional content, try `web` at 70. External detector outcomes vary and should be verified separately.

Q: How do I preserve keywords (e.g., for SEO)? Use `constraints={"keep_keywords": ["keyword1", "keyword2"]}` or `preserve={"brand_terms": ["BrandName"]}`. By default, TextHumanize also protects URLs, email addresses, code, Markdown/HTML, dates, prices, versions, order IDs, exact quotes, and multi-token named entities.

Q: Can I use this for commercial projects? Yes, with a commercial license. See License & Pricing.

Q: Does it work offline? Does it send data to the internet? 100% offline. Zero network calls. Not even a health check ping. All processing is local.

Q: Why is the first call slower? The first call loads language packs and initializes caches. Subsequent calls are 11× faster via LRU cache.

Q: Can I train it on my own data? Yes — dict_trainer.py trains custom dictionaries from your corpus, and training.py can retrain the neural detector/LM.

Q: How do I add support for a new language? Create a language pack in `texthumanize/lang/your_lang.py` following the existing pattern (15 required sections). Alternatively, use the universal processor, which works with any language automatically.

Q: Can I use individual modules (e.g., just POS tagger) without the full pipeline? Yes. Every module is independently importable. See Using Individual Modules.

Q: Is there a GUI? Try the Live Demo. For local use, the REST API + SSE streaming integrates easily with any frontend.

Q: How deterministic is it? 100% deterministic when using the same seed. Same input + same seed + same version = byte-identical output.

Q: What Python versions are supported? 3.9, 3.10, 3.11, 3.12, and 3.13 — all tested in CI.

🆕 What's New in v0.28.4

Explainable audit and safer humanization (0.28.4)

Explainable AI detector reports — detect_ai_explain() returns calibrated score, confidence interval, metric contributions, highlighted spans, sentence report, mixed-content shares, and suggested actions.
Unified watermark forensics — watermark_report() covers invisible Unicode, homoglyphs, fullwidth/math lookalikes, and statistical watermark hypotheses with p-value/z-score evidence.
Promopilot-ready audit JSON — audit_report() combines AI and watermark findings in a stable schema for product integrations and batch workflows.
Stricter quality controls — quality_gate="strict" can reject risky rewrites, while minimal=True / --only-flagged changes only flagged fragments.
Anti-overhumanize final guard — high-intensity output now trims stacked fillers, repeated discourse markers, and excessive expressive punctuation before returning.
Better short commercial copy coverage — golden-set regression tests now cover landing, product, and support copy patterns.
CI and release hardening — GitHub CI is green across Python 3.9-3.13, PHP 8.1-8.3, TypeScript/JavaScript, and docs; local release coverage is 80.09%.

Previous release readiness (0.28.3)

GitHub community checklist completed — added Code of Conduct, Security Policy, issue templates, quality-report template, and pull request template.
Release metadata sync — Python, PHP, and TypeScript package versions are aligned for PyPI, Packagist, and source installs.
Safer release verification — version checks now validate package manifests plus README/CHANGELOG release references before publication.

Previous patch fixes (0.28.2)

PHP HTML wrapper compatibility — internal orphan cleanup no longer strips external wrapper tokens like THZ_APP_HTML_*, preventing broken restore in client wrappers.
HTML + keep_keywords flows now humanize properly — THZ_KEYWORD_* / THZ_BRAND_* placeholders are treated as inline-safe, so structure/naturalization stages are not skipped.
Connector replacement after protected tags — connector rewrites now work even when a line starts with inline placeholders.
Ukrainian naturalization hardening — added dedicated uk replacements/boosters and removed English fallback for non-English naturalization to avoid mixed-language artifacts.

Previous highlights (0.28.0)

Web Platform — Auth, Payments & Freemium (NEW)

User registration with email/password + Google OAuth 2.0
Multi-tier pricing — Free / Starter $29 / Pro $79 / Business $199/month
API key management — create, revoke, and group keys per plan (1 / 5 / 20 keys)
Monobank Acquiring payment integration with webhook activation
Admin panel — user management, plan overrides, payment history, usage stats
Freemium gates — guests limited to 3 requests/day; text results blurred after 500 chars
API authentication — Bearer token support with session-based guest quota tracking
Expanded API docs — authentication guide, per-plan rate limits, PAYG billing docs, error codes (401/429)
Competitor comparison table added to Pricing page (vs Quillbot, Undetectable.ai, StealthGPT)

SentenceValidator™ — Interstage Quality Gate (v0.28.0)

sentence_validator.py (350 lines) — sentence-level artifact detection running at 7 checkpoints between pipeline stages
10 artifact checks per sentence: duplicate words, broken contractions, orphaned punctuation, double conjunctions, dangling conjunctions, unterminated parens, triple+ repeats, fragment chains, conjunction chains, empty sentences
Final sanitization in run() method catches post-loop residual artifacts
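
A few of the artifact checks listed above are easy to express as patterns. The sketch below covers four of the ten (empty sentences, duplicate words, orphaned punctuation, unterminated parentheses). The function name `find_artifacts`, the issue labels, and the exact patterns are assumptions for illustration, not the contents of `sentence_validator.py`.

```python
import re


def find_artifacts(sentence: str) -> list[str]:
    """Return labels for the artifacts found in a single sentence."""
    if not sentence.strip():
        return ["empty_sentence"]
    issues = []
    # The same word twice in a row, ignoring case ("The the").
    if re.search(r"\b(\w+)\s+\1\b", sentence, flags=re.IGNORECASE):
        issues.append("duplicate_word")
    # Punctuation that has drifted away from the preceding word ("sat , then").
    if re.search(r"\s[,.;:!?]", sentence):
        issues.append("orphaned_punctuation")
    # More opening than closing parentheses.
    if sentence.count("(") > sentence.count(")"):
        issues.append("unterminated_parens")
    return issues
```

The duplicate-word pattern also flags legitimate repetitions such as "had had", so a real validator needs an allowlist or language-specific rules.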

Stats

2,105 tests · 122 modules · 235,000+ lines · 25 languages · 38-stage pipeline

🤝 Contributing

See CONTRIBUTING.md for development setup, testing, and PR guidelines.

Areas for contribution: New language packs · Improved synonym dictionaries · Better grammar rules · Performance optimizations · Additional integrations

Starter tasks with acceptance criteria are listed in the Good First Issues guide.

See CONTRIBUTORS.md for the full list of contributors.

⚠️ Limitations

TextHumanize is a style normalization tool. Keep the following realistic expectations in mind:

Aspect	Current State	Notes
EN humanization	Lowers the built-in AI score by 71–92 percentage points	Built-in detector; 94% → 2–23%
RU humanization	Lowers the built-in AI score by 75 percentage points	Built-in detector; 80% → 5%
UK humanization	Lowers the built-in AI score by 58 percentage points	Built-in detector; 75% → 17%
External AI detectors	Not a reliable bypass	GPTZero, Originality.ai use different models
Short texts (< 50 words)	Limited effect	Not enough context for meaningful transformation
Performance	300–500 ms per paragraph	Fast enough for batch; not sub-millisecond
Built-in AI detector	Heuristic + statistical + neural	Useful for internal scoring; not equivalent to commercial detectors
Higher intensity	≠ always lower AI score	Some transforms at high intensity may create new patterns

What TextHumanize does well:

✅ Removes formulaic connectors ("furthermore", "it is important to note")
✅ Varies sentence length to add human-like burstiness
✅ Replaces bureaucratic vocabulary with simpler alternatives
✅ Deterministic, reproducible results with seed control
✅ 100% offline, no data leaks, zero dependencies
✅ Full audit trail with every call

What TextHumanize does NOT do:

❌ Guarantee passing external AI detectors (GPTZero, Originality.ai, Turnitin)
❌ Rewrite text at the semantic level (it's rule-based, not LLM-based)
❌ Handle domain-specific jargon (medical, legal, etc.) without custom dictionaries

💛 Support the Project

If TextHumanize saves you time or money, consider supporting development.

📄 License & Pricing

TextHumanize uses a dual license model:

Use Case	License	Monthly
Personal / Academic / Open-source	Free License	Free
Commercial — 1 dev, 1 project	Indie	$29/mo
Commercial — up to 5 devs	Startup	$79/mo
Commercial — up to 20 devs	Business	$199/mo
Enterprise / On-prem / SLA / White-label	Enterprise	Contact us

All commercial licenses include full source code, all updates, priority email support, and access to PHANTOM™ + ASH™ proprietary technologies. 100% offline — no data leaves your server, no per-request fees, no cloud lock-in. Monthly billing, cancel any time.


Documentation · Live Demo · PyPI · GitHub · Issues · Discussions · Commercial License

All versions of text-humanize with dependencies

Version 0.33.0, released 11 Jun 2026

Requires: php >=8.1, ext-mbstring (any version), ext-json (any version)

