Text Diff Checker
Line-by-line comparison using LCS (Longest Common Subsequence) algorithm — instantly see what was added, removed, or unchanged
Text Similarity Checker
Cosine similarity with TF (Term Frequency) vectorization — compare two texts and get a 0–100% similarity score. Great for comparing AI-generated outputs or detecting paraphrased content
Deduplicate, Sort & Number Lines
Remove duplicate lines (keeping first occurrence), sort alphabetically A→Z or Z→A, randomize order, or add line numbers — all in one click
How Text Diff Works — LCS Algorithm & Line-Level Comparison
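The Rosenav diff checker implements the Longest Common Subsequence (LCS) algorithm to find the optimal alignment between two texts. By comparing text line-by-line rather than character-by-character, it produces human-readable diffs that clearly show what was added, removed, and kept unchanged. The output uses the same line-oriented +/- conventions as git diff and professional code review tools. Those tools usually compute the alignment with faster shortest-edit-script algorithms (git's default is Myers' algorithm) rather than a full LCS table.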
The LCS Algorithm — Finding the Optimal Edit Sequence
The Longest Common Subsequence (LCS) problem is a classic dynamic programming challenge: given two sequences, find the longest subsequence present in both in the same order. For text diff, each line is treated as an element in the sequence. The algorithm builds a 2D DP table where dp[i][j] stores the length of the LCS between the first i lines of text A and the first j lines of text B. Its time and space complexity is O(m×n), where m and n are the line counts of the two texts, so it comfortably handles texts up to several thousand lines in the browser. After building the DP table, a backtracking pass reconstructs the actual diff: lines that match are marked "unchanged," lines present only in the original are "removed," and lines present only in the modified text are "added." The output uses unified-diff-style +, -, and space prefixes, the same visual language developers use daily in version control.
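The DP-table-plus-backtracking procedure described above can be sketched in a few dozen lines of JavaScript. This is a minimal illustration, not the code shipped in text-tools.js. The function names `diffLines`, `formatDiff`, and `diffStats` are hypothetical, and the sketch makes two assumptions: lines are compared by exact string equality, and ties during backtracking are broken so that removed lines tend to print before the added lines that replace them.

```javascript
// Minimal LCS-based line diff (illustrative sketch, hypothetical names).
function diffLines(a, b) {
  const m = a.length;
  const n = b.length;
  // dp[i][j] = length of the LCS of the first i lines of a and first j lines of b.
  const dp = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      dp[i][j] = a[i - 1] === b[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  // Backtrack from dp[m][n], collecting operations in reverse order.
  const ops = [];
  let i = m;
  let j = n;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && a[i - 1] === b[j - 1]) {
      ops.push({ type: 'unchanged', line: a[i - 1] });
      i--;
      j--;
    } else if (j > 0 && (i === 0 || dp[i][j - 1] >= dp[i - 1][j])) {
      ops.push({ type: 'added', line: b[j - 1] });
      j--;
    } else {
      ops.push({ type: 'removed', line: a[i - 1] });
      i--;
    }
  }
  return ops.reverse();
}

// Render operations with unified-diff-style prefixes.
function formatDiff(ops) {
  const prefix = { unchanged: ' ', added: '+', removed: '-' };
  return ops.map((op) => `${prefix[op.type]} ${op.line}`).join('\n');
}

// Count unchanged, added, and removed lines for the stats summary.
function diffStats(ops) {
  const stats = { unchanged: 0, added: 0, removed: 0 };
  for (const op of ops) stats[op.type]++;
  return stats;
}

const original = 'alpha\nbeta\ngamma';
const modified = 'alpha\nbeta2\ngamma\ndelta';
const ops = diffLines(original.split('\n'), modified.split('\n'));
console.log(formatDiff(ops));
console.log(diffStats(ops)); // { unchanged: 2, added: 2, removed: 1 }
```

In this run the edit from `beta` to `beta2` appears as one removed line followed by one added line, which is the whole-line replacement behavior of line-level diffing. The full table holds (m + 1) × (n + 1) numbers, so two 5,000-line inputs need about 25 million cells. This is where the O(m×n) space cost starts to matter.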
Line-Level vs. Character-Level Diff — When Each Matters
Rosenav uses line-level diffing as the default because it's more meaningful for most real-world use cases: comparing code revisions, reviewing document edits, checking AI output variations, or validating configuration changes. Line-level diff groups related changes together and avoids the noise of character-level differences. Each line is treated as an atomic unit — if a single character changes, the entire line is marked as removed-then-added rather than showing a jumble of per-character color coding. For most content and code comparison tasks, this produces the cleanest, most actionable output. The diff result also includes a stats summary: total unchanged, added, and removed lines — giving you an instant quantitative measure of how much two texts differ.
Real-World Applications — Beyond Just Code Diff
While line-level diff originated in software version control (the Unix diff command dates to 1974), modern applications extend far beyond code. Content editors use diff to track revisions across drafts of articles, blog posts, and documentation. AI practitioners compare outputs from different prompt variations to see exactly what changed between two LLM responses. Legal professionals compare contract versions clause by clause. Data analysts validate CSV exports and configuration files after automated transformations. Students compare essay drafts to see their editing progress. The Rosenav diff tool handles all these scenarios with instant, client-side processing — no uploads, no waiting, no registration required. Paste, click Compare, and see the differences immediately.
Cosine Similarity Explained — TF Vectorization & Lexical Comparison
Cosine similarity measures how similar two texts are by comparing their word frequency vectors in a high-dimensional space. Unlike exact string matching, it captures vocabulary overlap: two texts using similar words in similar proportions will score high even if the word order and phrasing differ. Because it compares words rather than meanings, synonyms such as "car" and "automobile" do not count as matches. This makes it well suited to comparing AI-generated outputs, flagging likely paraphrases of a known source, or measuring document similarity.
From Text to Vectors — Term Frequency (TF) Tokenization
The first step transforms raw text into a mathematical representation. The tokenizer lowercases all text, strips punctuation, and splits on whitespace to extract individual word tokens. Each unique word becomes a dimension in a shared vector space. The Term Frequency (TF) for each word is simply its count in the text, so more frequent words get higher weight. For example, if Text A is "the cat sat on the mat" and Text B is "the dog sat on the floor," the shared vocabulary is [the, cat, sat, on, mat, dog, floor], and each text becomes a 7-dimensional vector: Text A = [2,1,1,1,1,0,0] and Text B = [2,0,1,1,0,1,1]. The cosine of the angle between these vectors (the dot product divided by the product of the magnitudes) gives a similarity score from 0 (no shared words) to 1 (identical word proportions). Here the dot product is 2×2 + 1×1 + 1×1 = 6 and both magnitudes are √8, so the similarity is 6 / 8 = 0.75, or 75%.
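Here is a minimal sketch of the tokenize, count, and cosine pipeline described above, in JavaScript. The punctuation-stripping rule (keep Unicode letters, digits, and whitespace) and the function names are assumptions for illustration. The shipped tokenizer may differ in details such as how it treats apostrophes or hyphens. On the cat/mat example it gives a dot product of 6 and magnitudes of √8 each, so the score is 6 / 8 = 0.75.

```javascript
// Lowercase, strip punctuation, split on whitespace (assumed tokenizer rules).
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, '')
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

// Term frequency: raw count of each token.
function termFrequencies(tokens) {
  const tf = new Map();
  for (const token of tokens) tf.set(token, (tf.get(token) || 0) + 1);
  return tf;
}

// Cosine similarity between the TF vectors of two texts (0 to 1).
function cosineSimilarity(textA, textB) {
  const tfA = termFrequencies(tokenize(textA));
  const tfB = termFrequencies(tokenize(textB));
  let dot = 0;
  let sumSqA = 0;
  let sumSqB = 0;
  for (const [term, countA] of tfA) {
    dot += countA * (tfB.get(term) || 0); // words missing from B contribute 0
    sumSqA += countA * countA;
  }
  for (const countB of tfB.values()) sumSqB += countB * countB;
  if (sumSqA === 0 || sumSqB === 0) return 0; // treat empty input as no similarity
  return dot / Math.sqrt(sumSqA * sumSqB);
}

const score = cosineSimilarity('the cat sat on the mat', 'the dog sat on the floor');
console.log(`${(score * 100).toFixed(1)}%`); // 75.0%
```

The empty-input branch is a judgment call. Cosine similarity is undefined for a zero vector, and this sketch simply reports 0.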
Why Cosine Instead of Euclidean Distance — The Angle Matters
Cosine similarity focuses on the direction of vectors, not their magnitude. This is crucial for text comparison because document length should not penalize similarity: a 500-word article and a 5,000-word article on the same topic should score high if they share vocabulary proportions. Euclidean distance would flag them as "far apart" purely due to the magnitude difference. Cosine similarity normalizes for length by dividing by the vector magnitudes, measuring only the angle between the vectors. Two texts that use the same words in the same proportions will have a cosine similarity of 1.0 (100%) regardless of whether one is 10× longer than the other. The Rosenav similarity checker displays the percentage score, a color-coded progress bar, a qualitative interpretation (from "Nearly identical" to "Very low similarity"), and token counts for transparency.
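The length argument can be checked numerically. The sketch below takes the Text A vector from the TF example, [2,1,1,1,1,0,0], and builds a copy scaled by 10. This represents the same word proportions in a text ten times longer. It then compares the two metrics. The helper names are hypothetical, and the vectors are plain dense arrays for readability.

```javascript
// Dense-vector helpers (illustrative; names are hypothetical).
const dot = (u, v) => u.reduce((sum, x, i) => sum + x * v[i], 0);
const norm = (u) => Math.sqrt(dot(u, u));
const cosine = (u, v) => dot(u, v) / (norm(u) * norm(v));
const euclidean = (u, v) => Math.sqrt(u.reduce((sum, x, i) => sum + (x - v[i]) ** 2, 0));

const base = [2, 1, 1, 1, 1, 0, 0];     // TF vector of "the cat sat on the mat"
const scaled = base.map((x) => x * 10); // same proportions, ten times the counts

console.log(cosine(base, scaled).toFixed(3));    // 1.000: same direction
console.log(euclidean(base, scaled).toFixed(3)); // 25.456: far apart by length alone
```

Cosine similarity reports the two vectors as identical in direction. Euclidean distance grows with the length difference (here √648), even though the word proportions never changed.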
AI Output Comparison — The Killer Use Case for Similarity Detection
One of the most powerful applications is comparing outputs from large language models (LLMs). When you run the same prompt through ChatGPT, Claude, Gemini, or any other LLM multiple times, or compare responses across different models, cosine similarity quantifies how consistent or divergent their vocabulary is. As a rough rule of thumb, a score of 85% or higher suggests the model answers that prompt in highly consistent wording. A score of 40–60% indicates significant variation in phrasing while covering similar content, and a score below 20% suggests fundamentally different responses. This helps prompt engineers tune for consistency, content creators check originality, and researchers compare model behaviors. Keep in mind that high similarity does not necessarily mean plagiarism: two independent summaries of the same source material will naturally share vocabulary. The cosine score must be interpreted in context, which is why Rosenav provides qualitative tier labels alongside the raw percentage.
Deduplication, Sorting & Line Operations — How Each Tool Works
The third tab bundles five common line-level text operations into a single interface. Remove duplicate lines while preserving first-occurrence order. Sort alphabetically ascending or descending using locale-aware comparison. Randomize line order with cryptographically secure shuffling. Add padded line numbers for reference. Each operation is instant and runs entirely in your browser.
Deduplication — Preserving Order, Removing Redundancy
The deduplication algorithm processes lines sequentially, maintaining a hash set of seen values. Each line's first occurrence is kept; all subsequent duplicates are removed. This first-occurrence-wins strategy preserves the original ordering while eliminating redundancy — ideal for cleaning email lists, removing duplicate URLs from scraped datasets, consolidating keyword lists, or de-duplicating log entries. The tool reports how many lines were removed so you can quantify the duplication rate at a glance. Unlike simple sort | uniq pipelines that reorder your data, Rosenav's dedup preserves your original sequence — a critical feature when line order carries meaning (timestamps, ranked lists, sequential data).
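A first-occurrence-wins pass is a single loop over the lines that tracks already-seen values in a Set. The sketch below is illustrative. `dedupeLines` is a hypothetical name, and the sketch assumes lines are compared exactly (case-sensitive, with no whitespace trimming), which may differ from the shipped tool.

```javascript
// First-occurrence-wins deduplication that preserves the original order.
function dedupeLines(text) {
  const lines = text.split('\n');
  const seen = new Set();
  const kept = [];
  for (const line of lines) {
    if (!seen.has(line)) {
      seen.add(line); // remember this value; later copies are dropped
      kept.push(line);
    }
  }
  const removed = lines.length - kept.length;
  return {
    output: kept.join('\n'),
    summary: `Deduplicated: ${lines.length} → ${kept.length} lines (${removed} removed)`,
  };
}

const result = dedupeLines('b.com\na.com\nb.com\nc.com\na.com');
console.log(result.output);  // b.com, a.com, c.com on separate lines
console.log(result.summary); // Deduplicated: 5 → 3 lines (2 removed)
```

Set lookups and insertions take expected constant time, so the whole pass runs in expected O(n) time for n lines. The output keeps `b.com` ahead of `a.com`, unlike `sort | uniq`.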
Sorting Options — A→Z, Z→A, and Random Shuffle
Three sort modes cover common text processing needs. Sort A→Z arranges lines in ascending alphabetical order using locale-aware string comparison (localeCompare with base sensitivity). This ignores differences in case and accents, so "apple", "Banana", and "éclair" sort as a reader would expect instead of by raw character code. Sort Z→A reverses the order for descending needs. Randomize shuffles lines using the Fisher-Yates algorithm driven by crypto.getRandomValues(), the browser's cryptographically secure random number generator. The resulting orderings are unbiased and unpredictable, suitable for A/B test assignment, randomized survey question ordering, or lottery-style selection. As long as each random index is drawn uniformly, every permutation is equally likely. By contrast, the popular array.sort(() => Math.random() - 0.5) idiom is measurably biased, and Math.random() itself is not cryptographically secure.
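The sketch below covers both halves of this section. It uses a locale-aware comparator with `sensitivity: 'base'` for the A→Z and Z→A modes. It also includes a Fisher-Yates shuffle that draws indices from `crypto.getRandomValues()` with rejection sampling, so each index is uniform and no modulo bias creeps in. The function names are hypothetical. The `node:crypto` fallback exists only so the snippet also runs in Node.js versions without a global `crypto` object; in a browser, `globalThis.crypto` is used directly.

```javascript
// Web Crypto in browsers (and recent Node.js); fallback for older Node.js.
const cryptoObj = globalThis.crypto ?? require('node:crypto').webcrypto;

// Locale-aware comparison that ignores case and accent differences.
const byBase = (a, b) => a.localeCompare(b, undefined, { sensitivity: 'base' });
const sortAZ = (lines) => lines.slice().sort(byBase);
const sortZA = (lines) => lines.slice().sort((a, b) => byBase(b, a));

// Uniform integer in [0, bound) via rejection sampling (avoids modulo bias).
function randomIndex(bound) {
  const buf = new Uint32Array(1);
  const limit = 2 ** 32 - (2 ** 32 % bound); // largest multiple of bound <= 2^32
  let x;
  do {
    cryptoObj.getRandomValues(buf);
    x = buf[0];
  } while (x >= limit);
  return x % bound;
}

// Fisher-Yates: walk backwards, swapping each slot with a uniformly chosen slot at or before it.
function shuffleLines(lines) {
  const out = lines.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = randomIndex(i + 1);
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

const lines = ['Zebra', 'éclair', 'apple', 'Banana'];
console.log(sortAZ(lines));        // [ 'apple', 'Banana', 'éclair', 'Zebra' ]
console.log(sortZA(lines));        // [ 'Zebra', 'éclair', 'Banana', 'apple' ]
console.log(lines.slice().sort()); // code-unit order: [ 'Banana', 'Zebra', 'apple', 'éclair' ]
console.log(shuffleLines(lines));  // a random permutation
```

Sorting a `slice()` copy leaves the input untouched. Because `Array.prototype.sort` is stable in modern engines, lines that compare equal under base sensitivity (such as "Apple" and "apple") keep their original relative order.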
Line Numbering — Padded, Aligned, Readable
The line numbering feature prepends each line with a zero-padded sequential number and a vertical bar separator (e.g., 1 | content, 42 | content). The padding width automatically adjusts to the total line count: a 9-line text uses single-digit numbers, while a 1,000-line text uses 4-digit numbers. This keeps the numbers aligned for easy scanning. It is useful for referencing specific lines during code reviews, adding reference numbers to datasets, preparing text for annotation workflows, or creating numbered lists from raw data. The numbering is also easy to undo: paste the numbered output into any editor that supports regular-expression find-and-replace and strip the leading number-and-bar prefix from each line.
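Below is a minimal sketch of padded numbering, plus the find-and-replace that undoes it. `numberLines` and `stripLineNumbers` are hypothetical names. The `N | ` prefix follows the example above. The padding character is an assumed parameter that defaults to '0' to match the zero-padded description; pass ' ' for space padding instead.

```javascript
// Prefix each line with its number, padded to the digit count of the total.
function numberLines(text, padChar = '0') {
  const lines = text.split('\n');
  const width = String(lines.length).length; // 9 lines -> 1 digit, 1,000 lines -> 4 digits
  return lines
    .map((line, i) => `${String(i + 1).padStart(width, padChar)} | ${line}`)
    .join('\n');
}

// Remove a leading "<number> | " prefix from every line (multiline regex).
function stripLineNumbers(text) {
  return text.replace(/^ *\d+ \| /gm, '');
}

const raw = Array.from({ length: 10 }, (_, i) => `item ${i + 1}`).join('\n');
const numbered = numberLines(raw);
console.log(numbered.split('\n')[0]);            // 01 | item 1
console.log(stripLineNumbers(numbered) === raw); // true
```

The regex is anchored to the start of each line with the `m` flag, so it removes only the added prefix and leaves the content untouched, even if the content itself begins with digits.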
Zero-Upload Architecture — Why Your Text Never Leaves Your Browser
Many online text tools silently upload your content to remote servers for "processing," exposing sensitive documents, proprietary code, or confidential data to third-party infrastructure. Rosenav inverts this model. Every computation (LCS-based diff, term-frequency (TF) cosine similarity, deduplication, sorting, shuffling, and line numbering) executes entirely within your browser's JavaScript runtime. You can disconnect from the internet after the page loads, and all tools continue working offline.
Verifiable Privacy — Inspect the Network Tab Yourself
Rosenav operates without login walls, account registration, or tracking cookies of any kind. There is no backend server to POST text to, no cloud-based analysis queue, and no third-party API integration. Every computation runs locally on your device's CPU via vanilla JavaScript. This architecture provides three verifiable guarantees. (1) Open your browser's Developer Tools → Network tab while using any tool, and you will see no outbound requests triggered by your text or by running the tools. (2) Disconnect from the internet entirely after the page loads, and every function, including text diff, similarity check, and line operations, continues to operate without degradation. (3) The entire source code is delivered as readable, unobfuscated JavaScript: right-click → View Page Source and open the linked script to audit every function. No localStorage, sessionStorage, cookies, or IndexedDB entries are tied to your text input. Close the tab, and your text is discarded along with the page's in-memory state.
Air-Gapped Operation — Fully Functional Offline
Because the entire tool suite is pure client-side JavaScript with zero external API dependencies, all three tools remain fully functional after you sever your internet connection. All assets (HTML templates, CSS stylesheets, JavaScript logic, and the Material Symbols icon font) are self-hosted on the rosenav.com origin with no CDN proxying. This offline-capable design serves professionals handling sensitive documents: legal contracts that cannot leave the device, proprietary source code under NDA, unpublished manuscripts, or classified data in air-gapped environments. Load the page once over a trusted connection, disconnect, and use the tools knowing that, with the connection severed, no data can cross the network boundary.
Aligned with 345tool Core Principles — Convenient · Simple · Beautiful
The 345tool collective builds every tool around three non-negotiable UX principles refined across the entire satellite-site matrix. Convenient: paste text and get instant results across all three tools — no configuration screens, no setup wizards, no learning curve. Simple: single-purpose tools that perform one function exceptionally well — diff, similarity check, and line operations — nothing more, nothing less. Beautiful: clean, high-contrast interfaces with responsive layouts that render correctly on viewports from 320px mobile screens to 4K desktop monitors. The purple-themed dark gradient hero section transitions naturally into the white-background tool cards, with Material Symbols providing consistent, legible iconography across all breakpoints. Rosenav embodies all three principles: paste, click, see results — wrapped in an interface that respects both your time and your privacy.
Frequently Asked Questions — Text Tools & Usage
Common questions about the diff algorithm, cosine similarity calculation, deduplication logic, and how the Rosenav text tools work under the hood.
How does the text diff comparison work?
The diff tool uses the Longest Common Subsequence (LCS) dynamic programming algorithm to find the optimal alignment between two texts at the line level. Each line is treated as an atomic element. The algorithm builds a 2D table computing the LCS length for every prefix pair, then backtracks to produce the actual edit sequence. Lines present in both texts are marked "unchanged," lines only in the original are "removed" (red, - prefix), and lines only in the modified text are "added" (green, + prefix). git diff and GNU diff solve the same underlying problem, though they typically use faster algorithms such as Myers' O(ND) difference algorithm. The result shows each line with a visual indicator, plus a summary of total unchanged, added, and removed lines. The algorithm runs in O(m×n) time and space, where m and n are the line counts of the two texts, which is efficient for documents up to several thousand lines in the browser.
What is cosine similarity and how is it calculated?
Cosine similarity measures the angle between two text vectors in high-dimensional word-frequency space, producing a score from 0 (completely different vocabulary) to 1 (identical word frequency distribution), displayed as 0–100%. The process: (1) Tokenization lowercases text, strips punctuation, and splits on whitespace to extract word tokens. (2) Term Frequency (TF) counts occurrences of each unique word in both texts, creating two sparse vectors over the shared vocabulary. (3) The cosine of the angle between these vectors is computed as dot product ÷ (magnitude_A × magnitude_B). Because cosine normalizes for vector length, a 500-word and 5,000-word text on the same topic can score high — the metric measures vocabulary proportion, not absolute length. The result is color-coded: ≥90% nearly identical, ≥70% high similarity, ≥40% moderate, below 40% low similarity. Token counts for both texts are displayed for transparency.
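The tier thresholds listed above map a score to a label through a simple cascade of comparisons. The sketch below is hypothetical, and the exact label strings in the Rosenav interface may differ. It takes the raw 0–1 cosine score, so the boundaries are compared without any percentage rounding. With the cat/mat example's score of 0.75, it returns the high-similarity tier.

```javascript
// Map a cosine score in [0, 1] to the qualitative tiers listed above.
function similarityTier(score) {
  if (score >= 0.9) return 'Nearly identical';
  if (score >= 0.7) return 'High similarity';
  if (score >= 0.4) return 'Moderate similarity';
  return 'Low similarity';
}

for (const score of [0.95, 0.75, 0.5, 0.1]) {
  console.log(`${Math.round(score * 100)}% -> ${similarityTier(score)}`);
}
```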
When should I use the similarity checker vs. the diff tool?
These tools serve complementary but distinct purposes. Use the diff tool when you need to see exactly what changed between two versions — which lines were added, removed, or kept. It's ideal for code reviews, document revision tracking, config file comparisons, and any scenario where you need a line-by-line change log. Use the similarity checker when you want a quantitative measure of how similar two texts are overall — for comparing AI model outputs across different prompts or providers, detecting paraphrased content, measuring document relatedness, or clustering similar texts. The diff tells you "where" things changed; the similarity checker tells you "how much" things changed. Many workflows benefit from using both: run the similarity checker first to see if there's a meaningful difference, then use the diff tool to inspect exactly where the differences lie.
How does the deduplication tool handle duplicate lines?
The deduplication tool uses a first-occurrence-wins strategy: as it processes lines sequentially from top to bottom, it maintains a hash set of previously seen lines. The first time a line appears, it's kept in the output and added to the set. All subsequent occurrences of the same line are silently discarded. This preserves the original relative ordering of unique lines — a critical feature when line order carries meaning (timestamps, ranked lists, sequential data like log entries). After processing, the tool displays a stats summary showing how many lines were removed: e.g., "Deduplicated: 150 → 127 lines (23 removed)." This is different from the common sort | uniq shell pipeline which alphabetically reorders your data before deduplication. For large datasets, the hash-set approach provides O(n) expected time complexity, making it fast even for thousands of lines.
Can I sort lines in different orders?
Yes, three sort modes are available. Sort A→Z arranges lines in ascending alphabetical order using localeCompare with base sensitivity. This ignores case and places accented characters (é, ü, ñ) alongside their base letters rather than after z. Sort Z→A reverses the order. Randomize shuffles lines using the Fisher-Yates algorithm powered by crypto.getRandomValues(), a cryptographically secure random source. When each index is drawn uniformly, every permutation has equal probability. This is preferable to Math.random()-based shuffles: Math.random() is not cryptographically secure, so its output can in principle be predicted, and the common sort(() => Math.random() - 0.5) shortcut is biased outright. The random shuffle is suitable for A/B test group assignment, survey question randomization, lottery selection, or any scenario requiring fair, unpredictable ordering.
Does this tool send my text to any server?
No. Zero characters leave your device — ever. The entire text processing engine is a self-contained vanilla JavaScript file (text-tools.js) with zero network calls: no fetch(), no XMLHttpRequest, no navigator.sendBeacon(), no WebSocket connections, and no analytics events bound to any text input field. You can verify this independently in three ways: (1) Open your browser's Developer Tools (F12) → Network tab, then use any of the three tools — observe zero outbound requests. (2) After the page finishes loading, disconnect your internet entirely — every function including diff, similarity check, dedup, sort, and line numbering continues working without degradation. (3) View the source code directly: right-click → View Page Source, locate /js/text-tools.js, and inspect every function — the code is delivered unobfuscated specifically to enable this audit. The text input fields are standard HTML elements — your text exists exclusively in your browser's memory and is never serialized, persisted, or transmitted.
What use cases are these text tools designed for?
The three tools cover a wide range of text processing needs. Text Diff: code review comparisons, document revision tracking, config file change validation, comparing AI prompt outputs across different models or parameters, contract clause review, and tracking edits across article drafts. Similarity Checker: comparing LLM output consistency, flagging likely paraphrases of a known source text, measuring document relatedness for clustering, verifying originality of summaries against source material, and analyzing vocabulary overlap between technical documents. Dedup / Sort / Lines: cleaning email lists and removing duplicate entries, consolidating scraped URL datasets, sorting keyword lists for SEO research, randomizing survey question order, adding reference numbers to datasets for annotation workflows, and preparing text for import into databases or spreadsheets. All three tools run instantly in the browser with no fixed file size limit. In practice, very large inputs are bounded by your device's available memory, particularly for the diff, whose table grows with the product of the two texts' line counts.
Who built Rosenav and what is the business model?
Rosenav is engineered, maintained, and hosted by 345tool, an independent international developer collective specializing in lightweight, privacy-first browser utilities that replace bloated internet tools. The platform operates on a strict zero-tracking, zero-registration, zero-data-collection model across the entire 345tool satellite-site matrix. Monetization relies exclusively on non-disruptive, contextually relevant banner placements positioned outside the core tool interface — these placements never interfere with tool functionality. Over time, these transition into premium B2B link partnerships with verified technical organizations in adjacent fields. No user data, text content, behavioral analytics, or any other telemetry are ever collected, packaged, or sold — there is literally nothing to sell because nothing is collected. For complete transparency, the full source code is readable directly in the browser via View Source or Developer Tools → Sources panel. The 345tool team can be contacted at x345tool@outlook.com or visited at 345tool.com.
345tool Team
We are the 345tool Team
345tool is an independent developer collective engineering elite, pure client-side, and privacy-first web utilities to replace bloated internet tools.
