Where did the decades of classical NLP go? No gold-standard resources like WordNet? No statistical methods?
There's nothing wrong with this; the solution is a good pragmatic choice. It's just interesting how our collective consciousness of expansive scientific fields can be so thoroughly purged when a new paradigm arises.
LLMs have completely overshadowed ML NLP methods from 10 years ago, and those methods themselves replaced decades of statistical NLP work, which in turn replaced another few decades of symbolic grammar-based NLP work.
Progress is good, but it's important not to forget all those hard-earned lessons; it can sometimes be a real superpower to be able to leverage that old toolbox in modern contexts. In many ways, we had much more advanced methods in the 60s for solving this problem than what Harper is doing here by naively reinventing the wheel.
chilipepperhott 7 hours ago [-]
I'll admit it's something of a bold label, but there is truth in it.
Before our rule engine has a chance to touch the document, we run several pre-processing steps that imbue the words it reads with semantic meaning.
> LLMs have completely overshadowed ML NLP methods from 10 years ago, and they themselves replaced decades statistical NLP work, which also replaced another few decades of symbolic grammar-based NLP work.
This is a drastic oversimplification. I'll admit that transformer-based approaches are indeed quite prevalent, but I do not believe that "LLMs" in the conventional sense are "replacing" a significant fraction of NLP research.
I appreciate your skepticism and attention to detail.
tough 7 hours ago [-]
to someone who would like to study/learn that evolution, any good recs?
aDyslecticCrow 8 hours ago [-]
Harper is decent.
I've relied on Grammarly to spellcheck all my writing for a few years (dyslexia prevents me from seeing the errors even when reading it 10 times). However, I find its increasing focus on LLMs and its insistence on rewriting sentences in more verbose ways bothers me a lot. (It removes personality and makes human-written text read like AI text.)
So I've tried out alternatives, and Harper is the closest I've found at the moment... but I still feel like Grammarly does a better job at basic word suggestions.
Really, all I wish for is a spellcheck that can use the context of the sentence to suggest words. Most ordinary dictionary spellchecks can pick the wrong word because it's syntactically closer. They may replace "though" with "thought" because I wrote "thougt" when the sentence clearly indicates "though" is correct; and I see no difference visually between any of the three words.
tolerance 20 hours ago [-]
I would much rather check my writing against grammatical rules that are hard coded in an open source program—meaning that I can change them—than ones that I imagine would be subject to prompt fiddling or, worse, implicitly hard coded in a tangle of training data that the LLM would draw from.
The whole thing seems cool. Automattic should mention this on their homepage. Tools like this are the future of something.
triknomeister 17 hours ago [-]
You would lose out on evolution of language.
phoe-krk 15 hours ago [-]
Natural languages evolve so slowly that writing and editing rules for them is easily achievable even this way. Think years versus minutes.
fakedang 14 hours ago [-]
Aight you win fam, I was trippin fr. You're absolutely bussin, no cap. Harvard should be taking notes.
(^^ alien language that was developed in less than a decade)
notahacker 12 hours ago [-]
The existence of common slang that isn't used in the sort of formal writing grammar linting tools are typically designed to promote is more of a weakness of learning grammar from a weighted model of the internet (vs formal grammatical rules) than a strength.
Not an insurmountable problem (ChatGPT will use "aight fam" only in context-sensitive ways and will remove it if you ask to rephrase to sound more like a professor), but RLHFing slang into predictable use is likely a bigger potential challenge than simply ensuring the word list of an open source program is sufficiently up to date to include slang whose etymology dates back to the noughties or nineties, if phrasing things in that particular vernacular is even a target for your grammar linting tool...
chrisweekly 11 hours ago [-]
Huh, this is the first time I've seen "noughties" used to describe the first decade of the 2000s. Slightly amusing that it's surely pronounced like "naughties". I wonder if it'll catch on and spread.
harvey9 11 hours ago [-]
The fact that you never saw it before suggests it did not catch on and spread during the last 25 years.
nailer 10 hours ago [-]
‘Noughties’ was popular in Australia from 2010 onwards. Radio stations would “play the best from the eighties nineties noughties and today”.
notahacker 10 hours ago [-]
Common in Britain too, also appears in the opening lines of the Wikipedia description for the decade and the OED.
afeuerstein 14 hours ago [-]
I don't think anyone has the need to check such a message for grammar or spelling mistakes.
Even then, I would not rely on an LLM to accurately track this "evolution of language".
fakedang 12 hours ago [-]
What if you're writing emails to GenZers?
dpassens 11 hours ago [-]
As a zoomer, I'd rather not receive emails that sound like they're written by a moron.
bombcar 11 hours ago [-]
Attempting to write like a GenZ when you’re not gets you “hello fellow kids” and “Boomer” right away.
dmoy 5 hours ago [-]
Pedantically,
aight, trippin, fr (at least the spoken version), and fam were all very common in the 1990s (which was the last decade I was able to speak like that without getting jeered at by peers).
phoe-krk 14 hours ago [-]
Yes, precisely. This "less than a decade" is magnitudes above the hours or days that it would take to manually add those words and idioms to proper dictionaries and/or write new grammar rules to accommodate aspects like skipping "g" in continuous verbs to get "bussin" or "bussin'" instead of "bussing". Thank you for illustrating my point.
Also, it takes at most a few developers to write those rules into a grammar checking system, compared to the millions and more that need to learn a given piece of "evolved" language as it becomes impossible to avoid learning it. Not only is it fast enough to do this manually, it's also much less work-intensive and more scalable.
efitz 7 hours ago [-]
I’m glad we have people at HN who could have eliminated decades of effort by tens of thousands of people, had they only been consulted first on the problem.
phoe-krk 6 hours ago [-]
Which effort? Learning a language is something that can't be eliminated. Everyone needs to do it on their own. Writing grammar checking software, though, can be done a few times and then copied.
fakedang 12 hours ago [-]
Not exactly. It takes time for those words to become mainstream for a generation. While you'd have to manually add those words in dictionaries, LLMs can learn these words on the fly, based on frequency of usage.
phoe-krk 11 hours ago [-]
At this point we're already using different definitions of grammar and vocabulary - are they discrete (as in a rule system, vide Harper) or continuous (as in a probability, vide LLMs). LLMs, like humans, can learn them on the fly, and, like humans, they'll have problems and disagreements judging whether something should be highlighted as an error or not.
Or, in other words: if you "just" want a utility that can learn speech on the fly, you don't need a rigid grammar checker, just a good enough approximator. If you want to check if a document contains errors, you need to define what an error is, and then if you want to define it in a strict manner, at that point you need a rule engine of some sort instead of something probabilistic.
qwery 14 hours ago [-]
Please share your reasoning that led you to this conclusion -- that natural language "evolves slowly".
You also seem to be making an assumption that natural languages (English, I'm assuming) can be well defined by a simple set of rigid patterns/rules?
phoe-krk 14 hours ago [-]
> Please share your reasoning that led you to this conclusion -- that natural language "evolves slowly".
Languages are used to successfully communicate. To achieve this, all parties involved in the communication must know the language well enough to send and receive messages. This obviously includes messages that transmit changes in the language, for instance, if you tried to explain to your parents the meaning of the current short-lived meme and fad nouns/adjectives like "skibidi ohio gyatt rizz".
It takes time for a language feature to become widespread and de-facto standardized among a population. This is because people need to asynchronously learn it, start using it themselves, and gain critical mass so that even people who do not like using that feature need to start respecting its presence. This inertia is the main source of the slowness I mention, and also a requirement for any kind of grammar-checking software. From the point of view of such software, a language feature that (almost) nobody understands is not a language feature, but an error.
> You also seem to be making an assumption that natural languages (English, I'm assuming) can be well defined by a simple set of rigid patterns/rules?
Yes, that set of patterns is called a language grammar. Even dialects and slangs have grammars of their own, even if they're different, less popular, have less formal materials describing them, and/or aren't taught in schools.
qwery 12 hours ago [-]
Fair enough, thanks for replying. I don't see the task of specifying a grammar as straightforward as you do, perhaps. I guess I just didn't understand the chain of comments.
I find that clear-cut, rigid rules tend to be the least helpful ones in writing. Obviously this class of rule is also easy/easier to represent in software, so it also tends to be the source of false positives and frustration that lead me to disable such features altogether.
phoe-krk 12 hours ago [-]
When you do writing as a form of art, rules are meant to be bent or broken; it's useful to have the ability to explicitly write new ones and make new forms of the language legal, rather than wrestle with hallucinating LLMs.
When writing for utility and communication, though, English grammar is simple and standard enough. Browsing Harper sources, https://github.com/Automattic/harper/blob/0c04291bfec25d0e93... seems to have a lot of the basics already nailed down. Natural language grammar can often be represented as "what is allowed to, should, or should not, appear where, when, and in which context" - IIUC, Harper seems to tackle the problem the same way.
bombcar 11 hours ago [-]
Just because the rules aren’t set fully in stone, or can be bent or broken, doesn’t mean they don’t “exist” - perhaps not the way mathematical truths exist, but there’s something there.
Even these few posts follow innumerable “rules” which make it easier to (try to) communicate.
Perhaps what you’re angling against is where rules of language get set in stone and fossilized until the “Official” language is so diverged from the “vulgar tongue” that it’s incomprehensibly different.
Like church/legal Latin compared to Italian, perhaps. (Fun fact - the Vulgate translation of the Bible was INTO the vulgar tongue at the time: Latin).
airstrike 11 hours ago [-]
I don't need grammar to evolve in real time. In fact, having a stabilizing function is probably preferable to the alternative.
eadmund 9 hours ago [-]
If a language changes, there are only three possible options: either it becomes more expressive; or it becomes less expressive; or it remains as expressive as before.
Certainly we would never want our language to be less expressive. There’s no point to that.
And what would be the point of changing for the sake of change? Sure, we blop use the word ‘blop’ instead of the word ‘could’ without losing or gaining anything, but we’d incur the cost of changing books and schooling for … no gain.
Ah, but it’d be great to increase expressiveness, right? The thing is, as far as I am aware all human languages are about equal in terms of expressiveness. Changes don’t really move the needle.
So, what would the point of evolution be? If technology impedes it … fine.
canjobear 6 hours ago [-]
The world that we need to be expressive about is changing.
dragonwriter 9 hours ago [-]
> So, what would the point of evolution be?
Being equally as expressive overall but being more focussed where current needs are.
OTOH, I don't think anything is going to stop language from evolving in that way.
Polarity 14 hours ago [-]
why did you use chatgpt for this text then?
acidburnNSA 14 hours ago [-]
I can write em-dashes on my keyboard in one second using the compose key: right alt + ---
Freak_NL 11 hours ago [-]
Same here — the compose key is so convenient you forget most people never heard of it. This em-dashes mean LLM output thing is getting annoying though.
johnisgood 9 hours ago [-]
> This em-dashes mean LLM output thing is getting annoying though.
Agreed. Same with those non-ASCII single and double quotes.
shortformblog 20 hours ago [-]
LanguageTool (a Grammarly competitor) is also open source and can be managed locally:
https://github.com/languagetool-org/languagetool
I generally run it in a Docker container on my local machine:
https://hub.docker.com/r/erikvl87/languagetool
I haven't messed with Harper closely but I am aware of its existence. It's nice to have options, though.
It would sure be nice if the Harper website made clear that one of the two competitors it compares itself to can also be run locally.
akazantsev 16 hours ago [-]
There are two versions of LanguageTool: open source and cloud-based. The open source one checks individual words against a dictionary, just like the system's spell checker. Maybe there is something more to it, but in my tests, it did not fix even obvious errors. It's not an alternative to Grammarly or this tool.
shortformblog 12 hours ago [-]
There is. It can be heavily customized to your needs and built to leverage a large n-gram data set:
https://dev.languagetool.org/finding-errors-using-n-gram-dat...
I would suggest diving into it more, because it seems like you missed how customizable it is.
IMO not using LLMs is a big plus in my book. Grammarly has been going downhill since they've been larding it with "AI features," it has become remarkably inconsistent. It will tell me to remove a comma one hour, and then tell me to add it back the next.
tiew9Vii 14 hours ago [-]
Being dyslexic, I was an avid Grammarly user. Once it started adding "AI features" the deterioration was noticeable; I cancelled my subscription and stopped using it a year ago.
I also only ever used the web app, copy+pasting, as installing the app is for all intents and purposes installing a keylogger.
Grammar works on rules; I'm not sure why that needs an LLM. Grammarly certainly worked better for me when it was dumber and used rules.
InsideOutSanta 15 hours ago [-]
Grammarly sometimes gets stuck in a loop, where it suggests changing from A to B. It then immediately suggests changing from B to A again, continuing to suggest the opposite change every time I accept the suggestion.
It's not a problem; I make the determination which option I like better, but it is funny.
harvey9 10 hours ago [-]
'imo' and 'in my book' are redundant in the same sentence. Are there rules-based techniques to catch things like that? Btw I loved the use of 'larding' outside the context of food.
boplicity 23 hours ago [-]
General purpose LLMs seem to get very confused about punctuation, in my experience. It's one of their big areas of obvious failing. I'm surprised Grammarly would allow this to happen.
jethro_tell 18 hours ago [-]
The internet, especially post phone keyboards, is extremely inconsistent about punctuation. I’m not sure how anyone could think an LLM wouldn’t be.
Alex-Programs 13 hours ago [-]
DeepL Write was pretty good in the post-LLM, pre-ChatGPT era.
Dr4kn 12 hours ago [-]
DeepL is different in my opinion. They always focused on machine learning for languages.
They must have acquired fantastic data for their models, especially because of the business language and professional translations they focus on.
They keep your intended message intact and just refine it, like post-editing a book. Grammarly and other tools force you to sound the way they think is best.
DeepL shows, in my opinion, how much more useful a model trained for specific uses is.
monkeywork 5 hours ago [-]
Any suggestions for models ppl can run locally that are close to DeepL?
raincole 1 days ago [-]
So is there a similar tool but based on an LLM?
Not that I think LLM is always better, but it would be interesting to compare these two approaches.
Given LISP was supposed to build "The AI" ... pretty sad that a dumb LLM is taking its place now
7thaccount 23 hours ago [-]
Grammarly came out before the LLMs. I'm not sure what approach it took, but they're likely feeling a squeeze as LLMs can tell you how to rewrite a sentence to remove passive voice and all that. I doubt the LLMs are as consistent (some comments below show some big issues), but they're free (for now).
chneu 23 hours ago [-]
Thank you. In general my Grammarly and Gboard predictions have become so, so bad over the last year.
raverbashing 18 hours ago [-]
> It will tell me to remove a comma one hour, and then tell me to add it back the next.
So just like English teachers I see
demarq 1 days ago [-]
"Me and Jennifer went to have seen the ducks cousin."
No errors detected. So this needs a lot of rule contributions to get to Grammarly level.
For reference: https://youtu.be/w-R_Rak8Tys?si=h3zFCq2kyzYNRXBI
alpb 1 days ago [-]
Similarly 0 grammatical errors flagged: "My name John. What your name? What day today?"
Tsarp 20 hours ago [-]
I was initially impressed, but then I tested a bunch and it wasn't catching some really basic things. Mostly hit or miss.
marginalia_nu 11 hours ago [-]
Goes the other way around too. For
> In large, this is _how_ anything crawler-adjacent tends to be
It suggests
> In large, this is how _to_ anything crawler-adjacent tends to be
wellthisisgreat 1 days ago [-]
What the duck is that test
canyp 1 days ago [-]
Nominative vs objective
thfuran 19 hours ago [-]
There's a little more going on than that.
rdlw 5 hours ago [-]
In addition to case, it's testing tense (went to have seen) and plural vs. possessive (ducks cousin)
canyp 8 hours ago [-]
Yeah, I stopped parsing after "Me and Jennifer".
healsdata 23 hours ago [-]
Given this is an Automattic product, I'm hesitant to use it. If it gets remotely successful, Matt will ruin it in the name of profit.
https://automattic.com/2024/11/21/automattic-welcomes-harper...
josephcsible 23 hours ago [-]
It's FOSS, so even if the worst happens, anyone could just fork the last good version and continue development there.
jantissler 22 hours ago [-]
Oh, that’s a big no from me then.
icapybara 1 days ago [-]
Why wouldn't you want an LLM for a language learning tool? Language is one of the things I would trust an LLM completely on. Have you ever seen ChatGPT make an English mistake?
healsdata 23 hours ago [-]
Grammarly is all in on AI and recently started recommending splitting "wasn't" and adding the contraction to the word it modified. Example: "truly wasn't" becomes "was trulyn't"
https://imgur.com/a/RQZ2wXA
Hm ... I wonder, is Grammarly also responsible for the flood of contractions of lexical "have" over the last few years? It's standard in British English, but outside of poetry it is proscribed in almost all other dialects (which only permit contraction of auxiliary "have").
Even in British I'm not sure how widely they actually use it - do they say "I've a car" and "I haven't a car"?
filterfish 21 hours ago [-]
"they" say "I haven't got a car".
Contractions are common in Australian English too, though becoming less so due to the influence of US English.
NoboruWataya 15 hours ago [-]
In my experience "I've a car" is much more common than "I haven't a car" (I've never heard the latter construct used, but regularly hear the former in casual speech). "I haven't got a car" or "I've no car" would be relatively common though.
Destiner 16 hours ago [-]
I don't think an LLM would recommend an edit like that.
Has to be a bug in their rule-based system?
healsdata 8 hours ago [-]
Gemini: "Was trulyn't" is a contraction that follows the rules of forming contractions, but it is not a widely used or accepted form in standard English. It is considered grammatically correct in a technical sense, but it's not common usage and can sound awkward or incorrect to native speakers.
marginalia_nu 11 hours ago [-]
I wonder how much memes like whomst'd might skew the training set.
InsideOutSanta 15 hours ago [-]
Yeah, I agree. An open-source LLM-based grammar checker with a user interface similar to Grammarly is probably what I'm looking for. It doesn't need to be perfect (none of the options are); it just needs to help me become a better writer by pointing out issues in my text. I can ignore the false positives, and as long as it helps improve my text, I don't mind if it doesn't catch every single issue.
Using an LLM would also help make it multilingual. Both Grammarly and Harper only support English and will likely never support more than a few dozen very popular languages. LLMs could help cover a much wider range of languages.
Szpadel 6 hours ago [-]
I tried to use one LLM-based tool to rewrite a sentence in a more official corporate form, and it rewrote something like "we are having issues with xyz" into "please provide more information and I'll do my best to help".
LLMs are trained so hard to be helpful that it's really hard to constrain them to other tasks.
Groxx 24 hours ago [-]
uh. yes? it's far from uncommon, and sometimes it's ludicrously wrong. Grammarly has been getting quite a lot of meme-content lately showing stuff like that.
it is of course mostly very good at it, but it's very far from "trustworthy", and it tends to mirror mistakes you make.
perching_aix 19 hours ago [-]
Do you have any examples? The only time I noticed an LLM make a language mistake was when using a quantized model (gemma) with my native language (so much smaller training data pool).
dartharva 19 hours ago [-]
Because this "language learning tool" will be dominantly used to avoid actually learning the language.
How big is English in "English grammar checker"? Is it plausible to add other languages to it, or is the underlying framework so English-specific that it doesn't make sense to even bother building anything other than an English grammar checker on it?
loughnane 8 hours ago [-]
Surprised coming into this that I don't see anyone mentioning vale[0]. I've been using it for ~4 years now and love it.
I used Grammarly briefly when it came out and liked the idea. Admittedly it has more polish than Vale for people writing in Google Docs, &c. Still, I stick with Vale. Is there any case for moving to Harper?
[0] https://vale.sh/
Vale requires a lot of tweaking, and I’ve never been able to get a rule set with which I’m happy.
It’s missing a default rule set with rules that are generally okay without being too opinionated.
aDyslecticCrow 8 hours ago [-]
Looks interesting for linting and cleaning markdown documentation, but it doesn't seem like a very competent "spellcheck". I'll check it out... but it doesn't actually do the same thing as Grammarly or Harper.
raybb 18 hours ago [-]
Would be nice if they had a website where you could demo/test it before downloading extensions and stuff. Their firefox extension opens to this page https://writewithharper.com/install-browser-extension but when you paste in anything more than a few paragraphs the highlighting is all messed up.
VTimofeenko 1 days ago [-]
Comes with a great LSP server capable of checking grammar in code comments:
https://writewithharper.com/docs/integrations/language-serve...
The Neovim configuration for the LSP looks neat: https://writewithharper.com/docs/integrations/neovim
I honestly don't trust Grammarly ... I mean, it's essentially a keylogger.
I did try it a bit once, and I never seemed to have it work that well for me. But I am multilingual, so maybe that's part of my hurdle.
ibobev 16 hours ago [-]
I'm a long-time Grammarly user. I just tried Harper, and it simply performs very poorly. It is a good initiative, but I don't feel the current state of this software is worthwhile.
novoreorx 9 hours ago [-]
Seeing Harper as an implementation of natural language's LSP brings me great joy, as it proves an idea I've had for a long time—natural language and programming languages are interconnected. Many concepts and techniques from programming languages can also be applied to natural language, making our lives more convenient. The development of LLMs and vibe coding has further blurred the boundary between natural language and programming languages, offering similar insights.
IceWreck 1 days ago [-]
Slightly controversial compared to other comments here, but I haven't used Grammarly at all since LLMs came out. Even a 4B local LLM is good enough to rephrase all forms of text and fix most grammar mistakes.
gglanzani 17 hours ago [-]
I think a lot of value comes by integrating with a language server and/or browser extensions.
Do you have a setup where this is possible or do you copy paste between text fields? (Genuine question. I’d love to use a local LLM integrating with an LSP).
In a world of LLMs, it's great to see classic NLP works like Harper. Both definitely have their own use cases.
msravi 9 hours ago [-]
Looks very good. Was looking to replace ltex (which is really slow), but for some reason the nvim-lspconfig filetype setting for harper doesn't seem to have (la)tex listed as a default, although markdown and typst are listed. Anyone know why?
chilipepperhott 7 hours ago [-]
Harper maintainer here
We've had some contributors have a go at adding LaTeX support in the past, but they've yet to succeed with a truly polished option. The irregularity of LaTeX makes it somewhat difficult to parse.
We accept contributions, if anyone is interested in getting us across the finish line.
behnamoh 24 hours ago [-]
I wish it had keyboard shortcuts. As a Vim user, in Chrome it's tedious to click on every suggestion given by the app. Also, maybe add a "delay" so it doesn't think the currently-being-typed word is a mistake (let me finish typing first!).
Otherwise, it's great work. There should be an option to import/export the correction rules though.
cAtte_ 22 hours ago [-]
this solution is just fundamentally insufficient. in the age of LLMs it's pretty insane to imagine programmers manually hard-coding an arbitrary subset of grammatical corrections (sure: it's faster, it's local first, but it's not enough). on top of that, English (like any other natural language) is such a complicated beast that you will never write a classic deterministic parser that's sophisticated enough to allow you to reliably implement even the most basic of grammatical corrections (check the other comments for examples). it's just not gonna happen.
i guess it's a nice and lightweight enhancement on top of the good old spellchecker, though
yablak 5 hours ago [-]
Any chance to make the obsidian plugin work in mobile/Android?
jimaek 14 hours ago [-]
I don't understand why we even need such services. Why don't the browsers, and maybe even the OS, just improve their included grammar checkers?
The Chrome enhanced grammar checker is still awful after decades.
Maybe the AI hype will finally fix this? I'm still surprised this wasn't the first thing they did.
klabetron 15 hours ago [-]
Odd choice that the example text on the homepage is almost all obvious typos that a standard spell check would pick up.
AbstractH24 12 hours ago [-]
My biggest problem with Grammarly has always been how buggy the product is. From not checking random sites to messing up formatting to not updating text with the selected changes.
If Harper does better at this I’d change in a minute.
b0a04gl 18 hours ago [-]
this is the right direction. rulebased, local, transparent. not perfect yet, but that's not the point. getting something lightweight and tweakable matters more than catching every edge case out of the box. if it misses, you add rules. simple as that. if you expect it to match grammarly day one then maybe we are missing the tradeoff
piperly 13 hours ago [-]
Unfortunately, the last time I tested Harper inside Neovim, it alone used more than 1 GB of RAM for just the LSP! However, the concept is nice, open source, no AI, and easy to integrate.
victorbjorklund 16 hours ago [-]
Very cool. Has anyone integrated this into their own app? How was your experience?
paxys 21 hours ago [-]
Looks cool, but it's weird to constantly make comparisons to Grammarly (in the post title, description section of the site, benchmarks) when this is clearly a rule-based spellcheck and very different from what Grammarly offers.
Instead tell me how it compares to the built-in spellcheck in my browser/IDE/word processor/OS.
JPLeRouzic 1 days ago [-]
It is available in Automattic's GitHub repository:
https://github.com/Automattic/harper
"For most documents, Harper can serve up suggestions in under 10ms." 10l is OK. 10kg as well. Why is 10ms wrong?
mpaepper 12 hours ago [-]
Are languages other than English also supported? Or is this for English only?
ssernikk 11 hours ago [-]
From their FAQ:
> We currently only support English and its dialects British, American, Canadian, and Australian. Other languages are on the horizon, but we want our English support to be truly amazing before we diversify.
lurk2 17 hours ago [-]
Who is the target market for Grammarly? Working professionals who speak English as a second language?
victorbjorklund 16 hours ago [-]
I think it is anyone who wants to make sure they write correctly. I know for example David Sparks (MacSparky https://www.macsparky.com ) uses it (or at least used it). And he was an American lawyer who says writing has been his passion his whole life, so I assume his English is better than the average person's.
InsideOutSanta 15 hours ago [-]
Adam Engst from TidBITS, a person whose job has been writing things for all his life, also uses Grammarly:
https://tidbits.com/2025/01/30/why-grammarly-beats-apples-wr...
“Think of how poorly the average person writes, and realize half of them write worse than that.”
(George Carlin or something, quote's veracity depends on what you mean by “average.”)
I think everybody could benefit from having something like Grammarly on their computer. None of us writes perfectly, and it's always beneficial to strive for improvement.
Veen 15 hours ago [-]
I use it as a proofreader, not to improve my writing. It’s difficult to proofread your own work, and Grammarly is a useful assistant. Plus, I’m British and I often write on behalf of American clients. I’m pretty good at following US English standards because I’ve been doing it for a long time, but the odd Britishism slips through and Grammarly usually catches it (although a standard spell checker would too, I suppose).
m00dy 17 hours ago [-]
People who haven't heard of LLMs
akazantsev 16 hours ago [-]
LLMs are not nice to use for spell checking. I do not want to read a wall of text from an LLM just to find a missed article somewhere, and I want to receive feedback as I type.
Also, once I asked an LLM to check a message. It said everything looked fine and made a copy of the message in its response with one sentence in the middle removed.
SilverSlash 15 hours ago [-]
I haven't used Grammarly but for simple things like spelling mistakes, missed articles, or punctuation, wouldn't even Google Docs be enough?
orliesaurus 1 days ago [-]
Very buggy, but great start!!
I.e. if you write a "MISTAEK" and then scroll, the highlight follows you around the page.
crimputer 1 days ago [-]
Good start. But still has bugs, I guess.
I tried with the following phrase:
"This should can't logic be done me."
No errors.
v5v3 15 hours ago [-]
I used to see ads for Grammarly and wondered if anyone was using it.
Then post COVID, with the increase in screen sharing video calls, I soon realised nearly every non-native English speaker from countries around the world heavily relied on it in their jobs, as I could see it installed when people shared screens.
Huge market, good luck.
Finnucane 12 hours ago [-]
No serial comma? Screw that.
jacooper 1 days ago [-]
I think if you can self-host LanguageTool, it would still be the better option.
harper 1 days ago [-]
nice name!
sharkjacobs 1 days ago [-]
This seems to use a hard-coded list of explicit rules, not an LLM:
https://writewithharper.com/docs/rules
"PointIsMoot" => (
["your point is mute"],
["your point is moot"],
"Did you mean `your point is moot`?",
"Typo: `moot` (meaning debatable) is correct rather than `mute`."
),
a2128 1 days ago [-]
From a quick look, phrase corrections are just one type of rule. There are many other rules; some are dynamic, like when to use "your" vs "you're", Oxford commas, etc.
That it doesn't use LLMs is its advantage: it runs in under 10ms and can be easily embedded in software, and it still provides useful grammar checking even if it's not exhaustive.
I never understood the appeal of grammar tools. If you have reached the minimum professional/academic level needed to be designated to write something, shouldn't you at least be capable of verifying its semantic "correctness" just by reading through it once yourself?
Why would you pass a writing job to someone who isn't 100% fluent in the language and then make up for it by buying complex tools?
facundo_olano 18 hours ago [-]
As a non-native English speaker/writer, there are a bunch of errors I miss no matter how much attention I pay and how much I proofread, and these tools are useful for catching those.
victorbjorklund 15 hours ago [-]
I know for example David Sparks (MacSparky https://www.macsparky.com ) uses it (or at least used it). And he was an American lawyer who says writing has been his passion his whole life, so I assume his English is better than the average person's.
speedgoose 17 hours ago [-]
Have you considered that some people aren’t 100% fluent in English but still competent?
Semaphor 19 hours ago [-]
I use it (well, LanguageTool) in the free version for comments on sites like this. It directly catches mistakes I make that I'd normally only catch on re-reads: from typos, through my brain doing weird stuff, to things I simply didn't (actively) know.
jordanpg 5 hours ago [-]
I'm a lawyer. I write 10s of pages of text every day. "Reading through it once yourself" is obviously an imperfect solution. See, e.g., Poisson statistics. It's also slow and I bill in 6-minute increments. There is significant value in a grammar tool that protects confidentiality and is more effective than my wetware.
Veen 14 hours ago [-]
People are bad at proofreading their own work. Professional writers often use third-party copy editors and proofreaders for that reason.
Finnucane 10 hours ago [-]
I’m a production editor at a uni press, and I can tell you there’s not a strong correlation between professional/academic level and writing well.
boars_tiffs 14 hours ago [-]
vim plug?
chilipepperhott 7 hours ago [-]
There's an LSP. Not sure if that fits your use-case, though.
I'm just a bit skeptical about this quote:
> Harper takes advantage of decades of natural language research to analyze exactly how your words come together.
But it's just a rather small collection of hard-coded rules:
https://docs.rs/harper-core/latest/harper_core/linting/trait...