[FM Discuss] command line manual - please have a squiz
Andy Oram
andyo at oreilly.com
Thu Apr 9 17:51:05 PDT 2009
Well, we should get rid of "fuzzy," even though I think it gives the chapter a warm fuzzy feel. I'm always balancing casualness against precision, and I don't want to press for something that people with a strong background in the field will find undisciplined.
But introducing the concepts of regular expressions to new users is a delicate matter that can't be pushed too fast. I'm glad you provided examples, but I think we still need to slow things down.
Andy
----- Original Message -----
From: "Edward Cherlin" <echerlin at gmail.com>
To: discuss at lists.flossmanuals.net
Sent: Thursday, April 9, 2009 6:01:48 PM GMT -05:00 US/Canada Eastern
Subject: Re: [FM Discuss] command line manual - please have a squiz
On Thu, Apr 9, 2009 at 8:20 AM, Andy Oram <andyo at oreilly.com> wrote:
> I look forward to the improvements listed below. But don't throw out the word "fuzzy" entirely. I want terms that mean something to average readers. They won't immediately understand what you mean by "pattern" (it has innumerable meanings in everyday life and computer science). The reader for this book, if I have the right image of the reader, won't connect "fuzzy" with the academic use by search technologists. But its use in the chapter could be reduced.
I always try to avoid using ambiguous words and phrases, or else to
give a reasonably precise statement of what *I* intend the word to
mean in the context given. What did *you* intend it to mean in grep?
You begin
Regular Expressions
Text processing often involves fuzzy matching.
You go on to discuss
o fuzzy quantities. "Any number" or "3 to 5"
o Fuzzy Matches, Classes, and Ranges "The dot is commonly combined
with one of the fuzzy quantifiers in the previous section."
This use of fuzzy matching is correct, but does not capture the whole
idea of regexes and pattern matching. What we are trying to get at is
the precise definition of sets of alternatives, including ranges. How
about something like this?
=====
Text processing often involves searches for any of a range of
alternatives, or combining searches for parts of a word or phrase. You
don't want to search for "help", and then "helper", and then "helping"
and so on as separate commands. Using '.', which matches any
character, and '*', which matches zero or more of the previous item,
the pattern
help.*
does this in one search, also finding "helps", "helped", and so on.
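As a rough illustration of that one-pattern search, here is a minimal
sketch in Python, whose re module I am assuming as a stand-in for grep.
The sample line and variable names are made up for the example. It
also shows a variant with [a-z]* for the tail, since '.*' accepts
spaces too and so runs on to the end of the line.

```python
import re

# Hypothetical sample line, made up for this illustration.
line = "help helper helping helps helped unhelpful"

# 'help.*' matches from the first "help" to the end of the line,
# because '.*' accepts any run of characters, spaces included.
whole = re.search(r"help.*", line).group(0)

# Restricting the tail to letters, and requiring a word boundary (\b)
# in front, picks out the individual words that start with "help".
words = re.findall(r"\bhelp[a-z]*", line)

print(whole)
print(words)
```

grep prints the whole matching line either way, so the difference
matters mostly when you extract or replace the matched text.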
You do want to be able to find names of very general types. Suppose
you want to find names consisting of a letter followed by any number
of letters and numbers. That would be
[a-zA-Z][a-zA-Z0-9]*
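A minimal sketch of that pattern at work, again assuming Python's re
module in place of grep. The candidate names are hypothetical, and
fullmatch is my choice so that the whole string has to fit the pattern.

```python
import re

# The pattern from the text: one letter, then any number of
# letters or digits.
name = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")

# Hypothetical candidate names, made up for this illustration.
candidates = ["x", "total2", "HTTP200", "2fast", "a_b", ""]

# fullmatch requires the entire string to fit the pattern, so a
# leading digit, an underscore, or an empty string is rejected.
valid = [c for c in candidates if name.fullmatch(c)]

print(valid)
```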
Maybe you need to find all text files in a directory, regardless of
their names. That is
.*\.txt
where the backslash makes the second '.' match a literal dot rather
than any character.
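Since '.' on its own matches any character, a literal dot has to be
written as '\.'. Here is a minimal sketch comparing the two forms,
assuming Python's re module in place of grep. The file names are
hypothetical.

```python
import re

# Hypothetical file names, made up for this illustration.
names = ["notes.txt", "a.txt", "mytxt", "report.txt.bak"]

# Unescaped: the '.' before "txt" matches any character at all.
loose = [n for n in names if re.fullmatch(r".*.txt", n)]

# Escaped: '\.' matches only a literal dot.
strict = [n for n in names if re.fullmatch(r".*\.txt", n)]

print(loose)
print(strict)
```

fullmatch anchors the pattern at both ends of the name. grep matches
anywhere in a line, so there you would add '$' to tie the pattern to
the end of the name.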
Maybe you want to find words of three to five letters, or other such
fuzzy criteria. You can use curly brackets '{}' to specify a numeric
range.
.{3,5}
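To see what the curly-bracket range does, here is a minimal sketch,
assuming Python's re module in place of grep. It uses [a-zA-Z] in
place of '.' and adds \b word boundaries (both my assumptions, not
part of the pattern above) so that only whole words made of letters
are counted. The sample text is made up.

```python
import re

# Hypothetical sample text, made up for this illustration.
text = "a cat sat quietly on the table today"

# \b marks a word boundary; {3,5} allows three to five repetitions
# of the preceding item, here a single letter.
short_words = re.findall(r"\b[a-zA-Z]{3,5}\b", text)

print(short_words)
```

Without the boundaries, the range would also match three to five
characters in the middle of a longer word.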
Sometimes you might want to match just a few specific words. You can
use '|' for this.
apple|pear|cherry
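A minimal sketch of the alternation, assuming Python's re module in
place of grep. The sample lines are hypothetical. search behaves like
grep here in that it reports a line if the pattern occurs anywhere in
it.

```python
import re

# Hypothetical lines, made up for this illustration.
lines = ["apple pie", "pear tart", "plum jam", "cherry cake", "pineapple"]

# '|' separates the alternatives; any one of them is enough.
fruit = re.compile(r"apple|pear|cherry")

# Keep every line in which at least one alternative occurs.
hits = [entry for entry in lines if fruit.search(entry)]

print(hits)
```

Note that "pineapple" is reported too, because "apple" occurs inside
it.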
Regular expressions (regexes) give you a precise language for
specifying exactly how much fuzz to apply to any part of a search.
This chapter explains the use of regexes in grep and sed. Use in other
programs, including file globbing in a shell, is similar, but differs
significantly in details.
=====
(Fuzz is a term of art in numerical analysis, with a meaning
consistent with this.)
> Andy
>
> ----- Original Message -----
> From: "adam hyde" <adam at flossmanuals.net>
> To: discuss at lists.flossmanuals.net
> Sent: Thursday, April 9, 2009 10:59:30 AM GMT -05:00 US/Canada Eastern
> Subject: Re: [FM Discuss] command line manual - please have a squiz
>
> On Thu, 2009-04-09 at 16:48 +0200, Luka Frelih wrote:
>> heya!
>>
>> great book, nice catch fixing the pre bold font.
>>
>> on page 77 (advanced part toc) gnu screen lost the space between words
>>
>> in regexp chapter, "fuzzy matching" is used where i guess "pattern
>> matching" is more appropriate.
>> i understand fuzzy to mean approximate matching, using soundex or
>> levenshtein distances to find similar strings, like in typo correction
>> or spellchecking.
>>
>> perhaps it would be nice to mention the rpl command in the sed chapter?
>> i know it's not installed by default but it's a more user friendly way
>> to perform basic search and replace, which is usually the first reason
>> to use sed in a command line
>>
> _______________________________________________
> Discuss mailing list
> Discuss at lists.flossmanuals.net
> http://lists.flossmanuals.net/listinfo.cgi/discuss-flossmanuals.net
>
--
Silent Thunder (默雷/धर्ममेघशब्दगर्ज/دھرممیگھشبدگر ج) is my name
And Children are my nation.
The Cosmos is my dwelling place, The Truth my destination.
http://earthtreasury.org/worknet (Edward Mokurai Cherlin)
--
----------------------------------------------------------------------
Andy Oram O'Reilly Media email: andyo at oreilly.com
Editor 10 Fawcett Street, Fourth Floor voice: 617-499-7479
Cambridge, MA 02138-1175, USA fax: 617-661-1116
identi.ca/Twitter:praxagora http://www.praxagora.com/andyo/
----------------------------------------------------------------------