142

I am trying to search for the substring "abc" in a specific file in Linux/bash.

So I do:

grep '*abc*' myFile

It returns nothing.

But if I do:

grep 'abc' myFile

It returns matches correctly.

Now, this is not a problem for me. But what if I want to grep for a more complex string, say

*abc * def *

How would I accomplish it using grep?

Saobi
  • 5
    grep itself doesn't support wildcards on most platforms. You have to use egrep to use wildcards. Shells have a different syntax. "*" in the shell is . In egrep it's an operator that says "0 to many of the previous entity". In grep, it's just a regular character. – PanCrit Jul 01 '09 at 17:15
  • 2
    @PanCrit: `*` means the same thing in grep and egrep: it's a *quantifier* meaning zero or more of the preceding atom. That's a completely different concept than the *wildcards* used by the shell. – Alan Moore Jul 06 '16 at 04:36
  • @AlanMoore Thanks for the update. I don't know when this changed, but you're correct that the modern `grep` supports most basic regular expressions. It used to be the case that you had to use egrep to get more than flat strings, but I see that grep has evolved. – PanCrit Aug 29 '21 at 17:48

12 Answers

178

For such two-part matches, use .* between the two parts.

For instance:

grep 'abc.*def' myFile

will match a string that contains abc followed by def with something optionally in between.


The asterisk is just a repetition operator, but you need to tell it what to repeat. In grep's basic regular expressions, a leading * has nothing to repeat and is taken as a literal asterisk, so '*abc*' matches a literal *, then ab, then zero or more c's (the second * applies to the c). That is why it finds nothing in a file that contains abc but no asterisk. If you want to match anything, you need to say .* -- the dot means any character (within certain guidelines). If you just want to match abc, you could simply say grep 'abc' myFile.

Update based on a comment:

* in a regular expression is not the same as * in the console. In the console, * is part of a glob construct and acts as a wildcard on its own (for instance, ls *.log will list all files that end in .log). In regular expressions, however, * is a modifier, meaning that it only applies to the character or group preceding it. If you want * in a regular expression to act as a wildcard, you need to use .* as previously mentioned -- the dot matches any single character, and the star, when modifying the dot, means zero or more of them; i.e. find zero or more of any character.
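The difference can be tried out with a small sketch like the one below. It assumes a POSIX-style grep (GNU grep behaves this way); the temporary file and its contents are made up for illustration.

```shell
# Hypothetical sample file, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'xx abc yy def zz' 'abdef' 'nothing here' > "$tmpdir/myFile"

# Glob-style pattern: the leading * is a literal asterisk in a basic regex,
# so no line matches and grep prints nothing.
glob_style=$(grep '*abc*' "$tmpdir/myFile" || true)

# Regex pattern: abc, then any run of characters, then def.
regex_style=$(grep 'abc.*def' "$tmpdir/myFile" || true)

printf 'glob-style matches:  [%s]\n' "$glob_style"
printf 'regex-style matches: [%s]\n' "$regex_style"

rm -r "$tmpdir"
```

Only the first sample line contains abc followed later by def, so only that line should come back from the second command.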

mirekphd
Daniel Vandersluis
  • 3
    I think the questioner is confused about the difference between shell wildcards and regular expressions. I also suspect that the more complicated expression would be: grep 'abc .* def' (at least one space present - possibly two as I wrote). – Jonathan Leffler Jul 01 '09 at 14:18
  • 1
    Actually, the questioner seems not to understand that 'abc' is not the same thing as '^abc$' :-D – Massa Jul 01 '09 at 16:54
  • 1
    Yes, i was confused between glob and full regular expressions. I use the * without a dot to mean matching anything on the shell. – Saobi Jul 01 '09 at 18:56
  • 1
    grep `*` means "0 or more", and grep is greedy by default. Note that in grep **basic** regular expressions the metacharacters `?`, `+` , `{` , `|` , `(` , and `)` lose their special meaning. More info: [grep regexps](http://ss64.com/bash/grep-regex.html) – KrisWebDev Oct 17 '15 at 19:07
  • 1
    Please note that, according to [POSIX](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html), in basic regular expressions (BRE), the asterisk loses its special meaning when used as the first character of the BRE or right after the caret (^). When using extended regular expressions (ERE), this creates undefined behaviour. So `grep '*foo' file` searches for the substring `"*foo"` while `grep -E '*foo' file` is undefined. – kvantour Jul 22 '22 at 11:03
47

The dot character means match any character, so .* means zero or more occurrences of any character. You probably mean to use .* rather than just *.

Benjamin W.
smcameron
14

Use grep -P, which enables support for Perl-style regular expressions. (-P is a GNU grep option; a simple pattern like the one below also works with plain grep.)

grep -P "abc.*def" myfile
Artem Russakovskii
13

The "star sign" is only meaningful if there is something in front of it. If there isn't, the tool (grep in this case) may just treat it as an error. For example:

'*xyz'    is meaningless
'a*xyz'   means zero or more occurrences of 'a' followed by xyz
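What a leading star does depends on the tool. As a rough check, assuming GNU grep (or any grep that follows the POSIX rules for basic regular expressions), the made-up sample lines below show the leading * being taken literally:

```shell
# In a basic regex (plain grep), a leading * is matched as a literal asterisk.
literal_star=$(printf '%s\n' '*xyz' 'xyz' 'aaxyz' | grep '*xyz' || true)

# a*xyz: zero or more 'a' followed by xyz, so every sample line matches.
repeated_a=$(printf '%s\n' '*xyz' 'xyz' 'aaxyz' | grep -c 'a*xyz' || true)

echo "lines matching '*xyz': $literal_star"
echo "number of lines matching 'a*xyz': $repeated_a"
```

Other tools, or grep -E, may handle the leading star differently, so it is safer not to rely on it.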
Jonathan Leffler
  • 8
    The * is not meaningless; it just doesn't have its usual meaning (of repetition) but means "I'm a star". It would match a line containing a star followed by x, y, and z. – Jonathan Leffler Jul 01 '09 at 14:20
  • 2
    @Jonathan It depends on the tool. –  Jul 01 '09 at 14:22
9

This worked for me:

grep ".*${expr}" - with double quotes, and with the star preceded by a dot, where ${expr} is whatever string you are searching for.

So in your case:

grep ".*abc.*" myFile

Standard unix grep.
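A quick sketch of the same idea with a shell variable. The variable name expr and the sample lines are only placeholders. Because grep matches anywhere in a line, the surrounding .* should not change which lines are selected:

```shell
expr='abc'   # hypothetical search string
sample=$(printf '%s\n' 'xxabcxx' 'abc' 'xyz')

# Double quotes let the shell expand ${expr} before grep sees the pattern.
with_dots=$(printf '%s\n' "$sample" | grep ".*${expr}.*" || true)
without_dots=$(printf '%s\n' "$sample" | grep "${expr}" || true)

# Both patterns select the same lines.
if [ "$with_dots" = "$without_dots" ]; then
  echo "same lines selected"
fi
```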

access_granted
6

The expression you tried, like those that work on the shell command line in Linux for instance, is called a "glob". Glob expressions are not full regular expressions, which is what grep uses to specify strings to look for. Here is an (old, small) post about the differences. The glob expressions (as in "ls *") are interpreted by the shell itself.

It's possible to translate from globs to REs, but you typically need to do so in your head.
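As an illustration of that translation, here is a sketch that checks the same made-up line once with a shell glob (via case) and once with the corresponding regex (via grep). The rule of thumb assumed here is that a glob * becomes .* and that the leading and trailing * can be dropped, since grep matches anywhere in the line.

```shell
line='xx abc yy def zz'   # hypothetical input line

# Shell glob: * stands for any string on its own.
case "$line" in
  *abc*def*) glob_result='match' ;;
  *)         glob_result='no match' ;;
esac

# Regex equivalent: the outer * are dropped and the inner * becomes .*
if printf '%s\n' "$line" | grep -q 'abc.*def'; then
  regex_result='match'
else
  regex_result='no match'
fi

echo "glob: $glob_result, regex: $regex_result"
```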

unwind
  • 1
    It's only a glob if it's parsed by the shell. Since he is preserving the search string inside of single quotes, the shell leaves the string alone, and passes it intact in argv to grep. – Conspicuous Compiler Jul 01 '09 at 14:26
4

You're not using regular expressions, so your grep variant of choice should be fgrep, which will behave as you expect it to.
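A small sketch of fixed-string matching, written with grep -F (the POSIX spelling of fgrep); the sample lines are made up. Note that in this mode * is literal as well, so to find the substring abc the pattern would be plain abc rather than *abc*.

```shell
sample=$(printf '%s\n' 'abc' 'a.c' '*abc*')

# -F: every character in the pattern is taken literally.
fixed_dot=$(printf '%s\n' "$sample" | grep -F 'a.c' || true)
fixed_star=$(printf '%s\n' "$sample" | grep -F '*abc*' || true)
fixed_plain=$(printf '%s\n' "$sample" | grep -F -c 'abc' || true)

echo "lines containing the literal text a.c: $fixed_dot"
echo "lines containing the literal text *abc*: $fixed_star"
echo "number of lines containing abc: $fixed_plain"
```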

Andrew Beals
3

Try grep -E for extended regular expression support

Also take a look at:

The grep man page
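For the pattern in the question, . and * behave the same with or without -E. Where -E does make a difference is with operators such as +, ?, | and parentheses. A minimal sketch with made-up sample lines, assuming a grep that follows the POSIX basic/extended split:

```shell
sample=$(printf '%s\n' 'abbc' 'ab+c' 'ac')

# Basic regex (the default): + is an ordinary character.
basic=$(printf '%s\n' "$sample" | grep 'ab+c' || true)

# Extended regex (-E): + means "one or more of the preceding item".
extended=$(printf '%s\n' "$sample" | grep -E 'ab+c' || true)

echo "basic: $basic"
echo "extended: $extended"
```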

Brian
2

'*' works as a modifier for the previous item. So 'abc*def' searches for 'ab' followed by 0 or more 'c's followed by 'def'.

What you probably want is 'abc.*def', which searches for 'abc' followed by any number of characters, followed by 'def'.

Conspicuous Compiler
1

This may be the answer you're looking for:

grep abc MyFile | grep def

Only thing is... it will output lines where "def" is before OR after "abc".
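To make the ordering difference concrete, here is a sketch on three made-up lines that compares the pipeline with a single pattern using .*:

```shell
sample=$(printf '%s\n' 'abc then def' 'def then abc' 'abc only')

# Two greps in a pipeline: both words must appear, in either order.
either_order=$(printf '%s\n' "$sample" | grep 'abc' | grep -c 'def' || true)

# One pattern with .*: abc must come before def on the line.
abc_first=$(printf '%s\n' "$sample" | grep -c 'abc.*def' || true)

echo "either order: $either_order, abc first: $abc_first"
```

The pipeline counts both of the first two lines, while the single pattern counts only the line where abc comes first.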

Matt
1

To summarize the other answers, here are some examples that show how the regex and the glob behave.

Create three files:

echo 'abc' > file1
echo '*abc' > file2
echo '*abcc' > file3

Now I run the same command on each of these three files and see what happens.

(1)

grep '*abc*' file1 

As you said, this one returns nothing. A * repeats whatever is in front of it. For the first *, there is nothing in front of it to repeat, so grep treats it as a literal * character. The string in the file is abc, which contains no *, so there is no match. The second *, after the c, means the c is repeated 0 or more times.

(2)

grep '*abc*' file2

This one returns *abc, because there is a * at the front, so it matches the pattern *abc*.

(3)

grep '*abc*' file3

This one returns *abcc, because there is a * at the front and two c's at the end, so it matches the pattern *abc*.

(4)

grep '.*abc.*' file1

This one returns abc, because .* indicates 0 or more repetitions of any character.

music_piano
0
$ cat a.txt
123abcd456def798
123456def789
Abc456def798
123aaABc456DEF

* matches the preceding character zero or more times.

$ grep -i "abc*def" a.txt


$

It would match, for instance "abdef" or "abcdef" or "abcccccccccdef". But none of these are in the file, so no match.

. means "match any character". Together with *, .* means match any character any number of times.

$ grep -i "abc.*def" a.txt
123abcd456def798
Abc456def798
123aaABc456DEF

So we get matches. There are a lot of online references about regular expressions, which is what is being used here.

helvete