How to Use Awk for Text Filtering with Pattern-Specific Actions – Part 3

In this third part of the Awk command series, we look at filtering text or strings based on specific patterns that a user can define.

Sometimes, when filtering text, you want to flag certain lines of an input file, or of a set of strings, based on a given condition or on a specific pattern that can be matched. This is easy to do with Awk, and it is one of the features you will find most helpful.

Let us look at an example. Say you have a shopping list of food items that you want to buy, in a file called food_prices.list.

It has the following list of food items and their prices.

$ cat food_prices.list
No	Item_Name		Quantity	Price
1	Mangoes			   10		$2.45
2	Apples			   20		$1.50
3	Bananas			   5		$0.90
4	Pineapples		   10		$3.46
5	Oranges			   10		$0.78
6	Tomatoes		   5		$0.55
7	Onions			   5            $0.45
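If you want to follow along, one way to recreate the file is sketched below. It assumes a single tab between columns, which is simpler than the spacing shown above but still splits into the same four fields with Awk's default field separator.

```shell
# Recreate food_prices.list; printf reuses the format for every four arguments.
# Assumption: one tab between columns (the original spacing is not preserved).
printf '%s\t%s\t%s\t%s\n' \
    No Item_Name Quantity Price \
    1 Mangoes 10 '$2.45' \
    2 Apples 20 '$1.50' \
    3 Bananas 5 '$0.90' \
    4 Pineapples 10 '$3.46' \
    5 Oranges 10 '$0.78' \
    6 Tomatoes 5 '$0.55' \
    7 Onions 5 '$0.45' > food_prices.list

# Show the result.
cat food_prices.list
```

The prices are single-quoted so that the shell does not try to expand $2, $1 and so on as variables.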

Now suppose you want to add a (*) sign to the food items whose price is greater than $2. This can be done by running the following command:

$ awk '/ *\$[2-9]\.[0-9][0-9] */ { print $1, $2, $3, $4, "*" ; } / *\$[0-1]\.[0-9][0-9] */ { print ; }' food_prices.list
Print Items Whose Price is Greater Than $2

From the output above, you can see a (*) sign at the end of the lines for mangoes and pineapples. If you check their prices, they are above $2.

In this example, we have used two patterns:

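  • the first, / *\$[2-9]\.[0-9][0-9] */, matches lines containing a dollar sign, a digit from 2 to 9, a dot and two more digits, that is, prices from $2.00 to $9.99, and
  • the second, / *\$[0-1]\.[0-9][0-9] */, matches lines with prices from $0.00 to $1.99, that is, less than $2.

In both patterns, \$ is a literal dollar sign (escaped because $ normally means end of line in a regular expression), and each " *" (a space followed by *) matches zero or more spaces.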
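To see what the patterns select, it can help to feed Awk a few hand-made lines. The sketch below uses made-up sample lines (sample.txt is a hypothetical file, not food_prices.list) and compares the first pattern with and without the " *" parts. Since " *" can match zero spaces, it should not change which lines are selected.

```shell
# Hypothetical sample lines, used only to probe the patterns.
printf '%s\n' 'a $2.00' 'b $1.99' 'c $9.99' 'd $12.50' 'e no price' > sample.txt

# First pattern with the optional spaces, as used in the article.
awk '/ *\$[2-9]\.[0-9][0-9] */ { print $1 }' sample.txt > with_spaces.txt

# Same pattern without them: the same lines are selected.
awk '/\$[2-9]\.[0-9][0-9]/ { print $1 }' sample.txt > without_spaces.txt

# Lines that match neither pattern get no action and are silently dropped.
awk '!/\$[2-9]\.[0-9][0-9]/ && !/\$[0-1]\.[0-9][0-9]/ { print $1 }' sample.txt > unmatched.txt

cat with_spaces.txt
cat unmatched.txt
```

Here with_spaces.txt and without_spaces.txt both contain a and c, while unmatched.txt contains d and e. A line with no price, such as the header of food_prices.list, or a price of $10 or more is matched by neither pattern, so it does not appear in the output at all.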

Here is what happens. There are four fields in the file. When a line matches the first pattern, that is, its price is greater than $2, Awk prints all four fields followed by a (*) sign at the end of the line as a flag.

The second pattern simply prints the other lines with food prices less than $2 as they appear in the input file, food_prices.list.

This way you can use pattern-specific actions to flag food items that are priced above $2. There is a problem with the output, though: the lines that have the (*) sign are not formatted like the rest of the lines, which makes the output harder to read.
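The formatting difference comes from how print builds its output. When fields are listed with commas, Awk joins them with the output field separator, which is a single space by default, so the original tabs are lost. A bare print writes the line exactly as it was read. A minimal sketch, using one tab-separated line in the style of food_prices.list (one_line.txt is a hypothetical file name):

```shell
# One tab-separated line in the style of food_prices.list.
printf '%s\t%s\t%s\t%s\n' 1 Mangoes 10 '$2.45' > one_line.txt

# Listing fields with commas rebuilds the line with single spaces (the default OFS).
awk '{ print $1, $2, $3, $4, "*" }' one_line.txt > rebuilt.txt

# A bare print emits the line exactly as read, tabs included.
awk '{ print }' one_line.txt > unchanged.txt

cat rebuilt.txt
cat unchanged.txt
```

The first output line is "1 Mangoes 10 $2.45 *" with single spaces, while the second keeps the tabs of the input.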

We saw the same problem in Part 2 of the awk series, but we can solve it in two ways:

1. Using the printf command, which is the longer and more tedious way:

$ awk '/ *\$[2-9]\.[0-9][0-9] */ { printf "%-10s %-10s %-10s %-10s\n", $1, $2, $3, $4 "*" ; } / *\$[0-1]\.[0-9][0-9] */ { printf "%-10s %-10s %-10s %-10s\n", $1, $2, $3, $4; }' food_prices.list 
Filter and Print Items Using Awk and Printf

2. Using the $0 field. Awk uses the variable $0 to store the whole input line. This solves the problem above simply and quickly:

$ awk '/ *\$[2-9]\.[0-9][0-9] */ { print $0 "*" ; } / *\$[0-1]\.[0-9][0-9] */ { print ; }' food_prices.list 
Filter and Print Items Using Awk and Variable
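The two approaches can be compared side by side on a small sample. In the printf format, %-10s prints a string left-aligned in a field at least 10 characters wide, so every column starts at the same position. With $0, the original line is reused unchanged and the flag is simply appended. The sketch below assumes a two-line, tab-separated sample (two_lines.txt is a hypothetical file name) and drops the optional " *" parts of the patterns for brevity.

```shell
# Two tab-separated sample lines: one priced above $2, one below.
printf '%s\t%s\t%s\t%s\n' 1 Mangoes 10 '$2.45' 2 Apples 20 '$1.50' > two_lines.txt

# Approach 1: printf pads each field to 10 characters, left-aligned.
awk '/\$[2-9]\.[0-9][0-9]/ { printf "%-10s %-10s %-10s %-10s\n", $1, $2, $3, $4 "*" }
     /\$[0-1]\.[0-9][0-9]/ { printf "%-10s %-10s %-10s %-10s\n", $1, $2, $3, $4 }' two_lines.txt > padded.txt

# Approach 2: $0 keeps the input line as it is and appends the flag.
awk '/\$[2-9]\.[0-9][0-9]/ { print $0 "*" }
     /\$[0-1]\.[0-9][0-9]/ { print }' two_lines.txt > flagged.txt

cat padded.txt
cat flagged.txt
```

In padded.txt every line has the same length, because each of the four fields is padded to 10 characters. In flagged.txt only the Mangoes line ends with a (*) sign and the Apples line is identical to the input.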
Conclusion

That’s it for now. These are simple ways of using pattern-specific actions to flag lines of text or strings in a file with the Awk command.


We hope you find this article helpful. Remember to read the next part of the series, which will focus on using comparison operators in Awk.


Aaron Kili
Aaron Kili is a Linux and F.O.S.S enthusiast, an upcoming Linux SysAdmin, web developer, and currently a content creator for TecMint who loves working with computers and strongly believes in sharing knowledge.


9 Comments

  1. I found that it should use .* pattern to match all the characters before the $ character:

    [root@telecom exer]# awk '/.*\$[2-9]\.[0-9][0-9]/ { print $1, $2, $3, $4, "*" ;} /.*\$[0-1]\.[0-9][0-9]/ {print ;}' food_prices.txt

    1 Mangoes 10 $2.45 *
    2       Apples                     20           $1.50
    3       Bananas                    5            $0.90
    4 Pineapples 10 $3.46 *
    5       Oranges                    10           $0.78
    6       Tomatoes                   5            $0.55
    7       Onions                     5            $0.45
    
  2. What are the *'s for in the expressions? For example in:

    $ awk '/ *$[2-9]\.[0-9][0-9] */ { print $1, $2, $3, $4, "*" ; } / *$[0-1]\.[0-9][0-9] */ { print ; }' food_prices.list

    What is the * before the $ and the * before the / for?

  3. Good suggestion, we shall look more into conditional statements in AWK in one of the next parts of the series. Thanks for reading.

  4. Much simpler: awk '{w=$4;gsub(/\$/, "", w);if(w+0>2){print $0, "*"}else{print $0}}' food_prices.list

    • That is a great suggestion, but it only works for experienced users. In one of the upcoming parts of the Awk series, we shall look at how to use the control statements in Awk in detail.

      • I’m quite inexperienced, and your solution is really difficult to follow, because you give so few details. For instance, you don’t explain how this ('/ *\$[2-9]\.[0-9][0-9] */ { print $1, $2, $3, $4, "*" ; } / *\$[0-1]\.[0-9][0-9] */ { print ; }') actually works.

        You don’t say why there’s a space and then a *, given that in a previous post you said that . means any character and * should mean 0 or however many of the preceding character.

        Then there’s a ; after print, which again you don’t explain – might be meaningless after all, but when you explain to inexperienced users, you shouldn’t leave out so many things. Normally the ; is not necessary, but I suppose you’re writing it for consistency. You don’t explain what %-10s is and so on, and so forth.

        I’ve been following tecmint for quite a long time and I like it, but these types of posts seem to work only as solutions to problems users had thought of beforehand. They’re not really tutorials.

        In other contexts being so pragmatic should work (such as setting up a web server or a mail server, where you simply want it to work), but here people who want to learn need much more detail. In my opinion, the article should have been double in size.

        Moreover, the gif image is really hard to follow. When you try to concentrate on how awk filters the text, you need to see the output permanently, so as to compare it to the original and understand how awk syntax works. It’s quite frustrating, to be honest.

        At first glance, Gurpreet Singh’s actually seems simpler, as his syntax is more self-explanatory in a way than yours.

        • I know this is really late but I agree with you, I can’t find a good tutorial site that doesn’t just throw a bunch of code in your face and expect you to understand it.

          • You should start from Part 1, which explains all the symbols and concepts. Also, experiment with the code yourself.
