Linux awk command with examples

The awk command in Linux is used for text processing and manipulation. Its basic syntax and most commonly used options are shown below.

Basic Syntax:

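awk 'pattern { action }' file

awk runs the action on every line that matches the pattern. If the pattern is omitted, the action runs on every line; if the action is omitted, matching lines are printed.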

Options:

-F - Use a specified field separator (default is whitespace).
-v - Define a variable for use within the awk program.
-f - Read the awk program from a file instead of specifying it on the command line.
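-i - In GNU awk (gawk), load an awk source library. gawk 4.1 and later use this for in-place editing with -i inplace; other awk implementations do not have this option.
-W - Reserved for implementation-specific options (for example, gawk accepts -W version). It does not assign variables; environment variables are read through the built-in ENVIRON array.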
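
As a small sketch of how -F, -v, and -f work together: the sample data, the temporary files, and the variable name min below are made up for illustration.

```shell
# Hypothetical sample data: name:score pairs, one per line.
tmp=$(mktemp)
printf 'alice:82\nbob:47\ncarol:91\n' > "$tmp"

# -F: sets the field separator to a colon.
# -v min=50 defines the awk variable "min" before the program starts.
# The pattern $2 >= min selects the lines whose second field is at least 50.
passed=$(awk -F: -v min=50 '$2 >= min {print $1}' "$tmp")
echo "$passed"

# The same program stored in a file and loaded with -f.
prog=$(mktemp)
printf '%s\n' '$2 >= min {print $1}' > "$prog"
same=$(awk -F: -v min=50 -f "$prog" "$tmp")
echo "$same"

rm -f "$tmp" "$prog"
```

Both commands print alice and carol, since only those two scores reach the threshold.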

Here are a few simple examples of using awk:

Print every line of a file

Suppose we have a file named "data.txt" that contains some text. We want to use awk to print out every line of this file. The awk command to accomplish this would be:

awk '{print}' data.txt

Here, we are using the print statement with no arguments, which tells awk to print the entire line. The output is the same as running cat data.txt, but this form is a useful starting point for adding patterns or more complex actions.

Print the number of lines in a file

Suppose we have a file named "data.txt" that contains some text. We want to use awk to print out the number of lines in this file. The awk command to accomplish this would be:

awk 'END {print NR}' data.txt

Here, we are using the NR variable, which holds the number of records (lines) read so far. The END keyword tells awk to perform this action after it has processed all of the lines in the file, at which point NR equals the total line count.

Print the sum of a column of numbers

Suppose we have a file named "data.txt" that contains a column of numbers. We want to use awk to print out the sum of these numbers. The awk command to accomplish this would be:

awk '{sum += $1} END {print sum}' data.txt

Here, we are using a variable named sum to accumulate the values of the first field on each line. The END keyword tells awk to print the final value of sum after it has processed all of the lines in the file.

Print lines that match a pattern

Suppose we have a file named "data.txt" that contains some text. We want to use awk to print out only the lines that contain the word "hello". The awk command to accomplish this would be:

awk '/hello/ {print}' data.txt

Here, we are using the /hello/ pattern to match lines that contain the word "hello". The {print} action tells awk to print these matching lines.
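
Pattern matching is case-sensitive by default. The sketch below, using made-up sample lines, compares the plain match with a case-insensitive match (via tolower) and an inverted match (via !):

```shell
# Hypothetical sample data.
tmp=$(mktemp)
printf 'Hello world\nsay hello\ngoodbye\n' > "$tmp"

# Case-sensitive match: only the lowercase "hello" line is printed.
exact=$(awk '/hello/ {print}' "$tmp")

# Lowercase the line before matching to ignore case.
anycase=$(awk 'tolower($0) ~ /hello/ {print}' "$tmp")

# Negate the pattern to print the lines that do NOT contain "hello".
inverse=$(awk '!/hello/ {print}' "$tmp")

printf '%s\n---\n%s\n---\n%s\n' "$exact" "$anycase" "$inverse"
rm -f "$tmp"
```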

Print fields that match a pattern

Suppose we have a file named "data.txt" that contains some data. We want to use awk to print out only the fields that contain the word "hello". The awk command to accomplish this would be:

awk '{for(i=1; i<=NF; i++) if($i ~ /hello/) print $i}' data.txt

Here, we are using a for loop to iterate over each field in the line (NF is a built-in variable that represents the number of fields on the line). The if statement checks if the current field matches the /hello/ pattern, and if it does, we use print to output the field.

Calculate the average of a column of numbers

Suppose we have a file named "data.txt" that contains two columns of numbers. We want to use awk to print out the average of the second column. The awk command to accomplish this would be:

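awk '{sum += $2} END {print sum/NR}' data.txt

Here, we accumulate the second field of each line in sum, and in the END block we divide it by NR, the total number of lines. Note that on an empty file NR is 0, so the division is invalid; some implementations, such as gawk, stop with a division-by-zero error.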
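
One possible way to make the average safe for empty input is to check NR before dividing. This is a sketch with made-up sample data; the "no data" message is an arbitrary choice:

```shell
# Hypothetical sample data: two columns of numbers.
tmp=$(mktemp)
printf '1 10\n2 20\n3 60\n' > "$tmp"

# Divide only when at least one line was read.
avg=$(awk '{sum += $2} END {if (NR > 0) print sum/NR; else print "no data"}' "$tmp")
echo "$avg"

# With empty input (/dev/null), the guard prints the fallback message instead.
empty=$(awk '{sum += $2} END {if (NR > 0) print sum/NR; else print "no data"}' /dev/null)
echo "$empty"

rm -f "$tmp"
```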

Extract a specific column

Suppose we have a file named "data.txt" that contains several columns of data, separated by whitespace. We want to use awk to extract only the second column of data. The awk command to accomplish this would be:

awk '{print $2}' data.txt

Here, we are using the print function to output the second field of each line ($2). This is a common operation in text processing, and awk makes it easy to extract specific columns of data.

Use a different field separator

Suppose we have a file named "data.txt" that contains several columns of data, separated by commas instead of whitespace. We want to use awk to extract only the third column of data. The awk command to accomplish this would be:

awk -F, '{print $3}' data.txt

Here, we are using the -F option to specify that the field separator is a comma instead of whitespace. This allows awk to correctly identify and extract the fields in the input file.
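
The -F option only affects how input is split. When print is given several comma-separated values, they are joined with the output field separator OFS, which defaults to a single space. A short sketch with made-up data, setting OFS to a semicolon:

```shell
# Hypothetical comma-separated sample data.
tmp=$(mktemp)
printf 'id,name,city\n1,Ana,Lima\n2,Raj,Pune\n' > "$tmp"

# Split input on commas, print the third and first columns joined by ";".
cols=$(awk -F, -v OFS=';' '{print $3, $1}' "$tmp")
echo "$cols"

rm -f "$tmp"
```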

Replace text in a file

Suppose we have a file named "data.txt" that contains some text, and we want to use awk to replace all occurrences of the word "apple" with the word "pear".

The awk command to accomplish this would be:

awk '{gsub(/apple/, "pear"); print}' data.txt

Here, we are using the gsub() function to perform a global search and replace operation on the input file. The first argument specifies the pattern to search for (/apple/), and the second argument specifies the replacement text ("pear"). The print function is used to output the modified lines.
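
Note that this command writes the modified text to standard output and leaves data.txt unchanged. A portable way to update the file is to write to a temporary file and then move it over the original. The sketch below uses made-up sample data and also shows that gsub() returns the number of replacements it made:

```shell
# Hypothetical sample data.
tmp=$(mktemp)
printf 'apple pie\nan apple and an apple\nplum\n' > "$tmp"

# gsub() returns how many replacements were made on each line; add them up.
count=$(awk '{n += gsub(/apple/, "pear")} END {print n}' "$tmp")
echo "$count"

# Write the result to a second temporary file, then replace the original
# only if awk succeeded.
out=$(mktemp)
awk '{gsub(/apple/, "pear"); print}' "$tmp" > "$out" && mv "$out" "$tmp"
result=$(cat "$tmp")
echo "$result"

rm -f "$tmp"
```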

These are just a few simple examples. Below are three longer worked examples with sample input and output, along with a few more one-liners.

Suppose we have a file named "data.txt" that contains the following data:

apple orange banana
carrot tomato broccoli

We want to use awk to print out only the first column of this file. The awk command to accomplish this would be:

awk '{print $1}' data.txt

Here's a breakdown of how this command works:

'{print $1}': Specifies the action to perform on each line of the input file. In this case, we use the print function to output the first field of each line (which is separated by whitespace). $1 represents the first field.
data.txt: Specifies the input file to process.

When we run this command, we get the following output:

apple
carrot

Suppose we have a CSV file named "data.csv" that contains the following data:

Name, Age, Gender
John, 32, M
Jane, 25, F
Bob, 47, M
Alice, 19, F

We want to use awk to extract the names and ages of all the people in the file, and print them out in the format "Name (Age)".

The awk command to accomplish this would be:

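awk -F', *' 'NR > 1 {printf "%s (%s)\n", $1, $2}' data.csv

When we run this command, we get the following output:

John (32)
Jane (25)
Bob (47)
Alice (19)

Here's a breakdown of how this command works:

-F', *': Specifies that the field separator is a comma followed by any number of spaces, so the space after each comma in the input does not end up in the fields.
NR > 1: Skips the first line, which is the header (Name, Age, Gender).
'{printf "%s (%s)\n", $1, $2}': Here we use the printf function to format the output string as "Name (Age)", where $1 represents the first field (Name) and $2 represents the second field (Age). The \n at the end of the string adds a newline character to the output.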
data.csv: Specifies the input file to process.

Count the number of lines in a file

Suppose we have a file named "data.txt" that contains some text. We want to use awk to count the number of lines in the file. The awk command to accomplish this would be:

awk 'END {print NR}' data.txt

Here, we are using the NR built-in variable to keep track of the number of lines processed by awk. The END pattern matches after all lines have been processed, and the print function is used to output the final count.

Convert all text to uppercase

Suppose we have a file named "data.txt" that contains some text, and we want to use awk to convert all the text to uppercase. The awk command to accomplish this would be:

awk '{print toupper($0)}' data.txt

Here, we are using the toupper() function to convert the entire input line ($0) to uppercase. The print function is used to output the modified lines.
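
toupper() can also be applied to a single field. In this sketch (sample data made up), only the first field is converted. Assigning to a field makes awk rebuild the line with single spaces between fields, so the original spacing is not preserved:

```shell
# Hypothetical sample data; the second line has two spaces between fields.
tmp=$(mktemp)
printf 'john smith\njane  doe\n' > "$tmp"

# Uppercase only the first field, then print the rebuilt line.
upper_first=$(awk '{$1 = toupper($1); print}' "$tmp")
echo "$upper_first"

rm -f "$tmp"
```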

Combine fields into a single string

Suppose we have a file named "data.txt" that contains several columns of data, separated by whitespace. We want to use awk to combine the second and third columns into a single string, separated by a hyphen. The awk command to accomplish this would be:

awk '{print $2 "-" $3}' data.txt

Here, the hyphen is an ordinary string literal ("-"), not an operator. awk has no explicit concatenation operator: placing expressions next to each other, as in $2 "-" $3, joins them into a single string. The print statement outputs the combined string.
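
To make the difference between concatenation and comma-separated print arguments concrete, here is a small sketch with made-up data:

```shell
# Hypothetical sample data: three whitespace-separated columns.
tmp=$(mktemp)
printf 'a b c\nd e f\n' > "$tmp"

# Adjacent expressions are concatenated: $2, then "-", then $3.
joined=$(awk '{print $2 "-" $3}' "$tmp")

# A comma between print arguments inserts OFS (a space by default).
spaced=$(awk '{print $2, $3}' "$tmp")

# Setting OFS to "-" gives the same result as the concatenation above.
with_ofs=$(awk -v OFS='-' '{print $2, $3}' "$tmp")

printf '%s\n---\n%s\n---\n%s\n' "$joined" "$spaced" "$with_ofs"
rm -f "$tmp"
```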

Suppose we have a log file named "access.log" that contains records of website access. Each record consists of several fields, separated by spaces. Here is an example of a record:

192.168.1.1 - - [01/Mar/2023:10:00:01 -0500] "GET /index.html HTTP/1.1" 200 1234

We want to extract the IP address, timestamp, HTTP status code, and number of bytes transferred for each record, and print them out in a formatted table.

The awk command to accomplish this would be:

awk '{ts = substr($4, 2) " " substr($5, 1, length($5) - 1); printf "%-15s %-27s %-10s %-10s\n", $1, ts, $9, $10}' access.log

Here's a breakdown of how this command works:

ts = substr($4, 2) " " substr($5, 1, length($5) - 1): Rebuilds the timestamp. Because fields are split on whitespace, the timestamp spans two fields: $4 holds "[01/Mar/2023:10:00:01" and $5 holds "-0500]". The first substr call drops the leading bracket and the second drops the trailing bracket.
printf "%-15s %-27s %-10s %-10s\n", $1, ts, $9, $10: Formats the output as a table with columns for the IP address, timestamp, HTTP status code, and number of bytes transferred. Each %s is a placeholder for one value, the number sets the minimum column width, and the - flag left-aligns the value within that width.
access.log: Specifies the input file to process.

When we run this command, we get output like this:

192.168.1.1     01/Mar/2023:10:00:01 -0500  200        1234      
192.168.1.2     01/Mar/2023:10:00:02 -0500  404        0         
192.168.1.3     01/Mar/2023:10:00:03 -0500  200        5678      
192.168.1.4     01/Mar/2023:10:00:04 -0500  200        9012
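
The same field positions can be used to filter or summarize the log. The sketch below assumes the log format shown above; the request paths in the second and third records are made up for illustration.

```shell
# Hypothetical sample log in the format shown above.
tmp=$(mktemp)
printf '%s\n' \
  '192.168.1.1 - - [01/Mar/2023:10:00:01 -0500] "GET /index.html HTTP/1.1" 200 1234' \
  '192.168.1.2 - - [01/Mar/2023:10:00:02 -0500] "GET /missing.html HTTP/1.1" 404 0' \
  '192.168.1.3 - - [01/Mar/2023:10:00:03 -0500] "GET /about.html HTTP/1.1" 200 5678' \
  > "$tmp"

# Print IP, requested path, and status for every request that did not return 200.
errors=$(awk '$9 != 200 {print $1, $7, $9}' "$tmp")
echo "$errors"

# Add up the bytes transferred (field 10) across all records.
total=$(awk '{bytes += $10} END {print bytes}' "$tmp")
echo "$total"

rm -f "$tmp"
```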

These are just a few examples of how to use the awk command in Linux.