A little language.

Awk is a language for text processing, named for the initials of its creators: Aho, Weinberger, and Kernighan. The name is also a pun on the auk, a seabird.

It has a few variants, including the original awk, nawk, mawk, and gawk (GNU awk).

An awk program is a series of pattern-action rules that are applied to each line of the input.

Awk rules are either put into an awk script file or passed inline on the command line. The syntax is based on C. Where sed operates on whole lines, awk operates on the fields within lines.

Rules and Fields

A rule is one or more awk statements enclosed in braces { }, optionally preceded by a pattern that filters lines. Rules without patterns are applied to all lines.

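Awk treats each input line as a sequence of fields, split on the field separator, which by default is any run of spaces or tabs. Fields are referenced as $1, $2, and so on; $0 is the whole line.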
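As a minimal sketch of rules and fields together, the following runs awk inline from a shell, with one rule that has a pattern and one that does not. The sample input and the shell variable names (input, out) are made up for illustration.

```shell
# Hypothetical three-line input; fields are separated by spaces.
input='alice 30 london
bob 25 paris
carol 41 rome'

# Rule 1 has a pattern ($2 > 28) and runs only on matching lines.
# Rule 2 has no pattern and runs on every line.
out=$(printf '%s\n' "$input" | awk '
    $2 > 28 { print "older:", $1 }
            { print NF, $3 }
')
printf '%s\n' "$out"
```

Each line is offered to every rule in order, so a line that satisfies the pattern produces two lines of output.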

Special Rules

BEGIN {<actions>} is executed before the first line is read.
END {<actions>} is executed after the last line is read.
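A small sketch of the two special rules wrapped around an ordinary rule, assuming a made-up input of one number per line. The variable name total is chosen here for illustration.

```shell
# BEGIN runs before any input is read, END after the last line.
out=$(printf '3\n4\n5\n' | awk '
    BEGIN { print "start"; total = 0 }
          { total += $1 }
    END   { print "lines:", NR, "sum:", total }
')
printf '%s\n' "$out"
```

NR is the built-in count of records read so far, so in END it holds the total number of input lines.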

Gawk command options

Variables

Built in vars:

As in C, an assignment is an expression that returns the assigned value. Awk also has C's compound assignment operators: +=, -=, *=, /=, and so on.
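A short sketch of both points, using only a BEGIN rule so that no input is needed. The variable names a, b and c are arbitrary.

```shell
out=$(awk 'BEGIN {
    a = b = 5          # assignment yields its value, so it chains
    a += 3             # compound assignment: a is now 8
    b -= 1             # b is now 4
    print (c = a * 2)  # prints the value of the assignment itself
    print a, b, c
}')
printf '%s\n' "$out"
```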

Arithmetic

Uses ; to separate statements on one line, and double quotes for strings.
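A rough sketch of the usual arithmetic operators, with statements separated by ; and a double-quoted string joined to a number by simple juxtaposition. The values 7 and 2 are arbitrary.

```shell
out=$(awk 'BEGIN {
    x = 7; y = 2              # two statements on one line
    print x / y               # division is floating point
    print x % y               # remainder
    print "sum: " x + y       # + binds tighter than concatenation
}')
printf '%s\n' "$out"
```

Awk has no explicit concatenation operator; placing two values next to each other joins them as strings.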

Commands

Conditionals

[<test>] {<statements>}

Prefixing a set of actions with a test restricts them to the lines for which the test is true.

Loop

Uses C style for loops.

for (i=0; i < 10; i++) {
    <statements>
}
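As an illustration, a C-style loop can walk the fields of a line by index. This sketch, on a made-up one-line input, prints the fields in reverse order.

```shell
out=$(printf 'a b c\n' | awk '{
    # Count down from the last field (NF) to the first.
    for (i = NF; i >= 1; i--) {
        printf "%s%s", $i, (i > 1 ? " " : "\n")
    }
}')
printf '%s\n' "$out"
```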

Functions

Built-in Functions

User Defined Functions

function name(arg1, arg2) {
 # pretty straightforward
}

All awk variables are global except function parameters.

Scalar parameters are passed by value; arrays are passed by reference.
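The scoping and parameter-passing rules can be sketched with one small function. The names bump, seen, flags and calls are hypothetical, chosen only for this example.

```shell
out=$(awk '
    # n is a scalar (copied); seen is an array (shared with the caller).
    function bump(n, seen) {
        n = n + 1           # changes only the local copy
        seen["called"] = 1  # visible to the caller
        calls++             # not a parameter, so this is a global
        return n
    }
    BEGIN {
        split("", flags)    # make flags an empty array before passing it
        x = 10
        y = bump(x, flags)
        print x, y, flags["called"], calls
    }
')
printf '%s\n' "$out"
```

Because only parameters are local, a common idiom is to declare extra, unused parameters when a function needs local variables.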

Conditional Rules

<condition> { <statements> }
NF == 4 {print NF}

# use ~ for regex comparison
$3~/^8/ {print}

BEGIN and END are merely special cases of this. Standard C comparison syntax otherwise applies, plus ~ for a regex match and !~ for a regex non-match.
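A sketch combining a comparison, a regex match and a regex non-match, run on a made-up input where the third line has only three fields.

```shell
input='10 x 812 y
20 x 912 y
30 x 845'

out=$(printf '%s\n' "$input" | awk '
    NF == 4 && $3 ~ /^8/  { print "match:", $1 }
    $3 !~ /^8/            { print "no 8:", $1 }
')
printf '%s\n' "$out"
```

The third line produces no output: it fails NF == 4 in the first rule, and its third field does start with 8, so the second rule does not apply either.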

Arrays

Arrays are actually just dictionaries (associative arrays), with keys stored as strings. Assigning to a high index such as arr[1000] creates only that one element; awk does not fill in the gaps. Elements can be deleted with delete arr[5].

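You can test key inclusion using k in arr. This is not equivalent to arr[k] != "": merely referencing arr[k] creates the element if it does not exist, and a key whose value is the empty string is still in the array.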

You can traverse an array with for (k in arr). The traversal order is not guaranteed.
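A minimal sketch of array behaviour: sparse assignment, membership tests, deletion and traversal. The keys and the names n, has_name, has_five and has_100 are arbitrary.

```shell
out=$(awk 'BEGIN {
    arr[100] = "x"               # creates exactly one element
    arr["name"] = ""             # a key whose value is the empty string

    n = 0
    for (k in arr) n++           # traverse the keys, in no particular order
    print "count:", n

    has_name = ("name" in arr)   # 1: the key exists although its value is ""
    has_five = (5 in arr)        # 0: the test does not create arr[5]
    print has_name, has_five

    delete arr[100]
    has_100 = (100 in arr)       # 0 after deletion
    print has_100
}')
printf '%s\n' "$out"
```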