Command line challenge

nevj · March 28, 2023, 7:12am

I have been entering lots of rainfall data.
I have daily rainfall in mm, with one month of data per line. It looks like this:

nevj@mary /common/Rwork/rainfall $ cat 2020.dat
0 0 0 0.5 0.5 0 0.5 0 2.5 1 1.5 0.5 0 0 2.5 9 12 1.5 1.5 19 0.5 0 0 0 0 0.5 0.5 0 0 0 0
0 2 0 0 2 6 30 29 118 4.5 3.5 21 0.5 0.5 4 14 1 13.5 0.5 0 2.5 0.5 0 0 0 0 0 0 0
0 0 7 7 6 3.5 0.5 3 3 0.5 0.5 0 4 4.5 2 0.5 0 0 0 0 0 1 0.5 0.5 2 8 0.5 3 1.5 0 18
3 3.5 2.5 0 0 3.5 1.5 1 1 0 6 0 0.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 2.5
0 0 0 0 0 0 0 0 0 0 0 0 3 0.5 11.5 0 1 0.5 0 1.5 33 2.5 1 0.5 7 7 0 0.5 0.5 0 0
0.5 0 0 0 0 0.5 0 5 0.5 1.5 1.5 2.5 3.5 0 0 0 2 0 0 3 0 0 0 0 0 0.5 0 2.5 1 0
0 0 1.5 0 0 0 0 4.5 0 1 6 14 13.5 13.5 1 0 1 0 0 0 0 0 0.5 0.5 10 72 147 2 0 0.5 0 
0 0 0.5 0 0 2 68.5 120 65 0 3.5 0.5 0.5 4 5 0 0 0 0.5 0 0.5 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 2.5 0 0 1 0 0 0 0 3 2 7 0 0 0 0 0 3.5 0 0 0 0 2
0 0 0 0 0.5 3 2.5 0 0 0 0 0 0 0 1.5 0 0 1 0 0 0 0 3.5 6.5 24 5.5 2 3 0 12 10 
15.5 0.5 0 22 15.5 0.5 0 0 0 0 0 1 0.5 0 0 0 0 0 0 0 0 4.5 1.5 1.5 0 0 0 0 0 0 
4.5 2.5 0 0 0 0 0 0 0 0.5 0 6.5 4 1.5 16 3 4 2 1 0 11.5 0.5 0 1.5 0 0 0 24 7 0.5 0

I want to check that I have not missed any entries, so I want to count the number of entries (called fields) in each line and check that it equals the number of days in that month.
Something like this:

nevj@mary /common/Rwork/rainfall $ awk -f nfield.awk 2020.dat
1  31 
2  29 
3  31 
4  30 
5  31 
6  30 
7  31 
8  31 
9  30 
10  31 
11  30 
12  31
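
The contents of nfield.awk are not shown in the post. As a rough sketch only, assuming the script simply prints each line number followed by its field count, an equivalent could look like this (the function name count_fields_awk is made up for illustration, and the exact spacing of the real output may differ):

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for nfield.awk (the real script is not shown above).
# NR is awk's current line number, NF the number of fields on that line.
count_fields_awk() {
    awk '{ print NR, NF }' "$@"
}

# Demo on two made-up lines: prints "1 4" and then "2 3"
printf '0 0 0.5 1\n2 6 30\n' | count_fields_awk
```

With the real data this would be called as count_fields_awk 2020.dat.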

So here is the challenge: can anyone do this with a simple script, without resorting to awk like I did above?

daniel.m.tripp · March 28, 2023, 7:35am

Have you looked at the “column” command… there’s also “cut” (I’m sure you know about “cut”)

I can remember using “cut” about 25 years ago… But when it’s simple columns of data these days, I always fall back to awk… Haven’t used “cut” since forever…

Here’s a quick and dirty script I wrote that used column to tabulate an even column output from a CSV file - I call the script “cli-excel” - maybe I should retitle it “cli-visicalc” or “cli-123” (I reckon I might!)…

#!/usr/bin/env bash
# read a csv in the terminal
# column -s, -t < eni-linux-list.csv | less -#2 -N -S
# expect an input file name
PROG=$(basename "$0")
CSVFILE=$1
if [ "$#" -lt 1 ] ; then
        echo "need argument - expecting a CSV file..."
        echo "e.g. : $PROG filename.csv"
        exit 1
fi
# quote the file name so paths containing spaces still work
column -s, -t < "$CSVFILE" | less -#2 -N -S

Hmmm - but that doesn’t do anything like what you want… So maybe you count the number of spaces between the values on each line, and add 1,
i.e. the number of entries should be number-of-spaces + 1… Maybe? Then some kinda algorithm for getting the number of days in each month? Can you do that with the “cal” command? (I’ve already ranted about how debian and ubuntu (and derivatives) no longer install cal/ncal by default - because IT’S SO HUGE they want to save space (sarcasm)…)
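
The "spaces + 1" idea plus a days-per-month lookup could be sketched in plain bash roughly as follows. This is only a sketch under assumptions: the function names are made up, values are assumed to be separated by exactly one space, and the file is assumed to hold one line per month starting with January. Some lines in 2020.dat end with a trailing space, which would throw a naive count off by one, so the sketch trims trailing spaces first. The month lengths come from calendar arithmetic rather than from parsing cal output, since cal may not be installed.

```shell
#!/usr/bin/env bash
# Sketch only: function names are made up for illustration.

# count_by_spaces LINE: fields = spaces + 1, after trimming trailing spaces.
# Assumes exactly one space between values and no leading spaces.
count_by_spaces() {
    local line=$1 spaces
    line=${line%"${line##*[! ]}"}   # drop trailing spaces
    spaces=${line//[! ]/}           # delete everything that is not a space
    echo $(( ${#spaces} + 1 ))
}

# days_in_month YEAR MONTH: calendar arithmetic instead of parsing cal output.
days_in_month() {
    local year=$1 month=$((10#$2))  # 10# forces base 10, so "08" is accepted
    case $month in
        4|6|9|11) echo 30 ;;
        2)  if (( year % 4 == 0 && (year % 100 != 0 || year % 400 == 0) )); then
                echo 29
            else
                echo 28
            fi ;;
        *)  echo 31 ;;
    esac
}

# check_year YEAR FILE: report each line whose count disagrees with the calendar.
check_year() {
    local year=$1 month=0 line got want
    while IFS= read -r line; do
        month=$((month + 1))
        got=$(count_by_spaces "$line")
        want=$(days_in_month "$year" "$month")
        [ "$got" -eq "$want" ] || echo "month $month: $got entries, expected $want"
    done < "$2"
}
```

Called as check_year 2020 2020.dat, this prints nothing when every line has the expected number of entries, and one message per line that does not.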

There’s a bunch of examples here (yes most seem to mention awk - but there are others) :

kovacslt · March 28, 2023, 9:24am

Maybe with wc, if awk is out of scope.
i=1; while read -r line; do printf '%s ' "$i"; echo "$line" | wc -w; i=$((i+1)); done < 2020.dat

That’s a one-liner, but you could transform it into a script easily.
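
One way such a script might look is sketched below. It swaps wc for bash's own word splitting (read -a puts the words of each line into an array), so it is a variation on the one-liner rather than a direct transcription. The function name count_fields_per_line is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print "line-number field-count" for every line of a file,
# using only bash builtins (no awk, no wc).

count_fields_per_line() {
    local n=0 fields
    # read -a splits each line on whitespace into the array "fields".
    # The "|| (( ... ))" part still handles a last line with no newline.
    while read -r -a fields || (( ${#fields[@]} )); do
        n=$((n + 1))
        printf '%d %d\n' "$n" "${#fields[@]}"
    done < "$1"
}

# Run only when a file name is given, e.g.: ./nfields.sh 2020.dat
if [ "$#" -ge 1 ]; then
    count_fields_per_line "$1"
fi
```

Because the splitting is on runs of whitespace, trailing spaces at the end of a line do not add an extra field.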

dbauthor · April 3, 2023, 9:09pm

Here’s a solution with wc that doesn’t use a loop.

sed -e 's/^/echo "/' -e 's/$/" | wc -w/' < 2020.dat | bash | cat -n

If it makes your head spin, try running it without the pipes to see what it does:

sed -e 's/^/echo "/' -e 's/$/" | wc -w/' < 2020.dat

Then add the pipe to bash, and then add the pipe to cat.

nevj · April 5, 2023, 9:47am

OK, let’s see if we can understand this.
The first bit

sed -e 's/^/echo "/' -e 's/$/" | wc -w/' < 2020.dat

executes the script

's/^/echo "/' -e 's/$/" | wc -w/'

with 2020.dat as input
and that makes a set of lines like

echo "0 0 0 0.5 0.5 0 0.5 0 2.5 1 1.5 0.5 0 0 2.5 9 12 1.5 1.5 19 0.5 0 0 0 0 0.5 0.5 0 0 0 0" | wc -w
echo "0 2 0 0 2 6 30 29 118 4.5 3.5 21 0.5 0.5 4 14 1 13.5 0.5 0 2.5 0.5 0 0 0 0 0 0 0" | wc -w
...........

for every line in the data file.
This set of lines is itself a script, so it is piped to bash

$ sed -e 's/^/echo "/' -e 's/$/" | wc -w/' < 2020.dat|bash
31
29
31
30
31
30
31
31
30
31
30
31

resulting in a set of wc results, one per line
and then you use cat -n to number the lines

$ sed -e 's/^/echo "/' -e 's/$/" | wc -w/' < 2020.dat|bash|cat -n
     1	31
     2	29
     3	31
     4	30
     5	31
     6	30
     7	31
     8	31
     9	30
    10	31
    11	30
    12	31

We are there. This was done in Void Linux.

So the big trick is to generate a multiline script which operates on one line of the file at a time. In a backhanded way, you used sed to do the looping.

I must admit I have never used sed to generate a script like that.
Looks like I need to buy your book.
Thanks, I am sure several people will be interested in how that works.
Regards
Neville

dbauthor · April 5, 2023, 12:02pm

You got it!

The general technique here is turning data into commands, and piping the commands to bash. This technique really transformed the way I use Linux at the command line.

The trick is that bash is itself a command that reads from standard input, so you can send it strings to execute.
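
As a small illustration of that two-step habit (look at the generated commands first, then hand them to bash), here is a sketch on made-up sample data. The function name gen_commands and the sample values are assumptions for the example; the sed expressions are the ones from the one-liner above.

```shell
#!/usr/bin/env bash
# Sketch of the "data into commands" technique on made-up data.

# gen_commands: turn each input line into:  echo "<line>" | wc -w
gen_commands() {
    sed -e 's/^/echo "/' -e 's/$/" | wc -w/'
}

sample='0 0 0.5
2 6 30 29
1'

# Step 1: inspect the generated commands before running anything.
printf '%s\n' "$sample" | gen_commands

# Step 2: the same text, executed by bash reading from standard input.
printf '%s\n' "$sample" | gen_commands | bash
```

Because bash executes whatever the lines contain, this is only safe on data you trust: a line containing a double quote, a backtick, or $(...) would be interpreted as shell syntax rather than counted.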

In my book Efficient Linux at the Command Line, I go into detail about this technique in chapter 7, which is titled “11 More Ways to Run a Command.”

Best wishes,
Dan

nevj · April 5, 2023, 12:52pm

You sold me, I ordered the book.
I’m a bit of a book fan. There was a time when being a Unix user meant having a shelf full of O’Reilly books. Things have changed with internet access, but there is still a place for well-written learning books.
Regards
Neville