Grep - Explanation for solution on HackerRank Linux Shell problem

Hey everyone!
I hope you are doing well!

I am hoping there are some grep or regex experts on this forum who can answer this question.

Recently, I was doing some HackerRank Linux Shell challenges to learn and test my Linux shell skills. I am not great at problems like this in Linux shell, but I would like to get better.

Anyway, I ran into this problem, Grep - B.

Here is the gist of the problem:

Current Task

Given an input file, with N credit card numbers, each in a new line, your task is to grep out and output only those credit card numbers which have two or more consecutive occurrences of the same digit (which may be separated by a space, if they are in different segments). Assume that the credit card numbers will have 4 space separated segments with 4 digits each.

Sample Input

1234 5678 9101 1234  
2999 5178 9101 2234  
9999 5628 9201 1232  
8482 3678 9102 1232

Sample Output

1234 5678 9101 1234  
2999 5178 9101 2234  
9999 5628 9201 1232

While I was researching how to solve this problem, I found someone had provided a solution, which is this:

grep '\([0-9]\) *\1'

This solution does work, but they provided no explanation of how it works. I think the \([0-9]\) part is self-explanatory, since we are dealing with digits after all. What I don’t understand is how the " *\1" part (a space, then *, then \1) allows for matching two or more consecutive occurrences of the same digit.

If anyone can explain, or point me to documentation that would help show how this answer works, it would be very appreciated. Thanks!


This may help

I have never used a back reference.
I don't understand the *


I always consider the back reference \n as a place holder.

\n matches the same text that the nth parenthesized group matched.

So, \1 here matches the same digit that \([0-9]\) matched. That is how you get two consecutive occurrences of the same digit.

Now, the space followed by * means zero or more occurrences of a space.

This way you get the consecutive numbers even if there are spaces in between.
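
To see the pieces working together, here is a small sketch that runs the pattern against the sample input from the question. The file name cards.txt is only an assumption for illustration; any file with one card number per line should behave the same way.

```shell
# Recreate the sample input from the problem statement.
# (cards.txt is a hypothetical file name chosen for this example.)
printf '%s\n' \
  '1234 5678 9101 1234' \
  '2999 5178 9101 2234' \
  '9999 5628 9201 1232' \
  '8482 3678 9102 1232' > cards.txt

# \([0-9]\)   capture a single digit as group 1
# ' *'        allow zero or more spaces after it
# \1          require the same digit that group 1 captured
grep '\([0-9]\) *\1' cards.txt
```

This should print the first three lines and skip the last one, since 8482 3678 9102 1232 has no digit that repeats, either side by side or across a space.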


Thank you Abhishek!

Very good explanation. I appreciate it.

Thanks for the further explanation of *. I already knew what it did, but I didn’t parse the regex as a space followed by *. I thought the * was modifying the \1 somehow, but now that I think about it, * only applies to the expression that comes before it.
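
One way to check that the * applies to the space and not to \1 is to print only the matched text. This is a rough sketch and assumes a grep that supports the -o option (GNU and BSD grep both do, but it is not part of POSIX).

```shell
# One space between the repeated digits: ' *' consumes that single space.
echo '9101 1234' | grep -o '\([0-9]\) *\1'   # prints: 1 1

# No space between the repeated digits: ' *' matches zero spaces.
echo '2999' | grep -o '\([0-9]\) *\1'        # prints: 99
```

In both cases \1 matches exactly one copy of the captured digit; only the number of spaces in between varies.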
