Trying to accomplish a correct backup with the "tar" command

Hello Friends

To make a correct backup of the LiveDemo directory (the name is arbitrary), consider the following directory structure:

/home/you/LiveDemo
 aaa
 bbb
 ccc
 .git

Where:

  • The aaa, bbb and ccc are directories
  • The .git is a hidden directory

The following command was executed within the same LiveDemo directory:

  • tar -czf LiveDemo.tar.gz *

It works, but I realized that when it is unpacked with the following command (on another machine):

  • tar -xzf LiveDemo.tar.gz

The .git directory does not appear. The result is as follows:

/home/you/LiveDemo
 aaa
 bbb
 ccc

This is confusing because (if I am not mistaken) with the following different directory structure each hidden .git directory does appear.

/home/you/Cybertron
 alpha
  .git
 beta
  .git
 gamma
  .git

Therefore

  • It seems that the tar -czf backup.tar.gz * command “does not work well”: it ignores (does not include) any hidden content at the level where it is executed. Please correct me if I am wrong.
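A note on why this happens: the * is expanded by the shell before tar ever runs, and by default shell globbing skips names that start with a dot, so tar only receives aaa bbb ccc. Hidden directories deeper in the tree (as in the Cybertron layout) are still picked up because tar recurses into the directories it is given. A minimal sketch in a throwaway directory (the scratch paths and the aaa/.git entry are made up for the demo):

```shell
# Build a scratch copy of the layout, plus one nested hidden directory.
work=$(mktemp -d)
mkdir -p "$work/LiveDemo/aaa/.git" "$work/LiveDemo/bbb" "$work/LiveDemo/ccc" "$work/LiveDemo/.git"

# The shell expands * before tar runs; top-level dot names are not matched.
expanded=$(cd "$work/LiveDemo" && echo *)
echo "shell passes to tar: $expanded"

# Archive with * (written outside the directory) and list the members.
(cd "$work/LiveDemo" && tar -czf "$work/star.tar.gz" *)
members=$(tar -tzf "$work/star.tar.gz")
echo "$members"

rm -rf "$work"
```

The listing should contain aaa/.git/ (found by recursion) but no top-level .git/.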

After some research on the Web, the following command was executed instead (again within the same LiveDemo directory):

  • tar -czf LiveDemo.tar.gz .

As you can see, * was changed to ., but this produces the following situations:

One

The LiveDemo.tar.gz file is created, but the following message appears:

tar: ./LiveDemo.tar.gz: file changed as we read it
tar: .: file changed as we read it

Two

The LiveDemo.tar.gz file has a size of 2.4MB, but after it is unpacked on another machine its size changes to 1MB.

In other words:

  • The LiveDemo.tar.gz file is copied (from a pendrive) and pasted (to the target machine)
  • The LiveDemo.tar.gz file has a size of 2.4MB (the same as on the pendrive, since it is a plain copy)
  • The tar -xzf LiveDemo.tar.gz command is executed
  • The unpacked content appears (finally including the hidden .git directory)
  • The LiveDemo.tar.gz file remains in the target directory
  • The LiveDemo.tar.gz file now has a size of 1MB

Questions

  1. What does the message in One mean?
  2. Should I do something to fix it, or simply ignore it?
  3. Why did the LiveDemo.tar.gz file change its size after it was unpacked?

Thanks in advance

This answers the question about ‘*’ and ‘.’

Are you saving the .tar.gz file inside the same folder you are trying to archive?

Maybe the same as above
Maybe the filesystem blocksize differs in ‘other machine’

Hello Neville

Thanks for the reply

This answers the question about ‘*’ and ‘.’

Oh, I remember that link; there is a lot to read.

Are you saving the .tar.gz file inside the same folder you are trying to archive?

Yes

BTW, without any wildcard everything works fine, but it is verbose to list each directory. I mean:

  • tar -czf LiveDemo.tar.gz aaa bbb ccc

but I didn’t test with the hidden directory yet. I mean

  • tar -czf LiveDemo.tar.gz aaa bbb ccc .git
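If listing every directory by hand is too verbose, one option is to add a second glob pattern for the dot names. A rough sketch; the pattern .[!.]* is my own suggestion rather than something from this thread (it skips . and .., and would also miss names that start with two dots), and the scratch paths are invented:

```shell
work=$(mktemp -d)
mkdir -p "$work/LiveDemo/aaa" "$work/LiveDemo/bbb" "$work/LiveDemo/ccc" "$work/LiveDemo/.git"

# * matches the visible names; .[!.]* matches dot names other than . and ..
(cd "$work/LiveDemo" && tar -czf "$work/LiveDemo.tar.gz" * .[!.]*)

top_level=$(tar -tzf "$work/LiveDemo.tar.gz")
echo "$top_level"

rm -rf "$work"
```

The archive is written outside LiveDemo here so that a later run of the same command does not pick up the previous archive through *.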

Maybe the same as above

Understood

Maybe the filesystem blocksize differs in ‘other machine’

Maybe … I see your point

It seems that if . is used, then it is mandatory to create the archive outside of the target directory (the one being backed up).

But the unpack logic is another story.

Yes.
You can’t have any active writing going on in the directory you are archiving.
So, don’t put the tarfile in there, and watch out for other processes writing in there… e.g. logs.
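To make that concrete: the warning appears because the archive being written lives inside the directory being read, so tar finds its own growing output file. It is likely also the reason for the size change: the archive then contains a partial copy of itself, and extracting in the directory that holds the 2.4MB file overwrites it with that smaller copy. A small sketch that writes the archive elsewhere and uses -C to archive the contents of the directory (the scratch paths are hypothetical):

```shell
work=$(mktemp -d)
mkdir -p "$work/LiveDemo/aaa" "$work/LiveDemo/.git" "$work/Backups"
echo "data" > "$work/LiveDemo/aaa/file.txt"

# -C changes into LiveDemo before adding "."; the archive lands in Backups.
tar -czf "$work/Backups/LiveDemo.tar.gz" -C "$work/LiveDemo" .
status=$?

listing=$(tar -tzf "$work/Backups/LiveDemo.tar.gz")
echo "exit status: $status"
echo "$listing"

rm -rf "$work"
```

Because the output file is not under LiveDemo, the archive does not contain itself and the hidden .git directory is still included.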

Yes.

Thanks for the confirmation. So the difference in behavior is interesting between

  • tar -czf LiveDemo.tar.gz . and tar -czf LiveDemo.tar.gz aaa bbb ccc

It is about where (and why) the command is executed.

You can’t have any active writing going on in the directory you are archiving.

Understood

So, don’t put the tarfile in there, and watch out for other processes writing in there… e.g. logs.

By using -cvzf ? (v is included now)

OK … so you can see if it is misbehaving.

Wasn’t there a dry-run option, like in rsync? I cannot find it…

It’s confusing… I know…

But in my experience - “/*” as the source in a tar command - will still pick up “.” files…

But I almost, always, universally do :

tar tvpf $TAR-DESTINATION * when the job’s finished…

And I NEVER EVER EVER blindly untar a tar file before I’ve examined its contents…

I’ve been doing that so long - but even so - I often still do a “tar tvf $TAR-DESTINATION” to verify everything I wanted is in there - including “.” files or folders… tar is so old - it’s still “UNIX” and I don’t think it gives a rat’s arse (Aussie colloquialism) about “dot files” or “dot folders” (i.e. if something’s there - it will get grabbed)…

NEVER assume your backup was successful…
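One way to turn that habit into a repeatable check is to list the archive and look for the entries you expect. A sketch only, with made-up paths and an invented list of expected names:

```shell
work=$(mktemp -d)
mkdir -p "$work/LiveDemo/aaa" "$work/LiveDemo/.git"
tar -czf "$work/LiveDemo.tar.gz" -C "$work" LiveDemo

# t lists the members without extracting anything.
listing=$(tar -tzf "$work/LiveDemo.tar.gz")

# Report every expected entry that is not in the listing.
missing=0
for want in LiveDemo/aaa LiveDemo/.git; do
    if ! printf '%s\n' "$listing" | grep -qF "$want"; then
        echo "missing from archive: $want"
        missing=1
    fi
done
echo "missing=$missing"

rm -rf "$work"
```

In a real backup script the value of missing could decide whether the run counts as successful.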

But UNIX design was so beautiful - and elegant…

Looks like it does not have --dry-run
Direct the output to /dev/null
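A sketch of that idea, assuming GNU tar or something compatible: with -v and the archive sent to /dev/null, tar prints the names it would store without leaving a file behind. The paths are invented for the demo.

```shell
work=$(mktemp -d)
mkdir -p "$work/LiveDemo/aaa" "$work/LiveDemo/.git"

# Some tar versions print the verbose list on stderr, so capture both streams.
would_archive=$(tar -cvf /dev/null -C "$work/LiveDemo" . 2>&1)
echo "$would_archive"

rm -rf "$work"
```

This only shows which names tar would visit; it does not prove that a real archive written later is readable.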

That is what grabbed me. It was created by some very talented people.

Hello Friends

An update on the .tar.gz creation:

One

Command executed outside of the LiveDemo directory (the one being backed up)

Either of:

tar -czf ~/Backups/LiveDemo.tar.gz ~/LiveDemo/
tar -czf ~/Backups/LiveDemo.tar.gz ~/LiveDemo

where the following message always appears:

  • tar: Removing leading `/' from member names
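That message is informational: the shell expands ~ to an absolute path, and tar stores the member names without the leading / so that extraction happens relative to the current directory. A sketch of one way to avoid the message, using -C so the member names are relative from the start (the scratch directory is a hypothetical stand-in for the home directory):

```shell
home=$(mktemp -d)           # stand-in for the real home directory
mkdir -p "$home/LiveDemo/aaa" "$home/LiveDemo/.git" "$home/Backups"

# Change to the parent first, then name the directory relative to it.
messages=$(tar -czf "$home/Backups/LiveDemo.tar.gz" -C "$home" LiveDemo 2>&1)
first_member=$(tar -tzf "$home/Backups/LiveDemo.tar.gz" | head -n 1)

echo "tar messages: ${messages:-none}"
echo "first member: $first_member"

rm -rf "$home"
```

With this form the archive unpacks into a LiveDemo directory under wherever it is extracted.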

Two

Command executed inside of the LiveDemo directory (the one being backed up)

tar -czf ~/Backups/Scripts.tar.gz .

Therefore

As suggested by you:

  • In general it is better to create the tar.gz file outside of the directory itself (the one being backed up)

I use tar to backup several of my Linux servers…

But : I don’t use tar directly…

I use rsync - to create an incremental backup folder tree…

THEN I use tar to create a gzip’d TAR backup…
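A simplified sketch of such a two-step flow, with invented paths and a plain rsync -a --delete mirror rather than any particular incremental scheme (the actual setup described here may well differ):

```shell
work=$(mktemp -d)
mkdir -p "$work/LiveDemo/aaa" "$work/LiveDemo/.git" "$work/stage" "$work/Backups"
echo "v1" > "$work/LiveDemo/aaa/file.txt"

# Step 1: mirror the source into a staging tree (later runs copy only changes).
if command -v rsync >/dev/null 2>&1; then
    rsync -a --delete "$work/LiveDemo/" "$work/stage/LiveDemo/"
else
    # Fallback only so the sketch still runs where rsync is not installed.
    mkdir -p "$work/stage/LiveDemo" && cp -a "$work/LiveDemo/." "$work/stage/LiveDemo/"
fi

# Step 2: archive the quiet staging copy, not the live directory.
tar -czpf "$work/Backups/LiveDemo.tar.gz" -C "$work/stage" LiveDemo
staged=$(tar -tzf "$work/Backups/LiveDemo.tar.gz")
echo "$staged"

rm -rf "$work"
```

Archiving the staging copy means tar never reads files that are being written at that moment.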

Note : when creating backups with tar - you should use “p” to preserve permissions…

e.g.

tar czpf ~/Backups/LiveDemo.tar.gz ~/LiveDemo/

Note : tar arguments can be done without the hyphen / dash “-”…

Also note : some versions of tar - like the one in busybox, will not work if you don’t put “f” as the last argument :

e.g. gnu tar on Linux is perfectly happy about the order :
tar czvfp File.tgz $SOURCE
and
tar czvpf File.tgz $SOURCE
will do the same thing on GNU Tar on most linux distros that install the full GNU tar package.

But with busybox tar you must put the “f” as the last argument / switch
tar czvpf File.tgz $SOURCE

Note: due to the above with busybox tar - I always - put the “f” last… :smiley:
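A quick sketch of the option-order point, with made-up names and paths. As far as I know, in the form without a dash the archive name is taken from the following word, while in the dash form the letter f takes whatever comes right after it (so -czvfp would treat p as the archive name on GNU tar). Keeping f last works in both styles:

```shell
work=$(mktemp -d)
mkdir -p "$work/src/aaa"
echo "data" > "$work/src/aaa/file.txt"

# Old style (no dash), f last, archive name follows.
tar czpf "$work/old-style.tgz" -C "$work" src

# Dash style, f last, archive name follows.
tar -czpf "$work/dash-style.tgz" -C "$work" src

# Both archives should list the same members.
old_list=$(tar tzf "$work/old-style.tgz")
dash_list=$(tar -tzf "$work/dash-style.tgz")
if [ "$old_list" = "$dash_list" ]; then same=yes; else same=no; fi
echo "same contents: $same"

rm -rf "$work"
```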

Very important … we all missed that

I never bother with compression … disk space is cheap, so why burden your CPU with compression?

Valuable feedback Dan

But : I don’t use tar directly…
I use rsync - to create an incremental backup folder tree…
THEN I use tar to create a gzip’d TAR backup…

Ok, you use two commands rsync and tar (in that order).
Could you expand the idea about the italic part?

Note : when creating backups with tar - you should use “p” to preserve permissions…
e.g.
tar czpf ~/Backups/LiveDemo.tar.gz ~/LiveDemo/

The -p option is interesting. According to man tar:

-p, --preserve-permissions, --same-permissions
       Set permissions of extracted files to those recorded in the archive (default for superuser).

I assume this is all about the classic rwx (ugo) permissions and the user/group ownership.
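To see what that man page text means in practice, here is a small sketch with invented file names. As far as I can tell the option matters at extraction time: without it, a regular user typically gets the recorded mode filtered through the umask, while with it the recorded mode is restored as-is.

```shell
work=$(mktemp -d)
mkdir -p "$work/src" "$work/plain" "$work/preserved"
echo "data" > "$work/src/shared.txt"
chmod 666 "$work/src/shared.txt"          # world-writable on purpose
tar -cf "$work/demo.tar" -C "$work/src" shared.txt

# Extract twice under a common umask, once without p and once with p.
(
    umask 022
    tar -xf  "$work/demo.tar" -C "$work/plain"
    tar -xpf "$work/demo.tar" -C "$work/preserved"
)

# The first ten characters of ls -l are the file type and permission bits.
plain_mode=$(ls -l "$work/plain/shared.txt" | cut -c1-10)
preserved_mode=$(ls -l "$work/preserved/shared.txt" | cut -c1-10)
echo "without p: $plain_mode"
echo "with p:    $preserved_mode"

rm -rf "$work"
```

When run as root the two results are usually identical, because the man page notes that preserving permissions is the default for the superuser.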

To be honest, I’ve never used that option because my machines replicate two things:

  • username
  • directory structure

In what scenario would it be necessary (or mandatory) to use it? Please share your experience.

Note : tar arguments can be done without the hyphen / dash “-”…

Correct, I realized that some months ago, by mistake…

Note

  • I assumed it was standard to use “-” to declare any option(s)

Also note : some versions of tar - like the one in busybox, will not work if you don’t put “f” as the last argument :

I remember from some months ago that the options must be declared in a specific order, for czf and xzf. I think it is invalid to put the -f option first (alpha).

e.g. gnu tar on Linux is perfectly happy about the order :
tar czvfp File.tgz $SOURCE
and
tar czvpf File.tgz $SOURCE
will do the same thing on GNU Tar on most linux distros that install the full GNU tar package.

Thanks for that example, it is a concern for bash script purposes

Note: due to the above with busybox tar - I always - put the “f” last… :smiley:

So do I, but because of the “alpha” experience.

Thanks in advance

I do - 'cause then filling up my disk (it’s 6 TB - and also hosts TimeMachine backups for 2 macs) will be further in the future :smiley:

e.g. the backup of my Pi 4 that hosts the 6 TB HDD (USB3) is 10 GB uncompressed - and 5 GB compressed…

Also - sometimes there are legitimate reasons to compress on the fly - e.g. you have a network speed bottleneck… I’ve covered before a “gem” I’ve been using 30 years - pipe tar to itself :

tar cvzpf - * | ( cd /mnt/NFS/DESTINATION ; tar xzvpf - )

Assuming “NFS” is a mountpoint and the bottleneck is the network…
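A local sketch of that pipe, with scratch directories standing in for the source and for the NFS mount (the paths are invented). It uses . rather than * so that hidden entries come along too:

```shell
work=$(mktemp -d)
mkdir -p "$work/src/aaa" "$work/src/.git" "$work/dest"
echo "data" > "$work/src/aaa/file.txt"

# Writer: archive "." to stdout (f -). Reader: unpack stdin inside the destination.
(cd "$work/src" && tar czpf - .) | (cd "$work/dest" && tar xzpf -)

copied=$(cd "$work/dest" && find . | sort)
echo "$copied"

rm -rf "$work"
```

Each side runs in its own subshell, so the cd on the reading side does not affect where the writing side reads from.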

Anyway - my backups are weekly… I periodically manually go through and delete everything I don’t consider an EOM (end of month) - backup - the ones I deem “EOM” are from the first week of each month - i.e. the first Sunday of each month…