Scanning .mp4 files with clamscan

Hi all, :wave:

I just wanted to scan an .mp4 file I downloaded from YouTube with yt-dlp_linux.
It's the song “All Along the Watchtower”, also known as the title music of “Vanity Fair” (TV mini-series).
The file is just 1.4 MB in size.

I tried to scan it like this:

clamscan All_Along_the_Watchtower.mp4

The result was:

/media/rosika/f14a27c2-0b49-4607-94ea-2e56bbf76fe1/DATEN-PARTITION/Dokumente/Ergänzungen_zu_Programmen/zu_yt-dlp/alt/zum_Ausführen/All_Along_the_Watchtower.mp4: OK

----------- SCAN SUMMARY -----------
Known viruses: 8697640
Engine version: 0.103.11
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.00 MB
Data read: 1.34 MB (ratio 0.00:1)
Time: 25.576 sec (0 m 25 s)
Start Date: 2024:08:29 17:16:26
End Date:   2024:08:29 17:16:52

So “Data scanned: 0.00 MB”. Hmm, that struck me as odd. It seems the file wasn't scanned at all. :thinking:

I did the same with another mp4 file (“Vanity_Fair_title_music.mp4”, 780 kB), with the same result.

However, scanning a third mp4 file (“g-kgw_Victoria.mp4”, weighing in at 242.3 MB) presented no problems: :smiley:

[...]
Scanned files: 1
Infected files: 0
Data scanned: 492.29 MB
Data read: 231.12 MB (ratio 2.13:1)
[...]

I was wondering: the various mp4 files must be somehow different. And indeed, they are:

file *
All_Along_the_Watchtower.mp4: Audio file with ID3 version 2.3.0, contains:\012- MPEG ADTS, AAC, v4 LC, 22.05 kHz, stereo
g-kgw_Victoria.mp4:           ISO Media, MP4 Base Media v1 [ISO 14496-12:2003]
Vanity_Fair_title_music.mp4:  Audio file with ID3 version 2.3.0, contains:\012- MPEG ADTS, AAC, v4 LC, 22.05 kHz, stereo

The only video file (including the sound track, of course) is “g-kgw_Victoria.mp4”, the other two only contain the sound track. No video there.
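For anyone who wants to check this without file, here is a minimal POSIX shell sketch. It assumes, as the file output above suggests, that the audio-only downloads begin with an ID3v2 tag (the ASCII bytes "ID3"), while a real ISO/MP4 container carries the box type "ftyp" at byte offsets 4 to 7. The function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: tell the two kinds of ".mp4" apart by their first bytes.
# Assumptions: an ID3v2 tag starts with "ID3" (hex 49 44 33) at offset 0, and
# an ISO base media file has the box type "ftyp" (hex 66 74 79 70) at offset 4.

# Print COUNT bytes of FILE, starting at offset SKIP, as one lowercase hex string.
bytes_hex() {   # usage: bytes_hex FILE SKIP COUNT
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

sniff_media() {
    if [ "$(bytes_hex "$1" 0 3)" = "494433" ]; then
        echo "id3-audio"     # e.g. MPEG ADTS/AAC audio behind an ID3 tag
    elif [ "$(bytes_hex "$1" 4 4)" = "66747970" ]; then
        echo "iso-media"     # a real MP4 container
    else
        echo "unknown"
    fi
}
```

Run on the three files above, sniff_media should report id3-audio for the two audio-only downloads and iso-media for g-kgw_Victoria.mp4, provided their headers look the way file describes them.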

That got me thinking:

From previous experience I know that scanning e.g. .ogg and .mp3 files with clamscan needed a workaround: compressing the files first (e.g. with gzip).
So I applied the same procedure here. And indeed, it worked:

cat All_Along_the_Watchtower.mp4 | gzip | clamscan -
stdin: OK

----------- SCAN SUMMARY -----------
Known viruses: 8697640
Engine version: 0.103.11
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 1.38 MB
Data read: 1.30 MB (ratio 1.07:1)
Time: 30.752 sec (0 m 30 s)
Start Date: 2024:08:29 17:39:05
End Date:   2024:08:29 17:39:36

So this time it was “Data scanned: 1.38 MB”. Yes, this way the file really was scanned.
Great. :+1:

Perhaps that's worth keeping in mind for anyone who needs to scan .mp4 audio files (pure audio files, that is).
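The two steps above (scan directly, then scan through gzip) can be combined into one small shell function. This is a sketch, assuming clamscan prints its usual summary including a "Data scanned:" line; the function name is hypothetical. It only falls back to the gzip pipe when the direct scan reports that nothing was scanned.

```shell
#!/bin/sh
# Hypothetical wrapper: scan FILE directly; if clamscan's summary says nothing
# was scanned ("Data scanned: 0.00 MB"), rescan the file piped through gzip.
scan_with_gzip_fallback() {
    f=$1
    out=$(clamscan "$f")
    printf '%s\n' "$out"
    if printf '%s\n' "$out" | grep -q '^Data scanned: 0\.00 MB'; then
        echo "Nothing was scanned; retrying via gzip ..."
        gzip -c "$f" | clamscan -
    fi
}
```

Usage would be e.g. scan_with_gzip_fallback All_Along_the_Watchtower.mp4. Files that clamscan already scans directly (like the video file) are only scanned once.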

Many greetings from Rosika :slightly_smiling_face:


Hi
My understanding of ClamWin and other scanners is that they try to match virus signatures against a virus which may be in the scanned file.
Using gzip or another compression app on a file changes the structure of the file in order to remove redundancy; e.g. it may encode a run of 1s or 0s (or some other pattern) as the number of repetitions plus the 1 or 0 itself. Hence the virus itself may be compressed and therefore not be recognised by the scanner. This is ancient theory for me and may be out of date, but I would be happy to learn more about how it all works.
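To make the run-length idea concrete, here is a toy encoder in shell. It only illustrates the principle described above: gzip itself uses DEFLATE (LZ77 plus Huffman coding), not plain run-length coding, and the function name rle is made up. It assumes input without whitespace or newlines.

```shell
#!/bin/sh
# Toy run-length encoder: each run of a repeated character becomes
# "<count><char>". Split into one character per line, count adjacent
# duplicates with uniq -c, then glue count and character back together.
rle() {
    printf '%s' "$1" | fold -w1 | uniq -c | awk '{ printf "%s%s", $1, $2 }'
}
```

For example, rle 0000111 prints 4031 (four 0s, then three 1s).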


Sorry this is off topic.
If I download a compressed file, should I uncompress it first, before scanning for a virus?
That sort of defeats the purpose… what if uncompressing it releases the virus?

@Rosika's case is strange. Why would an audio-only .mp4 file not be scanned?
Maybe audio data is assumed to be virus-free and is never scanned?
I wonder if @Rosika could try an .mp3 file with ClamAV.


I can understand hiding/embedding the payload for a virus or trojan in a video file - but why worry about it on Linux?

I know there were nefarious ways of hiding the payload of a virus/trojan in an AVI or DivX file, which contained ActiveX code that could execute in Windows Media Player. I actually had one: a DivX of “The Rocky Horror Picture Show”. I think I still have the file somewhere…

But why worry on a Linux system? Does VLC execute hidden code in video files?

I thought mp4 was just a stream of data…

I only ever use MPV to play videos and never even thought about scanning audio visual content for vulnerabilities - since I stopped using Windows…

I still have that divx file - clamav doesn’t pick it up as infected - but I know it’s in there:

╭─x@titan ~/Videos/Movies/1975-RockyHorrorPictureShow  
╰─➤  clamscan RockyHorrorPictureShow-Microsoft-ActiveX-Virus.divx                                                                                            
/mnt/BARGEARSE/MVZ/1975-RockyHorrorPictureShow/RockyHorrorPictureShow-Microsoft-ActiveX-Virus.divx: OK

----------- SCAN SUMMARY -----------
Known viruses: 8697640
Engine version: 0.103.11
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.00 MB
Data read: 678.90 MB (ratio 0.00:1)
Time: 12.750 sec (0 m 12 s)
Start Date: 2024:08:30 13:23:18
End Date:   2024:08:30 13:23:30

That file is 21 years old though! Is ActiveX still a thing? I still kinda cringe when I hit a website (usually internally hosted “intranet” page) and part of the URL has an exe file and maybe something about ActiveX…


That is why it did not find your virus… it did not look.

The question is, why is clam not scanning video files?

Some more info

The topic of .MP3s has been discussed here before. They are being purposely ignored, as evidenced by this section of the database:
daily.ftm
0:0:494433:MP3:CL_TYPE_ANY:CL_TYPE_IGNORED
0:0:fffb90:MP3:CL_TYPE_ANY:CL_TYPE_IGNORED

That came from here
https://clamav-users.clamav.narkive.com/wbbsXygk/dynamic-engine-module-for-scanning-media-files-e-g-mp3-mp4-etc

So it would seem clam deliberately does not scan mp3 files, and maybe some mp4 files.
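The quoted entries can be read with ClamAV's file-type magic (FTM) format, assuming its fields are magictype:offset:magicbytes:name:rtype:type. Under that reading, both lines say: if the raw bytes at offset 0 are 49 44 33 ("ID3") or ff fb 90 (a typical MP3 frame header), treat the file as CL_TYPE_IGNORED. The sketch below only imitates that check in shell; it is not ClamAV code, and the function names are hypothetical. If the reading is right, it would also fit the file output further up, where both audio-only .mp4 files start with an ID3 tag.

```shell
#!/bin/sh
# Rough imitation of the two quoted daily.ftm entries (not ClamAV code).
# Assumed field layout: magictype:offset:magicbytes:name:rtype:type
#   0:0:494433:MP3:CL_TYPE_ANY:CL_TYPE_IGNORED  -> bytes "ID3" at offset 0
#   0:0:fffb90:MP3:CL_TYPE_ANY:CL_TYPE_IGNORED  -> MP3 frame header at offset 0

# Print the first COUNT bytes of FILE as one lowercase hex string.
first_bytes_hex() {   # usage: first_bytes_hex FILE COUNT
    dd if="$1" bs=1 count="$2" 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

ftm_ignored() {
    case "$(first_bytes_hex "$1" 3)" in
        494433|fffb90) echo "CL_TYPE_IGNORED" ;;   # matches one of the entries
        *)             echo "not matched" ;;       # these two entries don't apply
    esac
}
```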

Also:
ClamAV has a file size limit of 2 GB. If a file is larger, it returns that same “Data scanned: 0.00 MB” result.
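If you want to catch that case before scanning, a size check is easy. This is a sketch, assuming the limit is 2 GiB (2147483648 bytes); the exact cut-off ClamAV applies may differ slightly, and the function name is hypothetical.

```shell
#!/bin/sh
# Assumed limit: 2 GiB. Files at or above it may not be scanned in full.
LIMIT_BYTES=${LIMIT_BYTES:-2147483648}

# Succeeds (exit 0) if FILE is at least LIMIT_BYTES long.
too_big_for_clamav() {
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -ge "$LIMIT_BYTES" ]
}
```

Usage: if too_big_for_clamav movie.mkv; then echo "too large for clamscan"; fi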


Rosika wrote:
From previous experience I know that for scanning e.g. .ogg files and .mp3 files clamscan needed some workaround by zipping the files first, etc.

This could be because clamscan no longer recognises the compressed files as .ogg and .mp3 and just treats them as ordinary data files.

Neville wrote:
If I download a compressed file, should I uncompress it first, before scanning for a virus?

Clamscan will not recognise a virus in a compressed file, so the file has to be uncompressed first; but will clamscan then scan it? @Rosika has written about using the firejail sandbox, so setting up a sandbox to do the decompression and scan would be a safe way to go.

Thinking about putting a virus in an audio or video file, I wonder if it could be hidden in the metadata; there may be an easy way of executing code from there.

In my previous post I mentioned run-length coding, but another well-known compression idea is variable-length coding, as in the Morse code originally used for telegraphy, where frequent letters get the shortest codes.

I need to know more about formatting in these posts, is there a noddy guide?

Regards Clive


Hi @Clivegg12 ,
This may help

Formatting can be done with Markdown or with HTML
or you can use the buttons at the top of the editing screen for some things.

Regards
Neville


Hi @all, :wave:

sorry I couldn't reply earlier, but I wasn't online yesterday.

Thanks so much for your valued input. :heart:

@Clivegg12 :

I see. Well, that got me interested.
Getting a bit confused (because I've been using this method, suggested by the user Ged on clamav-users@lists.clamav.net, for quite a long time), I tried to do some research on the matter.

In the end I asked ChatGPT for its opinion:

Your forum member’s concern is understandable but based on a misunderstanding of how modern antivirus (AV) engines like ClamAV handle compressed files.

Let me explain why your method remains effective:

How ClamAV Handles Compressed Files

1. Signature Matching in Compressed Files:

Modern AV engines, including ClamAV, are designed to scan files within compressed archives.
When you use gzip to compress a file and then pass it to clamscan, ClamAV decompresses the file internally during the scan process.

The AV engine then inspects the decompressed data to match it against known virus signatures. The compression does not interfere with the scanning because ClamAV works with the original, uncompressed data internally.

2. Virus Signatures and Compression:

Virus signatures are patterns that AV engines look for within files. If a file is compressed, the AV engine decompresses it first to retrieve the original data where the signature might be present.
Compression does change the file’s binary structure, but this is a temporary state. When clamscan decompresses the file, it restores the original sequence of bytes, allowing it to detect any known virus patterns just as it would in an uncompressed file.

3. Real-World Application:

AV engines routinely scan files in compressed formats like .zip, .rar, and .gz. This is because viruses could potentially be distributed inside compressed archives, and it’s crucial for AV tools to detect them.
Therefore, ClamAV is equipped to handle this scenario, ensuring that the virus detection process remains effective even if the file is compressed.

Addressing the Concern Directly

  • Decompression During Scanning:

ClamAV decompresses the data during the scanning process, so any virus that was present in the original uncompressed file would still be detected.
The compression and subsequent decompression are not part of the detection process; they are simply steps to ensure that the AV engine can access and scan the actual content.

  • No Loss of Signature Detection:

The forum member’s concern would be valid if the AV engine could only scan the file in its compressed form without decompressing it. However, since ClamAV does decompress the file, there is no loss in the ability to detect a virus.

Conclusion

The method you described—compressing the file with gzip and then scanning it with clamscan—is still effective.
ClamAV will decompress the file internally and scan the original content for any viruses, so the compression does not hinder its ability to detect threats. The concern raised by your forum member doesn’t apply to how ClamAV processes compressed files.
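The core claim here, that gzip is lossless and decompression gives back exactly the original bytes, is easy to check yourself. A minimal shell sketch (function names hypothetical):

```shell
#!/bin/sh
# gzip is lossless: compressing and then decompressing must reproduce the
# original byte for byte. cmp -s exits 0 only if both inputs are identical.
roundtrip_identical() {
    gzip -c "$1" | gzip -dc | cmp -s - "$1"
}

# The compressed stream itself consists of different bytes, which is why a
# scanner has to decompress before matching signatures.
compressed_differs() {
    ! gzip -c "$1" | cmp -s - "$1"
}
```

Both functions succeed for an ordinary file, e.g. roundtrip_identical All_Along_the_Watchtower.mp4 && echo "identical after round trip".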

If all of this is correct, the method suggested by Ged should still be valid and effective. :wink:

@nevj :

You raise a valid point, Neville.

ChatGPT confirms that clamav is designed to scan files within compressed archives (see above). :+1:

Also from ChatGPT, but I think I read something like that elsewhere as well:

Why clamscan Might Not Scan Certain Files Directly

clamscan is primarily designed to scan for viruses in common file types that are known to potentially contain malware, such as executables, scripts, and document files. Multimedia files like .mp4, .mp3, and .ogg are generally not seen as high-risk for containing malware in the same way, so clamscan might not scan them as thoroughly by default.

It's the same with mp3 files. I have already given Thunar a user-defined right-click entry for that purpose, so that I can scan mp3 files easily.
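A right-click entry like that could call a small script such as the one below. This is a hypothetical sketch, not the actual entry: the script name scan-audio.sh is made up, and it assumes the Thunar custom action passes the selected files via %F (e.g. the command scan-audio.sh %F, started in a terminal so the output stays visible).

```shell
#!/bin/sh
# scan-audio.sh (hypothetical name): pipe each given file through gzip so that
# clamscan reads its data instead of skipping it as an ignored media type.
scan_audio_files() {
    for f in "$@"; do
        echo "== $f"
        gzip -c "$f" | clamscan -
    done
}

scan_audio_files "$@"
```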

Thanks for the link, Neville.

@daniel.m.tripp :

Thanks to you as well, Dan.
Your personal experience is very interesting, as always.

Many greetings to all.

Rosika :slightly_smiling_face:


I am relieved to hear that, thank you @Rosika

We still have not explained why clam refused to scan some uncompressed mp4 files.
That, I believe, was the original question.


Hi Neville, :wave:

you're welcome.

I hope this is a more elaborate answer to your query:

Why clamscan Might Skip or Refuse to Scan Certain MP4 Files

  1. File Type Heuristics:
  • Heuristic Scanning:

Antivirus engines like clamscan use heuristics to decide how thoroughly to scan a file.
This means they rely on certain patterns, file types, or content indicators to determine whether a file is likely to be a threat. Since MP4 files are primarily used for media content and are not typically associated with malware, clamscan may choose to skip or minimally scan these files to save processing time.

  • File Extensions vs. Content:

Even if a file has an .mp4 extension, clamscan might look at the file’s content and structure to decide how to handle it. If it detects that the file is a straightforward media file (especially audio-only), it might not scan the entire file deeply because it assesses that the risk of it containing malware is low.

  2. File Header Analysis:
  • Quick Header Check:

clamscan might perform a quick scan of the file’s header (the initial part of the file that contains metadata about the file format) and determine that the file doesn’t need a full scan. For example, if the header indicates that the file is a standard, well-formed media file without any executable content, clamscan might skip further analysis.

  • Non-Standard Files:

If the MP4 file contains non-standard headers or metadata that clamscan doesn’t fully recognize, it might skip scanning under the assumption that it’s not a typical threat vector.

  3. Resource Optimization:
  • Performance Considerations:

Scanning large multimedia files can be resource-intensive. To optimize performance, clamscan might skip or lightly scan files that it considers to be low-risk. This is particularly true for large video files where the majority of the file is non-executable data (like video frames or audio samples).

  • Default Configurations:

By default, clamscan may be configured to not fully scan large or multimedia files unless explicitly instructed to do so (e.g., with specific flags like --max-filesize or --max-scansize).

  4. MP4 Complexity:
  • Container Complexity:

The MP4 format is a container that can hold different types of data, including video, audio, subtitles, and more. The complexity of these files means that some parts of an MP4 file might not be scanned thoroughly because clamscan might not fully parse every possible type of content within the container.

  • Lack of Executable Code:

Most of the data in an MP4 file is not executable (it’s just media content), which is less likely to carry viruses. Since ClamAV and other AV engines are primarily concerned with executable code, they might skip over non-executable data, reducing the thoroughness of the scan.

Summary

In summary, clamscan might skip or refuse to thoroughly scan certain uncompressed MP4 files because it uses heuristics to determine that these files are low-risk. The combination of performance considerations, file content analysis, and the nature of MP4 files as media containers makes clamscan more likely to focus its efforts on file types that are more likely to pose a threat.

So, while the underlying structure of the file (as we discussed earlier) plays a role, these additional factors contribute to why clamscan might not fully scan some MP4 files by default.

(source: ChatGPT - once again)
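For completeness, the two options mentioned in the quote, --max-filesize and --max-scansize, are real clamscan flags. A sketch of raising them for a single run follows; the 1000M values are arbitrary examples, not recommendations, and the wrapper name is hypothetical. For files as small as the 1.4 MB one in the first post, the default limits are unlikely to be what kept it from being scanned.

```shell
#!/bin/sh
# Hypothetical wrapper: run clamscan with raised size limits for one scan.
# --max-filesize: files larger than this are skipped.
# --max-scansize: maximum amount of data scanned per file (e.g. inside archives).
# The 1000M values are arbitrary examples.
scan_with_higher_limits() {
    clamscan --max-filesize=1000M --max-scansize=1000M "$@"
}
```

Usage: scan_with_higher_limits g-kgw_Victoria.mp4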

Huh, that's a lot to take in :wink:, but it may explain a lot.

Many greetings from Rosika :slightly_smiling_face:


It certainly does, thank you Rosika.
Clam is a much more intelligent program than I envisaged.


you're welcome, Neville. :heart:

Yes, it seems that way.
I certainly learned a lot from the discussion with you and the other members, and from ChatGPT's input.

Thanks so much to all of you.

Cheers from Rosika :slightly_smiling_face:


@Rosika
Thanks for getting that info, which resolves the queries. Clamscan is more capable than I had realised.
Clive


This has been an interesting discussion; glad I read it. I did not have anything to offer, but it was interesting to read.

I have installed clam on many computers, and when I had an issue I ran it as a test just in case, but it has never found any problems or viruses. I have never worried, as they are rare on Linux.

I also don't use compression on any files, but I was really interested in the reply from your ChatGPT about compressed files being unpacked on the fly during scanning.

Thanks for sharing this


Hi again, :wave:

@Clivegg12 :

Thanks for your feedback, Clive.

Yes, I was surprised at clamscan's capabilities as well.
I'm glad you found the info helpful/interesting.

@callpaul.eu :

I'm pleased to learn you found the discussion interesting as well. So it hasn't been a total waste of time. :wink:

It's the same with me. It never came up with any virus problems.
The only thing it ever found was when I scanned my e-mail folder from thunderbird:

LibClamAV info: Suspicious link found!
LibClamAV info:   Real URL:    https://mailing.[XXXX].de
LibClamAV info:   Display URL: https://mailing.[YYYYY].de

But that's an issue introduced by my banking institute, so I consider it a false positive.

Still, I'm glad that clamscan is that attentive. :wink:
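The gist of that warning is that the host shown in the link text differs from the host the link actually points to. The toy check below only illustrates that idea; ClamAV's real phishing heuristics are considerably more involved, and the function names and example domains are hypothetical.

```shell
#!/bin/sh
# Toy version of the "Real URL vs Display URL" idea (not ClamAV's algorithm).

# Extract the host part of a URL: drop the scheme, then any path, then any
# query or fragment, then any port.
url_host() {
    printf '%s\n' "$1" | sed -e 's|^[A-Za-z][A-Za-z0-9+.-]*://||' \
        -e 's|/.*$||' -e 's|[?#].*$||' -e 's|:.*$||'
}

# Succeeds (exit 0) if the real link and the displayed link point to different hosts.
link_suspicious() {   # usage: link_suspicious REAL_URL DISPLAY_URL
    [ "$(url_host "$1")" != "$(url_host "$2")" ]
}
```

For example, link_suspicious "https://mailing.example.com/news" "https://www.example.org" && echo "Suspicious link found" prints the message, because the two hosts differ.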

Yes, you're right. I was pretty impressed by it, too.

Thanks to all of you and many greetings from Rosika. :slightly_smiling_face:


So it scans thunderbird and email links, impressive.

I don't use thunderbird, so I never needed to test it against that, but it's useful to know…

It's gone up in my estimation now, as I don't think Windows Defender does the same with either mail or compressed files.


You probably do - without being aware of it?

PNG, JPG, MPG, MKV, MP4, MP3 - are all compressed…
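A quick way to see this: gzip barely shrinks data that is already compressed. That matches the gzip scan in the first post, where the gzip stream (1.30 MB read) was hardly smaller than the original file (1.34 MB). In the sketch below, random bytes stand in for already-compressed media (an assumption for illustration), and the function name is hypothetical.

```shell
#!/bin/sh
# Print the size in bytes of FILE after gzip compression.
gz_size() {
    gzip -c "$1" | wc -c | tr -d ' '
}

tmp=$(mktemp -d)
head -c 100000 /dev/zero | tr '\0' 'a' > "$tmp/repetitive"   # 100 kB of "a"
head -c 100000 /dev/urandom > "$tmp/random"   # stand-in for compressed media
echo "repetitive: $(gz_size "$tmp/repetitive") bytes after gzip"
echo "random:     $(gz_size "$tmp/random") bytes after gzip"
rm -r "$tmp"
```

The repetitive file shrinks to a tiny fraction of its size, while the random one stays at roughly 100 kB.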


Never thought of it like that, just as a file format. For me, compressed files are zip files.

That's your Windows heritage.


You can take a boy out of windows
But you can’t take windows out of a boy…
