Mat2

Mat2: what is it?

Mat2 is a powerful software tool written in Python for removing metadata from several file formats. It is developed by Julien Voisin, whom we thank for kindly replying to our email.

Mat2 is also available on the Python Package Index (PyPI), the software repository for the Python programming language, and can be installed with the pip install mat2 command.
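After running pip, it can be handy to check whether the package is actually visible to Python. The following is only a minimal sketch: the function names installed_version and command_on_path are our own, hypothetical helpers, and the only thing taken from the text above is the distribution name mat2.

```python
import shutil
from importlib import metadata


def installed_version(distribution="mat2"):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def command_on_path(name="mat2"):
    """Return True if an executable with this name is found on the PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    print("mat2 version seen by pip:", installed_version())
    print("mat2 command on PATH:", command_on_path())
```

If the first line prints None, the package is not installed in the Python environment you are currently using.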

On MacOS, before installing Mat2 itself via pip, you can install its dependencies from the terminal using Homebrew (we briefly described Homebrew in our previous article) with the command:

brew install exiftool cairo pygobject3 poppler gdk-pixbuf librsvg ffmpeg

For those who don’t know, Homebrew is a package manager for MacOS systems.

On the Mat2 GitLab page, there are also directions for installing the Desktop GUI for GNU/Linux.

Mat2 works with ExifTool by Phil Harvey, a platform-independent Perl library plus a command-line application for reading, writing, and modifying meta information in numerous file formats. ExifTool is available for Windows, MacOS, and Unix platforms.

Metadata

Going back to Mat2, we mentioned that the tool reads the metadata present in individual files, which the user can then decide to delete.

What is metadata?

We are not aware of an uncontroversial definition of metadata.

The term “metadata”, as in many other cases (e.g., “digital”, from the Latin “digitus”, finger, which recalls counting on one’s fingers), derives partly from Greek and partly from Latin. It is composed of “meta” (from the Greek, meaning “beyond” or “after”) and “data”, the plural of the Latin “datum”. In short, it refers to information that describes a set of data, often known as “data about data”.

We’ve probably heard the term “metadata” frequently in computer science, where it indicates the additional information contained in a file. Metadata is additional or descriptive information about one or more pieces of data. More correctly, we should refer to information related to a digital, physical, or abstract resource.

In computer science, files usually contain metadata such as, for example, information about:

  • the software that generated them;
  • the author;
  • the date;
  • the time, etc.

Image files (e.g., jpg, jpeg, etc.) from photos taken with a digital camera, smartphone, tablet, or other device carry metadata such as the day, the time, and the location where the photo was taken (if GPS geolocation was active).

Metadata comes in different kinds and is sometimes grouped into categories. One common subdivision distinguishes three main classes:

  1. Descriptive metadata;
  2. Structural metadata;
  3. Administrative metadata.

In particular, concerning IT documents, it is worth mentioning the AgID document entitled “Metadata”, which constitutes Annex 5 to the “Guidelines on the formation, management, and storage of IT documents”.

In this post, we refer only to computer science and metadata related to files.

Metadata and privacy

After this brief metadata description, it’s pretty clear that if we exchange files with others, we also have to worry about the additional related information (metadata), which can often reveal information we don’t want people to know.

Privacy is also crucial and decisive concerning metadata.

Metadata can contain information that allows a natural person to be identified. Moreover, metadata can reveal to the people with whom you share files information that the user often cannot see and may not even know exists, such as the author of a file, the date, the place, etc.

This can sometimes lead to embarrassing situations like the following one.

Alice is working on a confidential project, and she has to deliver the related report to Charlie who commissioned it.

Thinking that support for editing the content might be helpful, Alice gets help from Bob, who completes the activities and saves the file in the final version.

Alice receives the file from Bob and gives it to Charlie.

However, Charlie realizes that the file delivered by Alice contains metadata revealing Bob’s information, which was unrelated to Charlie and the project.

Charlie asks Alice for clarification about Bob, and Alice, faced with the evidence, can do nothing but confess everything with the utmost discomfort.

In addition to being an embarrassing situation, the case described shows how easy it is to disclose even personal information without the knowledge of the person concerned or without any intervention to remove the metadata.
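To make the scenario concrete, here is a minimal sketch of how Charlie could spot Bob’s name, assuming the report is a .docx file. A .docx is a zip archive, and its core properties part (docProps/core.xml) stores fields such as the creator and the last person who modified the file. The function names and the in-memory sample file below are our own, hypothetical illustration, not something taken from Mat2.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of a .docx file.
CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"
NS = {"cp": CP, "dc": DC}


def read_docx_authors(docx_file):
    """Return the creator and last-modified-by fields of a .docx file."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    return {
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
        "last_modified_by": root.findtext(
            "cp:lastModifiedBy", default="", namespaces=NS
        ),
    }


def make_sample_docx():
    """Build a toy archive holding only the core properties part (not a full .docx)."""
    core = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<cp:coreProperties xmlns:cp="{CP}" xmlns:dc="{DC}">'
        "<dc:creator>Alice</dc:creator>"
        "<cp:lastModifiedBy>Bob</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_docx_authors(make_sample_docx()))
```

With a real report, you would pass the path of the .docx file to read_docx_authors instead of the toy sample.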

Often, our common applications add metadata, which should make us pay attention to the software resources we use. There are applications available - for example, for writing - that allow you to work with greater peace of mind because they do not save additional information (metadata) within the file.

Choosing minimalist and straightforward solutions is sometimes not wrong, although it depends on the conditions and the context.

In our humble opinion, for those who have to write (we’ll dedicate a separate post to the subject), it is preferable to use so-called “distraction-free” applications. With those applications, we can write without distraction in plain text with markdown and then save the content, possibly without metadata within the file. Our choice is along these lines: several editors allow you to write in markdown, and our experience with them has been very positive in terms of productivity.

Data protection and privacy are paramount in our lives.

Once again, we reiterate the principle that users should have full control over their personal data. That control can be exercised by deleting metadata and, in addition, by developing applications under the principles of “Data protection by design and by default” described in Article 25 of EU Regulation 2016/679 (GDPR), also known as “Privacy by Design”.

So, if we value our privacy, we should pay attention to any metadata contained in the files we use and may share with others.
Thinking about your privacy and not disregarding it means taking care of yourself.

Mat2 commands

Mat2 lets you permanently remove metadata from files, and its use is quite simple.

Mat2 runs from the terminal. Once it is installed as explained above, we launch the Terminal app on the Mac and then type:

  • To list all the Mat2 options (the complete list is also available on the developer’s GitLab page):
mat2 -h
or
mat2 --help
  • To find out what file formats Mat2 handles (the -l parameter stands for list):
mat2 -l
  • To see if a file contains metadata (the -s parameter stands for show):
mat2 -s filename
  • To remove the metadata from a file, for example the one checked in the previous step (the -V parameter stands for verbose, i.e., it prints information about what the tool does; the V is uppercase because the lowercase -v prints the version):
mat2 -V filename
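For those who prefer to drive the commands above from a script, the following is a minimal sketch in Python. It only builds and runs the same command lines listed above; the names ACTIONS, mat2_command, and run_mat2 are our own, hypothetical helpers and are not part of Mat2.

```python
import subprocess

# Flags taken from the commands listed above; the action names are our own labels.
ACTIONS = {
    "help": ["-h"],
    "list": ["-l"],
    "show": ["-s"],
    "clean": ["-V"],
}

# Actions that operate on a file and therefore need a filename.
NEEDS_FILE = {"show", "clean"}


def mat2_command(action, filename=None):
    """Build the argument list for one of the mat2 invocations shown above."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    command = ["mat2", *ACTIONS[action]]
    if action in NEEDS_FILE:
        if not filename:
            raise ValueError(f"action {action!r} needs a filename")
        command.append(filename)
    return command


def run_mat2(action, filename=None):
    """Run mat2 (it must be installed) and return the completed process."""
    return subprocess.run(
        mat2_command(action, filename),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Only print the command lines here; call run_mat2 to actually execute them.
    print(" ".join(mat2_command("show", "mat2.docx")))
    print(" ".join(mat2_command("clean", "mat2.docx")))
```

Passing the arguments as a list, rather than as a single string through the shell, avoids problems with filenames that contain spaces.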

When removing metadata, note that Mat2 does not delete or modify the file you run it on: it leaves the original as it is and creates a new file named filename.cleaned.extension.
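The naming rule just described can be sketched in a few lines of Python. This is only our illustration of the convention for simple names with a single extension; the function cleaned_name is hypothetical and is not Mat2’s own code.

```python
import os.path


def cleaned_name(path):
    """Insert '.cleaned' between the file name and its extension."""
    stem, extension = os.path.splitext(path)
    return f"{stem}.cleaned{extension}"


if __name__ == "__main__":
    for name in ("giustizia.jpg", "mat2.docx", "mat2.pdf"):
        print(name, "->", cleaned_name(name))
```

A script can use a helper like this to know in advance where to look for the cleaned copy after running Mat2.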

Mat2 in action

To better explain how Mat2 acts on files, we have prepared three different videos of a few seconds.

We performed a test on the following file formats:

  • jpg
  • docx
  • pdf

In each video, we typed the command ls -la, which returns the detailed list of files in the folder. That shows how Mat2, when removing metadata, creates a new file by inserting cleaned between the filename and its extension (in our case, for example, the file mat2.docx became mat2.cleaned.docx).

In the first example, we used Mat2 with the file giustizia.jpg (extension .jpg):

In the second example, we used Mat2 with the file mat2.docx (extension .docx - Word file):

In the third example, we used Mat2 with the file mat2.pdf (extension .pdf - file created by Word):

Mat2-web

There is also a web version of Mat2, Mat2-web, which you can install on your own server if you wish.

It is a valuable resource that spares you from typing the commands above and lets you use Mat2 directly from the browser.


Mat2-web - image from the developer’s GitLab page

For a demonstration of what the Mat2-web service looks like, we refer to the “Demo” instance set up by the developer.

However, we preferred not to install this service, for security reasons: we wanted to avoid possible attacks, a risk the developer also points out.



Stay tuned!