Using Diff to Spot the Difference in Text Files


Imagine you have two files with only minor differences between them.
Spotting those differences by eye can be difficult.

Thankfully, there is a command called diff that will help us out.

diff --help
Usage: diff [OPTION]... FILES
Compare files line by line.

But first, we need some files with a couple of differences in them.

This article makes use of a collection of random files. These files were put together to help you “Tweak Your Terminal”. Check out our guide to setting up the random files.

We can use the sed command to slightly change one of the random files.

sed -e 's/^baz;/bay;/g' strandberg.txt > strandberg.diff.txt

Now that we have a file with a small difference, you can use the diff command to see it.

diff strandberg.txt strandberg.diff.txt
3c3
< baz; Bar! Foo? Hoo? Baz. Foo bar, Foo Foo Baz! Foo bar, baz. Bar. Foo;
---
> bay; Bar! Foo? Hoo? Baz. Foo bar, Foo Foo Baz! Foo bar, baz. Bar. Foo;
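If you haven't set up the random files, the following minimal sketch reproduces the same comparison. It assumes a POSIX shell with sed and diff available. The file name sample.txt and its contents are hypothetical stand-ins for strandberg.txt, built from the three lines shown in this article's output.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for strandberg.txt (three lines from the examples).
printf '%s\n' \
  'Foo? Hoo? Baz. Foo bar, Foo Foo Baz! Foo bar, baz? Bar? Foo! Hoo?' \
  'Foo Foo Baz. Foo bar, baz. Bar; Foo. Hoo! Baz! Foo bar, Foo Foo Baz!' \
  'baz; Bar! Foo? Hoo? Baz. Foo bar, Foo Foo Baz! Foo bar, baz. Bar. Foo;' \
  > sample.txt

# Same substitution as above: only a line starting with "baz;" changes.
sed -e 's/^baz;/bay;/g' sample.txt > sample.diff.txt

# diff exits with status 1 when the files differ; "|| true" keeps that
# from aborting a script that runs with "set -e".
result=$(diff sample.txt sample.diff.txt) || true
printf '%s\n' "$result"
```

The printed result should have the same 3c3 shape as the output above.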

The diff command lists every difference between the two files. The first file given is the left file, and the second file is the right file.

For each part of the file that is different, you will see the location and the difference.

The location is displayed as “3c3”: line 3 of the left file was changed (c) into line 3 of the right file.
Then come the actual differences.
Lines from the left file are shown with the less-than symbol (<), lines from the right file with the greater-than symbol (>), and the two sides are separated by ---.
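Besides printing the differences, diff also reports its result through its exit status: 0 when the files are identical, 1 when they differ, and 2 when something goes wrong, such as a missing file. A small sketch using hypothetical throwaway files:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'apple\nbanana\n' > left.txt
printf 'apple\ncherry\n' > right.txt
cp left.txt same.txt

# Identical files: no output, exit status 0.
status_same=0
diff left.txt same.txt || status_same=$?

# Different files: "< banana" comes from left.txt, "> cherry" from right.txt.
status_diff=0
diff left.txt right.txt || status_diff=$?

# A missing file is an error: exit status 2 (error message discarded here).
status_missing=0
diff left.txt no-such-file.txt 2>/dev/null || status_missing=$?

echo "identical: $status_same, different: $status_diff, missing: $status_missing"
```

Swapping the two file arguments swaps which lines are marked with < and which with >.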

In the world of programming, diff is very useful. It lets you see the changes between one version of a file and another. However, in many programming languages, whitespace has no effect on how the program runs. That is handy, because some people like to indent with spaces and others with tabs.

Let's modify our previous example so that it also changes the whitespace.

sed -e 's/^baz;/bay;/g' -e 's/ /  /g' strandberg.txt > strandberg.diff.txt

Note: It might not be obvious, but the second sed expression, 's/ /  /g', replaces every single space with two spaces.
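To see what that second expression does on its own, you can feed sed a single line. This sketch also uses sed's l command (an addition, not part of the original example), which prints the line in an unambiguous form with its end marked by $, making the doubled spaces easier to count.

```shell
# A single line with one space between words.
before='Foo bar, baz.'

# The second -e expression replaces every single space with two spaces.
after=$(printf '%s\n' "$before" | sed -e 's/ /  /g')
printf '%s\n' "$after"

# "sed -n l" prints the line unambiguously, ending it with "$".
printf '%s\n' "$after" | sed -n l
```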

If we compare the files now, every line is reported as different because of the whitespace change we made.

diff strandberg.txt strandberg.diff.txt 
1,6c1,6
< Foo? Hoo? Baz. Foo bar, Foo Foo Baz! Foo bar, baz? Bar? Foo! Hoo?
< Foo Foo Baz. Foo bar, baz. Bar; Foo. Hoo! Baz! Foo bar, Foo Foo Baz!
< baz; Bar! Foo? Hoo? Baz. Foo bar, Foo Foo Baz! Foo bar, baz. Bar. Foo;
---
> Foo?  Hoo?  Baz.  Foo  bar,  Foo  Foo  Baz!  Foo  bar,  baz?  Bar?  Foo!  Hoo?
> Foo  Foo  Baz.  Foo  bar,  baz.  Bar;  Foo.  Hoo!  Baz!  Foo  bar,  Foo  Foo  Baz!
> bay;  Bar!  Foo?  Hoo?  Baz.  Foo  bar,  Foo  Foo  Baz!  Foo  bar,  baz.  Bar.  Foo;

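Note: When a location contains a comma, e.g. 1,6c2,3, the two numbers are the start and end lines of a range. In that example, lines 1 to 6 of the left file were changed into lines 2 to 3 of the right file.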
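The letter between the locations tells you the kind of change: c for changed lines, d for lines deleted from the left file, and a for lines added in the right file. The following sketch uses hypothetical files to produce a range as well as both of the other letters.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Lines 2 and 3 are replaced, so diff reports the range 2,3c2,3.
printf 'a\nb\nc\nd\n' > left.txt
printf 'a\nX\nY\nd\n' > right.txt
range_out=$(diff left.txt right.txt) || true

# "two" is deleted and "four" is appended: one "d" hunk and one "a" hunk.
printf 'one\ntwo\nthree\n' > old.txt
printf 'one\nthree\nfour\n' > new.txt
ad_out=$(diff old.txt new.txt) || true

printf '%s\n' "$range_out" "$ad_out"
```

Here 2d1 means line 2 of the left file was deleted (the right file continues after its line 1), and 3a3 means line 3 of the right file was added after line 3 of the left file.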

To make diff compare only the text and ignore the whitespace change, use the -w (--ignore-all-space) option.

diff -w strandberg.txt strandberg.diff.txt
3c3
< baz; Bar! Foo? Hoo? Baz. Foo bar, Foo Foo Baz! Foo bar, baz. Bar. Foo; Hoo! Baz!
---
> bay;  Bar!  Foo?  Hoo?  Baz.  Foo  bar,  Foo  Foo  Baz!  Foo  bar,  baz.  Bar.  Foo;  Hoo!  Baz!
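diff also has a gentler option, -b (--ignore-space-change), which ignores changes in the amount of whitespace but still notices whitespace that appears or disappears inside a line, whereas -w ignores whitespace entirely. The sketch below contrasts the two using hypothetical files, including the tabs-versus-spaces case mentioned earlier.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Same code, indented with a tab on the left and four spaces on the right.
printf 'if x:\n\treturn 1\n' > tabs.txt
printf 'if x:\n    return 1\n' > spaces.txt

# A space inserted where there was none before.
printf 'foobar\n' > joined.txt
printf 'foo bar\n' > split.txt

# Print diff's exit status for the given options and files (0 = no differences).
check() {
  status=0
  diff "$@" > /dev/null || status=$?
  echo "$status"
}

echo "tabs vs spaces, -b:    $(check -b tabs.txt spaces.txt)"
echo "foobar vs foo bar, -b: $(check -b joined.txt split.txt)"
echo "foobar vs foo bar, -w: $(check -w joined.txt split.txt)"
```

In this article's example, where every single space became two, -b would also have been enough.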

Conclusion

The diff command is an invaluable tool on Unix-like operating systems and a linchpin for comparing files. Because it shows exactly what changed between two files, diff has become an essential command for developers, system administrators, and everyday users alike.

Whether you’re comparing simple text files or analyzing changes between complex source files, diff gives you an efficient way to pinpoint additions, deletions, and modifications. Its alternative output formats, such as the unified format (-u) and the context format (-c), offer other ways to view those changes to suit different needs.
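As an illustration of the unified format, here is a sketch with two hypothetical three-line files. In unified output, unchanged context lines start with a space, removed lines with -, and added lines with +, while the @@ line gives the starting line and line count for each side.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'a\nb\nc\n' > old.txt
printf 'a\nB\nc\n' > new.txt

# -u selects the unified format (three lines of context by default).
# The "---" and "+++" header lines also include each file's timestamp.
unified=$(diff -u old.txt new.txt) || true
printf '%s\n' "$unified"
```

Running diff -c old.txt new.txt on the same files shows the context format instead.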

Additionally, diff is not limited to displaying differences. Its output can be saved as a patch, which the patch command can then apply to another copy of the original file. This workflow has become a cornerstone of version control systems and collaborative coding.
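A minimal sketch of creating and applying a patch, assuming the patch utility is installed (it usually is on development machines, but may be missing from minimal systems). The file names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'a\nb\nc\n' > old.txt
printf 'a\nB\nc\n' > new.txt

# Record the changes as a unified diff (diff exits 1 because the files differ).
diff -u old.txt new.txt > change.patch || true

# Apply the patch to a copy of the original file.
cp old.txt patched.txt
patch patched.txt < change.patch

# cmp exits 0 when the patched copy now matches new.txt.
cmp patched.txt new.txt && echo "patched.txt matches new.txt"
```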

However, diff can do far more than we’ve covered in this guide. It offers many options for more specific use cases. For instance, --ignore-case (-i) and --ignore-all-space (-w) add even more flexibility to your file comparisons.
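For example, --ignore-case (-i) treats upper-case and lower-case letters as the same. A quick sketch with two hypothetical one-line files:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Foo Bar\n' > upper.txt
printf 'foo bar\n' > lower.txt

# Without -i the lines differ (status 1); with -i they compare equal (status 0).
plain=0
diff upper.txt lower.txt > /dev/null || plain=$?
ignore_case=0
diff -i upper.txt lower.txt > /dev/null || ignore_case=$?

echo "plain: $plain, with -i: $ignore_case"
```

Options can be combined, so diff -i -w ignores both case and whitespace differences.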

In essence, mastering the diff command lets you track file changes with confidence and precision, whether you’re reconciling different versions of code, comparing configuration files, or simply checking for alterations in text documents. Remember, you can always run man diff or diff --help to discover even more of its options.

Daniel

Whilst building web applications, Daniel also sets up web servers from scratch because he has yet to find the perfect hosting solution. His philosophy is “Why settle, when you can build it better yourself?”
