Python Training by Dan Bader

Working With File I/O in Python

Learn the basics of working with files in Python. How to read from files, how to write data to them, what file seeks are, and why files should be closed.

Reading and Writing Files in Python

In this tutorial you’ll learn how to work with files using Python.

Reading and writing files is an important feature in any programming language. Without it, all variables and information are stored in volatile memory that is lost when the computer is shut down or the program ends. When you save data to a permanent file, you can retrieve it at a later date without worry.

Here’s what we’ll cover:

  • The difference between binary and text files
  • Where to find Python’s built-in file I/O functions and tools
  • How to open and close files in Python
  • The various ways to read data from a file in Python
  • How to write data to a file object in Python
  • File seeks in Python and moving the read/write pointer
  • Editing an existing text file with Python

Let’s get started!

Binary vs Text Files in Python

There are two separate types of files that Python handles: binary and text files. Knowing the difference between the two is important because of how they are handled.

Most files that you use during your normal computer use are actually binary files, not text. That’s right, that Microsoft Word .doc file is actually a binary file, even if it just has text in it. Other examples of binary files include:

  • Image files including .jpg, .png, .bmp, .gif, etc.
  • Database files including .mdb, .frm, and .sqlite
  • Documents including .doc, .xls, .pdf, and others.

That’s because these files all require special handling and a specific type of software to open them. For example, you need Excel to open an .xls file, and a database program to open a .sqlite file.

A text file, on the other hand, contains only characters stored in a standard encoding such as ASCII or UTF-8, so it can be opened by a standard text editor without any special handling. Still, every text file must adhere to a set of rules:

  • Text files have to be readable as is. They can (and often do) contain a lot of markup, as in HTML or other markup languages, but you’ll still be able to tell what the text says.
  • Data in a text file is organized by lines. In most cases, each line is a distinct element, whether it’s a line of instruction or a command.

Additionally, text files all have an unseen character at the end of each line which lets the text editor know that there should be a new line. When interacting with these files through programming, you can take advantage of that character. In Python, it is written as "\n".
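To make the difference concrete, here is a minimal sketch that writes a small file and reads it back twice, once in text mode and once in binary mode. The file name demo.txt and the temporary folder are assumptions made for this illustration only.

```python
import os
import tempfile

# Create a throwaway file so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("first line\nsecond line\n")

# Text mode decodes the stored bytes into a str.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Binary mode returns the raw bytes exactly as they are stored.
with open(path, "rb") as f:
    raw = f.read()

print(repr(text))         # 'first line\nsecond line\n'
print(repr(raw))          # b'first line\nsecond line\n'
print(text.splitlines())  # ['first line', 'second line']
```

The "\n" characters are visible in both results, and splitlines() shows how you can use them to break the text into lines.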

Where to Find Python’s File I/O Tools

When working in Python, you don’t have to worry about importing any specific external libraries to work with files. Python comes with “batteries included” and the file I/O tools and utilities are a built-in part of the core language.

In other languages like C++, to work with files you have to enable the file I/O tools by including the correct header file, for example #include <fstream>. And if you are coding in Java, you need the import java.io.* statement.

With Python, this isn’t necessary. Instead, Python has a built-in set of functions that handle everything you need to read and write to files. We’ll now take a closer look at them.

Opening a File in Python

The first function that you need to know is open(). In both Python 2 and Python 3, this function returns a file object for the file and mode given in its parameters. The basic function usage for open() is the following:

file_object = open(filename, mode)

In this instance, filename is the name of the file that you want to interact with, with the file extension included. That is, if you have a text file that is workData.txt, your filename is not just "workData". It’s "workData.txt".

You can also specify the exact path that the file is located at, such as “C:\ThisFolder\workData.txt”, if you’re using Windows.

Remember, however, that a single backslash inside a string literal tells Python that an escape sequence follows (for example, "\t" is a tab character). So there’s a problem here, because these two meanings will conflict…

Thankfully, Python has two ways to deal with this. The first is to use double backslashes like so: "C:\\ThisFolder\\workData.txt". The second is to use forward slashes: "C:/ThisFolder/workData.txt".
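As a quick check, the sketch below compares the spellings of the example path. It also shows a raw string (the r prefix), which is a third option not mentioned above. Nothing is opened here, so the code runs on any operating system.

```python
# Three spellings of the same Windows path from the example above.
escaped = "C:\\ThisFolder\\workData.txt"  # doubled backslashes
raw = r"C:\ThisFolder\workData.txt"       # raw string: backslashes are taken literally
forward = "C:/ThisFolder/workData.txt"    # forward slashes, which Windows also accepts

print(escaped == raw)         # True
print(escaped)                # C:\ThisFolder\workData.txt
print(len("\t"), len("\\t"))  # 1 2
```

The last line shows why the escaping matters: "\t" is a single tab character, while "\\t" is a backslash followed by the letter t.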

The mode in the open function tells Python what you want to do with the file. There are multiple modes that you can specify when dealing with text files.

  • 'w' – Write Mode: This mode is used when the file needs to be altered and information changed or added. Keep in mind that this erases the existing file to create a new one. File pointer is placed at the beginning of the file.
  • 'r' – Read Mode: This mode is used when the information in the file is only meant to be read and not changed in any way. File pointer is placed at the beginning of the file.
  • 'a' – Append Mode: This mode adds information to the end of the file automatically. File pointer is placed at the end of the file.
  • 'r+' – Read/Write Mode: This is used when you will be making changes to the file and reading information from it. The file pointer is placed at the beginning of the file.
  • 'a+' – Append and Read Mode: A file is opened to allow data to be added to the end of the file and lets your program read information as well. File pointer is placed at the end of the file.

When you are using binary files, you will use the same mode specifiers. However, you add a b to the end. So a write mode specifier for a binary file is 'wb'. The others are 'rb', 'ab', 'r+b', and 'a+b' respectively.

In Python 3, there is one new mode that was added:

  • 'x' – Exclusive Creation Mode: This mode is used exclusively to create a file. If a file of the same name already exists, the function call will fail.
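Here is a small sketch of how the 'x' mode behaves, assuming a throwaway file called report.txt in a temporary folder (both names are made up for the example). The second open() call raises FileExistsError instead of overwriting the file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "report.txt")  # hypothetical file name

# The first call succeeds because the file does not exist yet.
with open(path, "x") as f:
    f.write("created once\n")

# The second call fails instead of silently overwriting the file.
try:
    with open(path, "x") as f:
        f.write("this never runs\n")
    second_attempt = "succeeded"
except FileExistsError:
    second_attempt = "refused"

print(second_attempt)  # refused
```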

Let’s go through an example of how to open a file and set the access mode.

When using the open() function, you’d typically assign its result to a variable. Given a file named workData.txt, the proper code to open the file for reading and writing would be the following:

data_file = open("workData.txt", "r+")

This creates an object called data_file that we can then manipulate using Python’s file object methods.

We used the 'r+' access mode in this code example which tells Python that we want to open the file for reading and writing. This gives us a lot of flexibility, but often you might want to restrict your program to just reading or just writing to a file and this is where the other modes come in handy.

Closing a File in Python

Knowing how to close a file is important when you’re reading and writing.

It frees up system resources that your program is using for I/O purposes. When writing a program that has space or memory constraints, this lets you manage your resources effectively.

Also, closing a file ensures that any pending data is written out to the underlying storage system, for example, your local disk drive. By explicitly closing the file you ensure that any buffered data held in memory is flushed out and written to the file.

The method to close a file in Python is simply fileobject.close(). Using the data_file file object that we created in the previous example, the command to close it would be:

data_file.close()

After you close a file, you can’t access it any longer until you reopen it. Attempting to read from or write to a closed file object will raise a ValueError exception:

>>> f = open("/tmp/myfile.txt", "w")
>>> f.close()
>>> f.read()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    f.read()
ValueError: I/O operation on closed file.

In Python, the best practice for opening and closing files uses the with keyword. This keyword closes the file automatically after the nested code block completes:

with open("workData.txt", "r+") as workData:
    # File object is now open.
    # Do stuff with the file:
    workData.read()

# File object is now closed.
# Do other things...

If you don’t use the with keyword or call fileobject.close() yourself, Python will eventually close and destroy the file object through its built-in garbage collector. However, depending on your code, this garbage collection can happen at any time.

So it’s recommended to use the with keyword in order to control when the file will be closed—namely after the inner code block finishes executing.
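The following sketch illustrates that the file is closed even when the code inside the with block fails. The RuntimeError is raised on purpose for the demonstration, and the file lives in a temporary folder.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "workData.txt")

try:
    with open(path, "w") as work_data:
        work_data.write("This data is on line 1\n")
        raise RuntimeError("simulated failure")  # deliberately triggered for the demo
except RuntimeError:
    pass

# The with block closed the file on the way out, despite the error.
print(work_data.closed)  # True

# The buffered data was flushed to disk when the file was closed.
with open(path, "r") as f:
    saved = f.read()
print(repr(saved))  # 'This data is on line 1\n'
```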

Working With Python File Objects

Once you’ve successfully opened a file, you can use built-in methods to deal with the new file object. You can read data from it, or write new data to it. There are also other operations like moving the “read/write pointer”, which determines where in the file data is read from and where it is written to. We’ll take a look at that a little later in the tutorial.

Next up you’ll learn how to read data from a file you’ve opened:

Reading Data From a File in Python

Reading a file’s contents uses the fileobject.read(size) method. By default, this method will read the entire file and return it as either a string (in text mode) or as a bytes object (in binary mode). It doesn’t print anything by itself; you have to pass the result to print() if you want to see it on the console.

You have to be careful when using the default size, however. If the file you’re reading is larger than your available memory, you won’t be able to access the entire file all at once. In a case like this, you need to use the size parameter to break it up into chunks your memory can handle.

The size parameter tells the read method the maximum number of characters (in text mode) or bytes (in binary mode) to read and return. So let’s assume that our “workData.txt” file has the following text in it:

This data is on line 1
This data is on line 2
This data is on line 3

Then if you wrote the following program in Python 3:

with open("workData.txt", "r+") as work_data:
    print("This is the file name: ", work_data.name)
    line = work_data.read()
    print(line)

You’ll get this output:

This is the file name:  workData.txt
This data is on line 1
This data is on line 2
This data is on line 3

On the other hand, if you tweak the third line to say:

line = work_data.read(6)

You’ll get the following output:

This is the file name:  workData.txt
This d

As you can see, the read operation only read the first 6 characters of the file, which is the size we passed to the read() call above. That way you can limit how much data is read from a file in one go.

If you read from the same file object again, it will continue reading data where you left off. That way you can process a large file in several smaller “chunks.”
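A chunked read loop could look roughly like this. The chunk size of 10 characters is arbitrary and only chosen to keep the output small. In practice you would pick something much larger.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "workData.txt")
with open(path, "w") as f:
    f.write("This data is on line 1\nThis data is on line 2\n")

chunks = []
with open(path, "r") as work_data:
    while True:
        chunk = work_data.read(10)  # at most 10 characters per call
        if not chunk:               # an empty string signals the end of the file
            break
        chunks.append(chunk)

print(repr(chunks[0]))  # 'This data '
print(len(chunks))      # 5
```

Each call to read(10) continues where the previous one stopped, so joining the chunks gives back the complete file contents.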

Reading Text Files Line-by-Line With readline()

You can also parse data in a file by reading it line by line. This can let you scan an entire file line by line, advancing only when you want to, or let you see a specific line.

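The fileobject.readline(size) method returns the next line of the file, starting at the current position of the file pointer. On a freshly opened file that is the first line. The optional integer size parameter doesn’t select a line; it limits how many characters of the current line are returned.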

For example:

with open("workData.txt", "r+") as work_data:
    print("This is the file name: ", work_data.name)
    line_data = work_data.readline()
    print(line_data)

This would return the output of:

This is the file name:  workData.txt
This data is on line 1

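Putting a 2 or a 3 as the size argument will not return the second or third line. It returns at most the first 2 or 3 characters of the current line. To get the second or third line, call readline() two or three times in a row, because each call continues where the previous one stopped.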

A similar method is the fileobject.readlines() call (notice the plural), which returns every line as an item in a list. If you did a call of:

print(work_data.readlines())

You would get the following output:

['This data is on line 1\n', 'This data is on line 2\n', 'This data is on line 3\n']

As you can see, this reads the whole file into memory and splits it up into several lines. This is mainly useful for text files, however. A binary file is just a blob of data; it doesn’t really have a concept of what a single line is.
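If you want to verify how readline() and readlines() behave, a short experiment like the one below can help. It recreates the three-line example file in a temporary folder so that it can run on its own.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "workData.txt")
with open(path, "w") as f:
    f.write("This data is on line 1\nThis data is on line 2\nThis data is on line 3\n")

with open(path, "r") as work_data:
    partial = work_data.readline(4)    # at most 4 characters of the current line
    rest = work_data.readline()        # the remainder of that same line
    second = work_data.readline()      # the next call moves on to line 2
    remaining = work_data.readlines()  # everything that is left, as a list

print(repr(partial))  # 'This'
print(repr(rest))     # ' data is on line 1\n'
print(repr(second))   # 'This data is on line 2\n'
print(remaining)      # ['This data is on line 3\n']
```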

Processing an Entire Text File Line-By-Line

The easiest way to process an entire text file line-by-line in Python is by using a simple loop:

with open("workData.txt", "r+") as work_data:
    for line in work_data:
        print(line, end="")  # each line already ends with a newline

This has the following output:

This data is on line 1
This data is on line 2
This data is on line 3

This approach is very memory-efficient, because we’ll be reading and processing each line individually. This means our program never needs to read the whole file into memory at once. Thus, iterating over the file object is a comfortable and efficient way to process a big text file in smaller chunks.
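As an illustration of doing real work inside such a loop, this sketch counts the lines of a file and remembers the longest one, without ever holding more than one line in memory. The sample contents are invented for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "workData.txt")
with open(path, "w") as f:
    f.write("short\na much longer line\nmid size\n")

line_count = 0
longest = ""
with open(path, "r") as work_data:
    for line in work_data:        # one line in memory at a time
        line_count += 1
        text = line.rstrip("\n")  # drop the trailing newline
        if len(text) > len(longest):
            longest = text

print(line_count)  # 3
print(longest)     # a much longer line
```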

Writing to a File With Python Using write()

Files wouldn’t be any good if you couldn’t write data to them. So let’s discuss that.

Remember that when you open a file in one of the write or append modes, Python will create the file if one doesn’t already exist. (The read modes 'r' and 'r+' raise an error instead.) When creating a file for the first time, you should use either the a+ or w+ mode.

Often it’s preferable to use the a+ mode because the data will default to be added to the end of the file. Using w+ will clear out any existing data in the file and give you a “blank slate” to start from.

The default method of writing to a file in Python is using fileobject.write(data). For example, you could add a new line to our “workData.txt” file by using the following code:

work_data.write("This data is on line 4\n")

The \n acts as the new line indicator, moving subsequent writes to the next line.
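The difference between appending and overwriting is easy to see in a small experiment. This sketch uses a scratch file named log.txt in a temporary folder (a made-up name) and reads the contents back after each step.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")  # hypothetical file name

with open(path, "w") as f:  # creates the file
    f.write("first run\n")

with open(path, "a") as f:  # keeps existing data, writes at the end
    f.write("appended\n")

with open(path, "r") as f:
    after_append = f.read()

with open(path, "w") as f:  # truncates the file before writing
    f.write("fresh start\n")

with open(path, "r") as f:
    after_write = f.read()

print(repr(after_append))  # 'first run\nappended\n'
print(repr(after_write))   # 'fresh start\n'
```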

If you want to write something that isn’t a string to a text file, such as a series of numbers, you have to convert or “cast” them to strings, using conversion code.

For example, if you wanted to add the integers 1234, 5678, 9012 to the work_data file, you’d do the following. First, you cast your non-strings as a string, then you write that string to your file object:

values = [1234, 5678, 9012]

with open("workData.txt", "a+") as work_data:
    for value in values:
        str_value = str(value)
        work_data.write(str_value)
        work_data.write("\n")

File Seeks: Moving the Read/Write Pointer

Remember that when you open a file using the a+ mode, your file pointer is always going to be at the end of the file. So taking the above code where we’ve written the three numbers, if you call the fileobject.read() method, you’re not going to get anything in return. That’s because read() only looks after the pointer to find additional text, and there is nothing after the end of the file.

What you need to do then, is move the pointer back to the beginning of the file. The easiest way to do this is to use the fileobject.seek(offset, from_what) method. In this method, you put the pointer at a specific spot.

The offset is the position to move to, counted from the reference point given by the from_what parameter (Python’s documentation calls this parameter whence). The from_what parameter has three possible values:

  • 0 – indicates the beginning of the file
  • 1 – indicates the current pointer position
  • 2 – indicates the end of the file

When you’re working with text files (those that have been opened without a b in the mode), you can only seek relative to the beginning of the file, which is the default from_what of 0. The one exception is seek(0, 2), which will take you to the end of the file.

So by using work_data.seek(3, 0) on our “workData.txt” file, you will place the pointer at the 4th character (remember that Python starts counting at 0). If you then run the line-printing loop, you would get an output of:

s data is on line 1
This data is on line 2
This data is on line 3

If you want to check the current position of the pointer, you can use the fileobject.tell() method, which returns an integer for where the pointer is in the current file. If we want to find how long our current work_data file is, we can use the following code:

with open("workData.txt", "a+") as work_data:
    print(work_data.tell())

For the original three-line file, this prints 69, which is the size of the file in bytes (three lines of 22 characters plus one newline character each).
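Putting seek() and tell() together, here is a minimal sketch with a one-line file. The newline="\n" argument is only there so that the byte count is the same on every operating system.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "workData.txt")
with open(path, "w", newline="\n") as f:
    f.write("This data is on line 1\n")

with open(path, "a+") as work_data:
    end_position = work_data.tell()  # append mode starts at the end of the file
    nothing = work_data.read()       # so there is nothing left to read
    work_data.seek(0)                # jump back to the beginning
    everything = work_data.read()

print(end_position)      # 23
print(repr(nothing))     # ''
print(repr(everything))  # 'This data is on line 1\n'
```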

Editing an Existing Text File with Python

There will come a time when you need to edit an existing file rather than just append data to it. You can’t just use w+ mode to do it. Remember that the w modes empty the file as soon as it is opened, so even with fileobject.seek() the old contents are already gone. And a+ will always write any data at the end of the file.

The easiest way to do it involves pulling the entire file out and creating a list or array data type with it. Once the list is created, you can use the list.insert(i, x) method to insert your new data. Once the new list is created, you can then join it back together and write it back to your file.

Remember that for list.insert(i, x), i is an integer index into the list. The data of x is then placed before the item that currently sits at index i.

For example, using our “workData.txt” file, let’s say we needed to insert the text line, “This goes between line 1 and 2” in between the first and second lines. The code to do it is:

# Open the file as read-only
with open("workData.txt", "r") as work_data:
    work_data_contents = work_data.readlines()

work_data_contents.insert(1, "This goes between line 1 and 2\n")

# Re-open in write mode to overwrite the old file
with open("workData.txt", "w") as work_data:
    # Join the list back into one string before writing it out
    new_contents = "".join(work_data_contents)
    work_data.write(new_contents)

Once this code runs, if you do the following:

with open("workData.txt", "r") as work_data:
    for line in work_data:
        print(line, end="")  # each line already ends with a newline

You’ll get an output of:

This data is on line 1
This goes between line 1 and 2
This data is on line 2
This data is on line 3

This demonstrated how to edit an existing text file in Python, inserting a new line of text at exactly the place you wanted.
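If you need this more than once, the same read, insert, and rewrite steps can be wrapped in a small function. The sketch below is one possible way to do it. The name insert_line is a hypothetical helper invented for this example and is not part of Python. It uses writelines() in place of join() followed by write(), and it runs against a temporary copy of the example file.

```python
import os
import tempfile


def insert_line(path, index, text):
    """Insert text as a new line before the line at position index.

    Hypothetical helper for illustration; it reads the whole file into memory.
    """
    with open(path, "r") as f:
        lines = f.readlines()
    lines.insert(index, text + "\n")
    with open(path, "w") as f:
        f.writelines(lines)  # writes each list item in order


path = os.path.join(tempfile.mkdtemp(), "workData.txt")
with open(path, "w") as f:
    f.write("This data is on line 1\nThis data is on line 2\nThis data is on line 3\n")

insert_line(path, 1, "This goes between line 1 and 2")

with open(path, "r") as f:
    result = f.read().splitlines()

print(result[1])    # This goes between line 1 and 2
print(len(result))  # 4
```

Keep in mind that this approach loads the entire file into memory, so it is best suited to files of modest size.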

Python File I/O – Additional Resources

In this tutorial you learned the basics of file handling in Python. Here’s the range of topics we covered:

  • The difference between binary and text files
  • Where to find Python’s built-in file I/O functions and tools
  • How to open and close files in Python
  • The various ways to read data from a file in Python
  • How to write data to a file object in Python
  • File seeks in Python and moving the read/write pointer
  • Editing an existing text file with Python

But really, we’ve only scratched the surface here. As with anything programming-related, there’s lots more to learn…

So I wanted to give you a few additional resources you can use to deepen your Python file-handling skills:
