Python Training by Dan Bader

Array Data Structures in Python

How to implement arrays in Python using only built-in data types and classes from the standard library. Includes code examples and recommendations.

Python Arrays

An array is a fundamental data structure available in most programming languages and it has a wide range of uses across different algorithms.

In this article we’ll take a look at array implementations in Python that only use core language features or functionality included in the Python standard library.

You’ll see the strengths and weaknesses of each approach so you can decide which implementation is right for your use case.

But before we jump in—let’s cover some of the basics first.

So, how do arrays work in Python and what are they used for?

Arrays consist of fixed-size data records that allow each element to be efficiently located based on its index.

Because arrays store information in adjoining blocks of memory they’re considered contiguous data structures (as opposed to a linked data structure like a linked list, for example.)

A real world analogy for an array data structure is a parking lot:

You can look at the parking lot as a whole and treat it as a single object. But inside the lot there are parking spots indexed by a unique number. Parking spots are containers for vehicles—each parking spot can either be empty or have a car, a motorbike, or some other vehicle parked on it.

But not all parking lots are the same:

Some parking lots may be restricted to only one type of vehicle. For example, a motorhome parking lot wouldn’t allow bikes to be parked on it. A “restricted” parking lot corresponds to a “typed array” data structure that only allows elements that have the same data type stored in them.

Performance-wise it’s very fast to look up an element contained in an array given the element’s index. A proper array implementation guarantees a constant O(1) access time for this case.

Python includes several array-like data structures in its standard library that each have slightly different characteristics. If you’re wondering how to declare an array in Python, this list will help pick the right data structure.

Let’s take a look at the available options:

list – Mutable Dynamic Arrays

Lists are a part of the core Python language. Despite their name, Python’s lists are implemented as dynamic arrays behind the scenes. This means lists allow elements to be added or removed and they will automatically adjust the backing store that holds these elements by allocating or releasing memory.

Python lists can hold arbitrary elements—“everything” is an object in Python, including functions. Therefore you can mix and match different kinds of data types and store them all in a single list.

This can be a powerful feature, but the downside is that supporting multiple data types at the same time means that data is generally less tightly packed and the whole structure takes up more space as a result.

>>> arr = ['one', 'two', 'three']
>>> arr[0]

# Lists have a nice repr:
>>> arr
['one', 'two', 'three']

# Lists are mutable:
>>> arr[1] = 'hello'
>>> arr
['one', 'hello', 'three']

>>> del arr[1]
>>> arr
['one', 'three']

# Lists can hold arbitrary data types:
>>> arr.append(23)
>>> arr
['one', 'three', 23]

tuple – Immutable Containers

Tuples are a part of the Python core language. Unlike lists Python’s tuple objects are immutable, this means elements can’t be added or removed dynamically—all elements in a tuple must be defined at creation time.

Just like lists, tuples can hold elements of arbitrary data types. Having this flexibility is powerful, but again it also means that data is less tightly packed than it would be in a typed array.

>>> arr = 'one', 'two', 'three'
>>> arr[0]

# Tuples have a nice repr:
>>> arr
('one', 'two', 'three')

# Tuples are immutable:
>>> arr[1] = 'hello'
TypeError: "'tuple' object does not support item assignment"

>>> del arr[1]
TypeError: "'tuple' object doesn't support item deletion"

# Tuples can hold arbitrary data types:
# (Adding elements creates a copy of the tuple)
>>> arr + (23,)
('one', 'two', 'three', 23)

array.array – Basic Typed Arrays

Python’s array module provides space-efficient storage of basic C-style data types like bytes, 32-bit integers, floating point numbers, and so on.

Arrays created with the array.array class are mutable and behave similarly to lists—except they are “typed arrays” constrained to a single data type.

Because of this constraint array.array objects with many elements are more space-efficient than lists and tuples. The elements stored in them are tightly packed and this can be useful if you need to store many elements of the same type.

Also, arrays support many of the same methods as regular lists. For example, to append to an array in Python you can just use the familiar array.append() method.

As a result of this similarity between Python lists and array objects, you might be able to use it as a “drop-in replacement” without requiring major changes to your application.

>>> import array
>>> arr = array.array('f', (1.0, 1.5, 2.0, 2.5))
>>> arr[1]

# Arrays have a nice repr:
>>> arr
array('f', [1.0, 1.5, 2.0, 2.5])

# Arrays are mutable:
>>> arr[1] = 23.0
>>> arr
array('f', [1.0, 23.0, 2.0, 2.5])

>>> del arr[1]
>>> arr
array('f', [1.0, 2.0, 2.5])

>>> arr.append(42.0)
>>> arr
array('f', [1.0, 2.0, 2.5, 42.0])

# Arrays are "typed":
>>> arr[1] = 'hello'
TypeError: "must be real number, not str"

str – Immutable Arrays of Unicode Characters

Python 3.x uses str objects to store textual data as immutable sequences of Unicode characters. Practically speaking that means a str is an immutable array of characters. Oddly enough it’s also a recursive data structure—each character in a string is a str object of length 1 itself.

String objects are space-efficient because they’re tightly packed and specialize in a single data type. If you’re storing Unicode text you should use them. Because strings are immutable in Python modifying a string requires creating a modified copy. The closest equivalent to a “mutable string” is storing individual characters inside a list.

>>> arr = 'abcd'
>>> arr[1]

>>> arr

# Strings are immutable:
>>> arr[1] = 'e'
TypeError: "'str' object does not support item assignment"

>>> del arr[1]
TypeError: "'str' object doesn't support item deletion"

# Strings can be unpacked into a list to
# get a mutable representation:
>>> list('abcd')
['a', 'b', 'c', 'd']
>>> ''.join(list('abcd'))

# Strings are recursive data structures:
>>> type('abc')
"<class 'str'>"
>>> type('abc'[0])
"<class 'str'>"

bytes – Immutable Arrays of Single Bytes

Bytes objects are immutable sequences of single bytes (integers in the range of 0 <= x <= 255). Conceptually they’re similar to str objects and you can also think of them as immutable arrays of bytes.

Like strings, bytes have their own literal syntax for creating objects and they’re space-efficient. Bytes objects are immutable, but unlike strings there’s a dedicated “mutable byte array” data type called bytearray that they can be unpacked into. You’ll hear more about that in the next section.

>>> arr = bytes((0, 1, 2, 3))
>>> arr[1]

# Bytes literals have their own syntax:
>>> arr
>>> arr = b'\x00\x01\x02\x03'

# Only valid "bytes" are allowed:
>>> bytes((0, 300))
ValueError: "bytes must be in range(0, 256)"

# Bytes are immutable:
>>> arr[1] = 23
TypeError: "'bytes' object does not support item assignment"

>>> del arr[1]
TypeError: "'bytes' object doesn't support item deletion"

bytearray – Mutable Arrays of Single Bytes

The bytearray type is a mutable sequence of integers in the range 0 <= x <= 255. They’re closely related to bytes objects with the main difference being that bytearrays can be modified freely—you can overwrite elements, remove existing elements, or add new ones. The bytearray object will grow and shrink appropriately.

Bytearrays can be converted back into immutable bytes objects but this incurs copying the stored data in full—an operation taking O(n) time.

>>> arr = bytearray((0, 1, 2, 3))
>>> arr[1]

# The bytearray repr:
>>> arr

# Bytearrays are mutable:
>>> arr[1] = 23
>>> arr

>>> arr[1]

# Bytearrays can grow and shrink in size:
>>> del arr[1]
>>> arr

>>> arr.append(42)
>>> arr

# Bytearrays can only hold "bytes"
# (integers in the range 0 <= x <= 255)
>>> arr[1] = 'hello'
TypeError: "an integer is required"

>>> arr[1] = 300
ValueError: "byte must be in range(0, 256)"

# Bytearrays can be converted back into bytes objects:
# (This will copy the data)
>>> bytes(arr)

Which array implementation should I use in Python?

There are a number of built-in data structures you can choose from when it comes to implementing arrays in Python. In this article we’ve concentrated on core language features and data structures included in the standard library only.

If you’re willing to go beyond the Python standard library, third-party packages like NumPy offer a wide range of fast array implementations for scientific computing.

But focusing on the array data structures included with Python, here’s what your choice comes down to:

  • You need to store arbitrary objects, potentially with mixed data types? Use a list or a tuple, depending on whether you want an immutable data structure or not.

  • You have numeric (integer / floating point) data and tight packing and performance is important? Try out array.array and see if it does everything you need. Consider going beyond the standard library and try out packages like NumPy.

  • You have textual data represented as Unicode characters? Use Python’s built-in str. If you need a “mutable string” use a list of characters.

  • You want to store a contiguous block of bytes? Use bytes (immutable) or bytearray (mutable).

Personally, I like to start out with a simple list in most cases and only specializing later on if performance or storage space becomes an issue.

This is especially important when you need to make a choice between using a Python list vs an array. The key difference here is that Python arrays are more space-efficient than lists, but that doesn’t automatically make them the right choice in your specific use case.

Most of the time using a general purpose array data structure like list in Python gives you the fastest development speed and the most programming convenience.

I found that this is usually much more important in the beginning than squeezing out every last drop of performance from the start.

Read the full “Fundamental Data Structures in Python” article series here. This article is missing something or you found an error? Help a brother out and leave a comment below.

<strong><em>Improve Your Python</em></strong> with a fresh 🐍 <strong>Python Trick</strong> 💌 every couple of days

Improve Your Python with a fresh 🐍 Python Trick 💌 every couple of days

🔒 No spam ever. Unsubscribe any time.

This article was filed under: data-structures, and python.

Related Articles:
  • 6 things you’re missing out on by never using classes in your Python code – Maybe you’ve been using Python for a while now and you’re starting to feel like you’re getting the hang of it. But one day you catch yourself thinking: “What about classes?”
  • Records, Structs, and Data Transfer Objects in Python – How to implement records, structs, and “plain old data objects” in Python using only built-in data types and classes from the standard library.
  • Python’s Functions Are First-Class – Python’s functions are first-class objects. You can assign them to variables, store them in data structures, pass them as arguments to other functions, and even return them as values from other functions.
  • Sets and Multisets in Python – How to implement mutable and immutable set and multiset (bag) data structures in Python using built-in data types and classes from the standard library.
  • Queues in Python – How to implement a FIFO queue data structure in Python using only built-in data types and classes from the standard library.
Latest Articles:
← Browse All Articles