Python Training by Dan Bader

Writing a Domain Specific Language (DSL) in Python

Learn how to create your own Domain Specific Language with Python from scratch with this step-by-step tutorial.

Writing a DSL in Python

A Domain Specific Language, or DSL for short, is a language that’s specialized to a particular application domain. In other words, it’s a programming language that’s used for a more specific application or use case than a general-purpose language like Python.

For example, regular expressions are a DSL. Another widely used DSL is SQL. DSLs run the gamut from the complex, like regular expressions, to the simple and very niche variety we’re going to create in this tutorial.

To give you an idea of how simple they can be, let’s take a sneak peek at what our DSL written in Python will look like:

# This is a comment
module1 add 1 2
module2 sub 12 7
module1 print_results

With the DSL you’ll create in this tutorial you’ll be able to call Python functions and pass arguments to them using a syntax that resembles assembly language.

Blank lines or comment lines that start with “#” are ignored, just like Python. Any other line starts with the module name, then the function name followed by its arguments, separated by spaces.

As you’ll see in the course of this tutorial, even a simple language like this can offer a lot of flexibility and make your Python applications “scriptable.”

What You’ll Learn in This Tutorial

Writing a Domain Specific Language (DSL) may sound like something that’s really hard and should only be done by advanced programmers. Perhaps you haven’t heard of a DSL before, or you’re not sure what one is.

If so, then this tutorial is for you. This isn’t a subject reserved for advanced programmers. A DSL doesn’t have to be complex or involve studying parser theory and abstract syntax trees.

We’re going to write a simple, generic DSL in Python that uses other Python source files to do some work. It’s simple and generic for a reason: I want to show you how easy it is to use Python to write a DSL that you can adapt for use in your own projects.

Even if you don’t have a direct use for a DSL today, you may pick up some new ideas or bits of the language that you haven’t seen before. We’ll look at:

  • dynamically importing Python modules at runtime
  • using getattr() to access an object’s attributes
  • using variable-length function arguments and keyword arguments
  • converting strings to other data types

Defining Your Own Programming Language

Our DSL is a language that’s used to run Python code to perform some work. The work that’s done is completely arbitrary. It can be whatever you decide is appropriate to expose to the user that helps them accomplish their work. Also, the users of our DSL aren’t necessarily Python programmers. They just know that they have work to get done via our DSL.

It’s up to the user to decide what they need to accomplish and, therefore, what to write in the DSL source file. All the user knows is that they’ve been provided a library of functionality, or commands, that they can run using the DSL.

For writing our DSL, we’ll start with the simplest implementation possible and incrementally add functionality. Each version of the source files you’ll see for Python and our DSL will have the same version suffix added to it.

So our first implementation will have the source files “dsl1.py”, “src1.dsl” and “module1.py”. The second version with additional functionality will end with “2” and so on.

In summary, we’ll end up with the following naming scheme for our files:

  • “src1.dsl” is the DSL source file that users write. This is not Python code but contains code written in our custom DSL.
  • “dsl1.py” is the Python source file that contains the implementation of our domain specific language.
  • “module1.py” contains the Python code that users will call and execute indirectly via our DSL.

If you ever get stuck, you can find the full source code for this tutorial on GitHub.

DSL Version 1: Getting Started

Let’s make this more concrete by deciding what the first version of our DSL will be able to do. What’s the simplest version we could make?

Since the users need to be able to run our Python code, they need to be able to specify the module name, function name and any arguments the function might accept. So the first version of our DSL will look like this:

# src1.dsl
module1 add 1 2

As described earlier, blank lines and comment lines that start with “#” are ignored. Every other line starts with the module name, followed by the function name and its arguments, all separated by spaces.

Python makes this easy: we can read the DSL source file line by line and split each line into tokens using string methods. Let’s do that:

#!/usr/bin/env python3
# dsl1.py
import sys

# The source file is the 1st argument to the script
if len(sys.argv) != 2:
    print('usage: %s <src.dsl>' % sys.argv[0])
    sys.exit(1)

with open(sys.argv[1], 'r') as file:
    for line in file:
        line = line.strip()
        if not line or line[0] == '#':
            continue
        parts = line.split()
        print(parts)

Running “dsl1.py” from the command line will lead to the following result:

$ dsl1.py src1.dsl
['module1', 'add', '1', '2']

If you’re using macOS or Linux, remember to make “dsl1.py” executable if it’s not already. This will allow you to run your application as a command-line command.

You can do this from your shell by running chmod +x dsl1.py. Unless the directory containing the script is on your PATH, run it with a leading “./”, as in ./dsl1.py src1.dsl. On Windows, it should work with a default Python installation. If you run into errors, check the Python FAQ.

With just a few lines of code, we were able to get a list of tokens from a line in our source file. These token values, in the list “parts”, represent the module name, function name and function arguments. Now that we have these values, we can call the function in our module with its arguments.
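If you’d like to test the tokenizing step on its own, one option is to pull it into a small function. The sketch below uses a hypothetical helper named parse_line(), which isn’t part of the tutorial’s “dsl1.py”; it returns None for blank and comment lines and the list of tokens otherwise.

```python
def parse_line(line):
    """Return the tokens on a DSL line, or None for blank/comment lines.

    Hypothetical helper: dsl1.py does the same work inline in its loop.
    """
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    # str.split() with no arguments splits on any run of whitespace
    return line.split()


for raw in ['# This is a comment', '', 'module1 add 1 2']:
    print(repr(raw), '->', parse_line(raw))
```

Because split() breaks on whitespace, an argument can never contain a space in this DSL. That’s a deliberate simplification worth keeping in mind when you design your own commands.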

Importing a Python Module at Runtime

But this brings up a new challenge. How do we import a module in Python if we don’t know the module name ahead of time? Typically, when we’re writing code, we know the module name we want to import and just enter import module1.

But with our DSL, we have the module name as the first item in a list as a string value. How do we use this?

The answer is that we can use importlib from the standard library to dynamically import the module at runtime. So let’s dynamically import our module next by adding the following line at the top of “dsl1.py”, right under import sys:

import importlib

Before the with block you’ll want to add another line to tell Python where to import modules from:

sys.path.insert(0, '/Users/nathan/code/dsl/modules')

The sys.path.insert() line is necessary so Python knows where to find the directory that contains the modules that make up our library. Adjust this path as needed for your application so it references the directory where Python modules are saved.
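Rather than editing a hard-coded absolute path on every machine, you could build the path from a known base directory. The sketch below assumes the library modules live in a “modules” folder next to the script; add_modules_dir() is a hypothetical helper, not part of the tutorial’s code.

```python
import sys
from pathlib import Path


def add_modules_dir(base_dir):
    """Put <base_dir>/modules at the front of sys.path (hypothetical helper).

    Assumes the library modules live in a "modules" folder under base_dir.
    """
    modules_dir = str(Path(base_dir).resolve() / 'modules')
    # Avoid inserting the same directory twice if called repeatedly
    if modules_dir not in sys.path:
        sys.path.insert(0, modules_dir)
    return modules_dir


# In dsl1.py you might pass the script's own directory:
# add_modules_dir(Path(__file__).parent)
```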

Then, at the end of the for loop body (indented to the same level as print(parts)), insert the following lines of code:

mod = importlib.import_module(parts[0])
print(mod)

After making these changes, “dsl1.py” will look as follows:

#!/usr/bin/env python3
# dsl1.py -- Updated
import sys
import importlib

# The source file is the 1st argument to the script
if len(sys.argv) != 2:
    print('usage: %s <src.dsl>' % sys.argv[0])
    sys.exit(1)

sys.path.insert(0, '/Users/nathan/code/dsl/modules')

with open(sys.argv[1], 'r') as file:
    for line in file:
        line = line.strip()
        if not line or line[0] == '#':
            continue
        parts = line.split()
        print(parts)

        mod = importlib.import_module(parts[0])
        print(mod)

Now if we run “dsl1.py” from the command line again, we get the following output:

$ dsl1.py src1.dsl
['module1', 'add', '1', '2']
<module 'module1' from '/Users/nathan/code/dsl/modules/module1.py'>

Great! We just imported a Python module dynamically at runtime using the importlib module from the standard library.
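To see import_module() in isolation, here’s a small standalone snippet. It uses the standard library’s math module instead of “module1” so it runs anywhere; the unknown module name is made up to trigger the error.

```python
import importlib
import sys

# The module name is just a string, e.g. the first token of a DSL line
name = 'math'
mod = importlib.import_module(name)

# import_module() returns the same module object a regular import would;
# it's cached in sys.modules after the first import.
print(mod is sys.modules['math'])   # True
print(mod.sqrt(16))                 # 4.0

# An unknown name raises ModuleNotFoundError (a subclass of ImportError)
try:
    importlib.import_module('no_such_module_xyz')
except ModuleNotFoundError as exc:
    print('import failed:', exc)
```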


Invoking Code

Now that we’ve imported the module dynamically and have a reference to the module stored in a variable called mod, we can invoke (call) the specified function with its arguments. At the end of “dsl1.py”, let’s add the following line of code:

getattr(mod, parts[1])(parts[2], parts[3])

This may look a little odd. What’s happening here?

We need to get a reference to the function object in the module in order to call it. We can do this by using getattr with the module reference. This is the same idea as using import_module to dynamically get a reference to the module.

Passing the module and the function name to getattr returns a reference to the module’s add function object. We then call that function using parentheses, passing along the last two items in the list as its arguments.

Remember, everything in Python is an object, and objects have attributes. Modules are no exception, so we can use getattr to access a module’s attributes dynamically at runtime. For more information, see getattr in the Python docs.
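Here’s the same idea with a standard-library module so you can try it without “module1.py”. It also shows getattr()’s optional third argument, a default returned instead of raising AttributeError; the name no_such_function is made up for the demonstration.

```python
import importlib

mod = importlib.import_module('math')

# getattr(obj, 'name') is the dynamic form of obj.name
func = getattr(mod, 'pow')
print(func is mod.pow)   # True
print(func(2, 10))       # 1024.0

# With a default, a missing attribute returns None instead of raising
# AttributeError, which can help when reporting unknown DSL commands.
missing = getattr(mod, 'no_such_function', None)
print(missing)           # None
```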

Let’s look at “module1.py”:

# module1.py

def add(a, b):
    print(a + b)

If we run “dsl1.py src1.dsl” now, what will the output be? “3”? Let’s see:

$ dsl1.py src1.dsl
['module1', 'add', '1', '2']
<module 'module1' from '/Users/nathan/code/dsl/modules/module1.py'>
12

Wait, “12”? How did that happen? Shouldn’t the output be “3”?

This is easy to miss at first and may or may not be what you want, depending on your application. The arguments to our add function were strings, so Python dutifully concatenated them and printed the string “12”.

This brings us to a higher-level and more difficult question: how should our DSL handle arguments of different types? What if a user needs to work with integers?

One option would be to have two add functions, e.g. add_str and add_int. add_int would convert the string parameters to integers:

print(int(a) + int(b))
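The two-function option could look something like the sketch below. The names add_str and add_int come from the text above, but the bodies are only one possible implementation; unlike the tutorial’s add, they also return the result so it’s easy to check.

```python
# Hypothetical module1 functions for the "two add functions" option

def add_str(a, b):
    """Concatenate the arguments as strings: '1', '2' -> '12'."""
    result = a + b
    print(result)
    return result


def add_int(a, b):
    """Convert the string arguments to integers before adding them."""
    result = int(a) + int(b)
    print(result)
    return result


# Arguments from a DSL line always arrive as strings
add_str('1', '2')   # prints 12
add_int('1', '2')   # prints 3
```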

Another option would be for the user to specify what types they’re working with and have that be an argument in the DSL:

module1 add int 1 2

The decisions you make about your DSL’s syntax and behavior depend on your application and what your users need to accomplish. What we’ve seen so far is, of course, a simple example, but it shows how powerful Python’s dynamic nature is.

In other words, Python’s built-in features can take you a long way without requiring a lot of custom code. We’ll explore this more in version 2 of our DSL.

You can find the final version of “dsl1.py” here on GitHub.

DSL Version 2: Parsing Arguments

Let’s move on to version 2 and make things more general and flexible for our users. Instead of hardcoding the arguments, we’ll let them pass any number of arguments. Let’s look at the new DSL source file:

# src2.dsl
module2 add_str foo bar baz debug=1 trace=0
module2 add_num 1 2 3 type=int
module2 add_num 1 2 3.0 type=float

We’ll add a function that splits the DSL arguments into an “args” list and a “kwargs” dictionary that we can pass to our module functions:

def get_args(dsl_args):
    """return args, kwargs"""
    args = []
    kwargs = {}
    for dsl_arg in dsl_args:
        if '=' in dsl_arg:
            k, v = dsl_arg.split('=', 1)
            kwargs[k] = v
        else:
            args.append(dsl_arg)
    return args, kwargs
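To see what get_args() produces, here it is again, applied to the first line of “src2.dsl”. The second call, with a made-up expr=a=b argument, shows why split() is called with a maximum of one split: only the first “=” separates the key from the value. Note that every value is still a string.

```python
def get_args(dsl_args):
    """Split DSL tokens into positional args and key=value keyword args."""
    args = []
    kwargs = {}
    for dsl_arg in dsl_args:
        if '=' in dsl_arg:
            # maxsplit=1 keeps any further '=' characters in the value
            k, v = dsl_arg.split('=', 1)
            kwargs[k] = v
        else:
            args.append(dsl_arg)
    return args, kwargs


parts = 'module2 add_str foo bar baz debug=1 trace=0'.split()
print(get_args(parts[2:]))
# (['foo', 'bar', 'baz'], {'debug': '1', 'trace': '0'})

print(get_args(['expr=a=b']))
# ([], {'expr': 'a=b'})
```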

This get_args function we just wrote can be used as follows:

args, kwargs = get_args(parts[2:])
getattr(mod, parts[1])(*args, **kwargs)

After calling get_args, we’ll have an arguments list and a keyword arguments dictionary. All that’s left to do is change our module function signatures to accept *args and **kwargs and update our code to use the new values.
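Here’s a quick, self-contained look at what the * and ** in that call do. The function name show() is made up for the demonstration.

```python
def show(*args, **kwargs):
    """Return what a function declared with *args and **kwargs receives."""
    return args, kwargs


args = ['foo', 'bar']
kwargs = {'debug': '1'}

# * unpacks the list into positional arguments and ** unpacks the dict
# into keyword arguments; the function collects them again into a
# tuple and a dict.
received = show(*args, **kwargs)
print(received)   # (('foo', 'bar'), {'debug': '1'})
```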

Inside our module’s functions, args is a tuple and kwargs is a dictionary. Here’s the new generalized code for “module2.py” that uses these new values:

# module2.py

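def add_str(*args, **kwargs):
    # Build "key=value" strings for the keyword arguments
    kwargs_list = ['%s=%s' % (k, kwargs[k]) for k in kwargs]
    print(''.join(args), ','.join(kwargs_list))

def add_num(*args, **kwargs):
    # Look up the built-in named by "type=..." (e.g. int or float).
    # In an imported module, __builtins__ is a dict (a CPython detail).
    t = globals()['__builtins__'][kwargs['type']]
    print(sum(map(t, args)))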

In add_str, kwargs_list is a list that’s created using a list comprehension. If you haven’t seen this before, a list comprehension creates a list using an expressive and convenient syntax.

We simply loop over the keys in the dictionary (for k in kwargs) and create a string representing each key/value pair in the dictionary. We then print the result of joining the list of arguments with an empty string and the result of joining the list of keyword arguments with “,”:

foobarbaz debug=1,trace=0

For more on list comprehensions, see this tutorial: “Comprehending Python’s Comprehensions”.
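If list comprehensions are new to you, it may help to see the one from add_str next to the loop it replaces. This standalone snippet uses the keyword arguments from the first line of “src2.dsl”.

```python
kwargs = {'debug': '1', 'trace': '0'}

# The list comprehension used in add_str ...
kwargs_list = ['%s=%s' % (k, kwargs[k]) for k in kwargs]

# ... builds the same list as this explicit loop
loop_list = []
for k in kwargs:
    loop_list.append('%s=%s' % (k, kwargs[k]))

print(kwargs_list == loop_list)   # True
print(','.join(kwargs_list))      # debug=1,trace=0

# An equivalent spelling iterates over key/value pairs with an
# f-string (Python 3.6+)
print(','.join(f'{k}={v}' for k, v in kwargs.items()))  # debug=1,trace=0
```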

With add_num, we decided to give the user a little more power. Since they need to add numbers of specific types (int or float), we need to handle the string conversion somehow.

We call globals() to get a dictionary of the module’s global variables. Its __builtins__ entry gives us access to Python’s built-in names, including the classes (and constructors) for “int” and “float”. Note that this relies on a CPython implementation detail: in an imported module such as “module2.py”, __builtins__ is a dictionary, but in the __main__ module it’s the builtins module itself, so this lookup would fail there.

This allows the user to specify the type conversion for the string values passed in our DSL source file “src2.dsl”, e.g. “type=int”. The type conversion is done in one step for all arguments in the call to map, and its output is fed to sum.
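One thing to keep in mind: kwargs['type'] can name any built-in, not just int and float. For example, a line ending in type=eval would pass every argument to eval(), letting the DSL source run arbitrary Python expressions. If that matters for your application, one option is an explicit table of allowed types. The sketch below is an alternative version of add_num under that assumption; ALLOWED_TYPES is a hypothetical name, and defaulting to int when no type= keyword is given is this sketch’s own choice (the original raises KeyError).

```python
# Hypothetical alternative to module2.add_num with an explicit allowlist
ALLOWED_TYPES = {'int': int, 'float': float}


def add_num(*args, **kwargs):
    type_name = kwargs.get('type', 'int')   # default to int (sketch's choice)
    if type_name not in ALLOWED_TYPES:
        raise ValueError('type must be one of %s, not %r'
                         % (', '.join(sorted(ALLOWED_TYPES)), type_name))
    t = ALLOWED_TYPES[type_name]
    total = sum(map(t, args))
    print(total)
    return total


add_num('1', '2', '3', type='int')      # prints 6
add_num('1', '2', '3.0', type='float')  # prints 6.0
```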

The map() function takes a function and an iterable and yields the function’s result for each item in the iterable. Think of it as a way of transforming a sequence of values into new values. If that’s too much for one line, you can break it into two lines for clarity:

converted_types = map(t, args)  # t is class "int" or "float"
print(sum(converted_types))
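In Python 3, map() doesn’t build a list; it returns a lazy iterator that computes each converted value only when something consumes it, such as sum() or list(). The snippet below shows that behavior, along with a generator expression that does the same job.

```python
args = ('1', '2', '3')

converted = map(int, args)
print(converted)          # a lazy map object, not a list
print(list(converted))    # [1, 2, 3]

# A map object can only be consumed once
print(list(converted))    # []

# sum() consumes the iterator directly; a generator expression is an
# equivalent alternative to map() here
print(sum(map(int, args)))          # 6
print(sum(int(a) for a in args))    # 6
```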

For the DSL source lines:

module2 add_num 1 2 3 type=int
module2 add_num 1 2 3.0 type=float

We get the output:

6
6.0

Users can now pass any number of arguments to our functions. What I think is particularly helpful is the use of **kwargs, the keyword arguments dictionary.

Users can call our functions with keywords from the DSL, passing options, just like they’d do if they were Python programmers or running programs from the command line. Keywords are also a form of micro-documentation and serve as reminders for what’s possible. For best results, try to pick succinct and descriptive names for your keyword arguments.
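Keep in mind that keyword values arrive as strings. That makes an option like trace=0 surprising: the string '0' is non-empty, so it’s truthy. If your functions treat keywords as on/off switches, a small conversion helper avoids the trap. The helper flag() and the set of accepted spellings below are hypothetical.

```python
TRUE_VALUES = {'1', 'true', 'yes', 'on'}


def flag(kwargs, name, default=False):
    """Interpret a DSL keyword like debug=1 as a boolean (hypothetical helper)."""
    if name not in kwargs:
        return default
    return kwargs[name].strip().lower() in TRUE_VALUES


kwargs = {'debug': '1', 'trace': '0'}
print(bool(kwargs['trace']))     # True, because '0' is a non-empty string
print(flag(kwargs, 'trace'))     # False
print(flag(kwargs, 'debug'))     # True
print(flag(kwargs, 'verbose'))   # False (not given)
```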

Once again, you can find the final version of “dsl2.py” on GitHub.

DSL Version 3: Adding Documentation

Let’s add one more feature to help our users and create version 3. They need some documentation. They need a way to discover the functionality provided by the library of modules.

We’ll add this feature by adding a new command line option in “dsl3.py” and checking the modules and their functions for docstrings. Python docstrings are string literals that appear as the first statement in a module, function, class or method definition. The convention is to use triple-quoted strings like this:

def function_name():
    """A helpful docstring."""
    # Function body
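Because a docstring is an ordinary string stored on the object, a program can read it at runtime, which is what the help feature relies on. A quick demonstration with a made-up function:

```python
import inspect


def add(a, b):
    """Add two values and print the result."""
    print(a + b)


# The docstring is stored on the function object as __doc__
print(add.__doc__)            # Add two values and print the result.

# inspect.getdoc() returns the same text, with the indentation of
# multi-line docstrings cleaned up; help(add) displays it interactively.
print(inspect.getdoc(add))
```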

When users pass “help=module3” on the command line to “dsl3.py”, the get_help function is called with “module3”:

def get_help(module_name):
    mod = importlib.import_module(module_name)
    print(mod.__doc__ or '')
    for name in dir(mod):
        if not name.startswith('_'):
            attr = getattr(mod, name)
            # Skip non-callable attributes such as constants, which may
            # not have a __name__ attribute
            if callable(attr):
                print(name)
                print(attr.__doc__ or '', '\n')

In get_help, the module is dynamically imported using import_module like we’ve done before. Next we check for the presence of a docstring value using the attribute name __doc__ on the module.

Then we need to check all functions in the module for a docstring. To do this we’ll use the built-in function “dir”. “dir” returns a list of all attribute names for an object. So we can simply loop over all the attribute names in the module, filter out any private or special names that begin with “_” and print the function’s name and docstring if it exists.
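Note that dir() also lists names a module merely imports. For example, a module that does import os exposes os as an attribute. If you’d rather list only functions defined in the module itself, one option is inspect.getmembers() with inspect.isfunction, checking each function’s __module__. The sketch below builds a throwaway module in memory so it’s self-contained; its contents and the list_functions() helper are hypothetical.

```python
import inspect
import types

# Build a stand-in for module3 in memory (hypothetical contents)
source = '''
"""Demo library module."""
import os
from os.path import join

def greet(name):
    """Print a greeting."""
    print('Hello, ' + name)

def _helper():
    pass
'''
mod = types.ModuleType('module3_demo')
exec(source, mod.__dict__)


def list_functions(mod):
    """Return (name, docstring) for public functions defined in mod."""
    result = []
    for name, func in inspect.getmembers(mod, inspect.isfunction):
        # Skip private helpers and functions imported from elsewhere
        if name.startswith('_') or func.__module__ != mod.__name__:
            continue
        result.append((name, inspect.getdoc(func) or ''))
    return result


print(dir(mod))             # includes 'os', 'join' and '_helper' too
print(list_functions(mod))  # [('greet', 'Print a greeting.')]
```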

The final version of “dsl3.py” is also available on GitHub.
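The tutorial doesn’t show how “dsl3.py” recognizes help=module3 on the command line, and the GitHub version may do it differently. One way is a small dispatch on sys.argv, sketched below; parse_command() is a hypothetical name, and the 'help'/'run'/'usage' return values are this sketch’s own convention. In the script you would call it as parse_command(sys.argv).

```python
def parse_command(argv):
    """Decide what the script should do from its command-line arguments.

    Returns ('help', module_name) for "help=<module>", ('run', path) for a
    DSL source file, or ('usage', None) otherwise. (Hypothetical helper.)
    """
    if len(argv) != 2:
        return ('usage', None)
    arg = argv[1]
    if arg.startswith('help='):
        return ('help', arg[len('help='):])
    return ('run', arg)


print(parse_command(['dsl3.py', 'help=module3']))  # ('help', 'module3')
print(parse_command(['dsl3.py', 'src3.dsl']))      # ('run', 'src3.dsl')
print(parse_command(['dsl3.py']))                  # ('usage', None)
```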

Writing a DSL With Python – Review & Recap

Let’s recap what we’ve done in this tutorial. We’ve created a simple DSL that lets our users easily get some work done by calling into a library of functions. Luckily for us, we know Python. So we can use it to implement our DSL and make things easy for us too.

DSLs are powerful tools that are fun to think about and work on. They’re another way we can be creative and solve problems that make it easier for our users to get work done. I hope this tutorial has given you some new ideas and things to think about that you can apply and use in your own code.

From the user’s perspective, they’re just running “commands.” From our perspective, we get to leverage Python’s dynamic nature and its features and, in turn, reap the rewards of having all of the power of Python and its ecosystem available to us. For example, we can easily make changes to a library module or extend the library with new modules to expose new functionality using the standard library or 3rd party packages.

In this tutorial we looked at a few techniques:

  • importlib.import_module(): dynamically import a module at runtime
  • getattr(): get an object’s attribute
  • variable-length function arguments and keyword arguments
  • converting a string to a different type

Using just these techniques is quite powerful. I encourage you to take some time to think about how you might extend the code and functionality I’ve shown here. It could be as simple as adding a few lines of code using some of the features built into Python or writing more custom code using classes.

Using importlib

I’d like to mention one more thing regarding the use of “importlib”. Another application and example of using dynamic imports with “importlib” is implementing a plugin system. Plugin systems are very popular and widely used in all types of software.

There’s a reason for this. Plugin systems are a method of allowing extensibility and flexibility in an otherwise static application. If you’re interested in deepening your knowledge, see Dan’s excellent tutorial “Python Plugin System: Load Modules Dynamically With importlib”.
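A first step in many plugin systems is discovering which modules exist without hard-coding their names. As a sketch (not code from this tutorial or Dan’s), pkgutil.iter_modules() can list the modules in a directory such as the DSL’s “modules” folder. The names could then be loaded with importlib.import_module() once the directory is on sys.path, or shown in the help output. discover_modules() is a hypothetical helper, and the demo files are throwaway.

```python
import pkgutil
import tempfile
from pathlib import Path


def discover_modules(directory):
    """Return the sorted names of the Python modules found in directory."""
    return sorted(info.name for info in pkgutil.iter_modules([str(directory)]))


# Demo: create a throwaway "modules" directory with two plugin files
with tempfile.TemporaryDirectory() as tmp:
    for name in ('module1', 'module2'):
        Path(tmp, name + '.py').write_text('def hello():\n    return %r\n' % name)
    names = discover_modules(tmp)
    print(names)   # ['module1', 'module2']
```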

Error Checking

In this tutorial I’ve omitted error checking on purpose. One reason is to keep additional code out of the examples for clarity. Another is so that users, and the Python programmers who write the library modules, can see a full stack trace when errors occur.

This may or may not be the right behavior for your application. Think about what makes the most sense for your users and handle errors appropriately, especially for common error cases.
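If you decide to handle errors, a reasonable starting point is the mistakes users are most likely to make: a misspelled module or function name, or a malformed line. The sketch below is one way to report those with the DSL line number; run_line() is a hypothetical helper that returns an error message instead of raising, and for brevity it passes all tokens as positional arguments (no keyword handling). It uses standard-library modules so it runs anywhere.

```python
import importlib


def run_line(lineno, line):
    """Run one DSL line; return an error message string, or None on success.

    Hypothetical helper: only the most common user errors are caught.
    """
    parts = line.split()
    if len(parts) < 2:
        return 'line %d: expected "<module> <function> [args...]"' % lineno
    module_name, func_name = parts[0], parts[1]
    try:
        mod = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Re-raise if a module *inside* the library failed to import
        if exc.name != module_name:
            raise
        return 'line %d: unknown module %r' % (lineno, module_name)
    func = getattr(mod, func_name, None)
    if func is None or func_name.startswith('_') or not callable(func):
        return 'line %d: %r has no command %r' % (lineno, module_name, func_name)
    func(*parts[2:])
    return None


print(run_line(1, 'no_such_module_xyz add 1 2'))
print(run_line(2, 'math no_such_func 1'))
print(run_line(3, 'string capwords hello'))   # None (success)
```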

Security Considerations

A cautionary note on security: the dynamic nature of importing and running code may have security implications, depending on your application and environment. Be sure that only authorized users have access to your source and module directories. For example, unauthorized write access to the “modules” directory would allow users to run arbitrary code.
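Directory permissions aren’t the only concern. As written, the DSL will import any module Python can find, including the standard library, so with version 2 a line such as shutil rmtree some_dir would call shutil.rmtree('some_dir'). If that’s a risk in your environment, one option is an allowlist checked before import_module() runs. ALLOWED_MODULES and resolve() below are hypothetical names in a minimal sketch.

```python
import importlib

# Hypothetical allowlist: only these modules may be used from DSL files
ALLOWED_MODULES = {'module1', 'module2', 'module3'}


def resolve(module_name, func_name):
    """Return the function a DSL line refers to, or raise PermissionError."""
    # Both checks run before anything is imported
    if module_name not in ALLOWED_MODULES:
        raise PermissionError('module %r is not allowed' % module_name)
    if func_name.startswith('_'):
        raise PermissionError('function %r is private' % func_name)
    mod = importlib.import_module(module_name)
    return getattr(mod, func_name)


try:
    resolve('shutil', 'rmtree')
except PermissionError as exc:
    print(exc)   # module 'shutil' is not allowed
```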

Python DSLs: Next Steps

Where do we go from here? What’s next? You may be thinking, “Well, this is nice and all, but I need more cowbell! I need to create a real DSL with real syntax and keywords.”

A good next step would be to look at Python parsing libraries. There are many! And their functionality, ease-of-use and documentation vary widely.

If you’d like to use the code used in this tutorial for your own experiments, the full source code is available on GitHub.


About the Author

Nathan Jennings

I started with C a long time ago, but eventually found Python. The search is over. I enjoy learning all things Python. From web applications and data collection to networking and network security.
