A Breakneck Guide to Nim
Note: This article is a work-in-progress with many unfinished and entirely missing sections.
Nim is a general-purpose programming language designed by Andreas Rumpf (Araq). It can be variously described as all of the following:
- A statically compiled Python with a comprehensive type system
- A memory-safe systems language that compiles down to and interfaces seamlessly with C
- A small core language that is extendable through powerful metaprogramming features
- A performant, easy-to-write language with move semantics and optional annotations
Code written in Nim looks like this:
import std/strformat
type Person = object
  name: string
  age: Natural # Ensures the age is non-negative
let people = @[
  Person(name: "John", age: 45),
  Person(name: "Kate", age: 30)
]
proc printAges(people: seq[Person]) =
  for person in people:
    echo fmt"{person.name} is {person.age} years old"
printAges(people)
Without further ado, let’s jump right into it.
Table of Contents
- Design Decisions
- Basic Syntax
- Control Flow
- Type System
- Functions and Procedures
- Metaprogramming
- Interop
- Memory Management
Design Decisions
significant whitespace
Perhaps the most obvious feature of Nim is its syntax: it looks like Python! Where’d all the brackets go? Statement blocks in Nim are determined through significant whitespace.
import std/sugar
func takesALambda(a: (string, string) -> string) =
  discard # implementation omitted
# regular syntax for multi-line lambdas
takesALambda(
  func (a, b: string): string =
    return a & b # strings are concatenated with `&`, not `+`
)
# same example using syntax sugar. types can be elided!
takesALambda((a, b) => a & b)
More specifically: indentation is only significant between statements, where it determines the scope of the next line. Indentation determines scope only outside of expressions, so when an expression spans multiple lines, you must break the line after an operator.
# break long lines like this...
hereAreSomeQuiteLongFunctions() + andYetAnother() +
  thatReturnValuesForthwith() # indentation here may vary for aesthetics
# not like this! compilation error
hereAreSomeQuiteLongFunctions() + andYetAnother()
  + thatReturnValuesForthwith() # the above is a complete expression
Standard indentation practice uses two spaces, but any (consistent) number works.
Indenting with tabs in Nim is disallowed at the compiler level. I consider this an excellent design decision.
If you absolutely must: adding #? replace(sub = "\t", by = "  ") to the beginning of any Nim file will cause the compiler to treat tabulation characters as two spaces when compiling.
uniform function call syntax
A particularly unique feature of Nim (okay, not unique - D did it first) is what is known as uniform function call syntax (or UFCS for short).
In short, the following statements are equivalent under UFCS:
let a = "Hello, UFCS!"
echo a.len()
echo len(a)
echo a.len
This is made possible by the revelation that there isn’t all that much difference between a method call, a function call that takes a class, and an inherent property of a type. How many times have you pondered whether an operation should be a function, member, or method? It’s just a distracting detail with no benefit. And now you don’t need to care!
This makes many things much nicer in practice. Function chaining, in particular, is now easy: a.foo().bar().baz(). No need for a separate pipe operator!
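As a small sketch of what chaining looks like with real standard-library procedures (strip and toUpperAscii come from std/strutils; the shout helper is hypothetical and made up for this example):

```nim
import std/strutils

# `shout` is a hypothetical helper, declared as a plain function
func shout(s: string): string =
  s.toUpperAscii() & "!"

let raw = "  hello, ufcs  "

# nested call syntax reads inside-out...
echo shout(strip(raw))
# ...while UFCS chaining reads left to right
echo raw.strip().shout()
```

Both lines print the same thing. Note that shout was never declared as a "method" of string: any procedure whose first parameter is a string can be chained this way.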
style insensitivity
Nim is partially style insensitive.
In other words: identifiers in Nim are considered equal if - aside from their first character - they match with underscores removed and when taken to lowercase. This first character exception is so that code like let dog: Dog can be written.
In code:
import std/strutils
func same(a, b: string): bool =
  a[0] == b[0] and
    a.replace("_", "").toLowerAscii == b.replace("_", "").toLowerAscii
This has proven to be somewhat of a controversial feature. Critics say it hampers IDE support, breaks tooling, makes reading documentation harder, and can cause consistency issues. Proponents say language servers handle it fine, alternative tooling is available, you get used to reading documentation, and it helps with codebase consistency.
I like it quite a lot. Being able to adopt a consistent snake_case or camelCase style in your codebase regardless of what external libraries do is a great boon, and the optional (but likely to become default) --styleCheck flags can report inconsistencies as hints or errors. I would encourage anyone to try it out for a little while before flaming it.
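To make this concrete, here is a minimal sketch (the userName variable is made up for illustration; toUpperAscii is the real std/strutils procedure). All three echo lines refer to the same identifiers:

```nim
import std/strutils

let userName = "Kate"

# camelCase, snake_case, and all-lowercase spellings are interchangeable,
# as long as the first character matches exactly
echo userName.toUpperAscii()
echo user_name.to_upper_ascii()
echo username.toupperascii()
```

Mixing spellings like this in one file is exactly the kind of inconsistency the style check flags are meant to catch.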
Basic Syntax
let var const
There are three assignment keywords in Nim: let (for immutable variables), var (for mutable variables), and const (for compile-time evaluated constants).
const testValues = [1, 0, 25]
let immutable = "This variable cannot be changed."
var mutable: string
mutable = stdin.readLine()
mutable = "Disregarding user input..."
The = operator is used for assignment and reassignment.
Note that you can declare a (mutable) variable without assigning anything to it.
Such a variable starts out with a default value that depends on its type, such as 0 for integers or "" for strings. ref and ptr values default to nil: more on nil/notnil later.
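A quick sketch of those defaults (all variable names here are made up for illustration):

```nim
var count: int
var ratio: float
var name: string
var flag: bool
var items: seq[int]
var node: ref int

echo count       # 0
echo ratio       # 0.0
echo name.len    # 0, the string starts out empty
echo flag        # false
echo items.len   # 0, the sequence starts out empty
echo node == nil # true
```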
statements vs. expressions
conditional assignment
more on indentation
comments
Comments are prefixed with the pound sign #.
Documentation comments are prefixed with ##.
Common convention (and the one used by nim doc) is to put documentation comments directly beneath function signatures or type declarations.
Multiline comments are made with #[ ... ]# and can be nested.
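All three kinds of comment in one short sketch (the double procedure is a made-up example):

```nim
proc double(x: int): int =
  ## Returns `x` multiplied by two.
  ## Documentation comments like this one sit directly beneath the signature.
  result = x * 2 # a regular comment

#[
  A multiline comment...
  #[ ...with another one nested inside it. ]#
]#

echo double(21) # 42
```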
imports and includes
logic and operations
Control Flow
if / elif / else
when / else
when statements are statically (compile-time) evaluated if statements. The else keyword can be used with them.
There’s not much else to say about when statements themselves, but it helps to see examples of the kinds of conditions they can evaluate.
# check which platform or backend the file is compiled for
when defined(macosx):
  discard # macOS-specific code here
when defined(js):
  discard # JavaScript-backend-specific code here
# check if the file is compiled with `-d:release`
when not defined(release):
  discard # do some debug code here
# check if some code compiles with no errors
when compiles(3 + 4):
  discard # the `+` operation is defined for integers
# check whether a library provides a certain feature
when not declared(strutils.toUpper):
  discard # let's provide our own, then
case / of
The case statement allows for basic compiler-checked pattern matching. A case statement must handle all possibilities.
var x = stdin.readChar()
case x
of 'a'..'z', 'A'..'Z':
  echo "A letter!"
of '0'..'9':
  echo "A number!"
else:
  echo "Something else!"
Idiomatic Nim does not put a colon after the case parameter, nor does it indent the of blocks. Both of those are, however, valid Nim. (This may change in the future).
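The "must handle all possibilities" rule is easiest to see with an enum. In this sketch (the Direction type and turnRight procedure are hypothetical), every value is covered, so no else branch is needed:

```nim
type Direction = enum
  North, East, South, West

proc turnRight(d: Direction): Direction =
  # every enum value has a branch; deleting any one of them
  # turns this into a compile-time error
  case d
  of North: East
  of East: South
  of South: West
  of West: North

echo turnRight(North) # East
```

Here the case statement is also used as an expression: the value of the matching branch becomes the return value of the procedure.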
for / in
todo
while
todo
block / break / continue
todo
try / finally / except / raise
todo
Type System
Nim has a static (i.e. compile-time evaluated) and comprehensive type system.
- basic types: int, float, char, bool, string
- collection types: array[N, T], seq[T], set[T]
- range types: range[T]
- structured types: types declared with the object or tuple keywords
- enumerated types: types declared with the enum keyword
- procedure types: proc, func
- reference types: automatically managed references declared with the ref keyword
- pointer types: unsafe, manually managed pointers declared with the ptr keyword
- distinct types: types declared with the distinct keyword
- parameter types: var, static, typedesc
- generic types: [T], [T: int | float], openarray
- optional types: Option[T], Result[T, E]
basic types
ints
floats
chars
bools
strings
object types
object variants
By combining the case statement with Nim’s object types, it is possible to create what are known as object variants. Object variants can have different fields depending on the value of the matched field. These are also known by a wide variety of other names: including variant types, tagged unions, and discriminated unions.
This is best explained with an example:
import std/tables
type NodeKind = enum
  Text, Element
type Node = ref object
  x, y: float
  width, height: float
  case kind: NodeKind # the discriminator field needs a type
  of Text:
    text: string
  of Element:
    tag: string
    attributes: Table[string, string]
    children: seq[Node]
In many cases, variant types provide a more idiomatic alternative to generics. However, they have their limitations: field names may not be reused across cases, and the kind of the variant is just a field within the object rather than a higher-level identifier as in Rust’s enums.
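Constructing and consuming a variant looks like the following sketch. It uses a smaller, hypothetical Shape type rather than the Node type above, to keep the example self-contained:

```nim
type
  ShapeKind = enum
    Circle, Rectangle
  Shape = object
    case kind: ShapeKind
    of Circle:
      radius: float
    of Rectangle:
      width, height: float

proc area(s: Shape): float =
  # branch on the discriminator before touching variant-specific fields
  case s.kind
  of Circle: 3.14159 * s.radius * s.radius
  of Rectangle: s.width * s.height

let shapes = [
  Shape(kind: Circle, radius: 1.0),
  Shape(kind: Rectangle, width: 2.0, height: 3.0)
]

for s in shapes:
  echo s.kind, ": ", s.area
```

The kind is set at construction time, and it decides which of the other fields exist on that particular value.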
openarrays
Often, it’s helpful to write code that can deal with multiple kinds of iterable types: for example, a function that prints out every element in an array or a sequence. While generics are powerful, they run into a limitation here: arrays are sized! We would have to explicitly parametrize over every size of the array that we want to use. Furthermore, even disregarding arrays, it is frequently helpful to be generic over both sequences and strings, and annotating : string | seq[T] every time gets old.
Openarrays solve this! The special openarray type is generic over all arrays (of any length, regardless of their inner type), sequences (regardless of their inner type), and strings (as an openarray of char). Just like other generic types, openarrays are only available as parameter types.
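A minimal sketch of openarray parameters in action (the total and countSpaces procedures are made up for illustration):

```nim
proc total(values: openArray[int]): int =
  for v in values:
    result += v

proc countSpaces(text: openArray[char]): int =
  for c in text:
    if c == ' ':
      inc result

echo total([1, 2, 3])        # a fixed-size array: 6
echo total(@[1, 2, 3, 4])    # a sequence: 10
echo countSpaces("a b c")    # a string, treated as an openArray[char]: 2
echo countSpaces(['x', ' ']) # an array of char: 1
```

One procedure body serves every call site, with no explicit generic parameter for the array length.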
Functions and Procedures
procedures
What are typically known as functions in other languages are known in Nim as procedures.
Procedures use the proc keyword, followed by a name, (optional) parameters, an (optional) return type, and the procedure body.
proc plusOrMinus(a: bool, b, c: int): int =
  if a:
    return b + c
  else:
    return b - c
# You don't even need parentheses if your procedure doesn't take parameters!
proc anotherProcedure =
  echo "This procedure doesn't return anything."
functions
What Nim considers functions are typically known as pure functions in other languages. Functions are declared identically to procedures, only with the func keyword instead of the proc keyword. Functions are statically guaranteed by the compiler to have no side effects.
Side effects are considered to be any action modifying state outside of the function’s current scope. This includes modifying a global variable declared outside of the function, modifying : var T parameters (more on those later), and I/O.
Side effects do not currently include the modification of a ref type (more on those later), but this behavior is expected to change in the near future. See: {.experimental: "strictFuncs".}
As a special exception, the debugEcho procedure is not considered to have side effects - despite dealing with I/O - through compiler magic. This is to allow for easier debugging of pure functions.
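A short sketch of the difference (callCount, impureSquare and pureSquare are hypothetical names). The commented-out lines are the ones the compiler would reject inside a func:

```nim
var callCount = 0 # global state

proc impureSquare(x: int): int =
  inc callCount # fine in a proc: modifies a global
  x * x

func pureSquare(x: int): int =
  # inc callCount   # would not compile: modifies a global
  # echo "squaring" # would not compile: echo performs I/O
  debugEcho "squaring ", x # allowed, as a debugging escape hatch
  x * x

echo impureSquare(3) # 9
echo pureSquare(4)   # prints "squaring 4", then 16
```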
Note that this guide has used the terms procedure and function interchangeably, and will continue to do so.
return and result
The return keyword returns the provided value and instantly exits the function, just like in many other languages.
While you can simply return out of a procedure any time, Nim also provides an implicit result variable.
The result variable is initialized to the default value of the return type at the beginning of the function’s scope. If nothing has been explicitly returned by the end of the scope, the current value of result is returned. This allows for writing cleaner, more idiomatic code.
An example of return and result is as follows:
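A minimal sketch of both styles, using two made-up procedures: firstEven bails out early with return, while evens builds up its answer in result and never returns explicitly.

```nim
proc firstEven(numbers: seq[int]): int =
  # `return` exits immediately with the given value
  for n in numbers:
    if n mod 2 == 0:
      return n
  return -1 # nothing found

proc evens(numbers: seq[int]): seq[int] =
  # `result` starts out as an empty seq and is returned implicitly at the end
  for n in numbers:
    if n mod 2 == 0:
      result.add n

echo firstEven(@[3, 5, 8, 10]) # 8
echo evens(@[3, 5, 8, 10])     # @[8, 10]
```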
discard
The discard keyword allows for calling a statement that returns a value without doing anything with that value. This is best explained with an example:
proc returns(): bool =
  echo "This procedure runs some code and returns a value."
  return true
# fails: expression `returns()` is of type `bool` and has to be used...
returns()
# compiles: ... or discarded
discard returns()
parameters
Every parameter must have a type. Multiple adjacent parameters sharing a type can have their types elided.
proc subtract(a: int, b: int): int =
  return a - b
proc supersubtract(a, b, c: int): int =
  return a - b - c
Multiple parameters of the same type… the auto feature… however, this is broadly considered to be an anti-pattern.
ref… copied in…
sink parameters
mutable parameters
Parameters are immutable by default and passed by value (Copyed, for Rust programmers). The var keyword is reused in function signatures to denote a mutable parameter. This is best explained with an example:
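proc immutableParameters(a: bool) =
  a = true # Error: `a` cannot be assigned to
proc mutableParameters(a: var bool) =
  a = true
var a = false # must be declared with `var` to be passed as a `var` parameter
immutableParameters(a) # the call itself is fine: the error is the assignment in the body
mutableParameters(a) # Compiles fine, a is now true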
The compiler will try to optimize copies into moves, and can be helped along by the programmer. More on that later.
Note that ref types behave somewhat unintuitively as parameters. A ref type is simply an automatically-managed pointer to some memory. The pointer (memory address), not the memory itself, is copied into the function. This makes the data of ref types mutable without the var annotation.
A var on a ref type parameter, then, lets you change which piece of data the caller's ref variable is pointing to. This is rarely what you want.
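A sketch of both behaviours, with a hypothetical Counter type: increment mutates the shared data through a plain ref parameter, while rebind needs var because it changes what the caller's variable points to.

```nim
type Counter = ref object
  value: int

proc increment(c: Counter) =
  # no `var` needed: the reference is copied, but it points at the same data
  c.value += 1

proc rebind(c: var Counter) =
  # `var` lets us point the caller's variable at a different object
  c = Counter(value: 100)

var counter = Counter(value: 1)
let alias = counter # a second reference to the same object

increment(counter)
echo alias.value   # 2: the change is visible through every reference

rebind(counter)
echo counter.value # 100: `counter` now points at a new object
echo alias.value   # 2: the old object is untouched
```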
static parameters
The static keyword is also reused in function signatures to denote a parameter that must be known at compile time. This is best explained with an example:
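A minimal sketch, assuming a hypothetical power procedure whose exponent must be a compile-time constant. Because the value is known statically, it can be checked in a when statement and rejected before the program ever runs:

```nim
proc power(base: int, exponent: static int): int =
  # `exponent` is a compile-time constant here, so `when` can inspect it
  when exponent < 0:
    {.error: "exponent must not be negative".}
  result = 1
  for _ in 1 .. exponent:
    result *= base

echo power(2, 10) # 1024: a literal is known at compile time

const e = 3
echo power(5, e)  # 125: constants work too

# var runtimeValue = stdin.readLine().len
# echo power(2, runtimeValue) # would not compile: only known at run time
```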
Note that a: static T is equivalent to a: static[T].
varargs
The varargs keyword allows you to specify that a function can take a dynamic number of parameters. Only one parameter can be varargs, and it must be the last parameter in the function signature. This is best explained with an example:
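A minimal sketch, using a hypothetical total procedure that takes a label followed by any number of integers:

```nim
proc total(label: string, numbers: varargs[int]): int =
  # inside the procedure, `numbers` can be iterated like an array
  for n in numbers:
    result += n
  echo label, ": ", result

discard total("three numbers", 1, 2, 3)   # three numbers: 6
discard total("from an array", [4, 5, 6]) # from an array: 15
```

An array literal can be passed in place of the individual arguments, which is handy when the values have already been collected.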
Metaprogramming
todo.
Interop
todo.
Memory Management
Nim’s memory management strategy is optimized reference counting with a cycle breaker. This may surprise some people, because one of Nim’s primary design goals is being efficient, and reference counting is typically considered to be less efficient than tracing GCs.
Nim’s version of reference counting (called ARC/ORC) brings to the table two things, however:
- Hard determinism
- Optimizing reference counts away with move semantics.
The former (hard determinism) comes from ARC/ORC not doing any sort of magic with deferred reference counts, and instead injecting destructors into the generated code. These injected destructors also provide other niceties, such as automagically-closing file streams.
The latter (optimizing reference counts away) comes from the move semantics used by Nim, Lobster, and Rust. Reference counting typically incurs a fairly substantial overhead: a counter increment every time an object is referenced, and a counter decrement + check every time something that references the object goes out of scope. However: if the compiler can statically prove that no reference outlives the object, then you don’t need that reference count.
This is the guiding principle behind Rust’s borrow checker…
ARC/ORC also has the advantage of greatly simplifying memory management across threads, but we won’t get into that here. Mostly because I don’t understand how it works.