Revisiting Rust’s modules, part 2
It’s been a week since my last post on Rust’s module system. Unsurprisingly, the strawman proposal in that post garnered a lot of commentary–174 comments in one week!–with sentiments ranging from
Now this is a proposal I can get behind
to
I’ve rarely hated anything as much as I hate the module system proposal
and everything in between :-)
The discussion has raised a number of very interesting points; thanks to everyone who has participated so far!. I won’t try to give a comprehensive summary here. What I want to do instead is focus on one particular critique of the earlier proposal, and present a quite different strawman design that embraces a different set of priorities.
For ease of discussion:
- I’ll call the strawman in my last post the “directories-as-modules” proposal.
- I’ll call the strawman in this post the “use-universally” proposal.
A critique of the directories-as-modules proposal
There were a number of concerns about the directories-as-modules proposal (including its fairly radical nature), but the one that struck me was that the proposal was very heavily weighted toward a particular subset of the problems the original post raised, and didn’t help much with some of the others.
To recap briefly: the original post talked about obstacles both for learning the
module system, and for using it at scale. It ultimately focused a lot on the
issue of how much we have to employ pub use
(aka the “facade pattern”) when
setting things up today, and I think the proposal clearly streamlines that
story. (There are also variants like “inline” aka “anonymous” modules that bring
in just part of the proposal).
On the other hand, the proposal didn’t do much to help with issues around “path confusion”:
The fact that
use
declarations work with absolute paths while other items do not is confusing, and even experienced Rust programmers (myself included) often confuse the two. To make matters worse, the top-level namespace contains all of the external crates, but also the contents of the current crate. Unless, of course, you’re writing an external test or binary. And finally, when you’re working at the top level, the absolute/relative distinction doesn’t matter, which means that you can have the wrong mental model and only find it when trying to expand out into submodules.
Many on the thread cited this as the core problematic issue with the module system; I’ve collected some data about confusion around Rust modules which also supports that to a degree.
My goal in this post is to float a quite different proposal that emphasizes these issues, de-emphasizes the facading issues, and overall is more conservative. Similarly to last time, the idea here is to present a coherent, plausible “spike” with ideas that could be useful, and seek feedback on the broad direction without getting too bogged down in the fine details.
One other bit of framing
Before giving the proposal, though, I want to record one other insight I’ve had along the way, in terms of where people sometimes go wrong when learning the module system.
Coming from other languages, there’s often an expectation that adding a .rs
file to the source tree, or a dependency to Cargo.toml
, should be all that’s
needed to set up the naming hierarchy. From that perspective, you’d expect to be
able to use use
to pull items out of any of these. Instead, you sometimes
can, but need to write the correct incantation (extern crate
or mod
) in the
right place first. It requires a shift in mental model. And the fact that use
is much more common than mod
can make this all the more confusing.
@kornel put together a really great chart comparing module systems that makes this point quite strongly.
Part of the reason I’m labeling this proposal as “use-universally” is that it
sets up use
declarations as the only thing you need to write in your Rust
source to bring items into scope. The items that are available, by contrast,
are determined by Cargo (or another build system), together with your file
system. This is one aspect that mirrors the earlier proposal, part of which is
now an RFC.
The basic ingredients
Here’s a quick summary of the proposal:
- Start with today’s module system.
- Deprecate
extern crate
, along the lines of the in-progress RFC. - Deprecate
mod foo;
and instead determine module structure from the file system.- However, unlike the previous proposal, this determination is the same as today, i.e. files are modules, and directories are used to introduce nested modules.
- Improve
use
for greater clarity around paths, which I’ll explain below. - Modules are
pub(crate)
unless they arepub use
d (sopub mod foo;
becomespub use foo;
– note that this is using relative paths, as I’ll explain next).
The meat is in making two adjustments for use
declarations:
- Introduce a
from <crate_name> use <path>;
form for importing items from external crates. - Change
use <path>;
to treat the path as relative to the current module (i.e. as if it started withself::
).- A leading
::
takes you to the root of the current crate, but is not a way to reference items from other crates.
- A leading
(Similar adjustments are needed for referencing paths in function signatures etc., which I’ll elide here.)
This is, of course, a breaking change. However, it has some properties that make it a reasonable fit for the checkpoint model:
- It’s trivial to write a
rustfix
tool that mechanically switches today’suse
declarations to this new setup, and likewise deals withmod
andextern crate
. - We could introduce and stabilize the
from/use
syntax, then deprecate use of absolute paths inuse
(without a leading::
), and employrustfix
at that point – all before a new checkpoint is needed.
Of course, the full migration story needs to be significantly fleshed out, but this is just meant to sketch plausibility.
What does it look like?
Before talking about the rationale, I want to show an example for clarity. First, the parts that don’t change.
Here’s a Cargo.toml
excerpt:
[dependencies]
petgraph = "0.4.5"
A directory structure excerpt:
src/
lib.rs
coherence/
mod.rs
solve.rs
Code in today’s module system
In lib.rs
:
extern crate petgraph;
pub mod coherence;
In mod.rs
:
use petgraph::prelude::*;
use errors::Result;
use ir::{Program, ItemId};
mod solve;
pub use self::solve::Solver;
In solve.rs
:
use std::sync::Arc;
use itertools::Itertools;
use errors::*;
use ir::*;
Code in the proposed system:
In lib.rs
:
pub use coherence; // note relative path; this makes `coherence` pub
In mod.rs
:
from petgraph use prelude::*;
use ::errors::Result;
use ::ir::{Program, ItemId};
pub use solve::Solver; // note use of relative path
In solve.rs
:
from std use sync::Arc;
from itertools use Itertools;
use ::errors::*;
use ::ir::*;
Rationale
Each piece of this proposal has a rationale, but in some cases they’re tied together:
Introducing
from
/use
. This form provides a much more clear distinction between imports from external crates and those from the local crate, which can be helpful when exploring a codebase. Splitting out this form also means we eliminate the very confusing issue that extern crates are “mounted” in the current crate’s module hierarchy, usually at root. (In this analogy, thefrom
form is more like addressing an entirely separate volume.) Incidentally, grepping for this declaration will tell you which external crates are in use.Changing
use
to take paths relative to the current module. There are two main reasons to do this.- If submodules are always in scope for their parent module, things like function signatures feel like they are taking relative paths. (In actuality, they are resolving names based on what’s in scope). In any case, making paths everywhere relative to the current module reduces confusion.
- We want to use
pub use
to export submodules publicly, but with absolute paths this would bepub use self::my_submodule
which is awkward and confusing; people are almost certain to forgetself
much of the time. - Note that there are often arguments that
use
-like mechanisms should employ absolute paths by default because that’s the common case. However, for Rust I think that’s at least partly based on the current use for pulling in items from external crates, and would be more evenly split in this new setup.
Using
pub use
for exporting modules. If the module hierarchy is determined from the file system, we need some way to say whether a module is public. While we could say this in the module itself, doing so is syntactically awkward, and also means that a module’s exports are spread over multiple files. At the same time,pub use
still exists as a form you need to use for re-exporting items, and it provides a reasonable mental model when using it to export your child module.The general privacy setup. A basic premise is that the visibility of a module name is not terribly important by itself; what really matters is the visibility of items within the module. Thus we simplify matters by making all modules have at least crate visibility—though this does mean that marking an item
pub
in a module means it, in reality, has at leastpub(crate)
visibility (and perhaps more, if it’s exported in a public module). This is arguably a good thing; today, the fact that you can writepub
but the actual visibility is determined by a complex nest of re-exports and module visibilities can make it quite hard to reason about unfamiliar code. As has been argued on thread, the vast majority of the time you only need visibility at one of three levels: the current module, the crate, or the world. This proposal makes those cases all easy to express, and requires a more explicitpub(super)
etc to get other privacy granularities.- TL;DR: writing
pub
on an item meanspub(crate)
unless (re)exported in a public module (which itself is done via re-exporting).
- TL;DR: writing
Deprecating
mod
/extern crate
. This was already explained above. There’s already been some discussion around the downsides (and ways to mitigate them), so I’m not going to spend time on that here.- Note, however, that one of the alternatives below may help further mitigate these concerns.
Alternatives
This design pulls together choices I believe cohere well, but there are many possible variations that are also quite plausible. These can be broken down into largely orthogonal knobs. I’ll take a brief look at each, and the tradeoffs as I see them.
Knob: from
/use
ordering
The from
/use
syntax follows precedent from Python, but we could instead use the
use
/from
ordering from JS.
Possible benefits of use
/from
:
- Makes it easier to read at a glance, when the item name makes obvious what the crate is.
- Avoids “jagged edges” of imported names.
- Arguably more “natural” reading (as a sentence).
Possible benefits of from
/use
:
- More natural for autocomplete in Ides.
- Gives you the crate name first when reading left-to-right (better if you often need that information to understand the import)
It’s interesting to consider the choices when it comes to multi-line imports:
from std use {
io::{self, Read, Write},
collections::{HashMap, HashSet},
rc::Rc,
};
// versus
use {
io::{self, Read, Write},
collections::{HashMap, HashSet},
rc::Rc,
} from std;
There are of course plenty of other possible syntactic choices, but these are relatively intuitive and descend from very commonly-used languages.
Knob: pub use foo
vs pub mod foo
Rather than using re-exports to make a module public, we could say that the file
system determines module structure, but you use pub mod foo;
to make a child
module foo
public.
The main advantage would be that it’s more plausible to continue to make use
take absolute paths, which reduces breakage. On the other hand, it seems to
double down on some aspects of “path confusion”, and doesn’t achieve the
unification around use
that the main proposal does.
Knob: absolute vs relative paths
We could keep other elements of this proposal, but have use
continue to use
absolute paths. (We could then, for example, only allow you to reference
external crates that were brought in through extern crate
in use
, but ones
implied from Cargo.toml
would go through from
/use
, potentially making the
whole system backwards compatible).
If we go that route, then to make a module public we’d most likely wind up with one of the following:
pub use self::my_submodule;
pub mod my_submodule;
And again, as above, some path confusion issues remain.
Knob: include on use
Rather than determining the module hierarchy from the file system immediately,
we could follow many other languages which add modules to the name hierarchy
only if they are in some way referenced (e.g. via use
); only at that point
would we examine the file system for resolution.
Such an approach makes the Rust source somewhat more independent of the precise state of the file system, and may thereby address some of the concerns people have raised about previous proposals.
A downside, though: sometimes modules contain nothing but impl
blocks, in
which case they are not naturally referenced elsewhere. You’d have to explicitly
use
such modules, and forgetting to do so could lead to some head-scratching
errors. (That said, we could generate a warning if the directory contains unused
.rs
files).
Knob: use
ing submodules
The proposal assumes that submodules are always in scope for their parents. We
could instead require you to use
them before referring to them. I can’t see a
lot of advantage to doing that, though.
Extensions
Finally, while the proposal as-is only marginally helps with facades (by
removing the need for self::
that’s currently common when facading), it’s
compatible with future extensions that do more.
For example, we could draw from earlier proposals involving “anonymous modules”
(aka “inline modules”) – say, files beginning with a leading _
– which do
not affect the module hierarchy, and where all non-private items are
automatically re-exported by the parent module. This has some of the flavor of
the previous proposal, but with a more opt-in form.
Wrapping up
Just like last time around, please take this proposal as charting out one more plausible point in the design space, and see whether there are big-picture aspects to like or dislike, or ideas that might have promise. I’m looking forward to your feedback!