NPM Sucks, BowerJS Too, Here’s Why

It’s been years since I’ve ranted on this blog, and my apologies to you, I know you’ve been missing it. I have many blog postings from the last several years and I may migrate them back onto this site; I’m still deciding. In the meantime I wanted to rant about some tech stuff.

“It’d be good if there were a canonical ‘Why npm sucks’ article, like the ‘fractal of bad design’ one for PHP.” – https://news.ycombinator.com/item?id=8300438

Therefore I just couldn’t resist.  Let’s start with the disclaimers.  I make no “canonical” or “fractal” claims here.  I come from the C, C++, Java, and PHP development worlds to NodeJS.  I’ve been using client-side JavaScript for almost 20 years and I understand it well.  The first time I ever worked with server-side JavaScript was a trivial maintenance job on a Netscape Server circa 2003, so I’m excited about using JavaScript today.  It’s not like I’m familiar with (all) JavaScript engine intrinsics but I can be if called upon.  I’m still agnostic about software engineering tools, so it’s not like I love JavaScript.

NPM and Bower suck for different reasons.  BowerJS is designed to (mis)use SCCS as a dependency management repository.  NPM is prone to unwieldy folder structures.  The above link also mentions that nested dependencies are a problem, and they are an incredibly mundane and troublesome issue.  I’m not actually keeping score in this article since that’s not my intent.  I just hope for better open-source systems.

Nested Dependencies

Since I admit there are at least two problems, let’s start with the latter, nested dependencies. It’s indirectly related to the first and not-so-subtle. NPM uses a non-parsimonious approach to storing dependencies in local folders. By convention we are talking about folders named “node_modules” nested inside of the “node_modules” directory in a standard NodeJS/NPM project. This issue doesn’t seem to affect Bower very much, except that Bower can be used to link to dependencies that then use NPM (for their dependencies).

I recently ranted on another site that adding two (2) dependencies to a NodeJS project’s package.json declaration resulted in at least 934 dependency folders in ./node_modules. Now, it’s true that one of those dependencies is HapiJS, which is a complete web application framework, and is indeed lightweight. Still, most software (that end-users want/need) uses some type of application framework like this. In other words, most NodeJS projects probably have more than 934 dependency folders in ./node_modules. I mean, I suppose we could somehow aggregate actual statistics from Github, but we haven’t done that here.
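If you want to check a number like that on your own project, here is a minimal sketch of a folder counter. It assumes the conventional layout described above (packages directly under node_modules, scoped packages one level deeper under an @scope folder, and dot-folders such as .bin not counted). The function name countPackageDirs is made up for this post and is not part of NPM.

```javascript
const fs = require("fs");
const path = require("path");

// Count package folders under a node_modules directory, including
// the nested node_modules folders of each package.
function countPackageDirs(nodeModulesDir) {
  if (!fs.existsSync(nodeModulesDir)) return 0;
  let count = 0;
  for (const entry of fs.readdirSync(nodeModulesDir, { withFileTypes: true })) {
    // Skip plain files, symlinks, and dot-folders such as .bin.
    if (!entry.isDirectory() || entry.name.startsWith(".")) continue;
    const full = path.join(nodeModulesDir, entry.name);
    if (entry.name.startsWith("@")) {
      // A scope folder is not a package itself; its children are.
      count += countPackageDirs(full);
      continue;
    }
    count += 1; // the package folder itself
    count += countPackageDirs(path.join(full, "node_modules")); // its nested deps
  }
  return count;
}
```

Calling countPackageDirs("./node_modules") from a project root gives a rough count. It is only a folder count and says nothing about how many of those folders are duplicates.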

In a given NodeJS project that uses NPM there are many redundant nested dependencies.  For example, if HapiJS v15 uses Lodash; and Sequelize ODM uses the same version of Lodash; then you’ll have two copies of Lodash nested under ./node_modules.  The only case where one copy of Lodash will be downloaded is when Lodash is a parent dependency for the project – in other words your project depends on Lodash in addition to HapiJS and Sequelize.
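To make the Lodash example concrete, here is a toy model of the nesting behavior as I described it. It is a sketch, not NPM’s installer: the registry object is hypothetical, versions are ignored entirely, and the only rule is “skip a dependency if a folder higher up already provides it”.

```javascript
// Hypothetical registry: package name -> names of its direct dependencies.
const registry = {
  hapi: ["lodash"],
  sequelize: ["lodash"],
  lodash: [],
};

// Build a nested tree of installs. `visible` holds the names already
// installed at this level or at an ancestor level, which are not repeated.
function installNested(deps, registry, visible = new Set()) {
  const tree = {};
  const toInstall = deps.filter((name) => !visible.has(name));
  const nowVisible = new Set([...visible, ...toInstall]);
  for (const name of toInstall) {
    tree[name] = installNested(registry[name] || [], registry, nowVisible);
  }
  return tree;
}

// Count how many copies of `target` appear anywhere in the tree.
function countCopies(tree, target) {
  let n = 0;
  for (const [name, sub] of Object.entries(tree)) {
    if (name === target) n += 1;
    n += countCopies(sub, target);
  }
  return n;
}

const withoutTopLevel = installNested(["hapi", "sequelize"], registry);
const withTopLevel = installNested(["hapi", "sequelize", "lodash"], registry);
console.log(countCopies(withoutTopLevel, "lodash")); // 2
console.log(countCopies(withTopLevel, "lodash")); // 1
```

In this model the project only gets a single copy of Lodash when it declares Lodash itself, which is the case described above.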

In practice, there are usually version mismatches right down to [my-favorite-dependency] v1.2.3 and [my-favorite-dependency] v1.2.2. There are many redundancies and version mismatches of sub-dependencies in any given NodeJS ./node_modules folder. However, they can be viewed with a single NPM command (npm ls) and managed.
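Once you have the dependency tree in hand, finding the redundancies and mismatches is a small amount of code. The sketch below assumes a tree shaped like { dependencies: { name: { version, dependencies } } }, which resembles what npm ls --json prints, but treat that shape as an assumption rather than a documented contract. The function names are my own.

```javascript
// Walk the tree and record, per package name, how many copies of each
// version are installed. Result: Map<name, Map<version, copies>>.
function collectVersions(node, found = new Map()) {
  for (const [name, child] of Object.entries(node.dependencies || {})) {
    if (!found.has(name)) found.set(name, new Map());
    const versions = found.get(name);
    versions.set(child.version, (versions.get(child.version) || 0) + 1);
    collectVersions(child, found);
  }
  return found;
}

// Report only the packages that are installed more than once.
function findDuplicates(tree) {
  const report = {};
  for (const [name, versions] of collectVersions(tree)) {
    const copies = [...versions.values()].reduce((a, b) => a + b, 0);
    if (copies > 1) report[name] = Object.fromEntries(versions);
  }
  return report;
}
```

For a tree where one dependency pulls in my-favorite-dependency 1.2.3 and another pulls in 1.2.2, findDuplicates returns an entry for that package listing both versions with one copy each.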

Another problem with NPM folder structure is its design. It’s not clear why sub-dependencies are stored in node_modules folders underneath their dependency. In short, NPM’s default behavior takes a project-local view. NPM creates a node_modules folder in the directory that declares a package.json. Within node_modules, and as I attempted to describe above, most dependencies have their own dependencies. In many cases (unless your project declares the same top-level dependency) the project will end up containing nested node_modules/**/node_modules folders, since sub-dependencies can nest their own dependencies.

The motivation for NPM’s non-parsimonious nesting of node_modules/**/node_modules is not explained on the NPM website (npmjs.com). It’s not clear why NPM doesn’t simply store all dependencies at the top level. The @scope attribute for dependencies doesn’t seem to explain it either. Additionally, the motivation for the folder structure resulting from @scope is unclear. It’s unclear why @scope is not given an implied (default) value and thus organized under node_modules.

NPM’s non-parsimonious view of HDD storage, while claiming the opposite, is more than annoying; it can become an obstacle to productivity. In theory storage space isn’t a problem because storage space is inexpensive and HDDs, computers, and network connections are fast. In the real world, having so many dependencies is a problem in two cases: A) whenever the network is slow (aka not at the office); B) whenever projects must be copied, transferred, cloned, re-initialized, and so forth. Total size of some NodeJS projects + dependencies starts to rival small VMs at 2GB. In practice, I’ve had to wire computers together on a gigabit switch to transfer repositories of NodeJS code because that’s simply the only reasonable way to avoid potentially 24-48 hours of downtime. The last time I had to do that when re-creating (even larger) repositories of Java projects was…never!

Nevertheless, this problem of large numbers of dependency folders is not worthy of a “sucks” label. A feature request might be better. There’s a switch in NPM to use a global repository. However, from the NPM perspective this is intended for installing system-wide applications along with man pages (documentation). Using it doesn’t abate nested dependencies but it may help with transferring multiple projects in distributed team environments. NPM does not have a notion of “user-local” dependencies, although it does have a user-local cache. NPM only has project-local dependencies and global dependencies, with possibly redundant nested folders. You can learn more about NPM folder structure at https://docs.npmjs.com/files/folders.

Tools could be built to handle the issues around version redundancy and mismatch – and there are straightforward solutions.  The nested dependency problem can be hammered down and it has been with other dependency management tools.
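As one illustration of how straightforward such a tool could be, here is a sketch of a “hoisting” pass: store each package once at the top level, and only keep a nested copy when its version conflicts with the one already stored. This is my own simplification and not how any particular tool is implemented. It compares exact version strings rather than semver ranges, and it assumes the same { dependencies: { name: { version, dependencies } } } tree shape that npm ls --json resembles.

```javascript
// Flatten a nested dependency tree. Packages are visited breadth-first
// so that the project's direct dependencies claim the top level first.
function hoist(tree) {
  const top = new Map(); // name -> version stored once at the top level
  const nested = []; // copies that must stay nested due to a version conflict
  const queue = [{ node: tree, parent: "(project)" }];
  while (queue.length > 0) {
    const { node, parent } = queue.shift();
    for (const [name, child] of Object.entries(node.dependencies || {})) {
      if (!top.has(name)) {
        top.set(name, child.version);
      } else if (top.get(name) !== child.version) {
        nested.push({ parent, name, version: child.version });
      }
      // An identical version that is already hoisted needs no second copy.
      queue.push({ node: child, parent: name });
    }
  }
  return { top: Object.fromEntries(top), nested };
}
```

With three packages that each depend on the same library, two at v1.2.3 and one at v1.2.2, this keeps one shared copy of v1.2.3 at the top level and a single nested copy of v1.2.2 under the package that needs it, instead of three nested copies.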

SCCS as Dependency Repository

The problem with Bower, and fortunately not NPM, and another reason Bower deserves the “suck” label in 2016, is that it uses source-code control systems (SCCS), in this case primarily Git (therefore Github.com by convention) as a dependency repository.  This is a bad design decision and it could be non-recoverable, depending on factors potentially beyond the control of these tools.

Indeed, it’s taken me a long time using Bower, trusting it along the way, to realize that it’s designed on a fundamentally flawed premise:  that SCCS repositories can double as dependency repositories.  They can in theory, but really they can’t in practice.

Bower uses Git tags to identify versions of packages declared in their respective configuration files.  The git tags are stored in the SCCS (git) repo.  As a result Bower doesn’t work in some situations where a “.git” folder is not found in the working directory tree.
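To show what “versions live in git tags” means for a tool, here is a minimal sketch of picking the newest release from a list of tag names. It is not Bower’s actual resolver. It assumes tags look like 1.2.3 or v1.2.3 and silently ignores everything else, and the function names are hypothetical.

```javascript
// Parse a tag such as "1.2.3" or "v1.2.3"; return null for anything else.
function parseTag(tag) {
  const m = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(tag);
  return m ? { tag, parts: [Number(m[1]), Number(m[2]), Number(m[3])] } : null;
}

// Compare two [major, minor, patch] arrays numerically.
function compareParts(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Return the tag with the highest version, or null if none parse.
function latestTag(tags) {
  const parsed = tags.map(parseTag).filter(Boolean);
  if (parsed.length === 0) return null;
  parsed.sort((x, y) => compareParts(y.parts, x.parts));
  return parsed[0].tag;
}

console.log(latestTag(["v1.2.2", "1.10.0", "v1.2.3", "gh-pages"])); // 1.10.0
```

Notice that the tag list is the only input. Without the repository, and therefore without its tags, there is nothing to resolve against.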

A Misuse of SCCS

I think it’s pretty straightforward to explain why SCCS is not intended for dependency management. If it were, then we should all stop maintaining files like package.json, bower.json, and metadata.rb and the version attributes contained within. SCCS is intended for code management, version control, and shared/distributed development teams; it is not intended for repository management. SCCS tags and branches are not a good way to identify dependencies because:

Metadata that uniquely identifies a codebase should reside directly in the codebase.

I’m pretty sure somebody else coined that phrase, or something just like it, and not myself. Version information (in package.json, bower.json, composer.json, pom.xml, etc.) is codebase-metadata. Ideally we should be able to view and update version information right there in our IDE. Otherwise we have to use another tool to perform this necessary function. Humans write code and most people agree that requiring fewer tools to get the same job done is preferable to requiring more tools.

When we use SCCS tags and branches on files that have unique version identifiers, we are creating metadata about metadata.

Again, version information is metadata and it’s conventionally stored in a file like package.json. Now when we tag package.json with a git-tag like “4.3.2-npm” we’ve created metadata about metadata. At face value it seems like a bad idea to me to use metadata about metadata for anything as important as dependency management and version resolution. Yet that’s what NPM and BowerJS do. Again, I’m not going to get into the trenches of why metadata-about-metadata should not be used this way. I’ll leave it for another article or for you. Besides, I think there are better ways.
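One practical consequence of having the version in two places is that the two can disagree, so somebody has to check. A minimal sketch of that check follows. It assumes the tag is the manifest version with an optional leading “v” and an optional “-npm” suffix, as in the “4.3.2-npm” example; that naming convention and the function name are assumptions for illustration only.

```javascript
// Return true when a git tag names the same version as the manifest text
// (the contents of a file like package.json or bower.json).
function tagMatchesManifest(tag, manifestText) {
  const manifest = JSON.parse(manifestText);
  // Strip the assumed decorations: a leading "v" and a trailing "-npm".
  const fromTag = tag.replace(/^v/, "").replace(/-npm$/, "");
  return fromTag === manifest.version;
}

const manifestText = JSON.stringify({ name: "example", version: "4.3.2" });
console.log(tagMatchesManifest("4.3.2-npm", manifestText)); // true
console.log(tagMatchesManifest("v4.3.1", manifestText)); // false
```

If the version lived only in the manifest, this check would not need to exist.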

What’s subtly interesting to me about using SCCS as a dependency repository is that by convention, and in reality, most of the packages NPM and BowerJS retrieve are hosted on Github.com. Effectively, Github.com has become a Public Dependency Repository. I wonder if Github.com “knows” that or cares? I don’t think NPM and BowerJS are in violation of Github.com’s TOS, but I think they’re venturing into a fuzzy area. NPM’s architecture places a de facto burden on Github.com and public SCCS systems like it.

Moreover, NPM’s architecture leads to distribution of code that’s reminiscent of Github.com’s “Github Pages” feature.  To explain, Github Pages requires a repository to create a “gh-pages” branch that contains content specific to Github Pages.  Since it’s a static website intended to highlight the repository the “gh-pages” branch has its own development roadmap.  In practice something similar (though less pronounced) happens in the NPM ecosystem:  *-npm branches are sometimes (often?) different from the main branch.  Lodash, for example, has not-exactly-the-same codebase in the tags used to distribute through NPM, for other NPM-like tools, and for pure JS distros.

How To Recover

BowerJS should adopt a centralized dependency repository/registry. Using SCCS is unacceptable. If it wants to use SCCS as its dependency repository then it should use a gh-pages-like approach.

As for NPM, it should actually view redundant nested dependencies and large numbers of small files as potentially unacceptable. NPM already has a scalable approach to its registry/repository management. The features it lacks are more of a “nice-to-have”.

MavenJS

Really I’d rather have a MavenJS tool than NPM and/or BowerJS and/or a new-fangled SCCS-extension. MavenJS would handle dependency management like Maven for Java. MavenJS would just concern itself with dependency management, would replace Grunt, and would not need to continue the idea of “build goals”. If we designed, contributed, or evangelized for NPM and/or BowerJS over the last 7+ years, then our work is cut out for us. Fortunately we have a model to work from. Keep in mind that others have tried to improve upon that model, so fortunately we can iterate!
