📎 Monorepo support #2228

Open
Tracked by #3727
ematipico opened this issue Mar 28, 2024 · 21 comments
Labels
A-CLI Area: CLI A-Core Area: core Fund S-Enhancement Status: Improve an existing feature S-Feature Status: new feature to implement

Comments

@ematipico
Member

ematipico commented Mar 28, 2024

Description

This task is related to #1573 , but it has slightly different requirements and use cases, although we could potentially solve both with the same solution.

Background

Monorepos (package manager workspaces) are very common in the web ecosystem, and they come in different flavours with different expectations.

However, the common denominator is the following: a root configuration file, and each package in the monorepo extends the root configuration.

Flavours:

  1. one command at the root of the monorepo; the CLI/LSP then works out which configuration file is closest to the file it is processing and applies it accordingly. This is very common for lint/format/testing tools.
  2. multiple commands, each command is defined inside a package of the monorepo. Then users usually use other tools to run these commands at once, e.g. pnpm run --filter, turborepo, etc. This is very common for building tools such as bundlers, compilers, and doc generation.

The Biome case

Biome is a particular case here because, even though it is a linter/formatter, in the future Biome will also transform/compile users' code, so it requires awareness of the manifest file and the dependency graph. So while it makes sense to run biome check at the root of the monorepo, what about a future biome compile command?

We will have to untangle this. I am also happy to force users to set up Biome in one way in their monorepo.

CLI vs LSP = Workspace

The solution should lie in the Workspace. The Workspace is what LSP and CLI both share, meaning that both of them hold an instance of it, and they use it to pull data when they need.

CLI

The CLI usually works from top to bottom: it scans and handles the files that are closest to the working directory first, and eventually handles the files farthest from it. This isn't always true, though, because for each directory AND file we always spawn a thread, which means that eventually all jobs go their own way.

We would need to change the strategy of our CLI here, so that it reads and resolves possible biome.json files in each folder.

This could potentially be solved with a new workspace configuration that would allow Biome to resolve the packages beforehand.

LSP

The LSP has a different problem to solve. Biome must apply the correct configuration for the opened file when jumping from one file to another.

Workspace

The reason why I think the solution lies in the Workspace, is because both CLI and LSP have to do a very similar job: when handling a file, we should apply the configuration that belongs to that file. The Biome Workspace would potentially store all those configurations, and then the CLI and LSP could:

  • signal the Workspace to use the correct configuration, e.g. Workspace::swap_config
  • or the Workspace could do that itself by checking the path of the file/folder and automatically resolving the configuration to use
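
To make the second option concrete, here is a minimal sketch in TypeScript of path-based resolution: configurations are stored per directory, and a lookup walks up from the file's directory until it finds one. The names ConfigStore, register and resolve are hypothetical and are not part of Biome's Workspace; the sketch also assumes that the nearest configuration wins, which is only one possible policy.

```typescript
// Hypothetical sketch; none of these names exist in Biome.
type Config = { name: string };

// Strip a leading "./" and trailing slashes; the root directory becomes "".
function normalize(p: string): string {
  const trimmed = p.replace(/^\.\//, "").replace(/\/+$/, "");
  return trimmed === "." ? "" : trimmed;
}

// Parent of "a/b/c" is "a/b"; parent of a top-level entry is the root "".
function parentDir(p: string): string {
  const i = p.lastIndexOf("/");
  return i === -1 ? "" : p.slice(0, i);
}

class ConfigStore {
  // Directory (POSIX-style, relative to the repo root) -> config found there.
  private byDir: { [dir: string]: Config } = {};

  register(dir: string, config: Config): void {
    this.byDir[normalize(dir)] = config;
  }

  // Walk from the file's directory up to the root; the first hit wins.
  resolve(filePath: string): Config | undefined {
    let dir = parentDir(normalize(filePath));
    while (true) {
      if (Object.prototype.hasOwnProperty.call(this.byDir, dir)) {
        return this.byDir[dir];
      }
      if (dir === "") return undefined;
      dir = parentDir(dir);
    }
  }
}

const store = new ConfigStore();
store.register(".", { name: "root" });
store.register("packages/web", { name: "web" });
```

With this shape, both the CLI and the LSP would only pass a file path with each request, e.g. store.resolve("packages/web/src/index.ts"), and never have to tell the Workspace which configuration is active.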

Upvote & Fund

  • We're using Polar.sh so you can upvote and help fund this issue. Use the  👍 emoji to upvote it.
  • Maintainers and Core contributors will work on this issue.
  • If you'd like to see this feature happening sooner rather than later, consider funding this issue
@ematipico ematipico added A-CLI Area: CLI S-Enhancement Status: Improve an existing feature S-Feature Status: new feature to implement A-Core Area: core labels Mar 28, 2024
@polar-sh polar-sh bot added the Fund label Mar 28, 2024
@NyanHelsing

NyanHelsing commented Mar 30, 2024

wouldn't it be easier to run biome in each package and run it in parallel with something like a pnpm (or whatever you're using for workspaces) script?

That affords a lot of flexibility, considering that in a monorepo you may not have all packages linted with the same config or even the same tools. E.g. if you have an older package that is still using eslint+prettier, that package can have an npm script format that does it with those tools, and then pnpm format can do biome in another package; then the package.json at the workspace root can do https://pnpm.io/cli/run#--recursive--r

To double down on this: unix philosophy says do one thing and do it well. Biome is amazing at fixing style issues and smells in your code. It's not a workspaces tool; there's already a bunch of tools that specialize in that, e.g. lerna, pnpm workspaces, yarn workspaces, rush... Biome should work with these tools, not against them.

@arendjr
Contributor

arendjr commented Apr 2, 2024

@NyanHelsing I think you make a convincing argument, but there’s a few practical issues we keep running into with the current approach (which aligns with your suggestion):

  • Users expect to open a monorepo in their editor and have it “just work” with nested packages that may or may not have customized configs. If we don’t add monorepo support to Biome we would put the burden on our extension developers to fix this issue.
  • Competing tools such as ESLint have created an environment where users expect nested configuration files to just work. This is not limited to monorepo setups, but it’s commonly used there. (Interestingly, ESLint seems to want to migrate away from this with their new flat configs, but IMO it’s a gamble to see how users will receive this change).
  • Not a current concern, but an anticipated one: When we implement type inference or bundling, we may need to pull information from other packages. This typically includes third-party (NPM) packages, but in a monorepo may also include first-party packages stored within the same repositories. When we need to do this, we need to have an understanding of all the packages within the repository and their (inter)dependencies. In other words, we’ll need a holistic understanding of the monorepo anyway.

So while I agree with your argument in principle, I’m afraid there’s too many practical downsides for Biome to keep it limited to a per-package scope.

@arendjr
Contributor

arendjr commented Apr 2, 2024

@ematipico When you mention the workspace::swap_config command, what would that look like? I’m afraid such a command would be a source of race conditions if multiple other threads or even processes are concurrently executing commands against the Workspace. But maybe there’s a part of it I’m overlooking.
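
To illustrate the concern, here is a small TypeScript sketch. It is a simulation only: TypeScript is single-threaded, so the interleaving of two jobs is written out by hand, and SwappingWorkspace, ExplicitWorkspace and their methods are hypothetical names, not Biome's API. It contrasts a shared "current config" that callers swap with a design where the config travels with each request.

```typescript
// Hypothetical sketch; a hand-written interleaving stands in for two threads.
type Config = { indent: string };

// Stateful design: one "current" config shared by every caller.
class SwappingWorkspace {
  private current: Config = { indent: "  " };

  swapConfig(config: Config): void {
    this.current = config;
  }

  format(line: string): string {
    return this.current.indent + line;
  }
}

// Stateless design: the config is passed along with each request.
class ExplicitWorkspace {
  format(line: string, config: Config): string {
    return config.indent + line;
  }
}

const tabs: Config = { indent: "\t" };
const spaces: Config = { indent: "    " };

// Job A wants tabs, job B wants spaces; B's swap lands between A's two steps.
function interleavedSwap(): string[] {
  const ws = new SwappingWorkspace();
  ws.swapConfig(tabs); // job A, step 1
  ws.swapConfig(spaces); // job B, step 1
  const a = ws.format("a"); // job A, step 2: formatted with B's config
  const b = ws.format("b"); // job B, step 2
  return [a, b];
}

// The same two jobs cannot affect each other when the config is explicit.
function interleavedExplicit(): string[] {
  const ws = new ExplicitWorkspace();
  return [ws.format("a", tabs), ws.format("b", spaces)];
}
```

In the first function job A ends up indented with spaces even though it asked for tabs; in the second, each job gets what it asked for regardless of ordering.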

@ematipico
Member Author

@NyanHelsing

I also share your vision, and I wouldn't have created this issue if it wasn't for @arendjr's points in #2228 (comment). The web ecosystem has matured in the last few years, although we have yet to make a standard on managing these big monorepos, which means that users have different flavours.

For example, some projects out there would expect a top-level CLI command called test, which would run all the tests.

I would suggest the same as you did: tell the users to use a package manager to create multiple scripts, one in each monorepo. I just wish there was a better way to do this, maybe by pushing for a proper use of workspaces.


@arendjr

Yeah, workspace::swap_config, in a CLI environment, might not be the best. I haven't designed a solution yet; mine was more of a suggestion/idea.

@NyanHelsing

> @NyanHelsing I think you make a convincing argument, but there’s a few practical issues we keep running into with the current approach (which aligns with your suggestion):
>
> * Users expect to open a monorepo in their editor and have it “just work” with nested packages that may or may not have customized configs. If we don’t add monorepo support to Biome we would put the burden on our extension developers to fix this issue.

I'm confident this can be ameliorated with a top-level section in the documentation (demonstrated here with pnpm):

Biome and Workspaces

Biome works great with workspaces; install it in each of the packages:

pnpm -r add @biomejs/biome

Biome can now be run in each package.

pnpm -r exec biome
> * Competing tools such as ESLint have created an environment where users expect nested configuration files to just work. This is not limited to monorepo setups, but it’s commonly used there. (Interestingly, ESLint seems to want to migrate away from this with their new flat configs, but IMO it’s a gamble to see how users will receive this change).

If this is the desire, it isn't obvious that it should depend on workspaces. Isn't this just about putting multiple biome.json files in any folder structure and expecting them to work?

> * Not a current concern, but an anticipated one: When we implement type inference or bundling, we may need to pull information from other packages. This typically includes third-party (NPM) packages, but in a monorepo may also include first-party packages stored within the same repositories. When we need to do this, we need to have an understanding of all the packages within the repository and their (inter)dependencies. In other words, we’ll need a holistic understanding of the monorepo anyway.

It still sounds like this is saying it isn't needed for linting or formatting.

In the lint/format space, Biome fills a decidedly lint/formatting-shaped hole. There's jslint/jshint, which nobody should use because they're old and slow and don't work on modern code, and there's prettier/eslint, which are more modern but still slow. Biome is needed here.

Since there are already lots of tools that do bundling, we'd expect lots of users to continue to use one of the many bundlers (rollup, rspack, webpack, swcpack, turbopack, even grunt) that are all very good (except maybe grunt, which is old, and webpack, which is slow). A Biome bundler (AFAIK) doesn't exist yet; assuming it did exist and filled some space that isn't already occupied by one of the previous bundlers, I could imagine users being a little upset at the prospect of having many bundlers installed in their project that might not even be used.

It's strongly encouraged to make a bundler part of a dedicated and separate install (or better, provide tight integration with the existing bundlers) rather than bloating the tool that creates a hygienic environment for us.

> So while I agree with your argument in principle, I’m afraid there’s too many practical downsides for Biome to keep it limited to a per-package scope.

@arendjr
Contributor

arendjr commented Apr 2, 2024

> I'm confident this can be ameliorated with a top-level section in the documentation (demonstrated here with pnpm):

I'm sorry, but the solution you offered has nothing to do with the problem I highlighted. The problem is that when users open a repository in their IDE, the extension currently only supports using the top-level biome.json for all files in that repository. If the repository is a monorepo, it won't discover the nested biome.json files and thus applies the wrong configuration to files.

> It's strongly encouraged to make a bundler part of a dedicated and separate install (or better, provide tight integration with the existing bundlers) rather than bloating the tool that creates a hygienic environment for us.

Please look again at the tagline for Biome: One toolchain for your web project

It is the project's explicit goal to create a single unified tool that can cover several needs.

@NyanHelsing

> > I'm confident this can be ameliorated with a top-level section in the documentation (demonstrated here with pnpm):
>
> I'm sorry, but the solution you offered has nothing to do with the problem I highlighted. The problem is that when users open a repository in their IDE, the extension currently only supports using the top-level biome.json for all files in that repository. If the repository is a monorepo, it won't discover the nested biome.json files and thus applies the wrong configuration to files.

This sounds like a problem with the IDE plugins then, not with Biome per se. The plugin just needs the command to be configurable; then document it at the top level so it's easily seen by folks.

> > It's strongly encouraged to make a bundler part of a dedicated and separate install (or better, provide tight integration with the existing bundlers) rather than bloating the tool that creates a hygienic environment for us.
>
> Please look again at the tagline for Biome: One toolchain for your web project
>
> It is the project's explicit goal to create a single unified tool that can cover several needs.

A tool chain contains multiple tools.

@arendjr
Contributor

arendjr commented Apr 2, 2024

> This sounds like a problem with the IDE plugins then, not with Biome per se. The plugin just needs the command to be configurable; then document it at the top level so it's easily seen by folks.

It's not a matter of documentation or configuring the extension. Extensions would need the ability to switch between configurations on the fly, depending on which file is currently opened. Please see this comment for why we don't want to implement such logic purely within the extension: #1573 (comment)

> A tool chain contains multiple tools.

Sorry, but I don't see this discussion heading in a productive direction. Feel free to disagree, but Biome has been clear in its approach: It provides multiple tools within a single command/binary/toolchain.

@anthony-hayes

anthony-hayes commented Apr 17, 2024

"in the future Biome will also transform/compile users' code, so it requires awareness of the manifest file and the dependency graph"

Are there any details on these compilation plans? For context, right now I'm using Biome in a monorepo, entirely for its prettier-compliant formatter.

@arendjr
Contributor

arendjr commented Apr 18, 2024

@anthony-hayes It’s briefly mentioned in the roadmap: https://biomejs.dev/blog/roadmap-2024/#transformations

There’s also already a biome_js_transformation crate and I think the TS => JS transform is already implemented. Additionally, we’re working on implementing GritQL plugins which will eventually allow user-defined transformations as well. So bits and pieces are being worked on, but I don’t think the compiler functionality itself is a concrete focus right now.

@Faithfinder

I wouldn't strictly call "Central file" common denominator.

There's a monorepo tool called Rush, and it basically does things in the opposite way to more popular tools. And while it has fewer downloads than turbo or nx, it is used for large-scale monorepos at companies like TikTok, Microsoft, HBO.

Rush's way is basically to isolate packages from each other for portability. There are no root config files, each dependency is explicitly listed in each package.json, and common configs are distributed as packages themselves.

@omairvaiyani

@Faithfinder Maybe I'm missing the crux here. Whilst Rush doesn't have a central package.json file or equivalent, it does have monorepo-level configuration that can alter the behaviour of commands run within individual projects.

@Faithfinder

Faithfinder commented May 2, 2024

> @Faithfinder Maybe I'm missing the crux here. Whilst Rush doesn't have a central package.json file or equivalent, it does have monorepo-level configuration that can alter the behaviour of commands run within individual projects.

Well, my main point was that a lot of tools don't work on Rush because they expect a root level package.json or a lock file. Other tools are using node_modules as a build target by default (prisma, panda CSS). Either approach doesn't work well in Rush's case.

And I do believe Rush gets many things right.

Just trying to put other approaches on Biome's radar before they make a design decision that's hard to back out of

@ematipico
Member Author

@Faithfinder thank you for providing a different example. My assumption was based on my working experience and the projects that I've seen around. I know Rush, and I wanted to use it; however, when I was evaluating the project, I understood that it still needed a package manager under the hood. Of course, I might be wrong.

@Faithfinder

Oh, it uses a package manager under the hood, but it wraps it and imposes additional restrictions on top. At this point it would be easier for you to try it. Their own repo is a moderately sized Rush repo, works well as a study https://github.com/microsoft/rushstack

@anthonyshew
Contributor

anthonyshew commented Aug 26, 2024

Hoping to throw in my two cents! I couldn't find anywhere where design was being discussed and this issue appears to be doing so.

(Proposal is at the bottom if you'd like to jump straight there. This is fairly long-winded as I'd like to share how I'm arriving at this proposal.)

Some background:

  • I work on JavaScript/TypeScript monorepos a lot, whether they be my own, my work's, or ones around the community, scaling from small to gigantic. This breadth of experience has helped me form some general opinions that I feel might help inform the design for Biome's monorepo support.
  • I'm in the middle of replacing ESLint in our monorepos at Vercel as much as I reasonably can. We, of course, expect that we won't get our usages of ESLint mapped 1:1 with today's current Biome, but are betting on Biome's future. That said, our largest pain is not being able to define per-workspace configurations.
    • We're trying to get the recommended ruleset activated and some rules produce 1000+ violations. We can't reasonably fix these all in one PR, so we'd like to clean them up with a PR per package (or even more granular, if needed).
    • We're finding it difficult to break up the enabling of rules per package in a reasonable way. We've settled on using overrides for our migration work, but we're not convinced overrides is a scalable solution for setting up different, finalized configurations for individual packages in the Workspace for the long term.

There are various designs that I've seen used around the ecosystem for handling Workspaces and I think the community has learned a lot on how to handle configurations:

ESLint 👎

ESLint has learned that cascading configuration by default is problematic. It hurts performance as configuration discovery takes too much time and figuring out which configuration applies to the file you're in can be difficult. This has led to their Flat Configurations. Before Flat Configs, users could create ESLint configuration in any arbitrary place within the repo and ESLint would merge everything it can find from the file's location to the location where the ESLint CLI was executed from. This was bad for performance, and made it difficult to know which configuration was being applied in which files.

Flat Config was meant to make configuring ESLint easier, but it's still unclear what users are meant to do, and ESLint doesn't know what they recommend for Workspaces either. For what it's worth, it's clear to me after working with Flat Configs in multiple repos that each package should get its own configuration file, with a separate configuration in the root specifically for the Workspace root, if desired. ESLint is then run in each package individually and in the root separately. This brings ESLint closer to TypeScript's more favorable monorepo modeling.

TypeScript 👍

TypeScript allows users to define configuration in the Workspace root and in packages. A package in a workspace uses either the root configuration or the one defined in the package, but cannot use both, unless the user explicitly uses the extends key to reference a root configuration. Conventionally, the community has settled on only defining tsconfig.json files at the root of packages, and this works quite well.

This strategy has some significant advantages:

  • It's clear for users where configuration is defined for a package, given it can only be from one of two places. This makes it easy to figure out which configuration applies to a package both for users and tools.
  • Configurations do not cascade by default. Inheriting configuration from somewhere else is explicit with the extends key, making it apparent when configuration is being extended.
  • The extends key supports specifying modules using Node.js conventions. This is much more robust than file paths, allowing monorepos to create a "Biome configuration" package to install in the rest of the internal packages for the monorepo.

Turborepo Package Configurations 👍

Turborepo's Package Configurations follow the same general model as TypeScript, using values from the root configuration and merging in keys found in the Package Configuration. The consistencies between the two approaches have proven to be a strength for users of these tools, in my experience, both when used together and separately.

This configuration has a key difference when compared to TypeScript's extending. Turborepo (currently) only allows for extending from the root configuration using a "//" token in the extends key. This could be useful for Biome, in that users could choose to define a base configuration at the root and reference it with a special microsyntax, which you'll see in the incoming proposal.

There are, of course, many more tools I could add to this list, but things seem to be converging around TypeScript's monorepo-ing model, which seems to work well.

Perceived shortcomings in today's Biome

  • overrides at the Workspace root being the primary method for defining per-package configuration is incompatible with the general model of JavaScript Workspaces. The Workspace root should be considered a dependency of all packages, since it can affect behavior in the rest of the repository. Using overrides as the primary source of per-package configuration doesn't jibe with the ecosystem's model of how Workspaces work.
    • As an example, Turborepo would miss the cache if a root biome.json changed, even if the edit to overrides is only meant for one package. This would mean a cache miss for all lints, tests, builds, etc., despite the change only being meaningful for one package.
  • extends only supports paths, rather than being able to reference packages. Paths are brittle, and using the package manager to Biome's advantage would unlock tons of simple yet meaningful flexibility for Biome configurations.

Proposal

  • biome.json files can only be found in two places: at the root of the repo and at the root of a package. This keeps the clarity for users that TypeScript has found, and makes it so Biome has a negligible performance impact for configuration file discovery.
    • Some keys can be disallowed in a package's configuration, like vcs, since they don't make sense(?) defined in a package.
  • Users have the option to cascade, merge, or completely partition their configuration in packages:
    • "extends": "//" (or some other special microsyntax), will use the root configuration, merging/getting overwritten by any configuration found in the package's configuration.
    • "extends": "@repo/biome-config" will use the package manager to resolve to a package, merging/getting overwritten by any configuration found in the package's configuration.
    • "extends": "../../some-path" will allow for path-based extending, merging/getting overwritten by any configuration found in the package's configuration.
    • When no extends key is present in a package's configuration, it will not cascade or merge any other configuration files. The configuration for that package will be considered standalone.
  • Because Biome is so ridiculously fast, Biome can still be executed from the root of the repository as a single CLI invocation, making it simple and fast to run checks. The CLI will need to understand where package boundaries are in a repo to know when to check for a nested configuration. This is done by checking the workspaces key in package.json or pnpm-workspace.yaml. (I don't know if this is breaking new ground for Biome's understanding of a repo, but the original issue makes it sound like this is desired long term anyway?)
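
As a rough illustration of the extends rules above, here is a TypeScript sketch covering only two of the cases: no extends key (standalone) and "extends": "//" (merge over the root). Everything here is an assumption for illustration: resolvePackageConfig and deepMerge are hypothetical names, the deep-merge policy is just one reading of "merging/getting overwritten", and package- and path-based extending are left out.

```typescript
// Hypothetical sketch of the proposed semantics; not Biome's implementation.
type ConfigValue = string | number | boolean | null | ConfigValue[] | ConfigObject;
interface ConfigObject {
  [key: string]: ConfigValue;
}

function isObject(v: ConfigValue | undefined): v is ConfigObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Assumed policy: nested objects merge key by key, anything else is overwritten.
function deepMerge(base: ConfigObject, override: ConfigObject): ConfigObject {
  const out: ConfigObject = { ...base };
  for (const key of Object.keys(override)) {
    const value = override[key];
    if (value === undefined) continue;
    const existing = out[key];
    out[key] = isObject(existing) && isObject(value) ? deepMerge(existing, value) : value;
  }
  return out;
}

function resolvePackageConfig(root: ConfigObject, pkg: ConfigObject): ConfigObject {
  const { extends: parent, ...own } = pkg;
  if (parent === undefined) return own; // no extends: standalone, nothing cascades
  if (parent === "//") return deepMerge(root, own); // extend the root explicitly
  throw new Error("extends value not handled in this sketch: " + String(parent));
}

// Illustrative values only.
const root: ConfigObject = {
  formatter: { indentStyle: "tab", lineWidth: 80 },
  linter: { enabled: true },
};
const extended = resolvePackageConfig(root, { extends: "//", formatter: { lineWidth: 100 } });
const standalone = resolvePackageConfig(root, { formatter: { lineWidth: 100 } });
```

Here extended keeps the root's indentStyle and linter settings while overriding lineWidth, whereas standalone contains only what the package itself declared.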

Open questions/uncertainties

  • I'm not sure if this approach has any disadvantages when it comes to the LSP. I'm seeing discussion about the LSP above and can't quite tell if it's a solved problem or not, nor if this proposal clashes with what's possible in the LSP today.
    • A specific limitation of the TypeScript LSP is that it can only use one version of TypeScript in a repository at once, even if the repo legitimately uses multiple versions of TypeScript per-package. I'm not sure if this is a TypeScript quirk or a limitation of the general design that I'll be inheriting by being TypeScript-inspired.
  • I'm not certain that a singular extends key would be able to be flexible enough to know when it's referencing a specialized microsyntax, a package manager package, or a file system path. It's possible this key could be expanded to variations (extendsPackage, extendsPath, etc.) and throw if more than one is used.
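
For what it's worth, a single key might be workable if it borrows the rule Node.js uses to tell paths from package specifiers: values starting with "./", "../" or "/" are paths, and anything else is a package. The TypeScript sketch below assumes that rule plus the "//" token from the proposal; classifyExtends is a hypothetical helper, not something Biome provides.

```typescript
// Hypothetical sketch: classify an extends value under the assumed rules.
type ExtendsKind = "root" | "path" | "package";

function classifyExtends(value: string): ExtendsKind {
  // The special microsyntax must be checked first, since "//" also starts with "/".
  if (value === "//") return "root";

  const isPath =
    value === "." ||
    value === ".." ||
    value.slice(0, 2) === "./" ||
    value.slice(0, 3) === "../" ||
    value.charAt(0) === "/";
  if (isPath) return "path";

  // Everything else is treated as a package specifier, e.g. "@repo/biome-config".
  return "package";
}
```

The trade-off is that a bare file name such as "base.json" would be read as a package, so path-based extending would always need an explicit "./" prefix under this rule.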

Whew! That's a lot of info. Feel free to ask any questions you may have. The biggest thing I'd like to express is that the ecosystem has been closing in on standards around how Workspaces are meant to work, and the above proposal would fall into line with those expectations. The more we can do to have the best tools in the ecosystem work together, the better!

@onlywei

onlywei commented Oct 12, 2024

I'm looking forward to this feature being implemented as I am constantly working with monorepos.

While I do value having some freedom between the different packages within my monorepos, I don't consider the ability to install different lint tools to be particularly valuable.

I've had multiple monorepos where we chose to allow each package within the monorepo to set up eslint on their own. The result was:

  • We had to manage version drift between different versions of eslint
  • Eslint always comes with a gigantic list of other plugins that we had to install in many package.json files, which we also had to manage the versions of
  • Each package.json had to have its own "lint" script, which was pure copy-paste from file to file

None of the above bullet points made for a pleasant development experience at all. I understand that Biome doesn't have plugins yet, but based on the roadmap I think it eventually will. I would much prefer to install Biome (and any future plugins) in a central location of the monorepo and only do it once. I only want to have one "lint" script that can either lint the entire repo or target specific folders using git diff in CI.

@GabenGar
Contributor

"One lint script" quickly falls apart the moment you have codebases with different global scopes (think nodejs API backend, browser frontend and browser extension) and have to have separate configs anyway. I'd say the biggest hurdle for biome adoption is that it's hard to cram into a subpackage in a project (with IDE support), and therefore hard to "preview" its functionality in an actual production setup. Instead it requires a full commitment from the get-go, which isn't something people would do when they already have a working ESLint + Prettier setup.
ESLint + Prettier definitely have a W in this regard, even if setting them up in each workspace is annoying.

@anthonyshew
Contributor

anthonyshew commented Oct 14, 2024

@GabenGar, can you describe further how global scopes of your applications affect a singular lint script being used? From what you've mentioned, it sounds like you have globals that apply to specific packages in your Workspace, given their different contexts.

Today, one could use javascript.globals to do this with a singular biome.json and an overrides array that targets globs to denote packages of different types (that need different globals). That single invocation can use different JavaScript globals in different places, and that wouldn't change based on my or the other proposals written above.
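
A rough TypeScript model of that idea is sketched below. The object mirrors the general shape of a biome.json with overrides and javascript.globals, but the exact key names (include in particular) should be checked against the Biome version in use. The matching and merging logic is a simplification of my own: only patterns of the form "dir/**" are handled, and matching overrides are assumed to add to the base globals.

```typescript
// Hypothetical model of one root config serving packages with different globals.
interface OverrideEntry {
  include: string[]; // only "<dir>/**" patterns are understood by this sketch
  javascript: { globals: string[] };
}

interface SketchConfig {
  javascript: { globals: string[] };
  overrides: OverrideEntry[];
}

// Illustrative package layout and global names.
const config: SketchConfig = {
  javascript: { globals: [] },
  overrides: [
    { include: ["apps/api/**"], javascript: { globals: ["process"] } },
    { include: ["apps/extension/**"], javascript: { globals: ["chrome"] } },
  ],
};

// Simplified matcher: "<dir>/**" matches any file below <dir>.
function matches(pattern: string, filePath: string): boolean {
  const suffix = "/**";
  if (pattern.slice(-suffix.length) !== suffix) return pattern === filePath;
  const dir = pattern.slice(0, -suffix.length) + "/";
  return filePath.slice(0, dir.length) === dir;
}

// Assumed policy: globals from every matching override are added to the base list.
function globalsFor(cfg: SketchConfig, filePath: string): string[] {
  let globals = cfg.javascript.globals.slice();
  for (const entry of cfg.overrides) {
    if (entry.include.some((p) => matches(p, filePath))) {
      globals = globals.concat(entry.javascript.globals);
    }
  }
  return globals;
}
```

A single invocation over the whole repository would then see chrome only for files under apps/extension and process only for files under apps/api.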

Notably, this is similar to what some folks are doing now with ESLint's Flat Config. One configuration at the root of the Workspace, one CLI invocation from the root, but the Flat Config has many global contexts it takes care of.

Hopefully both of these examples demonstrate my larger point: From the Workspace's perspective, application globals are not global at all, since they only apply within that package's context. Accordingly, a single, root CLI invocation can be designed with the cleverness to only use the right globals at the right times in the right places.

@onlywei

onlywei commented Oct 15, 2024

> "One lint script" quickly falls apart the moment you have codebases with different global scopes (think nodejs API backend, browser frontend and browser extension) and have to have separate configs anyway. I'd say the biggest hurdle for biome adoption is that it's hard to cram into a subpackage in a project (with IDE support), and therefore hard to "preview" its functionality in an actual production setup. Instead it requires a full commitment from the get-go, which isn't something people would do when they already have a working ESLint + Prettier setup.

For clarity, "one lint script" doesn't mean "one lint config file". I can still put different config files in different folders and sub folders. The script picks them up and always applies the most local config.

@ematipico
Member Author

We reached the pledged amount. I will start working on this feature starting next year :)


9 participants