pluralize('datum') -> data but I wish 'datums' #32

Open
pbrod opened this issue Mar 1, 2019 · 5 comments

@pbrod

pbrod commented Mar 1, 2019

>>> import inflection
>>> inflection.pluralize('datum')
'data'

>>> inflection.pluralize(inflection.singularize('datum'))
'data'

>>> inflection.singularize('datum')
'datum'
@Rocamonde

Yes, but that is wrong.

@pbrod
Author

pbrod commented Aug 14, 2019

Well that is debatable according to https://nxg.me.uk/note/2005/singular-data/ and https://en.wikipedia.org/wiki/Data_(word).

In precise geodesy, for example, a ‘datum’ is the term for one of several models of the shape of the earth, relative to which the heights of mountains and the positions of telescopes are measured. This usage, which has nothing to do with our atom of data, has the perfectly regular plural ‘datums’.

@pbrod
Author

pbrod commented Aug 14, 2019

According to https://en.wikipedia.org/wiki/Data_(word) data is most often used as a singular mass noun in everyday usage. Some use it either in the singular or plural. The Associated Press style guide classifies data as a collective noun that takes the singular when treated as a unit but the plural when referring to individual items (e.g., "The data is sound" and "The data have been carefully collected").

In scientific writing data is often treated as a plural, as in "These data do not support the conclusions", but the word is also used as a singular mass entity like information, for instance in computing and related disciplines. British usage now widely accepts treating data as singular in standard English, including everyday newspaper usage.

@Rocamonde

Rocamonde commented Aug 14, 2019

I agree with all your comments except the one on the original topic, i.e. “datums”: I have never come across that word, so I can’t really speak to it. To my understanding, the right way to pluralise such Latin terms is “datum”-“data”, just like “erratum”-“errata” and many others. If you can provide a reference where that word can be found, I’d appreciate it.

Regardless, the current implementation only supports one resultant term, so the default one should be, in my opinion, “data”. Figuring out what word is appropriate would require context processing and probably AI-like tools, which are, to my understanding, outside of the scope of this package.

@pbrod
Author

pbrod commented Aug 14, 2019

My point is that "data" as a singular form is far more common, according to https://en.wikipedia.org/wiki/Data_(word), than the Latin word "datum". In English, the word datum is still used in the general sense of "an item given", but is now rarely used. Any measurement or result is a datum, though data point is now far more common.

If you're writing for an academic audience, particularly in the sciences, "data" takes a plural verb.
For example: "The data are correct".
But most people treat 'data' as a singular noun, especially when talking about computers etc.
For example: "The data is being transferred from my computer to yours".

And I have to be honest, I've never heard anyone ask for a datum.

I think it is more usual to talk about 'datum' in geodesy. And in this context the plural is ‘datums’.

That is why I think inflection.pluralize('datum') should return "datums",
and that inflection.pluralize('data') and inflection.singularize('data') both should return "data".

I think this convention is more practical in use.
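
Until the library itself changes, the convention proposed above could be approximated on the caller's side. The following is only a minimal sketch, not part of inflection: `with_overrides`, `PLURAL_OVERRIDES` and `SINGULAR_OVERRIDES` are hypothetical names introduced here for illustration, and the wrapper accepts any string-to-string function as its fallback.

```python
from typing import Callable, Mapping

# Hypothetical override tables encoding the convention proposed in this issue.
PLURAL_OVERRIDES = {"datum": "datums", "data": "data"}
SINGULAR_OVERRIDES = {"data": "data"}


def with_overrides(
    fallback: Callable[[str], str],
    overrides: Mapping[str, str],
) -> Callable[[str], str]:
    """Return a function that checks `overrides` before calling `fallback`."""

    def inflect(word: str) -> str:
        # Exact, case-sensitive match only; every other word is passed
        # unchanged to the fallback function.
        if word in overrides:
            return overrides[word]
        return fallback(word)

    return inflect


# Assumed usage with the real library (requires `import inflection`):
#   pluralize = with_overrides(inflection.pluralize, PLURAL_OVERRIDES)
#   singularize = with_overrides(inflection.singularize, SINGULAR_OVERRIDES)
```

With these tables, the wrapped pluralizer returns "datums" for "datum" and "data" for "data", while all other words keep whatever the fallback produces. Compound or capitalised forms are deliberately not handled in this sketch.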
