Glossary

We use a number of terms across our repositories and in our communication. This glossary clearly defines those terms so there is no ambiguity.

Seedcase Project

Always in proper case. Use this to refer to the project as a whole, including underpinning ideas and philosophy.

Seedcase

Limit the bare use of Seedcase as a noun to cases where the project as a whole is referred to in very general terms (e.g. the idea of Seedcase initially formed…).

Use Seedcase as a qualifier to restrict the meaning of a noun to a project-specific context (e.g. Seedcase team, Seedcase users, Seedcase software).

Seedcase software

Use these to refer to the software deliverables of the project collectively. Choose the best fit based on context. Note that, in general, these phrases don’t take the definite article (i.e. write “Seedcase software” rather than “the Seedcase software”).

Seedcase ecosystem

Use this to refer to the software deliverables which work together to implement core project functionalities as a single conceptual unit.

Data

Data can mean any piece of information that someone would like to use to answer questions. What is considered data (and metadata) is highly dependent on people and what they intend to do with the information that is collected. For us, data is any information collected for the purpose of analysing it to answer questions. An example might be data collected from people participating in a study on health and disease.

Data Package (title case)

When we use the proper name “Data Package” (or more explicitly “Data Package spec”), we are referring to the Data Package specification. This is a specification for describing a collection of connected data and metadata, documented within the datapackage.json file. When we say “Data Package” (title case), we are using it in the context of this specification, not in the context of a general “package” or organisation of data and metadata.

We also don’t refer to a folder that contains a datapackage.json file as a “Data Package” as we are not referring to the specification in that case. Instead, we might refer to it as a “data package” (lowercase) as we are referring to a set of files and folders that contains data and metadata and happens to use the Data Package specification within the datapackage.json file. See data package (lowercase) below for more on that.

data package (lowercase)

The term “package” is a general term that has been used in many different contexts to refer to any bundling of things together to make them easier to manage, distribute, and (re)use. So appending “data” to “package” is a common way of referring to any bundling of data and is not unique to “Data Package” (title case) as defined in the Data Package specification. Unfortunately, this can cause some confusion: “data package” (lowercase) is sometimes, but not always, used to mean a “Data Package” (title case), and it might not be clear from the context which one is being referred to.

For example, “data package” can refer to a set of data and metadata organised as an R package. There is even an R package called DataPackageR that sets up a project with an R package structure that you can use to organise data and make it easier to distribute and reuse. In this case, this is not a Data Package (title case) but a “data package” (lowercase).

For us, “data package” (lowercase) is a general term we use to refer to any bundle or collection of data and, importantly, their metadata. A “data package” (lowercase) may or may not use the Data Package specification.

When we use “data package”, we generally use it to directly refer to the bundle of related data and metadata that we work on, rather than to any formal specification.
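As a rough illustration of the distinction, the sketch below checks whether a folder (a “data package”, lowercase) has a datapackage.json file at its top level. The function name `uses_data_package_spec` is hypothetical, and the check only looks for the file's presence; it says nothing about whether the file's contents conform to the Data Package specification.

```python
import tempfile
from pathlib import Path


def uses_data_package_spec(folder: Path) -> bool:
    """Return True if the folder has a datapackage.json file at its top level.

    Hypothetical helper: it only checks that the file exists, not that its
    contents follow the Data Package specification.
    """
    return (folder / "datapackage.json").is_file()


# Demonstrate with a temporary folder, before and after adding the file.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    before = uses_data_package_spec(folder)
    (folder / "datapackage.json").write_text("{}")
    after = uses_data_package_spec(folder)

print(before, after)
```

Either way, the folder is still a “data package” (lowercase) in our sense, as long as it bundles data with their metadata.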

Data Resource (title case)

When we use the formal noun “Data Resource”, we are referring to the Data Package specification and how it defines a “data resource” (lowercase, see our “data resource” entry). A Data Resource (title case) is a specific entity within the Data Package spec that has a defined structure and properties that are described in the resources section of the datapackage.json file. When we use “Data Resource” (title case), we are using it in the context of the specification, not in the context of a general resource of data. However, we tend to avoid using the formal noun “Data Resource” (title case) as it tends to be clearer to say “Data Package” or “Data Package spec”. See our use of “data resource” (lowercase) for an explanation of that term.

data resource (lowercase)

The term “data resource” is not commonly used and can mean many different things to different groups of people. It could mean a resource of data, like a library is a resource for books or like the IT department is a resource for IT support within an organisation. It could also mean a resource that is used to make something else, like wood is when making a chair.

For us, “data resource” (lowercase) or simply “resource” is a general term we use to refer to any single set of related data (but not a bundle of data and metadata). A “data resource” may or may not have also been cleaned and tidied. A resource does not need to have metadata attached to it. It could be a single file or a set of files that all contain the same type of data.

For example, data collected from several people using continuous glucose monitors, which is what people with type 1 diabetes use, would be a data resource. Even though this data might be spread across several files, one for each person and likely for each day the monitor was used, it is still the same type of data, so it is considered one collected resource.

We avoid the term “data resource” (lowercase) as it isn’t a clearly defined term and because other terms exist that are widely used and more precise. For example, a “data file” or “dataset” is a more precise term to refer to a single file or set of files that contain data.

Metadata

Metadata is the set of properties that describe the entire data package and each of the data resources within it. At the package level, the properties include the package name and description, contributors, licenses, and more. At the resource level, they describe attributes such as the resource name, description, schema, and data fields. All properties are stored in datapackage.json in the root directory of the package.
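To make the two levels concrete, here is a minimal sketch of what the contents of a datapackage.json file might look like, built as a Python dictionary and serialised to JSON. The package, resource, and field names and all values are made up for illustration; only the property names are intended to follow the Data Package specification, so check the specification itself before relying on them.

```python
import json

# Hypothetical descriptor for illustration; all names and values are made up.
descriptor = {
    # Package-level properties.
    "name": "glucose-study",
    "description": "Continuous glucose monitor readings from study participants.",
    "contributors": [{"title": "Example Contributor"}],
    "licenses": [{"name": "CC-BY-4.0"}],
    # Resource-level properties, one entry per resource.
    "resources": [
        {
            "name": "glucose-readings",
            "description": "One row per glucose measurement.",
            "path": "data/glucose-readings.csv",
            "schema": {
                "fields": [
                    {"name": "participant_id", "type": "string"},
                    {"name": "measured_at", "type": "datetime"},
                    {"name": "glucose", "type": "number"},
                ]
            },
        }
    ],
}

# This string is what would be saved as datapackage.json in the package root.
datapackage_json = json.dumps(descriptor, indent=2)
print(datapackage_json)
```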

Data Resource

Always in proper case. Use this to refer to the data layer of the ecosystem and its contents. A Data Resource is a single piece of data, such as a table or data file, together with its properties, that is included in a data package. It contains the actual data, documented following the Data Package standard.

Check

Because of the confusion around “validate” and “verify”, we avoid using those words. Instead, we use the more general, though paradoxically more precise, term “check”. Check is general enough to encompass both “validate” and “verify”, without conflating the actual meanings of the two.

Validate

In software development, the term “validate” is commonly used to describe comparing something against a specification or expectation. However, it is used loosely, inconsistently, and often incorrectly. For a good overview, see the Wikipedia articles on this topic in general and on software specifically. In short, to “validate” is to check something against reality, human needs, and the human understanding of the world. “A valid argument” or “I was validated in my feelings” are examples of things that can only be determined by another human making that comparison.

Or, for more technical examples: “validating a model” is the process of comparing a model’s predictions to real-world data that we collected and that we believe accurately represents reality (or at least how we understand it). Likewise, “validating data” is checking that the data represents something that is meaningful to humans. It requires judgment and comparison, and can differ over time and across contexts. What is valid now may not be valid in the future.

In software development, true “validation” is rarely done outside of user testing and feedback or requirements gathering. Most of the time, when the term “validate” is used, it actually means “verify”. See verify below.

Verify

To “verify” is to check something against a specification or requirement. When verifying something against a specification, regardless of the connection to reality or to human needs, if it meets the specification, it “passes”. As long as the specification and the thing being verified don’t change, it will always pass, even if the specification is wrong, unconnected to reality, or no longer relevant to any human problem or need. “Verification” is often confused with “validation”. See our entry on validate for an introduction to the confusion around these two terms.

The act of making or updating a specification requires human judgment and a connection to solving some need, so the specification itself can be validated. When something is compared against the specification, this is verification. An easy guideline to use is that verification happens within a computer (or between computers) and doesn’t necessarily require human involvement. Often, verification is automated, and if it isn’t, it likely could be.
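The automated nature of verification can be sketched in a few lines of Python. The specification below (`SPEC`) is a hypothetical, heavily simplified stand-in, not the real Data Package specification, and `check_against_spec` is an illustrative name rather than a function from any Seedcase software.

```python
# Hypothetical, simplified specification: required property -> expected type.
# This is NOT the real Data Package specification.
SPEC = {"name": str, "resources": list}


def check_against_spec(descriptor: dict, spec: dict) -> list[str]:
    """Return a list of issues; an empty list means the descriptor passes."""
    issues = []
    for prop, expected_type in spec.items():
        if prop not in descriptor:
            issues.append(f"missing property: {prop}")
        elif not isinstance(descriptor[prop], expected_type):
            issues.append(f"{prop} should be of type {expected_type.__name__}")
    return issues


passing = {"name": "glucose-study", "resources": []}
failing = {"name": 42}

print(check_against_spec(passing, SPEC))  # []
print(check_against_spec(failing, SPEC))
# ['name should be of type str', 'missing property: resources']
```

Note that `passing` meets this specification even though a package with no resources is unlikely to be useful to anyone. Deciding whether that matters requires human judgment, which is validation rather than verification.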