Virtual team brainstorming meeting

virtual
meeting
brainstorming
Agenda for half-day virtual team meeting to brainstorm on the project.
Author

Luke W. Johnston

Published

December 15, 2022

Agenda

  • Brainstorm on backend common data model. This is the key to the whole project, so should focus on it. (Idea: Use whiteboard to sketch I/O?)
    • How will it be stored?
    • When data comes in, what do the files and folders look like?
      • If no files and folders, then stored as SQL? How? If data comes in as CSV, what are the steps involved?
    • How should the data be organized as it enters the CDM? Should it be organized by a timestamp?
    • Or, when new completely data (e.g. a new table) comes in, where will it be added? (e.g. study gets funding to run metabolic measurements from stored blood)
      • Should all data be treated similarly? As in, should any data coming in be structured in a similar way?
      • How can we design it to be flexible to incoming data?
    • How should the model look?
    • Easy enough with rectangular data (e.g. spreadsheet style), what about images, etc?
    • How should the API look like to move data into this Model?
    • How will changes or additions be put into a changelog file?
    • How will data schema be accessible to display in frontend as list of variables available? Or that can be updated (e.g. add description about data)?
  • Next steps and aims:
    • Create an architecture/specification document detailing how Seedcase will look like
    • Send spec document to get reviewed by external third-party
  • How to split up tasks, some milestones?

Minutes from meeting

  • Common data model thoughts:
    • Consideration: Data input should immediately enter into relational database? Or go into another storage format instead?
    • How to enforce certain standards, like variable naming?
      • Each variable has its own UUID? With a list of variable “standards” to select from based on the meta data?
    • Question: Does column-based data storage (e.g. Parquet) make it easier to input/update the data?
    • Images/non-rectangular thoughts:
      • Image be saved as is?
      • Create folder that holds the “variable” (e.g liver ultrasound)
      • each image name would have the subject/observation ID included and timestamp? Along with a UUID appended?
      • Include a _metadata.json type file in the “variable” folder?
  • Action item: Luke will make a sketch of an data input pathway diagram for everyone to review later.
  • Another brainstorming Jan 26th, same time.
    • To discuss anything that wasn’t included in this session.