Data Science Student Team (DSST)

  • Joshua Vong
  • Ethan Nguyen
  • Thomas Smale
  • Matt Solone


Regenerative Agriculture is an approach to farm and ranch management that aims to reverse climate change through practices that restore degraded soils. The Center for Regenerative Agriculture offers soil testing and regenerative practice education services for farmers for a fee. This project is to create and implement a data pipeline and processing system for soil samples coming into the soil lab.


The RAD Lab team is heading to Blythe to visit a farm and take samples. They would like to have the farmer complete an intake form to collect background information such as type of farm, location etc to build a profile of that client. Then once the samples are taken from the site, they need to be asset tagged, entered into some sort of tracking system, scientifically processed to produce data, which needs to be wrangled into a client facing report.

Project Goals

Create a data science solution to streamline lab activities, increase staff efficacy by automating data processing and report generation.

You are starting from essentially scratch here. Your approach should be to spend a fair amount of time planning and communicating directly with the client to understand the needs and what they have already. You will do a lot of writing and decision making on best way to collect & process data before you get to any type of coding stage where you can start implementing and building the pipeline.

Sample Deliverables

  • Client relationship database
    • Intake form that is easy to use by farmers.
  • A playbook for new staff and students to use.
    • A conceptual diagram of interrelated data systems.
    • A foundation data structure and file organization protocol for the lab. (instead of individual spreadsheets in random places on box) a structured data format with codebook to ensure common headers/data types across data sources
    • Instructions on how to use scripts/data pipeline
    • Training of current lab staff
  • A client facing dashboard that shows where the farmer’s sample is in the process, possibly a downloadable report with the nutrient profile and suggested conservation/amendment suggestions.
  • A ticket requesting system that allows clients to request lab services outside of direct email.
  • Reproducible reports that take raw sample processing output and put it into a client facing report that is compiled to PDF.
  • Aggregated summary of lab activity for funder such as number of samples processed, by type of farm or location etc.

Timeline / Milestones

Details regarding deliverables to be added by DSST

  • 10-15 minute project check ins during class time approximately every two weeks starting 3/10/2022. The DSST will provide updates to the client on their progress with opportunity to ask questions from both parties.
  • The client will review a scientific poster created by the DSST for presentation at the College of Natural Sciences Poster Session, date somewhere in late April or Early May.
  • The DSST will give a final stakeholder presentation to the CRARS leadership during the week of May 15th, 2022 (Finals week).

Timeline (Updated)

By 03/29/2022

Prototypes for the 3 options for the intake form.

  • Option 1 - Google form <-> Google Sheets
  • Option 2 - Shiny -> Box -> Gmail
  • Option 3 - Jango Web application

By 04/05/2022

Have Prototypes ready to present to the AG team # Domain Learning

Project Updates

Weekly updates done by DSST via PR


  • Joshua Vong - Oversight
  • Ethan Nguyen - Taskmaster
  • Thomas Smale - Liaison
  • Matt Solone - Time Keeper

02/24/2022 Meeting

Big Idea

  1. Intake form client relationship form (Website )

    1a. What fields do we want on this form ?

    1b. Data processing pipeline, produces file, get that file into a presentable format.

  2. Develop data from that intake form, How to store? Where to access?

  3. Have all types of data types that will need to be tracked on the back-end.

  4. Databases set around a specific ID, where we can track everything back to this ID.

  5. Frontend, what categories do we need to collect that allows us to analyze efficiency?

  6. Cost estimates based on a few fields, then after then we can send them the true intake form.

Next Steps

  • Work on raw data example, figure out how to clean and organize for more efficient.

  • Schedule lab visit and get idea for the intake form.

  • Implement data an tie into a modeling package, with slight tweaking.


Project update

Met with members of the RAD lab to focus in on the specifics of our job.

Main objectives -

  • Develop a user friendly online intake form for clients (Auto generation for client when we receive their samples)
  • Develop Automated tracking sheet for RAD lab
  • Begin to build prototype online intake form and data cleaning/presenting code.


Began to develop working protoypes of the intake form to present to the AG representatives. Waiting on ITSS approval for Shiney App.


Project Updates

Google Form -

  • Added file download / upload part to the Google form to allow for many samples
  • There is a problem as the sheet is uploaded as a link, could slow down automation

Django -

From 04/05/2022 * Building out django’s back-end abstraction for the intake form * Creating a frontend form for intake form

04/12/2022 * Make intake form front end cleaner * Start on internal tracking form

R Shiney / Box -

  • Looking into authentication with ITSS to configure the box API to store their files on a local system.


  • Met with AG group to dicuss updates on prototypes
  • CRARS likes the Django approach best, problem is matinence and full implementation.
  • Still updating and working on the Google form and Shiney programs.