This R markdown file performs calibrations for dissolved inorganic carbon (\(\sum\text{CO}_2\)) concentration (\(c\)) and isotopic composition (\(\delta^{13}\text{C}\)) analyzed by acidification of water samples and transfer of resultant \(\text{CO}_2\) via a Thermo GasBench to an isotope ratio mass spectrometer. Peak amplitudes are fit by exponential decay to check for proper functioning of the GasBench needle. Only peaks derived from good injections are selected. Exponential fits are then performed on all analyses to project back to the m/z 44 amplitude that would have resulted from an injection occuring at the moment of needle puncture. Expected pCO\(_2\) of standards is then calculated from the mass of CaCO\(_3\) loaded into vials as well as the volumes of the vial, liquid, and H\(_3\)PO\(_4\) added. A calibration curve is constructed by plotting expected pCO\(_2\) of standards vs. their m/z 44 amplitude at t\(_0\). \(\sum\text{CO}_2\) of the original water sample is calculated based on this calibration. \(\delta^{13}\)C is corrected for drift, linearity, and isotopic discrimination.
# load libraries
library(tidyverse) # dplyr, tidyr, ggplot
library(isoreader) # reading isotope data files
library(isoprocessor) # processing isotope data files
library(plotly) # interactive plots
library(knitr) # generating reports
library(ggpmisc) # add equations of best fit to ggplot
library(chemCal) # calculations from calibration curve
# source all relevant scripting files
source(file.path("scripts", "plotting_functions.R"))
# global knitting options for automatic saving of all plots as .png and .pdf
knitr::opts_chunk$set(
dev = c("png", "pdf"), fig.keep = "all",
dev.args = list(pdf = list(encoding = "WinAnsi", useDingbats = FALSE)),
fig.path = file.path("fig_output/", paste0(gsub("\\.[Rr]md", "/", knitr::current_input()))),
cache.path = file.path("cache/", paste0(gsub("\\.[Rr]md", "/", knitr::current_input())))
)
Data processed using the packages isoreader version 1.2.1 and isoprocessor version 0.6.7.
iso_files_raw <-
file.path(
"data_raw/190208_DBN_DIC", "190208_DBN_DIC_data.cf.rds"
) %>%
# read data files in parallel for fast read
iso_read_continuous_flow(parallel = TRUE)
# process
iso_files <- iso_files_raw %>%
# set peak table from vendor data table
iso_set_peak_table_from_auto_vendor_data_table() %>%
# rename key file info columns
iso_rename_file_info(id1 = `Identifier 1`, type = `Identifier 2`) %>%
# parse text info into numbers
iso_parse_file_info(number = Analysis) %>%
# process other file information that is specific to the naming conventions
# of this particular sequence
iso_mutate_file_info(
# what was the mass of carbonate standard loaded?
mass_loaded = parse_number(Comment) %>% iso_double_with_units("ug"),
vol_vial = parse_number(Preparation) %>% iso_double_with_units("ml")
)
## Warning: 'iso_set_peak_table_from_auto_vendor_data_table' has been renamed
## to 'iso_set_peak_table_automatically_from_vendor_data_table'. Please call the
## latter function directly to avoid this warning.
## Info: setting peak table for 32 file(s) from vendor data table with the following renames:
## - for 32 file(s): 'Nr.'->'peak_nr', 'Is Ref.?'->'is_ref', 'Start'->'rt_start', 'Rt'->'rt', 'End'->'rt_end', 'Ampl 44'->'amp44', 'Ampl 45'->'amp45', 'Ampl 46'->'amp46', 'BGD 44'->'bgrd44_start', 'BGD 45'->'bgrd45_start', 'BGD 46'->'bgrd46_start', 'BGD 44'->'bgrd44_end', 'BGD 45'->'bgrd45_end', 'BGD 46'->'bgrd46_end', 'rIntensity 44'->'area44', 'rIntensity 45'->'area45', 'rIntensity 46'->'area46', 'rR 45CO2/44CO2'->'r45/44', 'rR 46CO2/44CO2'->'r46/44', 'rd 45CO2/44CO2'->'rd45/44', 'rd 46CO2/44CO2'->'rd46/44', 'd 45CO2/44CO2'->'d45/44', 'd 46CO2/44CO2'->'d46/44', 'd 13C/12C'->'d13C', 'd 18O/16O'->'d18O', 'd 17O/16O'->'d17O', 'AT% 13C/12C'->'at13C', 'AT% 18O/16O'->'at18O'
## Info: renaming the following file info across 32 data file(s): 'Identifier 1'->'id1', 'Identifier 2'->'type'
## Info: parsing 1 file info columns for 32 data file(s):
## - to number: 'Analysis'
## Info: mutating file info for 32 data file(s)
Display chromatograms of all samples and standards. The first four peaks are reference peaks. The smaller sharp peak after that is a half-inject used to clear the sample loop.
chroms <- iso_files %>%
iso_plot_continuous_flow_data(
data = c(44),
color = NULL
) +
theme(legend.position = "bottom")
chroms
# note: we do not consider first two sample inject peaks as analyte peaks to be sure that there is no carryover between samples
peak_maps <-
tibble::tribble(
~compound, ~ref_nr, ~`rt`,
# peak map data (row-by-row)
"CO2 ref", 1, 26,
"CO2 ref", 2, 51,
"CO2 ref", 3, 75,
"CO2 ref", 4, 100,
"CO2 half inject", NA, 148,
"CO2 first sample peak", NA, 169,
"CO2 analyte", NA, 219,
"CO2 analyte", NA, 269,
"CO2 analyte", NA, 319,
"CO2 analyte", NA, 368,
"CO2 analyte", NA, 418,
"CO2 analyte", NA, 468,
"CO2 analyte", NA, 518,
"CO2 analyte", NA, 568,
"CO2 analyte", NA, 617
)
peak_maps %>% knitr::kable(digits = 0)
compound | ref_nr | rt |
---|---|---|
CO2 ref | 1 | 26 |
CO2 ref | 2 | 51 |
CO2 ref | 3 | 75 |
CO2 ref | 4 | 100 |
CO2 half inject | NA | 148 |
CO2 first sample peak | NA | 169 |
CO2 analyte | NA | 219 |
CO2 analyte | NA | 269 |
CO2 analyte | NA | 319 |
CO2 analyte | NA | 368 |
CO2 analyte | NA | 418 |
CO2 analyte | NA | 468 |
CO2 analyte | NA | 518 |
CO2 analyte | NA | 568 |
CO2 analyte | NA | 617 |
# identify peaks
peak_table_w_ids <- iso_files %>%
iso_map_peaks(peak_maps) %>%
# peak table
iso_get_peak_table(include_file_info = everything())
## Warning in max(..n_overlapping..): no non-missing arguments to max; returning -
## Inf
## Warning in max(..n_matches..): no non-missing arguments to max; returning -Inf
## Info: 443 of 443 peaks in 32 files were successfully mapped using a single peak map. 36 expected peak(s) are missing.
## Info: aggregating peak table from 32 data file(s), including file info 'everything()'
Display an example chromatogram with peaks labeled.
chrom_labeled <- iso_files %>%
iso_filter_files(id1 == "205 ug HIS mon2") %>%
iso_plot_continuous_flow_data(
# select data and aesthetics
data = c(44),
color = id1,
# provide our peak table with ids
peak_table = peak_table_w_ids,
# define peak labels, this can be any valid expression
peak_label = iso_format(id = peak_info)
) +
theme(
legend.position = "bottom"
)
## Info: applying file filter, keeping 1 of 32 files
chrom_labeled
# focus on analyte peaks
peak_table_analytes <- peak_table_w_ids %>%
# omit reference peaks and half inject for further processing (i.e. analyte peaks only)
filter(compound == "CO2 analyte")
# print
peak_table_analytes
Add columns to data frame for the period of time between the GasBench needle puncturing the vial and it being injected to the GC-IRMS.
sample_transfer_t_s <- 53 # seconds between needle stabbing vial and start of acquisition
rt_CO2_s <- 150 # actual CO2 retention time from start of inject to m/z 44 peak on mass spec
peak_table_analytes <- peak_table_analytes %>% mutate(t_stab_to_inject_s = rt - rt_CO2_s + sample_transfer_t_s) # make column for time between stabbing of gasbench needle and injection
Find and plot a model chromatogram with good injection.
iso_files %>% iso_filter_files(id1 == "205 ug HIS mon2") %>%
iso_plot_continuous_flow_data(
data = c("44"),
color = file_id
) +
theme(legend.position = "bottom")
## Info: applying file filter, keeping 1 of 32 files
Plot the peak amplitudes of the training chromatogram vs. time from start of He dilution. A non-linear least squares model of exponential decay of m/z 44 amplitude vs. time from needle puncture to inject provides a good fit to the data. This is consistent with physical reality since the sample CO\(_2\) is in a constant volume and is being diluted with a constant flow of He.
training_chrom <- peak_table_analytes %>% filter(id1 == "205 ug HIS mon2")
training_chrom %>%
ggplot() +
aes(
x = t_stab_to_inject_s,
y = amp44
) +
geom_point() +
geom_line(data = function(df) mutate(df, amp44 = nls(amp44 ~ A*exp(-k*t_stab_to_inject_s), start = c(A = 10000, k = 0.0001), data = df) %>% predict()), alpha = 0.4)+
scale_x_continuous(name = "time from GasBench needle puncture until injection on GC-IRMS [s]", limits = c(0, 600), expand = c(0,0)) +
scale_y_continuous(name = ("m/z 44 peak amplitude [mV]"))+
theme_bw()
Make non-linear least squares model based on the ‘training chromatogram’.
exp_model <- nls(data = training_chrom, formula = amp44 ~ A*exp(-k*t_stab_to_inject_s), start = c(A = 10000, k = 0.0001)) # make and save model
summary(exp_model) # print summary of the model
##
## Formula: amp44 ~ A * exp(-k * t_stab_to_inject_s)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## A 1.522e+04 1.801e+01 845.2 < 2e-16 ***
## k 7.233e-04 3.652e-06 198.1 2.21e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.95 on 7 degrees of freedom
##
## Number of iterations to convergence: 4
## Achieved convergence tolerance: 1.259e-06
Save exponential decay variable, k
k <- as.numeric(coef(exp_model)[2]) # save k
k # print k
## [1] 0.0007233262
For demonstration purposes, find and plot a model chromatogram with some bad injections.
We can see in the following chromatogram that the first few peaks don’t follow the expected trend of decreasing amplitude with time. This indicates partial needle clogging, probably due to water condensation on underside of septum, causing some weak injections in this sample.
iso_files %>% iso_filter_files(id1 == "194 ug YULE drift1") %>%
iso_plot_continuous_flow_data(
data = c(44),
color = file_id
) +
theme(legend.position = "bottom")
## Info: applying file filter, keeping 1 of 32 files
For demonstration purposes, now re-plot m/z 44 peak amplitude vs. period of time between the GasBench needle puncturing the vial and it being injected to the GC-IRMS, same chromatogram as above.
weak_inject_chrom <- peak_table_analytes %>% filter(id1 == "194 ug YULE drift1")
weak_inject_chrom %>%
ggplot() +
aes(
x = t_stab_to_inject_s,
y = amp44
) +
geom_point()+
scale_x_continuous(name = "time from GasBench needle puncture until injection on GC-IRMS [s]", limits = c(0, 600), expand = c(0,0)) +
scale_y_continuous(name = ("m/z 44 peak amplitude [mV]"))+
theme_bw()
Now, plot all peaks alongside the fit which gives the largest m/z 44 peak amplitude at t\(_0\). It is visibly clear which injects were affected by partial needle obstruction.
weak_inject_chrom <- weak_inject_chrom %>% mutate(amp44_t0_per_peak = amp44 * exp(k*t_stab_to_inject_s)) # For each peak, calculate the projected amplitude at t0 in the chromatogram
weak_inject_chrom_max_amp44_t0 <- weak_inject_chrom %>% filter(amp44_t0_per_peak == max(amp44_t0_per_peak)) # select largest projected amplitude at t0 in the chromatogram
max_amp44_t0 <- weak_inject_chrom_max_amp44_t0$amp44_t0_per_peak # save largest projected amplitude at t0
t_max_amp44_t0 <- weak_inject_chrom_max_amp44_t0$t_stab_to_inject_s # save the time of the peak with largest projected amplitude at t0
weak_inject_chrom_fit_function <- function (x) max_amp44_t0*exp(-k*x) # write function to calculate amp44 as a function of t based on the projected amplitude at t0 estimated from the largest peak of the weak inject chromatogram and the k value from the good chromatogram
# plot the peak and the fit based on the point with largest project amp44 t0
weak_inject_chrom %>%
ggplot() +
stat_function(data = data.frame(amp44=c(1, 600)), aes(x=amp44), fun = weak_inject_chrom_fit_function, geom="line") +
aes(
x = t_stab_to_inject_s,
y = amp44
) +
geom_point(color="blue") +
scale_x_continuous(name = "time from GasBench needle puncture until injection on GC-IRMS [s]", limits = c(0, 600), expand = c(0,0)) +
scale_y_continuous(name = ("m/z 44 peak amplitude [mV]"))+
theme_bw()
Estimate the expected m/z 44 peak amplitudes of all peaks if proper injections occured based on the maximum projected amplitude at t\(_0\) for each analysis.
peak_table_analytes_max_peaks <- peak_table_analytes %>% mutate(amp44_t0_per_peak = amp44 * exp(k*t_stab_to_inject_s)) # For each peak, calculate the projected amplitude at t0 in the chromatogram
peak_table_analytes_max_peaks <- peak_table_analytes_max_peaks %>% group_by(file_id) %>% mutate(max_amp44_t0 = max(amp44_t0_per_peak)) # add column for largest projected m/z 44 peak amplitude at t0 per peak for a given analysis
peak_table_analytes_max_peaks <- peak_table_analytes_max_peaks %>% group_by(file_id) %>% mutate(t_of_max_amp44_t0 = ifelse(amp44_t0_per_peak == max_amp44_t0, t_stab_to_inject_s, 0)) # add column for t_stab_to_inject for the peak with largest projected m/z 44 peak amplitude at t0
peak_table_analytes_max_peaks <- peak_table_analytes_max_peaks %>% group_by(file_id) %>% mutate(t_of_max_amp44_t0_copied = max(t_of_max_amp44_t0)) %>% select(-t_of_max_amp44_t0) # copy t_of_max_amp44_t0 to whole row and delete previously made column
peak_table_analytes_max_peaks <- peak_table_analytes_max_peaks %>% group_by(file_id) %>% mutate(amp44_expected = max_amp44_t0*exp(-k*t_stab_to_inject_s)) # estimate what the amp44 would have been based on the peak with largest projected m/z 44 peak amplitude at t0 and previously calculated value of k
peak_table_analytes_max_peaks <- peak_table_analytes_max_peaks %>% group_by(file_id) %>% mutate(amp44_expected_plus5percent = amp44_expected + amp44_expected*.05) # make column for expected amp44 + 5%
peak_table_analytes_max_peaks <- peak_table_analytes_max_peaks %>% group_by(file_id) %>% mutate(amp44_expected_minus5percent = amp44_expected - amp44_expected*.05) # make column for expected amp44 - 5%
Plot measured m/z 44 peak amplitude as well as expected peak amplitude ± 5%. By playing around with the interactive plot, it should be clear that normal injections are consistently within the expected value 5±%, and the bad injections are obviously out of this range. 5% is an arbitrarily selected tuning factor.
It’s worth noting that the ~12 ml Exetainer vials generally require a somewhat larger tolerance compared to the ~119 ml vials that I sometimes use for lower concentration samples (i.e. 5% as opposed to 1.5%). This is because the larger vials have a larger headspace and therefore slower dilution, so the exact peak fitting parameters are typically less sensitive for the bigger vials.
amp44_measured_v_expected <- peak_table_analytes_max_peaks %>%
ggplot() +
aes(
x = t_stab_to_inject_s,
y = amp44,
color = file_id
) +
geom_pointrange(aes(y = amp44_expected, ymin = amp44_expected_minus5percent, ymax = amp44_expected_plus5percent, label = "expected_amp44"), size = 2, alpha=0.5)+
geom_point(size=1, aes(label = d13C))+
scale_x_continuous(name = "time from GasBench needle puncture until injection on GC-IRMS [s]", limits = c(0, 600), expand = c(0,0)) +
scale_y_continuous(name = ("m/z 44 peak amplitude [mV]"))+
theme_bw()
## Warning: Ignoring unknown aesthetics: label
## Warning: Ignoring unknown aesthetics: label
amp44_measured_v_expected %>% ggplotly()
## Don't know how to automatically pick scale for object of type iso_double_with_units/vctrs_vctr. Defaulting to continuous.
Filter out peaks that are not within 5% of the expected value for amp44
peak_table_analytes_max_peaks_filtered <- peak_table_analytes_max_peaks %>% filter(amp44 > amp44_expected_minus5percent & amp44 < amp44_expected_plus5percent) # filter out peaks that are not within 5% of the expected value for amp44
Plot peaks filtered for good injections
filtered_for_good_injects <- peak_table_analytes_max_peaks_filtered %>%
ggplot() +
aes(
x = t_stab_to_inject_s,
y = amp44,
color = file_id
)+
scale_x_continuous(name = "time from GasBench needle puncture until injection on GC-IRMS [s]", limits = c(0, 600), expand = c(0,0)) +
scale_y_continuous(name = ("m/z 44 peak amplitude [mV]"))+
geom_point(size=2)+
theme_bw()
filtered_for_good_injects %>% ggplotly()
Write function for calculating the amplitude of a signal at time 0 given a dataframe of time and signal.
# function for calculating the amplitude of a signal at time 0 given a dataframe of time and signal
exp_decay_t0 <- function (time, signal, A_guess = 5000, k_guess = k) {
ampl_t0 <- coef(nls(formula = signal ~ A*exp(-k*time), start = c(A = A_guess, k = k_guess)))[1]
try(return(ampl_t0))
}
Count peaks left after filtering
peak_counts <- peak_table_analytes_max_peaks_filtered %>% group_by(file_id) %>% summarise(n=n()) # summarise how many peaks are left after filtering for bad injects
## `summarise()` ungrouping output (override with `.groups` argument)
peak_counts #print
Only keep samples with at least 4 peaks after quality filtering
peak_table_analytes_summarise <- peak_table_analytes_max_peaks_filtered %>% group_by(file_id) %>% filter(n()>=4) # only keep samples with at least 4 peaks after quality filtering
peak_table_analytes_summarise <- peak_table_analytes_summarise %>% group_by(file_id) %>% mutate(amp44_t0 = exp_decay_t0(time = t_stab_to_inject_s, signal = amp44)) # calculate more exactly amp44 at t0 based on all peaks in the model
data <-
peak_table_analytes_summarise %>%
group_by(file_id, id1, type, mass_loaded, amp44_t0) %>%
summarize(
num.peaks=n(),
d13C.measured=mean(d13C),
d13C.sd=sd(d13C),
amp44_mean=mean(amp44),
amp44.sd=sd(amp44),
inv.amp44=1/amp44_mean,
file_datetime=mean(file_datetime),
vol_vial = mean(vol_vial),
Analysis = mean(Analysis)
)
## `summarise()` regrouping output by 'file_id', 'id1', 'type', 'mass_loaded' (override with `.groups` argument)
data <- data %>% ungroup()
Add column for data type
data <- data %>% mutate(type_general = ifelse(type == "sample", "sample", "standard"))
# data %>% select(file_id, Analysis, amp44_t0)
method_blanks <- data %>% filter(Analysis == 16593 | Analysis == 16595 | Analysis == 16597)
method_blanks %>% select(file_id, Analysis, amp44_t0)
# note: in this run, I included blanks with only H3PO4 and not H3PO4 + water because I only had 1 blank with water and I need at least 2 to calculate LOQ, but still all these blanks are decent representations of true method blanks
method_blanks <- data %>% filter(Analysis == 16593 | Analysis == 16595 | Analysis == 16597)
method_blanks %>% select(file_id, amp44_t0) %>% kable()
file_id | amp44_t0 |
---|---|
16593__5 min He purge 190104 + 100 ul H3PO4 190104 rep1.dxf | 422.0845 |
16595__5 min He purge 190104 + 100 ul H3PO4 190104 rep3.dxf | 133.6089 |
16597__5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ 190104.dxf | 362.9678 |
# mean signal of method blanks
S_mb <- mean(method_blanks$amp44_t0)
S_mb # print
## [1] 306.2204
# standard deviation of signal of method blanks
sd_mb <- sd(method_blanks$amp44_t0)
sd_mb # print
## [1] 152.3803
# calculate the signal for limit of quantitation
# eq. 4.7.4 https://chem.libretexts.org/Bookshelves/Analytical_Chemistry/Book%3A_Analytical_Chemistry_2.0_(Harvey)/04_Evaluating_Analytical_Data/4.7%3A_Detection_Limits
# The ability to detect the analyte with confidence is not the same as the ability to report with confidence its concentration, or to distinguish between its concentration in two samples. For this reason the American Chemical Society’s Committee on Environmental Analytical Chemistry recommends the limit of quantitation, (SA)LOQ.
S_A_LOQ <- S_mb + 10 * sd_mb
S_A_LOQ # print
## [1] 1830.023
Make units explicit for subsequent calculations. add_row() in following chunk can’t handle double with units data class.
data <- data %>% iso_make_units_explicit()
# print
data
Check if data is quantitable
# insert dummy row for LOQ
# have to make units explicit here because add_row() can't handle double with units
data <- data %>% iso_make_units_explicit() %>% add_row(file_id = "LOQ", id1 = "LOQ", type_general = "sample", amp44_t0 = S_A_LOQ, `vol_vial [ml]` = 11.7)
# add general names for samples
data <- data %>% mutate(name = case_when(
Analysis == 16591 ~ "5 min He purge 190104",
Analysis == 16593 ~ "5 min He purge 190104 + 100 ul H3PO4 190104",
Analysis == 16595 ~ "5 min He purge 190104 + 100 ul H3PO4 190104",
Analysis == 16597 ~ "5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ",
Analysis == 16617 ~ "5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ 190104 + 194ug YULE",
Analysis == 16618 ~ "WAB71",
Analysis == 16619 ~ "NSHQ14",
Analysis == 16620 ~ "WAB105",
Analysis == 16621 ~ "BA1A_0_30",
Analysis == 16624 ~ "BA1A_41_65",
Analysis == 16625 ~ "BA1A_108_132",
Analysis != 16617 & str_detect(id1, "YULE") == TRUE ~ "YULE",
str_detect(id1, "HIS") == TRUE ~ "HIS",
str_detect(id1, "LSVEC") == TRUE ~ "LSVEC",
str_detect(id1, "sequoia") == TRUE ~ "CU Sequoia flush gas",
is.na(Analysis) ~ "LOQ"
))
data %>% select(file_id, name) %>% kable()
file_id | name |
---|---|
16589__CU sequoia flush gas.dxf | CU Sequoia flush gas |
16591__5 min He purge 190104.dxf | 5 min He purge 190104 |
16593__5 min He purge 190104 + 100 ul H3PO4 190104 rep1.dxf | 5 min He purge 190104 + 100 ul H3PO4 190104 |
16595__5 min He purge 190104 + 100 ul H3PO4 190104 rep3.dxf | 5 min He purge 190104 + 100 ul H3PO4 190104 |
16597__5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ 190104.dxf | 5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ |
16598__18 ug YULE lin.dxf | YULE |
16599__46 ug YULE lin.dxf | YULE |
16600__111 ug YULE lin.dxf | YULE |
16601__149 ug YULE lin.dxf | YULE |
16602__198 ug YULE lin.dxf | YULE |
16606__248 ug YULE lin.dxf | YULE |
16607__304 ug YULE lin.dxf | YULE |
16608__347 ug YULE lin.dxf | YULE |
16609__435 ug YULE lin.dxf | YULE |
16610__194 ug YULE drift1.dxf | YULE |
16611__204 ug HIS mon1.dxf | HIS |
16612__204 ug YULE dis1.dxf | YULE |
16613__209 ug HIS dis2.dxf | HIS |
16614__174 ug LSVEC dis3.dxf | LSVEC |
16615__216 ug YULE drift2.dxf | YULE |
16616__205 ug HIS mon2.dxf | HIS |
16617__5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ 190104 + 194ug YULE.dxf | 5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ 190104 + 194ug YULE |
16618__WAB71.dxf | WAB71 |
16619__NSHQ14.dxf | NSHQ14 |
16622__200 ug YULE drift3.dxf | YULE |
16623__198 ug HIS mon3.dxf | HIS |
16629__196 ug YULE drift4.dxf | YULE |
16630__210 ug HIS mon4.dxf | HIS |
LOQ | LOQ |
# add column for samples above or below limit of quantitation
data <- data %>% mutate(quantitatable = ifelse(amp44_t0 >= S_A_LOQ, TRUE, FALSE))
# select relevant data and print
data %>% select(file_id, amp44_t0, quantitatable) %>% kable()
file_id | amp44_t0 | quantitatable |
---|---|---|
16589__CU sequoia flush gas.dxf | 10183.43970 | TRUE |
16591__5 min He purge 190104.dxf | 72.97056 | FALSE |
16593__5 min He purge 190104 + 100 ul H3PO4 190104 rep1.dxf | 422.08447 | FALSE |
16595__5 min He purge 190104 + 100 ul H3PO4 190104 rep3.dxf | 133.60887 | FALSE |
16597__5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ 190104.dxf | 362.96777 | FALSE |
16598__18 ug YULE lin.dxf | 1706.27779 | FALSE |
16599__46 ug YULE lin.dxf | 3346.59980 | TRUE |
16600__111 ug YULE lin.dxf | 7989.05289 | TRUE |
16601__149 ug YULE lin.dxf | 11328.22799 | TRUE |
16602__198 ug YULE lin.dxf | 15312.63837 | TRUE |
16606__248 ug YULE lin.dxf | 19121.39281 | TRUE |
16607__304 ug YULE lin.dxf | 24482.44977 | TRUE |
16608__347 ug YULE lin.dxf | 28099.43586 | TRUE |
16609__435 ug YULE lin.dxf | 18733.55607 | TRUE |
16610__194 ug YULE drift1.dxf | 14775.66727 | TRUE |
16611__204 ug HIS mon1.dxf | 15524.44228 | TRUE |
16612__204 ug YULE dis1.dxf | 15699.95889 | TRUE |
16613__209 ug HIS dis2.dxf | 16017.63748 | TRUE |
16614__174 ug LSVEC dis3.dxf | 17608.76326 | TRUE |
16615__216 ug YULE drift2.dxf | 16587.11741 | TRUE |
16616__205 ug HIS mon2.dxf | 15224.82323 | TRUE |
16617__5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ 190104 + 194ug YULE.dxf | 15368.38613 | TRUE |
16618__WAB71.dxf | 329.74667 | FALSE |
16619__NSHQ14.dxf | 331.34241 | FALSE |
16622__200 ug YULE drift3.dxf | 14993.60208 | TRUE |
16623__198 ug HIS mon3.dxf | 15313.38197 | TRUE |
16629__196 ug YULE drift4.dxf | 14870.91328 | TRUE |
16630__210 ug HIS mon4.dxf | 16216.01256 | TRUE |
LOQ | 1830.02298 | TRUE |
Correct data types for calculations
vol_H2O_sample_ml <- 1 # volume of water sample in ml
vol_H3PO4_added_ml <- .1 # volume of concentrated H3PO4 added to samples and standards
select standards for calibration curve
linC <- data %>% filter(type == "lin.std") # filter for linearity standards
Plot calib curve based on mass loaded
calib_DIC <-
ggplot (linC, aes(x=`mass_loaded [ug]`, y=amp44_t0, label = num.peaks)) +
geom_point() +
scale_x_continuous(name = "mass CaCO3 loaded [µg]") +
scale_y_continuous(name = ("m/z 44 peak amplitude t0 [mV]"))+
theme_bw()
calib_DIC %>% ggplotly()
Something appears wrong with the 49µg standard. Perhaps an error in weighing and transferring the standard, or piece of standard got stuck on the side of Exetainer and did not dissolve, or cap was not properly closed, allowing CO\(_2\) to diffuse out. The chromatogram below looks fine, so it’s unclear what the problem was.
iso_files %>% iso_filter_files(id1 == "435 ug YULE lin") %>%
iso_plot_continuous_flow_data(
data = c(44),
color = id1
)
## Info: applying file filter, keeping 1 of 32 files
Cull 435µg YULE from linearity standards
linC <- linC %>% filter(id1 != "435 ug YULE lin") # low yield for 49 ug sample
Replot calibration curve with 435 µg standard culled
calib_mass_loaded_summ <- summary(lm(linC$amp44_t0 ~ linC$`mass_loaded [ug]`)) # summarize regression of m/z 44 amplitude at t0 vs. mass loaded
calib_DIC_2 <-
ggplot(linC, aes(x=`mass_loaded [ug]`, y=amp44_t0)) +
geom_smooth(method="lm", color = "blue") +
geom_point(shape=21, fill="black", size = 2)+
stat_poly_eq(aes(label = paste(stat(eq.label), stat(rr.label), sep = "~~~~")),
formula = linC$amp44_t0 ~ linC$`mass_loaded [ug]` , parse = TRUE, rr.digits = 6, color = "blue")+
scale_x_continuous(name = latex2exp::TeX("mass CaCO$_3$ loaded $\\[$µg$\\]$"))+
scale_y_continuous(name = latex2exp::TeX("m/z 44 peak amplitude t$_0$ $\\[$mV$\\]$"))+
theme_bw()
calib_DIC_2
## `geom_smooth()` using formula 'y ~ x'
Preparing constants and equations to calculate pCO\(_2\) from µg CaCO\(_3\)
# calculating henry's constant at lab conditions
R <- 0.083144598 # R (l * bar * K−1 * mol−1)
Pa_bar <- 1e5 # Pa/bar
l_m3 <- 1e3 # l m^-3
Hcp_CO2_25C_DI <- 3.30E-04 # Henry's constant (Hcp) @ 298.15K in deonized water (Sander, 2015)[mol m^-3 Pa^-1]
#eqn: Hcc = c(aq) / c(g)
#Hcc = Hcp * R * T
Hcp_lit_temp_K <- 298.15 # temp in K of literature henry constant
Hcp_CO2_25C_DI_bar <- Hcp_CO2_25C_DI * Pa_bar / l_m3 # Hcp mol L^-1 bar^-1
Hcc_CO2_25C_DI <- Hcp_CO2_25C_DI_bar * R * Hcp_lit_temp_K # dimensionless Hcc
Hcp_temp_correct_factor <- 2400 #dlnHcp/d(1/T) [K] temperature correction factor (Sander, 2015)
lab_temp_C <- 21
lab_temp_K <- lab_temp_C + 273.15
Hcp_CO2_lab_temp_DI <- Hcp_CO2_25C_DI * exp(Hcp_temp_correct_factor * (1/lab_temp_K - 1/Hcp_lit_temp_K)) #Henry constant at lab temp in DI water [mol m^-3 Pa^-1]
Hcp_CO2_lab_temp_DI_bar <- Hcp_CO2_lab_temp_DI * Pa_bar / l_m3 # Hcp mol L^-1 bar^-1
Hcc_CO2_lab_temp_DI <- Hcp_CO2_lab_temp_DI_bar * R * lab_temp_K # dimensionless Hcc
PO4_stock_M <- 14.8 # moles / liter of phosphate in concentrated stock sol'n (85 wt %) https://www.sigmaaldrich.com/chemistry/stockroom-reagents/learning-center/technical-library/reagent-concentrations.html
vol_l_ml <- vol_H2O_sample_ml + vol_H3PO4_added_ml # volume of water + acid in ml
water_H3PO4_ratio <- vol_H2O_sample_ml / vol_H3PO4_added_ml # ratio of concentrated H3PO4 (85 wt%) to water in DIC prep method
dilution_factor_H3PO4 <- (1/(1+1*water_H3PO4_ratio)) # dilution factor of concentrated H3PO4 during acidification of water sample
ci <- 14.8 * dilution_factor_H3PO4 # concentration of total phosphate and its protonated forms in acidified water sample [kmol m^-3 aka mol/l]
hi_H2PO4 <- 0.1025 # m^3kmol^-1 ion-specific parameter (schumpe 1993)
hg_CO2 <- -0.0183 # m^3kmol^-1 gas-specific parameter (schumpe 1993)
Hcc_CO2_lab_temp_and_ionic_strength <- Hcc_CO2_lab_temp_DI * 10^-((hi_H2PO4 + hg_CO2) * ci)
Calculate expected pCO\(_2\) from µg CaCO\(_3\) loaded
MM_CaCO3 <- 100.0869 #g/mol
linC <- linC %>% mutate(mol_CO2_total_expected = `mass_loaded [ug]` * 1e-6 / MM_CaCO3) # add column for total moles CO2 expected
linC <- linC %>% mutate(mol_ratio_CO2_g_aq = (`vol_vial [ml]` - vol_l_ml) * 1e-3 / (vol_l_ml*1e-3 * Hcc_CO2_lab_temp_and_ionic_strength)) # add column for mole ratio of CO2 gas / aqueous
linC <- linC %>% mutate(mol_CO2_g = mol_CO2_total_expected / (1+ (1/mol_ratio_CO2_g_aq))) # add column for total moles CO2 in gas phase
linC <- linC %>% mutate(p_CO2_expected_bar = mol_CO2_g * R * lab_temp_K / ((`vol_vial [ml]` - vol_l_ml)*1e-3)) # add column for expected pCO2
calib_DIC_3 <-
ggplot(linC, aes(x=p_CO2_expected_bar, y=amp44_t0)) +
geom_smooth(method="lm", color = "blue") +
geom_point(shape=21, fill="black", size = 2)+
stat_poly_eq(aes(label = paste(stat(eq.label), stat(rr.label), sep = "~~~~")),
formula = linC$amp44_t0 ~ linC$p_CO2_expected_bar , parse = TRUE, rr.digits = 6, color = "blue")+
scale_x_continuous(name = latex2exp::TeX("pCO$_2$ expected $\\[$bar$\\]$"))+
scale_y_continuous(name = latex2exp::TeX("m/z 44 peak amplitude t$_0$ $\\[$mV$\\]$"))+
theme_bw()
calib_DIC_3 # show plot
## `geom_smooth()` using formula 'y ~ x'
Generate linear regression of calibration against pCO\(_2\)
calib_fit_pCO2 <- lm(linC$amp44_t0 ~ linC$p_CO2_expected_bar) # make linear regression of m/z 44 amplitude at t0 vs. mass loaded
calib_fit_summ_pCO2 <- summary(calib_fit_pCO2) # summarize regression statistics
calib_fit_summ_pCO2 # print regression statistics
##
## Call:
## lm(formula = linC$amp44_t0 ~ linC$p_CO2_expected_bar)
##
## Residuals:
## Min 1Q Median 3Q Max
## -540.10 -333.01 -90.12 361.49 706.61
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -457.7 335.7 -1.363 0.222
## linC$p_CO2_expected_bar 3764888.9 74573.5 50.486 4.05e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 502.3 on 6 degrees of freedom
## Multiple R-squared: 0.9977, Adjusted R-squared: 0.9973
## F-statistic: 2549 on 1 and 6 DF, p-value: 4.052e-09
Filter for samples
samples <- filter(data, type_general=="sample") # filter for samples
# use inverse.predict function from chemCal to predict X based on Y
samples_calibrated <- samples %>% group_by(name) %>% mutate(pCO2_bar = as.numeric(inverse.predict(object = calib_fit_pCO2, newdata = amp44_t0, alpha = 0.05)[1]))
# use inverse.predict function from chemCal to calculate the 95% confidence interval for the prediction
samples_calibrated <- samples_calibrated %>% mutate(pCO2_bar_95_confidence = as.numeric(inverse.predict(object = calib_fit_pCO2, newdata = amp44_t0, alpha = 0.05)[2]))
# summarize calibrated samples
samples_calibrated_summ <- samples_calibrated %>% group_by(name) %>% summarise(n = n(), `amp44_mean [mV] mean` = mean(`amp44_mean [mV]`), amp44_t0 = mean(amp44_t0), pCO2_bar = first(pCO2_bar), pCO2_bar_95_confidence = first(pCO2_bar_95_confidence), quantitatable = ifelse(any(quantitatable == FALSE) == TRUE, FALSE, TRUE), `vol_vial [ml]` = mean(`vol_vial [ml]`))
## `summarise()` ungrouping output (override with `.groups` argument)
samples_calibrated_summ %>% kable(digits = 6) # print
name | n | amp44_mean [mV] mean | amp44_t0 | pCO2_bar | pCO2_bar_95_confidence | quantitatable | vol_vial [ml] |
---|---|---|---|---|---|---|---|
5 min He purge 190104 | 1 | 59.69181 | 72.97056 | 0.000141 | 0.000159 | FALSE | 11.7 |
5 min He purge 190104 + 100 ul H3PO4 190104 | 2 | 225.92858 | 277.84667 | 0.000195 | 0.000128 | FALSE | 11.7 |
5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ | 1 | 293.77891 | 362.96776 | 0.000218 | 0.000158 | FALSE | 11.7 |
5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ 190104 + 194ug YULE | 1 | 12324.35375 | 15368.38613 | 0.004204 | 0.000142 | TRUE | 11.7 |
LOQ | 1 | NA | 1830.02298 | 0.000608 | 0.000155 | TRUE | 11.7 |
NSHQ14 | 1 | 265.25060 | 331.34241 | 0.000210 | 0.000159 | FALSE | 11.7 |
WAB71 | 1 | 256.26572 | 329.74667 | 0.000209 | 0.000159 | FALSE | 11.7 |
Now, just for demonstration purposes, convert mass CaCO\(_3\) loaded to \(c_{\sum\text{CO}_2}\) and re-plot. This is a simpler, more common, and slightly less exact/representative way to make such a calibration curve.
MM_CaCO3 <- 100.0869 #g/mol
linC <- linC %>% mutate(mol_CO2_total_expected = `mass_loaded [ug]` * 1e-6 / MM_CaCO3) # add column for total moles CO2 expected
linC <- linC %>% mutate(DIC_uM = mol_CO2_total_expected / (vol_H2O_sample_ml *1e-3) * 1e6) # dissolved inorganic carbon concentration of initial water sample by dividing total moles CO2 by volume of water
calib_DIC_4 <-
ggplot(linC, aes(x=DIC_uM, y=amp44_t0)) +
geom_smooth(method="lm", color = "blue") +
geom_point(shape=21, fill="black", size = 2)+
stat_poly_eq(aes(label = paste(stat(eq.label), stat(rr.label), sep = "~~~~")),
formula = linC$amp44_t0 ~ linC$DIC_uM , parse = TRUE, rr.digits = 6, color = "blue")+
scale_x_continuous(name = latex2exp::TeX("estimated $\\textit{c}_{\\sum CO_2}$ $\\[$µmol$\\cdot$L$^{-1}\\]$"))+
scale_y_continuous(name = latex2exp::TeX("m/z 44 peak amplitude t$_0$ $\\[$mV$\\]$"))+
theme_bw()
calib_DIC_4
## `geom_smooth()` using formula 'y ~ x'
# make interactive plot
calib_DIC_5 <-
ggplot(linC, aes(x=DIC_uM, y=amp44_t0, label=id1))+
geom_point()+
theme_bw()
calib_DIC_5 %>% ggplotly()
### for concentration
samples_calibrated_summ <- samples_calibrated_summ %>% mutate(mol_CO2_g = pCO2_bar * (`vol_vial [ml]` - vol_l_ml) * 1e-3 / (R * lab_temp_K)) # calculate moles CO2 in gas phase
samples_calibrated_summ <- samples_calibrated_summ %>% mutate(mol_CO2_aq = mol_CO2_g * vol_l_ml*1e-3 * Hcc_CO2_lab_temp_and_ionic_strength / ((`vol_vial [ml]` - vol_l_ml)*1e-3)) #calculated moles CO2 in aqueous phase
samples_calibrated_summ <- samples_calibrated_summ %>% mutate(mol_CO2_tot = mol_CO2_g + mol_CO2_aq) # sum aqueous and gaseous CO2
samples_calibrated_summ <- samples_calibrated_summ %>% mutate(DIC_uM = mol_CO2_tot / (vol_H2O_sample_ml * 1e-3) * 1e6) # convert total moles CO2 to dissolved inorganic C concentration of initial water sample
### for confidence interval
samples_calibrated_summ <- samples_calibrated_summ %>% mutate(mol_CO2_g_95_confidence = pCO2_bar_95_confidence * (`vol_vial [ml]` - vol_l_ml) * 1e-3 / (R * lab_temp_K)) # calculate moles CO2 in gas phase
samples_calibrated_summ <- samples_calibrated_summ %>% mutate(mol_CO2_aq_95_confidence = mol_CO2_g_95_confidence * vol_l_ml*1e-3 * Hcc_CO2_lab_temp_and_ionic_strength / ((`vol_vial [ml]` - vol_l_ml)*1e-3)) #calculated moles CO2 in aqueous phase
samples_calibrated_summ <- samples_calibrated_summ %>% mutate(mol_CO2_tot_95_confidence = mol_CO2_g_95_confidence + mol_CO2_aq_95_confidence) # sum aqueous and gaseous CO2
samples_calibrated_summ <- samples_calibrated_summ %>% mutate(DIC_uM_95_confidence = mol_CO2_tot_95_confidence / (vol_H2O_sample_ml * 1e-3) * 1e6) # convert total moles CO2 to dissolved inorganic C concentration of initial water sample
# add column in which d13C is rounded to tenth of permil place
samples_calibrated_summ <- samples_calibrated_summ %>% mutate(`DIC_uM rounded tens` = round(DIC_uM, -1))
### clean and print
samples_select <- samples_calibrated_summ %>% select(name, n, `amp44_mean [mV] mean`, amp44_t0, pCO2_bar, pCO2_bar_95_confidence, DIC_uM, DIC_uM_95_confidence, `DIC_uM rounded tens`, quantitatable)
samples_select %>% kable(caption = "DIC concentration from calibration of amp44 t0 vs. pCO2 expected")
name | n | amp44_mean [mV] mean | amp44_t0 | pCO2_bar | pCO2_bar_95_confidence | DIC_uM | DIC_uM_95_confidence | DIC_uM rounded tens | quantitatable |
---|---|---|---|---|---|---|---|---|---|
5 min He purge 190104 | 1 | 59.69181 | 72.97056 | 0.0001409 | 0.0001592 | 65.48267 | 73.94745 | 70 | FALSE |
5 min He purge 190104 + 100 ul H3PO4 190104 | 2 | 225.92858 | 277.84667 | 0.0001954 | 0.0001276 | 90.76586 | 59.27621 | 90 | FALSE |
5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ | 1 | 293.77891 | 362.96777 | 0.0002180 | 0.0001585 | 101.27041 | 73.62560 | 100 | FALSE |
5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ 190104 + 194ug YULE | 1 | 12324.35375 | 15368.38613 | 0.0042036 | 0.0001417 | 1953.04679 | 65.83690 | 1950 | TRUE |
LOQ | 1 | NA | 1830.02298 | 0.0006076 | 0.0001551 | 282.31556 | 72.08228 | 280 | TRUE |
NSHQ14 | 1 | 265.25060 | 331.34241 | 0.0002096 | 0.0001585 | 97.36761 | 73.66044 | 100 | FALSE |
WAB71 | 1 | 256.26572 | 329.74667 | 0.0002091 | 0.0001585 | 97.17069 | 73.66220 | 100 | FALSE |
Plot samples to check that their amplitude roughly makes sense given their calculated \(c_{\sum\text{CO}_2}\). Note that on this plot, the standards’ \(c_{\sum\text{CO}_2}\) was calculated simply with the mass loaded, whereas the samples were calculated based on the pCO\(_2\) calibration. The samples do plot as expected. It is re-assuring to see that one check sample (5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ 190104 + 194ug YULE) plots quite close to a standard of 198ug YULE that was prepared just prior to analysis.
Dashed line = LOQ.
LOQ_DIC_um <- as.numeric(samples_select %>% filter(name == "LOQ") %>% select(DIC_uM))
amp44.DIC.sample.stnd.check <- ggplot(samples_calibrated_summ, aes(x=amp44_t0, y=DIC_uM, label=name, color = "samples")) +
geom_hline(yintercept = LOQ_DIC_um, linetype = "dashed", alpha = 0.3)+
geom_point()+
geom_point(data = linC, aes(x=amp44_t0, y=DIC_uM, label=file_id, color="standards")) +
scale_y_continuous(name = "DIC (µM)") +
scale_x_continuous(name = ("m/z 44 amplitude t0 (mV)"))+
theme_bw()
## Warning: Ignoring unknown aesthetics: label
amp44.DIC.sample.stnd.check %>% ggplotly()
First round of culling looks at standard deviations of the stable isotope values of the individual peaks (typically there are 10 peaks, prior to previous culling for bad injects), and uses that information to identify analytical outliers or samples with problems or too few peaks. Often the “cutoff” of outliers tends to be sd of 0.075 - 0.1 permil.
Make summary plots of reproducibility of isotopic values.
data <- data %>% filter(amp44_t0 >= S_A_LOQ) # filter for samples with signal greater than or equal LOQ for concentration
sd.hist <- data %>% ggplot(aes(x=d13C.sd, fill = amp44_t0)) +
geom_histogram(binwidth=.01) +
theme_bw()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
sd.values <- data %>% filter(!is.na(d13C.sd)) %>% ggplot(aes(x=as.character(Analysis), y=d13C.sd, label=id1, color = `amp44_mean [mV]`)) +
geom_point() +
theme_bw()+
theme(axis.text.x = element_text(angle = 90, vjust = .5))+
scale_color_gradientn(colours = c("red", "blue", "blue"), values = c(0, 0.2, 1))
sd.v.amp44 <- data %>% filter(!is.na(d13C.sd)) %>% ggplot(aes(x=`amp44_mean [mV]`, y=d13C.sd, fill=factor(num.peaks), label=file_id)) +
geom_point(size=3, shape=21)+
theme_bw()+
scale_fill_discrete(name="# peaks")
sd.hist
## Warning: Removed 1 rows containing non-finite values (stat_bin).
sd.values %>% ggplotly()
sd.v.amp44 %>% ggplotly()
Remove any data points that did not replicate within uncertainty for the individual peaks, redo plots - creates a “culled data” file that shows samples and standards that shouldn’t be used.
d13C.sd.cutoff <- 0.075 # set a standard deviation on d13C measurements between peaks that you deem acceptable
culled.data <- subset(data, d13C.sd>d13C.sd.cutoff) # subset data that don't meet the acceptability threshold for d13C standard deviation
wo.culled <- subset(data, d13C.sd<d13C.sd.cutoff) # subset data that do meet the threshold
#print
culled.data
Plot yields of the standards, using interactive plots. Use this to cull more standards if need be, by looking for statistical outliers that coincide with yield problems. Note: LSVEC is LiCO\(_3\), whereas the other standards are CaCO\(_3\), so it is expected that LSVEC has a different ratio of amplitude to mass loaded. Thus, all plotted standards look fine in this analytical session.
# make a data frame of standards
stds1 <- subset(wo.culled, type_general == "standard")
# name standards. omit flush gas (CU sequoia)
stds1 <- stds1 %>% mutate(standard = case_when(
str_detect(id1, "YULE") == TRUE ~ "YULE",
str_detect(id1, "HIS") == TRUE ~ "HIS",
str_detect(id1, "LSVEC") == TRUE ~ "LSVEC"
)) %>% filter(!is.na(standard))
yield.stds <- ggplot(stds1, aes(x = `mass_loaded [ug]`, y = `amp44_mean [mV]`, label=file_id)) +
geom_point(aes(color=standard)) +
theme_bw()
# note: LSVEC is LiCO3, not CaCO3, so it is expected to be above the yield of the other standards
d13C.stds <-
ggplot(stds1, aes(label=file_id)) +
geom_point(shape=21, mapping = aes(x =`amp44_mean [mV]`, y = `d13C.measured [permil]`, fill = standard)) +
facet_grid(standard ~ ., scales = "free") +
theme_bw()
ggplotly(yield.stds)
ggplotly(d13C.stds )
# adjust next line only: change #'s in stds.to.cull to reflect the analysis #'s that need to be culled, add analysis #'s as needed and rerun after looking at the new plots
stds.to.cull <- c(16609) # analyses with yield problem and/or significant outlier in d13C
# bad yield for analysis 16609
stds.culled <- filter(stds1, Analysis %in% stds.to.cull)
culled.data<-bind_rows(culled.data, stds.culled)
stds2 <- filter(stds1, !Analysis %in% stds.to.cull)
wo.culled <- filter(wo.culled, !Analysis %in% stds.to.cull)
# omit LSVEC from plot to focus on CaCO3 standards
yield.stds2<- stds2 %>% filter(standard != "LSVEC") %>% ggplot(aes(x = `mass_loaded [ug]`, y = `amp44_mean [mV]`, label=file_id)) +
stat_smooth(method="lm") +
geom_point(aes(color=standard)) +
theme_bw()
d13C.stds2 <-
ggplot(stds2, aes(label=Analysis)) +
geom_point(shape=21, mapping = aes(x = `amp44_mean [mV]`, y = `d13C.measured [permil]`, fill=standard)) +
facet_grid(standard ~ ., scales = "free") +
theme_bw()
ggplotly(yield.stds2)
## `geom_smooth()` using formula 'y ~ x'
ggplotly(d13C.stds2)
standards <-
tibble::tribble(
~name, ~true_d13C,
"HIS", -4.80,
"LSVEC", -46.6,
"YULE", -3.12
) %>%
mutate(
true_d13C = iso_double_with_units(true_d13C, "permil")
)
standards %>% knitr::kable(digits = 2)
name | true_d13C |
---|---|
HIS | -4.80 |
LSVEC | -46.60 |
YULE | -3.12 |
wo.culled_w_stds <-
wo.culled %>%
iso_add_standards(stds = standards, match_by = c(name))
## Info: matching standards by 'name' - added 3 standard entries to 17 out of 19 rows, added new column 'is_std_peak' to identify standard peaks
calibs <- wo.culled_w_stds %>%
# prepare for calibration
iso_prepare_for_calibration() %>%
# run calibrations
iso_generate_calibration(
model = c(
# reference scale correction
delta_only = lm(`d13C.measured [permil]` ~ true_d13C),
# multivariate with delta and amplitude
delta_and_ampl = lm(`d13C.measured [permil]` ~ true_d13C + `amp44_mean [mV]`),
# + the delta and amplitude cross term
delta_cross_ampl = lm(`d13C.measured [permil]` ~ true_d13C * `amp44_mean [mV]`),
# multivariate with delta and the datetime (i.e. checking for temporal drift)
delta_and_time = lm(`d13C.measured [permil]` ~ true_d13C + file_datetime),
delta_cross_time = lm(`d13C.measured [permil]` ~ true_d13C * file_datetime),
# multivariate with delta, amplitude and datetime
delta_and_ampl_and_time = lm(`d13C.measured [permil]` ~ true_d13C + `amp44_mean [mV]` + file_datetime),
# multivariate with delta cross amplitude and datetime
delta_cross_ampl_and_time = lm(`d13C.measured [permil]` ~ true_d13C * `amp44_mean [mV]` + file_datetime)
),
# specify which peaks to include in the calibration, here:
# - all std_peaks (this filter should always be included!)
use_in_calib = is_std_peak
)
## Info: preparing data for calibration by nesting the entire dataset
## Info: generating calibration based on 7 models (delta_only = 'lm(`d13C.measured [permil]` ~ true_d13C)', delta_and_ampl = 'lm(`d13C.measured [permil]` ~ true_d13C + `amp44_mean [mV]`)', delta_cross_ampl = 'lm(`d13C.measured [permil]` ~ true_d13C * `amp44_mean [mV]`)', delta_and_time = 'lm(`d13C.measured [permil]` ~ true_d13C + file_datetime)', delta_cross_time = 'lm(`d13C.measured [permil]` ~ true_d13C * file_datetime)', delta_and_ampl_and_time = 'lm(...)', delta_cross_ampl_and_time = 'lm(...)') for 1 data group(s) with standards filter 'is_std_peak'. Storing residuals in new column 'resid'. Storing calibration info in new column 'in_calib'.
# look at coefficients and summary
calibs %>%
# unnest calibration parameters
iso_get_calibration_parameters(
select_from_coefs =
c(term, estimate, SE = std.error, signif),
select_from_summary =
c(fit_R2 = adj.r.squared, fit_RMSD = deviance, residual_df = df.residual)) %>%
arrange(term) %>%
knitr::kable(digits = 4)
## Info: retrieving coefficient column(s) 'c(term, estimate, SE = std.error, signif)' for calibration
## Info: retrieving summary column(s) 'c(fit_R2 = adj.r.squared, fit_RMSD = deviance, residual_df = df.residual)' for calibration
calib | calib_ok | calib_points | term | estimate | SE | signif | fit_R2 | fit_RMSD | residual_df |
---|---|---|---|---|---|---|---|---|---|
delta_only | TRUE | 17 | (Intercept) | 11.8937 | 0.0219 | *** (p < 0.001) | 0.9999 | 0.0889 | 15 |
delta_and_ampl | TRUE | 17 | (Intercept) | 11.6781 | 0.0471 | *** (p < 0.001) | 1.0000 | 0.0337 | 14 |
delta_cross_ampl | TRUE | 17 | (Intercept) | 11.7373 | 0.4887 | *** (p < 0.001) | 1.0000 | 0.0336 | 13 |
delta_and_time | TRUE | 17 | (Intercept) | -1006.9306 | 3697.9322 | - | 0.9999 | 0.0884 | 14 |
delta_cross_time | TRUE | 17 | (Intercept) | -4784.4688 | 21620.5198 | - | 0.9999 | 0.0882 | 13 |
delta_and_ampl_and_time | TRUE | 17 | (Intercept) | -326.1208 | 2371.0323 | - | 1.0000 | 0.0336 | 13 |
delta_cross_ampl_and_time | TRUE | 17 | (Intercept) | -551.7311 | 2724.2058 | - | 1.0000 | 0.0335 | 12 |
delta_and_ampl | TRUE | 17 | amp44_mean [mV] |
0.0000 | 0.0000 | *** (p < 0.001) | 1.0000 | 0.0337 | 14 |
delta_cross_ampl | TRUE | 17 | amp44_mean [mV] |
0.0000 | 0.0000 | - | 1.0000 | 0.0336 | 13 |
delta_and_ampl_and_time | TRUE | 17 | amp44_mean [mV] |
0.0000 | 0.0000 | *** (p < 0.001) | 1.0000 | 0.0336 | 13 |
delta_cross_ampl_and_time | TRUE | 17 | amp44_mean [mV] |
0.0000 | 0.0000 | - | 1.0000 | 0.0335 | 12 |
delta_and_time | TRUE | 17 | file_datetime | 0.0000 | 0.0000 | - | 0.9999 | 0.0884 | 14 |
delta_cross_time | TRUE | 17 | file_datetime | 0.0000 | 0.0000 | - | 0.9999 | 0.0882 | 13 |
delta_and_ampl_and_time | TRUE | 17 | file_datetime | 0.0000 | 0.0000 | - | 1.0000 | 0.0336 | 13 |
delta_cross_ampl_and_time | TRUE | 17 | file_datetime | 0.0000 | 0.0000 | - | 1.0000 | 0.0335 | 12 |
delta_only | TRUE | 17 | true_d13C | 1.0259 | 0.0018 | *** (p < 0.001) | 0.9999 | 0.0889 | 15 |
delta_and_ampl | TRUE | 17 | true_d13C | 1.0262 | 0.0012 | *** (p < 0.001) | 1.0000 | 0.0337 | 14 |
delta_cross_ampl | TRUE | 17 | true_d13C | 1.0446 | 0.1515 | *** (p < 0.001) | 1.0000 | 0.0336 | 13 |
delta_and_time | TRUE | 17 | true_d13C | 1.0259 | 0.0019 | *** (p < 0.001) | 0.9999 | 0.0884 | 14 |
delta_cross_time | TRUE | 17 | true_d13C | -1061.3315 | 5984.0311 | - | 0.9999 | 0.0882 | 13 |
delta_and_ampl_and_time | TRUE | 17 | true_d13C | 1.0262 | 0.0012 | *** (p < 0.001) | 1.0000 | 0.0336 | 13 |
delta_cross_ampl_and_time | TRUE | 17 | true_d13C | 1.0600 | 0.1741 | *** (p < 0.001) | 1.0000 | 0.0335 | 12 |
delta_cross_ampl | TRUE | 17 | true_d13C:amp44_mean [mV] |
0.0000 | 0.0000 | - | 1.0000 | 0.0336 | 13 |
delta_cross_ampl_and_time | TRUE | 17 | true_d13C:amp44_mean [mV] |
0.0000 | 0.0000 | - | 1.0000 | 0.0335 | 12 |
delta_cross_time | TRUE | 17 | true_d13C:file_datetime | 0.0000 | 0.0000 | - | 0.9999 | 0.0882 | 13 |
The visualization of the calibration parameters reveals that as expected the scale contraction and amplitude calibrations are highly statistically relevant (***
= p.value < 0.001). Dt (drift) is also statistically relevant (*
= p.value < 0.05).
calibs %>% iso_plot_calibration_parameters()
calibs_applied <-
calibs %>%
# which calibration to use? can include multiple if desired to see the result
# in this case, the amplitude- and time-conscious calibrations are applied
filter(calib == "delta_and_ampl_and_time") %>%
# apply calibration indication what should be calculated
iso_apply_calibration(true_d13C, calculate_error = TRUE)
## Info: applying calibration to infer 'true_d13C' for 1 data group(s); storing resulting value in new column 'true_d13C_pred' and estimated error in new column 'true_d13C_pred_se'. This may take a moment... finished.
# calibration ranges
calibs_with_ranges <-
calibs_applied %>%
# evaluate calibration range for the measured amplitude and predicted d13C
iso_evaluate_calibration_range(`amp44_mean [mV]`, true_d13C_pred)
## Info: evaluating range for terms 'amp44_mean [mV]' and 'true_d13C_pred' in calibration for 1 data group(s); storing resulting summary for each data entry in new column 'in_range'.
# show calibration ranges
calibs_with_ranges %>%
iso_get_calibration_range() %>%
iso_remove_list_columns() %>%
knitr::kable(d = 2)
## Info: retrieving all calibration range information for calibration
calib | calib_ok | calib_points | term | units | min | max |
---|---|---|---|---|---|---|
delta_and_ampl_and_time | TRUE | 17 | amp44_mean [mV] | NA | 6484.92 | 22804.68 |
delta_and_ampl_and_time | TRUE | 17 | true_d13C_pred | permil | -46.60 | -3.04 |
# create calibrated peak table
peak_table_calibrated <- calibs_with_ranges %>%
iso_get_calibration_data()
## Info: retrieving all data
All reported samples are within calibrated range.
# replicate earlier overview plot but now with the calibrated delta values
# and with a highlight of the calibration ranges and which points are in range
peak_table_calibrated %>%
# visualize with convenience function iso_plot_data
iso_plot_data(
# choose x and y (multiple y possible)
x = `amp44_mean [mV]`, y = true_d13C_pred,
# choose aesthetics
color = in_range, shape = is_std_peak, size = 3,
# decide what geoms to include
points = TRUE
) %>%
# highlight calibration range
iso_mark_calibration_range() +
# legend
theme(legend.position = "bottom", legend.direction = "vertical")
# generate data summary
peak_data <-
peak_table_calibrated
# summarize replicates
peak_data_summary <-
peak_data %>%
# summarize for each sample and compound
group_by(name) %>%
iso_summarize_data_table(`amp44_mean [mV]`, true_d13C_pred, true_d13C_pred_se) %>% select(-`true_d13C_pred_se sd`)
# add column in which d13C is rounded to hundredth of permil place
peak_data_summary <- peak_data_summary %>% mutate(`d13C rounded hundredth` = round(`true_d13C_pred mean`, 2))
# print
peak_data_summary %>% iso_make_units_explicit() %>% knitr::kable(d = 2)
name | n | amp44_mean [mV] mean | amp44_mean [mV] sd | true_d13C_pred mean [permil] | true_d13C_pred sd | true_d13C_pred_se mean [permil] | d13C rounded hundredth [permil] |
---|---|---|---|---|---|---|---|
5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ 190104 + 194ug YULE | 1 | 12324.35 | NA | -3.34 | NA | 0.05 | -3.34 |
CU Sequoia flush gas | 1 | 8312.95 | NA | -39.59 | NA | 0.08 | -39.59 |
HIS | 5 | 12630.76 | 415.10 | -4.80 | 0.03 | 0.05 | -4.80 |
LSVEC | 1 | 14079.41 | NA | -46.60 | NA | 0.07 | -46.60 |
YULE | 11 | 13419.68 | 4485.28 | -3.12 | 0.05 | 0.05 | -3.12 |
# add data about DIC concentration to d13C-calibrated samples
samples_summ_w_concs <- peak_data_summary %>% left_join(samples_select %>% select(amp44_t0, DIC_uM, DIC_uM_95_confidence, `DIC_uM rounded tens`, quantitatable, name), by = "name")
# add data about LOQ and samples below it
samples_summ_w_concs_w_LOQ <- union(samples_summ_w_concs, peak_data_summary %>% full_join(samples_select %>% select(n, amp44_t0, `amp44_mean [mV] mean`, DIC_uM, DIC_uM_95_confidence, `DIC_uM rounded tens`, quantitatable, name)))
## Joining, by = c("name", "n", "amp44_mean [mV] mean")
# remove flush gas from data output
samples_summ_w_concs_w_LOQ <- samples_summ_w_concs_w_LOQ %>% filter(name!= "CU Sequoia flush gas")
# arrange summary data by column describing whether it was above limit of quanitation
samples_summ_w_concs_w_LOQ <- samples_summ_w_concs_w_LOQ %>% arrange(-quantitatable)
# print
samples_summ_w_concs_w_LOQ %>% iso_make_units_explicit() %>% knitr::kable(d = 2)
name | n | amp44_mean [mV] mean | amp44_mean [mV] sd | true_d13C_pred mean [permil] | true_d13C_pred sd | true_d13C_pred_se mean [permil] | d13C rounded hundredth [permil] | amp44_t0 | DIC_uM | DIC_uM_95_confidence | DIC_uM rounded tens | quantitatable |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ 190104 + 194ug YULE | 1 | 12324.35 | NA | -3.34 | NA | 0.05 | -3.34 | 15368.39 | 1953.05 | 65.84 | 1950 | TRUE |
LOQ | 1 | NA | NA | NA | NA | NA | NA | 1830.02 | 282.32 | 72.08 | 280 | TRUE |
5 min He purge 190104 | 1 | 59.69 | NA | NA | NA | NA | NA | 72.97 | 65.48 | 73.95 | 70 | FALSE |
5 min He purge 190104 + 100 ul H3PO4 190104 | 2 | 225.93 | NA | NA | NA | NA | NA | 277.85 | 90.77 | 59.28 | 90 | FALSE |
5 min He purge 190104 + 100 ul H3PO4 190104 + 1 ml boiled MilliQ | 1 | 293.78 | NA | NA | NA | NA | NA | 362.97 | 101.27 | 73.63 | 100 | FALSE |
NSHQ14 | 1 | 265.25 | NA | NA | NA | NA | NA | 331.34 | 97.37 | 73.66 | 100 | FALSE |
WAB71 | 1 | 256.27 | NA | NA | NA | NA | NA | 329.75 | 97.17 | 73.66 | 100 | FALSE |
HIS | 5 | 12630.76 | 415.10 | -4.80 | 0.03 | 0.05 | -4.80 | NA | NA | NA | NA | NA |
LSVEC | 1 | 14079.41 | NA | -46.60 | NA | 0.07 | -46.60 | NA | NA | NA | NA | NA |
YULE | 11 | 13419.68 | 4485.28 | -3.12 | 0.05 | 0.05 | -3.12 | NA | NA | NA | NA | NA |
Save data to xlsx spreadsheet.
# export the global calibration with all its information and data to Excel
peak_table_calibrated %>%
iso_export_calibration_to_excel(
filepath = format(Sys.Date(), "data_output/%Y%m%d_190208_DBN_DIC_calibrated.xlsx"),
# include data summary as an additional useful tab
`data summary` = samples_summ_w_concs_w_LOQ
)
## Info: exporting calibrations into Excel '20200812_190208_DBN_DIC_calibrated.xlsx'...
## Info: retrieving all data
## Info: retrieving all coefficient information for calibration
## Info: retrieving all summary information for calibration
## Info: retrieving all calibration range information for calibration
## Info: export complete, created tabs 'data summary', 'all data', 'calib coefs', 'calib summary' and 'calib range'.