Alright, monthly goals are to do the following!

  • Clean survey data to prepare for modeling (ASAP)
  • Carry out modeling project (by end of month)

We also have some secondary goals, though these depend more on when I get the edits back:

  • Incorporate Steven’s edits in the proposal and submit
  • Incorporate Steven’s edits in the paper and finish Draft 2

Monday, August 1st

After a lot (a lot) of edits and revisions to the code, I managed to clean the (very messy) temperature files from all survey legs. This typically required re-doing the read-in process - a common issue was that my code would successfully read in the file, but would leave me with NAs in a column, as the file delimiters and formats were pretty wildly different.
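
A minimal sketch of the read-in problem described above, written in Python for illustration rather than taken from the R scripts (all function and column names here are hypothetical). It guesses the delimiter from the header line instead of assuming one, then reports any column that still contains missing or blank values, which is the symptom a wrong delimiter or format tends to produce.

```python
import csv

def guess_delimiter(header_line, candidates=(",", ";", "\t", "|")):
    """Pick the candidate delimiter that occurs most often in the header."""
    return max(candidates, key=header_line.count)

def read_rows(text):
    """Parse delimited text into a list of dicts keyed by column name."""
    lines = text.splitlines()
    delimiter = guess_delimiter(lines[0])
    return list(csv.DictReader(lines, delimiter=delimiter))

def empty_columns(rows):
    """Names of columns with at least one missing or blank value."""
    bad = set()
    for row in rows:
        for name, value in row.items():
            if name is None:
                continue  # extra fields beyond the header; not a named column
            if value is None or value.strip() == "":
                bad.add(name)
    return sorted(bad)

# Two hypothetical files with the same columns but different delimiters
comma_file = "datetime,temp_c\n2019-07-01 10:00,6.1\n"
semicolon_file = "datetime;temp_c\n2019-07-01 10:00;6.3\n"

for text in (comma_file, semicolon_file):
    rows = read_rows(text)
    print(rows[0]["temp_c"], empty_columns(rows))
```

Running a check like `empty_columns` right after each read-in makes a silently mis-parsed file show up immediately instead of later in the merge.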

Work has been done both in 2_4_fixing_temp_files.Rmd and in 2_5_merging_temperature_data.Rmd, with a few changes in the R script containing my functions.

I also accomplished the following steps on this:

  • Edited some Tidbit filenames to be consistent with the Tidbit names in the pot data file (though since there’s generally more variance within names in the pot data file, there’s still a lot of work to do for consistency here)

  • Downloaded the HOBOware software (required for viewing .hobo files and converting them to .csv files), and used it to obtain the temperature data for the 2019 RKC Leg 2 survey data (available only as .hobo files)

  • Changed all processes in these scripts to write to an output/ folder so we aren’t directly editing the raw data

  • Checked for NAs, blanks, incorrect values/formats in the data

Tasks still remaining:

  • Check to see we aren’t missing any Tidbits

  • Get the pot data file (../data/ADFG_SE_AK_pot_surveys/Pot_Set_Data_for_Tanner_and_RKC_surveys.csv) cleaned up. Notably, this is going to involve moving a lot of info from the comments into the Tidbits data column - there are some years in which all the Tidbit matching info was recorded in the comments instead. It’ll be a huge, huge pain, but hey!

Tuesday, August 2nd

Spent today working on the pot data file (mentioned above), and successfully moved all comments describing Tidbits into the data frame! Per usual, tons of different formatting methods were used originally (new bumper sticker: standardize your data, make a future researcher happy), but at this point, it should all now be fixed!

Four years (2005-2008) had either all or nearly all Tidbit IDs entered into the comments column, and 2009 had a number that were there as well. In other years, there were only a few odd cases to deal with.

I also spent a good portion of the day accomplishing what I thought I did yesterday - completing the read-in of all Tidbit temperature files. Turns out, the formatting on a few files wasn’t quite right, and my function was ignoring the files.

Finally, I worked on double-checking that all files read in correctly. A number of different time formats were used, and I was worried that either a) the AM/PM component was being ignored within some files, leading to incorrect times, or b) times were rounding oddly. I was correct with b) in a few cases, but fixed the issues and all is now well!

Next goal: merge the pot set dataframe with the huge Tidbit temperature dataframe. Since both files are quite large (around 20,000 lines and 400,000 lines, respectively), I think it likely makes the most sense to do this on a year-by-year basis. But hey, might as well give merging the whole thing at once a shot and see if it takes an unreal amount of time!

Wednesday, August 3rd

Took almost the full day off today - physically wiped out from the monkeypox vaccine and didn’t wake up until 2pm! Still, managed to successfully merge the pot set data with the Tidbit temperature data! Set up a neat little for loop to accomplish this in R.

One concern: it looks like for a few surveys (2016 and 2019 RKC surveys in particular, potentially some others), the datetimes may not have read in correctly. I’m getting anomalously high average temps (10C and above, with one up to 14C). This could be natural, but it definitely sets off some red flags. I think readings from times when the Tidbit wasn’t submerged may be included in these averages. Next goal: take a closer look at them and see what’s up!

Thursday, August 4th

Three solid accomplishments today! First, I went back and fixed up something I forgot yesterday! I’ve got these two dataframes - one with all the pot data, and one with all the tidbit data. I had merged them by Tidbit ID (and then filtered by haul times to ensure I only had temperatures for when the pot was in the water). However, I forgot that the pot data sometimes had odd names in Tidbit ID! For example, the same Tidbit might be called 22, 2207, or 22-07. I spent a good portion of the day making that tidbit_id column in the pot data consistent with the names in the tidbit data, thus improving the join and getting more data.
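
A rough sketch of that join, in Python for illustration rather than the actual R code. The alias table, field names, and the choice of 22-07 as the canonical form are all assumptions made for the example; the idea is just to map each pot's Tidbit ID to the name used in the temperature data, then keep only readings taken between the set and haul times.

```python
from datetime import datetime
from statistics import mean

# Hypothetical lookup from IDs as written in the pot data to the
# names used in the Tidbit temperature data.
ALIASES = {"22": "22-07", "2207": "22-07", "22-07": "22-07"}

def mean_temp_for_pot(pot, readings, aliases=ALIASES):
    """Mean of the pot's Tidbit readings taken between set and haul."""
    raw_id = pot["tidbit_id"].strip()
    tidbit = aliases.get(raw_id, raw_id)  # fall back to the ID as written
    in_water = [
        r["temp_c"]
        for r in readings
        if r["tidbit_id"] == tidbit
        and pot["set_time"] <= r["time"] <= pot["haul_time"]
    ]
    return mean(in_water) if in_water else None

pot = {
    "tidbit_id": "2207",
    "set_time": datetime(2019, 7, 1, 10, 0),
    "haul_time": datetime(2019, 7, 2, 10, 0),
}
readings = [
    {"tidbit_id": "22-07", "time": datetime(2019, 7, 1, 9, 0), "temp_c": 12.0},   # before set
    {"tidbit_id": "22-07", "time": datetime(2019, 7, 1, 12, 0), "temp_c": 6.0},
    {"tidbit_id": "22-07", "time": datetime(2019, 7, 2, 8, 0), "temp_c": 7.0},
    {"tidbit_id": "31-02", "time": datetime(2019, 7, 1, 12, 0), "temp_c": 9.0},   # other logger
]
print(mean_temp_for_pot(pot, readings))
```

Without the alias step, the pot labeled 2207 would match no readings at all, which is how inconsistent names quietly shrink the joined data.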

Next, I took a peek at the anomalously high average temperatures for some surveys, and fixed them! The issue was a failure to read in AM/PM times for certain legs of the survey (specifically, legs that were saved in a .csv format). Spent half an hour looking for the error, and then fixed it in like half a line. Ain’t that how it always goes?
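
For illustration, here is a small Python sketch of AM/PM-aware timestamp parsing (not the actual fix, which was in the R code). The format strings are assumptions about what the files might contain. Each timestamp is tried against a list of explicit formats, with the 12-hour formats first, and anything that matches none of them raises an error instead of being read with the AM/PM part dropped.

```python
from datetime import datetime

# Assumed formats, tried in order; 12-hour formats with AM/PM come first.
FORMATS = (
    "%m/%d/%Y %I:%M:%S %p",  # e.g. 07/01/2019 02:30:00 PM
    "%m/%d/%y %I:%M:%S %p",  # e.g. 07/01/19 02:30:00 PM
    "%Y-%m-%d %H:%M:%S",     # e.g. 2019-07-01 14:30:00
    "%m/%d/%Y %H:%M:%S",     # e.g. 07/01/2019 14:30:00
)

def parse_timestamp(text):
    """Parse a timestamp using the first format that matches it exactly."""
    cleaned = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp format: {text!r}")

for example in ("07/01/2019 02:30:00 PM", "07/01/2019 12:05:00 AM",
                "2019-07-01 14:30:00"):
    print(example, "->", parse_timestamp(example).isoformat())
```

If the PM marker is ignored, an afternoon reading lands twelve hours early, so the deployment window can pick up readings taken before the pot was actually in the water.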

I also examined some additional anomalously high average temperatures, and filtered out any pot sets that looked iffy. “Iffy” to me meant the following:

  • Pots with a set time or haul time of 0:00:00 (likely to be missing true time info)
  • Pots in which the maximum temperature was 3+ degrees above the minimum temp over the course of the deployment
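
The two criteria above can be sketched as a single check. This is Python for illustration rather than the R code actually used, the function and argument names are hypothetical, and "3+ degrees" is read here as a range of at least 3.

```python
from datetime import datetime, time

MAX_RANGE_C = 3.0  # "3+ degrees", read as a range of at least 3

def is_iffy(set_time, haul_time, temps, max_range=MAX_RANGE_C):
    """Flag a pot set if either time is exactly midnight or temps swing too much."""
    midnight = time(0, 0, 0)
    if set_time.time() == midnight or haul_time.time() == midnight:
        return True  # 0:00:00 likely means the true time is missing
    if not temps:
        return False  # nothing to judge the temperature range on
    return max(temps) - min(temps) >= max_range

set_time = datetime(2011, 10, 7, 9, 15)
haul_time = datetime(2011, 10, 8, 10, 40)
print(is_iffy(set_time, haul_time, [5.0, 5.5, 6.0]))                   # steady temps
print(is_iffy(set_time, haul_time, [5.0, 8.0]))                        # 3-degree swing
print(is_iffy(datetime(2011, 10, 7, 0, 0), haul_time, [5.0, 5.5]))     # midnight set time
```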

I also took a long look at any days that were overrepresented in the highest temperatures. I specifically examined the temperature pattern - did it get colder after deployment? did it warm up after being hauled? - along with the NOAA weather data from that time period and area. I decided to eliminate one set of pots (2011 Tanner crab, Leg 1, pots set on 10-07).

Finally, I took a peek at any remaining elevated temperatures and eliminated any that were suspicious!

The next accomplishment: actually joining the crab data with the merged pot/temperature data! We’ve got about 150,000 crab that were examined for bitter crab syndrome AND have temperature data!

Next steps:

  • Clean up the data a bit
  • Start exploring the data! Look at temp by location, by year, etc.

Friday, August 5th

Didn’t do any lab work today - it was dedicated entirely to merch!

Monday, August 8th

Spent the day exploring the pot data! Didn’t find anything particularly interesting honestly, though I did locate some errors in the latitude/longitude values! It’s just a few rows, so I don’t think it’s worth manually fixing them - we’ll just remove them entirely.

Thursday, August 11th

Today I finished up exploring the pot data! Everything looks good so far, and I spotted some potentially interesting relationships (I’ll keep an eye on temp vs. date, for instance, but temp vs. latitude should be fine). Goal for tomorrow: clean up the crab data!