Bulldogtech.org

Causes of Data Sprawl and Strategies for Reducing It

Data sprawl is the accumulation of so much data within an organization that it can no longer be managed effectively. It can lead to increased management overhead (tying up your most talented technical people with lower-impact administrative tasks), security risks, hidden costs, and missed opportunities for analytics. It can also impede business agility, since teams must wait for IT to manage the data they need. The problem is growing as software-as-a-service (SaaS) applications continue to proliferate.

Ideally, you want to prevent data sprawl from occurring in the first place by ensuring that your employees have access to all of the tools they need to do their jobs well. But if your team is already struggling with data sprawl, it’s important to understand what’s causing it and how to address the issue quickly. This article will provide an overview of the main causes of data sprawl and some strategies for reducing it.

The Southern Great Plains (SGP) atmospheric observatory is the world’s most extensive field measurement site and one of the key sites for improving Earth system models. Located on 160 acres of cattle pasture and wheat fields southeast of Lamont, Oklahoma, the heart of the SGP observatory is the heavily instrumented Central Facility. The site offers high-quality measurements for scientists to study cloud, aerosol, and atmospheric processes. Researchers can supplement the continuous observations from the SGP instruments with guest instruments during field research campaigns or by requesting additional measurements, such as sonde launches.

SGP analyses are designed to be simple and straightforward once the data has been properly prepared. Errors encountered during SGP analyses typically trace back to problems in the data that was prepared for the analysis.

The SGPdata package contains 4 example data sets that can be used with SGP analyses. The sgpData data set specifies the WIDE format used by lower-level SGP functions like studentGrowthPercentiles and studentGrowthProjections. The sgpData_LONG and sgptData_LONG data sets specify the LONG format used by higher-level SGP wrapper functions like abcSGP, prepareSGP, and analyzeSGP.

The first column in the sgpData data set, ID, provides the unique student identifier. The columns SS_2013, SS_2014, SS_2015, SS_2016, and SS_2017 provide the scale scores associated with each student's assessment record in each of the 5 years. The separate sgpData_INSTRUCTOR_NUMBER data set is an anonymized teacher-student lookup table used by the summarizeSGP function to produce teacher-level aggregates.
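To make the difference between the WIDE and LONG layouts concrete, here is a minimal Python sketch that reshapes a few WIDE-style records (one row per student, one SS_<year> column per year) into LONG-style records (one row per student per year). It is an illustration only, not part of the SGP package: the sample values, the wide_to_long helper, and the YEAR and SCALE_SCORE field names are assumptions made for this example.

```python
from typing import Dict, List, Optional

# Hypothetical WIDE-style rows: one row per student, one SS_<year> column per year.
# The IDs and scores are made up for illustration.
wide_rows: List[Dict[str, Optional[int]]] = [
    {"ID": 1001, "SS_2013": 410, "SS_2014": 432, "SS_2015": None},
    {"ID": 1002, "SS_2013": 388, "SS_2014": None, "SS_2015": 455},
]


def wide_to_long(rows: List[Dict[str, Optional[int]]],
                 score_prefix: str = "SS_") -> List[Dict[str, int]]:
    """Reshape WIDE rows into LONG rows, one per student per year.

    Years with a missing score (None) are skipped, so the LONG output
    only contains records that actually exist.
    """
    long_rows: List[Dict[str, int]] = []
    for row in rows:
        for column, value in row.items():
            if not column.startswith(score_prefix) or value is None:
                continue
            # The year is whatever follows the prefix, e.g. "SS_2014" -> 2014.
            year = int(column[len(score_prefix):])
            long_rows.append({"ID": row["ID"], "YEAR": year, "SCALE_SCORE": value})
    # Sort by student, then by year, for a stable and readable layout.
    long_rows.sort(key=lambda r: (r["ID"], r["YEAR"]))
    return long_rows


long_rows = wide_to_long(wide_rows)
for record in long_rows:
    print(record)
```

The LONG layout grows by adding rows rather than columns, which is one reason it tends to be easier to extend when another year of scores arrives.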

All of the SGP observations are transmitted to ARM’s Data Discovery. Data Discovery is a free web-based tool for viewing and exploring the SGP data and other ARM environmental science data sets. SGP data is also available to the public through the ARM Public Data Portal.