
# 3.7 Data Analysis

This section summarises the methods used to prepare and analyse the data. It describes data cleaning, pre-processing, and the procedures of exploratory data analysis, data visualisation, time series analysis, and the use of computer-assisted qualitative data analysis software (CAQDAS).

***

### Data Cleaning & Pre-processing

Web scraping is a method for automatically extracting vast quantities of data from web pages. Web scraping tools generate CSV or XLS files, but these data typically require extensive cleaning, triangulation, and preprocessing before and during data analysis. As already mentioned, web scraping was used to collect 13,212 smartphone specifications. The technique could not be used in the collection of aircraft data. The collection of 54,793 aircraft registries and 283 aircraft models was accomplished through manual data collection.

Pre-processing required standard clean-up operations, such as (1) removing duplicates, (2) converting boolean values, (3) converting date formats, and (4) changing text cases. These procedures can be performed in most spreadsheet software. Google Sheets was useful for (5) separating text into columns and transforming unstructured text into structured data categories.
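The study carried out these operations in spreadsheet software. Purely as an illustration, the five clean-up operations listed above could be scripted roughly as follows. This is a minimal sketch, not the procedure used in the thesis: the field names (`model`, `nfc`, `released`, `display`), the sample values, and the input date format are all assumptions.

```python
from datetime import datetime

# Hypothetical scraped rows; field names and values are illustrative only.
RAW_ROWS = [
    {"model": "alpha ONE", "nfc": "Yes", "released": "05/03/2014",
     "display": "5.1 inches, 1080 x 1920 pixels"},
    {"model": "alpha ONE", "nfc": "Yes", "released": "05/03/2014",
     "display": "5.1 inches, 1080 x 1920 pixels"},
    {"model": "beta two", "nfc": "No", "released": "17/09/2015",
     "display": "4.7 inches, 750 x 1334 pixels"},
]


def clean_rows(rows):
    """Apply the five clean-up steps to a list of row dictionaries."""
    seen = set()
    cleaned = []
    for row in rows:
        # (1) Remove duplicates: skip rows whose full content was already seen.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)

        # (5) Separate text into columns: "size, resolution" becomes two fields.
        size_text, resolution_text = row["display"].split(", ", 1)

        cleaned.append({
            # (4) Change text case to a consistent title case.
            "model": row["model"].title(),
            # (2) Convert "Yes"/"No" strings into boolean values.
            "nfc": row["nfc"].strip().lower() == "yes",
            # (3) Convert dates from an assumed DD/MM/YYYY format to ISO 8601.
            "released": datetime.strptime(row["released"], "%d/%m/%Y").date().isoformat(),
            "display_size_in": float(size_text.split()[0]),
            "display_resolution": resolution_text.replace(" pixels", ""),
        })
    return cleaned


CLEANED = clean_rows(RAW_ROWS)
```

Here the duplicated first row is dropped, and each remaining record gains typed, structured fields that can be compared across models.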

Due to faults in the original website databases, web scraping also captured the specifications of bar scanners and portable music players. These records had to be identified and deleted. Airframes built for military use, freight, or private executive transport were also excluded from aircraft production lists. These types can be identified by the aircraft’s model or airline operator. Data triangulation was used when information was incomplete, inconsistent, or missing.
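The exclusion of out-of-scope airframes by model or operator can be pictured as a simple rule-based filter. The sketch below is only an illustration under stated assumptions: the thesis does not list the criteria it applied, so the marker words, operator names, and sample records are all hypothetical.

```python
# Hypothetical exclusion rules; the actual criteria are not given in the text.
EXCLUDED_MODEL_MARKERS = ("freighter", "military")
EXCLUDED_OPERATORS = {"Example Air Force", "Example Cargo"}


def is_in_scope(record):
    """Return True if the airframe looks like a passenger airliner."""
    model = record["model"].lower()
    # Exclude by model designation (for example, a freighter variant).
    if any(marker in model for marker in EXCLUDED_MODEL_MARKERS):
        return False
    # Exclude by operator (for example, an air force or cargo carrier).
    return record["operator"] not in EXCLUDED_OPERATORS


# Hypothetical production-list records.
AIRFRAMES = [
    {"model": "Type 100", "operator": "Example Airlines"},
    {"model": "Type 100 Freighter", "operator": "Example Airlines"},
    {"model": "Type 200", "operator": "Example Air Force"},
]

IN_SCOPE = [record for record in AIRFRAMES if is_in_scope(record)]
```

Keeping the rules in named constants makes the exclusion criteria explicit and easy to revise when triangulation reveals a misclassified record.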

***

### Exploratory Data Analysis (EDA)

In 1977, statistician John Tukey released his book titled *Exploratory Data Analysis*, which emphasised the importance of analysing data using visualisation techniques and encouraged data experts to adopt this approach. Tukey’s exploratory stance on data analysis is consistent with Classic Grounded Theory. Because of its iterative and exploratory nature, Glaser considered Exploratory Data Analysis (EDA) a “precursor to the methodology of quantitative grounded theory” (Glaser, 2008, p. 13).

The exploratory approach to data analysis requires adhering to scepticism and openness. Scepticism is crucial when assessing summary measures, which may conceal or misrepresent data, while openness allows for the discovery of unexpected patterns (Hartwig & Dearing, 1979). Unfortunately, data analysis in the social sciences frequently proceeds without openness, opting for a confirmatory mode that tests theory and remains oblivious to alternative data patterns (Glaser, 2008).

To support this exploratory stance, the data collection needs to encompass a large number of variables, allowing the researcher the freedom to explore new patterns. For this purpose, the analysis of mobile phone specifications encompassed 91 product attributes and performance indicators (as seen in Table 3.6.1), while the analysis of passenger aircraft covered 55 variables (Table 3.6.2). This contrasts with how previous innovation studies had surveyed their respective segments (Table 3.3.2). Given the technological limitations of the 1990s, Christensen (1992a) surveyed the evolution of disc drives along 6 variables (price, year, drive capacity, density, access time, and transfer rate). From these variables, Christensen used only 2 (capacity and recording density) to propose a new definition of disruptive innovation.

These previous studies share the postulate that technologies evolve along a performance trajectory. In other words, technological evolution is assumed to be largely unidimensional, which leads to a focused analysis of only a few variables. In Christensen’s cases, often only one or two performance dimensions dominate the customer’s choice; however, in many cases, the number of performance dimensions is much higher, and customers trade them off against each other, making for a complex and recursive set of variables (Danneels, 2004).

If these assumptions were adopted in this study, the data collection would be narrowly focused on a few variables, thus compromising the process of grounded theory from the outset. For example, given how much users value CPU performance and display size, smartphones could be analysed along these performance trajectories. However, this assumption would inevitably hide other evolutionary patterns. Rather than enabling the emergence of new concepts, the analysis would be geared toward confirming an extant theory.

***

### Qualitative Data Analysis with CAQDAS

Atlas.ti was used to facilitate the management, coding, and retrieval of qualitative data, such as historical documents (for example, the 1938 edition of Jane’s All the World’s Aircraft), technical literature, and academic literature (evolutionary theory, disruptive innovation, and design evolution).

CAQDAS is an acronym for Computer-Assisted Qualitative Data Analysis Software. Like the majority of CAQDAS packages, Atlas.ti also features auto-coding, memos, concept mapping, code quantification, and word clouds. However, these features were not used in this research. Many of these capabilities, particularly auto-coding, can be detrimental to the Classic Grounded Theory approach.

Atlas.ti was particularly useful for organising PDF libraries, manually coding documents, and retrieving coded texts. However, to explore the theoretical relationships between concepts, I preferred to memo with sticky notes on a whiteboard. This space was used to sort and compare data with data, data with concepts, concepts with concepts, and the relationships between these concepts and the main category. In addition, because CAQDAS packages are not built to accommodate quantitative datasets, a common ground was needed to connect insights from quantitative and qualitative data. The exploratory analysis produced several data visualisations (instances), some of which were printed and linked with their respective concepts on the whiteboard.

Computer-assisted data analysis is a common practice in the social sciences. The researcher should, however, employ CAQDAS tools in accordance with the stated methodological approach. With more automation, qualitative analysis can easily devolve into the codification of massive amounts of data and descriptive codes. Computer coding software is unlikely to solve the problem of too much data; in many cases, using transcriptions and coding software leads to a proliferation of codes, which can obstruct the emergence and discovery of latent patterns and can also lead to superficial and shallow analysis (Nilsson, 2011). Such mindless coding may, at best, drift the researcher into conceptual description without the capacity to generate a conceptually dense and integrated grounded theory (Holton & Walsh, 2017).

Therefore, CAQDAS should be used in a way that is consistent with Classic Grounded Theory. Glaser has argued that the ability to find connections and latent concepts, and to integrate them, is a process that takes place in the researcher’s own mind, not in the computer (Glaser, 2003, p. 17). To enable this, the software must assist (rather than hinder) the researcher in moving beyond description and reaching conceptual analysis.