Analysing SiteCatalyst data with Tableau: Visualizing 25 million+ natural search results
For those of you familiar with Omniture SiteCatalyst (Adobe Analytics), the limitations of the tool become frustratingly obvious when conducting marketing campaign analysis on large volumes of data across multiple breakdowns (SiteCatalyst terminology for dimensions, for those coming from a traditional Data Warehousing world). It is precisely for this purpose that Adobe provides its Data Warehouse product, which allows marketers to pull custom data using arbitrary breakdowns and metrics. In this post, we review a powerful implementation option (used for an actual client) that enables marketers to quickly analyze hidden patterns and discover insights in large volumes of historic data. Given the volumes, and the need for multiple dimensions, such analysis cannot be performed within SiteCatalyst, and most certainly not in Excel.
The client in question here is one of the United States’ leading website networks, providing a range of Christian products and services, including Bibles, Bible studies, church resources and other digital services aimed exclusively at the Christian faith. The company invests heavily in natural search and gets over 1 million monthly unique visitors from natural search alone. The aim of this analysis was to investigate the performance of an arbitrary grouping of organic search keywords in driving site revenue.
Limitations of SiteCatalyst native reporting utilities
Over six months ago, the company modified its SEO strategy to focus on analyzing groups of keywords rather than individual entries. While earlier they would look at analytics reports and identify the top 10-15 keywords driving traffic and revenue, they soon realized the limited actionability of this reporting format from a content production perspective. Given the volume of data, however, there was no easy solution that analysts (with limited technical skills, and tools knowledge limited to Excel) could tap into. Three specific constraints highlighted the limitations of the SiteCatalyst native reporting interface, and of Excel-based analysis of Data Warehouse extracts, for six months of data (estimated at over 25 million records, based on extrapolation from one month of data):
- Limited breakdowns within SiteCatalyst reports – the company needed at least four dimensions in its analysis, including Search Keywords – All, Entry Pages, Product SKU ID and Product Category
- If using DWH extracts, they would have to schedule multiple requests for smaller periods and then somehow merge the resulting data files. Even after merging, the combined file would be far too big to work with using existing skillsets
- The team did consider Tableau, but the sheer data volume meant they could not simply load the raw data into relational storage and run on-the-fly analysis without excessive query times and frequent out-of-memory issues in Tableau
Advanced data analysis with SiteCatalyst: A Business Intelligence approach
This is a fairly common scenario across our client engagements and neatly highlights the need for an advanced Business Intelligence solution built using traditional Data Warehousing methods. The solution in this particular case was threefold.
1. Developing the logical data model for storing analysis data
SiteCatalyst DWH extracts produce ‘flat’ datasets consisting of simple rows and columns. Transforming these into optimized reporting datasets requires upfront due diligence and discovery to turn the raw data into the metrics and dimensions used in the final analysis.
Effective data modelling lies at the heart of a successful data analysis project implementation. In this sense, capturing the various metrics and dimensions along with required hierarchies is imperative for actual database design.
In this scenario, the requirement was to analyze the performance of an arbitrary group of keywords in driving category-specific metrics for revenue, new orders, upsells, and coupon redemption. For example, it makes far more sense to assess the performance of keywords related to Bible products in driving revenue for Bible products rather than for other categories such as church supplies or Christian music. This involved attaching category-specific metadata to each keyword (over 15,000 keywords across all categories) outside of SiteCatalyst, and then implementing advanced visualizations to track metrics against these category groupings. From a data modelling perspective, this meant defining the category taxonomy and then identifying the various metrics and dimensions to be implemented in the data warehouse. The various eVars/props already implemented were identified, along with a comprehensive set of business rules for ‘deriving’ the category of each keyword. Other dimensions included visit time, conversion time, and some geo-location and local marketing campaign data. This metrics and dimensions ‘inventory’ was then used to define the physical database schema layout.
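To make the keyword-to-category derivation concrete, here is a minimal sketch of what such business rules might look like. The rule table and category names below are hypothetical illustrations; the actual taxonomy and rule set were client-specific and far larger.

```python
# Hypothetical sketch of keyword -> category derivation rules.
# CATEGORY_RULES and the categories shown are illustrative only,
# not the client's actual taxonomy.
CATEGORY_RULES = [
    ("bibles", ("bible", "niv", "kjv")),
    ("church supplies", ("communion", "baptism", "pulpit")),
    ("christian music", ("hymn", "worship music", "gospel album")),
]

def derive_category(keyword: str) -> str:
    """Assign a product category to a raw organic search keyword."""
    kw = keyword.lower()
    for category, terms in CATEGORY_RULES:
        if any(term in kw for term in terms):
            return category
    return "uncategorized"
```

In practice such rules were applied during the ETL step so that every one of the 15,000+ keywords carried its category as a proper dimension in the reporting schema.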
2. Design and build of data extraction and transformation setup
In the ETL step we ensure that the data is preprocessed as far as possible, in order to avoid implementing complex calculations in Tableau. For analysis of large datasets, this can mean order-of-magnitude improvements in query performance.
While the data modelling step identifies the reporting schema layout, the Extract, Transform and Load (ETL) step involves actually moving data from SiteCatalyst to the target schema.
In this instance, the eVar/prop inventory identified at the data modelling stage was used to define the SiteCatalyst Data Warehouse extracts. Multiple extracts were scheduled, each using the same metrics/breakdowns but pulling data for a different reporting period. A trial-and-error approach identified the maximum amount of data (based on date range) that could be pulled in a single export, and the files were then delivered to a staging area via FTP (email delivery of such large files failed regularly). Once all files were received in the staging area, the transformation and loading into the Data Mart created in Step 1 could begin. At LVMetrics we use the industry-leading Talend platform (www.talend.com) for all our data processing needs; Talend was used to implement all the business transformations before loading data into the reporting schema.
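The merge of the multiple period-by-period extract files can be sketched as follows. This is a simplified illustration, not the actual Talend job: it assumes the staged extracts are delimited files sharing an identical header, and the function name and file pattern are made up for the example.

```python
import csv
import glob

def merge_extracts(staging_glob: str, out_path: str) -> int:
    """Combine multiple DWH extract files (same columns, different
    date ranges) into one file for loading; returns data rows written."""
    rows_written = 0
    header = None
    with open(out_path, "w", newline="") as out:
        writer = None
        for path in sorted(glob.glob(staging_glob)):
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader)
                if header is None:
                    # Write the shared header exactly once.
                    header = file_header
                    writer = csv.writer(out)
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"schema mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

A real pipeline would additionally validate date ranges for gaps or overlaps between extracts before loading the merged result.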
Without the pre-processing described above, the only option would have been to implement complex calculated fields and table calculations within Tableau. Notwithstanding Tableau's versatility, implementing data transformations in the reporting layer is not recommended practice. This is even more true for large datasets, where on-the-fly calculations can result in exceptionally long query processing times and unnecessary hardware augmentation. For example, using appropriate aggregations and data preprocessing, we were not only able to generate reporting-ready datasets but also to bring the number of records down from over 25 million to under 50,000!
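The 25-million-to-50,000 reduction comes from rolling raw rows up to the reporting grain before they ever reach Tableau. A minimal sketch of that idea, assuming a hypothetical (category, month) grain and a couple of additive metrics:

```python
from collections import defaultdict

def aggregate(records):
    """Roll up raw visit-level rows to (category, month) grain so the
    reporting tool queries a small summary table instead of raw extracts.
    Each record is assumed to be (category, month, revenue, orders);
    the grain and metrics here are illustrative, not the client schema."""
    totals = defaultdict(lambda: [0.0, 0])
    for category, month, revenue, orders in records:
        t = totals[(category, month)]
        t[0] += revenue   # additive metrics can be summed at load time
        t[1] += orders
    return {key: tuple(vals) for key, vals in totals.items()}
```

Because the summary table has one row per category-month rather than one per visit, row counts shrink by several orders of magnitude while all additive metrics remain exact.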
3. Tableau setup and report delivery
With the data now in Tableau, the focus shifted to the actual analysis. Meticulous upfront data modelling meant that we were able to implement multiple analysis patterns, including:
- Differential reporting – category-level change in metrics compared to arbitrary previous periods
- Correlation analysis – correlation between local marketing organic keywords and key metrics
- Outlier detection – keywords that showed unusual performance during specific cyclical periods
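The first of these patterns, differential reporting, is simple once the data sits at category grain. A hedged sketch of the underlying calculation (the function name and percentage formulation are illustrative; in the actual delivery this was expressed as a Tableau calculation over the pre-aggregated data):

```python
def period_change(current: dict, previous: dict) -> dict:
    """Category-level % change in a metric versus an arbitrary previous
    period. Inputs map category -> metric total for each period;
    categories absent (or zero) in the previous period are skipped."""
    changes = {}
    for category, value in current.items():
        prior = previous.get(category)
        if prior:
            changes[category] = (value - prior) / prior * 100.0
    return changes
```

Because the comparison periods are arbitrary, analysts could contrast, say, this quarter's Bible-category revenue against the same quarter last year as easily as against the preceding month.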