Graphical Query Builder
An introduction to a graphical query builder
To ask a question using a graphical query builder, go to Data Studio and click the + New button in the upper right of the main navigation bar and select Question.
You can start a question from:
- A model. A model is a special kind of saved question meant to be used as a good starting point for questions. These would typically contain data pulled together data from one or multiple raw tables.
- Raw data. You'll need to specify the raw data table as the starting point for your question.
- A saved question. You can use the results of any question as the starting point for a new question.
The query builder
Once you select your data, Xtremepush will take you to the query builder. Say you selected Raw data > Bet Settlement, then you'll see something like this:
This is the query builder's notebook editor. It has three default steps.
To the right of each completed step is a Preview button that shows you the first 10 rows of the results of your question up to that step.
Picking data
The data section is where you select the data you want to work with. It allows to pick a model, a table from a database, or a saved question. You can click on a table to select which columns you want to include in your results.
Filtering
Filtering means narrowing things down based on certain criteria. You're probably already familiar with filtering when looking for something online, like when shopping. Maybe you only want to see olive-colored pants, or books where the author's last name is "Borges," or pictures of people wearing olive-colored pants reading Jorge Luis Borges.
When you add a filter step, you can select one or more columns to filter on. Depending on the data type of the column you pick, you'll get different filter types, like a calendar for date columns.
You can add subsequent filter steps after each summarize step. This lets you do things like summarize by the count of rows per month, and then add a filter on the count
column to only include rows where the count is greater than 100, which is like a SQL HAVING
clause.
Once you're happy with your filter, click Add filter, and visualize your results. Your data will be updated with the filter applied.
If you want to edit your filter, click the purple filter at the top of the screen. Clicking on the X button removes your filter. It's possible to add as many filters as you need.
Filter types
Broadly speaking, there are three types of columns, each with their own set of filtering options:
- Numeric columns let you add filters to only include rows in your table where this number is between two specific values, or is greater or less than a specific value, or is exactly equal to something.
- Text or category columns let you specify that you only want to include data where this column is or isn't a specific option, or you can exclude empty cells in that column.
- Date columns give you a lot of options to filter by specific date ranges, relative date ranges, and more.
Filtering by date
One important thing to understand when filtering on a date column is the difference between specific and relative dates:
- Specific dates are things like November 1, 2010, or June 3 – July 12, 2017; they always refer to the same date(s).
- Relative dates are things like "the past 30 days," or "the current week;" as time passes, the dates these options refer to change. Relative dates are a useful way to set up a filter on a question so that it stays up-to-date by showing you, for example, how many people visited your website in the last 7 days. You can also click on the ... to specify a Starting from option, which lets you offset the relative date range. For example, you could set the range as the "Previous 7 days, starting from 2 days ago".
Filters with OR
OR
If you have a more complex filter you're trying to express, you can pick Custom Expression from the add filter menu to create a filter expression. You can use comparison operators like greater than, >
, or less than ,<
, as well as spreadsheet-like functions. For example, [Stake Amount] > 100 OR [Odds] > 3.0
. Learn more about writing expressions or skip right to the list of expressions.
Summarizing and grouping by
When we have a question like "how many bets each user has placed in the last month?", we're asking for a summary of the data. A summary is usually made up of two parts: one or more numbers we care about (called a "metric" in data-speak), and how we want to see that number grouped or broken out. To answer that example question of "How many bets each user has placed in the last month?"
- The metric would be the count of bets (the count of rows).
- We want that metric to be grouped by "Profile UID".
- And we want to filter the rows for "last month."
There are two common ways you'll tend to summarize your data:
- Counting the number of rows in your table
- Getting the sum or average of a numeric column
For most of the questions, you'll then group that metric by Profile UID, which identifies each unique user profile in your single customer view.
Adding a summarize step lets you choose how to aggregate the data from the previous step. You can pick one or more metrics, and optionally group those metrics by one or more dimensions (columns). When picking your metrics you can choose from basic functions like sum
, average
, and count
; or you can pick a common metric defined by an admin; or you can create a custom expression by writing a formula.
Common metrics include:
- Count of rows: the total number of rows in the table, after any filters have been applied. If you're looking at your
Orders
table and want to know how many orders were placed with a price greater than $40, you’d filter by “Price greater than 40,” and then selectCount of rows
, because you want Xtremepush to count how many orders matched your filter. - Sum of …: the sum of all the values in a specific column.
- Average of …: the average of all the values in a single column.
- Number of distinct values of…: the number of unique values in all the cells of a single column. This is useful when trying to find out things like how many different types of products were sold last month (not how many were sold in total).
- Cumulative sum of…: This gives you a running total for a specific column. In order for this metric to be useful you'll need to group it by a date column to see it across time.
- Cumulative count of rows: This gives you a running total of the number of rows in the table over time. Just like
Cumulative sum of…
, you'll need to group this by a date column in order for it to be useful. - Standard deviation of …: A number which expresses how much the values of a column vary, plus or minus, from the average value of that column.
- Minimum of …: The minimum value present in the selected field.
- Maximum of …: The maximum value present in the selected field.
If you summarize and add a grouping you can then summarize again. You can also add steps to filter and/or join in between. For example, your first summarization step could be to get the count of orders per month, and you could then add a second summarization step to get the average monthly order total by selecting the Average of…
your count
column.
You can also add metrics and groupings on the results page in a sidebar: the top of the sidebar where you pick the number ("metric") you want to see, and the part below is where you pick how to group that number (or how to "break it out").
If your admins have created any named metrics that are specific to your company or organization, they will be in this dropdown under the Common Metrics section. These might be things like your company’s official way of calculating revenue.
Custom expressions
Custom expressions allow you to use spreadsheet-like functions and simple arithmetic within or between aggregation functions.
For example, you could do Average(sqrt[FieldX]) + Sum([FieldY])
or Max(floor([FieldX] - [FieldY]))
, where FieldX
and FieldY
are fields in the currently selected table. Learn more about writing expressions.
Creating custom columns
Custom columns are helpful when you need to create a new column based on a calculation, such as subtracting the value of one column from another, or extracting a portion of an existing text column. Custom columns that you add aren't permanently added to your table; they'll only be present in the given question.
You can use the following math operators in your formulas: +
, –
, *
(multiplication), and /
(division), along with a whole host of spreadsheet-like functions. You can also use parentheses to clarify the order of operations.
Viewing the SQL that powers your question
Under the hood, all Xtremepush questions are SQL. If you're curious to see the SQL that will get run when you ask your question, you can click the console icon in the top-right of the notebook editor. In the modal that opens up, you'll also be given the option to start a new query in the SQL editor using this generated SQL as a starting point (assuming you have SQL Permissions). It's a nice little shortcut to have Xtremepush write some boilerplate SQL for you, but then allows you to tweak and customize the query.
Computed attributes requirements
If you wish to use your question to populate user-level computed attributes you have to follow specific rules:
- The result dataset must include either Profile UID or User ID as one of the columns:
- Profile UID is an ID of a unified user profile in Xremepush, which is recommended to use for performance reasons and is required if you wish your question to be computed in realtime.
- Other columns in dataset must represent individual attributes for those users:
- The name of the column would appear as an attribute name in the single customer view.
- At least one attribute column must be present.
See our computations guide to learn how to activate your computed attributes to make them available for usage in audiences, decisioning and personalisation.
Updated 6 months ago