01 Getting Started

Supported Ruby Versions

by Tomas Korcak

Problem

You want to know which Ruby to use with the SDK.

Solution

Here is a list of supported Ruby versions.

  • Ruby 1.9.3

  • Ruby 2.0

  • Ruby 2.1

  • Ruby 2.2

  • JRuby with Oracle Java 1.8.0.51

  • JRuby with OpenJDK 1.8.0.51

Unfortunately, JRuby with the latest version of Java (1.8.0.60) is not supported because of issues with SSL. Using the latest JRuby with a Java version higher than 1.8.0.51 will cause network communication issues (WebDAV authentication). We will support the latest Java as soon as these issues are resolved.

Working with Project Interactively

by Tomas Svarovsky

Problem

You want to use the SDK on a project, but writing a script seems like too much of a hassle.

Solution

You first need to install the gooddata gem using

gem install gooddata

There are several methods by which you can work with the GoodData Ruby SDK. Let’s look at the major ones.

IRB

irb is an interactive console that is provided with your Ruby installation. You can use the GoodData SDK inside irb. Below are some of the basic steps.

First, launch irb. In your terminal, execute the following:

irb

You should receive a message similar to the following, which indicates that you are inside the interactive Ruby environment:

2.1-head :001 >

Let’s start playing with gooddata. Enter the following:

> GoodData

You should receive the following:

NameError: uninitialized constant GoodData

This error message indicates that irb does not know about the gooddata SDK. So, let’s tell irb to require the SDK:

> require 'gooddata'
  => true

Ok. Now, repeat the previous command:

> GoodData
  => GoodData

The response indicates that irb knows about the SDK. Let’s try to log in to the GoodData platform with your credentials through the SDK. (For clarity, the > prompt is omitted in the irb session from now on.)

client = GoodData.connect("john@example.com", "password")

You should be logged in. Now, you can perform tasks that do not require you to be inside of a specific project. For example, use the following to list all of your projects:

client.projects

To work with a project, you must define the project for the SDK. For example, suppose you wish to list the reports in a project. You must tell the SDK which project to use:

project = client.projects('PROJECT_ID')

To list the reports in this project:

project.reports

Ok. To exit irb, enter:

exit

jack_in

Working with the GoodData SDK using irb can be cumbersome. To make things a bit easier, the GoodData SDK includes a gooddata command line interface. One of its commands is jack_in, which brings you into a live project where you can poke around and explore.

gooddata -U john@example.com -P password -p PROJECT_ID project jack_in

In a single command, the above launches the command line interface, logs you into the platform, and identifies the project to which to connect. At this point, you may begin entering commands:

reports = project.reports

NOTE: Use the ~/.gooddata file to save your username and password locally, so you do not have to type them every single time.

Scripting

by Tomas Svarovsky

Problem

You created a program by building it up in an interactive manner, but now you would like to put it into a file so you can run it repeatedly.

Solution

Let’s assume that your fantastic code is fairly simple: you would like to print the title of the project. The session looks something like this

project_live_session: puts project.title
"My project Test (2)"
=> nil

Not too fancy, but it is enough to illustrate a couple of points. First you need to save your code to a file. Take your favorite text editor, create a new file, and put your code into it so it looks like this

puts project.title

Go ahead, save it, and call it 'my_first_sdk_program.rb'. Do not execute it yet because it would not work. Let’s add a couple more things.

Jack in does a couple of things for you so you can be productive quickly; in a standalone program we have to handle them ourselves.

  • It loads the SDK libraries

  • It logs you in and provides the client as the client variable

  • It jacks into a project and provides the project as the project variable. If you were wondering why project.title worked in the interactive session even though we never defined project, this is why.

We can do the same three things with three lines of code, one for each. Add them to the file so the end result looks like this.

require 'gooddata'
client = GoodData.connect('username', 'pass')
project = client.projects('project_id')
puts project.title

Done. Save it.

Now you can run it by using this command

ruby my_first_sdk_program.rb

Connecting to GoodData Platform

by Tomas Svarovsky

Problem

You know how to jack in and how to write a simple program. Now it is time to combine the two and write a program that connects and does something with a GoodData project.

Solution

# encoding: utf-8

require 'gooddata'

GoodData.with_connection('user', 'password') do |client|
    # just so you believe us, we are printing the names of all the projects under this account
    client.projects.each do |project|
        puts project.title
    end
end

Discussion

Maybe you are wondering if there are other ways to log in. First of all, it is not nice to have secret credentials in plain text like this. The SDK has a feature to help you: if GoodData.with_connection or GoodData.connect is called without any params, it tries to find a param file that contains the credentials. Currently it looks for ~/.gooddata and expects it to have the following content

{
  "username": "john@example.com",
  "password": "pass",
  "auth_token": "token"
}

You do not have to create it yourself. Run gooddata auth store and a wizard will help you.
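
In your terminal:

gooddata auth store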

It is also possible to pass the parameters as a hash. The form that we have shown above is just a convenience.

  GoodData.with_connection(username: 'user', password: 'password')

Connecting to GoodData using Single Sign On (SSO)

by Tomas Korcak

Problem

You have SSO enabled and want to use it to log in to GoodData without a username and password.

Solution

With the SSO capability you don’t need to maintain yet another password for accessing the GoodData application. You can use your existing infrastructure for user management and connect it with the GoodData APIs to let your users log in to GoodData seamlessly.

For more info check this article - https://developer.gooddata.com/article/single-sign-on

# encoding: utf-8

require 'gooddata'
require 'pp'

client = GoodData.connect_sso('tomas.korcak@gooddata.com', 'gooddata.tomas.korcak')

pp client.user.json

Discussion

Never share your private key with other people. It is the same thing as your name and password. You can do almost everything with it.

Connecting to Different Servers

by Tomas Svarovsky

Problem

Sometimes the server you would like to connect to is not the secure.gooddata.com machine. This might occur for two reasons: either you are trying something on a test machine (if you are working for GoodData) or you are working with a project that is on a white-labeled instance.

Solution

The solution to both is passing the server name as a parameter to the connect or with_connection functions.

# encoding: utf-8
require 'gooddata'

GoodData.with_connection(login: 'user',
                         password: 'password',
                         server: 'https://some-other-server.com') do |client|
  # just so you believe us, we are printing the names of all the projects under this account
  client.projects.each do |project|
    puts project.title
  end
end

Connecting to GoodData Platform with Super Secure Token (SST)

by Tomas Svarovsky

Problem

You do not have credentials. You only have an SST and would still like to make some requests.

Solution

The Super Secure Token (SST) is a token that allows you to access our APIs in an unrestricted manner without necessarily knowing the username and password. Take note that it currently does not give you access to the whole API. Things like interacting with DSS and the staging storage still need a username and password. We are trying to resolve the issue. First you need to have an SST token. There are several ways to obtain it. Here is one using the SDK

# encoding: utf-8

require 'gooddata'

client = GoodData.connect(login: 'user', password: 'password')
client.connection.sst_token

# Once you have the token, you can try to login and do something.

GoodData.with_connection(sst_token: 'sst_token') do |client|
  client.projects.each do |p|
    puts p.title
  end
end

Discussion

Never share the token with other people. It is the same thing as your login and password. You can do almost everything with it.

Handling Credentials Securely

by Tomas Svarovsky

Problem

You would like to write scripts that are sharable. Currently you either have to rely on the ~/.gooddata file or put the credentials in by hand. This is error prone and you have to be careful not to share your passwords.

Solution

There is a fairly easy solution. If you recall, there is a way to log in to the platform like this.

GoodData.connect(login: 'user', password: 'password')

This method takes several types of parameters, one of which is a hash of values. You can leverage that and read the hash from a file. There are many variants; we are going to show the two most popular ones, JSON and YAML.

JSON

Create a file that looks like this.

{
  "login" : "john@gooddata.com",
  "password" : "my_secret"
}

Then read it from your script:

require 'gooddata'
require 'json'

params = JSON.parse(File.read('/path/to/file'))
client = GoodData.connect(params)
puts client.projects.count

YAML

Create a file that looks like this.

login: john@gooddata.com
password: my_secret

Then read it from your script:

require 'gooddata'
require 'yaml'

params = YAML.load(File.read('/path/to/file'))
client = GoodData.connect(params)
puts client.projects.count

Discussion

You can see that you can easily share these scripts with other people, since you have externalized all sensitive information. If the user puts the config file in the expected place, everything will just work. There are other formats and approaches you can leverage, but the idea is always the same.

This is actually what we do behind the scenes if you do not pass any params to connect or with_connection. We look at ~/.gooddata for a JSON file and use what we find there. This is extremely useful when you are showing things interactively, so you do not have to disclose your credentials. You can either create the file by hand or use a convenience method we have created for you: just call gooddata auth store, answer a couple of questions, and the file will be created for you.

Using Project

by Tomas Svarovsky

Problem

You want to use a specific project

Solution

There are a couple of ways to do this. Our favorite is this

# encoding: utf-8

require 'gooddata'

GoodData.with_connection('user', 'password') do |client|
  GoodData.with_project('project_pid') do |project|
    puts project.title
  end
end

This has the benefit that you have access to the project only inside the block. Once the block is left, you are disconnected from the project. If you are using several projects in one script, this is the way to go to be sure you are not reaching somewhere you do not want to.

There are other more conventional ways to do the same thing.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection('user', 'password') do |client|
  project = GoodData.use('project_pid')
  puts project.title
end

Using APIs

by Tomas Svarovsky

Problem

You would like to interact with GoodData API directly

Solution

The SDK provides a slew of well known HTTP methods that make this possible while shielding you from the intricacies of keeping the connection alive etc.

require 'gooddata'

# We need a connected client first
client = GoodData.connect

client.get("/gdc/md/")

project_id = 'YOUR_PROJECT_ID'

# Careful: this request deletes the project
client.delete("/gdc/projects/#{project_id}")

Disabling SSL Verification

by Tomas Svarovsky

Problem

You would like to disable SSL verification when using the SDK against a server that does not have proper certificates.

Solution

You can switch off SSL validation like this. This is especially useful when you are using the SDK against testing or development servers.

# encoding: utf-8

require 'gooddata'

# Traditionally with login and pass
client = GoodData.connect(login: 'user', password: 'password', verify_ssl: false)

# You can also use it with SST token
client = GoodData.connect(sst_token: 'sst_token', verify_ssl: false)

Using Asynchronous Tasks with Timeouts

by Tomas Svarovsky

Problem

You would like to build on top of the SDK, but you would like to have more control over asynchronous tasks.

Solution

There are numerous tasks on the GoodData API which potentially take more than just a couple of seconds to execute. These include report executions, data loads, exports, clones and others.

These tasks are implemented in the SDK so that they block. The execution continues only when the task finishes (either with success or error) or the server time limit is reached and the task is killed.

Sometimes it is useful to be able to specify the time limit on the client side. This might be useful for cases where you need to make sure that something either finishes under a certain time threshold or you have to take some other action (such as notifying a customer). The limit you would like to use is different from the server-side limit of the GoodData APIs.

You can implement it like this

# encoding: utf-8

require 'gooddata'

client = GoodData.connect
project = client.projects('project_id')
report = project.reports(1234)
begin
  puts report.execute(time_limit: 10)
rescue GoodData::ExecutionLimitExceeded => e
  puts "Unfortunately #{report.title} execution did not finish in 10 seconds"
  raise e
end

Logging

by Zdenek Svoboda

Problem

Your script doesn’t work or throws a weird error. You want to see detailed logging information to understand what is going on.

Solution

The GoodData Ruby SDK uses the standard Ruby logger, which you can configure in the standard way

# encoding: utf-8

require 'gooddata'
require 'logger'

logger = Logger.new(STDOUT)
logger.level = Logger::DEBUG
GoodData.logger = logger
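
Logging to a file works the same way; a quick sketch using the standard library Logger (the file path is hypothetical):

logger = Logger.new('/tmp/gooddata.log')
logger.level = Logger::INFO
GoodData.logger = logger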

You can also use the following abbreviated syntax for logging to standard output at the DEBUG level

# encoding: utf-8

require 'gooddata'

GoodData.logging_http_on

or at the INFO level

# encoding: utf-8

require 'gooddata'

GoodData.logging_on

Another option is to specify the log level explicitly

# encoding: utf-8

require 'gooddata'

GoodData.logging_on(Logger::DEBUG)

There are quite a few options to choose from. Feel free to use whatever you like the best.

02 Working With Users

Listing Project’s Users

by Tomas Korcak

Problem

You would like to list the users in a project

Prerequisites

You have to have a user who is a project admin.

Solution

# encoding: utf-8

require 'gooddata'
require 'pp'

GoodData.with_connection('user', 'password') do |client|
  GoodData.with_project('project_pid') do |project|
    pp project.users
    # You might want to see just name and login
    pp project.users.map {|u| [u.login, u.name]}
  end
end

Checking User Membership in Project

by Tomas Svarovsky

Problem

You would like to see if a user is part of a project.

Prerequisites

You have to have a user who is an admin of an organization. If you don’t have the organization admin account, please contact your primary GoodData contact person or GoodData support (e-mail support@gooddata.com).

Solution

# encoding: utf-8

require 'gooddata'

GoodData.with_connection('user', 'password') do |client|
  GoodData.with_project('project_pid') do |project|
    project.member?('jane.doe@example.com')
  end
end

Discussion

You can check membership not only by login but also by passing a user object. This might be especially useful if you are checking the same user across several projects.
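
A sketch of checking a member object from one project against another (the project ids are placeholders):

# encoding: utf-8

require 'gooddata'

GoodData.with_connection('user', 'password') do |client|
  project_a = client.projects('project_pid_a')
  project_b = client.projects('project_pid_b')
  # Fetch the member object from one project ...
  user = project_a.member('jane.doe@example.com')
  # ... and ask whether the same user is a member of another
  puts project_b.member?(user)
end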

Enabling and Disabling Users

by Tomas Svarovsky

Problem

You need to enable or disable users in a project.

Prerequisites

You have to have a user who is an admin in a project you would like to disable users in.

Solution

Disable and enable a particular user in a GoodData project

# encoding: utf-8

require 'gooddata'

# Connect to platform using credentials
GoodData.with_connection do |c|
  GoodData.with_project('project_pid') do |project|
    user = project.member('john@example.com')
    user.disable
    # You can reenable the user again
    user.enable
  end
end

Disable all users in GoodData project

# encoding: utf-8

require 'gooddata'

# Connect to platform using credentials

GoodData.with_connection do |c|
  GoodData.with_project('project_pid') do |project|
    # We collect all users minus current user
    # so we do not disable ourselves from the project
    users_to_disable = project.users.reject { |u| u.login == c.user.login }
    # disable all users
    users_to_disable.pmap {|u| u.disable}
  end
end

If you want to keep more than one user, you can do something like this

# encoding: utf-8

require 'gooddata'

# Connect to platform using credentials
GoodData.with_connection do |c|
  GoodData.with_project('project_pid') do |project|
    keepers = ['john@example.com', c.user.login]
    # We collect all users and remove those from the keepers array
    users_to_disable = project.users.reject { |u| keepers.include?(u.login) }
    # disable all users
    users_to_disable.pmap { |u| u.disable }
  end
end

Discussion

As you can see from the examples above, the possibilities are endless: you can enable or disable users just by preparing the right array of users, using regular methods on arrays.
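
For example, a sketch that disables everyone whose login is outside a given domain (the example.com domain here is hypothetical):

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_pid') do |project|
    # Keep ourselves and anyone from our own domain
    users_to_disable = project.users.reject do |u|
      u.login == c.user.login || u.login.end_with?('@example.com')
    end
    users_to_disable.pmap(&:disable)
  end
end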

Creating Multiple Users from CSV

by Patrick McConlogue, Tomas Korcak

Problem

You have a CSV containing users, their roles and passwords, and you want to create them in bulk.

Prerequisites

You have to have an existing project and be an admin of it.

Solution
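
The script below assumes a users.csv file with a header row, the user's email in the first column, and the role in the second; a hypothetical sample:

email,role
john.doe@example.com,editor
jane.roe@example.com,admin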

# encoding: UTF-8

require 'gooddata'
require 'csv'

# Project ID
PROJECT_ID = 'we1vvh4il93r0927r809i3agif50d7iz'

GoodData.with_connection do |c|
  GoodData.with_project(PROJECT_ID) do |project|
    path = File.join(File.dirname(__FILE__), '..', 'data', 'users.csv')
    puts "Loading #{path}"
    CSV.foreach(path, :headers => true, :return_headers => false) do |user|
      email = user[0]
      role = user[1]
      project.invite(email, role)
    end
  end
end

Inviting Users to Project

by Tomas Korcak

Problem

You need to invite a user into a project.

Prerequisites

You have to have a user who is an admin of an organization. If you don’t have the organization admin account, please contact your primary GoodData contact person or GoodData support (e-mail support@gooddata.com).

Solution

# encoding: utf-8

require 'gooddata'

client = GoodData.connect
project = client.projects('project_pid')

project.invite('joe@example.com', 'admin', 'Hey Joe, look at this.')

Discussion

An invitation can be sent by any administrator. The invited user receives an invitation e-mail with a confirmation link. If you want to add a user to a project without the e-mail and confirmation, please consult the recipe Adding users to Project.

Listing Organization’s Users

by Tomas Korcak

Problem

You would like to list the users in an organization

Prerequisites

You have to have a user who is an admin of an organization. If you don’t have the organization admin account, please contact your primary GoodData contact person or GoodData support (e-mail support@gooddata.com).

Solution

# encoding: utf-8

require 'gooddata'
require 'pp'

# Connect to platform using credentials
client = GoodData.connect

domain = client.domain('domain_name')
users = domain.users
pp users

Adding User to Organization

by Tomas Korcak

Problem

You would like to add a user to an organization programmatically

Prerequisites

You have to have a user who is an admin of an organization. If you don’t have the organization admin account, please contact your primary GoodData contact person or GoodData support (e-mail support@gooddata.com).

Solution

# encoding: utf-8

require 'gooddata'
require 'pp'

client = GoodData.connect

# Get your domain ..
domain = client.domain('domain_name')

# Generate random password
pass = (0...10).map { ('a'..'z').to_a[rand(26)] }.join

new_user = domain.add_user(
  :login => 'new.user@gooddata.com',
  :password => pass,
  :first_name => 'First',
  :last_name => 'Last',
  :email => 'test@gooddata.com',
  :sso_provider => 'some_sso'
)

pp new_user

Adding Multiple Users to Organization from CSV

by Tomas Svarovsky

Problem

You would like to add users to an organization programmatically. What you have is a CSV file holding the information about the users.

Prerequisites

You have to have a user who is an admin of an organization. If you don’t have the organization admin account, please contact your primary GoodData contact person or GoodData support (e-mail support@gooddata.com).

We assume that you have a file with the details handy. The file can look, for example, like this

login,first_name,last_name,password
john@example.com,John,Doe,12345678

The headers we used are the defaults. If you have different ones, you will have to do some mangling. The minimal information that you have to provide for the user creation to be successful is the login.

Solution

First let’s have a look at how to implement the addition with a file exactly as in the example above. This has the advantage that you do not have to mangle the headers.

# encoding: utf-8

require 'gooddata'
require 'csv'
require 'active_support/all'

client = GoodData.connect

domain = client.domain('domain-name')
users = []
CSV.foreach('data.csv', :headers => true, :return_headers => false) do |row|
  users << row.to_hash.symbolize_keys
end

domain.create_users(users)

Sometimes, though, the file that comes from a source system has different headers than would be ideal. You can either transform it in a separate ETL process or change the code slightly to compensate. The next code snippet shows how to do just that.

Imagine you have a file like this

UserLogin,FirstName,LastName,UserPassword
john@example.com,John,Doe,12345678

Notice that it is exactly the same information as before. The only difference is that you have different headers.

This is the code that adds those users to a domain.

# encoding: utf-8

require 'gooddata'
require 'csv'

client = GoodData.connect

headers = ["UserLogin", "FirstName", "LastName", "UserPassword"]
new_headers = [:login, :first_name, :last_name, :password]

domain = client.domain('domain-name')
users = []
CSV.foreach('data.csv', :headers => true, :return_headers => false) do |row|
  new_data = new_headers.zip(row.to_hash.values_at(*headers))
  users << Hash[new_data]
end

domain.create_users(users)

Notice that the bulk of the code is the same. The only difference is that we defined headers, which contains the header values from the provided CSV file. The variable new_headers provides the corresponding keys that the SDK expects. Take note that the position is important: the headers for corresponding columns have to be in the same order in both variables.

The most important line in the code is this

new_headers.zip(row.to_hash.values_at(*headers))

What it does is exchange the keys from headers for those defined in new_headers. This code does not return a Hash but an array of key-value pairs, which can be turned into a Hash with this

Hash[new_data]

The rest of the code is the same as in previous example.
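
To illustrate what zip produces on a single row, here is a sketch with literal values:

headers = ["UserLogin", "FirstName", "LastName", "UserPassword"]
new_headers = [:login, :first_name, :last_name, :password]
row = { "UserLogin" => "john@example.com", "FirstName" => "John",
        "LastName" => "Doe", "UserPassword" => "12345678" }

new_headers.zip(row.values_at(*headers))
# => [[:login, "john@example.com"], [:first_name, "John"],
#     [:last_name, "Doe"], [:password, "12345678"]]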

Adding User from Organization to Project

by Tomas Svarovsky

Problem

You have users added to the organization (domain). You would like to add them to a project.

Prerequisites

You have to have a user who is an admin of an organization. If you don’t have the organization admin account, please contact your primary GoodData contact person or GoodData support (e-mail support@gooddata.com).

Solution

require 'gooddata'

GoodData.with_connection do |client|
  # Get your domain ..
  domain = client.domain('domain_name')
  GoodData.with_project('project_id') do |project|
    # Let's get all users except ourselves
    users_to_add = domain.users.reject { |u| u.login == client.user.login }
    # Let's add them all as viewers
    users_to_add.each { |u| project.add_user(u, 'Viewer', domain: domain) }
  end
end

Adding Users from Organization to Multiple Projects

by Tomas Svarovsky

Problem

You would like to add users to projects programmatically. You already have all the user profiles created in your organization (domain). You potentially have many projects that you would like to populate.

Prerequisites

You have to have a user who is an admin of an organization. If you don’t have the organization admin account, please contact your primary GoodData contact person or GoodData support (e-mail support@gooddata.com). You have an organization populated with users. If you don’t, take a look at the recipe above (Adding user to organization).

We assume that you have a file with the details of which users should be added where. The file has to contain three pieces of information: who you would like to add to a project, the role that the user should take on in that project, and which project the user should be added to. The headers we used are the defaults. If you have different ones, you will have to do some mangling. An example of how to do that can be found in the recipe adding_users_to_domain.

Solution

Sometimes you would like to add users to a slew of projects in one go. Let’s illustrate this with an example. Imagine that you have a file like this

pid,login,role
asl50ejow6bzp97i9pxlbcm3vkuvzQ72,john@example.com,admin

The following script reads the file, groups the users by project, and imports them:

# encoding: utf-8

require 'gooddata'
require 'csv'
require 'active_support/all'

GoodData.with_connection('user', 'password') do |client|
  domain = client.domain('domain-name')
  users = []
  CSV.foreach('data.csv', :headers => true, :return_headers => false) do |row|
    users << row.to_hash.symbolize_keys.slice(:pid, :login, :role)
  end
  users.group_by {|u| u[:pid]}.map do |project_id, users|
    GoodData.with_project(project_id) do |project|
      project.import_users(users, domain: domain, whitelists: ["svarovsky@gooddata.com"])
    end
  end
end

The script reads the users, groups them by project, and then adds them to each project. It reads all the users into memory, but this should be fine for a typical situation of thousands of users. Note that we look up the project dynamically for each group of users. All the notes about whitelists apply here as well.

Adding User to Both Organization and Project from CSV

by Tomas Svarovsky

Problem

You would like to add users from a CSV file to a project and domain in one go. The best practice with automated projects is to add users to the domain (organization) first and then add users from that domain to specific project(s). This allows you to bypass the invitation process and manipulate users without their consent, which is usually what you want in those cases. Sometimes it is useful to do this in one go. This is especially true if you have only one project and one organization.

Prerequisites

You have to have a user who is an admin of an organization. If you don’t have the organization admin account, please contact your primary GoodData contact person or GoodData support (e-mail support@gooddata.com).

We assume that you have a file with the details handy. The file can look, for example, like this

login,first_name,last_name,password,role
john@example.com,John,Smith,12345678,admin
john.doe@example.com,John,Doe,87616393,admin

The headers we used are the defaults. If you have different ones, you will have to do some mangling (see the recipe Adding user to organization from CSV file above). The minimal information that you have to provide for the user creation to be successful is the login.

Solution

The SDK has a method import_users that takes care of all the details and does exactly what is described above: it adds users to the domain and then to the project.

Notes
  • The process of adding users is not additive but declarative. What you provide on the input is what the SDK will strive to have in the project at the end. This puts some constraints on data preparation, but it is a much more resilient approach.

  • Related to the previous point, sometimes you have users that you do not want to be touched by this process: ETL developers, admins etc. that usually do not come from the data. You can provide a list of whitelist expressions, either strings or regular expressions. Any user that would be added or removed based on the data but matches the whitelist is ignored by the process. In our example we want to omit the user under which we are logged in.

  • Be careful not to lock yourself (the ETL administrator) out of the project. Be sure to add yourself to the whitelist if you are not intrinsically in the data.

  • Note that the data above will not work as-is. Currently there is a constraint that each user has to have a unique login across all users in GoodData. It is likely that there already is a john@example.com somewhere, so change the test data accordingly.

# encoding: utf-8

require 'gooddata'
require 'csv'
require 'active_support/all'

GoodData.with_connection('user', 'password') do |client|
  GoodData.with_project('project_pid') do |project|
    users = []
    CSV.foreach('data.csv', :headers => true, :return_headers => false) do |row|
      users << row.to_hash.symbolize_keys
    end
    domain = client.domain('domain-name')
    domain.create_users(users)
    # whitelist ourselves so the declarative import does not remove us
    project.import_users(users, domain: domain, whitelists: [client.user.login])
  end
end

Updating a User in Organization

by Tomas Svarovsky

Problem

You have users in your organization, but some of them have incorrect or outdated information and need an update.

Prerequisites

You have to have a user who is an admin of an organization. If you don’t have the organization admin account, please contact your primary GoodData contact person or GoodData support (e-mail support@gooddata.com). You have an organization populated with users. If you don’t, take a look at the recipe above (Adding user to organization).

Solution

Here we are updating the SSO provider, but you can update almost any property of a user.

# encoding: utf-8

require 'gooddata'

client = GoodData.connect
domain = client.domain 'organization_name'
u = domain.find_user_by_login('john@gooddata.com')
u.sso_provider = 'a_provider'
domain.update_user(u)
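
Other properties follow the same pattern; a sketch, assuming the profile object exposes a first_name setter like the sso_provider one above:

u.first_name = 'Johnny'
domain.update_user(u)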

Creating a group

by Tomas Svarovsky

Problem

You would like to organize users into groups

Solution

GoodData has a concept of a group. A member of a group can be either another group or directly a user. Here we are going to create a group and assign it some users.

# encoding: utf-8

require 'gooddata'
require 'pp'

GoodData.with_connection do |client|
  client.with_project('project') do |p|
    group_1 = p.add_user_group(name: 'east')
    group_1.add_members(p.member('john.doe@gooddata.com'))
    pp group_1.members.map(&:uri)
    # ['/gdc/account/profile/4e1e8cacc4989228e0ae531b30853248']

    group_1.member?('/gdc/account/profile/4e1e8cacc4989228e0ae531b30853248')
    # => true
  end
end

Deleting users from all projects

by Eliot Towb, Tomas Svarovsky

Problem

You would like to remove a user from all projects.

Solution

This script removes (disables) the user in all projects. There are a couple of things you need to keep in mind though.

  • The user is removed only from projects that you have access to

  • If you are not an admin in a particular project, the user will not be removed from that project

# encoding: utf-8

require 'gooddata'

login = 'john.doe@gooddata.com'

# Connect to platform using credentials
GoodData.with_connection do |client|
  # first select only projects where you are admin
  # which means you can read users and disable them
  projects = client.projects.pselect(&:am_i_admin?)
  projects.peach do |project|
    # member returns nil if the user is not in the project
    member = project.member(login)
    member.disable if member
  end
end

03 Automation Recipes

Performing Operations on Multiple Projects

by Tomas Svarovsky

Problem

You already know how to do many things but now you would like to do the same thing on many projects at once.

Solution

The basis of the solution is pretty simple. We need to iterate over projects and perform an operation on each of them.

Let’s illustrate this with a simple example. We will compile a list of all reports from the user’s projects that have been changed within the last two weeks by somebody from GoodData. If there is more than one revision of a report, we’ll assume that the report is changing frequently and tag it with the 'to_be_checked' tag for QA to validate.

That is a lot to do, so let’s break the task down into smaller pieces.

Work with many projects

projects = [ 'project_pid_a', 'project_pid_b']
GoodData.with_connection('user', 'password') do |c|
  projects.each do |project|
    GoodData.with_project(project) do |project|
      # do your thing
    end
  end
end

Select specific reports

reports_to_tag = project.reports
                   .select { |report| report.updated > 2.weeks.ago }
                   .select { |report| report.definitions.count > 1 }

Remove tag

We’ll first remove the 'to_be_checked' tag from all reports

GoodData::Report.find_by_tag('to_be_checked', :project => project).each do |report|
  report.remove_tag('to_be_checked')
  report.save
end

Add tag

Then we’ll add the 'to_be_checked' tag to all qualifying reports

reports_to_tag.each do |report|
  report.add_tag('to_be_checked')
  report.save
end

Full example - all pieces together

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|

  projects = [
      client.projects('project_pid_a'),
      client.projects('project_pid_b')
  ]

  results = projects.map do |project|
    reports_to_validate = project.reports
                                .select { |report| report.updated > 2.weeks.ago }
                                .select { |report| report.definitions.count > 1 }

    GoodData::Report.find_by_tag('to_be_checked', :project => project).each do |report|
      report.remove_tag('to_be_checked')
      report.save
    end

    reports_to_validate.each do |report|
      report.add_tag('to_be_checked')
      report.save
    end

    {project: project, reports_to_validate: reports_to_validate.count}
  end

  results.each do |result|
    puts "#{result[:project].pid}: there are #{result[:reports_to_validate]} reports to check"
  end

end

Discussion

Please note that the reports in projects are processed sequentially (one after another).

Parallel Processing

by Tomas Svarovsky

Problem

You created the script. It works, but it is slow. Let’s look at one trick that may make it faster.

Solution

Often the structure of the API is orthogonal to your use case. For example, you might need to invite a user to many projects, but the API has only an endpoint to invite a user to a single project. If you do something like

projects.map do |project|
  project.invite_user('john@example.com')
end

it will work, but it will create as many requests as there are projects. The requests will be processed sequentially, and this takes time. You can estimate it with some back-of-the-napkin computation: say we have 1000 projects and an invitation request takes 500 ms (half a second). This means it will take 500 seconds, which is almost 10 minutes.

The nice thing is that the requests for inviting the user to each project are totally independent. You do not need one project invitation to finish before you invite the user to another project, so you can invoke multiple invitations at the same time. The following code invokes multiple invitations in parallel

projects.pmap do |project|
  project.invite_user('john@example.com')
end

We’ve just changed a single letter. We are using pmap, which stands for parallel map. It behaves exactly the same as map with one difference: the block for each item of the array runs in a separate thread. You do not need to think about it; it just happens faster. We currently have 20 threads reserved, so it can run up to 20 times faster, bringing our example under half a minute.
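
Besides pmap, the SDK provides other parallel variants used throughout this book, for example peach when you do not care about the return values (a quick sketch):

projects.peach do |project|
  project.invite_user('john@example.com')
end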

04 Model

Changing Object’s Identifier

by Tomas Svarovsky

Problem

You moved an attribute in your model and you would like to change its identifier to follow the naming convention.

Solution

You can change the identifier of any object (e.g. dashboards, reports, metrics, attributes, facts etc.). Many tools rely on a specific naming convention for LDM object identifiers, so changing the identifier may be very handy for enforcing that convention. For example, when you move an attribute from one dataset to another, you may want to change its identifier.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_pid') do |project|

    # We find the attribute we want to change
    # Let's list all attributes in the project
    # and print their identifiers and titles
    puts project.attributes.map {|a| [a.identifier, a.title]}

    # Let's pick one
    # Here I just pick the first one
    attribute = project.attributes.first

    # but you can also pick one by identifier. For example
    # attribute = project.attributes('attr.salesmem.country')
    # We have a look at the current value.
    # Let's say it is 'attr.users.region'
    puts attribute.identifier

    # We change the value
    attribute.identifier = 'attr.salesmen.region'
    attribute.save

    # If we refresh the value from server
    # we get a new value 'attr.salesmen.region'
    attribute.reload!
    puts attribute.identifier

  end
end

Computing Dataset’s Number of Records

by Tomas Svarovsky

Problem

Very often there is a need to know how many records there are in a dataset.

Solution

This is not so easy to do in the UI. You basically have to find the dataset’s connection point and then create a simple report with a COUNT metric. The SDK makes this a very simple task.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_pid') do |project|
    blueprint = project.blueprint
    blueprint.datasets.each do |dataset|
      count = dataset.count(project)
      puts "Number of record in a dataset #{dataset.title} is #{count}"
    end
  end
end

Enumerating Date Dimensions

by Tomas Svarovsky

Problem

You would like to know how many date dimensions you have in a project.

Solution

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_pid') do |project|
    blueprint = project.blueprint
    dds = blueprint.date_dimensions
    puts "You have #{dds.count} date dimensions in your project"
  end
end

Removing Data

by Tomas Svarovsky

Problem

Sometimes you need to delete all data from a dataset.

Solution

The SDK calls the MAQL SYNCHRONIZE command on the dataset

SYNCHRONIZE {dataset.users};

for each dataset you would like to clear. You can achieve the same thing like this.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_pid') do |project|

    # You can select the dataset in several ways
    dataset = project.datasets.find {|d| d.title == 'Users'}
    dataset = project.datasets('dataset.users')
    dataset = project.datasets(345)
    # dataset.synchronize works as well
    dataset.delete_data

  end
end

Exploring Unknown Project

by Tomas Svarovsky

Problem

You just got invited to a project, you can’t reach the project’s author, and you can’t find any documentation for the project. The following code snippet may be helpful in such a situation.

Solution

You want to start with a quick introspection: how many datasets, how much data, how many processes etc. Doing this manually is fairly time consuming. You would have to find the primary attribute of each dataset, create a count metric, and a lot of other stuff. This can be automated, and it is so useful that we actually created a command that does just this.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_pid') do |project|
    project.info
  end
end

Below is hypothetical output. For each dataset you can see its size, followed by a breakdown of how many objects of each type the dataset contains.

GOODSALES
===============

Datasets - 28

stage_history
=============
Size - 29103.0 rows
1 attributes, 4 facts, 8 references

opp_snapshot
============
Size - 2130074.0 rows
1 attributes, 9 facts, 21 references

.
.
.
.

Opp_changes
===========
Size - 472.0 rows
2 attributes, 6 facts, 10 references

Opportunity Benchmark
=====================
Size - 487117.0 rows
3 attributes, 2 facts, 2 references

Substitute Date Dimension for Another One

by Tomas Korcak

Problem

You want to substitute an existing date dimension in your project for another date dimension. This is particularly handy when you are consolidating date dimensions (perhaps you want to have only one date dimension in your project) or replacing your standard date dimension with a fiscal date dimension.

Solution

The code snippet below substitutes all occurrences of one date dimension’s objects (attributes and labels) with another date dimension’s objects (which must obviously exist in the project). The substitution is performed in the following objects:

  • Metrics

  • Report Definitions

  • Reports

  • Report Specific Metrics

  • Dashboards

  • Mandatory User Filters aka Data Permissions

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_pid') do |project|

    opts = {
      # You can specify name of new and old date dimension...
      # :old => 'Close',
      # :new => 'Completed',

      # Or explicitly specify mapping using identifiers...
      :mapping => {
        'closedate.date' => 'abortdate.date',
        'closedate.day.in.euweek' => 'abortdate.day.in.euweek',
        'closedate.month' => 'abortdate.month',
        'closedate.month.in.year' => 'abortdate.month.in.year',
        'closedate.euweek.in.year' => 'abortdate.week.in.year',
        'closedate.euweek' => 'abortdate.week',
        'closedate.quarter' => 'abortdate.quarter',
        'closedate.day.in.month' => 'abortdate.day.in.month',
        'closedate.week.in.quarter' => 'abortdate.week.in.quarter',
        'closedate.quarter.in.year' => 'abortdate.quarter.in.year',
        'closedate.week' => 'abortdate.week',
        'closedate.day.in.year' => 'abortdate.day.in.year',
        'closedate.day.in.week' => 'abortdate.day.in.week',
        'closedate.week.in.year' => 'abortdate.week.in.year',
        'closedate.euweek.in.quarter' => 'abortdate.week.in.quarter',
        'closedate.day.in.quarter' => 'abortdate.day.in.quarter',
        'closedate.year' => 'abortdate.year',
        'closedate.month.in.quarter' => 'abortdate.month.in.quarter'
      },

      :dry_run => false # Optional. Default 'false'
    }

    project.replace_date_dimension(opts)
  end
end

Discussion

You need to specify a complete mapping between the current and the new date dimension’s attributes. This is straightforward in the case when both date dimensions have the same structure (see the commented-out :old / :new syntax). A full mapping is necessary when the date dimensions have different structures. For example, the abortdate date dimension in the code above doesn’t have any EU week attributes, so the existing closedate EU week attributes are mapped to the standard week attributes of the abortdate dimension.

05 Testing

Testing Report Result

by Tomas Svarovsky

Problem

You want to be sure that reports return expected results. This is the basis for later delving into test-driven BI projects.

Solution

# encoding: utf-8

require 'gooddata'

GoodData.with_connection('user', 'password') do |client|
  GoodData.with_project('project_pid') do |project|
    report = project.reports(32)
    result = report.execute
    fail "Report has unexpected result" unless result == [[1, 2, 3]]
  end
end

06 Working With Projects

Creating Empty Project

by Tomas Svarovsky

Problem

You need to create a project programmatically.

Solution

# encoding: utf-8

require 'gooddata'

client = GoodData.connect
project = client.create_project(title: 'My project title', auth_token: 'PROJECT_CREATION_TOKEN')

# after some time the project is created and you can start working with it
puts project.uri
puts project.title # => 'My project title'

Renaming Project

by Tomas Svarovsky

Problem

You need to rename the project.

Solution

require 'gooddata'

client = GoodData.connect
project = client.projects('project_id')
project.title = "New and much better title"
project.save

Cloning Project

by Tomas Svarovsky

Problem

You would like to create an exact copy of a project

Solution

You can use cloning capabilities.

# encoding: utf-8

require 'gooddata'

client = GoodData.connect
project = client.projects('project_pid')
cloned_project = project.clone(title: 'New title',
                               auth_token: 'token',
                               users: false,
                               data: true)

Discussion

There are two options that you can specify. You can pick whether you want to clone just metadata (e.g. attributes, facts, metrics, reports, dashboards etc.) or also data (data is cloned by default). You can also choose whether you want to transfer all of the old project’s users to the new project. No users are transferred by default. In the example above we explicitly specified the users and data optional parameters so you can see how it works.

Cloning Project across Organizations

by Tomas Svarovsky

Problem

You would like to clone project to a different organization

Solution

Usually you would be happy using project.clone, but sometimes you need more granularity in controlling who is doing what. In this recipe we are going to explain what happens when you clone a project and how you can leverage the lower level API.

When a project clone is performed, the SDK actually orchestrates a couple of calls together

  1. Exports a project, obtaining an export package token. The token is a pointer to a package that is stored at GoodData premises.

  2. Creates an empty project

  3. Imports the package (from step 1) into a freshly created project from step 2

All these 3 calls are asynchronous by default. The SDK makes sure that things happen in the correct order and exposes the whole thing as a synchronous operation.

Sometimes more granular control is needed, for example when you need to clone a project to a different organization. A plain clone would not work in that case, since you cannot have one user in two organizations. We decided to expose the methods for the three steps above so anyone can mix and match.

# encoding: utf-8

require 'gooddata'

user_from_login = 'john@example.com'
user_to_login = 'jane@example.com'

client_from = GoodData.connect(user_from_login, 'password', server: 'https://customer_1_domain.gooddata.com')
client_to = GoodData.connect(user_to_login, 'password', server: 'https://customer_2_domain.gooddata.com')

from_project = client_from.projects('project_pid_1')
to_project = client_to.create_project(:title => "project_title", :auth_token => "TOKEN")

export_token = from_project.export_clone(authorized_users: [user_to_login], data: true, users: false)
to_project.import_clone(export_token)

Discussion

You can also leverage this more granular interface when you need to clone one project to multiple target projects.
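
A sketch of how that might look; the project ids are placeholders, and we assume all projects live under the same account (so no authorized_users are needed):

# encoding: utf-8

require 'gooddata'

client = GoodData.connect('john@example.com', 'password')

from_project = client.projects('project_pid_1')
targets = [client.projects('project_pid_2'), client.projects('project_pid_3')]

# Export the package once and reuse the token for every target
export_token = from_project.export_clone(data: true, users: false)
targets.each { |target| target.import_clone(export_token) }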

Creating Empty Project Using Specific Environment

by Tomas Korcak

Problem

You want to create a project programmatically using a specific environment. For example, TESTING projects are not backed up and may be discarded during platform maintenance.

Environments
  • PRODUCTION (DEFAULT)

  • DEVELOPMENT

  • TESTING

Note that the environment names are case sensitive.

Solution

# encoding: utf-8

require 'gooddata'

client = GoodData.connect
project = client.create_project(title: 'My project title', auth_token: 'PROJECT_CREATION_TOKEN', :environment => 'TESTING')

# after some time the project is created and you can start working with it
puts project.uri
puts project.environment # => 'TESTING'

Creating Project from Template

by Tomas Svarovsky

Problem

You have a template created for you and you would like to spin up a project from this template.

Solution

# encoding: utf-8

require 'gooddata'

client = GoodData.connect
project = client.create_project(title: 'New project',
                                template: '/projectTemplates/SuperSoda/1/',
                                auth_token: 'token')

Discussion

Note that the people behind the SDK do not endorse the usage of templates, so consider this recipe to be here for legacy purposes.

Migrating objects between projects

by Tomas Svarovsky

Problem

You have metadata objects (reports/metrics/dashboards) you would like to transfer between projects

Solution

# encoding: UTF-8

require 'gooddata'

GoodData.with_connection do |c|
  target_project = c.projects('target_project_id')

  # Let's log in to the project you would like to transfer from
  GoodData.with_project('master_project_id') do |master_project|
    # find objects you would like to transfer
    # here we transfer all reports containing the word "sales" in the title
    reports = master_project.reports.select { |r| r.title =~ /sales/ }
    begin
      master_project.transfer_objects(reports, project: target_project)
    rescue ObjectsMigrationError
      puts 'Object transfer failed'
    end
  end
end

Discussion

Occasionally you need to transfer objects to multiple projects. To make it easier, the SDK provides a convenience method for this.

# encoding: UTF-8

require 'gooddata'

GoodData.with_connection do |c|
  target_project_1 = c.projects('target_project_id_1')
  target_project_2 = c.projects('target_project_id_2')

  # Let's log in to the project you would like to transfer from
  GoodData.with_project('master_project_id') do |master_project|
    # find objects you would like to transfer
    # here we transfer all reports containing the word "sales" in the title
    reports = master_project.reports.select { |r| r.title =~ /sales/ }
    result = master_project.transfer_objects(reports, project: [target_project_1, target_project_2])

    # If you provided an array of projects the method will not throw an exception on failed
    # imports. It returns an array of results and you have to inspect it to see what happened.
    # Each result has the following shape
    # {
    #   project: target_project,
    #   result: true
    # }
    puts "#{result.select {|r| r[:result] }.count} projects succeeded"
    puts "#{result.reject {|r| r[:result] }.count} projects failed"
  end
end

Object imports happen in two stages. First you export objects with an API call. This creates a package on our platform and provides you with a token. You can then use the token to initiate the import. In most cases you do not care about these details and the two methods above are all you need. In some cases, though, you want the low level control. Here is a quick example of how to use those lower level methods.

# encoding: UTF-8

require 'gooddata'

GoodData.with_connection do |c|
  target_project = c.projects('target_project_id')

  # Let's log in to the project you would like to transfer from
  GoodData.with_project('master_project_id') do |master_project|
    # find objects you would like to transfer
    # here we transfer all reports containing the word "sales" in the title
    reports = master_project.reports.select { |r| r.title =~ /sales/ }
    begin
      token = master_project.objects_export(reports)
    rescue ObjectsExportError
      puts "Export failed"
    end
    begin
      target_project.objects_import(token)
    rescue ObjectsImportError
      puts "Import failed"
    end
  end
end

Delete data in project

by Tomas Svarovsky

Problem

You need to delete all the data in all datasets in a particular project.

Solution

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_pid') do |project|
    project.delete_all_data(force: true)
  end
end

Delete a project

by Tomas Svarovsky

Problem

You need to delete a project programmatically.

Solution

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_pid') do |project|
    project.delete
  end
end

Listing Project's Roles

by Tomas Svarovsky

Problem

You would like to list the roles you have available in your project.

Solution

# encoding: utf-8

require 'gooddata'
require 'pp'

GoodData.with_connection do |c|
  GoodData.with_project('project_pid') do |project|
    # Usually what is useful is to see the titles
    pp project.roles.map(&:title)
    # But occasionally you need identifiers and uris as well
    pp project.roles.map(&:identifier)
    pp project.roles.map(&:uri)
  end
end

Setting and updating project metadata storage

by Tomas Svarovsky

Problem

You would like to use project metadata storage to store some additional information.

Solution

Each project has a small API that allows you to set some values in a simple key-value manner. This is usually used for storing information that does not fit into the data.

require 'gooddata'

client = GoodData.connect
project = client.projects('project_id')

# You can set some metadata
project.set_metadata('key', 'value')

# You can access the project metadata in two ways
#
# The first is to get a specific key directly
project.metadata('key')
# => 'value'
# In case you try to access a nonexistent key you will get a 404

# The second is to access all metadata. This returns a Hash which you can access as usual
m = project.metadata
# => {"key"=>"value"}
m['key']
# => "value"

07 Deployment Recipes

Deploying Process

by Tomas Svarovsky

Problem

You would like to deploy a CloudConnect or Ruby SDK process to GoodData platform.

Solution

The SDK allows you to deploy a process. Just point it to a directory or a zipped archive that you want to deploy. Below is an example of a CloudConnect process deployment. When deploying CloudConnect processes you typically want to take the whole folder structure of a CloudConnect project and deploy it, so you will want to pass either the path to the root folder of the structure, or you can zip it first and pass just the path to the zip archive. The example below points to the root folder of a CloudConnect project.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  GoodData.with_project('project_id') do |project|
    project.deploy_process('./path/to_cloud_connect_directory',
      name: 'Test ETL Process')
  end
end

Redeploying Existing Process

by Tomas Svarovsky

Problem

You would like to redeploy a CloudConnect or Ruby process to GoodData platform.

Solution

The SDK provides means for redeploying a process (with new, updated content). All you have to do is get a handle on the process. Here we are using a process id to identify the process that we want to redeploy, but you can use any other way to identify it. Take note that the same deployment rules as in project.deploy_process apply.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  GoodData.with_project('project_id') do |project|
    process = project.processes('process_id')
    process.deploy('./path/to_cloud_connect_directory')
  end
end

Scheduling Process

by Tomas Svarovsky

Problem

You have a process deployed and you would like to add a schedule to it so the process is executed regularly

Solution

You can easily add a time-based schedule to any process. Scheduled process execution has a couple of advantages over ad-hoc process executions: scheduled executions are logged and the logs are kept around for some time (~10 days), and the schedule keeps the list of parameters, so you create it once and do not need to care about the parameters anymore.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  GoodData.with_project('project_id') do |project|
    process = project.deploy_process('./path/to_cloud_connect_directory', name: 'Test ETL Process')
    process.create_schedule('0 15 * * *', 'graph/my_graph.grf',
      params: {
        param_1: 'a',
        param_2: 'b'
      },
      hidden_params: {
        hidden_param_1: 'a',
        hidden_param_2: 'b'
      }
    )
  end
end

Working with JSON

In many examples in the appstore the parameters are specified in JSON. JSON is a language-agnostic format that is very similar to Ruby’s way of representing data, but it is not exactly the same. Since here we are working in Ruby, we present a short example of how to convert JSON to Ruby automatically.

Let’s assume we have some JSON parameters that look like this

{
  "param_1" : {
    "deeper_key" : "value_1"
  },
  "param_2" : "value_2"
}

We first need to store this in a string and then use one of the Ruby libraries to convert it to its Ruby equivalent. Typically there are problems with the quotes, since in many languages a string literal is delimited with " and thus the JSON needs to be escaped. Another problem is that the JSON typically spans multiple lines (as in our example). We use one of the lesser known features of Ruby called a HEREDOC, which helps here. It is basically a way to define a string that potentially spans multiple lines without worrying about escaping.

data = <<JSON_DATA
{
  "param_1" : {
    "deeper_key" : "value_1"
  },
  "param_2" : "value_2"
}
JSON_DATA


Note that <<JSON_DATA and JSON_DATA mark the beginning and the end of the string. Once we have the JSON string defined, we can use a JSON library to convert it. Here we are using MultiJson, which is part of the Ruby SDK.

params = MultiJson.load(data)
=> {"param_1"=>{"deeper_key"=>"value_1"}, "param_2"=>"value_2"}

Then we can use it as in the example above.
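
For instance, continuing the scheduling example above, you can pass the parsed Hash straight into create_schedule (a sketch; it assumes a process object obtained as in the previous recipes). If your code expects symbol keys, MultiJson can also symbolize them for you via MultiJson.load(data, symbolize_keys: true).

process.create_schedule('0 15 * * *', 'graph/my_graph.grf', params: params)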

Disabling Schedules in All Projects

by Tomas Svarovsky

Problem

You would like to disable all schedules on all processes in all the projects that you have access to and sufficient privileges to change.

Solution

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  schedules = client.projects.pselect(&:am_i_admin?).pmapcat(&:schedules)
  schedules.pmap do |schedule|
    schedule.disable
    schedule.save
  end
end

Executing Schedule

by Tomas Svarovsky

Problem

You have a process with a schedule. You would like to execute it out of schedule.

Solution

Since the schedule already has the information about the executable and parameters stored, this is very easy. You just need to find the schedule and execute it.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  project = GoodData.use('project_id')
  project.processes.first.schedules.first.execute
end

Executing Process

by Tomas Svarovsky

Problem

You would like to execute a process without a schedule.

Solution

SDK allows you to execute a process. This is not something that you should do regularly, since you have to specify the parameters during execution and logs are not kept for individual executions, but it might occasionally be useful.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  GoodData.with_project('project_id') do |project|
    process = project.deploy_process('./path/to_cloud_connect_directory', name: 'Test ETL Process')
    process.execute('graph/my_graph.grf', params: { param1: 'a', param2: 'b' })
  end
end

Discussion

There are also a couple of other useful tricks. You can execute an arbitrary process that is already deployed. You just need to get the process by its id.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  GoodData.with_project('project_id') do |project|
    process = project.processes('6a75759f-2a76-49c8-af18-ad3bc58fc65e')
    process.execute('graph/my_graph.grf', params: { param1: 'a', param2: 'b' })
  end
end

Or you can get a process by its name too.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  GoodData.with_project('project_id') do |project|
    process = project.processes.find { |p| p.name == 'Test ETL Process' }
    process.execute('graph/my_graph.grf', params: { param1: 'a', param2: 'b' })
  end
end

If you do not know what the executable is, you can look it up with process.executables. We recommend using the same executable name for all processes, such as main.grf (for CloudConnect) or main.rb (for Ruby).
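
For example, here is a quick way to list the executables of an already deployed process (a sketch; 'process_id' is a placeholder).

process = project.processes('process_id')
pp process.executables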

Run-After Scheduling

by Zdenek Svoboda

Problem

You want to schedule a process to run upon successful completion of another process (schedule). This is also known as a run-after triggered schedule.

Solution

If you pass an existing schedule object instead of a cron expression to the create_schedule method, the new schedule will execute upon successful completion of the passed schedule.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  GoodData.with_project('project_id') do |project|
    process = project.deploy_process('./path/to_parent_cloud_connect_directory', name: 'Parent Process')
    parent_schedule = process.create_schedule('0 15 * * *', 'graph/parent_graph.grf', params: { param1: 'a', param2: 'b' })

    # The after process will run after the parent_schedule successfully finishes
    process = project.deploy_process('./path/to_after_cloud_connect_directory', name: 'After Process')
    # Note that we pass the parent_schedule instead of a cron expression
    process.create_schedule(parent_schedule, 'graph/after_graph.grf', params: { param1: 'a', param2: 'b' })
  end
end

Changing an existing schedule

by Tomas Svarovsky

Problem

You have a schedule and would like to change it.

Solution

You can retrieve a schedule in the same way as any other object, use its methods to change it, and save it.

# encoding: utf-8

require 'gooddata'

schedule_id = 'fill_in'

GoodData.with_connection do |client|
  project = client.projects('project_id')
  schedule = project.schedules(schedule_id)
  # you can change pretty much anything

  # executable
  schedule.executable = 'graph/new_graph.grf'

  # params
  schedule.params
  # {
  #   "PROCESS_ID"=>"c42c1b82-7d6f-433a-b008-9cdb1d454e01",
  #   "EXECUTABLE"=>"new_main.rb",
  #   :a=>:b
  # }
  schedule.set_param(:a, :c)
  schedule.params
  # {
  #   "PROCESS_ID"=>"c42c1b82-7d6f-433a-b008-9cdb1d454e01",
  #   "EXECUTABLE"=>"new_main.rb",
  #   :a=>:c
  # }
  schedule.update_params({
    :a => 42,
    :b => [1,2,3]
  })
  schedule.params
  # {
  #   "PROCESS_ID"=>"c42c1b82-7d6f-433a-b008-9cdb1d454e01",
  #   "EXECUTABLE"=>"new_main.rb",
  #   :a => 42,
  #   :b=>[1, 2, 3]
  # }

  # reschedule
  schedule.reschedule # => 0
  schedule.reschedule = 15

  # name
  schedule.name # => "Some Name"
  schedule.name = "Better Name"

  # enable/disable
  schedule.state # => "ENABLED"
  schedule.disable
  schedule.state # => "DISABLED"
  schedule.enable
  schedule.state # => "ENABLED"

  # cron expression
  schedule.cron # => "1 1 1 * *"
  schedule.cron = "1 1 1 1 *"

  # "run after" schedule
  after_schedule = project.schedules('some_id')
  schedule.after = after_schedule

  # Do not forget to save it
  schedule.save
end

Investigating Executions

by Tomas Svarovsky

Problem

You have a large number of projects and would like to investigate their executions. The administration UI is not necessarily helpful since its overview can hide some failures, there is too much to go through by hand, etc.

Solution

Let’s have a look at a couple of scenarios. First let’s investigate executions inside one project, regardless of which schedule triggered them. Let’s assume we are just looking for any executions that failed in that particular project. We will print the dates when they happened, sorted in ascending order.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  GoodData.with_project('project_id') do |project|
    results = project.schedules
                     .pmapcat { |s| s.executions.to_a }  # take all their executions (execute in parallel since this goes to API)
                     .select(&:error?) # select only those that failed
    pp results.map(&:started).sort.uniq
  end
end

Now imagine that you are looking for executions triggered by a particular schedule.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  GoodData.with_project('project_id') do |project|
    results = project.schedules
                     .select { |s| s.name == 'user_filters_schedule' } # filter on those that have particular name
                     .pmapcat { |s| s.executions.to_a } # take all their executions (execute in parallel since this goes to API)
                     .select(&:error?) # select only those that failed
    pp results.map(&:started).sort.uniq
  end
end

Let’s make it even more specific and look for a specific term in the error message. Let’s say that "unsynchronized" is the word we are looking for.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  GoodData.with_project('project_id') do |project|
    results = project.schedules
                     .select { |s| s.name == 'user_filters_schedule' } # filter on those that have particular name
                     .pmapcat { |s| s.executions.to_a } # take all their executions (execute in parallel since this goes to API)
                     .select(&:error?) # select only those that failed
                     .select { |e| e.json['execution']['error']['message'] =~ /unsynchronized/ } # select those that contain particular message in error message
    pp results.map(&:started).sort.uniq
  end
end

Sometimes the error does not manifest itself in the error message directly and you need to look into the logs. Take note that in both of the last cases we get the log and the error message as a string, so you have the full power of Ruby to process it. Here we use regular expressions, which by themselves give you significant power, but you can go even deeper if you need to.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  GoodData.with_project('project_id') do |project|
    results = project.schedules
                     .select { |s| s.name == 'user_filters_schedule' } # filter on those that have particular name
                     .pmapcat { |s| s.executions.to_a } # take all their executions (execute in parallel since this goes to API)
                     .select(&:error?) # select only those that failed
                     .select { |e| e.log =~ /unsynchronized/ } # select those that contain particular message in log
    pp results.map(&:started).sort.uniq
  end
end

The last example we will show is just a small extension. Imagine you would like to perform the same analysis on all the projects in your account. This is usually the case, since this type of execution analysis gets more useful as the number of executions or projects you need to investigate grows.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
    results = client.projects # take all projects
                    .pmapcat(&:schedules) # take all their schedules (execute in parallel since this goes to API)
                    .select { |s| s.name == 'user_filters_schedule' } # filter on those that have particular name
                    .pmapcat { |s| s.executions.to_a } # take all their executions (execute in parallel since this goes to API)
                    .select(&:error?) # select only those that failed
                    .select { |e| e.log =~ /unsynchronized/ } # select those that contain particular message in log
    pp results.map(&:started).sort.uniq
end

Visualizing executions

by Tomas Svarovsky

Problem

You successfully modularized your ETL into several orchestrated modules. The problem is that it is hard to visualize the order of execution from the Data Administration console.

Solution

There is a plethora of useful libraries that you can use in conjunction with GoodData SDK. One of them is Graphviz, a C library with bindings for almost every language, including Ruby. Graphviz is a visualization library and one of its features is visualizing directed acyclic graphs, which is exactly what an execution of several schedules basically is.

As a prerequisite you have to install both Graphviz and the Graphviz Ruby bindings. We leave this as an exercise for the reader because it might be a little bit difficult and unfortunately differs for every platform. If you encounter any errors, try googling them or shoot us a message on GitHub or support.
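
For illustration, on macOS with Homebrew the installation might look like the following (an assumption about your setup; other platforms will differ).

brew install graphviz
gem install ruby-graphviz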

# encoding: utf-8

require 'gooddata'
require 'graphviz'

PROJECT_ID = 'PROJECT_ID' # fill_in

GoodData.with_connection do |client|
  GoodData.with_project(PROJECT_ID) do |project|
    schedules = project.schedules

    nodes = project.processes.pmapcat { |p| p.schedules.map { |s| [s, "#{p.name}-#{s.name}"] } }
    edges = schedules.reject(&:time_based?).pmap {|s| ["#{s.after.process.name}-#{s.after.name}", "#{s.process.name}-#{s.name}"]}


    g = GraphViz.new(:G, :type => :digraph , :rankdir => 'TB')
    nodes.each { |s, n|
      node = g.add_nodes(n)
      node[:shape] = 'box'
      node[:label] = n + "\n#{s.cron}"
    }

    edges.each { |a, b| g.add_edges(a, b) }
    g.output(:png => "run_dag.png")

    # Now you can open it for example on mac by running this on terminal
    # open -a Preview run_dag.png
  end
end

08 Working With Reports

Computing Report

by Tomas Svarovsky

Problem

You have an existing report and you would like to execute it programmatically.

Solution

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    puts project.reports(1234).execute
  end
end

Comparing Reports across Projects

by Tomas Svarovsky

Problem

You created a new version of a project. You made some changes to the reports and you would like to verify that the report is still computing the same numbers.

Solution

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  original_project = c.projects('project_id_1')
  new_project = c.projects('project_id_2')

  orig_report = GoodData::Report.find_first_by_title('Sum of Revenue', client: c, project: original_project)
  new_report = GoodData::Report.find_first_by_title('Sum of Revenue', client: c, project: new_project)

  orig_result = orig_report.execute
  new_result = new_report.execute

  puts orig_result == new_result
end

Discussion

If there are more reports, this can obviously take a lot of time, so it would be nice to compute the reports in parallel rather than sequentially. Imagine we have a list of reports that should be checked, tagged with the tag to_check. Let’s rewrite the previous code snippet to be parallel friendly.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  original_project = c.projects('project_id_1')
  new_project = c.projects('project_id_2')

  # We assume that reports have unique names inside a project
  orig_reports = GoodData::Report.find_by_tag('to_check', client: c, project: original_project).sort_by(&:title)
  new_reports = GoodData::Report.find_by_tag('to_check', client: c, project: new_project).sort_by(&:title)

  results = orig_reports.zip(new_reports).pmap do |reports|
    # compute both reports and append the report object at the end so we can print its title later
    reports.map(&:execute) + [reports.last]
  end

  results.map do |res|
    orig_result, new_result, new_report = res
    puts "#{new_report.title}, #{orig_result == new_result}"
  end
end

Updating report definition

by Tomas Korcak

Problem

You have a report and you want to update its definition in an easy way. Perhaps you need to modify multiple reports in one or more projects.

Solution

Use Report#update_definition with a block argument in the following way.

require 'gooddata'
require 'pp'

PID = 'rq3enqarynvkt7q11u0stev65qdwpow8'
REPORT = '/gdc/md/rq3enqarynvkt7q11u0stev65qdwpow8/obj/1323'

GoodData.with_connection do |c|
  GoodData.with_project(PID) do |project|
    report = project.reports(REPORT)

    new_def = report.update_definition do |rdef|
      rdef.title = "Test TITLE: #{DateTime.now.strftime}"
    end

    pp new_def
  end
end

Discussion

Specify :new_definition ⇒ false if you want to update the latest report definition in place instead of creating a new one. The new definition flag is true by default.

require 'gooddata'
require 'pp'

PID = 'rq3enqarynvkt7q11u0stev65qdwpow8'
REPORT = '/gdc/md/rq3enqarynvkt7q11u0stev65qdwpow8/obj/1323'

GoodData.with_connection do |c|
  GoodData.with_project(PID) do |project|
    report = project.reports(REPORT)

    new_def = report.update_definition(:new_definition => false) do |rdef|
      rdef.title = "Test TITLE: #{DateTime.now.strftime}"
    end

    pp new_def
  end
end

Creating Report that Counts Records in All Datasets

by Tomas Svarovsky

Problem

Occasionally you need to know how many rows there are in each dataset.

Solution

This is surprisingly difficult to do in GoodData UI but it is simple with SDK. Here we are going to create the necessary metrics on the fly through inspection of a blueprint. Then we will create a report that will contain those metrics and compute it.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    blueprint = project.blueprint

    # let's grab the anchor of each dataset. The anchor is a special attribute on each dataset;
    # it defines the grain of the dataset, so if we "count" it we will get the number of lines
    # in the dataset
    anchors = blueprint.datasets.map(&:anchor)

    # As is explained in the Blueprint section, objects in a blueprint are project agnostic.
    # Let's find the corresponding attribute object in the specific project
    attributes = anchors.pmap { |anchor| anchor.in_project(project) }

    # Let's create a report on the fly that is going to have the metrics in the rows
    # Take note that this is a real report on the platform that could be saved and later reused
    puts project.compute_report(left: attributes.map(&:create_metric))

    # This might result into something like this
    #
    # [                                                            |   Values   ]
    # [count of Records of timeline                                | 0.7306E4   ]
    # [count of Activity                                           | 0.61496E6  ]
    # [count of Opportunity                                        | 0.85171E6  ]
    # [count of Product                                            | 0.5E1      ]
  end
end

Resetting Report Color Map

by Tomas Svarovsky

Problem

You have a report on which somebody set up a color mapping. You would like to remove it.

Solution

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  GoodData.with_project('project_id') do |project|
    ids = ['report_id_1', 'report_id_2', 'report_id_3']
    ids.each do |id|
      r = project.reports(id)
      d = r.latest_report_definition
      d.reset_color_mapping!
      d.save
    end
  end
end

Computing Ad Hoc Report

by Tomas Svarovsky

Problem

You would like to make some ad hoc computation.

Solution

We are using the recipe Working with HR Demo project here so spin it up if you want to follow the code.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|

    # first let's create a metric and give it a reasonable identifier so we can read the examples
    m = project.facts('fact.salary.amount').create_metric
    m.identifier = "metrics.my_metric"
    m.save

    # metric can be referenced directly
    project.compute_report(left: ['metrics.my_metric'],
                           top: ['label.department.region'])

    # or you can pass by reference if you already hold it
    m1 = project.metrics('metrics.my_metric')
    project.compute_report(left: [m1],
                           top: ['label.department.region'])

    # report can take attributes and in that case it will use the default label
    project.compute_report(left: [m1],
                           top: ['attr.department.region', 'dataset.payment.quarter.in.year'])

    # for readability you might shuffle those labels to different section of report
    project.compute_report(left: [m1, 'dataset.payment.quarter.in.year'],
                            top: ['attr.department.region'])


    # there can be more than 1 metric in the group and the metric does not even have to be saved (if it is not, it will be saved for you and removed after the computation)
    m2 = project.attributes('attr.salary.id').create_metric
    result = project.compute_report(left: [m1, m2],
                           top: ['attr.department.region'])
    # you can print out the results into console
    puts result
  end
end

Working with report results

by Tomas Svarovsky

Problem

You have computed a report and now you would like to get dirty with the data.

Solution

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|

    # first let's create a metric and give it a reasonable identifier so we can read the examples
    m1 = project.facts('fact.salary.amount').create_metric
    result = project.compute_report(left: [m1, 'dataset.payment.quarter.in.year'],
                            top: ['attr.department.region'])

    # You can print the result
    puts result
    # [   |               | Europe | North America]
    # [Q1 | sum of Amount | 29490  | 62010        ]
    # [Q2 | sum of Amount | 29490  | 62010        ]
    # [Q3 | sum of Amount | 29490  | 62010        ]
    # [Q4 | sum of Amount | 29490  | 62010        ]

    # You can get size of report
    result.size # => [5, 4]

    # this gives you the overall size but you probably want to also know the
    # size of the data portion
    result.data_size # => [4, 2]

    # you can learn if it is empty, which comes in handy for reports without data
    result.empty? # => false

    # You can access data as you would with matrix
    result[0][3] # => "North America"
    result[2] # ["Q2", "sum of Amount", "29490", "62010"]

    # You can ask questions about contents
    result.include_row? ["Q4", "sum of Amount", "29490", "62010"] # => true
    result.include_column? ["Europe", "29490", "29490", "29490", "29490"] # => false

    # this is fine but there is a lot of clutter caused by the headers. The library provides you with methods to get only a slice of the result and create a new result
    # Let's say I would like to get just data
    puts result.slice(1, 2)
    # [29490 | 62010]
    # [29490 | 62010]
    # [29490 | 62010]
    # [29490 | 62010]

    # This is a worker method that is used to implement several helpers
    # The previous example is equivalent to this
    puts result.data

    puts result.without_top_headers
    # [Q1 | sum of Amount | 29490 | 62010]
    # [Q2 | sum of Amount | 29490 | 62010]
    # [Q3 | sum of Amount | 29490 | 62010]
    # [Q4 | sum of Amount | 29490 | 62010]

    puts result.without_left_headers
    # [Europe | North America]
    # [29490  | 62010        ]
    # [29490  | 62010        ]
    # [29490  | 62010        ]
    # [29490  | 62010        ]

    # All of those are again results so everything above works as expected
    result.data.include_row? ["29490", "62010"] # => true

    # There are several other methods that might make your life easier. Consider the following
    result.diff result.without_top_headers
    # {
    #   :added => [],
    #    :removed => [[nil, nil, "Europe", "North America"]],
    #    :same => [["Q1", "sum of Amount", "29490", "62010"],
    #              ["Q2", "sum of Amount", "29490", "62010"],
    #              ["Q3", "sum of Amount", "29490", "62010"],
    #              ["Q4", "sum of Amount", "29490", "62010"]]
    # }

  end
end

Exporting report results

by Tomas Svarovsky

Problem

You would like to export a report from GoodData into one of the typical formats for further processing or presentation.

Solution

There is a simple way to export a report from GoodData.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    result = project.reports('report_identifier').export(:csv)
    puts result
  end
end

Discussion

More formats are supported: any of :csv, :xls, :xlsx or :pdf should work.

The export returns the actual data, so if you would like to have it stored in a file, the responsibility is on you.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    report = project.reports('report_identifier')
    File.open('export.csv', 'wb') { |f| f.write(report.export(:csv)) }
  end
end

Removing old versions from report

by Tomas Svarovsky

Problem

Occasionally you would like to clean up your project. The simplest way to do it is to get rid of the old versions of reports.

Solution

There is a simple way to purge old, unused report definitions.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    project.reports.peach do |report|
      report.purge_report_of_unused_definitions!
    end
  end
end

Computing/creating reports with filters

by Tomas Svarovsky

Problem

Computing a report is great but without filters it is not that great.

Solution

While SDK does not provide full support for all types of filters, there are a couple of useful wrappers that make the common cases easier. If you need something special you can always go to the raw APIs. Currently two types of filters are supported directly.

The general shape of the solution looks like this

project.compute_report(:left => project.metrics.first, :filters => [])

for on-the-fly computation (no report is saved), or like this

project.create_report(title: 'best report ever with filter',
                      left: project.metrics.first,
                      filters: [])

For creating and persisting a report.

Variable filter

The variable filter is very simple. You just pass the variable into the filter.

var = project.variables('my_variable_identifier')
puts project.compute_report(left: project.metrics.first,
                            filters: [var])

Attribute value filter

This is probably the most commonly used filter type. It filters a certain attribute on certain values. Imagine "WHERE City IN [San Francisco, Prague]". You can set it up easily like this.

label = project.labels('label.regions.city.name')
puts project.compute_report(left: project.metrics.first,
                            filters: [[label, 'San Francisco', 'Prague']])

# You can also use a variation of NOT equal

label = project.labels('label.regions.city.name')
puts project.compute_report(left: project.metrics.first,
                            filters: [[label, :not, 'San Francisco', 'Prague']])

09 Working With Facts Metrics And Attributes

NOTE: All 'metric' related methods have a 'measure' counterpart alias. Here are a few examples.

  • Project#add_metric ⇐⇒ Project#add_measure

  • Project#create_metric ⇐⇒ Project#create_measure

  • Project#compute_metric ⇐⇒ Project#compute_measure

  • Project#metrics ⇐⇒ Project#measures

  • Project#metric_by_title ⇐⇒ Project#measure_by_title

  • Project#metrics_by_title ⇐⇒ Project#measures_by_title

See the gooddata gem’s Ruby reference documentation for the complete list of aliases.
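
For example, the following two calls are interchangeable thanks to the aliases (a sketch; it assumes a project with a fact titled Amount, as in the Creating Metrics recipe below).

m1 = project.create_metric('SELECT SUM(#"Amount")', title: 'Total Amount')
m2 = project.create_measure('SELECT SUM(#"Amount")', title: 'Total Amount')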

Creating Metrics from Attributes

by Patrick McConlogue, Tomas Svarovsky

Problem

You have several attributes in a project. You would like to create some basic metric out of them.

Prerequisites

You have to have an existing project with a model and data loaded.

Solution

# encoding: UTF-8

require 'gooddata'

# Connect to GoodData platform
GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    attribute = project.attributes('attr.devs.dev_id')
    metric = attribute.create_metric(:title => "Count of [#{attribute.identifier}]")
    metric.save
    metric.execute
  end
end

Creating Metrics from Facts

by Patrick McConlogue, Tomas Svarovsky

Problem

You have several facts in a project. You would like to create some basic metric out of them.

Prerequisites

You have to have an existing project with a model and data loaded.

Solution

# encoding: UTF-8

require 'gooddata'

# Connect to GoodData platform
GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    fact = project.facts('fact.commits.lines_changed')
    metric = fact.create_metric(:title => "Sum of [#{fact.identifier}]")
    res = metric.execute
    puts res

    # if you like the metric you can save it of course for later usage
    metric.save

    # Default aggregation is SUM but you can also specify a different one
    metric = fact.create_metric(:title => "Min of [#{fact.identifier}]", type: :min)
  end
end

Writing Information about a Project’s Metrics and Attributes to CSV

by Tomas Korcak, Tomas Svarovsky

Problem

You would like to store information about all project metrics and attributes in CSV.

Prerequisites

You have to have an existing project with metric(s) and attribute(s).

Solution

Metrics

# encoding: UTF-8

require 'csv'
require 'gooddata'

# Connect to GoodData platform
GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    CSV.open(project.pid + "_metrics.csv", 'wb') do |csv|
      data = project.metrics.pmap do |metric|
        [metric.title, metric.pretty_expression]
      end
      data.each do |m|
        csv << m
      end
      puts 'The CSV is ready!'
    end
  end
end

It is a simple script that iterates over the metrics (remember that report-specific metrics are not included in the list) and collects some fields: in our case the title and the pretty-printed MAQL expression of each metric. If you would like to get more information, just add the fields to the list. In the end the list is written out as a valid CSV, so any reasonable CSV parser should be able to load it.
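
For instance, if you also wanted each metric’s identifier in the export, a small variation of the loop above would do (a sketch).

data = project.metrics.pmap do |metric|
  [metric.title, metric.identifier, metric.pretty_expression]
end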

Attributes

You also might like to export attributes. The script itself is very similar.

# encoding: UTF-8

require 'csv'
require 'gooddata'

# Connect to GoodData platform
GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    CSV.open(project.pid + "_attributes.csv", 'wb') do |csv|
      data = project.attributes.pmap do |attribute|
        [attribute.title, attribute.identifier]
      end
      data.each do |m|
        csv << m
      end
      puts 'The CSV is ready!'
    end
  end
end

Discussion

Folders

Often people also want to include information about the folder a metric/attribute is in. While SDK does not provide direct support for this, here is a little workaround to make it possible.

Attributes

# encoding: UTF-8

require 'csv'
require 'gooddata'

# Connect to GoodData platform
GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|

    folder_cache = c.get(project.md['query'] + '/dimensions')['query']['entries'].reduce({}) do |a, e|
      a[e['link']] = project.objects(e['link'])
      a
    end

    CSV.open(project.pid + "_attributes.csv", 'wb') do |csv|
      data = project.attributes.pmap do |attribute|
        [attribute.title, attribute.identifier, attribute.content['dimension'] && folder_cache[attribute.content['dimension']].title]
      end
      data.each do |m|
        csv << m
      end
      puts 'The CSV is ready!'
    end
  end
end

Metrics

# encoding: UTF-8

require 'csv'
require 'gooddata'

# Connect to GoodData platform
GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|

    folder_cache = c.get(project.md['query'] + '/folders?type=metric')['query']['entries'].reduce({}) do |a, e|
      a[e['link']] = project.objects(e['link'])
      a
    end

    CSV.open(project.pid + "_metrics.csv", 'wb') do |csv|
      data = project.metrics.map do |metric|
        folder = metric.content.key?('folders') && metric.content['folders'].is_a?(Enumerable) && metric.content['folders'].first
        [metric.title, metric.identifier, folder_cache[folder] && folder_cache[folder].title]
      end
      data.each do |m|
        csv << m
      end
      puts 'The CSV is ready!'
    end
  end
end

Changing Metric’s Number Formatting

by Tomas Svarovsky

Problem

You have a project and you would like to update the formatting of all metrics programmatically. They are currently formatted as dollar values but you would like to change all the formats to euros.

Solution

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    project.metrics.pmap do |metric|
      metric.content['format'] = metric.content['format'].gsub('$', '€')
      metric.save
    end
  end
end

Creating Metrics

by Zdenek Svoboda

Problem

You want to create an advanced MAQL metric.

Prerequisites

You have to have an existing project with a model and data loaded.

Solution

# encoding: UTF-8

require 'gooddata'

# Connect to GoodData platform
GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    metric = project.add_measure 'SELECT PERCENTILE(#"Amount",0.9)',
     title: 'Salary Amount [90th Pct]'
    metric.save
    metric.execute

    metric = project.add_measure 'SELECT PERCENTILE(![fact.salary.amount],0.9)',
     title: 'Salary Amount [90th Pct] V2'
    metric.save
    metric.execute

    metric = project.add_measure 'SELECT PERCENTILE([/gdc/md/ptbedvc1841r4obgptywd2mzhbwjsfyr/obj/223],0.9)',
     title: 'Salary Amount [90th Pct] V3'
    metric.save
    metric.execute

  end
end

Discussion

Please note that the MAQL statements use three ways to reference the underlying objects (e.g. the facts or metrics that are part of the MAQL statement):

  • #"Amount" for referencing the fact (or metric) via its name (title)

  • ![fact.salary.amount] for referencing the fact (or metric) via its identifier

  • [/gdc/md/ptbedvc1841r4obgptywd2mzhbwjsfyr/obj/223] for referencing the fact (or metric) via its uri

Creating Metric with filter

by Tomas Svarovsky

Problem

You want to create a more complicated metric that has a filter on values included.

Solution

In this case we will actually create the raw MAQL metric along with the filter values. The main problem is that you have to find out the URIs of all the objects and values. This is generally tricky but SDK can simplify it a bit.

We will try to create a metric that looks like this in "human readable" MAQL.

SELECT COUNT(City) WHERE Continent IN ('Europe', 'Africa')

This is actually not MAQL that would be possible to post to the API. You have to translate all the objects into their valid URIs. The MAQL then might look like this (obviously the URIs will look different in your particular case).

SELECT COUNT([/gdc/md/e8pid3efwftbc3pc13nnnau4xymb0198/obj/23]) WHERE [/gdc/md/e8pid3efwftbc3pc13nnnau4xymb0198/obj/72] IN ([/gdc/md/e8pid3efwftbc3pc13nnnau4xymb0198/obj/72/elements?id=0], [/gdc/md/e8pid3efwftbc3pc13nnnau4xymb0198/obj/72/elements?id=1])

Let’s have a look at how we might write code that does this.

# encoding: UTF-8

require 'gooddata'

# Connect to GoodData platform
GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|

    # Let's find the city attribute - here we assume the identifier attribute.cities.city
    city_attribute = project.attributes('attribute.cities.city')

    # Let's find the continent label - here we assume the identifier label.cities.continent.name
    continent_label = project.labels('label.cities.continent.name')
    filter_values = ['Europe', 'Africa'].map { |val| "[#{continent_label.find_value_uri(val)}]" }.join(', ')

    m = project.create_metric("SELECT COUNT([#{city_attribute.uri}]) WHERE [#{continent_label.uri}] IN (#{filter_values})", extended_notation: false)
    puts m.execute
  end
end

10 Working With Dashboards

Listing Dashboards

by Tomas Korcak

Problem

You would like to list dashboards programmatically.

Prerequisites

You have to have existing project with dashboard(s).

Solution

# encoding: UTF-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    # List all dashboards and their names
    pp project.dashboards.map(&:title)
  end
end

Listing Dashboard Tabs

by Tomas Svarovsky

Problem

You would like to list dashboards and their tabs programmatically.

Prerequisites

You have to have existing project with dashboard(s).

Solution

# encoding: UTF-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    # You can list the tabs of a specific dashboard and print their titles
    pp project.dashboards(123).tabs.map(&:title)

    # Sometimes it is very useful to get a sense on what tabs are where
    # We will print dashboard title, tab title tuples
    pp project.dashboards.flat_map { |d| d.tabs.map { |t| [d.title, t.title] } }
    # ....
    #  ["Sales Reports", "Damage"],
    #  ["Sales Reports", "Storage"],
    #  ["Sales Reports", "Assignment"]
    # ....

    # Another thing that might be useful is to compute how many tabs
    # each of the dashboard has
    pp project.dashboards.map { |d| [d.title, d.tabs.count] }

    # [["Support Reports", 4],
    #  ["Sales Reports", 10],
    #  ["Insurance Dashboard", 1],
    #  ["Inventory", 10],
    #  ["Email Scheduling ", 1]]
  end
end

Listing Dashboard Tabs

by Tomas Svarovsky

Problem

You would like to work with a dashboard tab a little: list how many reports are on the tab, whether they are filtered, etc.

Prerequisites

You have to have existing project with dashboard(s).

Solution

# encoding: UTF-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|

    tab = project.dashboards('dashboard_id').tabs.find { |t| t.identifier == 'tab_identifier' }

    # How many items are there on the tab?
    tab.items.count

    # The item count also includes several utility item types. Usually what you are interested in are
    # reports and filters.
    # How many reports are there on the tab?
    tab.items.select { |i| i.is_a? GoodData::ReportItem }.count
    # => 6

    # Are there any filters on this tab?
    tab.items.any? { |i| i.is_a? GoodData::FilterItem }
    # => false

    # It might be useful to see how many reports are on each tab of each dashboard
    project.dashboards.pmapcat { |d| d.tabs.map { |t| [d.title, t.title, t.items.select { |i| i.is_a? GoodData::ReportItem }.count] }}

    # In a similar vein: which tabs have any filters on them?
    project.dashboards
      .pmapcat { |d| d.tabs.map { |t| [d.title, t.title, t.items.select { |i| i.is_a? GoodData::FilterItem }.count] }}
      .select { |_, _, i| i > 0 }

    # On each item there are properties that you can access.
    # On each type you can access the position and size
    item = tab.items.find { |i| i.is_a? GoodData::ReportItem }
    item.position_y
    # => 130
    item.size_y
    # => 50

    # With this you can for example find the bottom-most element on each tab. From this you can
    # find out whether there are any tabs that are too "long". This depends on the usage of the dashboard,
    # but if it is an operational dashboard and users need to scroll down, it might decrease the
    # usefulness of that particular dashboard.
    #
    # Let's say we would like to find tabs that are longer than 800 pixels
    tuples_with_lowest_item = project.dashboards.pmapcat do |d|
      d.tabs.map do |t|
        # pick the item whose y position + vertical size is the largest (ie it is lowest on the page)
        [d.title, t.title, t.items.max_by { |i| i.position_y + i.size_y }]
      end
    end
    tuples_with_lowest_item
      .map { |d, t, m| [d, t, m.position_y + m.size_y] } # convert it to the actual length
      .select { |_, _, length| length > 800 } # filter those that are longer than the threshold

    # With GoodData::ReportItem you can access the underlying report and do whatever is doable with a report
    # For example executing it. Remember though that the report on the dashboard is executed with additional
    # context like filters etc so the results are not going to be the same.
    puts tab.items.find { |i| i.is_a? GoodData::ReportItem }.execute
  end
end

11 Working With Data Permissions

Setting Up Simple Data Permission

by Tomas Svarovsky

Problem

You would like to set a data permission (user filter or MUF filter) for one or two users.

Solution

SDK offers a couple of convenience features for doing this. Let’s first recap what we need to set up a data permission filter. We’ll be setting a simple filter along these lines.

WHERE city IN (San Francisco, Prague, Amsterdam)

  • We need to know the label to filter. In our case this is the city label

  • We need to know the label’s values. In our case these are (San Francisco, Prague, Amsterdam)

  • We also need to know the user to assign the filter to. We’ll use your account in the example (you may use any valid user).

Although we present this as an executable script, this is something that is often done interactively. So do not be afraid to jack in to a project and follow along.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    # First let's grab an attribute
    attribute = project.attributes('attr.region.city')
    # to set up a value we need a specific label
    # if the attribute has only one label, you can easily grab it by calling #primary_label
    label = attribute.primary_label
    # if an attribute has multiple labels, you can select the correct one like this
    # label = attribute.label_by_name('City Name')
    # Let's construct filters we are going to set up
    # We will do it for two hypothetical users
    filters = [
      ['john.doe@example.com', label.uri, 'San Francisco', 'Amsterdam'],
      ['jane.doe@example.com', label.uri, 'San Francisco', 'Prague']
    ]
    # An obvious question might be how you know that the values are correct.
    # You can find out like this: label.value?('Amsterdam')
    # Let's now update the project
    project.add_data_permissions(filters)
  end
end

List Data Permissions

by Tomas Svarovsky

Problem

You have a project that has the data permissions set up. You would like to review them.

Solution

There is no UI that would provide a good overview and the API is a little crude, but with the help of SDK you can get around that.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    pp project.data_permissions.pmap { |f| [f.related.login, f.pretty_expression] }
  end
end

Setting Data Permissions from CSV (column format)

by Tomas Svarovsky

Problem

You would like to set data permissions for multiple users. You have a CSV with two columns that associates a user with a single data permission value. We’ll use the same city example from the examples above.

Solution

SDK offers a couple of convenience features for doing this. Let’s first recap what we need to set up a filter.

In this case we will be setting a simple data permission for the city label in the form of the following filter.

WHERE city IN ('San Francisco', 'Prague', 'Amsterdam')

  • We need to know the label to filter. In our case this is the city label

  • We need to know the label’s values. In our case these are (San Francisco, Prague, Amsterdam)

  • We also need to know the user to assign the filter to. We’ll use your account in the example (you may use any valid user).

Let’s say we want to set up these specific values

['john.doe@example.com', 'San Francisco', 'Amsterdam']
['jane.doe@example.com', 'San Francisco', 'Prague']

We’ll capture these data permissions in the following CSV (data.csv)

login,city
john.doe@example.com,San Francisco
john.doe@example.com,Amsterdam
jane.doe@example.com,San Francisco
jane.doe@example.com,Prague

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    # First let's grab an attribute
    attribute = project.attributes('attr.region.city')
    # to set up a value we need a specific label
    # if the attribute has only one label, you can easily grab it by calling #primary_label
    label = attribute.primary_label
    # if an attribute has multiple labels, you can select the correct one like this
    # label = attribute.label_by_name('City name')
    filters = GoodData::UserFilterBuilder::get_filters('data.csv', {
      :type => :filter,
      :labels => [{:label => label, :column => 'city'}]
      })
    project.add_data_permissions(filters)
  end
end

Preconditions

Several things have to be true for this code to work correctly.

  • All the users are members of the target project

  • All the label’s (city) values are present in the data loaded in the project
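
You can verify both preconditions interactively before loading the filters. Below is a sketch assuming the project and label objects from the example above; it also assumes project.member? accepts a login string.

project.member?('john.doe@example.com') # => true if the user is a member
label.value?('Amsterdam') # => true if the value is present in the loaded data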

Setting Data Permissions from CSV (row format)

by Tomas Svarovsky

Problem

You would like to set data permissions for multiple users. You have a CSV where each row associates a user with one or more data permission values. We’ll use the same city example from the examples above.

Solution

SDK offers a couple of convenience features for doing this. Let’s first recap what we need to set up a filter.

In this case we will be setting a simple data permission for the city label in the form of the following filter.

WHERE city IN ('San Francisco', 'Prague', 'Amsterdam')

  • We need to know the label to filter. In our case this is the city label

  • We need to know the label’s values. In our case these are (San Francisco, Prague, Amsterdam)

  • We also need to know the user to assign the filter to. We’ll use your account in the example (you may use any valid user).

Let’s say we want to set up these specific values

['john.doe@example.com', 'San Francisco', 'Amsterdam']
['jane.doe@example.com', 'San Francisco', 'Prague']

We’ll capture these data permissions in the following CSV (data.csv)

john.doe@example.com,San Francisco,Amsterdam
jane.doe@example.com,San Francisco,Prague,Berlin

Please note that the CSV format is different from the previous example. There are no headers because the file can have a different number of columns on each line.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    # First let's grab an attribute
    attribute = project.attributes('attr.region.city')
    # to set up a value we need a specific label
    # if the attribute has only one label, you can easily grab it by calling #primary_label
    label = attribute.primary_label
    # if an attribute has multiple labels, you can select the correct one like this
    # label = attribute.label_by_name('City name')
    filters = GoodData::UserFilterBuilder::get_filters('data.csv', {
      :type => :filter,
      :labels => [{:label => label}]
      })
    project.add_data_permissions(filters)
  end
end

Preconditions

Several things have to be true for this code to work correctly.

  • All the users are members of the target project

  • All the label’s (city) values are present in the data loaded in the project

List Variable Values

by Tomas Svarovsky

Problem

You have variable in your project and you would like to see the values in a readable fashion.

Solution

SDK tries to hide the differences between Variables and Data Permissions, so this recipe is analogous to the List Data Permissions recipe.

# encoding: utf-8

require 'gooddata'

# fill this in
VARIABLE_IDENTIFIER = 'variable_identifier'
PROJECT_ID = 'project_id'

GoodData.with_connection do |c|
  GoodData.with_project(PROJECT_ID) do |project|
    var = project.variables(VARIABLE_IDENTIFIER)
    results = var.values.pmap do |value|
      owner = value.related.class == GoodData::Project ? "[PROJECT DEFAULT]" : value.related.login rescue nil
      [owner, value.pretty_expression]
    end
    pp results
  end
end

Unlike Data Permissions, Variables have a concept of a default value for the project. If a specific value is not provided for a user, the project default is used. You will notice these users have "TRUE" as a value. The project value is marked as [PROJECT DEFAULT] in the output.

12 Working With Blueprints

What is a blueprint

Before we delve into examples involving blueprints, let’s talk for a moment about what a blueprint is and what the difference is between a blueprint and a model when we talk about them at GoodData.

They mean the same thing, but they represent it in different ways and live in different places. Both of them define the model of an analytic application.

The main difference is that when we talk about a model we mean a specific instance in a specific project, with all its attributes and facts, that lives on the GoodData platform. Each field has a URI and an object_id and you can query it using the platform API. You can also ask questions like "What object in the model has object_id 1532?" or "Can you give me a link to the dataset 'users' in the model?".

When we talk about a blueprint, on the other hand, we mean the abstract prescription that tells us how to create the model. A blueprint does not contain many of the details, most prominently object_ids and URIs, and strictly speaking each time you use a blueprint to create a model those things might be different. When a model is created from a blueprint, all the URIs and ids are created as a side effect.

So what are the benefits of the blueprint, you might ask? The blueprint was specifically designed so you can have a description of the model that can live outside of the GoodData platform. It has a textual representation, which you can commit into git and work with in a programmatic way. Second, it is intended that many projects can be created from one blueprint.

Creating Project from Blueprint

by Tomas Svarovsky

Problem

You would like to create a new project with a data model programmatically.

Solution

# encoding: utf-8

require 'gooddata'

client = GoodData.connect

blueprint = GoodData::Model::ProjectBlueprint.build("My project from blueprint") do |p|
  p.add_date_dimension('created_on')

  p.add_dataset('dataset.users') do |d|
    d.add_anchor('attr.users.id')
    d.add_date('created_on')
    d.add_fact('fact.users.some_number')
  end
end

blueprint.valid? # => true

project = client.create_project_from_blueprint(blueprint, auth_token: 'token')

# After the project is created (might take a while) you can start using it
project.title # => "My project from blueprint"
project.datasets.count # => 2
project.facts('fact.users.some_number').identifier # => 'fact.users.some_number'

This created a project with a very simple model with just 2 datasets. One is a date dimension. The other is a typical fact table. For another, more complex example check out the recipe "Working with HR Demo project", which uses many of the features we will explain here.

Discussion

Let’s have a look at couple of other variations and more complex examples.

Defining identifiers

The majority of the objects defined in a blueprint will end up as objects on the metadata server in the project. Each of these objects has its URI, an object id (a number that is part of the URI), and an identifier, which is a textual id. The URI and object id are created automatically during the creation of a model and you cannot influence them in any way, but you have to define the identifiers. The identifier is also the first parameter in the majority of the add_…​ commands. Namely

add_anchor
add_label
add_dataset
add_fact
add_attribute

When you see this in the blueprint

p.add_dataset('dataset.users')

It means that later you would be able to do

project.datasets('dataset.users') # this will search all the datasets and return the one with identifier 'dataset.users'.

Similarly

d.add_fact('fact.users.some_number')

will result in you being able to do

project.facts('fact.users.some_number') # this will search all the facts and return the one with identifier 'fact.users.some_number'.

The identifier can be anything. The only condition is that it has to be unique in the context of a project. No two objects may have the same identifier. That being said, it is useful to have some kind of convention for how you assign the identifiers.

The exceptions to this rule are references and date_references, which we will discuss separately.

Defining attributes

When you define attributes through add_attribute you have to remember to add at least one label to that particular attribute

blueprint = GoodData::Model::ProjectBlueprint.build("My project from blueprint") do |p|
  p.add_dataset('dataset.users') do |d|
    d.add_anchor('attr.users.id')
    d.add_attribute('attr.users.name')
  end
end

blueprint.valid? # => false
blueprint.validate # => [{:type=>:attribute_without_label, :attribute=>"attr.users.name"}]

You can do it like this

blueprint = GoodData::Model::ProjectBlueprint.build("My project from blueprint") do |p|
  p.add_dataset('dataset.users') do |d|
    d.add_anchor('attr.users.id')
    d.add_attribute('attr.users.name')
    d.add_label('label.users.name.full_name', reference: 'attr.users.name')
    d.add_label('label.users.name.abbreviated_name', reference: 'attr.users.name')
  end
end

blueprint.valid? # => true
blueprint.validate # => []

Defining anchors/connection_points

You might argue that the anchor (you might also hear the term connection point, which means the same thing) is a special case of the attribute, so let’s talk about it a little. Yes, it is true, but there are additional things that make it special. There can be only one anchor in each dataset.

blueprint = GoodData::Model::ProjectBlueprint.build("My project from blueprint") do |p|
  p.add_dataset('dataset.users') do |d|
    d.add_anchor('attr.users.id')
    d.add_anchor('attr.users.id2')
  end
end

blueprint.valid? # => false
blueprint.validate # => [{:type=>:more_than_on_anchor, :dataset=>"dataset.users"}]

The anchor is the thing you can reference from other datasets. If you want to do that, you have to define a label. An anchor can have multiple labels, same as an attribute. We strongly recommend not defining anchors with labels on fact tables (they are usually not referenced). The only exception to this rule is if you need to upsert data.

blueprint = GoodData::Model::ProjectBlueprint.build("My project from blueprint") do |p|
  p.add_dataset('dataset.users') do |d|
    d.add_anchor('attr.users.id')
    d.add_label('label.users.id', reference: 'attr.users.id')
    d.add_attribute('attr.users.name')
    d.add_label('label.users.name.full_name', reference: 'attr.users.name')
  end

  p.add_dataset('dataset.sales') do |d|
    d.add_anchor('attr.sales.id')
    d.add_fact('fact.sales.amount')
    d.add_reference('dataset.users')
  end
end

blueprint.valid? # => true

A good question is "why do you have to define the anchor if it has no labels?". The reason is that you still need the underlying attribute if you want to construct the count metric for the fact table to answer the question "How many lines are there in the 'dataset.sales' dataset?". You would do it as follows with SDK (with the previous model).

project.attributes("attr.sales.id").create_metric.execute
Defining date dimensions

In all tools, and even in MAQL, date dimensions are represented as a single unit (as in the blueprint builder's add_date_dimension). This is great for readability but might be misleading. The fact is that a date dimension is several datasets that together typically contain ~18 attributes. If you understand this, it is probably not surprising that the parameter of add_date_dimension is not an identifier but a name that will be used in the titles and identifiers of all those attributes. It is also the name that you can use in the add_date function. Here is an example.

blueprint = GoodData::Model::ProjectBlueprint.build("My project from blueprint") do |p|
  p.add_date_dimension('created_on')

  p.add_dataset('dataset.users') do |d|
    d.add_anchor('attr.users.id')
    d.add_fact('fact.users.some_number')
    d.add_date('created_on')
  end
end
Defining references

Typically in your model you need to reference other datasets. This is expressed in the blueprint builder with the add_reference function. It takes only one parameter, which is the identifier of the referenced dataset. References do not have identifiers since they are not represented as objects on the platform.

blueprint = GoodData::Model::ProjectBlueprint.build("My project from blueprint") do |p|
  p.add_dataset('dataset.users') do |d|
    d.add_anchor('attr.users.id')
    d.add_attribute('attr.users.name')
    d.add_label('label.users.name.full_name', reference: 'attr.users.name')
  end

  p.add_dataset('dataset.sales') do |d|
    d.add_anchor('attr.sales.id')
    d.add_fact('fact.sales.amount')
    d.add_reference('dataset.users')
  end
end

blueprint.valid? # => true

Defining date references

This is very similar to references, but there is an additional hint that you are referencing a date dimension.

blueprint = GoodData::Model::ProjectBlueprint.build("My project from blueprint") do |p|
  p.add_date_dimension('created_on')

  p.add_dataset('dataset.users') do |d|
    d.add_anchor('attr.users.id')
    d.add_date('created_on')
    d.add_fact('fact.users.some_number')
  end
end

Defining Titles

If you built any of the models we created up to this point and opened it in the browser, you probably noticed that the titles look off. Since we did not define anything, SDK tries to do the right thing and uses the identifiers (with some tweaking for readability) as titles. While this might work, it is usually not what you want. You can easily fix it by defining the titles explicitly.

blueprint = GoodData::Model::ProjectBlueprint.build("My project from blueprint") do |p|
  p.add_date_dimension('created_on')

  p.add_dataset('dataset.users') do |d|
    d.add_anchor('attr.users.id')
    d.add_date('created_on')
    d.add_fact('fact.users.amount', title: 'Amount Sold')
  end
end

project.facts('fact.users.amount').title # => 'Amount Sold'

Specifying data types

Occasionally the default data types of the fields will not be what you want. You can redefine them for both labels and facts, as expected, with the parameter :gd_data_type. There is more information about this in a following recipe.
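
A minimal sketch of what that might look like (the identifiers are placeholders; the :gd_data_type values are assumed to follow the platform's type names such as INT, DECIMAL(m,n) or VARCHAR(n)).

blueprint = GoodData::Model::ProjectBlueprint.build("My project from blueprint") do |p|
  p.add_dataset('dataset.users') do |d|
    d.add_anchor('attr.users.id')
    d.add_label('label.users.id', reference: 'attr.users.id', gd_data_type: 'VARCHAR(32)')
    d.add_fact('fact.users.amount', gd_data_type: 'DECIMAL(12,2)')
  end
end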

Getting Blueprint from Existing Project

by Tomas Svarovsky

Problem

You have an existing project and would like to reverse engineer its blueprint.

Solution

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  GoodData.with_project('project_id') do |project|
    blueprint = project.blueprint

    # now you can start working with it
    blueprint.datasets.count # => 3
    blueprint.datasets(:all, include_date_dimensions: true).count # => 4
    blueprint.attributes.map(&:title)

    # You can also store it into file as json
    blueprint.store_to_file('model.json')
  end
end

Loading Data to Project

by Tomas Svarovsky

Problem

You would like to load data into a project.

Solution

To load data you need three things: a blueprint, a project, and the data itself.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  blueprint = GoodData::Model::ProjectBlueprint.build('Acme project') do |p|
    p.add_date_dimension('committed_on')

    p.add_dataset('dataset.commits') do |d|
      d.add_anchor('attr.commits.id')
      d.add_fact('fact.commits.lines_changed')
      d.add_attribute('attr.commits.name')
      d.add_label('label.commits.name', reference: 'attr.commits.name')
      d.add_date('committed_on', :format => 'dd/MM/yyyy')
    end
  end

  project = client.create_project_from_blueprint(blueprint, auth_token: 'TOKEN')

  # By default names of the columns are the identifiers of the labels, facts, or names of references
  data = [
    ['fact.commits.lines_changed', 'label.commits.name', 'committed_on'],
    [1, 'tomas', '01/01/2001'],
    [1, 'petr', '01/12/2001'],
    [1, 'jirka', '24/12/2014']]

  project.upload(data, blueprint, 'dataset.commits')

  # Now the data are loaded in. You can easily compute some number
  project.facts.first.create_metric(type: :sum).execute # => 3

  # By default data are loaded in full mode. This means that the new data replaces all previous data in the dataset
  data = [
    ['fact.commits.lines_changed', 'label.commits.name', 'committed_on'],
    [10, 'tomas', '01/01/2001'],
    [10, 'petr', '01/12/2001'],
    [10, 'jirka', '24/12/2014']]
  project.upload(data, blueprint, 'dataset.commits')
  project.facts.first.create_metric(type: :sum).execute # => 30

  # You can also load more data through INCREMENTAL mode
  project.upload(data, blueprint, 'dataset.commits', :mode => 'INCREMENTAL')
  project.facts.first.create_metric(type: :sum).execute # => 60

  # If you want to, you can also specify what the names of the columns in the CSV are going to be

  blueprint = GoodData::Model::ProjectBlueprint.build('Acme project') do |p|
    p.add_date_dimension('committed_on')

    p.add_dataset('dataset.commits') do |d|
      d.add_anchor('attr.commits.id')
      d.add_fact('fact.commits.lines_changed', column_name: 'fact')
      d.add_attribute('attr.commits.name')
      d.add_label('label.commits.name', reference: 'attr.commits.name', column_name: 'label' )
      d.add_date('committed_on', :format => 'dd/MM/yyyy', column_name: 'ref')
    end
  end

  data = [
    ['fact', 'label', 'ref'],
    [10, 'tomas', '01/01/2001'],
    [10, 'petr', '01/12/2001'],
    [10, 'jirka', '24/12/2014']]
  project.upload(data, blueprint, 'dataset.commits')
end

Loading Multiple Data Sets to Project

by Tomas Korcak

Problem

You would like to load multiple data sets into a project at once.

Solution

The GoodData platform supports loading multiple datasets from a set of CSV files in a single task. In addition to loading a single CSV at a time, you can upload your CSV files, provide a JSON manifest file, and then execute the data load through a single API call. This method is particularly useful if your project contains many datasets, or if you are loading multiple datasets with larger data volumes. Processing multiple datasets this way is significantly faster in these situations.

For more info see GoodData Article.

# encoding: utf-8

require 'gooddata'
require 'csv'

USERNAME = 'YOUR_USERNAME'
PASSWORD = 'YOUR_PASSWORD'
TOKEN = 'YOUR_TOKEN'

GoodData.with_connection(USERNAME, PASSWORD) do |client|
  # Create LDM blueprint
  blueprint = GoodData::Model::ProjectBlueprint.from_json('data/hr_manifest.json')

  # Create new project (datamart)
  project = GoodData::Project.create_from_blueprint(blueprint, auth_token: TOKEN)
  puts "Created project #{project.pid}"

  data = [
    {
      data: 'data/hr_departments.csv',
      dataset: 'dataset.department',
    },
    {
      data: 'data/hr_employees.csv',
      dataset: 'dataset.employee'
    },
    {
      data: 'data/hr_salaries.csv',
      dataset: 'dataset.salary',
      options: {:mode => 'INCREMENTAL'}
    }
  ]
  res = project.upload_multiple(data, blueprint)

  puts JSON.pretty_generate(res)

  puts 'Done!'
end

Loading Data with Specific Date Format

by Tomas Svarovsky

Problem

You need to upload dates but you have nonstandard formatting.

Solution

You can specify a date loading format in your blueprint. If you do not specify any format, the default MM/dd/yyyy format is used.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  blueprint = GoodData::Model::ProjectBlueprint.build('Acme project') do |p|
    p.add_date_dimension('committed_on')

    p.add_dataset('dataset.commits') do |d|
      d.add_anchor('attr.commits.id')
      d.add_fact('fact.commits.lines_changed')
      d.add_attribute('attr.commits.name')
      d.add_label('label.commits.name', reference: 'attr.commits.name')
      d.add_date('committed_on', :dataset => 'committed_on')
    end
  end

  project = client.create_project_from_blueprint(blueprint, auth_token: 'token')

  # By default the dates are expected in format MM/dd/yyyy
  data = [
    ['fact.commits.lines_changed', 'label.commits.name', 'committed_on'],
    [1, 'tomas', '01/01/2001'],
    [1, 'petr', '12/01/2001'],
    [1, 'jirka', '12/24/2014']]
  project.upload(data, blueprint, 'dataset.commits')
  puts project.compute_report(top: [project.facts.first.create_metric], left: ['committed_on.date'])
  # prints
  #
  # [01/01/2001 | 1.0]
  # [12/01/2001 | 1.0]
  # [12/24/2014 | 1.0]

  # if you try to load a different format it will fail
  data = [['fact.commits.lines_changed', 'label.commits.name', 'committed_on'],
          [1, 'tomas', '2001-01-01']]
  project.upload(data, blueprint, 'dataset.commits')

  # You can specify a different date format
  blueprint = GoodData::Model::ProjectBlueprint.build('Acme project') do |p|
    p.add_date_dimension('committed_on')

    p.add_dataset('dataset.commits') do |d|
      d.add_anchor('attr.commits.id')
      d.add_fact('fact.commits.lines_changed')
      d.add_attribute('attr.commits.name')
      d.add_label('label.commits.name', reference: 'attr.commits.name')
      d.add_date('committed_on', :dataset => 'committed_on', format: 'yyyy-dd-MM')
    end
  end

  data = [
    ['fact.commits.lines_changed', 'label.commits.name', 'committed_on'],
    [3, 'tomas', '2001-01-01'],
    [3, 'petr', '2001-01-12'],
    [3, 'jirka', '2014-24-12']]
  project.upload(data, blueprint, 'dataset.commits')
  puts project.compute_report(top: [project.facts.first.create_metric], left: ['committed_on.date'])
  # prints
  #
  # [01/01/2001 | 3.0]
  # [12/01/2001 | 3.0]
  # [12/24/2014 | 3.0]
end

Note a couple of things. We did not have to update the project to be able to load dates in a different format. The date format information is used only during the data upload; the model is unaffected. This is something to think about when you are inferring the blueprint from the model using project.blueprint, because this information is not persisted in the project.
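
If you need the format to survive, one option (a sketch reusing the persistence helpers shown elsewhere in this chapter) is to keep the blueprint that carries the format in a file and load it from there before each upload, instead of inferring it from the project:

# persist the blueprint that includes the date format
blueprint.store_to_file('commits_blueprint.json')

# later, restore it before loading data rather than calling project.blueprint
blueprint = GoodData::Model::ProjectBlueprint.from_json('commits_blueprint.json')
project.upload(data, blueprint, 'dataset.commits')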

Adding New LDM Fields to Project

by Tomas Svarovsky

Problem

You would like to update a project with additional fields

Solution

Blueprints are easy to create programmatically and even to merge together. In this example we will create a simple blueprint with one dataset, then add an additional field to the blueprint and update the model.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  blueprint = GoodData::Model::ProjectBlueprint.build('Acme project') do |p|
    p.add_date_dimension('committed_on')

    p.add_dataset('dataset.commits') do |d|
      d.add_anchor('attr.commits.id')
      d.add_fact('fact.commits.lines_changed')
      d.add_attribute('attr.commits.name')
      d.add_label('label.commits.name', reference: 'attr.commits.name')
      d.add_date('committed_on', :dataset => 'committed_on')
    end
  end

  project = client.create_project_from_blueprint(blueprint, auth_token: 'token')

  update = GoodData::Model::ProjectBlueprint.build("update") do |p|
    p.add_dataset("dataset.commits") do |d|
      d.add_attribute("attr.commits.repo")
      d.add_label('label.commits.repo', reference: 'attr.commits.repo')
    end
  end

  # update the model in the project
  project.update_from_blueprint(blueprint.merge(update))

  # now you can look at the model and verify there is new attribute present
  project.attributes('attr.commits.repo')
end

Discussion

In the example above we created a new model and updated it right away. This is not a typical situation, however, and there are a couple of things that you need to be aware of.

It is more common to gradually update the model in an existing project as new requirements arrive. For that to be possible you have to take the "old" blueprint from some place where it is persisted. We will show a basic way to save a blueprint to a file.

# encoding: utf-8

require 'gooddata'

BLUEPRINT_FILE = 'blueprint_file.json'


GoodData.with_connection do |c|
  GoodData.with_project('project_id') do |project|
    blueprint = GoodData::Model::ProjectBlueprint.from_json(BLUEPRINT_FILE)
    update = GoodData::Model::ProjectBlueprint.build('update') do |p|
      p.add_dataset('repos') do |d|
        d.add_attribute('region')
      end
    end
    new_blueprint = blueprint.merge(update)
    unless new_blueprint.valid?
      pp new_blueprint.validate
      fail "New blueprint is not valid"
    end
    project.update_from_blueprint(new_blueprint)
    # now you can look at the model and verify there is new attribute present
    project.attributes.find {|a| a.title == 'Region'}
    new_blueprint.store_to_file(BLUEPRINT_FILE)

  end
end

Specifying Field Data Types

by Tomas Svarovsky

Problem

You would like to specify a different data type for an attribute or fact in a blueprint.

Solution

Each column in a blueprint is eventually translated into a physical column in a database. While the defaults are typically what you want, sometimes it is useful to override them. You can specify the data type with the gd_data_type clause.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  blueprint = GoodData::Model::ProjectBlueprint.build('Acme project') do |p|
    p.add_date_dimension('committed_on')

    p.add_dataset('dataset.commits') do |d|
      d.add_anchor('attr.commits.id')
      d.add_fact('fact.commits.lines_changed', gd_data_type: 'integer')
      d.add_attribute('attr.commits.name')
      d.add_label('label.commits.name', reference: 'attr.commits.name')
      d.add_date('committed_on', :format => 'dd/MM/yyyy')
    end
  end

  project = client.create_project_from_blueprint(blueprint, auth_token: 'token')

  # This is going to fail since we are trying to upload 1.2 into the INT numeric type
  data = [['fact.commits.lines_changed', 'label.commits.name', 'committed_on'],
          [1.2, 'tomas', '01/01/2001']]
  project.upload(data, blueprint, 'dataset.commits')

  # This is going to pass since we are trying to upload 1 into the INT numeric type
  data = [['fact.commits.lines_changed', 'label.commits.name', 'committed_on'],
          [1, 'tomas', '01/01/2001']]
  project.upload(data, blueprint, 'dataset.commits')
end

Discussion

These data types are currently supported on the platform:

  • DECIMAL(m, d)

  • INTEGER

  • LONG

  • VARCHAR(n)

Cases where this is very useful include:

  • if you use values from a smaller domain (for example integers) you can leverage the appropriate data type to save space and speed things up

  • if you are using facts with atypical precision (the default is DECIMAL(12,2)) you can leverage the decimal type with larger precision, as sketched below
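
For example, a sketch of a fact with larger precision (identifiers are illustrative):

blueprint = GoodData::Model::ProjectBlueprint.build('Precision example') do |p|
  p.add_dataset('dataset.sales') do |d|
    d.add_anchor('attr.sales.id')
    d.add_fact('fact.sales.amount', gd_data_type: 'DECIMAL(15,5)')
  end
end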

Working with Folders

by Tomas Svarovsky

Problem

You would like to use folders for organizing your project’s model

Solution

By default all attributes and facts are automatically assigned a folder based on their dataset. The folder name is generated from the dataset's name or title (if defined). You can override this default and specify your own folder for any field.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  blueprint = GoodData::Model::ProjectBlueprint.build("my_blueprint") do |p|
    p.add_dataset('dataset.reps', title: 'Awesome Sales Reps') do |d|
      d.add_anchor('attr.reps.id')
      d.add_label('label.reps.id', reference: 'attr.reps.id')
      d.add_attribute('attr.reps.name')
      d.add_label('label.reps.name', reference: 'attr.reps.name')
    end

    p.add_dataset('dataset.regions') do |d|
      d.add_anchor('attr.regions.id')
      d.add_label('label.regions.id', reference: 'attr.regions.id')
      d.add_attribute('attr.regions.name')
      d.add_label('label.regions.name', reference: 'attr.regions.name')
    end

    p.add_dataset('dataset.opportunities') do |d|
      d.add_anchor('attr.opportunities.id')
      d.add_fact('fact.amount', folder: 'My Special Folder')
      d.add_reference('dataset.reps')
      d.add_reference('dataset.regions')
    end
  end

  project = client.create_project_from_blueprint(blueprint, auth_token: 'token_id')

  # Currently there is no support in the SDK for directly exploring folders, but we can reach the API directly
  # You can also go to the project in your browser and look for folders there
  client.get("#{project.md['query']}/dimensions")['query']['entries'].map {|i| i['title']} # => ["Dataset.Opportunities", "Awesome Sales Reps", "Dataset.Regions"]

  client.get("#{project.md['query']}/folders")['query']['entries'].map {|i| i['title']} # => ["My Special Folder"]
end

Discussion

Folders are never removed. If you publish a model that no longer contains an existing folder, the folder is not automatically removed; it just remains empty.
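
You can verify this by listing the folders again after publishing the updated model (a sketch reusing the query endpoint from the example above):

client.get("#{project.md['query']}/folders")['query']['entries'].map { |i| i['title'] }
# => ["My Special Folder"] is still listed even if no field uses the folder anymore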

Creating Project in One Page of Code

by Tomas Svarovsky

Problem

You would like to create the whole project from code for whatever reason.

Prerequisites

You have a provisioning token for project creation

Solution

What we will do is create a simple project with a few datasets, load a couple of lines of data, create a simple report, and invite other people to the project. All of this will fit on one page of code. Let's get to it.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection('user', 'password') do |client|
  blueprint = GoodData::Model::ProjectBlueprint.build('Acme project') do |p|
    p.add_date_dimension('committed_on')
    p.add_dataset('devs') do |d|
      d.add_anchor('attr.dev')
      d.add_label('label.dev_id', :reference => 'attr.dev')
      d.add_label('label.dev_email', :reference => 'attr.dev')
    end
    p.add_dataset('commits') do |d|
      d.add_anchor('attr.commits_id')
      d.add_fact('fact.lines_changed')
      d.add_date('committed_on')
      d.add_reference('devs')
    end
  end
  project = GoodData::Project.create_from_blueprint(blueprint, auth_token: 'TOKEN')
  puts "Created project #{project.pid}"

  # Load data
  commits_data = [
    ['fact.lines_changed', 'committed_on', 'devs'],
    [1, '01/01/2014', 1],
    [3, '01/02/2014', 2],
    [5, '05/02/2014', 3]]
  project.upload(commits_data, blueprint, 'commits')

  devs_data = [
    ['label.dev_id', 'label.dev_email'],
    [1, 'tomas@gooddata.com'],
    [2, 'petr@gooddata.com'],
    [3, 'jirka@gooddata.com']]
  project.upload(devs_data, blueprint, 'devs')

  # create a metric
  metric = project.facts('fact.lines_changed').create_metric
  metric.save
  report = project.create_report(title: 'Awesome_report', top: [metric], left: ['label.dev_email'])
  report.save
  ['john@example.com'].each do |email|
    project.invite(email, 'admin', "Guys checkout this report #{report.browser_uri}")
  end
end

Using Attribute Types when Creating Project

by Tomas Korcak

Problem

You want to specify a certain label's data type (see this document for more details).

Prerequisites

You have a provisioning token for project creation.

Solution

In the code snippet below we'll create a simple project blueprint that contains a label with the 'GDC.link' (hyperlink) data type.

Common Types
  • GDC.link

  • GDC.text

  • GDC.time

Types for Geo
  • GDC.geo.pin (Geo pushpin)

  • GDC.geo.ausstates.name (Australia States (Name))

  • GDC.geo.ausstates.code (Australia States (ISO code))

  • GDC.geo.usstates.name (US States (Name))

  • GDC.geo.usstates.geo_id (US States (US Census ID))

  • GDC.geo.usstates.code (US States (2-letter code))

  • GDC.geo.uscounties.geo_id (US Counties (US Census ID))

  • GDC.geo.worldcountries.name (World countries (Name))

  • GDC.geo.worldcountries.iso2 (World countries (ISO a2))

  • GDC.geo.worldcountries.iso3 (World countries (ISO a3))

  • GDC.geo.czdistricts.name (Czech Districts (Name))

  • GDC.geo.czdistricts.name_no_diacritics (Czech Districts)

  • GDC.geo.czdistricts.nuts4 (Czech Districts (NUTS 4))

  • GDC.geo.czdistricts.knok (Czech Districts (KNOK))

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  blueprint = GoodData::Model::ProjectBlueprint.build('Beers Project') do |p|
    p.add_date_dimension('created_at')

    # Add Breweries Dataset
    p.add_dataset('dataset.breweries', title: 'Breweries') do |d|
      d.add_anchor('attr.breweries.brewery_id', title: 'Brewery ID')
      d.add_label('label.breweries.brewery_id.brewery_id', title: 'Brewery ID', :reference => 'attr.breweries.brewery_id')
      d.add_label('label.breweries.brewery_id.name', title: 'Brewery Name', :reference => 'attr.breweries.brewery_id')
      d.add_label('label.breweries.brewery_id.link', title: 'Brewery URL', :reference => 'attr.breweries.brewery_id', :gd_type => 'GDC.link') # <--- Notice this!
      d.add_date('created_at', :dataset => 'created_at')
    end
  end

  project = GoodData::Project.create_from_blueprint(blueprint, auth_token: 'YOUR_TOKEN_HERE')
  puts "Created project #{project.pid}"

  GoodData::with_project(project) do |p|
    # Load Brewery Data
    data = [
      %w(label.breweries.brewery_id.brewery_id label.breweries.brewery_id.name label.breweries.brewery_id.link created_at),
      [1, '21st Amendment Brewery', 'http://21st-amendment.com/', '06/23/2015'],
      [2, 'Almanac Beer Company', 'http://www.almanacbeer.com/', '06/23/2015'],
      [3, 'Anchor Brewing Company', 'http://www.anchorbrewing.com/', '06/23/2015'],
      [4, 'Ballast Point Brewing Company', 'http://www.ballastpoint.com/', '06/23/2015'],
      [5, 'San Francisco Brewing Company', 'http://www.ballastpoint.com/', '06/23/2015'],
      [6, 'Speakeasy Ales and Lagers', 'http://www.goodbeer.com/', '06/23/2015']
    ]
    GoodData::Model.upload_data(data, blueprint, 'dataset.breweries')
  end
end

Moving fields in blueprint

by Tomas Svarovsky

Problem

I created a model and now found out that I would like to move a field to a different dataset.

Solution

We will use a variation of our hr demo model. If you look at that model you can see that we have a region attribute defined on the departments dataset. This is how you (hypothetically) originally created the project and everything was fine, but then you realized that you would like to assign the region to the people dataset, not the departments dataset.

Before we start changing things, spin up the project, go inside it, and create a report which shows SUM(Amount) sliced by Region.
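
If you prefer to create that report from code, here is a sketch using SDK calls shown elsewhere in this cookbook (identifiers follow the hr demo model):

metric = project.facts('fact.salary.amount').create_metric(type: :sum)
metric.save
report = project.create_report(title: 'Amount by Region', top: [metric], left: ['label.department.region'])
report.save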

Now let’s move the attribute between datasets.

client = GoodData.connect
project = client.projects('PROJECT_ID')

blueprint = project.blueprint
blueprint.move!('attr.department.region', 'dataset.department', 'dataset.employee')
project.update_from_blueprint(blueprint)

Since we moved the attribute, we have to load new data for it in the context of the new dataset. The old dataset (departments) is fine since we just removed a column.

employee_data_with_dep = [
    ['label.employee.id','label.employee.fname','label.employee.lname','dataset.department', 'label.department.region'],
    ['e1','Sheri','Nowmer','d1', 'North America'],
    ['e2','Derrick','Whelply','d2', 'Europe']
]
project.upload(employee_data_with_dep, blueprint, 'dataset.employee')

Now go ahead and check the original report. Yes, it is still working fine. It gives different numbers since we changed its meaning, but we did not break anything.
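
To double-check from code, here is a sketch reusing compute_report from the date-format recipe (identifiers follow the hr demo model):

puts project.compute_report(top: [project.facts('fact.salary.amount').create_metric],
                            left: ['label.department.region'])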

Working with HR Demo project

by Tomas Svarovsky

Problem

I would like to play with blueprints but I cannot come up with data and a model for a toy project. Can you give me one?

Solution

Yes. Use our HR example, which models a small company and the people inside it. We tried to use the majority of the features that the blueprint builder supports, so you can take it as a starting point when you are creating yours.

# encoding: utf-8

require 'gooddata'

# Connect to platform (using credentials in ~/.gooddata)
GoodData.with_connection do |client|

  # Create LDM blueprint
  blueprint = GoodData::Model::ProjectBlueprint.build('HR Demo Project') do |p|
    p.add_date_dimension('dataset.payment', title: 'Payment')

    p.add_dataset('dataset.department', title: 'Department', folder: 'Department & Employee') do |d|
      d.add_anchor('attr.department.id', title: 'Department ID')
      d.add_label('label.department.id', reference:'attr.department.id', title: 'Department ID')
      d.add_label('label.department.name', reference: 'attr.department.id', title: 'Department Name')
      d.add_attribute('attr.department.region', title: 'Department Region')
      d.add_label('label.department.region', reference: 'attr.department.region', title: 'Department Region')
    end

    p.add_dataset('dataset.employee', title: 'Employee', folder: 'Department & Employee') do |d|
      d.add_anchor('attr.employee.id', title: 'Employee ID')
      d.add_label('label.employee.id', title: 'Employee ID', reference:'attr.employee.id')
      d.add_label('label.employee.fname', title: 'Employee Firstname', reference:'attr.employee.id')
      d.add_label('label.employee.lname', title: 'Employee Lastname', reference:'attr.employee.id')
      d.add_reference('dataset.department')
    end

    p.add_dataset('dataset.salary', title: 'Salary') do |d|
      d.add_anchor('attr.salary.id', title: 'Salary ID', folder: 'Salary')
      d.add_label('label.salary.id', reference:'attr.salary.id', title: 'Salary ID', folder: 'Salary')
      d.add_fact('fact.salary.amount', title: 'Amount', folder: 'Salary')
      d.add_date('dataset.payment', format: 'yyyy-MM-dd')
      d.add_reference('dataset.employee')
    end
  end

  # Create new project (datamart)
  project = GoodData::Project.create_from_blueprint(blueprint, auth_token: 'TOKEN')
  puts "Created project #{project.pid}"

  # Load data
  department_data = [
      ['label.department.id','label.department.name', 'label.department.region'],
      ['d1','HQ General Management', 'North America'],
      ['d2','HQ Information Systems', 'Europe']
  ]
  project.upload(department_data, blueprint, 'dataset.department')

  employee_data = [
      ['label.employee.id','label.employee.fname','label.employee.lname','dataset.department', 'label.department.region'],
      ['e1','Sheri','Nowmer','d1', 'North America'],
      ['e2','Derrick','Whelply','d2', 'Europe']
  ]
  project.upload(employee_data, blueprint, 'dataset.employee')

  salary_data = [
      ['label.salary.id','dataset.employee','fact.salary.amount','dataset.payment'],
      ['s1','e1','10230','2006-01-01'], ['s2','e2','4810','2006-01-01'], ['s617','e1','10230','2006-02-01'],
      ['s618','e2','4810','2006-02-01'], ['s1233','e1','10230','2006-03-01'], ['s1234','e2','4810','2006-03-01'],
      ['s1849','e1','10230','2006-04-01'], ['s1850','e2','4810','2006-04-01'], ['s2465','e1','10230','2006-05-01'],
      ['s2466','e2','4810','2006-05-01'], ['s3081','e1','10230','2006-06-01'], ['s3082','e2','4810','2006-06-01'],
      ['s3697','e1','10230','2006-07-01'], ['s3698','e2','4810','2006-07-01'], ['s4313','e1','10230','2006-08-01'],
      ['s4314','e2','4810','2006-08-01'], ['s4929','e1','10230','2006-09-01'], ['s4930','e2','4810','2006-09-01'],
      ['s5545','e1','10230','2006-10-01'], ['s5546','e2','4810','2006-10-01'], ['s6161','e1','10230','2006-11-01'],
      ['s6162','e2','4810','2006-11-01'], ['s6777','e1','10230','2006-12-01'], ['s6778','e2','4810','2006-12-01'],
      ['s7393','e1','10440','2007-01-01'], ['s7394','e2','5020','2007-01-01'], ['s8548','e1','10440','2007-02-01'],
      ['s8549','e2','5020','2007-02-01'], ['s9703','e1','10440','2007-03-01'], ['s9704','e2','5020','2007-03-01'],
      ['s10858','e1','10440','2007-04-01'], ['s10859','e2','5020','2007-04-01'], ['s12013','e1','10440','2007-05-01'],
      ['s12014','e2','5020','2007-05-01'], ['s13168','e1','10440','2007-06-01'], ['s13169','e2','5020','2007-06-01'],
      ['s14323','e1','10440','2007-07-01'], ['s14324','e2','5020','2007-07-01'], ['s15478','e1','10440','2007-08-01'],
      ['s15479','e2','5020','2007-08-01'], ['s16633','e1','10440','2007-09-01'], ['s16634','e2','5020','2007-09-01'],
      ['s17788','e1','10440','2007-10-01'], ['s17789','e2','5020','2007-10-01'], ['s18943','e1','10440','2007-11-01'],
      ['s18944','e2','5020','2007-11-01'], ['s20098','e1','10440','2007-12-01'], ['s20099','e2','5020','2007-12-01']
  ]
  project.upload(salary_data, blueprint, 'dataset.salary')
end

Refactoring datasets

by Tomas Svarovsky

Problem

You created a quick prototype but you found out that it needs some touch-ups.

Notes

This is work in progress.

Solution

Use SDK refactoring features.

Let's have a look at two hypothetical but very common scenarios that you have probably encountered during your career.

"One Dataset problem"

Let's say you have a model like this

blueprint = GoodData::Model::ProjectBlueprint.build('Not so great project') do |p|
  p.add_dataset('dataset.reps', title: 'Sale Reps') do |d|
    d.add_anchor('attr.reps.id', title: 'Sales Rep')
    d.add_label('label.reps.id', reference: 'attr.reps.id', title: 'Sales Rep Name')
  end

  p.add_dataset('dataset.regions', title: 'Sale Reps') do |d|
    d.add_anchor('attr.regions.id', title: 'Sales Region')
    d.add_label('label.regions.id', reference: 'attr.regions.id', title: 'Sales Rep Name')
  end

  p.add_dataset('dataset.sales', title: 'Department') do |d|
    d.add_anchor('attr.sales.id', title: 'Sale Id')
    d.add_label('label.sales.id', reference: 'attr.sales.id', title: 'Sales tracking number')
    d.add_fact('fact.sales.amount', title: 'Amount')
    d.add_attribute('attr.sales.stage', title: 'Stage')
    d.add_label('label.sales.stage', title: 'Stage Name', reference:'attr.sales.stage')
    d.add_reference('dataset.regions')
    d.add_reference('dataset.reps')
  end
end

There is one problem. We should definitely extract the attribute from the 'dataset.sales' dataset somewhere else. Also, the anchor for this dataset has a label. Unless we have a specific reason for that, we should extract it somewhere else as well.

We can ask the SDK to help us.

refactored_blueprint = blueprint.refactor_split_df('dataset.sales')

# Let's have a look around
refactored_blueprint.datasets.map(&:title)

refactored_blueprint.datasets.map {|d| [d.title, d.id]}
=> [["Sale Reps", "dataset.reps"],
    ["Sale Reps", "dataset.regions"],
    ["Department", "dataset.sales"],
    ["Dataset.Sales Dim", "dataset.sales_dim"]]

# So there is a new dataset
# If we print it out in repl
refactored_blueprint.datasets('dataset.sales_dim')
# prints
#
# {
#   :type=>:dataset,
#   :id=>"dataset.sales_dim",
#   :columns=>
#    [{:type=>:anchor, :id=>"vymysli_id"},
#     {:type=>:label, :id=>"label.vymysli_id", :reference=>"vymysli_id"},
#     {:type=>:attribute, :id=>"attr.sales.stage", :title=>"Stage"},
#     {:type=>:label,
#      :id=>"label.sales.stage",
#      :title=>"Stage Name",
#      :reference=>"attr.sales.stage"}]}

You can see that the stage attribute is right there, and that an anchor was prepared for us. The naming definitely needs touch-ups (the user should be able to specify the ids somehow) but the structure is there. Now let's have a look at what happened to the sales dataset.

refactored_blueprint.datasets('dataset.sales')
# prints
#
# {:id=>"dataset.sales",
#    :type=>:dataset,
#    :columns=>
#     [{:type=>:anchor, :id=>"attr.sales.id", :title=>"Sale Id"},
#      {:type=>:label,
#       :id=>"label.sales.id",
#       :reference=>"attr.sales.id",
#       :title=>"Sales tracking number"},
#      {:type=>:fact, :id=>"fact.sales.amount", :title=>"Amount"},
#      {:type=>:reference, :dataset=>"dataset.regions"},
#      {:type=>:reference, :dataset=>"dataset.reps"},
#      {:type=>:reference, :dataset=>"dataset.sales_dim"}]}

You can see that the attribute is gone, along with its labels. Only the facts remained. A new reference was added so the reports should still be working. This might seem like a minor thing, but once you start creating more complex models with multiple stars you will find this technique a necessity, so why not automate it.
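
Before using the refactored blueprint it is worth validating it (a sketch reusing the validation helpers shown earlier in this chapter):

pp refactored_blueprint.validate unless refactored_blueprint.valid?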

Multiple facts in one dataset

Another problem we will look at is splitting a fact table that contains multiple facts. Consider this model

blueprint = GoodData::Model::ProjectBlueprint.build('Not so great project') do |p|

  p.add_dataset('dataset.orders_dim', title: 'Orders Dimension') do |d|
    d.add_anchor('attr.orders_dim.id', title: 'Order')
    d.add_label('label.dataset.orders_dim.id', reference: 'attr.orders_dim.id', title: 'Order Id')
  end

  p.add_dataset('dataset.orders_fact', title: 'Orders Fact Table') do |d|
    d.add_anchor('attr.orders_fact.id', title: 'Sales Rep')
    d.add_fact('fact.dataset.orders_fact.amount_ordered', title: 'Amount Ordered')
    d.add_fact('fact.dataset.orders_fact.amount_shipped', title: 'Amount Shipped')
    d.add_reference('dataset.orders_dim')
  end
end

What you want to do is have a look at how many shipments were ordered and shipped on a particular day. But if you keep the facts in one dataset you will have all kinds of trouble with nil values. It is much better to split the fact table in two. Again, we can try doing that with the SDK.

Note: it seems this currently does not work with date references. We need to update that, so it is omitted from the example so that it works.

# You define which dataset you would like to split. The second parameter is the list of facts you would like to move and the last one is the id of the new dataset
refactored_blueprint = blueprint.refactor_split_facts('dataset.orders_fact', ['fact.dataset.orders_fact.amount_shipped'], 'dataset.orders_shipped_fact')

# Again Let's explore
refactored_blueprint.datasets.count # => 3

refactored_blueprint.datasets.map {|d| [d.title, d.id]}
# => [["Orders Dimension", "dataset.orders_dim"],
#     ["Orders Fact Table", "dataset.orders_fact"],
#     ["Dataset.Orders Shipped Fact", "dataset.orders_shipped_fact"]]

# There is a new dataset "dataset.orders_shipped_fact"
refactored_blueprint.datasets('dataset.orders_shipped_fact')
# prints
#
# {
#   :id=>"dataset.orders_shipped_fact",
#   :type=>:dataset,
#   :columns=> [
#     {:type=>:anchor, :id=>"dataset.orders_shipped_fact.id"},
#     {:type=>:fact,
#      :id=>"fact.dataset.orders_fact.amount_shipped",
#      :title=>"Amount Shipped"},
#     {:type=>:reference, :dataset=>"dataset.orders_dim"}]}

These are two basic ways to refactor a blueprint in an assisted, automated fashion.
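
In either case the result is an ordinary blueprint, so you can create a new project from it or push it into an existing one (a sketch, assuming a provisioning token and an existing project):

# create a brand new project from the refactored blueprint
project = client.create_project_from_blueprint(refactored_blueprint, auth_token: 'TOKEN')

# or update an existing project in place
project.update_from_blueprint(refactored_blueprint)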

Reconnect date dimension

by Tomas Svarovsky

Problem

Occasionally you need to reconnect date dimensions. You did all the work on reports, and the last thing missing is to change all references in the model from one date dimension to another.

Solution

With the SDK you can use the swap_date_dimension! method on a blueprint. I will give you two examples: the first with a sample blueprint created on the fly, the second showing how to do the same on an existing project.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  blueprint = GoodData::Model::ProjectBlueprint.build('Acme project') do |p|
    p.add_date_dimension('committed_on')
    p.add_date_dimension('signed_on')

    p.add_dataset('dataset.commits') do |d|
      d.add_anchor('attr.commits.id')
      d.add_fact('fact.commits.lines_changed')
      d.add_attribute('attr.commits.name')
      d.add_label('label.commits.name', reference: 'attr.commits.name')
      d.add_date('committed_on', :format => 'dd/MM/yyyy')
    end
  end

  # Let's check that there are some references really pointing to committed_on dimension
  # and none to signed_on dimension

  blueprint.datasets.flat_map(&:references).map(&:reference).include?('committed_on')
  # => true
  blueprint.datasets.flat_map(&:references).map(&:reference).include?('signed_on')
  # => false

  # let's swap all the references
  blueprint.swap_date_dimension!('committed_on', 'signed_on')

  # Now if we check we see that there is no reference to committed_on dimension
  blueprint.datasets.flat_map(&:references).map(&:reference).include?('committed_on')
  # => false
  blueprint.datasets.flat_map(&:references).map(&:reference).include?('signed_on')
  # => true
end

Operating on an existing project is the same. The only difference is how you acquire the blueprint.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|
  GoodData.with_project('project_id') do |project|
    blueprint = project.blueprint

    blueprint.swap_date_dimension!('committed_on', 'signed_on')

    # Update the project with the new blueprint
    project.update_from_blueprint(blueprint)
  end
end

Taking a portion of a project model

by Tomas Svarovsky

Problem

You have a project and you would like to create a new one with the same model, but only a portion of it.

Solution

Blueprints can be easily manipulated because underneath they are just hashes of data. Worst case, you can always manipulate the hash and create the blueprint out of that. In some cases there are helper methods to make the code a little bit cleaner. In this example we will have a blueprint with two datasets and we will remove one of them, so the new project will have just a portion of the model.

# encoding: utf-8

require 'gooddata'

GoodData.with_connection do |client|

  # This is a blueprint
  # We could grab it from live project via project.blueprint but we will just
  # create it inline so we do not have to spin up additional project

  blueprint = GoodData::Model::ProjectBlueprint.build('Acme project') do |p|
    p.add_date_dimension('committed_on')

    p.add_dataset('dataset.commits') do |d|
      d.add_anchor('attr.commits.id')
      d.add_fact('fact.commits.lines_changed')
      d.add_attribute('attr.commits.name')
      d.add_label('label.commits.name', reference: 'attr.commits.name')
      d.add_date('committed_on', :dataset => 'committed_on')
    end

    p.add_dataset('dataset.commits2') do |d|
      d.add_anchor('attr.commits.id2')
      d.add_fact('fact.commits.lines_changed2')
      d.add_attribute('attr.commits.name2')
      d.add_label('label.commits.name2', reference: 'attr.commits.name2')
      d.add_date('committed_on', :dataset => 'committed_on')
    end
  end

  # Now we need to manipulate it so it contains only portion of the model we want.
  # In this case we want just dataset 'dataset.commits'.
  # Take note that the project can be successfully created only from a valid blueprint
  # You can always check if the blueprint is valid by calling blueprint.valid?
  #

  # Let's remove the dataset
  # We are going to use the method remove_dataset! which means the blueprint will be changed in place
  # You can also use remove_dataset in which case a new blueprint will be created and the old one will
  # not be touched
  blueprint.remove_dataset!('dataset.commits2')

  # Let's create a project based on the modified blueprint
  project = client.create_project_from_blueprint(blueprint, auth_token: 'token')

  # You can verify that the created project has only two datasets
  project.datasets.map(&:identifier)
  # => ["dataset.commits", "committed_on.dataset.dt"]

end

13 Working With Lifecycle

Working with segments

by Tomas Svarovsky

Problem

You would like to set up and execute a lifecycle management configuration to distribute new versions of dashboards, reports, and metrics from a master template to multiple target projects.

Solution

You can use the lifecycle management API to create a lifecycle segment, associate it with its master template project, and populate the segment with target projects that are going to receive the new versions of objects from the master template. First, let's align on the terminology.

Organization - Organization is an object that contains all users and projects that you or your company have provisioned to the GoodData platform. An organization can also contain a whitelabeling configuration (your custom colors, logos, URLs etc.). Lifecycle management segments and projects are always set up within an organization.

Lifecycle segment represents a group of projects that contain the same analytical objects. New versions of these objects can be distributed from a so-called master template project to the segment's projects. The segment's projects can also contain ad-hoc objects (e.g. reports or metrics) that are not touched (neither updated nor deleted) by the lifecycle processes. Lifecycle segments are usually used to represent different tiers (e.g. bronze/silver/gold) of an analytical solution.

Master template is a project that contains the latest and greatest versions of objects that are distributed to the segment's projects. We recommend keeping the master project immutable once it gets associated with a segment, so a new master template project should be cloned from the previous master and associated with the segment for each new version of your solution. A lifecycle segment can be associated with just one master at any given time.

Client - is a user-assigned ID of a segment's project. The client ID is associated with a GoodData-generated project ID during the segment provisioning operation. The client ID is a stable identification of a certain tenant's project. It can be associated with different project IDs during its lifecycle.

Having defined some terminology, we present code that will do several things:

  • sets up a segment with a master project (along with some data and a dashboard)

  • creates a client within the segment; the client is not yet associated with a project at this time

  • releases the new master project

  • provisions a new project for the client (this ensures the client's project has the same model and dashboard but no data)

  • changes the dashboard in the master

  • makes another release

  • synchronizes the clients with the updated version

  • adds another client

  • provisions a new project for the second client too

require 'gooddata'

TOKEN = 'token'
PASSWORD = 'pass'

# Connect to GoodData using a specific organization's (aka domain's) URL
client = GoodData.connect('mustang@gooddata.com', PASSWORD, server: 'https://mustangs.intgdc.com', verify_ssl: false )
# Organization (aka domain)
domain = client.domain('mustangs')

seq_number = 8

# Prepare master project
# ======================

# Create LDM blueprint
blueprint = GoodData::Model::ProjectBlueprint.build('HR Demo Project') do |p|
  p.add_date_dimension('dataset.payment', title: 'Payment')

  p.add_dataset('dataset.department', title: 'Department', folder: 'Department & Employee') do |d|
    d.add_anchor('attr.department.id', title: 'Department ID')
    d.add_label('label.department.id', reference:'attr.department.id', title: 'Department ID')
    d.add_label('label.department.name', reference: 'attr.department.id', title: 'Department Name')
    d.add_attribute('attr.department.region', title: 'Department Region')
    d.add_label('label.department.region', reference: 'attr.department.region', title: 'Department Region')
  end

  p.add_dataset('dataset.employee', title: 'Employee', folder: 'Department & Employee') do |d|
    d.add_anchor('attr.employee.id', title: 'Employee ID')
    d.add_label('label.employee.id', title: 'Employee ID', reference:'attr.employee.id')
    d.add_label('label.employee.fname', title: 'Employee Firstname', reference:'attr.employee.id')
    d.add_label('label.employee.lname', title: 'Employee Lastname', reference:'attr.employee.id')
    d.add_reference('dataset.department')
  end

  p.add_dataset('dataset.salary', title: 'Salary') do |d|
    d.add_anchor('attr.salary.id', title: 'Salary ID', folder: 'Salary')
    d.add_label('label.salary.id', reference:'attr.salary.id', title: 'Salary ID', folder: 'Salary')
    d.add_fact('fact.salary.amount', title: 'Amount', folder: 'Salary')
    d.add_date('dataset.payment', format: 'yyyy-MM-dd')
    d.add_reference('dataset.employee')
  end
end

# Create the master template project
project = client.create_project_from_blueprint(blueprint, auth_token: TOKEN)
puts "Created master template project #{project.pid}"

# Load data
department_data = [
    ['label.department.id','label.department.name', 'label.department.region'],
    ['d1','HQ General Management', 'North America'],
    ['d2','HQ Information Systems', 'Europe']
]
project.upload(department_data, blueprint, 'dataset.department')

employee_data_with_dep = [
    ['label.employee.id','label.employee.fname','label.employee.lname','dataset.department', 'label.department.region'],
    ['e1','Sheri','Nowmer','d1', 'North America'],
    ['e2','Derrick','Whelply','d2', 'Europe']
]
project.upload(employee_data_with_dep, blueprint, 'dataset.employee')

salary_data = [
    ['label.salary.id','dataset.employee','fact.salary.amount','dataset.payment'],
    ['s1','e1','10230','2006-01-01'], ['s2','e2','4810','2006-01-01'], ['s617','e1','10230','2006-02-01'],
    ['s618','e2','4810','2006-02-01'], ['s1233','e1','10230','2006-03-01'], ['s1234','e2','4810','2006-03-01'],
    ['s1849','e1','10230','2006-04-01'], ['s1850','e2','4810','2006-04-01'], ['s2465','e1','10230','2006-05-01'],
    ['s2466','e2','4810','2006-05-01'], ['s3081','e1','10230','2006-06-01'], ['s3082','e2','4810','2006-06-01'],
    ['s3697','e1','10230','2006-07-01'], ['s3698','e2','4810','2006-07-01'], ['s4313','e1','10230','2006-08-01'],
    ['s4314','e2','4810','2006-08-01'], ['s4929','e1','10230','2006-09-01'], ['s4930','e2','4810','2006-09-01'],
    ['s5545','e1','10230','2006-10-01'], ['s5546','e2','4810','2006-10-01'], ['s6161','e1','10230','2006-11-01'],
    ['s6162','e2','4810','2006-11-01'], ['s6777','e1','10230','2006-12-01'], ['s6778','e2','4810','2006-12-01'],
    ['s7393','e1','10440','2007-01-01'], ['s7394','e2','5020','2007-01-01'], ['s8548','e1','10440','2007-02-01'],
    ['s8549','e2','5020','2007-02-01'], ['s9703','e1','10440','2007-03-01'], ['s9704','e2','5020','2007-03-01'],
    ['s10858','e1','10440','2007-04-01'], ['s10859','e2','5020','2007-04-01'], ['s12013','e1','10440','2007-05-01'],
    ['s12014','e2','5020','2007-05-01'], ['s13168','e1','10440','2007-06-01'], ['s13169','e2','5020','2007-06-01'],
    ['s14323','e1','10440','2007-07-01'], ['s14324','e2','5020','2007-07-01'], ['s15478','e1','10440','2007-08-01'],
    ['s15479','e2','5020','2007-08-01'], ['s16633','e1','10440','2007-09-01'], ['s16634','e2','5020','2007-09-01'],
    ['s17788','e1','10440','2007-10-01'], ['s17789','e2','5020','2007-10-01'], ['s18943','e1','10440','2007-11-01'],
    ['s18944','e2','5020','2007-11-01'], ['s20098','e1','10440','2007-12-01'], ['s20099','e2','5020','2007-12-01']
]
project.upload(salary_data, blueprint, 'dataset.salary')

# Create a report within the master template project
metric = project.facts('fact.salary.amount').create_metric
metric.save
report = project.create_report(title: 'My report', left: ['label.department.name'], top: [metric])

# Create a dashboard within the master template project
dashboard = project.create_dashboard(:title => 'Test Dashboard')
tab = dashboard.create_tab(:title => 'Tab Title #1')
tab.title = 'Test #42'
item = tab.add_report_item(:report => report, :position_x => 10, :position_y => 20)
item.position_x = 400
item.position_y = 300
dashboard.lock
dashboard.save

# Create new lifecycle segment
# ============================

segment = domain.create_segment(segment_id: "segment_#{seq_number}", master_project: project)

# Release the segment
# ===================
segment.synchronize_clients


# Create new client
# =================
segment_client = segment.create_client(id: "client_#{seq_number}")


# Provision new project for the client
# ====================================
domain.provision_client_projects

# The new project contains all the objects from the master template
segment.clients.first.project.pid
# => aerkc6562oiauaof9mxtowcc4fl5vwb4
segment.clients.first.project.metrics.count
# => 1
segment.clients.first.project.dashboards.first.title
# => 'Test Dashboard'

# The client project should not have any data from master
segment.master_project.metrics.first.execute
# => 0.366E6
segment.clients.first.project.metrics.first.execute
# => nil

# Update master and propagate changes
# ===================================

# Now let's change something in our master.
# Let's change a title in master and transfer to the clients
dashboard.title = 'Better Test Dashboard'
dashboard.save

# Release the segment and synchronize clients
segment.synchronize_clients

# Check the results in the client's project
segment.clients.first.project.dashboards.first.title
# => "Better Test Dashboard"

# Add additional clients
# ======================
#
# This is it. Just for illustration let's create another client. This basically just means repeating the flow.
# We already have our master prepared so let's just create a new client.
another_segment_client = segment.create_client(id: "client_#{seq_number + 1}")

# currently there should be only one project for the first client
segment.clients.map { |c| [c.id, c.project_uri]}
# => [["client_8", "/gdc/projects/aerkc6562oiauaof9mxtowcc4fl5vwb4"], ["client_9", nil]]

# Let's provision the project. This will provision it from the master version released by the last call of 'synchronize_clients'
domain.provision_client_projects

# Let's check we have a project
segment.clients.map { |c| [c.id, c.project_uri]}
# => [["client_8", "/gdc/projects/aerkc6562oiauaof9mxtowcc4fl5vwb4"], ["client_9", "/gdc/projects/yxpp45hf39bigezp3ug8pm6kc9h6tihv"]]

# Let's also verify that we have the latest version. The new project should contain the updated version of the dashboard because we've already released it via 'synchronize_clients'
segment.clients("client_#{seq_number + 1}").project.dashboards.first.title
# => "Better Test Dashboard"