Loading JSON (and XML) data is possible in WhereScape RED, but does involve a number of steps, and may differ depending on the database platform (some platforms have native JSON &/or XML support while others do not).


Utilising the flexibilty of WhereScape RED along with the power of Windows Powershell 3.0, you can create a load script that connects to a Webservice, retrieves the required data, processes that data and then inserts the data into a standard load table.  To demonstrate this capability we have developed a sample tutorial.


The tutorial uses the https://api.github.com/users/mralexgray/repos webservice as the source for data to load into the table load_json_sample.


If you follow the below, you will end up with a load table named load_json_sample that is populated from the JSON Webservice at  https://api.github.com/users/mralexgray/repos.


Notes

  1. Inserts are done 1000 rows at a time
  2. The load table is created with all columns defined as NVARCHAR(4000) initially
  3. This was written and tested on SQL Server but should only require minor modifications to work on other databases
  4. These scripts will work with very little (if any) modification for many different JSON files

Instructions:

1 - Create a host script object inside RED named create_from_webservice.ps1 and copy the contents of create_from_webservice.ps1 into it

2 - Create a host script object inside RED named load_from_webservice.ps1 and copy the contents of load_from_webservice.ps1 into it

3 - Create a host script object inside RED named script_run_powershell_load and copy the contents of script_run_powershell_load.bat into it

4 - on Windows connection Set Database Server/Home Directory (The Hostname\Instance of your SQL Server database)

5 - on Windows connection Set Database ID (Data Warehouse repository)



6 - Create a new metadata only load table object in WhereScape RED called 'load_json_sample' containing no columns

7 - Set the pre load action to 'no action'

8 - Set the load tables connection to 'Windows'

9 - Set the load type to script based load

10 - Set the load tables script to 'script_run_powershell_load'



11 - Edit the load table source properties and set the source file path to 'https://api.github.com/users/mralexgray/repos'

12 - Edit the load table source properties and set the source file to 'create_from_webservice.ps1'

13 - If your JSON file has a single node which contains all other data, enter the name of this node into the source file field delimiter (not required for this tutorial)



14 - 'Load' the table (this will create the table in the database with the correct columns)

15 - Set the pre load action on the load table back to 'Truncate'



15 - Edit the load table source properties and set the source file to 'load_from_webservice.ps1'



16 - Load the table


Common issues:
- Your windows connection is correctly configured with the database server name and that the work directory is correctly set
If this is run via the scheduler, aborting the job WILL NOT KILL THE SCRIPT.
- The script checks the returned pages headers for the next page of data, if it is found, it will continue to loop around until no more data is returned.
- If your API does not return headers, you can disable them in the script by setting $UseHeaders to $False and it will guess at the next page of data
- The Size limit for an object in powershell is 2GB, if your JSON object (each page) is larger than 2GB after conversion then a memory error will be thrown.
- Some REST APIs can be flaky and return different fields on some pages instead of being uniform. If this occurs, the script can handle most of these scenarios. However, if a new field appears, it will need to be manually added to the table.