Loading JSON (and XML) data is possible in WhereScape RED, but does involve a number of steps, and may differ depending on the database platform (some platforms have native JSON &/or XML support while others do not).
Utilising the flexibilty of WhereScape RED along with the power of Windows Powershell 3.0, you can create a load script that connects to a Webservice, retrieves the required data, processes that data and then inserts the data into a standard load table. To demonstrate this capability we have developed a sample tutorial.
The tutorial uses the https://api.github.com/users/mralexgray/repos webservice as the source for data to load into the table load_json_sample.
If you follow the below, you will end up with a load table named load_json_sample that is populated from the JSON Webservice at https://api.github.com/users/mralexgray/repos.
Notes
- Inserts are done 1000 rows at a time
- The load table is created with all columns defined as NVARCHAR(4000) initially
- This was written and tested on SQL Server but should only require minor modifications to work on other databases
- These scripts will work with very little (if any) modification for many different JSON files
Instructions:
1 - Create a host script object inside RED named create_from_webservice.ps1 and copy the contents of create_from_webservice.ps1 into it
2 - Create a host script object inside RED named load_from_webservice.ps1 and copy the contents of load_from_webservice.ps1 into it
3 - Create a host script object inside RED named script_run_powershell_load and copy the contents of script_run_powershell_load.bat into it
4 - on Windows connection Set Database Server/Home Directory (The Hostname\Instance of your SQL Server database)
5 - on Windows connection Set Database ID (Data Warehouse repository)
6 - Create a new metadata only load table object in WhereScape RED called 'load_json_sample' containing no columns
7 - Set the pre load action to 'no action'
8 - Set the load tables connection to 'Windows'
9 - Set the load type to script based load
10 - Set the load tables script to 'script_run_powershell_load'
12 - Edit the load table source properties and set the source file to 'create_from_webservice.ps1'
13 - If your JSON file has a single node which contains all other data, enter the name of this node into the source file field delimiter (not required for this tutorial)
14 - 'Load' the table (this will create the table in the database with the correct columns)
15 - Set the pre load action on the load table back to 'Truncate'
15 - Edit the load table source properties and set the source file to 'load_from_webservice.ps1'
16 - Load the table
Common issues:
- Your windows connection is correctly configured with the database server name and that the work directory is correctly set
- If this is run via the scheduler, aborting the job WILL NOT KILL THE SCRIPT.
- The script checks the returned pages headers for the next page of data, if it is found, it will continue to loop around until no more data is returned.
- If your API does not return headers, you can disable them in the script by setting $UseHeaders to $False and it will guess at the next page of data
- The Size limit for an object in powershell is 2GB, if your JSON object (each page) is larger than 2GB after conversion then a memory error will be thrown.
- Some REST APIs can be flaky and return different fields on some pages instead of being uniform. If this occurs, the script can handle most of these scenarios. However, if a new field appears, it will need to be manually added to the table.