XML Input

Overview

The XML Input tool reads XML-formatted data, and outputs records from a file or field to the downstream tools. You can configure the number of fields, as well as the size and type of each field. To automate field definition, you can also import a layout file specifying field names, field descriptions, and (optionally) field sizes.

The XML Input tool supports three different types of XML data file: Simple Tags, Simple Attributes, and SOAP. All of these formats are simple formats, in that they can only represent a flat array of record data, with no nesting structure. For more complex XML structures, see XML2 Input.

Because XML Input and XML Output tools are complementary, it is useful to experiment with XML Output and observe the formats it produces. You can then use that information to configure the XML Input tool to produce a matching input file format.

Simple XML formats

Simple tags

Simple tag XML defines a simple array data structure using element nesting. The first element to follow the XML declaration (typically of the form <?xml version='1.0' encoding='utf-8'?>) is the root element. This encloses a series of row elements, each of which defines a record. Within each row element, fieldname elements enclose field values. For example the record data in the table below corresponds to the following XML file.

FIRMNAME	CONTACT
Unicorp	John Smith
...	...

XML

<ROOT>
   <ROW>
      <FIRMNAME>Redpoint Global Inc.</FIRMNAME>
      <CONTACT>John Smith</CONTACT>
   </ROW>
  ...
</ROOT>

Simple attributes

Simple attribute XML defines a simple array data structure using elements for each row, but attributes for the field values. For example this record data.

FIRMNAME	CONTACT
Unicorp	John Smith
...	...

XML

<ROWS>
   <ROW FIRMNAME="Redpoint Global Inc." CONTACT="John Smith"/>
   ...
</ROWS>

SOAP format

SOAP is an XML protocol designed for exchanging structured information in a decentralized, distributed environment. Normally you would specify SOAP in an XML Input or XML Output tool when you are creating a project that will be published as a web service. If you configure an XML Output tool as in the image below, with the input data from the following table, the output XML is shown after the table.

COMPANY	CONTACT
Unicorp	John Smith
Megabank	Joan Riley

XML

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
                    xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
                    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
                    xmlns:tns="http://www.Data Management.com/dles_soap"
                    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <SOAP-ENV:Body>
        <OutputArray xmlns="http://www.Data Management.com/dles_soap">
            <OutputRecord>
                <COMPANY>Unicorp</COMPANY>
                <CONTACT>John Smith</CONTACT>
            </OutputRecord>
            <OutputRecord>
                <COMPANY>Megabank</COMPANY>
                <CONTACT>Joan Riley</CONTACT>
            </OutputRecord>
        </OutputArray>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

When SOAP format is specified in the XML Input and XML Output tools, these tools read or write simple SOAP documents in which every record is contained in an element whose tag consists of the SOAP element base followed by a record element (the <OutputRecord> element in the above example). Within the record elements, each column is surrounded by a tag derived from the column name (<COMPANY> and <CONTACT> in the example).

Suppose that we construct a simple project using XML Input and XML Output tools to read and write records with COMPANY and CONTACT columns. If you queried that WSDL, you would see this.

XML

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
                  targetNamespace="http://www.Data Management.com/dles_soap"
                  xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
                  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
                  xmlns:tns="http://www.Data Management.com/dles_soap"
                  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <wsdl:types>
        <xsd:schema attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://www.Data Management.com/dles_soap">
            <xsd:complexType name="InputRecord">
                <xsd:sequence>
                    <xsd:element maxOccurs="1" minOccurs="0" name="COMPANY" nillable="true" type="xsd:string"/>
                    <xsd:element maxOccurs="1" minOccurs="0" name="CONTACT" nillable="true" type="xsd:string"/>
                </xsd:sequence>
            </xsd:complexType>
            <xsd:element name="InputRecord" nillable="true" type="tns:InputRecord"/>
            <xsd:complexType name="InputArray">
                <xsd:sequence>
                    <xsd:element maxOccurs="unbounded" minOccurs="0" name="InputRecord" nillable="true" type="tns:InputRecord"/>
                </xsd:sequence>
            </xsd:complexType>
            <xsd:element name="InputArray" nillable="true" type="tns:InputArray"/>
            <xsd:complexType name="OutputRecord">
                <xsd:sequence>
                    <xsd:element maxOccurs="1" minOccurs="0" name="COMPANY" nillable="true" type="xsd:string"/>
                    <xsd:element maxOccurs="1" minOccurs="0" name="CONTACT" nillable="true" type="xsd:string"/>
                </xsd:sequence>
            </xsd:complexType>
            <xsd:element name="OutputRecord" nillable="true" type="tns:OutputRecord"/>
            <xsd:complexType name="OutputArray">
                <xsd:sequence>
                    <xsd:element maxOccurs="unbounded" minOccurs="0" name="OutputRecord" nillable="true" type="tns:OutputRecord"/>
                </xsd:sequence>
            </xsd:complexType>
            <xsd:element name="OutputArray" nillable="true" type="tns:OutputArray"/>
        </xsd:schema>
    </wsdl:types>
    <wsdl:message name="Request">
        <wsdl:part element="tns:InputArray" name="parameters"/>
    </wsdl:message>
    <wsdl:message name="Response">
        <wsdl:part element="tns:OutputArray" name="parameters"/>
    </wsdl:message>
    <wsdl:portType name="dlesPortType">
        <wsdl:operation name="webservice">
            <wsdl:input message="tns:Request"/>
            <wsdl:output message="tns:Response"/>
        </wsdl:operation>
    </wsdl:portType>
    <wsdl:binding name="dlesSoapHttpBinding" type="tns:dlesPortType">
        <soap:binding style="document" transport="http://schemas.xmlsoap.org/soap/http"/>
        <wsdl:operation name="webservice">
            <soap:operation soapAction="http://www.Data Management.com/dles_soap/webservice" style="document"/>
            <wsdl:input>
                <soap:body use="literal"/>
            </wsdl:input>
            <wsdl:output>
                <soap:body use="literal"/>
            </wsdl:output>
        </wsdl:operation>
    </wsdl:binding>
    <wsdl:service name="dles">
        <wsdl:port binding="tns:dlesSoapHttpBinding" name="dles">
            <soap:address location="http://rpb-desk-3:12345/"/>
        </wsdl:port>
    </wsdl:service>
</wsdl:definitions>

The XML Input and XML Output tools cannot read and write an arbitrary SOAP document. They perform formatting; if you use them in a published web service, Data Management generates the corresponding WSDL. If you have existing SOAP documents and want to read or write that format, use the XML Input2 and XML Output2 tools and analyze a sample of the document.

XML Input tool configuration parameters

The XML Input tool has two sets of configuration parameters in addition to the standard execution options.

Configuration

Parameter	Description
Input from	The source of the data: File: read from the specified data file. This is the default. Field: read from the field on the input connection, treating the stream of binary data as wildcarded files. The Split list data source is for Redpoint Global Inc. internal use only. Do not set this manually!
Input file	If Input from is File, the file containing the records, or a wildcard pattern matching multiple files.
Input field	If Input from is Field, the field from which the data will be read. If you are connecting to an upstream Web Service Input tool, this is `requestBody`.
Include input fields	If Input from is Field, optionally passes through the "extra" input fields to the "root" record. This can be useful in certain kinds of processing where it is desirable to carry through identifying information that is not represented in the XML documents.
Use repository schema	If selected, configure the field layout using the specified Schema instead of configuring fields directly.
Schema	If Use repository schema is selected, a schema must be specified.
XML format	Select the type of XML input data: Simple tags, Simple attributes, or SOAP. This is optional and defaults to Simple attributes.
XML row tag	If XML format is Simple tags or Simple attributes, specifies the XML tag under which rows can be found.
SOAP element base	If XML format is SOAP, specifies the XML tag containing the input record.
Field	Name that uniquely names the field amongst all fields in the record.
Type	Sub-configuration describing the type of the new field. See Configuring Data Types.

Options

Parameter	Description
Produce file name field	If selected, the file name will be output as a record field.
Output full path	If Produce file name field is selected, optionally outputs the entire path to the record file name field.
Output URI path	If Output full path is selected, express path as a Uniform Resource Identifier (URI).
Field name	If Produce file name field is selected, name of the column to be used for the file name. This is optional and defaults to `FILENAME`.
Field size	If Produce file name field is selected, size of the field to be used for the file name. This is optional and defaults to `255`.
Error handling	Configures how errors in the input data will be handled: Ignore error: accept the record and ignore the error. Issue warning: accept the record but send a warning to the Message Viewer and/or Error Report connector. Reject record: reject the record and send a warning to the Message Viewer and/or Error Report connector. Abort project: abort the project.
Parsing errors	Specify how to handle input containing invalid XML that cannot be processed. The Issue warning option is not available for this type of error.
Conversion errors	Specify how to handle input data that cannot be correctly converted to its specified data type (such as converting "Jane Smith" to a Date).
Truncation errors	Specify how to handle input with truncated field values (such as storing "John Smithson" in a Unicode(1) field).
Missing fields	Specify how to handle input with fields that are missing or contain empty values.
Extra fields	Specify how to handle input with extra elements that are not defined in the tool's configuration.
Send errors to Message Viewer	If selected, all error types configured to either Issue warning or Reject record will send warnings to the Message Viewer. The Report warnings option must be selected on the Execution tab.
Send errors to Error Report connector	If selected, creates an E "Error Report" output connector. All error types configured to either Issue warning or Reject record will send detailed error messages to that connector. Attach a downstream Data Viewer or other output tool to review these messages.

Configure the XML Input tool

To configure an XML Input tool, you need the following information about the input file:

The input file or field name
The XML type (either Simple Tags, Simple Attributes, or SOAP)
The XML row tag
The list of fields and their types

Data Management provides several ways to specify this information.

If you have...	Do this
An XML schema for this file already defined in the repository.	Configure the XML Input tool using a predefined schema.
A layout file.	Configure the XML Input tool using a layout file.
None of the above.	Use Analyze command, or configure the XML Input manually.

...using a predefined schema

To use this method of configuring an XML Input tool, you must have an XML schema for the input file already defined in the repository.

To configure the XML Input tool using a predefined schema:

Select the XML Input tool.
Go to the Configuration tab on the Properties pane.
Select Input from and choose a data source.

Input from

Do this

File

Specify the input file or files. You can use wildcards to configure a single XML Input tool to read a sequence of files with the same layout and format.

Field

Field data must be binary. If necessary, use an upstream Calculate tool configured with the BinaryRecastFromText function to convert the input data to type binary.

Select Commit to display the input connector, and then connect the desired input tool. Output connectors are unavailable until input specifications have been defined.

Select the Input field from which the data will be read. If you are connecting to an upstream Web Service Input tool, this is requestBody.

The Split list data source is for Redpoint Global Inc. internal use only.

Select Use repository schema, and then select the desired schema from the list.
Examine the Format options and Schema grid to verify that the format is correct.
If the fields and options appear incorrect, choose or create a different schema appropriate for the input file, or use another method to configure the XML Input tool.
Optionally, select the Options tab and configure advanced options:
- If you don't want to process the entire file, select Limit records and type the desired number of records to process.
- To include the name of the input file as a new field, select Produce file name field and specify a Field name and Field size. Select Output full path to include the complete file specification. This can be useful when reading a wildcarded set of files. Select Output URI path to express the complete file specification as a Uniform Resource Identifier.
- Configure error handling.
Optionally, go to the Execution tab and Enable trigger input, configure reporting options, or set Web service options.

...using a layout file

To use this method of configuring an XML Input tool, you must have a multi-column layout file for the input file.

To configure the XML Input tool using a layout file:

Select the XML Input tool.
Go to the Configuration tab on the Properties pane.
Select Input from and choose a data source.

Input from

Do this

File

Specify the input file or files. You can use wildcards to configure a single XML Input tool to read a sequence of files with the same layout and format.

Field

Field data must be binary. If necessary, use an upstream Calculate tool configured with the BinaryRecastFromText function to convert the input data to type binary.

Select Commit to display the input connector, and then connect the desired input tool. Output connectors are unavailable until input specifications have been defined.

Select the Input field from which the data will be read. If you are connecting to an upstream Web Service Input tool, this is requestBody.

The Split list data source is for Redpoint Global Inc. internal use only.

Selectand choose Layout, and then Load (text), Load (text with types), or Load (XML). Browse to the layout file that describes the input file, and then select Open.
Examine the Format options and field grid to verify that the format is correct.
If the fields and options appear incorrect, choose or create a different schema appropriate for the input file, or use another method to configure the XML Input tool.
Optionally, select the Options tab and configure advanced options:
1. If you don't want to process the entire file, select Limit records and type the desired number of records to process.
2. To include the name of the input file as a new field, select Produce file name field and specify a Field name and Field size. Select Output full path to include the complete file specification. This can be useful when reading a wildcarded set of files. Select Output URI path to express the complete file specification as a Uniform Resource Identifier.
Configure error handling.
Optionally, go to the Execution tab and Enable trigger input, configure reporting options, or set Web service options.

...using Analyze

If the input schema is changed after you configure the XML Input tool using Analyze, you must repeat the configuration process.

To configure the XML Input tool using Analyze:

Select the XML Input tool.
Go to the Configuration tab on the Properties pane.
Select Input from and choose a data source.

Input from

Do this

File

Specify the input file or files. You can use wildcards to configure a single XML Input tool to read a sequence of files with the same layout and format.

Select Analyze and analyze the data.

Field

Field data must be binary. If necessary, use an upstream Calculate tool configured with the BinaryRecastFromText function to convert the input data to type binary.

Select Commit to display the input connector, and then connect the desired input tool. Output connectors are unavailable until input specifications have been defined.

Select the Input field from which the data will be read. If you are connecting to an upstream Web Service Input tool, this is requestBody.

Optionally, select the Analyze tab and enter a formatted sample of the expected XML input.

Select Analyze and analyze the data.

The Split list data source is for Redpoint Global Inc. internal use only.

Examine the Format options and field grid to verify that the format is correct.
If the fields and options appear incorrect, choose or create a different schema appropriate for the input file, or use another method to configure the XML Input tool.
Optionally, select the Options tab and configure advanced options:
- If you don't want to process the entire file, select Limit records and type the desired number of records to process.
- To include the name of the input file as a new field, select Produce file name field and specify a Field name and Field size. Select Output full path to include the complete file specification. This can be useful when reading a wildcarded set of files. Select Output URI path to express the complete file specification as a Uniform Resource Identifier.
- Configure error handling.
Optionally, go to the Execution tab and Enable trigger input, configure reporting options, or set Web service options.

...manually

Use this method to configure an XML Input tool if the other available configuration options do not correctly describe your input data.

To configure the XML Input tool manually:

Select the XML Input tool.
Go to the Configuration tab on the Properties pane.
Select Input from and choose a data source.

Input from

Do this

File

Specify the input file or files. You can use wildcards to configure a single XML Input tool to read a sequence of files with the same layout and format.

Field

Field data must be binary. If necessary, use an upstream Calculate tool configured with the BinaryRecastFromText function to convert the input data to type binary.

Select Commit to display the input connector, and then connect the desired input tool. Output connectors are unavailable until input specifications have been defined.

Select the Input field from which the data will be read. If you are connecting to an upstream Web Service Input tool, this is requestBody.

Select the XML format from the list.
- If you specify Simple tags or Simple attributes, enter the XML row tag identifier (without angle braces < >).
- If you specify SOAP array, enter the SOAP argument name, SOAP array name, and SOAP record name.
Configure field Name, Type, and optional Description of input fields. For each field:
- Select the cell in the Field column, and type the desired name.
- Select the cell in the Type column, and specify the field's data type and size, if different than the name assigned by default. Available options depend on the type selected. See Configuring data types. You don't need to set the data type unless you will perform calculations or comparisons using the field, or intend to export the information to a strongly-typed database.
Optionally, select the Options tab and configure advanced options:
- If you don't want to process the entire file, select Limit records and type the desired number of records to process.
- To include the name of the input file as a new field, select Produce file name field and specify a Field name and Field size. Select Output full path to include the complete file specification. This can be useful when reading a wildcarded set of files. Select Output URI path to express the complete file specification as a Uniform Resource Identifier.
- Configure error handling.
Optionally, go to the Execution tab and Enable trigger input, configure reporting options, or set Web service options.