microsoft/openscraping-lib-csharp

Enable CastToIntegerTransformation to transform from container too

PoLaKoSz opened this issue · 1 comment

My use case:

Given this URL: https://dev.test/index.php?PHPSESSID=a&action=profile;u=99, I wanted to extract the user ID 99 from the end of the string. My solution was to use a simple regex and cast the result to an integer:

"_transformations": [
    {
        "_type": "RegexTransformation",
        "_regex": "u=(\\d+)"
    },
    "CastToIntegerTransformation"
]
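For reference, here is a minimal sketch in plain C#, outside OpenScraping, of what the two-step chain above is meant to do: match the regex, then cast the captured digits to an int. `UserIdExtractor` and `ExtractUserId` are hypothetical names, and the sketch assumes the value we want is capture group 1 of the `u=(\d+)` pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UserIdExtractor
{
    // Hypothetical helper: mirrors the intent of the two-step chain
    // (regex match, then cast to int) using only the .NET base class library.
    public static int? ExtractUserId(string url)
    {
        // Same pattern as the "_regex" setting; group 1 captures the digits after "u=".
        Match match = Regex.Match(url, @"u=(\d+)");
        if (!match.Success)
        {
            return null;
        }

        // Second step: the plain-C# equivalent of casting the match to an int.
        return int.TryParse(match.Groups[1].Value, out int id) ? id : (int?)null;
    }

    public static void Main()
    {
        // Prints 99 for the URL from this issue.
        Console.WriteLine(ExtractUserId("https://dev.test/index.php?PHPSESSID=a&action=profile;u=99"));
    }
}
```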

But after I got this error:

Transformation chain broken at transformation type CastToIntegerTransformation

I started debugging the library and noticed that CastToIntegerTransformation does not implement ITransformationFromObject, so I cannot use it later in the transformation chain (here, after RegexTransformation).

Yes, this can easily be fixed by also implementing ITransformationFromObject, but I thought I would mention it here.
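To illustrate why the chain breaks, here is a hypothetical, heavily simplified model of a transformation chain. It is not OpenScraping's actual code: the interface and class names are made up, the exception type is my choice, and it only assumes what the error message suggests, namely that the first step reads the selected node and every later step must accept the previous step's output.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-ins for "transform from HTML" and "transform from object".
public interface IFromNodeText { object Transform(string nodeText); }
public interface IFromPreviousOutput { object Transform(object previous); }

public class RegexStep : IFromNodeText
{
    // Returns the digits captured after "u=" as a string.
    public object Transform(string nodeText) => Regex.Match(nodeText, @"u=(\d+)").Groups[1].Value;
}

// Like the original cast: it can only be the first step of a chain.
public class CastStep : IFromNodeText
{
    public object Transform(string nodeText) => int.TryParse(nodeText, out int n) ? n : (object)null;
}

// Like the proposed fix: it also accepts the previous step's output.
public class FixedCastStep : IFromNodeText, IFromPreviousOutput
{
    public object Transform(string nodeText) => int.TryParse(nodeText, out int n) ? n : (object)null;
    public object Transform(object previous) => Transform(previous?.ToString());
}

public static class Chain
{
    public static object Run(string nodeText, IList<object> steps)
    {
        object current = null;
        for (int i = 0; i < steps.Count; i++)
        {
            if (i == 0 && steps[i] is IFromNodeText first)
            {
                current = first.Transform(nodeText);
            }
            else if (i > 0 && steps[i] is IFromPreviousOutput next)
            {
                current = next.Transform(current);
            }
            else
            {
                // A later step that cannot consume the previous output breaks the chain.
                throw new InvalidOperationException(
                    $"Transformation chain broken at transformation type {steps[i].GetType().Name}");
            }
        }
        return current;
    }

    public static void Main()
    {
        const string url = "https://dev.test/index.php?PHPSESSID=a&action=profile;u=99";
        try
        {
            Run(url, new object[] { new RegexStep(), new CastStep() });
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message);
        }
        Console.WriteLine(Run(url, new object[] { new RegexStep(), new FixedCastStep() }));
    }
}
```

Running it prints the chain-broken message for `CastStep`, then `99` once the cast step also implements the "from previous output" interface.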

My extended CastToIntegerTransformation class implementation:

using System;
using System.Collections.Generic;
using HtmlAgilityPack;
using OpenScraping.Transformations;

/// <summary>
/// Class to cast the selected XPath value to <see cref="int"/>.
/// </summary>
public class CastToIntegerTransformation : ITransformationFromHtml, ITransformationFromObject
{
    /// <summary>
    /// Transforms the value of the selected HTML node to an <see cref="int"/>.
    /// </summary>
    /// <returns>The parsed <see cref="int"/>, or <c>null</c> if the value is not a valid integer.</returns>
    public object Transform(Dictionary<string, object> settings, HtmlNodeNavigator nodeNavigator, List<HtmlAgilityPack.HtmlNode> logicalParents)
    {
        var text = nodeNavigator?.Value ?? nodeNavigator?.CurrentNode?.InnerText;

        if (text != null && int.TryParse(text, out int intVal))
        {
            return intVal;
        }

        return null;
    }

    /// <summary>
    /// Transforms the input to a valid <see cref="int"/>.
    /// </summary>
    /// <param name="settings"><seealso cref="Config.TransformationConfig.ConfigAttributes"/>.</param>
    /// <param name="input">Output of the previous transformation.</param>
    /// <returns><see cref="int"/>.</returns>
    /// <exception cref="FormatException">Thrown when <paramref name="input"/> is null
    /// or is not a valid integer.</exception>
    public object Transform(Dictionary<string, object> settings, object input)
    {
        // Guard against null so a missing value raises FormatException, not NullReferenceException.
        if (input != null && int.TryParse(input.ToString(), out int number))
        {
            return number;
        }

        throw new FormatException($"Input parameter {input} is not a valid integer!");
    }
}

Thank you for this great library!

I just ran into the exact same problem. Thank you so much; it's nice not to be 'first' for a change ;)