How to get the XPath of an XmlNode instance

52

Well, I couldn't resist having a go at it. It will only work for attributes and elements, but hey... what can you expect in 15 minutes? Likewise, there may well be a cleaner way of doing it.

It's superfluous to include the index on every element (especially the root one!), but it's easier than trying to work out whether there would otherwise be any ambiguity.

using System;
using System.Text;
using System.Xml;

class Test
{
    static void Main()
    {
        string xml = @"
<root>
    <foo />
    <foo>
       <bar attr='value'/>
       <bar other='va' />
    </foo>
    <foo><bar /></foo>
</root>";
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode("//@attr");
        Console.WriteLine(FindXPath(node));
        // Round trip: the generated path should select the very same node
        Console.WriteLine(doc.SelectSingleNode(FindXPath(node)) == node);
    }

    // Walks from the node up to the document, prepending one step per level.
    static string FindXPath(XmlNode node)
    {
        StringBuilder builder = new StringBuilder();
        while (node != null)
        {
            switch (node.NodeType)
            {
                case XmlNodeType.Attribute:
                    builder.Insert(0, "/@" + node.Name);
                    // Attributes have no ParentNode; continue from the owning element
                    node = ((XmlAttribute) node).OwnerElement;
                    break;
                case XmlNodeType.Element:
                    int index = FindElementIndex((XmlElement) node);
                    builder.Insert(0, "/" + node.Name + "[" + index + "]");
                    node = node.ParentNode;
                    break;
                case XmlNodeType.Document:
                    return builder.ToString();
                default:
                    throw new ArgumentException("Only elements and attributes are supported");
            }
        }
        throw new ArgumentException("Node was not in a document");
    }

    // 1-based position of the element among its parent's child elements with the same name.
    static int FindElementIndex(XmlElement element)
    {
        XmlNode parentNode = element.ParentNode;
        if (parentNode is XmlDocument)
        {
            return 1;
        }
        XmlElement parent = (XmlElement) parentNode;
        int index = 1;
        foreach (XmlNode candidate in parent.ChildNodes)
        {
            if (candidate is XmlElement && candidate.Name == element.Name)
            {
                if (candidate == element)
                {
                    return index;
                }
                index++;
            }
        }
        throw new ArgumentException("Couldn't find element within parent");
    }
}


2008-10-27 20:35:17

+3

Jon, thanks, I used this recently. There's a bug in FindElementIndex when an element is preceded by a "nephew" of the same type. I'll make a small revision that fixes this. – harpo

+0

Thanks a lot, Jon! This saved my life today! I have an XML/XSD source tree (a tree of checkboxes so that users can remove nodes), and I save the users' selections as comma-separated XPath strings so I can later filter each user's XML feed down to just the subset of nodes they need. This worked for me. Thanks again. – Laguna

2

There's no such thing as "the" XPath of a node. For any given node there may well be many XPath expressions that will match it.
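To illustrate the point, here is a small C# sketch (the sample document and the class name `ManyPathsDemo` are made up for this example) that checks whether several quite different XPath expressions all select the same node. `XmlNode` does not overload `==`/`!=`, so the comparison is by reference.

```csharp
using System;
using System.Xml;

public static class ManyPathsDemo
{
    // Hypothetical sample document used only for this illustration
    public static XmlDocument Load()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><foo/><foo><bar attr='value'/></foo></root>");
        return doc;
    }

    // True if every expression selects the very same XmlNode instance
    public static bool AllSelectSameNode(XmlDocument doc, params string[] xpaths)
    {
        XmlNode first = doc.SelectSingleNode(xpaths[0]);
        if (first == null)
            return false;
        foreach (string xpath in xpaths)
        {
            if (doc.SelectSingleNode(xpath) != first)
                return false;
        }
        return true;
    }
}
```

With that document, `/root/foo[2]/bar`, `//bar[@attr='value']`, `/node()[1]/node()[2]/node()[1]` and `//@attr/..` all pick out the same `bar` element, so none of them is more "the" path than the others.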

You could probably work up the tree to build an expression that will match it, taking into account the index of particular elements and so on, but it's not going to be terribly nice code.

Why do you need this? There may be a better solution.


2008-10-27 20:19:00

+0

I'm calling an API of an XML editing application. I need to tell the application to hide certain nodes, which I do by calling ToggleVisibleElement, which takes an XPath. I was hoping there was an easy way to do it. – joe

+0

@Jon Skeet: see my answer to a similar question: http://stackoverflow.com/questions/451950/get-the-xpath-to-an-xelement#453814 My solution produces an XPath expression that selects a node of any type: root, element, attribute, text, comment, PI or namespace. –

20

Jon's correct that there are any number of XPath expressions that will yield the same node in an instance document. The simplest way to build an expression that unambiguously yields a specific node is a chain of node tests that use the node's position in the predicate, e.g.:

/node()[1]/node()[2]/node()[6]/node()[1]/node()[2]

Obviously this expression isn't using element names, but then if all you're trying to do is locate a node within a document, you don't need its name. It also can't be used to find attributes (attributes are not children of their element, so they have no position among its child nodes and can only be found by name), but it will find every other kind of node.

To build this expression, you need to write a method that returns a node's position among its parent's child nodes, because XmlNode doesn't expose that as a property:

static int GetNodePosition(XmlNode child) 
{ 
    for (int i=0; i<child.ParentNode.ChildNodes.Count; i++) 
    { 
     if (child.ParentNode.ChildNodes[i] == child) 
     { 
      // tricksy XPath, not starting its positions at 0 like a normal language 
      return i + 1; 
     } 
    } 
    throw new InvalidOperationException("Child node somehow not found in its parent's ChildNodes property."); 
}

(There's probably a more elegant way to do that using LINQ, since XmlNodeList implements IEnumerable, but I'm going with what I know here.)

Then you can write a recursive method like this:
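A minimal sketch of that LINQ variant (the class name `NodePositionLinq` is hypothetical). Like the loop above, it counts every child node, not just elements, which is what `node()[n]` needs, at least for documents without an XML declaration or DOCTYPE, since .NET's XPath view may not count those the way `ChildNodes` does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class NodePositionLinq
{
    // Returns the 1-based position of `child` among ALL of its parent's child nodes
    // (elements, text, comments, ...), like XPath's node()[n].
    public static int GetNodePosition(XmlNode child)
    {
        XmlNode parent = child.ParentNode
            ?? throw new ArgumentException("Node has no parent (attributes and the document node have none).", nameof(child));

        // XmlNodeList only implements the non-generic IEnumerable, hence Cast<XmlNode>()
        List<XmlNode> siblings = parent.ChildNodes.Cast<XmlNode>().ToList();

        // IndexOf falls back to reference equality, since XmlNode doesn't override Equals
        int index = siblings.IndexOf(child);
        if (index < 0)
            throw new InvalidOperationException("Child node somehow not found in its parent's ChildNodes property.");

        return index + 1; // XPath positions start at 1
    }
}
```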

static string GetXPathToNode(XmlNode node) 
{ 
    if (node.NodeType == XmlNodeType.Attribute) 
    { 
     // attributes have an OwnerElement, not a ParentNode; also they have 
     // to be matched by name, not found by position 
     return String.Format(
      "{0}/@{1}", 
      GetXPathToNode(((XmlAttribute)node).OwnerElement), 
      node.Name 
      );    
    } 
    if (node.ParentNode == null) 
    { 
     // the only node with no parent is the root node, which has no path 
     return ""; 
    } 
    // the path to a node is the path to its parent, plus "/node()[n]", where 
    // n is its position among its siblings. 
    return String.Format(
     "{0}/node()[{1}]", 
     GetXPathToNode(node.ParentNode), 
     GetNodePosition(node) 
     ); 
}

As you can see, I hacked it so that it also finds attributes.

Jon slipped his version in while I was writing mine. There's something about his code that's going to make me rant a bit now, and I apologize in advance if it sounds like I'm knocking Jon. (I'm not. I'm pretty sure the list of things Jon needs to learn from me is exceedingly short.) But I think the point I'm about to make is a pretty important one for anyone who works with XML to think about.

I suspect that Jon's solution emerged from something I see a lot of developers do: thinking of XML documents as trees of elements and attributes. I think this largely comes from developers whose primary use of XML is as a serialization format, because all the XML they're used to using is structured this way. You can spot these developers because they use the terms "node" and "element" interchangeably. This leads them to come up with solutions that treat all other node types as special cases. (I was one of these guys myself for a very long time.)

It feels like a simplifying assumption while you're making it. But it's not. It makes problems harder and code more complex. It leads you to bypass the pieces of XML technology (like the node() node test in XPath) that are specifically designed to treat all node types generically.

There's a red flag in Jon's code that would make me query it in a code review even if I didn't know what the requirements were, and that's GetElementsByTagName. Whenever I see that method in use, the question that leaps to mind is always "why does it have to be an element?" And the answer is very often "oh, does this code need to handle text nodes too?"
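To make the point concrete, here is a sketch (the names `PositionalXPath`, `Build` and `FindMismatches` are hypothetical) that restates the `node()[n]` approach compactly and then checks that every node in a document, text, comment and processing-instruction nodes included, is selected again by its own path, with no special cases beyond attributes. It assumes the document has no XML declaration, DOCTYPE or adjacent text/CDATA nodes, because .NET's XPath view may not count those the same way `ChildNodes` does.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class PositionalXPath
{
    // Compact restatement of the node()[n] approach: count ALL preceding
    // siblings, whatever their node type.
    public static string Build(XmlNode node)
    {
        if (node is XmlAttribute attr)
            return Build(attr.OwnerElement) + "/@" + attr.Name; // attributes are found by name
        if (node.ParentNode == null)
            return ""; // the document node
        int position = 1;
        for (XmlNode sib = node.PreviousSibling; sib != null; sib = sib.PreviousSibling)
            position++;
        return Build(node.ParentNode) + "/node()[" + position + "]";
    }

    // Visits every node below the document (attributes included) and returns
    // the ones whose generated path does NOT select them again.
    public static List<XmlNode> FindMismatches(XmlDocument doc)
    {
        var failures = new List<XmlNode>();
        var pending = new Stack<XmlNode>();
        foreach (XmlNode child in doc.ChildNodes)
            pending.Push(child);

        while (pending.Count > 0)
        {
            XmlNode node = pending.Pop();
            if (doc.SelectSingleNode(Build(node)) != node)
                failures.Add(node);

            if (node.Attributes != null) // only elements have attributes
                foreach (XmlAttribute a in node.Attributes)
                    if (doc.SelectSingleNode(Build(a)) != a)
                        failures.Add(a);

            foreach (XmlNode child in node.ChildNodes)
                pending.Push(child);
        }
        return failures;
    }
}
```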


2008-10-27 21:42:43

+4

Much better answer overall. –

0

This is even easier:

    ''' <summary>
    ''' Gets the full XPath of a single node.
    ''' </summary>
    ''' <param name="node">The element or attribute to describe.</param>
    ''' <returns>An absolute path such as /root[1]/foo[2]/@attr.</returns>
    Private Function GetXPath(ByVal node As Xml.XmlNode) As String
        Dim temp As String
        Dim sibling As Xml.XmlNode
        Dim previousSiblings As Integer = 1

        'I don't want to know that it was a generic document
        If node.NodeType = Xml.XmlNodeType.Document Then Return ""

        'Attributes have no ParentNode or siblings: use the owner element and the @ axis
        If node.NodeType = Xml.XmlNodeType.Attribute Then
            Return GetXPath(CType(node, Xml.XmlAttribute).OwnerElement) & "/@" & node.Name
        End If

        'Prime it
        sibling = node.PreviousSibling
        'Percolate up, counting this node's siblings that come before it
        While sibling IsNot Nothing
            'Only count if the sibling has the same name as this node
            If sibling.Name = node.Name Then
                previousSiblings += 1
            End If
            sibling = sibling.PreviousSibling
        End While

        'previousSiblings starts at 1, so every step gets an explicit [n] index
        temp = node.Name & "[" & previousSiblings.ToString() & "]"

        If node.ParentNode IsNot Nothing Then
            Return GetXPath(node.ParentNode) & "/" & temp
        End If

        Return temp
    End Function


2009-06-23 15:44:27

3

My 10p worth is a hybrid of Robert's and Corey's answers. I can only claim credit for the actual typing of the extra lines of code.

private static string GetXPathToNode(XmlNode node) 
    { 
     if (node.NodeType == XmlNodeType.Attribute) 
     { 
      // attributes have an OwnerElement, not a ParentNode; also they have 
      // to be matched by name, not found by position 
      return String.Format(
       "{0}/@{1}", 
       GetXPathToNode(((XmlAttribute)node).OwnerElement), 
       node.Name 
       ); 
     } 
     if (node.ParentNode == null) 
     { 
      // the only node with no parent is the root node, which has no path 
      return ""; 
     } 
     //get the index 
     int iIndex = 1; 
     XmlNode xnIndex = node; 
     while (xnIndex.PreviousSibling != null) { iIndex++; xnIndex = xnIndex.PreviousSibling; } 
     // the path to a node is the path to its parent, plus "/node()[n]", where 
     // n is its position among its siblings. 
     return String.Format(
      "{0}/node()[{1}]", 
      GetXPathToNode(node.ParentNode), 
      iIndex 
      ); 
    }


2009-12-18 01:37:45

1

If you do it like this, you'll get a path made of the node names plus their positions, so that nodes with the same name are told apart, like this: "/Service[1]/System[1]/Group[1]/Folder[2]/File[2]"

public string GetXPathToNode(XmlNode node) 
{   
    if (node.NodeType == XmlNodeType.Attribute) 
    {    
     // attributes have an OwnerElement, not a ParentNode; also they have    
     // to be matched by name, not found by position    
     return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name); 
    } 
    if (node.ParentNode == null) 
    {    
     // the only node with no parent is the root node, which has no path 
     return ""; 
    } 

    //get the index 
    int iIndex = 1; 
    XmlNode xnIndex = node; 
    while (xnIndex.PreviousSibling != null && xnIndex.PreviousSibling.Name == xnIndex.Name) 
    { 
     iIndex++; 
     xnIndex = xnIndex.PreviousSibling; 
    } 

    // the path to a node is the path to its parent, plus "/name[n]"; note that n only
    // counts the unbroken run of same-named siblings directly before this node.
    return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, iIndex); 
}


2011-08-31 09:34:31

1

I found that none of the above worked with XDocument, so I wrote my own code that supports XDocument and uses recursion. I think this code handles multiple identical nodes better than some of the other code here, because it first tries to go as deep into the XML path as it can and then backs up to build only what's needed. So if you have /home/white/bob and /home/white/mike and you want to create /home/white/bob/garage, the code will know how to create that. However, I didn't want to mess with predicates or wildcards, so I explicitly disallowed them; it would be easy to add support for them, though.

'These go at the top of the file:
Imports System.Linq
Imports System.Text.RegularExpressions
Imports System.Xml.Linq
Imports System.Xml.XPath

Private Sub NodeItterate(XDoc As XElement, XPath As String)
    'get the deepest path
    Dim nodes As IEnumerable(Of XElement)

    nodes = XDoc.XPathSelectElements(XPath)

    'if it doesn't exist, try the next shallower path
    If nodes.Count = 0 Then
        NodeItterate(XDoc, XPath.Substring(0, XPath.LastIndexOf("/")))
        'by this time all the required parent elements will have been constructed
        Dim ParentPath As String = XPath.Substring(0, XPath.LastIndexOf("/"))
        Dim ParentNode As XElement = XDoc.XPathSelectElement(ParentPath)
        Dim NewElementName As String = XPath.Substring(XPath.LastIndexOf("/") + 1)
        ParentNode.Add(New XElement(NewElementName))
    End If

    'if we find there are more than 1 elements at the deepest path we have access to, we can't proceed
    If nodes.Count > 1 Then
        Throw New ArgumentOutOfRangeException("There are too many paths that match your expression.")
    End If

    'if there is just one element, we can proceed
    If nodes.Count = 1 Then
        'just proceed
    End If

End Sub

Public Sub CreateXPath(ByVal XDoc As XElement, ByVal XPath As String)

    If XPath.Contains("//") OrElse XPath.Contains("*") OrElse XPath.Contains(".") Then
        Throw New ArgumentException("Can't create a path based on searches, wildcards, or relative paths.")
    End If

    'a character class: reject the path if it contains ANY predicate/function character
    If Regex.IsMatch(XPath, "[\[\]()@='<>|]") Then
        Throw New ArgumentException("Can't create a path based on predicates.")
    End If

    'we will process this recursively.
    NodeItterate(XDoc, XPath)

End Sub
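For comparison, here is a C# sketch of the same "create whatever is missing" idea for `XDocument` that avoids XPath evaluation entirely and walks the path one element name at a time. The class name `PathCreator` is hypothetical, and it assumes an absolute path of plain element names (no predicates, wildcards, attributes or namespaces) whose first step is the existing root element. Like the VB version, it refuses to continue when a step is ambiguous.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class PathCreator
{
    // Ensures that every element on an absolute path like "/home/white/bob/garage"
    // exists, creating only the missing tail, and returns the last element.
    public static XElement EnsurePath(XDocument doc, string path)
    {
        string[] steps = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        if (steps.Length == 0 || doc.Root == null || doc.Root.Name.LocalName != steps[0])
            throw new ArgumentException("Path must start with the existing root element.", nameof(path));

        XElement current = doc.Root;
        foreach (string step in steps.Skip(1))
        {
            XElement[] matches = current.Elements(step).ToArray();
            if (matches.Length > 1) // same rule as the VB version: don't guess
                throw new ArgumentOutOfRangeException(nameof(path), "There are too many paths that match your expression.");

            XElement next;
            if (matches.Length == 1)
            {
                next = matches[0]; // this level already exists: go deeper
            }
            else
            {
                next = new XElement(step); // first missing level: create it
                current.Add(next);
            }
            current = next;
        }
        return current;
    }
}
```

Calling it twice with the same path creates nothing the second time, because each step reuses an existing child whenever there is exactly one.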


2011-09-27 02:18:44 cjbarth

3

Here's a simple method I've used; it worked for me.

static string GetXpath(XmlNode node) 
    { 
     if (node.Name == "#document") 
      return String.Empty; 
     return GetXpath(node.SelectSingleNode("..")) + "/" + (node.NodeType == XmlNodeType.Attribute ? "@":String.Empty) + node.Name; 
    }


2012-08-09 16:37:45 rugg

5

I know it's an old post, but the version I liked most (the one with the names) was buggy: when a parent node has child nodes with different names, it stopped counting the index as soon as it hit the first node whose name didn't match.

Here's my fixed version of it:

/// <summary> 
/// Gets the X-Path to a given Node 
/// </summary> 
/// <param name="node">The Node to get the X-Path from</param> 
/// <returns>The X-Path of the Node</returns> 
public string GetXPathToNode(XmlNode node) 
{ 
    if (node.NodeType == XmlNodeType.Attribute) 
    { 
     // attributes have an OwnerElement, not a ParentNode; also they have    
     // to be matched by name, not found by position    
     return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name); 
    } 
    if (node.ParentNode == null) 
    { 
     // the only node with no parent is the root node, which has no path 
     return ""; 
    } 

    // Get the Index 
    int indexInParent = 1; 
    XmlNode siblingNode = node.PreviousSibling; 
    // Loop thru all Siblings 
    while (siblingNode != null) 
    { 
     // Increase the Index if the Sibling has the same Name 
     if (siblingNode.Name == node.Name) 
     { 
      indexInParent++; 
     } 
     siblingNode = siblingNode.PreviousSibling; 
    } 

    // the path to a node is the path to its parent, plus "/name[n]", where n is its position among the siblings with the same name.
    return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, indexInParent); 
}


2013-08-12 10:25:59 Roemer

1

How about using extension methods? ;) My version (based on other people's work) uses the name[index] syntax, with the index omitted when the element has no same-name "siblings". The loop that gets the element's index lives in a separate routine (also an extension method).

Just put the following in any static utility class (extension methods must be declared in a non-generic static class, so the main program class only works if it is static):

static public int GetRank(this XmlNode node) 
{ 
    // return 0 if unique, else return position 1...n in siblings with same name 
    try 
    { 
     if(node is XmlElement) 
     { 
      int rank = 1; 
      bool alone = true, found = false; 

      foreach(XmlNode n in node.ParentNode.ChildNodes) 
       if(n.Name == node.Name) // sibling with same name 
       { 
        if(n.Equals(node)) 
        { 
         if(! alone) return rank; // no need to continue 
         found = true; 
        } 
        else 
        { 
         if(found) return rank; // no need to continue 
         alone = false; 
         rank++; 
        } 
       } 

     } 
    } 
    catch{} 
    return 0; 
} 

static public string GetXPath(this XmlNode node) 
{ 
    try 
    { 
     if(node is XmlAttribute) 
      return String.Format("{0}/@{1}", (node as XmlAttribute).OwnerElement.GetXPath(), node.Name); 

     if(node is XmlText || node is XmlCDataSection) 
      return node.ParentNode.GetXPath(); 

     if(node.ParentNode == null) // the only node with no parent is the root node, which has no path 
      return ""; 

     int rank = node.GetRank(); 
     if(rank == 0) return String.Format("{0}/{1}",  node.ParentNode.GetXPath(), node.Name); 
     else   return String.Format("{0}/{1}[{2}]", node.ParentNode.GetXPath(), node.Name, rank); 
    } 
    catch{} 
    return ""; 
}


2014-06-27 12:45:57 Plasmabubble

1

I wrote some VBA for Excel to do this for a work project. It emits tuples of an XPath and the associated text of an element or attribute. The goal was to let business analysts identify and map some XML. I appreciate that this is a C# forum, but I thought this might be of interest.

Sub Parse2(oSh As Long, inode As IXMLDOMNode, Optional iXstring As String = "", Optional indexes) 


Dim chnode As IXMLDOMNode 
Dim attr As IXMLDOMAttribute 
Dim oXString As String 
Dim chld As Long 
Dim idx As Variant 
Dim addindex As Boolean 
chld = 0 
idx = 0 
addindex = False 


'determine the node type: 
Select Case inode.NodeType 

    Case NODE_ELEMENT 
     If inode.ParentNode.NodeType = NODE_DOCUMENT Then 'This gets the root node name but ignores all the namespace attributes 
      oXString = iXstring & "//" & fp(inode.nodename) 
     Else 

      'Need to deal with indexing. Where an element has siblings with the same nodeName,it needs to be indexed using [index], e.g swapstreams or schedules 

      For Each chnode In inode.ParentNode.ChildNodes 
       If chnode.NodeType = NODE_ELEMENT And chnode.nodename = inode.nodename Then chld = chld + 1 
      Next chnode 

      If chld > 1 Then '//inode has siblings of the same nodeName, so needs to be indexed 
       'Lookup the index from the indexes array 
       idx = getIndex(inode.nodename, indexes) 
       addindex = True 
      Else 
      End If 

      'build the XString 
      oXString = iXstring & "/" & fp(inode.nodename) 
      If addindex Then oXString = oXString & "[" & idx & "]" 

      'If type is element then check for attributes 
      For Each attr In inode.Attributes 
       'If the element has attributes then extract the data pair XString + Element.Name, @Attribute.Name=Attribute.Value 
       Call oSheet(oSh, oXString & "/@" & attr.Name, attr.Value) 
      Next attr 

     End If 

    Case NODE_TEXT 
     'build the XString 
     oXString = iXstring 
     Call oSheet(oSh, oXString, inode.NodeValue) 

    Case NODE_ATTRIBUTE 
    'Do nothing 
    Case NODE_CDATA_SECTION 
    'Do nothing 
    Case NODE_COMMENT 
    'Do nothing 
    Case NODE_DOCUMENT 
    'Do nothing 
    Case NODE_DOCUMENT_FRAGMENT 
    'Do nothing 
    Case NODE_DOCUMENT_TYPE 
    'Do nothing 
    Case NODE_ENTITY 
    'Do nothing 
    Case NODE_ENTITY_REFERENCE 
    'Do nothing 
    Case NODE_INVALID 
    'do nothing 
    Case NODE_NOTATION 
    'do nothing 
    Case NODE_PROCESSING_INSTRUCTION 
    'do nothing 
End Select 

'Now call Parser2 on each of inode's children. 
If inode.HasChildNodes Then 
    For Each chnode In inode.ChildNodes 
     Call Parse2(oSh, chnode, oXString, indexes) 
    Next chnode 
Set chnode = Nothing 
Else 
End If 

End Sub

It keeps track of the element counts using:

Function getIndex(tag As Variant, indexes) As Variant 
'Function to get the latest index for an xml tag from the indexes array 
'indexes array is passed from one parser function to the next up and down the tree 

Dim i As Integer 
Dim n As Integer 

If IsArrayEmpty(indexes) Then 
    ReDim indexes(1, 0) 
    indexes(0, 0) = "Tag" 
    indexes(1, 0) = "Index" 
Else 
End If 
For i = 0 To UBound(indexes, 2) 
    If indexes(0, i) = tag Then 
     'tag found, increment and return the index then exit 
     'also destroy all recorded tag names BELOW that level 
     indexes(1, i) = indexes(1, i) + 1 
     getIndex = indexes(1, i) 
     ReDim Preserve indexes(1, i) 'should keep all tags up to i but remove all below it 
     Exit Function 
    Else 
    End If 
Next i 

'tag not found so add the tag with index 1 at the end of the array 
n = UBound(indexes, 2) 
ReDim Preserve indexes(1, n + 1) 
indexes(0, n + 1) = tag 
indexes(1, n + 1) = 1 
getIndex = 1 

End Function
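In C#, the same tree walk can be sketched without the `indexes` array by computing each element's index directly from its same-name element siblings. This is not a translation of the VBA: the class name `XPathValueLister` is hypothetical, paths start with `/` rather than `//`, and the (XPath, value) pairs are returned as a list instead of being written to a sheet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class XPathValueLister
{
    // Returns (XPath, value) pairs for every attribute and every text/CDATA child.
    public static List<KeyValuePair<string, string>> Collect(XmlDocument doc)
    {
        var result = new List<KeyValuePair<string, string>>();
        if (doc.DocumentElement != null)
            Walk(doc.DocumentElement, "", result);
        return result;
    }

    private static void Walk(XmlElement element, string parentPath, List<KeyValuePair<string, string>> result)
    {
        string path = parentPath + "/" + element.Name;

        // Only add [n] when the element shares its name with an element sibling
        List<XmlElement> sameName = element.ParentNode.ChildNodes.OfType<XmlElement>()
            .Where(e => e.Name == element.Name).ToList();
        if (sameName.Count > 1)
            path += "[" + (sameName.IndexOf(element) + 1) + "]";

        foreach (XmlAttribute attr in element.Attributes)
            result.Add(new KeyValuePair<string, string>(path + "/@" + attr.Name, attr.Value));

        foreach (XmlNode child in element.ChildNodes)
        {
            if (child is XmlElement childElement)
                Walk(childElement, path, result);
            else if (child.NodeType == XmlNodeType.Text || child.NodeType == XmlNodeType.CDATA)
                result.Add(new KeyValuePair<string, string>(path, child.Value));
        }
    }
}
```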


2014-11-14 21:50:09 Sandy

0

Another solution to your problem could be to "mark" the XmlNodes you want to identify later with a custom attribute:

var id = _currentNode.OwnerDocument.CreateAttribute("some_id"); 
id.Value = Guid.NewGuid().ToString(); 
_currentNode.Attributes.Append(id);

You can store the attribute's value in a dictionary, for example, and later identify the node with an XPath query:

newOrOldDocument.SelectSingleNode(string.Format("//*[@some_id='{0}']", id.Value));

I know this isn't a direct answer to your question, but it may help if the reason you want a node's XPath is to have a way to "get back" to the node later, after you've lost your reference to it in code.

This also gets around the problems that arise when elements are added to or moved within the document, which can break the XPath (or the indexes, as suggested in other answers).
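A minimal round-trip sketch of this approach. The attribute name `some_id` comes from the answer; `NodeMarker`, `Mark` and `Find` are hypothetical helpers. It keeps the GUID string rather than the `XmlAttribute` object and looks the element up with an exact `@some_id='<guid>'` comparison; GUID strings contain no quotes, so embedding them in the XPath literal is safe.

```csharp
using System;
using System.Xml;

public static class NodeMarker
{
    // Adds a some_id attribute holding a fresh GUID and returns the GUID string,
    // which is what you would keep (e.g. in a dictionary) instead of the node.
    public static string Mark(XmlElement element)
    {
        XmlAttribute id = element.OwnerDocument.CreateAttribute("some_id");
        id.Value = Guid.NewGuid().ToString();
        element.Attributes.Append(id);
        return id.Value;
    }

    // Finds the element again with an exact match on the attribute value.
    public static XmlNode Find(XmlDocument doc, string marker)
    {
        return doc.SelectSingleNode("//*[@some_id='" + marker + "']");
    }
}
```

Because the marker travels with the element, the lookup still works after siblings are inserted in front of it, which would make a positional path such as `/root/foo[2]` point at a different element.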


2016-05-18 14:23:32 Andrei

0

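// Must be declared in a static class, because this is an extension method
public static string GetFullPath(this XmlNode node)
{
    if (node.ParentNode == null)
    {
        // The document node (or a detached node) has no parent: stop here
        return "";
    }
    // Append this node's own name (not its parent's), using XPath's "/" separator
    return $"{GetFullPath(node.ParentNode)}/{node.Name}";
}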


2017-06-29 08:26:54
