Best SQL Server method to match word phrases and order by relevance
What is the best way to rank a SQL varchar column by the number (count) of words it matches in a parameter, using four distinct criteria? This is probably not a trivial question, but I need to order the rows by "best match" according to my criteria.
Column: description varchar(100); parameter: @MyParameter varchar(100)
Output, in this order of preference:
- Exact match (the whole string matches) - always first
- Starts with (descending by the length of the parameter portion that matches)
- Word-count rank, with contiguous words ranking higher for the same count of matching words
- Word(s) matching anywhere (not contiguous)
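The four tiers above can be prototyped as a sort key before committing to a single T-SQL query. This is a minimal sketch in Python, chosen only so the ranking logic can be tested outside the database. The tier boundaries are assumptions: exact match first (case-insensitive), then rows that start with one or more leading parameter words, then rows that share any word, ordered by matched-word count and then by the longest contiguous run. Rows with no matching word sort last. The names `words` and `rank_key` are illustrative.

```python
import re


def words(text: str) -> list[str]:
    # Lower-case alphanumeric tokens; punctuation such as "," or "." is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_key(param_text: str, desc_text: str) -> tuple[int, int, int, int]:
    """Sort key for one description; smaller tuples rank first.

    Assumed tiers:
      0 = exact match (case-insensitive, outer spaces ignored)
      1 = description starts with one or more leading parameter words
      2 = at least one parameter word occurs anywhere
      3 = no parameter word occurs
    Tie-breakers (all descending): leading words shared, description words
    found in the parameter, longest contiguous run of such words.
    """
    param, desc = words(param_text), words(desc_text)
    vocab = set(param)

    leading = 0
    for p, d in zip(param, desc):
        if p != d:
            break
        leading += 1

    matched = sum(1 for w in desc if w in vocab)

    longest_run = run = 0
    for w in desc:
        run = run + 1 if w in vocab else 0
        longest_run = max(longest_run, run)

    if desc_text.strip().lower() == param_text.strip().lower():
        tier = 0
    elif leading > 0:
        tier = 1
    elif matched > 0:
        tier = 2
    else:
        tier = 3
    # Negate the counts so that an ascending sort puts larger counts first.
    return (tier, -leading, -matched, -longest_run)
```

Sorting with `sorted(rows, key=lambda r: rank_key(param, r))` produces the tier order. In T-SQL, the same tuple would become one CASE expression per component in the ORDER BY, although counting words there takes extra work.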
Words will not match exactly. Partial matches of a word are allowed and likely. A lesser value should apply to partial words for ranking, but that is not critical (for example, pot would match each of these: pot, potter, potholder, depot, depots). "Starts with" rows that also have other word matches should rank higher than those with no subsequent matches, but that is not a deal-breaker or super important.
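One way to give partial words a lesser value is a per-word weight. In this sketch the weights are arbitrary assumptions: 1.0 for a whole-word match and 0.5 when the parameter word only appears inside a longer word. The function names `word_score` and `phrase_score` are hypothetical.

```python
import re


def word_score(term: str, word: str) -> float:
    # Assumed weights: a whole-word match scores 1.0, a partial match
    # (the term appears inside a longer word) scores 0.5, otherwise 0.
    term, word = term.lower(), word.lower()
    if word == term:
        return 1.0
    if term in word:
        return 0.5
    return 0.0


def phrase_score(param_text: str, desc_text: str) -> float:
    # Each parameter word contributes its best score against any description word.
    desc_words = re.findall(r"[a-z0-9]+", desc_text.lower())
    return sum(
        max((word_score(term, w) for w in desc_words), default=0.0)
        for term in re.findall(r"[a-z0-9]+", param_text.lower())
    )
```

With these weights, "pot" scores 1.0 against "pot" and 0.5 against "potter", "potholder", "depot" and "depots". A prefix match could be weighted above an infix match if that distinction turns out to matter.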
I would like a way to rank rows where the column "starts with" the value in the parameter. Say I have the following string:
'This is my value string as a test template to rank on.'
First, I would like to get the column row in which the greatest number of the words occur.
Second, I want to rank based on occurrence (ideally) at the beginning, like:
'This is my string as a test template to rank on.' - first
'This is my string as a test template to rank on even though not exact.' - second
'This is my string as a test template to rank' - third
'This is my string as a test template to' - next
'This is my string as a test template' - next etc.
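The "starts with" ordering can be expressed as the number of leading words the description shares with the parameter. The parameter at the top contains the word "value", which none of the sample rows contain. With that parameter, every sample shares only "This is my". The sketch therefore assumes the parameter 'This is my string as a test template to rank on.' (without "value"). An exact match is placed first, and `sorted()` keeps ties in input order. The function names are hypothetical.

```python
import re


def words(text: str) -> list[str]:
    # Lower-case alphanumeric tokens, so "on." and "on" compare equal.
    return re.findall(r"[a-z0-9]+", text.lower())


def leading_words(param_text: str, desc_text: str) -> int:
    # How many words, counted from the start, the description shares with the parameter.
    n = 0
    for p, d in zip(words(param_text), words(desc_text)):
        if p != d:
            break
        n += 1
    return n


def starts_with_order(param_text: str, rows: list[str]) -> list[str]:
    # Exact match first (False sorts before True), then longer shared prefixes.
    # sorted() is stable, so rows with equal keys keep their input order.
    exact = param_text.strip().lower()
    return sorted(
        rows,
        key=lambda r: (r.strip().lower() != exact, -leading_words(param_text, r)),
    )
```

Under that assumed parameter, the first two samples both share 11 leading words. The exact-match flag is what puts the first sample ahead of the second.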
Second (possibly as a second set/group of data after the first, "starts with", group), which is desirable:
I want to rank the rows (more or less) by the count of @MyParameter words that occur in the column, with contiguous words ranking higher than the same count of separated words.
So, for example, with the string above, 'is my string as shown'
would rank higher than 'is not my other string as'
because of the "better match" of the contiguous string (words together) with the same word count. Rows with a higher match (count of words that occur) would rank first, as the best match, in descending order.
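The contiguity rule can be measured as the longest run of consecutive description words that all occur in the parameter. This is one possible definition, and an assumption: it does not require the words to be adjacent in the parameter as well. It separates the two examples above even with the original parameter, which contains "value". The names `match_stats` and `by_match` are hypothetical.

```python
import re


def words(text: str) -> list[str]:
    # Lower-case alphanumeric tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def match_stats(param_text: str, desc_text: str) -> tuple[int, int]:
    """Return (matched, longest_run) for one description.

    matched     = description words that also occur in the parameter
    longest_run = longest stretch of consecutive description words that all occur in it
    """
    vocab = set(words(param_text))
    matched = longest_run = run = 0
    for w in words(desc_text):
        if w in vocab:
            matched += 1
            run += 1
            longest_run = max(longest_run, run)
        else:
            run = 0
    return matched, longest_run


def by_match(param_text: str, rows: list[str]) -> list[str]:
    # More matched words first; for the same count, the longer contiguous run wins.
    def key(row: str) -> tuple[int, int]:
        matched, longest_run = match_stats(param_text, row)
        return (-matched, -longest_run)

    return sorted(rows, key=key)
```

Both examples match 4 words. 'is my string as shown' has a run of 4, while 'is not my other string as' has a longest run of 2, so the first ranks higher.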
If possible, I would like to do this in a single query.
No row should appear twice in the result.
For performance purposes: there will be no more than 10,000 rows in the table.
The values in the table are fairly static, with few changes, but not entirely static.
I can't change the structure at this time, but I would consider it later (for example, a word/phrase table).
To make this a bit more complicated, the word list is in two tables. I could create a view for that, but the results from one table (the smaller list) must come before the results from the second, larger data set for the same match. There will be duplicates across these tables and within a table, and I only want distinct values. SELECT DISTINCT isn't easy, because I want to return a column (sourceTable) that could make the rows distinct. In that case, only select from the first (smaller) table, but all the other columns should be DISTINCT (do not consider that column in the "distinct" evaluation).
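The "distinct, but prefer the smaller table" requirement amounts to keeping one row per key that ignores sourceTable, and choosing the row from the preferred source. This is a minimal sketch under two assumptions. First, the key is every returned column except sourceTable. Second, '0CPTMore' (the favorites source in the procedure below) outranks '2MasterCPT'. The function name `distinct_prefer_smaller` is hypothetical.

```python
KEY_COLUMNS = ("procedureCode", "description", "category", "rvu", "charge", "active")


def distinct_prefer_smaller(rows: list[dict], source_priority: dict[str, int]) -> list[dict]:
    """Keep one row per KEY_COLUMNS value (sourceTable is ignored for distinctness).

    When the same key appears in several sources, the row from the source with
    the lowest priority number wins; duplicates within one source are dropped too.
    """
    seen = set()
    result = []
    # sorted() is stable, so rows keep their relative order within each source.
    for row in sorted(rows, key=lambda r: source_priority[r["sourceTable"]]):
        key = tuple(row[c] for c in KEY_COLUMNS)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result
```

In T-SQL, the same idea is usually written as ROW_NUMBER() OVER (PARTITION BY <key columns> ORDER BY <source priority>), keeping only the rows where the row number is 1.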
Pseudo columns in the table:
procedureCode VARCHAR(50),
description VARCHAR(100), -- this is the sort/evaluation column
category VARCHAR(50),
rvu VARCHAR(50),
charge VARCHAR(15),
active BIT,
sourceTable VARCHAR(50) -- just shows which of the two tables the row comes from
A NON-unique index exists as an ID column.
Matches found in a third table must be excluded:
SELECT * FROM tableone WHERE procedureCode NOT IN (SELECT procedureCode FROM tablethree WHERE procedureCode IS NOT NULL)
UNION ALL
SELECT * FROM tabletwo WHERE procedureCode NOT IN (SELECT procedureCode FROM tablethree WHERE procedureCode IS NOT NULL)
EDIT: In an attempt to deal with this, I have created a table-valued parameter like this:
0 Gastric Intubation & Aspiration/Lavage, Treatmen
1 Gastric%Intubation%Aspiration%Lavage%Treatmen
2 Gastric%Intubation%Aspiration%Lavage
3 Gastric%Intubation%Aspiration
4 Gastric%Intubation
5 Gastric
6 Intubation%Aspiration%Lavage%Treatmen
7 Intubation%Aspiration%Lavage
8 Intubation%Aspiration
9 Intubation
10 Aspiration%Lavage%Treatmen
11 Aspiration%Lavage
12 Aspiration
13 Lavage%Treatmen
14 Lavage
15 Treatmen
where the actual phrase is in row 0.
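The parameter rows above follow a regular pattern. Row 0 holds the raw phrase. Each following row is a run of consecutive words, grouped by starting word, listed from longest to shortest and joined with the LIKE wildcard %. The sketch below generates those rows, for example on the client side before filling the table-valued parameter. Tokenizing on letters and digits (which drops "&" and the punctuation) is an assumption that happens to reproduce the table above. The function also returns a word count per row. That count would be a hypothetical extra column (for example WordCount) in the table type; it is not part of the current CPTFavorite type.

```python
import re


def search_phrases(phrase: str) -> list[tuple[int, str, int]]:
    """Build (code, pattern, word_count) rows shaped like the parameter table above.

    Row 0 is the raw phrase. Then, for each starting word, every run of
    consecutive words from longest to shortest, joined with the LIKE wildcard '%'.
    """
    # Assumed tokenization: letters and digits only, so '&', '/' and ',' are dropped.
    tokens = re.findall(r"[A-Za-z0-9]+", phrase)
    rows = [(0, phrase, len(tokens))]
    for start in range(len(tokens)):
        for end in range(len(tokens), start, -1):
            rows.append((len(rows), "%".join(tokens[start:end]), end - start))
    return rows
```

A five-word phrase yields 1 + 5 + 4 + 3 + 2 + 1 = 16 rows, which matches rows 0 to 15 above.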
Here is my current attempt at this:
CREATE PROCEDURE [GetProcedureByDescription]
(
    @IncludeMaster BIT,
    @ProcedureSearchPhrases CPTFavorite READONLY
)
AS
SET NOCOUNT ON; -- suppress row-count messages sent to the C# caller
DECLARE @myIncludeMaster BIT;
SET @myIncludeMaster = @IncludeMaster;
CREATE TABLE #DistinctMatchingCpts
(
    procedureCode VARCHAR(50),
    description VARCHAR(100),
    category VARCHAR(50),
    rvu VARCHAR(50),
    charge VARCHAR(15),
    active VARCHAR(15),
    sourceTable VARCHAR(50),
    sequenceSet VARCHAR(2)
);
IF @myIncludeMaster = 0
BEGIN -- Excluding master from search
    -- String literals use single quotes; doubled quotes are only needed inside dynamic SQL strings.
    -- ORDER BY is not allowed directly inside a UNION ALL member, so each
    -- TOP 1 ... ORDER BY query is wrapped in a derived table.
    INSERT INTO #DistinctMatchingCpts (sourceTable, procedureCode, description, category, charge, active, rvu, sequenceSet)
    SELECT DISTINCT sourceTable, procedureCode, description, category, charge, active, rvu, sequenceSet
    FROM (
        SELECT * FROM (
            SELECT TOP 1
                LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
                LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                LTRIM(RTRIM(CPT.[COMBO])) AS category,
                LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                'True' AS active,
                LTRIM(RTRIM(CPT.[RVU])) AS rvu,
                '0CPTMore' AS sourceTable,
                '01' AS sequenceSet
            FROM
                @ProcedureSearchPhrases PP
                INNER JOIN [CPTMORE] AS CPT
                    ON CPT.[LEVEL] = PP.[LEVEL]
            WHERE
                (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor', 'MOD', 'CATEGORY', 'Types', 'Bundles'))
                AND CPT.[CODE] IS NOT NULL
                AND CPT.[CODE] NOT IN ('0', '')
                AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
            ORDER BY PP.CODE
        ) AS ExactTop
        UNION ALL
        SELECT
            LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
            LTRIM(RTRIM(CPT.[LEVEL])) AS description,
            LTRIM(RTRIM(CPT.[COMBO])) AS category,
            LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
            'True' AS active,
            LTRIM(RTRIM(CPT.[RVU])) AS rvu,
            '0CPTMore' AS sourceTable,
            '02' AS sequenceSet
        FROM
            @ProcedureSearchPhrases PP
            INNER JOIN [CPTMORE] AS CPT
                ON CPT.[LEVEL] LIKE PP.[LEVEL] + '%'
        WHERE
            (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor', 'MOD', 'CATEGORY', 'Types', 'Bundles'))
            AND CPT.[CODE] IS NOT NULL
            AND CPT.[CODE] NOT IN ('0', '')
            AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
        UNION ALL
        SELECT
            LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
            LTRIM(RTRIM(CPT.[LEVEL])) AS description,
            LTRIM(RTRIM(CPT.[COMBO])) AS category,
            LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
            'True' AS active,
            LTRIM(RTRIM(CPT.[RVU])) AS rvu,
            '0CPTMore' AS sourceTable,
            '03' AS sequenceSet
        FROM
            @ProcedureSearchPhrases PP
            INNER JOIN [CPTMORE] AS CPT
                ON CPT.[LEVEL] LIKE '%' + PP.[LEVEL] + '%'
        WHERE
            (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor', 'MOD', 'CATEGORY', 'Types', 'Bundles'))
            AND CPT.[CODE] IS NOT NULL
            AND CPT.[CODE] NOT IN ('0', '')
            AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
    ) AS CPTS
    ORDER BY
        procedureCode, sourceTable, [description];
END -- Excluded master from search
ELSE
BEGIN -- Including master in search, but present favorites before master for each code
    -- Get matching procedures, ordered by match set, source (favorites first), and description.
    -- There probably will be procedures with duplicated code+description, so we will filter
    -- duplicates shortly.
    INSERT INTO #DistinctMatchingCpts (sourceTable, procedureCode, description, category, charge, active, rvu, sequenceSet)
    SELECT DISTINCT sourceTable, procedureCode, description, category, charge, active, rvu, sequenceSet
    FROM (
        SELECT * FROM (
            SELECT TOP 1
                LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
                LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                LTRIM(RTRIM(CPT.[COMBO])) AS category,
                LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                'True' AS active,
                LTRIM(RTRIM(CPT.[RVU])) AS rvu,
                '0CPTMore' AS sourceTable,
                '00' AS sequenceSet
            FROM
                @ProcedureSearchPhrases PP
                INNER JOIN [CPTMORE] AS CPT
                    ON CPT.[LEVEL] = PP.[LEVEL]
            WHERE
                (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor', 'MOD', 'CATEGORY', 'Types', 'Bundles'))
                AND CPT.[CODE] IS NOT NULL
                AND CPT.[CODE] NOT IN ('0', '')
                AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
            ORDER BY PP.CODE
        ) AS ExactTopMore
        UNION ALL
        SELECT * FROM (
            SELECT TOP 1
                LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
                LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
                LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                COALESCE(CASE CPT.[ACTIVE] WHEN 1 THEN 'True' WHEN 0 THEN 'False' WHEN '' THEN 'False' ELSE 'False' END, 'True') AS active,
                LTRIM(RTRIM(CPT.[RVU])) AS rvu,
                '2MasterCPT' AS sourceTable,
                '00' AS sequenceSet
            FROM
                @ProcedureSearchPhrases PP
                INNER JOIN [MASTERCPT] AS CPT
                    ON CPT.[LEVEL] = PP.[LEVEL]
            WHERE
                CPT.[CODE] IS NOT NULL
                AND CPT.[CODE] NOT IN ('0', '')
                AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
            ORDER BY PP.CODE
        ) AS ExactTopMaster
        UNION ALL
        SELECT
            LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
            LTRIM(RTRIM(CPT.[LEVEL])) AS description,
            LTRIM(RTRIM(CPT.[COMBO])) AS category,
            LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
            'True' AS active,
            LTRIM(RTRIM(CPT.[RVU])) AS rvu,
            '0CPTMore' AS sourceTable,
            '01' AS sequenceSet
        FROM
            @ProcedureSearchPhrases PP
            INNER JOIN [CPTMORE] AS CPT
                ON CPT.[LEVEL] = PP.[LEVEL]
        WHERE
            (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor', 'MOD', 'CATEGORY', 'Types', 'Bundles'))
            AND CPT.[CODE] IS NOT NULL
            AND CPT.[CODE] NOT IN ('0', '')
            AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
        UNION ALL
        SELECT
            LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
            LTRIM(RTRIM(CPT.[LEVEL])) AS description,
            LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
            LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
            COALESCE(CASE CPT.[ACTIVE] WHEN 1 THEN 'True' WHEN 0 THEN 'False' WHEN '' THEN 'False' ELSE 'False' END, 'True') AS active,
            LTRIM(RTRIM(CPT.[RVU])) AS rvu,
            '2MasterCPT' AS sourceTable,
            '01' AS sequenceSet
        FROM
            @ProcedureSearchPhrases PP
            INNER JOIN [MASTERCPT] AS CPT
                ON CPT.[LEVEL] = PP.[LEVEL]
        WHERE
            CPT.[CODE] IS NOT NULL
            AND CPT.[CODE] NOT IN ('0', '')
            AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
        UNION ALL
        SELECT * FROM (
            SELECT TOP 1
                LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
                LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                LTRIM(RTRIM(CPT.[COMBO])) AS category,
                LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                'True' AS active,
                LTRIM(RTRIM(CPT.[RVU])) AS rvu,
                '0CPTMore' AS sourceTable,
                '02' AS sequenceSet
            FROM
                @ProcedureSearchPhrases PP
                INNER JOIN [CPTMORE] AS CPT
                    ON CPT.[LEVEL] LIKE PP.[LEVEL] + '%'
            WHERE
                (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor', 'MOD', 'CATEGORY', 'Types', 'Bundles'))
                AND CPT.[CODE] IS NOT NULL
                AND CPT.[CODE] NOT IN ('0', '')
                AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
            ORDER BY PP.CODE
        ) AS StartsTopMore
        UNION ALL
        SELECT * FROM (
            SELECT TOP 1
                LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
                LTRIM(RTRIM(CPT.[LEVEL])) AS description,
                LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
                LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
                COALESCE(CASE CPT.[ACTIVE] WHEN 1 THEN 'True' WHEN 0 THEN 'False' WHEN '' THEN 'False' ELSE 'False' END, 'True') AS active,
                LTRIM(RTRIM(CPT.[RVU])) AS rvu,
                '2MasterCPT' AS sourceTable,
                '02' AS sequenceSet
            FROM
                @ProcedureSearchPhrases PP
                INNER JOIN [MASTERCPT] AS CPT
                    ON CPT.[LEVEL] LIKE PP.[LEVEL] + '%'
            WHERE
                CPT.[CODE] IS NOT NULL
                AND CPT.[CODE] NOT IN ('0', '')
                AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
            ORDER BY PP.CODE
        ) AS StartsTopMaster
        UNION ALL
        SELECT
            LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
            LTRIM(RTRIM(CPT.[LEVEL])) AS description,
            LTRIM(RTRIM(CPT.[COMBO])) AS category,
            LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
            'True' AS active,
            LTRIM(RTRIM(CPT.[RVU])) AS rvu,
            '0CPTMore' AS sourceTable,
            '03' AS sequenceSet
        FROM
            @ProcedureSearchPhrases PP
            INNER JOIN [CPTMORE] AS CPT
                ON CPT.[LEVEL] LIKE PP.[LEVEL] + '%'
        WHERE
            (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor', 'MOD', 'CATEGORY', 'Types', 'Bundles'))
            AND CPT.[CODE] IS NOT NULL
            AND CPT.[CODE] NOT IN ('0', '')
            AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
        UNION ALL
        SELECT
            LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
            LTRIM(RTRIM(CPT.[LEVEL])) AS description,
            LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
            LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
            COALESCE(CASE CPT.[ACTIVE] WHEN 1 THEN 'True' WHEN 0 THEN 'False' WHEN '' THEN 'False' ELSE 'False' END, 'True') AS active,
            LTRIM(RTRIM(CPT.[RVU])) AS rvu,
            '2MasterCPT' AS sourceTable,
            '03' AS sequenceSet
        FROM
            @ProcedureSearchPhrases PP
            INNER JOIN [MASTERCPT] AS CPT
                ON CPT.[LEVEL] LIKE PP.[LEVEL] + '%'
        WHERE
            CPT.[CODE] IS NOT NULL
            AND CPT.[CODE] NOT IN ('0', '')
            AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
        UNION ALL
        SELECT
            LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
            LTRIM(RTRIM(CPT.[LEVEL])) AS description,
            LTRIM(RTRIM(CPT.[COMBO])) AS category,
            LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
            'True' AS active,
            LTRIM(RTRIM(CPT.[RVU])) AS rvu,
            '0CPTMore' AS sourceTable,
            '04' AS sequenceSet
        FROM
            @ProcedureSearchPhrases PP
            INNER JOIN [CPTMORE] AS CPT
                ON CPT.[LEVEL] LIKE '%' + PP.[LEVEL] + '%'
        WHERE
            (CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN ('Editor', 'MOD', 'CATEGORY', 'Types', 'Bundles'))
            AND CPT.[CODE] IS NOT NULL
            AND CPT.[CODE] NOT IN ('0', '')
            AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
        UNION ALL
        SELECT
            LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
            LTRIM(RTRIM(CPT.[LEVEL])) AS description,
            LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
            LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
            COALESCE(CASE CPT.[ACTIVE] WHEN 1 THEN 'True' WHEN 0 THEN 'False' WHEN '' THEN 'False' ELSE 'False' END, 'True') AS active,
            LTRIM(RTRIM(CPT.[RVU])) AS rvu,
            '2MasterCPT' AS sourceTable,
            '04' AS sequenceSet
        FROM
            @ProcedureSearchPhrases PP
            INNER JOIN [MASTERCPT] AS CPT
                ON CPT.[LEVEL] LIKE '%' + PP.[LEVEL] + '%'
        WHERE
            CPT.[CODE] IS NOT NULL
            AND CPT.[CODE] NOT IN ('0', '')
            AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
    ) AS CPTS
    ORDER BY
        sequenceSet, sourceTable, [description];
END
/* Final select: a temp table keeps no insertion order, so the ORDER BY must be on this outermost query */
SELECT TOP 500 procedureCode, description, category, rvu, charge, active
FROM #DistinctMatchingCpts
ORDER BY sequenceSet, sourceTable, description;
DROP TABLE #DistinctMatchingCpts;
However, this does NOT meet the best-match criterion on word count (as with the value in row 1 of the sample): a row should match on the best (highest) word count found for that row.
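One likely contributor is that the parameter row number (PP.CODE) is ordered by starting word first. Ordering by it, as the TOP 1 ... ORDER BY PP.CODE branches do, therefore does not necessarily select the pattern with the most words. The sketch below emulates the '%' + pattern + '%' comparison with a regular expression and compares the lowest matching code against the match with the most words, for one made-up description. Both emulation choices are assumptions: only the % wildcard is handled, and matching is case-insensitive, as under a typical case-insensitive collation. The helper names are hypothetical.

```python
import re
from typing import Optional

# Parameter rows from the question; the list index plays the role of PP.CODE.
PHRASES = [
    "Gastric Intubation & Aspiration/Lavage, Treatmen",
    "Gastric%Intubation%Aspiration%Lavage%Treatmen",
    "Gastric%Intubation%Aspiration%Lavage",
    "Gastric%Intubation%Aspiration",
    "Gastric%Intubation",
    "Gastric",
    "Intubation%Aspiration%Lavage%Treatmen",
    "Intubation%Aspiration%Lavage",
    "Intubation%Aspiration",
    "Intubation",
    "Aspiration%Lavage%Treatmen",
    "Aspiration%Lavage",
    "Aspiration",
    "Lavage%Treatmen",
    "Lavage",
    "Treatmen",
]


def like(value: str, pattern: str) -> bool:
    # Minimal LIKE emulation: only '%' is a wildcard, matching is case-insensitive
    # (assumed collation); '_' and '[...]' are not handled.
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, re.IGNORECASE | re.DOTALL) is not None


def word_count(pattern: str) -> int:
    return pattern.count("%") + 1


def matching_codes(description: str) -> list[int]:
    # Codes 1..15 whose pattern matches as '%' + pattern + '%' (the "contains" test).
    return [
        code for code, pattern in enumerate(PHRASES)
        if code > 0 and like(description, "%" + pattern + "%")
    ]


def best_by_word_count(description: str) -> Optional[int]:
    # Most words first; the lower code breaks ties. None when nothing matches.
    codes = matching_codes(description)
    return max(codes, key=lambda c: (word_count(PHRASES[c]), -c), default=None)
```

For 'Intubation, aspiration and lavage with gastric tube', the lowest matching code is 5 ("Gastric", one word). The three-word match is code 7 ("Intubation%Aspiration%Lavage"). In T-SQL, a per-row equivalent would be ROW_NUMBER() OVER (PARTITION BY CPT.[CODE] ORDER BY <word count> DESC, PP.CODE). The word count could be computed as LEN(PP.[LEVEL]) - LEN(REPLACE(PP.[LEVEL], '%', '')) + 1, or passed in as an extra column.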
I have full control over the shape/format of the table-valued parameter, if that makes a difference.
I'm returning this result to a C# program, if that helps.
Do any of these answer your question?
Several answers, some ideas, but none fully sufficient to produce a complete result set that meets the list of criteria. I'm currently prototyping an algorithm that seems to do what I want; once I have examined it fully, I'll determine whether it is a viable solution that meets those goals.