commit 0717e50c97ae74bc9fda0f07ce4cc6934d69c8e4
Author: Han, Weidong
Date:   Thu Feb 26 17:31:12 2009 +0800

    intel-iommu: fix PCI device detach from virtual machine

    When assigning a device behind a conventional PCI bridge or a
    PCIe-to-PCI/PCI-X bridge to a domain, its bridge must be assigned to
    the same domain, and the bridge's secondary interface may need to be
    assigned as well. This dependent assignment is already in place, but
    the corresponding dependent deassignment is missing when a device is
    detached from a virtual machine. As a result, assigning a conventional
    PCI device fails after it has been assigned once. This patch adds the
    dependent deassignment and fixes the issue.

    Signed-off-by: Weidong Han
    Signed-off-by: David Woodhouse
    (cherry picked from commit 3199aa6bc8766e17b8f60820c4f78d59c25fce0e)

commit bbad20263a1aa62f406847a98706f4509d66062e
Author: David Woodhouse
Date:   Sun May 10 23:57:41 2009 +0100

    intel-iommu: PAE memory corruption fix

    PAGE_MASK is 0xFFFFF000 on i386 -- even with PAE.

    So it's not sufficient to ensure that you use phys_addr_t or uint64_t
    everywhere you handle physical addresses -- you also have to avoid
    using the construct 'addr & PAGE_MASK', because that will strip the
    high 32 bits of the address.

    This patch avoids that problem by using PHYSICAL_PAGE_MASK instead of
    PAGE_MASK where appropriate. It leaves '& PAGE_MASK' in a few
    instances that don't matter -- where it's being used on the virtual
    bus addresses we're dishing out, which are 32-bit anyway.

    Since PHYSICAL_PAGE_MASK is not present on other architectures, we
    have to define it (to PAGE_MASK) if it's not already defined.

    Maybe it would be better just to fix PAGE_MASK for i386/PAE?

    Signed-off-by: David Woodhouse
    Signed-off-by: Linus Torvalds
    (cherry picked from commit fd18de50b9e7965f93d231e7390436fb8900c0e6)
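The truncation is easy to reproduce outside the kernel. Here is a minimal stand-alone sketch (not part of the patch; the two mask definitions below are simplified stand-ins for the kernel's i386 definitions):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
/* On i386, unsigned long is 32-bit, so this is 0xFFFFF000. */
#define PAGE_MASK ((uint32_t)~((1UL << PAGE_SHIFT) - 1))
/* 64-bit-safe variant, analogous to PHYSICAL_PAGE_MASK. */
#define PHYSICAL_PAGE_MASK (~((1ULL << PAGE_SHIFT) - 1))

int main(void)
{
	uint64_t paddr = 0x12345678abcULL;	/* a PAE physical address above 4GB */

	/* The 32-bit mask is zero-extended to 0x00000000FFFFF000, so the
	 * '&' silently discards the high bits of the address: */
	printf("paddr & PAGE_MASK          = %#llx\n",
	       (unsigned long long)(paddr & PAGE_MASK));	  /* 0x45678000 */
	printf("paddr & PHYSICAL_PAGE_MASK = %#llx\n",
	       (unsigned long long)(paddr & PHYSICAL_PAGE_MASK)); /* 0x12345678000 */
	return 0;
}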
commit c405c91435e937ed05e7858913262295ac68047c
Author: David Woodhouse
Date:   Mon Apr 6 13:30:01 2009 -0700

    intel-iommu: Fix oops in device_to_iommu() when devices not found.

    It's possible for a device in the drhd->devices[] array to be NULL if
    it wasn't found at boot time, which means we have to check for that
    case.

    Signed-off-by: David Woodhouse
    (cherry picked from commit 4958c5dc7bcb2e42d985cd26aeafd8a7eca9ab1e)

commit 0210ca0870810580f8d26b0b4b56497814fc4898
Author: David Woodhouse
Date:   Sat Apr 4 00:39:25 2009 +0100

    intel-iommu: Fix device-to-iommu mapping for PCI-PCI bridges.

    When the DMAR table identifies that a PCI-PCI bridge belongs to a
    given IOMMU, that means that the bridge and all devices behind it
    should be associated with the IOMMU. Not just the bridge itself.

    This fixes the device_to_iommu() function accordingly.

    (It's broken if you have the same PCI bus numbers in multiple domains,
    but this function was always broken in that way; I'll be dealing with
    that later).

    Signed-off-by: David Woodhouse
    (cherry picked from commit 924b6231edfaf1e764ffb4f97ea382bf4facff58)
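Neither device_to_iommu() change appears in the diff below (those hunks were merged separately), so as a reading aid, here is an approximate sketch of the function's shape after both fixes; the names mirror the kernel's, but treat the details as illustrative:

struct intel_iommu *device_to_iommu(u8 bus, u8 devfn)
{
	struct dmar_drhd_unit *drhd = NULL;
	int i;

	for_each_drhd_unit(drhd) {
		if (drhd->ignored)
			continue;

		for (i = 0; i < drhd->devices_cnt; i++) {
			/* NULL if the device wasn't found at boot time */
			if (!drhd->devices[i])
				continue;
			if (drhd->devices[i]->bus->number == bus &&
			    drhd->devices[i]->devfn == devfn)
				return drhd->iommu;
			/* a listed PCI-PCI bridge covers every bus behind it */
			if (drhd->devices[i]->subordinate &&
			    drhd->devices[i]->subordinate->number <= bus &&
			    drhd->devices[i]->subordinate->subordinate >= bus)
				return drhd->iommu;
		}

		if (drhd->include_all)
			return drhd->iommu;
	}

	return NULL;
}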
commit f4e871ae555664b64f97ccf301bf189487530528
Author: Linus Torvalds
Date:   Tue Apr 7 07:59:41 2009 -0700

    Fix build errors due to CONFIG_BRANCH_TRACER=y

    The code that enables branch tracing for all (non-constant) branches
    plays games with the preprocessor and #define's the C 'if ()'
    construct to do tracing.

    That's all fine, but it fails for some unusual but valid C code that
    is sometimes used in macros, notably by the intel-iommu code:

        if (i=drhd->iommu, drhd->ignored) ..

    because now the preprocessor complains about multiple arguments to
    the 'if' macro.

    So make the macro expansion of this particularly horrid trick use
    varargs, and handle the case of comma-expressions in if-statements.
    Use another macro to do it cleanly in just one place.

    This replaces a patch by David (and acked by Steven) that did this
    all inside that one already-too-horrid macro.

    Tested-by: Ingo Molnar
    Cc: David Woodhouse
    Cc: Steven Rostedt
    Signed-off-by: Linus Torvalds
    (cherry picked from commit aeeae86859f4319de0a4946b44771d9926eeed54)
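The failure is reproducible in plain C. A simplified demonstration of the varargs fix (the real __trace_if() in include/linux/compiler.h records branch statistics instead of merely evaluating its argument):

#include <stdio.h>

/* Same shape as the fixed macro: extra "arguments" produced by a comma
 * expression are pasted back into one parenthesized condition. */
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
#define __trace_if(cond) if (cond)

struct unit { int ignored; };

int main(void)
{
	struct unit u = { 0 };
	struct unit *i;

	/* With the old single-parameter '#define if(cond)', the comma below
	 * made the preprocessor see two macro arguments and the build broke.
	 * With the varargs form it expands to: if ((i = &u , i->ignored)) */
	if (i = &u, i->ignored)
		printf("ignored\n");
	else
		printf("active %p\n", (void *)i);
	return 0;
}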
commit 698149762a0af0c54259f543b660168b903d4ae4
Author: Fenghua Yu
Date:   Fri Mar 27 14:22:42 2009 -0700

    Intel IOMMU Suspend/Resume Support - DMAR

    This patch implements the suspend and resume feature for Intel IOMMU
    DMAR. It hooks into the kernel suspend and resume interface. On
    suspend, it saves the necessary hardware registers. On resume, it
    restores those registers and restarts the IOMMU by enabling
    translation, setting up the root entry, and re-enabling queued
    invalidation.

    Signed-off-by: Fenghua Yu
    Acked-by: Ingo Molnar
    Signed-off-by: David Woodhouse
    (cherry picked from commit f59c7b69bcba31cd355ababe067202b9895d6102)

commit adfd653bdf2cb37b90ab44fb571b746e08755c1e
Author: Fenghua Yu
Date:   Fri Mar 27 14:22:43 2009 -0700

    Intel IOMMU Suspend/Resume Support - Queued Invalidation

    This patch supports queued invalidation suspend/resume.

    Signed-off-by: Fenghua Yu
    Acked-by: Ingo Molnar
    Signed-off-by: David Woodhouse
    (cherry picked from commit eb4a52bc660ea835482c582eaaf4893742cbd160)

commit 19eadfec5de05cdaec5f5dbc2002c19aa5217e91
Author: David Woodhouse
Date:   Fri Apr 3 15:19:32 2009 +0100

    intel-iommu: Add for_each_iommu() and for_each_active_iommu() macros

    Signed-off-by: David Woodhouse
    Acked-by: Ingo Molnar
    (cherry picked from commit 8f912ba4d7cdaf7d31cf39fe5a9b7732308a256d)
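For orientation, a small sketch of how the new iterator reads (kernel context assumed; the helper name here is made up). The comma expression inside the macro assigns drhd->iommu to the iterator variable on every step, and ignored units fall into the empty then-branch of the dangling-else-safe 'if ... {} else', so the loop body only runs for active units:

/* hypothetical example, not part of the patch */
static int __init count_active_iommus(void)
{
	struct dmar_drhd_unit *drhd;
	struct intel_iommu *iommu = NULL;
	int n = 0;

	for_each_active_iommu(iommu, drhd) {
		/* here iommu == drhd->iommu and drhd->ignored == 0 */
		n++;
	}
	return n;
}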
"DMA Read" : "DMA Write"), + (source_id >> 8), PCI_SLOT(source_id & 0xFF), + PCI_FUNC(source_id & 0xFF), addr, fault_reason, reason); + return 0; +} + +#define PRIMARY_FAULT_REG_LEN (16) +irqreturn_t dmar_fault(int irq, void *dev_id) +{ + struct intel_iommu *iommu = dev_id; + int reg, fault_index; + u32 fault_status; + unsigned long flag; + + spin_lock_irqsave(&iommu->register_lock, flag); + fault_status = readl(iommu->reg + DMAR_FSTS_REG); + + /* TBD: ignore advanced fault log currently */ + if (!(fault_status & DMA_FSTS_PPF)) + goto clear_overflow; + + fault_index = dma_fsts_fault_record_index(fault_status); + reg = cap_fault_reg_offset(iommu->cap); + while (1) { + u8 fault_reason; + u16 source_id; + u64 guest_addr; + int type; + u32 data; + + /* highest 32 bits */ + data = readl(iommu->reg + reg + + fault_index * PRIMARY_FAULT_REG_LEN + 12); + if (!(data & DMA_FRCD_F)) + break; + + fault_reason = dma_frcd_fault_reason(data); + type = dma_frcd_type(data); + + data = readl(iommu->reg + reg + + fault_index * PRIMARY_FAULT_REG_LEN + 8); + source_id = dma_frcd_source_id(data); + + guest_addr = dmar_readq(iommu->reg + reg + + fault_index * PRIMARY_FAULT_REG_LEN); + guest_addr = dma_frcd_page_addr(guest_addr); + /* clear the fault */ + writel(DMA_FRCD_F, iommu->reg + reg + + fault_index * PRIMARY_FAULT_REG_LEN + 12); + + spin_unlock_irqrestore(&iommu->register_lock, flag); + + dmar_fault_do_one(iommu, type, fault_reason, + source_id, guest_addr); + + fault_index++; + if (fault_index > cap_num_fault_regs(iommu->cap)) + fault_index = 0; + spin_lock_irqsave(&iommu->register_lock, flag); + } +clear_overflow: + /* clear primary fault overflow */ + fault_status = readl(iommu->reg + DMAR_FSTS_REG); + if (fault_status & DMA_FSTS_PFO) + writel(DMA_FSTS_PFO, iommu->reg + DMAR_FSTS_REG); + + spin_unlock_irqrestore(&iommu->register_lock, flag); + return IRQ_HANDLED; +} + +int dmar_set_interrupt(struct intel_iommu *iommu) +{ + int irq, ret; + + irq = create_irq(); + if (!irq) { + printk(KERN_ERR "IOMMU: no free vectors\n"); + return -EINVAL; + } + + set_irq_data(irq, iommu); + iommu->irq = irq; + + ret = arch_setup_dmar_msi(irq); + if (ret) { + set_irq_data(irq, NULL); + iommu->irq = 0; + destroy_irq(irq); + return 0; + } + + ret = request_irq(irq, dmar_fault, 0, iommu->name, iommu); + if (ret) + printk(KERN_ERR "IOMMU: can't request irq\n"); + return ret; +} + +int __init enable_drhd_fault_handling(void) +{ + struct dmar_drhd_unit *drhd; + + /* + * Enable fault control interrupt. + */ + for_each_drhd_unit(drhd) { + int ret; + struct intel_iommu *iommu = drhd->iommu; + ret = dmar_set_interrupt(iommu); + + if (ret) { + printk(KERN_ERR "DRHD %Lx: failed to enable fault, " + " interrupt, ret %d\n", + (unsigned long long)drhd->reg_base_addr, ret); + return -1; + } + } + + return 0; +} + +/* + * Re-enable Queued Invalidation interface. + */ +int dmar_reenable_qi(struct intel_iommu *iommu) +{ + if (!ecap_qis(iommu->ecap)) + return -ENOENT; + + if (!iommu->qi) + return -ENOENT; + + /* + * First disable queued invalidation. + */ + dmar_disable_qi(iommu); + /* + * Then enable queued invalidation again. Since there is no pending + * invalidation requests now, it's safe to re-enable queued + * invalidation. 
 drivers/pci/dmar.c           |  311 +++++++++++++++++++++++++++++++--
 drivers/pci/intel-iommu.c    |  405 ++++++++++++++++++++++---------------------
 drivers/pci/intr_remapping.c |   44 ++++
 include/linux/compiler.h     |    4
 include/linux/dmar.h         |   10 +
 include/linux/intel-iommu.h  |   12 +
 6 files changed, 581 insertions(+), 205 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index 8a01120..8734b14 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -31,6 +31,8 @@
 #include <linux/iova.h>
 #include <linux/intel-iommu.h>
 #include <linux/timer.h>
+#include <linux/irq.h>
+#include <linux/interrupt.h>
 
 #undef PREFIX
 #define PREFIX "DMAR:"
@@ -757,14 +759,77 @@ int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 }
 
 /*
+ * Disable Queued Invalidation interface.
+ */
+void dmar_disable_qi(struct intel_iommu *iommu)
+{
+	unsigned long flags;
+	u32 sts;
+	cycles_t start_time = get_cycles();
+
+	if (!ecap_qis(iommu->ecap))
+		return;
+
+	spin_lock_irqsave(&iommu->register_lock, flags);
+
+	sts = dmar_readq(iommu->reg + DMAR_GSTS_REG);
+	if (!(sts & DMA_GSTS_QIES))
+		goto end;
+
+	/*
+	 * Give a chance to HW to complete the pending invalidation requests.
+	 */
+	while ((readl(iommu->reg + DMAR_IQT_REG) !=
+		readl(iommu->reg + DMAR_IQH_REG)) &&
+		(DMAR_OPERATION_TIMEOUT > (get_cycles() - start_time)))
+		cpu_relax();
+
+	iommu->gcmd &= ~DMA_GCMD_QIE;
+
+	writel(iommu->gcmd, iommu->reg + DMAR_GCMD_REG);
+
+	IOMMU_WAIT_OP(iommu, DMAR_GSTS_REG, readl,
+		      !(sts & DMA_GSTS_QIES), sts);
+end:
+	spin_unlock_irqrestore(&iommu->register_lock, flags);
+}
+
+/*
+ * Enable queued invalidation.
+ */
+static void __dmar_enable_qi(struct intel_iommu *iommu)
+{
+	u32 cmd, sts;
+	unsigned long flags;
+	struct q_inval *qi = iommu->qi;
+
+	qi->free_head = qi->free_tail = 0;
+	qi->free_cnt = QI_LENGTH;
+
+	spin_lock_irqsave(&iommu->register_lock, flags);
+
+	/* write zero to the tail reg */
+	writel(0, iommu->reg + DMAR_IQT_REG);
+
+	dmar_writeq(iommu->reg + DMAR_IQA_REG, virt_to_phys(qi->desc));
+
+	cmd = iommu->gcmd | DMA_GCMD_QIE;
+	iommu->gcmd |= DMA_GCMD_QIE;
+	writel(cmd, iommu->reg + DMAR_GCMD_REG);
+
+	/* Make sure hardware complete it */
+	IOMMU_WAIT_OP(iommu, DMAR_GSTS_REG, readl, (sts & DMA_GSTS_QIES), sts);
+
+	spin_unlock_irqrestore(&iommu->register_lock, flags);
+}
+
+/*
  * Enable Queued Invalidation interface. This is a must to support
  * interrupt-remapping. Also used by DMA-remapping, which replaces
  * register based IOTLB invalidation.
  */
 int dmar_enable_qi(struct intel_iommu *iommu)
 {
-	u32 cmd, sts;
-	unsigned long flags;
 	struct q_inval *qi;
 
 	if (!ecap_qis(iommu->ecap))
@@ -802,19 +867,241 @@ int dmar_enable_qi(struct intel_iommu *iommu)
 
 	spin_lock_init(&qi->q_lock);
 
-	spin_lock_irqsave(&iommu->register_lock, flags);
-	/* write zero to the tail reg */
-	writel(0, iommu->reg + DMAR_IQT_REG);
+	__dmar_enable_qi(iommu);
 
-	dmar_writeq(iommu->reg + DMAR_IQA_REG, virt_to_phys(qi->desc));
+	return 0;
+}
 
-	cmd = iommu->gcmd | DMA_GCMD_QIE;
-	iommu->gcmd |= DMA_GCMD_QIE;
-	writel(cmd, iommu->reg + DMAR_GCMD_REG);
+/* iommu interrupt handling. Most stuff are MSI-like. */
 
-	/* Make sure hardware complete it */
-	IOMMU_WAIT_OP(iommu, DMAR_GSTS_REG, readl, (sts & DMA_GSTS_QIES), sts);
-	spin_unlock_irqrestore(&iommu->register_lock, flags);
+static const char *fault_reason_strings[] =
+{
+	"Software",
+	"Present bit in root entry is clear",
+	"Present bit in context entry is clear",
+	"Invalid context entry",
+	"Access beyond MGAW",
+	"PTE Write access is not set",
+	"PTE Read access is not set",
+	"Next page table ptr is invalid",
+	"Root table address invalid",
+	"Context table ptr is invalid",
+	"non-zero reserved fields in RTP",
+	"non-zero reserved fields in CTP",
+	"non-zero reserved fields in PTE",
+};
+#define MAX_FAULT_REASON_IDX	(ARRAY_SIZE(fault_reason_strings) - 1)
+
+const char *dmar_get_fault_reason(u8 fault_reason)
+{
+	if (fault_reason > MAX_FAULT_REASON_IDX)
+		return "Unknown";
+	else
+		return fault_reason_strings[fault_reason];
+}
+
+void dmar_msi_unmask(unsigned int irq)
+{
+	struct intel_iommu *iommu = get_irq_data(irq);
+	unsigned long flag;
+
+	/* unmask it */
+	spin_lock_irqsave(&iommu->register_lock, flag);
+	writel(0, iommu->reg + DMAR_FECTL_REG);
+	/* Read a reg to force flush the post write */
+	readl(iommu->reg + DMAR_FECTL_REG);
+	spin_unlock_irqrestore(&iommu->register_lock, flag);
+}
+
+void dmar_msi_mask(unsigned int irq)
+{
+	unsigned long flag;
+	struct intel_iommu *iommu = get_irq_data(irq);
+
+	/* mask it */
+	spin_lock_irqsave(&iommu->register_lock, flag);
+	writel(DMA_FECTL_IM, iommu->reg + DMAR_FECTL_REG);
+	/* Read a reg to force flush the post write */
+	readl(iommu->reg + DMAR_FECTL_REG);
+	spin_unlock_irqrestore(&iommu->register_lock, flag);
+}
+
+void dmar_msi_write(int irq, struct msi_msg *msg)
+{
+	struct intel_iommu *iommu = get_irq_data(irq);
+	unsigned long flag;
+
+	spin_lock_irqsave(&iommu->register_lock, flag);
+	writel(msg->data, iommu->reg + DMAR_FEDATA_REG);
+	writel(msg->address_lo, iommu->reg + DMAR_FEADDR_REG);
+	writel(msg->address_hi, iommu->reg + DMAR_FEUADDR_REG);
+	spin_unlock_irqrestore(&iommu->register_lock, flag);
+}
+
+void dmar_msi_read(int irq, struct msi_msg *msg)
+{
+	struct intel_iommu *iommu = get_irq_data(irq);
+	unsigned long flag;
+
+	spin_lock_irqsave(&iommu->register_lock, flag);
+	msg->data = readl(iommu->reg + DMAR_FEDATA_REG);
+	msg->address_lo = readl(iommu->reg + DMAR_FEADDR_REG);
+	msg->address_hi = readl(iommu->reg + DMAR_FEUADDR_REG);
+	spin_unlock_irqrestore(&iommu->register_lock, flag);
+}
+
+static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
+		u8 fault_reason, u16 source_id, unsigned long long addr)
+{
+	const char *reason;
+
+	reason = dmar_get_fault_reason(fault_reason);
+
+	printk(KERN_ERR
+		"DMAR:[%s] Request device [%02x:%02x.%d] "
+		"fault addr %llx \n"
+		"DMAR:[fault reason %02d] %s\n",
+		(type ? "DMA Read" : "DMA Write"),
+		(source_id >> 8), PCI_SLOT(source_id & 0xFF),
+		PCI_FUNC(source_id & 0xFF), addr, fault_reason, reason);
+	return 0;
+}
+
+#define PRIMARY_FAULT_REG_LEN (16)
+irqreturn_t dmar_fault(int irq, void *dev_id)
+{
+	struct intel_iommu *iommu = dev_id;
+	int reg, fault_index;
+	u32 fault_status;
+	unsigned long flag;
+
+	spin_lock_irqsave(&iommu->register_lock, flag);
+	fault_status = readl(iommu->reg + DMAR_FSTS_REG);
+
+	/* TBD: ignore advanced fault log currently */
+	if (!(fault_status & DMA_FSTS_PPF))
+		goto clear_overflow;
+
+	fault_index = dma_fsts_fault_record_index(fault_status);
+	reg = cap_fault_reg_offset(iommu->cap);
+	while (1) {
+		u8 fault_reason;
+		u16 source_id;
+		u64 guest_addr;
+		int type;
+		u32 data;
+
+		/* highest 32 bits */
+		data = readl(iommu->reg + reg +
+				fault_index * PRIMARY_FAULT_REG_LEN + 12);
+		if (!(data & DMA_FRCD_F))
+			break;
+
+		fault_reason = dma_frcd_fault_reason(data);
+		type = dma_frcd_type(data);
+
+		data = readl(iommu->reg + reg +
+				fault_index * PRIMARY_FAULT_REG_LEN + 8);
+		source_id = dma_frcd_source_id(data);
+
+		guest_addr = dmar_readq(iommu->reg + reg +
+				fault_index * PRIMARY_FAULT_REG_LEN);
+		guest_addr = dma_frcd_page_addr(guest_addr);
+		/* clear the fault */
+		writel(DMA_FRCD_F, iommu->reg + reg +
+			fault_index * PRIMARY_FAULT_REG_LEN + 12);
+
+		spin_unlock_irqrestore(&iommu->register_lock, flag);
+
+		dmar_fault_do_one(iommu, type, fault_reason,
+				source_id, guest_addr);
+
+		fault_index++;
+		if (fault_index > cap_num_fault_regs(iommu->cap))
+			fault_index = 0;
+		spin_lock_irqsave(&iommu->register_lock, flag);
+	}
+clear_overflow:
+	/* clear primary fault overflow */
+	fault_status = readl(iommu->reg + DMAR_FSTS_REG);
+	if (fault_status & DMA_FSTS_PFO)
+		writel(DMA_FSTS_PFO, iommu->reg + DMAR_FSTS_REG);
+
+	spin_unlock_irqrestore(&iommu->register_lock, flag);
+	return IRQ_HANDLED;
+}
+
+int dmar_set_interrupt(struct intel_iommu *iommu)
+{
+	int irq, ret;
+
+	irq = create_irq();
+	if (!irq) {
+		printk(KERN_ERR "IOMMU: no free vectors\n");
+		return -EINVAL;
+	}
+
+	set_irq_data(irq, iommu);
+	iommu->irq = irq;
+
+	ret = arch_setup_dmar_msi(irq);
+	if (ret) {
+		set_irq_data(irq, NULL);
+		iommu->irq = 0;
+		destroy_irq(irq);
+		return 0;
+	}
+
+	ret = request_irq(irq, dmar_fault, 0, iommu->name, iommu);
+	if (ret)
+		printk(KERN_ERR "IOMMU: can't request irq\n");
+	return ret;
+}
+
+int __init enable_drhd_fault_handling(void)
+{
+	struct dmar_drhd_unit *drhd;
+
+	/*
+	 * Enable fault control interrupt.
+	 */
+	for_each_drhd_unit(drhd) {
+		int ret;
+		struct intel_iommu *iommu = drhd->iommu;
+		ret = dmar_set_interrupt(iommu);
+
+		if (ret) {
+			printk(KERN_ERR "DRHD %Lx: failed to enable fault, "
+			       " interrupt, ret %d\n",
+			       (unsigned long long)drhd->reg_base_addr, ret);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Re-enable Queued Invalidation interface.
+ */
+int dmar_reenable_qi(struct intel_iommu *iommu)
+{
+	if (!ecap_qis(iommu->ecap))
+		return -ENOENT;
+
+	if (!iommu->qi)
+		return -ENOENT;
+
+	/*
+	 * First disable queued invalidation.
+	 */
+	dmar_disable_qi(iommu);
+	/*
+	 * Then enable queued invalidation again. Since there is no pending
+	 * invalidation requests now, it's safe to re-enable queued
+	 * invalidation.
+	 */
+	__dmar_enable_qi(iommu);
 
 	return 0;
 }
"DMA Read" : "DMA Write"), - (source_id >> 8), PCI_SLOT(source_id & 0xFF), - PCI_FUNC(source_id & 0xFF), addr, fault_reason, reason); - return 0; -} - -#define PRIMARY_FAULT_REG_LEN (16) -static irqreturn_t iommu_page_fault(int irq, void *dev_id) -{ - struct intel_iommu *iommu = dev_id; - int reg, fault_index; - u32 fault_status; - unsigned long flag; - - spin_lock_irqsave(&iommu->register_lock, flag); - fault_status = readl(iommu->reg + DMAR_FSTS_REG); - - /* TBD: ignore advanced fault log currently */ - if (!(fault_status & DMA_FSTS_PPF)) - goto clear_overflow; - - fault_index = dma_fsts_fault_record_index(fault_status); - reg = cap_fault_reg_offset(iommu->cap); - while (1) { - u8 fault_reason; - u16 source_id; - u64 guest_addr; - int type; - u32 data; - - /* highest 32 bits */ - data = readl(iommu->reg + reg + - fault_index * PRIMARY_FAULT_REG_LEN + 12); - if (!(data & DMA_FRCD_F)) - break; - - fault_reason = dma_frcd_fault_reason(data); - type = dma_frcd_type(data); - - data = readl(iommu->reg + reg + - fault_index * PRIMARY_FAULT_REG_LEN + 8); - source_id = dma_frcd_source_id(data); - - guest_addr = dmar_readq(iommu->reg + reg + - fault_index * PRIMARY_FAULT_REG_LEN); - guest_addr = dma_frcd_page_addr(guest_addr); - /* clear the fault */ - writel(DMA_FRCD_F, iommu->reg + reg + - fault_index * PRIMARY_FAULT_REG_LEN + 12); - - spin_unlock_irqrestore(&iommu->register_lock, flag); - - iommu_page_fault_do_one(iommu, type, fault_reason, - source_id, guest_addr); - - fault_index++; - if (fault_index > cap_num_fault_regs(iommu->cap)) - fault_index = 0; - spin_lock_irqsave(&iommu->register_lock, flag); - } -clear_overflow: - /* clear primary fault overflow */ - fault_status = readl(iommu->reg + DMAR_FSTS_REG); - if (fault_status & DMA_FSTS_PFO) - writel(DMA_FSTS_PFO, iommu->reg + DMAR_FSTS_REG); - - spin_unlock_irqrestore(&iommu->register_lock, flag); - return IRQ_HANDLED; -} - -int dmar_set_interrupt(struct intel_iommu *iommu) -{ - int irq, ret; - - irq = create_irq(); - if (!irq) { - printk(KERN_ERR "IOMMU: no free vectors\n"); - return -EINVAL; - } - - set_irq_data(irq, iommu); - iommu->irq = irq; - - ret = arch_setup_dmar_msi(irq); - if (ret) { - set_irq_data(irq, NULL); - iommu->irq = 0; - destroy_irq(irq); - return 0; - } - - /* Force fault register is cleared */ - iommu_page_fault(irq, iommu); - - ret = request_irq(irq, iommu_page_fault, 0, iommu->name, iommu); - if (ret) - printk(KERN_ERR "IOMMU: can't request irq\n"); - return ret; -} static int iommu_init_domains(struct intel_iommu *iommu) { @@ -1369,7 +1186,7 @@ static void dmar_init_reserved_ranges(void) if (!r->flags || !(r->flags & IORESOURCE_MEM)) continue; addr = r->start; - addr &= PAGE_MASK; + addr &= PHYSICAL_PAGE_MASK; size = r->end - addr; size = PAGE_ALIGN(size); iova = reserve_iova(&reserved_iova_list, IOVA_PFN(addr), @@ -2049,11 +1866,40 @@ static int __init init_dmars(void) } } + /* + * Start from the sane iommu hardware state. + */ for_each_drhd_unit(drhd) { if (drhd->ignored) continue; iommu = drhd->iommu; + + /* + * If the queued invalidation is already initialized by us + * (for example, while enabling interrupt-remapping) then + * we got the things already rolling from a sane state. + */ + if (iommu->qi) + continue; + + /* + * Clear any previous faults. + */ + dmar_fault(-1, iommu); + /* + * Disable queued invalidation if supported and already enabled + * before OS handover. 
@@ -2270,7 +2116,8 @@ static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,
 	 * is not a big problem
 	 */
 	ret = domain_page_mapping(domain, start_paddr,
-		((u64)paddr) & PAGE_MASK, size, prot);
+				  ((u64)paddr) & PHYSICAL_PAGE_MASK,
+				  size, prot);
 	if (ret)
 		goto error;
 
@@ -2554,8 +2401,8 @@ int intel_map_sg(struct device *hwdev, struct scatterlist *sglist, int nelems,
 		addr = (void *)virt_to_phys(addr);
 		size = aligned_size((u64)addr, sg->length);
 		ret = domain_page_mapping(domain, start_addr + offset,
-			((u64)addr) & PAGE_MASK,
-			size, prot);
+				((u64)addr) & PHYSICAL_PAGE_MASK,
+				size, prot);
 		if (ret) {
 			/* clear the page */
 			dma_pte_clear_range(domain, start_addr,
@@ -2713,6 +2560,150 @@ static void __init init_no_remapping_devices(void)
 	}
 }
 
+#ifdef CONFIG_SUSPEND
+static int init_iommu_hw(void)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu = NULL;
+
+	for_each_active_iommu(iommu, drhd)
+		if (iommu->qi)
+			dmar_reenable_qi(iommu);
+
+	for_each_active_iommu(iommu, drhd) {
+		iommu_flush_write_buffer(iommu);
+
+		iommu_set_root_entry(iommu);
+
+		iommu->flush.flush_context(iommu, 0, 0, 0,
+						DMA_CCMD_GLOBAL_INVL, 0);
+		iommu->flush.flush_iotlb(iommu, 0, 0, 0,
+						DMA_TLB_GLOBAL_FLUSH, 0);
+		iommu_disable_protect_mem_regions(iommu);
+		iommu_enable_translation(iommu);
+	}
+
+	return 0;
+}
+
+static void iommu_flush_all(void)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
+
+	for_each_active_iommu(iommu, drhd) {
+		iommu->flush.flush_context(iommu, 0, 0, 0,
+						DMA_CCMD_GLOBAL_INVL, 0);
+		iommu->flush.flush_iotlb(iommu, 0, 0, 0,
+						DMA_TLB_GLOBAL_FLUSH, 0);
+	}
+}
+
+static int iommu_suspend(struct sys_device *dev, pm_message_t state)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu = NULL;
+	unsigned long flag;
+
+	for_each_active_iommu(iommu, drhd) {
+		iommu->iommu_state = kzalloc(sizeof(u32) * MAX_SR_DMAR_REGS,
+						 GFP_ATOMIC);
+		if (!iommu->iommu_state)
+			goto nomem;
+	}
+
+	iommu_flush_all();
+
+	for_each_active_iommu(iommu, drhd) {
+		iommu_disable_translation(iommu);
+
+		spin_lock_irqsave(&iommu->register_lock, flag);
+
+		iommu->iommu_state[SR_DMAR_FECTL_REG] =
+			readl(iommu->reg + DMAR_FECTL_REG);
+		iommu->iommu_state[SR_DMAR_FEDATA_REG] =
+			readl(iommu->reg + DMAR_FEDATA_REG);
+		iommu->iommu_state[SR_DMAR_FEADDR_REG] =
+			readl(iommu->reg + DMAR_FEADDR_REG);
+		iommu->iommu_state[SR_DMAR_FEUADDR_REG] =
+			readl(iommu->reg + DMAR_FEUADDR_REG);
+
+		spin_unlock_irqrestore(&iommu->register_lock, flag);
+	}
+	return 0;
+
+nomem:
+	for_each_active_iommu(iommu, drhd)
+		kfree(iommu->iommu_state);
+
+	return -ENOMEM;
+}
+
+static int iommu_resume(struct sys_device *dev)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu = NULL;
+	unsigned long flag;
+
+	if (init_iommu_hw()) {
+		WARN(1, "IOMMU setup failed, DMAR can not resume!\n");
+		return -EIO;
+	}
+
+	for_each_active_iommu(iommu, drhd) {
+
+		spin_lock_irqsave(&iommu->register_lock, flag);
+
+		writel(iommu->iommu_state[SR_DMAR_FECTL_REG],
+			iommu->reg + DMAR_FECTL_REG);
+		writel(iommu->iommu_state[SR_DMAR_FEDATA_REG],
+			iommu->reg + DMAR_FEDATA_REG);
+		writel(iommu->iommu_state[SR_DMAR_FEADDR_REG],
+			iommu->reg + DMAR_FEADDR_REG);
+		writel(iommu->iommu_state[SR_DMAR_FEUADDR_REG],
+			iommu->reg + DMAR_FEUADDR_REG);
+
+		spin_unlock_irqrestore(&iommu->register_lock, flag);
+	}
+
+	for_each_active_iommu(iommu, drhd)
+		kfree(iommu->iommu_state);
+
+	return 0;
+}
+
+static struct sysdev_class iommu_sysclass = {
+	.name		= "iommu",
+	.resume		= iommu_resume,
+	.suspend	= iommu_suspend,
+};
+
+static struct sys_device device_iommu = {
+	.cls	= &iommu_sysclass,
+};
+
+static int __init init_iommu_sysfs(void)
+{
+	int error;
+
+	error = sysdev_class_register(&iommu_sysclass);
+	if (error)
+		return error;
+
+	error = sysdev_register(&device_iommu);
+	if (error)
+		sysdev_class_unregister(&iommu_sysclass);
+
+	return error;
+}
+
+#else
+static int __init init_iommu_sysfs(void)
+{
+	return 0;
+}
+#endif	/* CONFIG_PM */
+
 int __init intel_iommu_init(void)
 {
 	int ret = 0;
@@ -2748,6 +2739,7 @@ int __init intel_iommu_init(void)
 	init_timer(&unmap_timer);
 	force_iommu = 1;
 	dma_ops = &intel_dma_ops;
+	init_iommu_sysfs();
 
 	register_iommu(&intel_iommu_ops);
 
@@ -2778,6 +2770,33 @@ static int vm_domain_add_dev_info(struct dmar_domain *domain,
 	return 0;
 }
 
+static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
+					   struct pci_dev *pdev)
+{
+	struct pci_dev *tmp, *parent;
+
+	if (!iommu || !pdev)
+		return;
+
+	/* dependent device detach */
+	tmp = pci_find_upstream_pcie_bridge(pdev);
+	/* Secondary interface's bus number and devfn 0 */
+	if (tmp) {
+		parent = pdev->bus->self;
+		while (parent != tmp) {
+			iommu_detach_dev(iommu, parent->bus->number,
+				parent->devfn);
+			parent = parent->bus->self;
+		}
+		if (tmp->is_pcie) /* this is a PCIE-to-PCI bridge */
+			iommu_detach_dev(iommu,
+				tmp->subordinate->number, 0);
+		else /* this is a legacy PCI bridge */
+			iommu_detach_dev(iommu,
+				tmp->bus->number, tmp->devfn);
+	}
+}
+
 static void vm_domain_remove_one_dev_info(struct dmar_domain *domain,
 					  struct pci_dev *pdev)
 {
@@ -2803,6 +2822,7 @@ static void vm_domain_remove_one_dev_info(struct dmar_domain *domain,
 		spin_unlock_irqrestore(&device_domain_lock, flags);
 
 		iommu_detach_dev(iommu, info->bus, info->devfn);
+		iommu_detach_dependent_devices(iommu, pdev);
 		free_devinfo_mem(info);
 
 		spin_lock_irqsave(&device_domain_lock, flags);
@@ -2852,6 +2872,7 @@ static void vm_domain_remove_all_dev_info(struct dmar_domain *domain)
 
 		iommu = device_to_iommu(info->bus, info->devfn);
 		iommu_detach_dev(iommu, info->bus, info->devfn);
+		iommu_detach_dependent_devices(iommu, info->dev);
 
 		/* clear this iommu in iommu_bmp, update iommu count
 		 * and coherency
diff --git a/drivers/pci/intr_remapping.c b/drivers/pci/intr_remapping.c
index 45effc5..0a490fe 100644
--- a/drivers/pci/intr_remapping.c
+++ b/drivers/pci/intr_remapping.c
@@ -458,11 +458,55 @@ static int setup_intr_remapping(struct intel_iommu *iommu, int mode)
 	return 0;
 }
 
+/*
+ * Disable Interrupt Remapping.
+ */
+static void disable_intr_remapping(struct intel_iommu *iommu)
+{
+	unsigned long flags;
+	u32 sts;
+
+	if (!ecap_ir_support(iommu->ecap))
+		return;
+
+	spin_lock_irqsave(&iommu->register_lock, flags);
+
+	sts = dmar_readq(iommu->reg + DMAR_GSTS_REG);
+	if (!(sts & DMA_GSTS_IRES))
+		goto end;
+
+	iommu->gcmd &= ~DMA_GCMD_IRE;
+	writel(iommu->gcmd, iommu->reg + DMAR_GCMD_REG);
+
+	IOMMU_WAIT_OP(iommu, DMAR_GSTS_REG,
+		      readl, !(sts & DMA_GSTS_IRES), sts);
+
+end:
+	spin_unlock_irqrestore(&iommu->register_lock, flags);
+}
+
 int __init enable_intr_remapping(int eim)
 {
 	struct dmar_drhd_unit *drhd;
 	int setup = 0;
 
+	for_each_drhd_unit(drhd) {
+		struct intel_iommu *iommu = drhd->iommu;
+
+		/*
+		 * Clear previous faults.
+		 */
+		dmar_fault(-1, iommu);
+
+		/*
+		 * Disable intr remapping and queued invalidation, if already
+		 * enabled prior to OS handover.
+		 */
+		disable_intr_remapping(iommu);
+
+		dmar_disable_qi(iommu);
+	}
+
 	/*
 	 * check for the Interrupt-remapping support
 	 */
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index d95da10..fd4b619 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -113,7 +113,9 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
  * "Define 'is'", Bill Clinton
  * "Define 'if'", Steven Rostedt
  */
-#define if(cond) if (__builtin_constant_p((cond)) ? !!(cond) :	\
+#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
+#define __trace_if(cond) \
+	if (__builtin_constant_p((cond)) ? !!(cond) :		\
 	({								\
 		int ______r;						\
 		static struct ftrace_branch_data		 	\
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index f284407..b967e93 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -24,6 +24,7 @@
 #include <linux/acpi.h>
 #include <linux/types.h>
 #include <linux/msi.h>
+#include <linux/irqreturn.h>
 
 #if defined(CONFIG_DMAR) || defined(CONFIG_INTR_REMAP)
 struct intel_iommu;
@@ -44,6 +45,14 @@ extern struct list_head dmar_drhd_units;
 #define for_each_drhd_unit(drhd) \
 	list_for_each_entry(drhd, &dmar_drhd_units, list)
 
+#define for_each_active_iommu(i, drhd)					\
+	list_for_each_entry(drhd, &dmar_drhd_units, list)		\
+		if (i=drhd->iommu, drhd->ignored) {} else
+
+#define for_each_iommu(i, drhd)						\
+	list_for_each_entry(drhd, &dmar_drhd_units, list)		\
+		if (i=drhd->iommu, 0) {} else
+
 extern int dmar_table_init(void);
 extern int dmar_dev_scope_init(void);
 
@@ -127,6 +136,7 @@ extern void dmar_msi_mask(unsigned int irq);
 extern void dmar_msi_read(int irq, struct msi_msg *msg);
 extern void dmar_msi_write(int irq, struct msi_msg *msg);
 extern int dmar_set_interrupt(struct intel_iommu *iommu);
+extern irqreturn_t dmar_fault(int irq, void *dev_id);
 extern int arch_setup_dmar_msi(unsigned int irq);
 extern int iommu_detected, no_iommu;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index d2e3cbf..6520e07 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -284,6 +284,14 @@ struct iommu_flush {
 		unsigned int size_order, u64 type, int non_present_entry_flush);
 };
 
+enum {
+	SR_DMAR_FECTL_REG,
+	SR_DMAR_FEDATA_REG,
+	SR_DMAR_FEADDR_REG,
+	SR_DMAR_FEUADDR_REG,
+	MAX_SR_DMAR_REGS
+};
+
 struct intel_iommu {
 	void __iomem	*reg; /* Pointer to hardware regs, virtual addr */
 	u64		cap;
@@ -304,6 +312,8 @@ struct intel_iommu {
 	struct iommu_flush flush;
 #endif
 	struct q_inval  *qi;            /* Queued invalidation info */
+	u32 *iommu_state; /* Store iommu states between suspend and resume.*/
+
 #ifdef CONFIG_INTR_REMAP
 	struct ir_table *ir_table;	/* Interrupt remapping info */
 #endif
@@ -321,6 +331,8 @@ extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
 extern int alloc_iommu(struct dmar_drhd_unit *drhd);
 extern void free_iommu(struct intel_iommu *iommu);
 extern int dmar_enable_qi(struct intel_iommu *iommu);
+extern void dmar_disable_qi(struct intel_iommu *iommu);
+extern int dmar_reenable_qi(struct intel_iommu *iommu);
 extern void qi_global_iec(struct intel_iommu *iommu);
 extern int qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid,