Difference between revisions of "Archived/Xen Development Projects"

From Xen

Revision as of 15:16, 7 September 2012

This page lists various Xen-related development projects that anyone can pick up. If you're interested in hacking on Xen, this is the place to start! Ready for the challenge?

To work on a project:

  • Find a project that looks interesting (or a bug if you want to start with something simple).
  • Send an email to the xen-devel mailing list to let us know you have started working on a specific project.
  • Post your ideas, questions, and RFCs to xen-devel sooner rather than later so you can get comments and feedback.
  • Send patches to xen-devel early for review so you can get feedback and be sure you're going in the right direction.
  • Your work should be based on the xen-unstable development tree if it is related to Xen and/or the tools. After your patch has been merged into xen-unstable it can be backported to the stable branches (Xen 4.0, Xen 3.4, etc).
  • Your kernel-related patches should be based on either the upstream kernel.org git tree (latest version) or the xen/stable-2.6.32.x tree, depending on whether the work is upstream or Xen dom0 related.

xen-devel mailing list subscription and archives: http://lists.xensource.com/mailman/listinfo/xen-devel

Before submitting patches, please read the Submitting Xen Patches wiki page.

If you have new ideas, suggestions or development plans let us know and we'll update this list!

List of projects

Domain support

Upstreaming Xen PVSCSI drivers to mainline Linux kernel

Date of insert: 01/08/2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: The PVSCSI drivers still need to be upstreamed. Necessary operations may include:
Outcomes: Not specified, project outcomes


Upstreaming Xen PVUSB drivers to mainline Linux kernel

Date of insert: 01/08/2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: The PVUSB drivers still need to be upstreamed. Necessary operations may include:
Outcomes: Not specified, project outcomes


Blkback improvements

Date of insert: 02/08/2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: Blkback requires a number of improvements, some of them being:
  • Multiple disks in a guest cause contention in the global pool of pages.
  • There is only one ring page; with today's SSDs the ring should be larger, which means implementing multi-page ring support.
  • With multi-page it becomes apparent that the segment size ends up wasting a bit of space on the ring. BSD folks fixed that by negotiating a new parameter to utilize the full size of the ring.
  • Add DIF/DIX support [1]
  • Further perf evaluation needs to be done to see how it behaves under high load.
Outcomes: Not specified, project outcomes
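
The ring-size points in the Blkback list above can be made concrete with a little arithmetic. The following is only a sketch: it assumes the slot count is rounded down to a power of two, and the header and slot sizes are illustrative values chosen for the example, not numbers taken from this page or checked against the blkif headers.

```python
def ring_entries(ring_bytes, header_bytes, entry_bytes):
    """Usable slots when the slot count is rounded down to a power of two."""
    raw = (ring_bytes - header_bytes) // entry_bytes
    if raw < 1:
        return 0
    return 1 << (raw.bit_length() - 1)


PAGE = 4096    # one ring page
HEADER = 64    # assumed size of the shared-ring header (illustrative)
ENTRY = 112    # assumed size of one request slot (illustrative)

if __name__ == "__main__":
    for pages in (1, 2, 4):
        slots = ring_entries(pages * PAGE, HEADER, ENTRY)
        unused = pages * PAGE - HEADER - slots * ENTRY
        print(f"{pages} page(s): {slots} slots, {unused} bytes unused")
```

Under these assumptions the unused tail grows with the number of ring pages, which is the kind of waste a negotiated parameter could reclaim.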


Netback overhaul

Date of insert: 02/08/2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: Wei Liu posted RFC patches that make the driver multi-page and multi-event-channel, with a page pool. However, not all the issues have been addressed yet, so the patches still need to be finished and cleaned up. Additionally, a zero-copy implementation could be considered. Patch series and discussions:
Outcomes: Not specified, project outcomes


PAT writecombine fixup

Date of insert: 02/08/2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: The writecombine feature (especially for graphic adapters) is turned off due to stability reasons. More specifically, the code involved in page transition from WC to WB gets confused about the PSE bit state in the page, resulting in a set of repeated warnings.

For more information, please check:

Outcomes: Not specified, project outcomes


ACPI S3-state investigation and fixup

Date of insert: 02/08/2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: As of the Linux 3.3 release, the S3 state was supposed to work with these patches included: However, it no longer works. The scope of the project is to understand the reasons for the failures and fix them.
Outcomes: Not specified, project outcomes


PUD L3 - big memory - fixup

Date of insert: 02/08/2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: Right now guests don't boot when a huge amount of kernel memory is requested (failures have been reported with 384GB for certain). The scope of the project is to understand the reasons and fix them. This likely involves digging into the toolstack too.
Outcomes: Not specified, project outcomes


Parallel xenwatch

Date of insert: 01/08/2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: Xenwatch is protected by a single coarse lock. With a huge number of guests this becomes a scalability issue. The xenwatch locking needs to be rewritten so that it scales fully.
Outcomes: Not specified, project outcomes

Hypervisor

Microcode uploader implementation

Date of insert: 02/08/2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: Intel is working on an early-loading implementation where the microcode blob would be appended to the initrd image. The kernel would scan for the appropriate magic constant and load the microcode very early. The Xen hypervisor could do this similarly.
Outcomes: Not specified, project outcomes
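
As a rough illustration of the "scan for a magic constant" idea, here is a minimal sketch. The marker bytes and the function name are hypothetical; the actual constant and image layout are not given on this page.

```python
# Hypothetical marker; the real magic constant is not specified on this page.
MAGIC = b"\x7fUCODE\x00"


def find_microcode(initrd, magic=MAGIC):
    """Return the bytes that follow the last occurrence of `magic`, or None.

    The blob is assumed to be appended to the image, so the search runs
    from the end of the buffer.
    """
    pos = initrd.rfind(magic)
    if pos < 0:
        return None
    return initrd[pos + len(magic):]


if __name__ == "__main__":
    image = b"initrd-contents" + MAGIC + b"microcode-blob"
    print(find_microcode(image))
```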


SEDF Handling of Blocking/Unblocking

Date of insert: 08/08/2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Dario Faggioli <dario.faggioli@citrix.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: The SEDF scheduler, within Xen, currently deals with events such as a vcpu blocking (in general, stopping running) and unblocking (in general, restarting running) by trying (but failing!) to special case all the possible situations, resulting in the code being rather complicated, ugly, inefficient and hard to maintain. Unified approaches have been proposed for enabling blocking and unblocking in EDF (the algorithm the scheduler uses), without compromising the temporal isolation it provides to the various tasks/vcpus. More specifically, the technique called Constant BandWidth Server (CBS) could easily be implemented.
Outcomes: Not specified, project outcomes
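
For readers unfamiliar with CBS, its two core rules fit in a few lines. This is a sketch of the textbook algorithm only, not of any existing Xen code; the class and field names are made up for the example, and times are plain integers in arbitrary units.

```python
from dataclasses import dataclass


@dataclass
class CbsServer:
    budget_max: int    # Q: runtime granted per period
    period: int        # T: replenishment period
    budget: int = 0    # q: remaining runtime
    deadline: int = 0  # d: current absolute deadline

    def wakeup(self, now):
        """vcpu unblocks: keep (q, d) only if doing so cannot exceed bandwidth Q/T."""
        remaining = self.deadline - now
        # Textbook test q >= (d - now) * Q / T, written without division.
        if self.budget * self.period >= remaining * self.budget_max:
            self.deadline = now + self.period
            self.budget = self.budget_max

    def consume(self, ran_for):
        """Charge execution time; on exhaustion, refill and postpone the deadline."""
        self.budget -= ran_for
        while self.budget <= 0:
            self.budget += self.budget_max
            self.deadline += self.period


if __name__ == "__main__":
    vcpu = CbsServer(budget_max=3, period=10)
    vcpu.wakeup(0)
    vcpu.consume(2)
    vcpu.wakeup(8)
    print(vcpu)
```

The point of the wakeup test is that blocking and unblocking need no special cases: a vcpu either keeps its current budget and deadline or gets a fresh pair, and in both cases it cannot consume more than Q out of every T.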


SEDF Multiprocessor Support

Date of insert: 08/08/2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Dario Faggioli <dario.faggioli@citrix.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: The SEDF scheduler, within Xen, does not properly handle SMP systems unless specific vcpu pinning is specified by the user. That is a big limitation of the current implementation, especially since EDF (the algorithm the scheduler uses) could easily be extended to work in that situation. The first step would be to move from one SEDF runqueue per processor to one runqueue per "cluster of processors" (for instance one runqueue per L3, as scredit2 does). That alone would greatly increase the effectiveness of the scheduler on current hardware. After that, a mechanism for balancing and migrating vcpus among different runqueues can be designed and implemented.
Outcomes: Not specified, project outcomes
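
The "one runqueue per cluster" step can be pictured as follows. This is a sketch under assumptions: the CPU-to-cluster map is supplied by the caller (for example one cluster per shared L3), vcpus are plain (name, deadline) tuples, and none of the names correspond to real Xen structures.

```python
from collections import defaultdict


def build_runqueues(cpu_to_cluster):
    """Group physical CPUs so that every cluster shares a single runqueue."""
    runqueues = defaultdict(list)
    for cpu, cluster in sorted(cpu_to_cluster.items()):
        runqueues[cluster].append(cpu)
    return dict(runqueues)


def pick_for_cluster(queued_vcpus, n_cpus):
    """EDF within a cluster: run the n_cpus vcpus with the earliest deadlines."""
    return sorted(queued_vcpus, key=lambda v: v[1])[:n_cpus]


if __name__ == "__main__":
    clusters = build_runqueues({0: 0, 1: 0, 2: 1, 3: 1})
    queue = [("vcpu-a", 30), ("vcpu-b", 10), ("vcpu-c", 20)]
    print(clusters)
    print(pick_for_cluster(queue, len(clusters[0])))
```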


Integrating NUMA and Tmem

Date of insert: 08/08/2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Dario Faggioli <dario.faggioli@citrix.com>, Dan Magenheimer <dan.magenheimer_AT_oracle_DOT_com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: Transcendent memory (Tmem) acts as a mechanism for discriminating between frequently and infrequently used data, and thus helps allocate them properly. It would be interesting to investigate and implement the mechanisms needed to take advantage of this and improve the performance of Tmem-enabled guests running on NUMA machines. Some more details here.
Outcomes: Not specified, project outcomes

Performance

Performance tools overhaul

Date of insert: 02/08/2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: Generally, work on the performance tools themselves should be listed separately on the Xen_Profiling:_oprofile_and_perf wiki page.
Outcomes: Not specified, project outcomes

Upstream bugs!

VCPU hotplug bug

Date of insert: Sep 1 2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: VCPU hotplug. [2]

To reproduce this, it's as easy as having the following in your guest config:

vcpus=2
maxvcpus=3

Then launch the guest and play with 'xm vcpu-set 0 2' and 'xm vcpu-set 0 3'; you will see the guest forget about one of the CPUs.

This is what you will see in the guest:

udevd-work[2421]: error opening ATTR{/sys/devices/system/cpu/cpu2/online} for writing: No such file or directory

If you instrument udev and look at the code you will find somebody came up with a "fix": http://serverfault.com/questions/329329/pv-ops-kernel-ignoring-cpu-hotplug-under-xen-4-domu

But the real fix is what Greg outlines in the URL above.
Outcomes: Not specified, project outcomes
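
A quick way to check for the symptom from inside the guest is to look for the per-CPU `online` attribute that udev failed to open. The helper below is a hypothetical diagnostic sketch, not part of any Xen or udev tooling; it assumes the usual sysfs layout and skips cpu0, which on many x86 kernels has no `online` file.

```python
import os


def missing_online_attrs(expected_cpus, sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Return the CPU numbers in 1..expected_cpus-1 that lack an 'online' file."""
    missing = []
    for n in range(1, expected_cpus):
        attr = os.path.join(sysfs_cpu_dir, f"cpu{n}", "online")
        if not os.path.exists(attr):
            missing.append(n)
    return missing


if __name__ == "__main__":
    # With maxvcpus=3 in the guest config, cpu1 and cpu2 are the ones to check.
    print(missing_online_attrs(3))
```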

RCU timer sent to offline VCPU

Date of insert: Sep 1 2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description:
[    0.073006] WARNING: at /home/konrad/ssd/linux/kernel/rcutree.c:1547 __rcu_process_callbacks+0x42e/0x440()
[    0.073008] Modules linked in:
[    0.073010] Pid: 12, comm: migration/2 Not tainted 3.5.0-rc2 #2
[    0.073011] Call Trace:
[    0.073017]  <IRQ>  [<ffffffff810718ea>] warn_slowpath_common+0x7a/0xb0

which I get with this guest config:

vcpus=2
maxvcpus=3


Here is what Paul says: https://lkml.org/lkml/2012/6/19/360
Outcomes: Not specified, project outcomes

CONFIG_X86_MPPARSE does not work with dom0

Date of insert: Sep 1 2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: http://lists.xen.org/archives/html/xen-devel/2011-10/msg01728.html
   found SMP MP-table at [ffff8800000f4f80] f4f80
(XEN) mm.c:945:d0 Error getting mfn 100 (pfn 5555555555555555) from L1 entry 8000000000100461 for l1e_owner=0, pg_owner=0
(XEN) mm.c:5046:d0 ptwr_emulate: could not get_page_from_l1e()
[ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
[    0.000000] IP: [<ffffffff81008a5a>] xen_set_pte+0x3a/0x1f0
[    0.000000] PGD 0
[    0.000000] Oops: 0003 [#1] SMP
[    0.000000] CPU 0
[    0.000000] Modules linked in:
[    0.000000]
[    0.000000] Pid: 0, comm: swapper Not tainted 3.1.0 #4 HP ProLiant DL380 G6
[    0.000000] RIP: e030:[<ffffffff81008a5a>] [<ffffffff81008a5a>] xen_set_pte+0x3a/0x1f0

... and also http://lists.xen.org/archives/html/xen-devel/2011-11/msg00006.html

where nobody found a resolution, and the problem persists.
Outcomes: Not specified, project outcomes


CONFIG_NUMA on 32-bit.

Date of insert: Sep 1 2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: http://www.spinics.net/lists/kernel/msg1350338.html

I came up with a patch for the problem that William found: http://lists.xen.org/archives/html/xen-devel/2012-05/msg01963.html

and narrowed it down to Linux calling xen_set_pte with a PMD flag (so trying to set up a 2MB page). Currently the implementation of xen_set_pte can't do 2MB pages, but it will gladly accept the argument and the multicall will fail.

Peter did not like the x86 implementation, so I was thinking of implementing some code in xen_set_pte that will detect that it's a PMD flag and do "something". That something could be either to probe the PTEs, see if there is enough space and, if so, just call the multicall 512 times, or to perform a hypercall to set up a super-page. But then I wasn't sure how we would tear down such a super-page.
Outcomes: Not specified, project outcomes
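
The figure of 512 comes straight from the page sizes: one 2MB mapping covers 512 base pages of 4KB. The sketch below only illustrates that arithmetic and the alignment a fallback would have to check; it is not kernel code, and the function name is made up for the example.

```python
PAGE_SIZE = 4096                 # 4 KiB base page
PMD_SIZE = 2 * 1024 * 1024       # 2 MiB covered by one PMD-level mapping
PTES_PER_PMD = PMD_SIZE // PAGE_SIZE


def split_superpage(virt_base, first_pfn):
    """Return the (virtual address, pfn) pairs of the 4 KiB mappings
    that together cover one 2 MiB page."""
    if virt_base % PMD_SIZE or first_pfn % PTES_PER_PMD:
        raise ValueError("a 2 MiB mapping must be 2 MiB aligned")
    return [(virt_base + i * PAGE_SIZE, first_pfn + i)
            for i in range(PTES_PER_PMD)]


if __name__ == "__main__":
    ptes = split_superpage(0x200000, 512)
    print(len(ptes), hex(ptes[0][0]), hex(ptes[-1][0]))
```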


Time accounting for stolen ticks.

Date of insert: Sep 1 2012; Verified: Not updated in 2020; GSoC: Unknown
Technical contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Mailing list/forum for project: xen-devel@
IRC channel for project: #xen-devel
Difficulty: Unknown
Skills Needed: Unknown
Description: This concerns the patches at http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=shortlog;h=refs/heads/devel/pvtime.v1.1

and whether those patches are the right approach or the wrong one.

The discussion of this is at http://lists.xen.org/archives/html/xen-devel/2011-10/msg01477.html
Outcomes: Not specified, project outcomes
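
For context, stolen-tick accounting boils down to converting the time a vcpu wanted to run but could not into whole timer ticks. The sketch below assumes the hypervisor exposes cumulative per-state times in nanoseconds and that "runnable" and "offline" time count as stolen; the dictionary keys, the function name and the 100 Hz tick length are assumptions made for the example, not a description of the patches linked above.

```python
NS_PER_TICK = 10_000_000  # assumes a 100 Hz tick


def account_stolen(prev, cur, residual_ns=0, ns_per_tick=NS_PER_TICK):
    """Return (stolen ticks, leftover ns) between two runstate snapshots.

    `prev` and `cur` hold cumulative nanoseconds under the keys
    'runnable' and 'offline'; the leftover is carried to the next call.
    """
    runnable = max(cur["runnable"] - prev["runnable"], 0)
    offline = max(cur["offline"] - prev["offline"], 0)
    stolen_ns = runnable + offline + residual_ns
    return divmod(stolen_ns, ns_per_tick)


if __name__ == "__main__":
    before = {"runnable": 0, "offline": 0}
    after = {"runnable": 25_000_000, "offline": 0}
    print(account_stolen(before, after))
```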

Xen Cloud Platform (XCP) and XAPI projects

There are separate wiki pages about XCP and XAPI related projects. Make sure you check these out as well!


Quick links to changelogs of the various Xen related repositories/trees

Please see XenRepositories wiki page!